Query regarding solr plugin.

2011-04-22 Thread rajini maski
I have a question about adding a Solr plugin.


  I have created a new Java class that makes a few changes to
SynonymFilterFactory.java, and I want to add it to my Solr
instance.

I created a package, org.apache.pco.search, containing the class
OrcSynonymFilterFactory, which extends BaseTokenFilterFactory and
implements ResourceLoaderAware {code.}

Packages included: import org.apache.solr.analysis.*;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.common.ResourceLoader;
import org.apache.solr.common.util.StrUtils;
import org.apache.solr.util.plugin.ResourceLoaderAware;

import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;


 I exported this class from Eclipse (File > Export) as the package
 org.apache.pco.search containing OrchSynonymFilterFactory.java,
 and generated the jar file org.apache.pco.orchSynonymFilterFactory.jar.

 I placed this jar file in the /lib folder of the Solr home instance,
 and made the corresponding changes in the Solr config.

 Now I want to add this in the schema field type as the synonym filter:

 <filter class="pointcross.orchSynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>

But I am not able to get it working; it fails with this error:
org.apache.solr.common.SolrException: Error loading class
'pointcross.orchSynonymFilterFactory' at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
at org.apache.solr.util.plugin.AbstractPluginLoader

Can anyone tell me what mistake I am making here, and how to fix it?

Rajani


Multi-word Solr Synonym issue

2011-04-22 Thread Pla Gong
I am trying to do a simple mapping of a two-word term to a one-word term and
it does not work. See my configuration at the bottom of the email. My
scenario is that I have a term "pond care" and I want to map it
to the term "fountain".  So whenever a user enters the term "pond care"
in the search box, I want Solr to search on the word "fountain".
Searching on "fountain" or "pond food" should return the same number
of products.  I have tried all kinds of filter combinations and cannot get it
to work.  On the Solr analysis page, "pond food" does map to "fountain",
but when I test in the Solr Admin, the search does not query on "fountain",
only on "pond care".  Here is my log from the Solr Admin search:

  fountain
  rawquerystring: pond food
  querystring: pond food
  parsedquery: +text:pond +text:food
  parsedquery_toString: +text:pond +text:food
  explain:
    1.4865229 = (MATCH) sum of:
      0.5013988 = (MATCH) weight(text:pond in 1137), product of:
        0.5317582 = queryWeight(text:pond), product of:
          3.180518 = idf(docFreq=730, maxDocs=6470)
          0.16719232 = queryNorm

I am new to Solr and I have Googled the issue, but I did not find a
solution that works for my case.  Please let me know if you have
encountered this issue, how you resolved it, and what configuration you
used.  I want term-to-term mapping results from the query and not a
combination of the two terms.

I would greatly appreciate any help.

Thanks,
Pla

---field type and filter Configuration

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>


Re: How to return score without using _val_

2011-04-22 Thread Em
Hi,

Did you have a look at the query() function mentioned in the wiki?
It sounds like something you should try!
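
A sketch of that idea (qmain, qspec, and qname here are placeholder parameter
names, and the fields are from the stock example schema): the query() function
lets a second query contribute to the score as a boost function, without
touching q:

  q={!dismax qf=text v=$qmain}&qmain=ipod
  &bf=query($qspec)&qspec={!dismax qf=name v=$qname}&qname=nano

bf adds the score of the $qspec dismax query to the main score, which is
essentially what the _val_ trick does, but kept out of the q parameter.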

Regards,
Em


Bill Bell wrote:
> 
> I know that the _val_ is the only thing influencing the score.
> 
> The fq is just to limit also by those queries.
> 
> What I am asking is if it is possible to just influence the score using
> _val_ but not in the Q parameter?
> 
> Something like bq=_val_:"{!type=dismax qf=$qqf  v=$qspec}"
> _val_:"{!type=dismax
> qt=dismaxname v=$qname}"
> 
> 
> Is there something like that?
> 
> On 4/21/11 2:45 AM, "Em"  wrote:
> 
>>Hi,
>>
>>I agree with Yonik here - I do not understand what you would like to do as
>>well.
>>But some additional note from my side:
>>Your FQs never influences the score! Of course you can specify the same
>>query twice, once as a filter - query and once as a regular query but I do
>>not see the reason to do so. It sounds like unnecessary effort without a
>>win. 
>>
>>Regards,
>>Em 
>>
>>
>>Bill Bell wrote:
>>> 
>>> I would like to influence the score but I would rather not mess with the
>>> q=
>>> field since I want the query to dismax for Q.
>>> 
>>> Something like:
>>> 
>>> fq={!type=dismax qf=$qqf v=$qspec}&
>>> fq={!type=dismax qt=dismaxname v=$qname}&
>>> q=_val_:"{!type=dismax qf=$qqf  v=$qspec}" _val_:"{!type=dismax
>>> qt=dismaxname v=$qname}"
>>> 
>>> Is there a way to do a filter and add the FQ to the score by doing it
>>> another way? 
>>> 
>>> Also does this do multiple queries? Is this the right way to do it?
>>> 
>>
>>
>>--
>>View this message in context:
>>http://lucene.472066.n3.nabble.com/How-to-return-score-without-using-val-t
>>p2841443p2846317.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-return-score-without-using-val-tp2841443p2850979.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr sorting problem

2011-04-22 Thread Pratik
Were you able to get it to work? If yes, how?
I'm having almost the same problem.

I used the fieldType name="alphaOnlySort" class="solr.TextField" as in
the sample schema.xml to define a field named "alphaname".
Then I copied one of my fields, "foodDescUS", to "alphaname".
When I try to sort using alphaname I get this error:
The field :foodDesc present in DataConfig does not have a counterpart in
Solr Schema

Please help 

Thanks 
Pratik 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-sorting-problem-tp486144p2851229.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing 20M documents from MySQL with DIH

2011-04-22 Thread Erick Erickson
{{{A custom indexer, so that's a fairly common practice? So when you are
dealing with these large indexes, do you try not to fully rebuild them
when you can? It's not a nightly thing, but something to do in case of
a disaster? Is there a difference in the performance of an index that
was built all at once vs. one that has had delta inserts and updates
applied over a period of months?}}}

Is it a common practice? Like all of this, "it depends". It's certainly
easier to let DIH do the work. Sometimes DIH doesn't have all the
capabilities necessary. Or as Chris said, in the case where you already
have a system built up and it's easier to just grab the output from
that and send it to Solr, perhaps with SolrJ and not use DIH. Some people
are just more comfortable with their own code...

"Do you try not to fully rebuild". It depends on how painful a full rebuild
is. Some people just like the simplicity of starting over every day/week/month.
But you *have* to be able to rebuild your index in case of disaster, and
a periodic full rebuild certainly keeps that process up to date.

"Is there a difference...delta inserts...updates...applied over months". Not
if you do an optimize. When a document is deleted (or updated), it's only
marked as deleted. The associated data is still in the index. Optimize will
reclaim that space and compact the segments, perhaps down to one.
But there's no real operational difference between a newly-rebuilt index
and one that's been optimized. If you don't delete/update, there's not
much reason to optimize, either.

I'll leave the DIH to others..

Best
Erick

On Thu, Apr 21, 2011 at 8:09 PM, Scott Bigelow  wrote:
> Thanks for the e-mail. I probably should have provided more details,
> but I was more interested in making sure I was approaching the problem
> correctly (using DIH, with one big SELECT statement for millions of
> rows) instead of solving this specific problem. Here's a partial
> stacktrace from this specific problem:
>
> ...
> Caused by: java.io.EOFException: Can not read response from server.
> Expected to read 4 bytes, read 0 bytes before connection was
> unexpectedly lost.
>        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
>        ... 22 more
> Apr 21, 2011 3:53:28 AM
> org.apache.solr.handler.dataimport.EntityProcessorBase getNext
> SEVERE: getNext() failed for query 'REDACTED'
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
> Communications link failure
>
> The last packet successfully received from the server was 128
> milliseconds ago.  The last packet sent successfully to the server was
> 25,273,484 milliseconds ago.
> ...
>
>
> A custom indexer, so that's a fairly common practice? So when you are
> dealing with these large indexes, do you try not to fully rebuild them
> when you can? It's not a nightly thing, but something to do in case of
> a disaster? Is there a difference in the performance of an index that
> was built all at once vs. one that has had delta inserts and updates
> applied over a period of months?
>
> Thank you for your insight.
>
>
> On Thu, Apr 21, 2011 at 4:31 PM, Chris Hostetter
>  wrote:
>>
>> : For a new project, I need to index about 20M records (30 fields) and I
>> : have been running into issues with MySQL disconnects, right around
>> : 15M. I've tried several remedies I've found on blogs, changing
>>
> >> if you can provide some concrete error/log messages and the details of how
> >> you are configuring your datasource, that might help folks provide better
> >> suggestions -- you've said you run into a problem but you haven't provided
> >> any details for people to go on in giving you feedback.
> >>
> >> : resolved the issue. It got me wondering: Is this the way everyone does
> >> : it? What about 100M records up to 1B; are those all pulled using DIH
> >> : and a single query?
> >>
> >> I've only recently started using DIH, and while it definitely has a lot
> >> of quirks/annoyances, it seems like a pretty good 80/20 solution for
> >> indexing with Solr -- but that doesn't mean it's perfect for all
> >> situations.
> >>
> >> Writing custom indexer code can certainly make sense in a lot of cases --
> >> particularly where you already have a data publishing system that you want
> >> to tie into directly -- the trick is to ensure you have a decent strategy
> >> for rebuilding the entire index should the need arise (but this is really
> >> only an issue if your primary indexing solution is incremental -- many use
> >> cases can be satisfied just fine with a brute force "full rebuild
> >> periodically" implementation).
>>
>>
>> -Hoss
>>
>
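
(An aside on the stack trace above: "the last packet sent successfully to the
server was 25,273,484 milliseconds ago" is the classic symptom of a very long
import outliving a MySQL connection. The usual starting point on the DIH side
is to make sure the driver streams rows rather than buffering the whole result
set; a sketch, with placeholder connection details:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="pass"
              batchSize="-1"/>

batchSize="-1" makes DIH set the JDBC fetch size to Integer.MIN_VALUE, which is
the MySQL driver's cue to stream results row by row. If the server still drops
the connection, MySQL-side timeouts such as net_write_timeout are the next
thing to look at.)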


Re: Query regarding solr plugin.

2011-04-22 Thread Erick Erickson
First, I appreciate your writeup of the problem; it's very helpful when people
take the time to put in the details.

I can't reconcile these two things:

{{{

as org.apache.solr.common.SolrException: Error loading class
'pointcross.orchSynonymFilterFactory' at}}}

This seems to indicate that your config file is really looking for
"pointcross.orchSynonymFilterFactory" rather than
"org.apache.pco.search.OrchSynonymFilterFactory".

Do you perhaps have another definition in your config that references
"pointcross.orchSynonymFilterFactory"?

Try running "jar -tfv " to see what classes
are actually defined in the file in the solr lib directory. Perhaps
it's not what you expect (Perhaps Eclipse did something
unexpected).

Given the anomaly above (the error reported doesn't correspond to
the class you defined) I'd also look to see if you have any old
jars lying around that you somehow get to first.

Finally, is there any chance that your "pointcross.orchSynonymFilterFactory"
is a dependency of "org.apache.pco.search.OrchSynonymFilterFactory"? In
which case Solr may be finding
"org.apache.pco.search.OrchSynonymFilterFactory"
but failing to load a dependency (that would have to be put in the lib
or the jar).

Hope that helps
Erick



On Fri, Apr 22, 2011 at 3:00 AM, rajini maski  wrote:
> One doubt regarding adding the solr plugin.
>
>
>          I have a new java file created that includes few changes in
> SynonymFilterFactory.java. I want this java file to be added to solr
> instance.
>
> I created a package as : org.apache.pco.search
> This includes OrcSynonymFilterFactory java class extends
> BaseTokenFilterFactory implements ResourceLoaderAware {code.}
>
> Packages included: import org.apache.solr.analysis.*;
>
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.solr.common.ResourceLoader;
> import org.apache.solr.common.util.StrUtils;
> import org.apache.solr.util.plugin.ResourceLoaderAware;
>
> import java.io.File;
> import java.io.IOException;
> import java.io.Reader;
> import java.io.StringReader;
> import java.util.ArrayList;
> import java.util.List;
>
>
>  I exported this java file in eclipse,
>  selecting  File tab-Export to package
> -org.apache.pco.search-OrchSynonymFilterFactory.java
>  and generated jar file - org.apache.pco.orchSynonymFilterFactory.jar
>
>  This jar file placed in /lib folder of solr home instance
>  Changes in solr config - 
>
>  Now i want to add this in schema fieldtype for synonym filter as
> <filter class="pointcross.orchSynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
> But i am not able to do it.." It has an error
> as org.apache.solr.common.SolrException: Error loading class
> 'pointcross.orchSynonymFilterFactory' at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
> at
> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
> at org.apache.solr.util.plugin.AbstractPluginLoader"
>
> Please can anyone tell me , What is the mistake i am doing here and the fix
> for it ?
>
> Rajani
>


Re: How to return score without using _val_

2011-04-22 Thread Yonik Seeley
On Fri, Apr 22, 2011 at 12:26 AM, Bill Bell  wrote:
> I know that the _val_ is the only thing influencing the score.

What creates the score is the main query.
There are tons of ways to build up that main query in different ways.
So the answer to your question is "yes", you can influence the score
without messing with the "q" param.
To get more help, you need to get more specific about what you are
trying to do though (or come up with a concrete example using the
example docs, etc).

> The fq is just to limit also by those queries.
>
> What I am asking is if it is possible to just influence the score using
> _val_ but not in the Q parameter?
>
> Something like bq=_val_:"{!type=dismax qf=$qqf  v=$qspec}"
> _val_:"{!type=dismax
> qt=dismaxname v=$qname}"
>
>
> Is there something like that?

Sure.  Doesn't what you tried above work?


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: solr sorting problem

2011-04-22 Thread Erick Erickson
Let's see the query you submit. This looks like a typo or an
improperly specified field name:

":foodDesc"

Best
Erick

On Fri, Apr 22, 2011 at 8:18 AM, Pratik  wrote:
> Were you able to get it work .. if yes how ?
> I'm having almost the same problem.
>
> I used the " fieldType name="alphaOnlySort" class="solr.TextField" as in
> the sample schema.xml , to define a field named "alphaname".
> Then copied from one of the fields name "foodDescUS" to "alphaname".
> When i try to sort using alphaname ... i get this error :-
> The field :foodDesc present in DataConfig does not have a counterpart in
> Solr Schema
>
> Please help
>
> Thanks
> Pratik
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-sorting-problem-tp486144p2851229.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Index upgrade from 1.4.1 to 3.1 and 4.0

2011-04-22 Thread Ofer Fort
Nobody?
Am I the only one in need of upgrading an index that was created with 1.4.1?

Thanks for any info
Ofer

On Friday, April 22, 2011, Ofer Fort  wrote:
> Hi all,
> While doing some tests, I realized that an index that was created with
> solr 1.4.1 is readable by solr 3.1, but not readable by solr 4.0.
> If I plan to migrate my index to 4.0, and I prefer not to reindex it
> all, what would be my best course of action?
> Will it be possible to continue to write to the index with 3.1? Will
> that make it readable from 4.0 or only the newly created segments?
> If I optimize it using 3.1, will that make it readable also from 4.0?
> Thanks
> Ofer
>


Re: Multi-word Solr Synonym issue

2011-04-22 Thread Otis Gospodnetic
Hi,

Maybe you are doing query-time synonym expansion?
Try changing that to do index-time synonym expansion.

See 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
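
For example, an explicit index-time mapping in synonyms.txt would be (a
sketch; the => form rewrites the left-hand side at index time):

  pond care => fountain

With the SynonymFilterFactory moved into the index-time analyzer (and a
reindex), documents containing "pond care" are indexed under "fountain", so
the multi-word synonym no longer depends on the query parser keeping the two
words together.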


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Pla Gong 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 22, 2011 3:58:26 AM
> Subject: Multi-word Solr Synonym issue
> 
> I am trying to do a simple mapping of a 2 word term to a 1 word term and
> it  does not work. See my configuration at the bottom of the email. My
> scenario  is that I have a term called "pond care" and I want to map it
> to the term  "fountain".  So whenever a user enters the term "pond care"
> in the  search box, I want Solr to search on the word "fountain".
> Searching on   "fountain" or "pond food" should returns the same number
> of products.  I  try all type of filter combination and cannot get it to
> work.  I use the  Solr analysis and "pond food" does map to "fountain"
> but when test on Solr  Admin, the search would not query on "fountain"
> but only on "pond  care".  Here is my log from solr Admin search:
> 
> - 
> - 
>   fountain 
>
>   
>   pond  food 
>   pond food 
>   +text:pond +text:food 
>   +text:pond  +text:food 
> - 
>   1.4865229 = (MATCH) sum of:
> 0.5013988 =  (MATCH) weight(text:pond in 1137), product of: 0.5317582  =
> queryWeight(text:pond), product of: 3.180518 =  idf(docFreq=730,
> maxDocs=6470) 0.16719232 = queryNorm 
> 
> I am new to  Solr and I have Google the issue but I did not find a
> solution that will work  for my case.  Please let me know if you  have
> encounter this issue  and how you resolved it and what configuration you
> used.  I want term to  term mapping results from the query and not a
> combination of the two  terms.
> 
> I would greatly appreciate any  help.
> 
> Thanks,
> Pla
> 
> ---field type and filter  Configuration
> 
>  positionIncrementGap="100">
>
> 
>ignoreCase="true"
>  words="stopwords.txt"
>  enablePositionIncrements="true"
>  />
>   generateWordParts="1"  generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"  splitOnCaseChange="1"/>
> 
>   language="English"  protected="protwords.txt"/>
>
>   
> 
>   synonyms="synonyms.txt"  ignoreCase="true" expand="true"/>
>   ignoreCase="true"
>  words="stopwords.txt"
>  enablePositionIncrements="true"
>  />
>  generateWordParts="1"  generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"  splitOnCaseChange="1"/>
> 
>   language="English"  protected="protwords.txt"/>
>
> 
> 


Re: Solr search based on list of terms. Order by max(score) for each term.

2011-04-22 Thread Otis Gospodnetic
Hi,

You didn't say much about how your backend is configured, so it's hard to tell,
but I imagine you could have multiple fields based on the same original data,
and one of those fields could be a highly boosted (via dismax/edismax) field
for exact matches.
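
A sketch of that idea (the field and parameter names here are placeholders):
copy the text into an exact-match field built on KeywordTokenizer and boost it
heavily, e.g.

  <copyField source="name" dest="name_exact"/>

  defType=dismax&qf=name name_exact^10&q=nokia iphone charger

Dismax builds a per-word disjunction across the qf fields, so a document whose
name is exactly "nokia" gets the large name_exact boost, while "nokia iphone
otherwords" only matches the looser name field.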


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Bogdan STOICA 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 21, 2011 1:29:44 PM
> Subject: Solr search based on list of terms. Order by max(score) for each term.
> 
> Hello,
> 
> I am trying to query a solr server in order to obtain the most  relevant
> results for a list of terms.
> 
> For example i have the list of  words "nokia", "iphone", "charger"
> 
> My schema contains the following  data:
> nokia
> iphone
> nokia iphone otherwords
> nokia white
> iphone  white
> 
> If I run a simple query like q=nokia OR iphone OR charger i get  "nokia
> iphone otherwords" as the most relevant result (because it contains  more
> query terms)
> 
> I would like to get "nokia" or "iphone" or "iphone  white" as first results,
> because for each individual term they would be the  most relevant.
> 
> In order to obtain the correct list i would do a query for  each term, then
> aggregate the results and order them based on the maximum  score.
> 
> Can I make this query in one request?
> 
> This question has  also been asked on
> 
> http://stackoverflow.com/questions/5743264/solr-search-based-on-list-of-terms-order-by-maxscore-for-each-term
> 
> Thank  you.
> 


Re: testing of stemming

2011-04-22 Thread Otis Gospodnetic
Bryan,

Have a look at page 111 of Lucene in Action 2, section 4.1.  Is that the sort
of thing you are after?
If so, we may have some code that produced that in the LIA2 source code
download...

You could also just write a small app/script that calls (via HTTP/SolrJ) one of
the Solr analysis request handlers - if you look at solrconfig.xml you will see
them defined there.
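
For example, against the stock example config (which registers a
FieldAnalysisRequestHandler at /analysis/field), a request like this shows the
token stream after every analysis stage, including stemming (a sketch):

  http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=running+runs+ran&indent=on

A small script can loop your test queries through analysis.fieldvalue (or
analysis.query) and diff the outputs, which avoids standing up a full client.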

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: bryan rasmussen 
> To: solr-user 
> Sent: Tue, April 19, 2011 11:15:49 AM
> Subject: testing of stemming
> 
> Hi,
> 
> I was wondering if I have a large number of queries I want to  test
> stemming on if there is a free standing library I can just run  it
> against without having to do all the overhead of a http  request?
> 
> Thanks,
> Bryan Rasmussen
> 


Re: Solr indexing size for a particular document.

2011-04-22 Thread Otis Gospodnetic
Rahul,

Here's a suggestion:
Write a simple app that uses *Lucene* to create N indices, one for each of the 
documents you want to test.  Then you can look at their sizes on disk.

Not sure if it's super valuable to see sizes of individual documents, but you 
can do it as described above.
Of course, if you *store* all your data, the index will be bigger than the 
original/input data.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: rahul 
> To: solr-user@lucene.apache.org
> Sent: Tue, April 19, 2011 7:49:39 AM
> Subject: Solr indexing size for a particular document.
> 
> Hi,
> 
> Is there a way to find out Solr indexing size for a particular  document. I
> am using Solrj to index the documents. 
> 
> Assume, I am  indexing multiple fields like title, description, content, and
> few integer  fields in schema.xml, then once I index the content, is there a
> way to  identify the index size for the particular document during indexing
> or after  indexing..??
> 
> Because, most of the common words are excluded from  StopWords.txt using
> StopFilterFactory. I just want to calculate the actual  index size of the
> particular document. Is there any way in current Solr  ??
> 
> thanks,
> 
> 
> --
> View this message in context: 
>http://lucene.472066.n3.nabble.com/Solr-indexing-size-for-a-particular-document-tp2838416p2838416.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Re: Query performance

2011-04-22 Thread Otis Gospodnetic
Charles,

Grab a Solr nightly build and try that.  Should be much faster.

n.b. you don't need the 10 in your config any more (although this looks like
a config from your master, not your slave, if you are using that sort of
setup).

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Charles Wardell 
> To: solr-user@lucene.apache.org
> Sent: Sat, April 16, 2011 4:55:25 PM
> Subject: Query performance
> 
> Hi All,
> 
> I have an index with about 30M documents. For the most part queries are very
> fast. However, when I add a wildcard to a search field, e.g.
> +title:h*twitter, it can take a few minutes.
> 
> 
> 8GB
> 1 quad  core
> CENTOS
> 
>  false
>  100
>  512
>  10
> 
>  1
>  1000
>  1
> 
> 
>  native
> 
>   10
>   9 
> 
> 
> 
> 
> 
> 
> 


Re: Solr indexing size for a particular document.

2011-04-22 Thread rahul
thanks for all your inputs.



On Fri, Apr 22, 2011 at 8:36 PM, Otis Gospodnetic-2 [via Lucene] <
ml-node+2851624-1936255218-340...@n3.nabble.com> wrote:

> Rahul,
>
> Here's a suggestion:
> Write a simple app that uses *Lucene* to create N indices, one for each of
> the
> documents you want to test.  Then you can look at their sizes on disk.
>
> Not sure if it's super valuable to see sizes of individual documents, but
> you
> can do it as described above.
> Of course, if you *store* all your data, the index will be bigger than the
> original/input data.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
>
> - Original Message 
>
> > From: rahul <[hidden email]>
> > To: [hidden email]
> > Sent: Tue, April 19, 2011 7:49:39 AM
> > Subject: Solr indexing size for a particular document.
> >
> > Hi,
> >
> > Is there a way to find out Solr indexing size for a particular  document.
> I
> > am using Solrj to index the documents.
> >
> > Assume, I am  indexing multiple fields like title, description, content,
> and
> > few integer  fields in schema.xml, then once I index the content, is
> there a
> > way to  identify the index size for the particular document during
> indexing
> > or after  indexing..??
> >
> > Because, most of the common words are excluded from  StopWords.txt using
> > StopFilterFactory. I just want to calculate the actual  index size of the
>
> > particular document. Is there any way in current Solr  ??
> >
> > thanks,
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Solr-indexing-size-for-a-particular-document-tp2838416p2838416.html
> >
> > Sent  from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-size-for-a-particular-document-tp2838416p2851652.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Understanding the DisMax tie parameter

2011-04-22 Thread Otis Gospodnetic
Thanks Tom!

I think I've seen a good explanation of tie from Hoss once, something that
described the background for tie beyond "it's good for breaking score ties
between two documents".  For example, what are the scenarios where one can
expect or fear scoring ties between multiple documents whose scores come from a
single field?  When multiple documents have very similar or identical values in
certain fields that are used in search and that tend to provide high scores
and/or have high boosts?

Consider the situation where you are indexing documents that have 2 fields, 
author and body, there are very few distinct authors, and the author field has 
high boost (plus it's short) and your query is the name of the author.

Is this the situation where the author field is likely to end up being the 
field 
with max score and multiple documents (with the same author) are likely to have 
the same score if you take that max score on the author field as the final 
score 
for the document?
Is that the scenario where tie is very important?
Are there other scenarios that are different enough from the above worth 
describing on the Wiki?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Burton-West, Tom" 
> To: Chris Hostetter ; "solr-user@lucene.apache.org" 
>; "yo...@lucidimagination.com" 
>
> Sent: Fri, April 15, 2011 11:55:03 AM
> Subject: RE: Understanding the DisMax tie parameter
> 
> Thanks everyone.
> 
> I updated the wiki.  If you have a chance please  take a look and check to 
> make 
>sure I got it right on the wiki.
> 
> http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.29
> 
> Tom
> 
> 
> 
> -Original  Message-
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
> Sent:  Thursday, April 14, 2011 5:41 PM
> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> Cc:  Burton-West, Tom
> Subject: Re: Understanding the DisMax tie  parameter
> 
> 
> : Perhaps the parameter could have had a better name.   It's essentially
> : max(score of matching clauses) + tie * (score of matching  clauses that
> : are not the max)
> : 
> : So it can be used and thought of  as a tiebreak only in the sense that
> : if two docs match a clause (with  essentially the same score), then a
> : small tie value will act as a  tiebreaker *if* one of those docs also
> : matches some other  fields.
> 
> correct.  w/o a tiebreaker value, a dismax query will only  look at the 
> maximum scoring clause for each doc -- the "tie" param is named  for it's 
> ability to help break ties when multiple documents have the same  score 
> from the max scoring clause -- by adding in a small portion of the  scores 
> (based on the 0->1 ratio of the "tie" param) from the other  clauses.
> 
> 
> -Hoss
> 
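
Concretely, with the formula above and (made-up) clause scores of 2.0 for the
title clause and 0.5 for the body clause on the same document:

  tie=0.0 : score = 2.0
  tie=0.1 : score = 2.0 + 0.1 * 0.5 = 2.05
  tie=1.0 : score = 2.0 + 0.5 = 2.5 (a plain sum over all clauses)

So with tie=0, two documents whose best clauses both score 2.0 stay tied no
matter what their other fields contribute, and any small positive tie breaks
that in favor of the document matching more fields.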


Re: Index upgrade from 1.4.1 to 3.1 and 4.0

2011-04-22 Thread Otis Gospodnetic
Hi Ofer,

We recently helped a customer go through just such an upgrade (or maybe even
from 1.3.*).  We used a tool that read data from one index and indexed it to
the new index without having to reindex the data from the original sources.
All fields in the source index were obviously stored. :)

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Ofer Fort 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, April 22, 2011 10:34:26 AM
> Subject: Re: Index upgrade from 1.4.1 to 3.1 and 4.0
> 
> Nobody?
> Am I the only one in need of upgrading an index that was created with  1.4.1?
> 
> Thanks for any info
> Ofer
> 
> On Friday, April 22, 2011, Ofer  Fort  wrote:
> > Hi all,
> >  While doing some tests, I realized that an index that was created with
> >  solr 1.4.1 is readable by solr 3.1, but nt readable by solr 4.0.
> > If I  plan to migrate my index to 4.0, and I prefer not to reindex it
> > all,  what would be my best course of action?
> > Will it be possible to continue  to write to the index with 3.1? Will
> > that make it readable from 4.0 or  only the newly created segments?
> > If I optimize it using 3.1, will that  make it readable also from 4.0?
> > Thanks
> > Ofer
> >
> 


RE: Solr - Multi Term highlighting issue

2011-04-22 Thread Ramanathapuram, Rajesh
Does anybody have other suggestions?

thanks & regards,
Rajesh Ramana 
Enterprise Applications, Turner Broadcasting System, Inc.
404.878.7474 


-Original Message-
From: Ramanathapuram, Rajesh [mailto:rajesh.ramanathapu...@turner.com] 
Sent: Wednesday, April 20, 2011 2:51 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Multi Term highlighting issue

Thanks Erick. 

I tried your suggestion, the issue still exists.

http://localhost:8983/searchsolr/mainCore/select?indent=on&version=2.2&q=mec+us+chile&fq=storyid%3DXXX%22&start=0&rows=10&fl=*&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=story%2C+slug&hl.fragsize=10&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.mergeContiguous=false

  hl.fragsize: 10
  explainOther: (empty)
  hl: on
  hl.mergeContiguous: false

... Corboba. <em>(MEC)CHILE</em>/FOREST FIRES ...


thanks & regards,
Rajesh Ramana 


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, April 20, 2011 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

Does your configuration have "hl.mergeContiguous" set to true by any chance? 
And what happens if you explicitly set this to "false" on your query?

Best
Erick

On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram, Rajesh 
 wrote:
> Hello,
>
> I am dealing with a highlighting issue in SOLR, I will try to explain 
> the issue.
>
> When I search for a single term in solr, it wraps an <em> tag around the
> words I want to highlight, and all works well.
> But if I search multiple terms, for the most part highlighting works well,
> and then for some of the terms the highlighter returns multiple terms in
> a single <em> tag: <em> ... srchtrm1) srchtrm2</em>. I expect solr to return
> highlighted terms like <em> ... srchtrm1) ...</em> <em>srchtrm2</em>.
>
> When I search for 'US mec chile', here is how my result appears:
> <em> ... Corboba. (MEC)CHILE</em>/FOREST FIRES:
> We had ... with <em>US</em> and <em>Chile</em> ...,
> <em>(MEC)US</em>
>
> This is what I was expecting it to be:
> ... Corboba. (<em>MEC</em>)<em>CHILE</em>/FOREST
> FIRES: We had ... with <em>US</em> and <em>Chile</em> ...,
> (<em>MEC</em>)<em>US</em>
>
> Here are my query params:
> status: 0
> QTime: 26
> hl.fragsize: 10
> explainOther: (empty)
> hl: on
> hl.fl: story, slug
> qt: standard
> indent: on
> rows: 10
> version: 2.2
> hl.usePhraseHighlighter: true
> fl: *
> start: 0
> q: mec us chile
> wt: standard
> hl.highlightMultiTerm: true
> fq: storyid="  X"
>
> Here are some other links I found in the forum, but no real conclusion
>
> http://www.lucidimagination.com/search/document/ac64e4f0abb6e4fc/solr_
> hi
> ghlighting_question#78163c42a67cb533
>
> I am going to try this patch, which also had no conclusive results
>   https://issues.apache.org/jira/browse/SOLR-1394
>
> Has anyone come across this issue?
> Any suggestions on how to fix this issue is much appreciated.
>
>
> thanks & regards,
> Rajesh Ramana
>


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-22 Thread Otis Gospodnetic
Hi Renee,

Here's what I'd do:
* Check how many open files your system is set up for (ulimit -n).  You likely
want to increase that (1024 seems to be a common default under Linux, and in
the past I've set that to 30k+ without issues)
* Look at your mergeFactor.  If it's high, consider lowering it (will slow down 
indexing a bit)
* Consider using cfs, but if you do the above right, you can avoid using it.
* Consider a better Solr monitoring tool
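
For example (a back-of-the-envelope sketch, using the ~8-files-per-segment
figure mentioned later in this thread): 10 cores x 40 segments x 8 files is
3,200 open index files, already well past a 1024 limit once sockets and jars
are added. "ulimit -n" shows the current limit; "ulimit -n 30000" (or an entry
in /etc/security/limits.conf) raises it.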

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Renee Sun 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 15, 2011 3:41:28 PM
> Subject: Re: partial optimize does not reduce the segment number to 
>maxNumSegments
> 
> sorry I should elaborate that earlier...
> 
> in our production environment,  we have multiple cores and the ingest
> continuously all day long; we only do  optimize periodically, and optimize
> once a day in mid night.
> 
> So  sometimes we could see 'too many open files' error. To prevent it  from
> happening, in production we maintain a script to monitor the segment  files
> total with all cores, and send out warnings if that number exceed  a
> threshold... it is kind of preventive measurement.  Currently we are  using
> the linux command to count the files. We are wondering if we can simply  use
> some formula to figure out this number, it will be better that way. Seems  we
> could use the stat url to get segment number and multiply it by 8 (that  is
> what we have given our schema).
> 
> Any better way to approach this?  thanks a lot!
> Renee
> 
> --
> View this message in context: 
>http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2825736.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Localized alphabetical order

2011-04-22 Thread Ben Preece
As someone who's new to Solr/Lucene, I'm having trouble finding 
information on sorting results in localized alphabetical order. I've 
ineffectively searched the wiki and the mail archives.


I'm thinking for example about Hawai'ian, where mīka (with an i-macron) 
comes after mika (i without the macron) but before miki (also without 
the macron), or about Welsh, where the digraphs (ch, dd, etc.) are 
treated as single letters, or about Ojibwe, where the apostrophe ' is a 
letter which sorts between h and i.


How do non-English languages typically handle this?

-Ben


Re: Localized alphabetical order

2011-04-22 Thread Robert Muir
please see http://wiki.apache.org/solr/UnicodeCollation

In general the idea is similar to how this is handled in databases,
you can index collation keys into a sort field at analysis time, then
you just do a standard solr sort.

However, I am not sure if your JRE provides a "haw" Locale for the
Hawaiian language.

Because of this, its probably better to use the ICU collation
integration 
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUCollationKeyFilterFactory),
because ICU definitely supports this locale and has collation rules
for it.

On Fri, Apr 22, 2011 at 12:33 PM, Ben Preece  wrote:
> As someone who's new to Solr/Lucene, I'm having trouble finding information
> on sorting results in localized alphabetical order. I've ineffectively
> searched the wiki and the mail archives.
>
> I'm thinking for example about Hawai'ian, where mīka (with an i-macron)
> comes after mika (i without the macron) but before miki (also without the
> macron), or about Welsh, where the digraphs (ch, dd, etc.) are treated as
> single letters, or about Ojibwe, where the apostrophe ' is a letter which
> sorts between h and i.
>
> How do non-English languages typically handle this?
>
> -Ben
>


Re: Localized alphabetical order

2011-04-22 Thread Peter Keegan
On Fri, Apr 22, 2011 at 12:33 PM, Ben Preece  wrote:

> As someone who's new to Solr/Lucene, I'm having trouble finding information
> on sorting results in localized alphabetical order. I've ineffectively
> searched the wiki and the mail archives.
>
> I'm thinking for example about Hawai'ian, where mīka (with an i-macron)
> comes after mika (i without the macron) but before miki (also without the
> macron), or about Welsh, where the digraphs (ch, dd, etc.) are treated as
> single letters, or about Ojibwe, where the apostrophe ' is a letter which
> sorts between h and i.
>
> How do non-English languages typically handle this?
>
> -Ben
>


Re: Index upgrade from 1.4.1 to 3.1 and 4.0

2011-04-22 Thread Ofer Fort
Thanks Otis, but this is not my case. Most of my fields are not stored
, but I do have the original data in case I need to reindex.
My question is do I need to?
If my 1.4.1 can be read by 3.1, I assume 3.1 can continue to write to it?
In that case, I continue assuming that 4.0 will know how to read only
the new segments, and if I optimize it, than I will have only one new
segment, created by 3.1, thus readable by 4.0.
It makes sense to me, the only question is if my guesses are right:-)
Thanks.

On Friday, April 22, 2011, Otis Gospodnetic  wrote:
> Hi Ofer,
>
> We recently helped a customer go through just such an upgrade (or maybe even
> from 1.3.*).  We used a tool that read data from one index and indexed it to 
> the
> new index without having to reindex the data from the original sources.  All
> fields in the source index were obviously stored. :)
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: Ofer Fort 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Fri, April 22, 2011 10:34:26 AM
>> Subject: Re: Index upgrade from 1.4.1 to 3.1 and 4.0
>>
>> Nobody?
>> Am I the only one in need of upgrading an index that was created with  1.4.1?
>>
>> Thanks for any info
>> Ofer
>>
>> On Friday, April 22, 2011, Ofer  Fort  wrote:
>> > Hi all,
>> >  While doing some tests, I realized that an index that was created with
>> >  solr 1.4.1 is readable by solr 3.1, but nt readable by solr 4.0.
>> > If I  plan to migrate my index to 4.0, and I prefer not to reindex it
>> > all,  what would be my best course of action?
>> > Will it be possible to continue  to write to the index with 3.1? Will
>> > that make it readable from 4.0 or  only the newly created segments?
>> > If I optimize it using 3.1, will that  make it readable also from 4.0?
>> > Thanks
>> > Ofer
>> >
>>
>


DIH Transform XML?

2011-04-22 Thread Matt Galvin
Hello,

First post here... I spent some time researching this but can't seem
to find the answer I am looking for...

I have a MySQL DB that I have Solr indexing and all is well.

However, one field I need to index is a text field that contains XML
stored in the DB. I read up on DIH Transformers a bit and I am
wondering... is there a way to have solr DIH either transform the XML
data or strip the XML out of the field as it indexes it leaving only
the textual data in solr's index?

This XML field is the body content of web site articles (don't ask
why, not my choice :-/) and it also has a lot of CDATA's wrapping HTML
in the XML. I want solr to index this data, minus all the markup.

Should I be using a RegexTransformer to strip tags (this feels like
the wrong approach) or would HTMLStripTransformer work? Is there an
XMLTransformer I don't know about?

I have been reading this:

http://wiki.apache.org/solr/DataImportHandler

but I feel like I am missing something that would make this work.

My dataConfig is barebones ATM.

Any help is greatly appreciated.

Thanks,

Matt


Re: Localized alphabetical order

2011-04-22 Thread Bently Preece
Thank you.  This looks like the right direction.

I see the docs say ICUCollationKeyFilterFactory is deprecated in favor of
ICUCollationField.  So ... I'd implement a subclass of ICUCollationField,
and use that as the fieldtype in schema.xml.  And this means - what? - that
I'd also implement a custom SortField to be returned by
MyCollationField.getSortField(...), which would also require me to write a
custom FieldComparator?  Am I on the right track?

Do you know an example of another language which has already done this sort
of thing?

Really, thanks for your help.

-Ben

On Fri, Apr 22, 2011 at 11:41 AM, Peter Keegan wrote:

> On Fri, Apr 22, 2011 at 12:33 PM, Ben Preece  wrote:
>
> > As someone who's new to Solr/Lucene, I'm having trouble finding
> information
> > on sorting results in localized alphabetical order. I've ineffectively
> > searched the wiki and the mail archives.
> >
> > I'm thinking for example about Hawai'ian, where mīka (with an i-macron)
> > comes after mika (i without the macron) but before miki (also without the
> > macron), or about Welsh, where the digraphs (ch, dd, etc.) are treated as
> > single letters, or about Ojibwe, where the apostrophe ' is a letter which
> > sorts between h and i.
> >
> > How do non-English languages typically handle this?
> >
> > -Ben
> >
>


Re: Need to create dyanamic indexies base on different document workspaces

2011-04-22 Thread Marc Sturlese
In case you need to create lots of indexes and register/unregister fast,
there is work on the way http://wiki.apache.org/solr/LotsOfCores

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-to-create-dyanamic-indexies-base-on-different-document-workspaces-tp2845919p2852410.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Localized alphabetical order

2011-04-22 Thread Robert Muir
On Fri, Apr 22, 2011 at 2:37 PM, Bently Preece  wrote:
> Thank you.  This looks like the right direction.
>
> I see the docs say ICUCollationKeyFilterFactory is deprecated in favor of
> ICUCollationField.  So ... I'd implement a subclass of ICUCollationField,
> and use that as the fieldtype in schema.xml.  And this means - what? - that
> I'd also implement a custom SortField to be returned by
> MyCollationField.getSortField(...), which would also require me to write a
> custom FieldComparator?  Am I on the right track?

no, you don't have to write any code in either case:

solr 3.1:

<fieldtype name="icu_sort" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUCollationKeyFilterFactory" locale="haw"
            strength="secondary"/>
  </analyzer>
</fieldtype>

solr 4.0:

<fieldType name="icu_sort" class="solr.ICUCollationField" locale="haw"
           strength="secondary"/>

then just copyField or whatever to get your data in there.


Re: Localized alphabetical order

2011-04-22 Thread Bently Preece
What if there is no standard localization already?  The case I'm
specifically interested in is Ojibwe.

So should I really be researching how the JRE does localization instead of
Solr?


On Fri, Apr 22, 2011 at 2:01 PM, Robert Muir  wrote:

> On Fri, Apr 22, 2011 at 2:37 PM, Bently Preece  wrote:
> > Thank you.  This looks like the right direction.
> >
> > I see the docs say ICUCollationKeyFilterFactory is deprecated in favor of
> > ICUCollationField.  So ... I'd implement a subclass of ICUCollationField,
> > and use that as the fieldtype in schema.xml.  And this means - what? -
> that
> > I'd also implement a custom SortField to be returned by
> > MyCollationField.getSortField(...), which would also require me to write
> a
> > custom FieldComparator?  Am I on the right track?
>
> no, you don't have to write any code in either case:
>
> solr 3.1:
>
> <fieldtype name="icu_sort" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.ICUCollationKeyFilterFactory" locale="haw"
>             strength="secondary"/>
>   </analyzer>
> </fieldtype>
>
> solr 4.0:
>
> <fieldType name="icu_sort" class="solr.ICUCollationField" locale="haw"
>            strength="secondary"/>
>
> then just copyField or whatever to get your data in there.
>


Ant is not working in Eclipse

2011-04-22 Thread Em
Hello list,

I think there is a problem with the SVN checkout of the current Solr version.
I can run ant eclipse and it does not show any errors (it needed 20 seconds the
first time and 0.9 seconds afterwards).

However, the .classpath files were not set properly, and a click on refresh did
not show the expected changes.
I use Linux Mint 10 (Julia).
On Windows systems everything works as expected.

Has anyone else run into this problem?

Regards,
Em

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Ant-is-not-working-in-Eclipse-tp2852641p2852641.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Ant is not working in Eclipse

2011-04-22 Thread Em
I have to correct myself.
I just tried to copy the files manually to the correct destinations, and it
turned out that those files are already there (they just did not show up in
the terminal).

What else could explain why clicking refresh does not show the expected
developer view?

Regards,
Em


Em wrote:
> 
> Hello list,
> 
> there is a problem with the SVN-Checkout of the current Solr-version, I
> think.
> I can run ant eclipse, it does not show any errors (needed 20 seconds the
> first time and 0.9 seconds afterwards).
> 
> However, the classpath-files were not set properly. A click on refresh did
> not show the expected changes.
> I use Linux Mint 10 (Julia).
> On Windows-Systems everything works as expected.
> 
> Did you also recognize such a problem?
> 
> Regards,
> Em
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Ant-is-not-working-in-Eclipse-tp2852641p2852660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Localized alphabetical order

2011-04-22 Thread Robert Muir
On Fri, Apr 22, 2011 at 3:09 PM, Bently Preece  wrote:
> What if there is no standard localization already?  The case I'm
> specifically interested in is Ojibwe.
>

This is standard: to sort a field with a specific locale, you have to
tell it the locale you want. If you use the ICU implementation you get
support for more locales; it's just that simple. The JRE has fewer
available locales because its internationalization and localization
support lags behind ICU.

On the other hand ICU keeps current with both the unicode standard and
locale data in CLDR (http://unicode.org/cldr), which is why it
supports more.

I noticed there is no locale for your language in CLDR, not even under
development it appears (http://unicode.org/cldr/apps/survey).

So if your language (Ojibwe) has special sort rules, I recommend
making the collation rules and using a custom collator as specified
here: 
http://wiki.apache.org/solr/UnicodeCollation#Sorting_text_with_custom_rules

for your "base collator" you just need to use "new Locale()" and your
rules will be a delta from that.

Separately, if these sort rules are well-defined/standardized for this
language, and you get them working, you might want to then consider
contributing them to CLDR.
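
For illustration only (an untested sketch): if Ojibwe's apostrophe must sort
between h and i, the tailoring on an empty base collator would be a rule along
the lines of

  & h < [apostrophe] < i

where [apostrophe] stands for the apostrophe character, quoted as the rule
syntax requires, written out to a rules file and referenced the way the linked
wiki example does, e.g.
<filter class="solr.CollationKeyFilterFactory" custom="customRules.dat"
strength="primary"/>.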


Re: Index upgrade from 1.4.1 to 3.1 and 4.0

2011-04-22 Thread Otis Gospodnetic
Regardless of what anyone here says, you need to try it.
3.1 should be able to read 1.4.1, yes.
Once the format is switched to 3.1, you can't go back and read it with 1.4.1.
This is why you want to upgrade your Slaves first, then your Master (if you
have them -- I remember we spoke a while back and that wasn't the case back
then).

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Ofer Fort 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, April 22, 2011 1:00:05 PM
> Subject: Re: Index upgrade from 1.4.1 to 3.1 and 4.0
> 
> Thanks Otis, but this is not my case. Most of my fields are not stored
> , but  I do have the original data in case I need to reindex.
> My question is do I  need to?
> If my 1.4.1 can be read by 3.1, I assume 3.1 can continue to write  to it?
> In that case, I continue assuming that 4.0 will know how to read  only
> the new segments, and if I optimize it, than I will have only one  new
> segment, created by 3.1, thus readable by 4.0.
> It makes sense to me,  the only question is if my guesses are right:-)
> Thanks.
> 
> On Friday,  April 22, 2011, Otis Gospodnetic   
>wrote:
> > Hi Ofer,
> >
> > We recently helped a customer go through  just such an upgrade (or maybe 
even
> > from 1.3.*).  We used a tool that  read data from one index and indexed it 
> > to 
>the
> > new index without having  to reindex the data from the original sources. 
 All
> > fields in the source  index were obviously stored. :)
> >
> > Otis
> > 
> >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem  search :: http://search-lucene.com/
> >
> >
> >
> > - Original  Message 
> >> From: Ofer Fort 
> >> To: "solr-user@lucene.apache.org"  
> >>  Sent: Fri, April 22, 2011 10:34:26 AM
> >> Subject: Re: Index upgrade  from 1.4.1 to 3.1 and 4.0
> >>
> >> Nobody?
> >> Am I the  only one in need of upgrading an index that was created with  
> 1.4.1?
> >>
> >> Thanks for any info
> >>  Ofer
> >>
> >> On Friday, April 22, 2011, Ofer  Fort   wrote:
> >> > Hi all,
> >> >  While doing some tests, I  realized that an index that was created with
> >> >  solr 1.4.1 is  readable by solr 3.1, but nt readable by solr 4.0.
> >> > If I  plan  to migrate my index to 4.0, and I prefer not to reindex it
> >> > all,   what would be my best course of action?
> >> > Will it be possible to  continue  to write to the index with 3.1? Will
> >> > that make it  readable from 4.0 or  only the newly created segments?
> >> > If I  optimize it using 3.1, will that  make it readable also from 4.0?
> >>  > Thanks
> >> > Ofer
> >> >
> >>
> >
> 


Re: Index upgrade from 1.4.1 to 3.1 and 4.0

2011-04-22 Thread Ofer Fort
Thanks, I'll do the procedure on my test env and update the community.
If anybody has already gone through the process, I would love to hear about it.

On Friday, April 22, 2011, Otis Gospodnetic  wrote:
> Regardless of what anyone here says, you need to try it.
> 3.1 should be able to read 1.4.1, yes.
> One the format is switched to 3.1, you can't go back and read it with 1.4.1.
> This is why you want to upgrade your Slaves first, then your Master (if you 
> have
> them -- I remember we spoke a while back and that wasn't the case back then).
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: Ofer Fort 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Fri, April 22, 2011 1:00:05 PM
>> Subject: Re: Index upgrade from 1.4.1 to 3.1 and 4.0
>>
>> Thanks Otis, but this is not my case. Most of my fields are not stored
>> , but  I do have the original data in case I need to reindex.
>> My question is do I  need to?
>> If my 1.4.1 can be read by 3.1, I assume 3.1 can continue to write  to it?
>> In that case, I continue assuming that 4.0 will know how to read  only
>> the new segments, and if I optimize it, than I will have only one  new
>> segment, created by 3.1, thus readable by 4.0.
>> It makes sense to me,  the only question is if my guesses are right:-)
>> Thanks.
>>
>> On Friday,  April 22, 2011, Otis Gospodnetic 
>>wrote:
>> > Hi Ofer,
>> >
>> > We recently helped a customer go through  just such an upgrade (or maybe
> even
>> > from 1.3.*).  We used a tool that  read data from one index and indexed it 
>> > to
>>the
>> > new index without having  to reindex the data from the original sources.
>  All
>> > fields in the source  index were obviously stored. :)
>> >
>> > Otis
>> > 
>> >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > Lucene ecosystem  search :: http://search-lucene.com/
>> >
>> >
>> >
>> > - Original  Message 
>> >> From: Ofer Fort 
>> >> To: "solr-user@lucene.apache.org"  
>> >>  Sent: Fri, April 22, 2011 10:34:26 AM
>> >> Subject: Re: Index upgrade  from 1.4.1 to 3.1 and 4.0
>> >>
>> >> Nobody?
>> >> Am I the  only one in need of upgrading an index that was created with
>> 1.4.1?
>> >>
>> >> Thanks for any info
>> >>  Ofer
>> >>
>> >> On Friday, April 22, 2011, Ofer  Fort   wrote:
>> >> > Hi all,
>> >> >  While doing some tests, I  realized that an index that was created with
>> >> >  solr 1.4.1 is  readable by solr 3.1, but nt readable by solr 4.0.
>> >> > If I  plan  to migrate my index to 4.0, and I prefer not to reindex it
>> >> > all,   what would be my best course of action?
>> >> > Will it be possible to  continue  to write to the index with 3.1? Will
>> >> > that make it  readable from 4.0 or  only the newly created segments?
>> >> > If I  optimize it using 3.1, will that  make it readable also from 4.0?
>> >>  > Thanks
>> >> > Ofer
>> >> >
>> >>
>> >
>>
>


Re: Localized alphabetical order

2011-04-22 Thread Bently Preece
Thanks.  I get it now.

I meet with our language experts again on Monday.  I'll ask them about
submitting localization info to the CLDR.

Thanks again.

-Ben

On Fri, Apr 22, 2011 at 2:44 PM, Robert Muir  wrote:

> On Fri, Apr 22, 2011 at 3:09 PM, Bently Preece  wrote:
> > What if there is no standard localization already?  The case I'm
> > specifically interested in is Ojibwe.
> >
>
> this is standard? to sort a field with a specific locale, you have to
> tell it the locale you want. if you use the ICU implementation you get
> support for more locales, its just that simple. The JRE has less
> available locales because its internationalization and localization
> support lags behind ICU.
>
> On the other hand ICU keeps current with both the unicode standard and
> locale data in CLDR (http://unicode.org/cldr), which is why it
> supports more.
>
> I noticed there is no locale for your language in CLDR, not even under
> development it appears (http://unicode.org/cldr/apps/survey).
>
> So if your language (Ojibwe) has special sort rules, I recommend
> making the collation rules and using a custom collator as specified
> here:
> http://wiki.apache.org/solr/UnicodeCollation#Sorting_text_with_custom_rules
>
> for your "base collator" you just need to use "new Locale()" and your
> rules will be a delta from that.
>
> Separately, if these sort rules are well-defined/standardized for this
> language, and you get them working, you might want to then consider
> contributing them to CLDR.
>


RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-22 Thread Robert Petersen
I can repeatedly demonstrate this in my dev environment, where I get
entirely different results searching for AppleTV vs. appletv, and I
really just don't get it.  I set up a specific SKU in dev with AppleTV
in its title to experiment with.  What can I provide to help diagnose?
I need to make this work...  thanks for the help!


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
Seeley
Sent: Thursday, April 21, 2011 5:54 PM
To: solr-user@lucene.apache.org
Subject: Re: term position question from analyzer stack for
WordDelimiterFilterFactory

On Thu, Apr 21, 2011 at 8:06 PM, Robert Petersen 
wrote:
> So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory
settings I cannot get a match between AppleTV on the indexing side and
appletv on the search side.

Hmmm, that shouldn't be the case.  The "text" field in the solr
example config doesn't use preserveOriginal, and AppleTV is indexed as

appl, tv/appletv

And a search for appletv does match fine.

Perhaps on the search side there is actually a phrase query like "big
appletv"?  One workaround for that is to add a little slop... "big
appletv"~1

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Solr - Multi Term highlighting issue

2011-04-22 Thread Koji Sekiguchi

How are your hl.fl fields defined in schema.xml?

Koji
--
http://www.rondhuit.com/en/

(11/04/23 1:23), Ramanathapuram, Rajesh wrote:

Does anybody has other suggestions?

thanks&  regards,
Rajesh Ramana
Enterprise Applications, Turner Broadcasting System, Inc.
404.878.7474


-Original Message-
From: Ramanathapuram, Rajesh [mailto:rajesh.ramanathapu...@turner.com]
Sent: Wednesday, April 20, 2011 2:51 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Multi Term highlighting issue

Thanks Erick.

I tried your suggestion, the issue still exists.

http://localhost:8983/searchsolr/mainCore/select?indent=on&version=2.2&q=mec+us+chile&fq=storyid%3DXXX%22&start=0&rows=10&fl=*&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=story%2C+slug&hl.fragsize=10&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.mergeContiguous=false

-
   10
   
   on
   false  


... Corboba. (MEC)CHILE/FOREST FIRES ...


thanks&  regards,
Rajesh Ramana


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, April 20, 2011 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

Does your configuration have "hl.mergeContiguous" set to true by any chance? And what 
happens if you explicitly set this to "false" on your query?

Best
Erick

On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram, 
Rajesh  wrote:

Hello,

I am dealing with a highlighting issue in SOLR, I will try to explain
the issue.

When I search for a single term in solr, it wraps  tag around the
words I want to highlight, all works well.
But if I search multiple term, for most part highlighting works good
and then for some of the terms, the highlight return multiple terms in
a sing  tag ...
srchtrm1) srchtrm2  I expect solr to return
highlight terms like...srchtrm1)...
srchtrm2

When I search for 'US mec chile', here is how my result appears
  ... Corboba. (MEC)CHILE/FOREST FIRES:
We had ... withUS  andChile  ...,
  (MEC)US

This is what I was expecting it to be
  ... Corboba. (MEC)CHILE/FOREST
FIRES: We had ... withUS  andChile  ...,
(MEC)US  

Here is my query params
-
-
  0
  26
-
 10
 
 on
 story, slug
 standard
 on
 10
 2.2
 true
 *
 0
 mec us chile
 standard
 true
 storyid="  X"
  
  

Here are some other links I found in the forum, but no real conclusion

http://www.lucidimagination.com/search/document/ac64e4f0abb6e4fc/solr_
hi
ghlighting_question#78163c42a67cb533

I am going to try this patch, which also had no conclusive results
   https://issues.apache.org/jira/browse/SOLR-1394

Has anyone come across this issue?
Any suggestions on how to fix this issue is much appreciated.


thanks&  regards,
Rajesh Ramana








Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-22 Thread Yonik Seeley
On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen  wrote:
> I can repeatedly demonstrate this in my dev environment, where I get
> entirely different results searching for AppleTV vs. appletv

You originally said "I cannot get a match between AppleTV on the
indexing side and appletv on the search side".
Getting a different number of results, or different results, is a slightly
different thing.

For example, if there were a document with "Apple TV" in it, then a
query of "AppleTV" would match that doc, but a query of "appletv"
would not.
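
To make the difference concrete, index-time analysis with the stock "text"
field goes roughly like this (a sketch):

  AppleTV   ->  appl, tv/appletv   (WDF splits the token; catenateWords adds
                                    "appletv" at the position of "tv")
  Apple TV  ->  appl, tv           (already two tokens from the tokenizer, so
                                    there is nothing to catenate)

So a query of "appletv" can only match documents where the camel-cased form
was indexed as a single token.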

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: DIH Transform XML?

2011-04-22 Thread Ahmet Arslan


I have a MySQL DB that I have Solr indexing and all is well.

However, one field I need to index is a text field that contains XML
stored in the DB. I read up on DIH Transformers a bit and I am
wondering... is there a way to have solr DIH either transform the XML
data or strip the XML out of the field as it indexes it leaving only
the textual data in solr's index?

This XML field is the body content of web site articles (don't ask
why, not my choice :-/) and it also has a lot of CDATA's wrapping HTML
in the XML. I want solr to index this data, minus all the markup.

Should I be using a RegexTransformer to strip tags (this feels like
the wrong approach) or would HTMLStripTransformer work? Is there an
XMLTransformer I don't know about?



Not sure about the CDATA thing, but HTMLStripTransformer behaves like
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
so it can be used to strip XML tags as well.
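
A sketch of what that looks like in a DIH config (table and column names are
placeholders):

  <entity name="article" transformer="HTMLStripTransformer"
          query="SELECT id, body_xml FROM articles">
    <field column="id" name="id"/>
    <field column="body_xml" name="body" stripHTML="true"/>
  </entity>

stripHTML="true" marks the column for the transformer, which drops the markup
and keeps the text; whether it unwraps the CDATA sections cleanly is worth
verifying, per the caveat above.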