Re: Looking for Developers

2010-10-29 Thread Gora Mohanty
On Fri, Oct 29, 2010 at 12:23 PM, scott chu (朱炎詹)
 wrote:
> When I first saw this particular email, I wrote a reply intending to ask the
> sender to remove solr-user from the recipients, because I thought it should go to
> solr-dev. But then I thought again: it's about a 'job offer', not 'development
> of Solr', so I just deleted my email.

To add more regarding the original mail that started this thread: we are
based in India, and for the first mail I replied to the person off-list
offering our services, but never got a reply. So I wonder how serious this
person was in the first place.

> Maybe solr-job is a good suggestion. A selfish reason for this suggestion is
> that I'm also looking for someone familiar with Solr to work for me in
> Taiwan, and I really don't know where to ask.

In other lists with a broader audience, such as a local Linux users list, our
practice has been that job offers are tolerated if posted once, and marked
as "Commercial" in the subject header. Given the low volume of such posts
in this list, maybe that could be an acceptable solution? We would also be
happy with a separate solr-jobs list.

Regards,
Gora


Maximum of length of a Dismax Query?

2010-10-29 Thread Swapnonil Mukherjee
Hi Everybody,

It seems that the maximum query length supported by the Dismax Query Handler is 
3534 characters. Is there any way I can set this limit to be around 12,000?

If I fire a query beyond 3534 characters, I don't even get error messages in 
the catalina.XXX log files.

Swapnonil Mukherjee
+91-40092712
+91-9007131999





Re: QueryElevation Component is so slow

2010-10-29 Thread Chamnap Chhorn
Does anyone have suggestions to improve the search?
Thanks

On 10/28/10, Chamnap Chhorn  wrote:
> Sorry for very bad pasting. I paste it again.
>
> Slowest Components                                      Count   Exclusive          Total
> QueryElevationComponent                                  1       506,858 ms 100%    506,858 ms 100%
> SolrIndexSearcher                                        1       2.0 ms 0%          2.0 ms 0%
> org.apache.solr.servlet.SolrDispatchFilter.doFilter()    1       1.0 ms 0%          506,862 ms 100%
> QueryComponent                                           1       1.0 ms 0%          1.0 ms 0%
> DebugComponent                                           1       0.0 ms 0%          0.0 ms 0%
> FacetComponent                                           1       0.0 ms 0%          0.0 ms 0%
>
> On Thu, Oct 28, 2010 at 4:57 PM, Chamnap Chhorn
> wrote:
>
>> Hi,
>>
>> I'm using solr 1.4 and using QueryElevation Component for guaranteed
>> search
>> position. I have around 700,000 documents with 1 Mb elevation file. It
>> turns
>> out it is quite slow on the newrelic monitoring website:
>>
>> Slowest Components Count Exclusive Total   QueryElevationComponent 1
>> 506,858 ms 100% 506,858 ms 100% SolrIndexSearcher 1 2.0 ms 0% 2.0 ms 0%
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter() 1 1.0 ms 0% 506,862
>> ms
>> 100% QueryComponent 1 1.0 ms 0% 1.0 ms 0% DebugComponent 1 0.0 ms 0% 0.0
>> ms
>> 0% FacetComponent 1 0.0 ms 0% 0.0 ms 0%
>>
>> As you could see, QueryElevationComponent takes quite a lot of time. Any
>> suggestion how to improve this?
>>
>> --
>> Chhorn Chamnap
>> http://chamnapchhorn.blogspot.com/
>>
>
>
>
> --
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>


-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Newbie to Solr, LIKE:foo

2010-10-29 Thread MilleBii
I'm a Nutch user but I'm considering using Solr for the following reason.

I need a LIKE:foo query, which turns into a *foo* query. I saw the built-in
prefix query parser, but it only looks for foo*, if I understand it correctly.
So is there a query parser that does what I'm looking for?
If not, how difficult is it to build one with Solr?
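One common way to get *foo* behaviour without a custom query parser is to index
n-grams of the field. A minimal schema.xml sketch (type and field names are only
illustrative, not from any existing setup):

  <fieldType name="text_substring" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="body_substring" type="text_substring" indexed="true" stored="false"/>

With such a field, a plain query like body_substring:foo matches any token
containing "foo", at the cost of a larger index.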

-- 
-MilleBii-


Re: Looking for Developers

2010-10-29 Thread Mark Allan
For me, I simply deleted the original email, but I'm now quite  
enjoying the irony of the complaints causing more noise on the list  
than the original email!  ;-)


M


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Possible bug in query sorting

2010-10-29 Thread Pablo Recio
That's my schema XML:

[schema.xml field and fieldType definitions were stripped by the mailing-list archive]



2010/10/28 Gora Mohanty 

> On Thu, Oct 28, 2010 at 5:18 PM, Michael McCandless
>  wrote:
> > Is it somehow possible that you are trying to sort by a multi-valued
> field?
> [...]
>
> Either that, or your field gets processed into multiple tokens via the
> analyzer/tokenizer path in your schema. The reported error is a
> consequence of the fact that different documents might result in a
> different number of tokens.
>
> Please show us the part of schema.xml that defines the field type for
> the field "title".
>
> Regards,
> Gora
>


Natural string sorting

2010-10-29 Thread RL

Just a quick question about natural sorting of strings.

I've a simple dynamic field in my schema:




There are 3 indexed strings, for example:
string1, string2, string10

Executing a query and sorting by this field leads to an unnatural sort order:
string1
string10
string2

(Some time ago I used Lucene and I was pretty sure that Lucene used a
natural sort, thus I expected the same from Solr.)
Is there a way to sort in a natural order? Config option? Plugin? Expected
output would be:
string1
string2
string10


Thanks in advance.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Natural-string-sorting-tp1791227p1791227.html
Sent from the Solr - User mailing list archive at Nabble.com.


org.tartarus package in lucene/solr?

2010-10-29 Thread Tharindu Mathew
Hi,

How come $subject is present??

-- 
Regards,

Tharindu


Re: Natural string sorting

2010-10-29 Thread Savvas-Andreas Moysidis
I think string10 is before string2 in lexicographic order?

On 29 October 2010 09:18, RL  wrote:

>
> Just a quick question about natural sorting of strings.
>
> I've a simple dynamic field in my schema:
>
>  omitNorms="true"/>
>  omitNorms="true"/>
>
> There are 3 indexed strings for example
> string1,string2,string10
>
> Executing a query and sorting by this field leads to unnatural sorting of :
> string1
> string10
> string2
>
> (Some time ago i used Lucene and i was pretty sure that Lucene used a
> natural sort, thus i expected the same from solr)
> Is there a way to sort in a natural order? Config option? Plugin? Expected
> output would be:
> string1
> string2
> string10
>
>
> Thanks in advance.
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Natural-string-sorting-tp1791227p1791227.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Possible bug in query sorting

2010-10-29 Thread Gora Mohanty
On Fri, Oct 29, 2010 at 1:47 PM, Pablo Recio  wrote:
> That's my schema XML:

>   
>     
>       
>       
>       
>       
>     
>     
>       
>        ignoreCase="true" expand="true"/>
>       
>       
>       
>     
>   
>  
>
>  
[...]
>   required="true" multiValued="false" omitNorms="false" />
>   required="false" multiValued="false" omitNorms="false" />
>  
>  
[...]

The issue is that you are using the WhitespaceTokenizerFactory
as an analyzer for the field. This is resulting in a different number
of tokens in different documents, which is causing the error.

Use a field that is non-tokenized, e.g., change the type of the
"title" field to "string". If you need a tokenized "title" field, copy
the field to another of type "string", and sort on that field instead.
Please see http://wiki.apache.org/solr/CommonQueryParameters#sort
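A minimal schema.xml sketch of that suggestion (field names and types assumed,
not taken from the poster's schema):

  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="title_sort" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_sort"/>

and then sort on the untokenized copy, e.g. &sort=title_sort asc.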

Regards,
Gora


Re: Searching for terms on specific fields

2010-10-29 Thread Imran
Cheers Hoss. That did it for me.

~~Sent by an Android
On 29 Oct 2010 00:39, "Chris Hostetter"  wrote:
>
> The specifics of your overall goal confuse me a bit, but drilling down to
> your core question...
>
> : I want to be able to use the dismax parser to search on both terms
> : (assigning slops and tie breaks). I take it the 'fq' is a candidate for
> : this,but can I add dismax capabilities to fq as well? Also my query
would be
>
> ...you can use any parser you want for fq, using the localparams syntax...
>
> http://wiki.apache.org/solr/LocalParams
>
> ..so you could have something like...
>
> q=foo:bar&fq={!dismax qf='yak zak'}baz
>
> ..the one thing you have to watch out for when using localparams and
> dismax is that the outer params are inherited by the inner params by
> default -- so if you are using dismax for your main query 'q' (with
> defType) and you have global params for qf, pf, bq, etc... those are
> inherited by your fq={!dismax} query unless you override them with local
> params
>
>
> -Hoss


Re: OutOfMemory and auto-commit

2010-10-29 Thread Tommaso Teofili
If the problem is autowarming queries running in the meantime, maybe you
could consider setting the following to true:

false

and/or change this value
2

another option would be lowering the value of autowarmCount inside the cache
definitions.
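The element names above were stripped by the archive; assuming the standard
solrconfig.xml warming settings are meant, the relevant pieces would look
roughly like this (values illustrative):

  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>2</maxWarmingSearchers>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

Setting useColdSearcher to true lets requests use a new searcher before its
warming finishes, and a lower autowarmCount reduces the warming work triggered
by each commit.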

Hope this helps.
Tommaso

2010/10/25 Jonathan Rochkind 

> Yes, that's my question too.  Anyone?
>
> Dennis Gearon wrote:
>
>> How is this avoided?
>>
>> Dennis Gearon
>>
>>
>>
>>
>> --- On Thu, 10/21/10, Lance Norskog  wrote:
>>
>>
>>
>>> From: Lance Norskog 
>>> Subject: Re: OutOfMemory and auto-commit
>>> To: solr-user@lucene.apache.org
>>> Date: Thursday, October 21, 2010, 9:53 PM
>>> Yes. Indexing activity suspends until
>>> the commit finishes, then
>>> starts. Having both queries and indexing on the same Solr
>>> will have
>>> this memory problem.
>>>
>>> Lance
>>>
>>> On Thu, Oct 21, 2010 at 1:16 PM, Jonathan Rochkind 
>>> wrote:
>>>
>>>> If I do _not_ have any auto-commit enabled, and add 500k documents and
>>>> commit at end, no problem.
>>>>
>>>> If I instead set auto-commit maxDocs to 100,000 (pretty large number), and
>>>> try to add 500k docs, with autocommits theoretically happening every 100k...
>>>> I run into an OutOfMemory error.
>>>>
>>>> Can anyone think of any reasons that would cause this, and how to resolve it?
>>>>
>>>> All I can think of is that in the first case, my newSearcher and
>>>> firstSearcher warming queries don't run until the 'document add' is
>>>> completely done. In the second case, there are newSearcher and firstSearcher
>>>> warming queries happening at the same time another process is continuing to
>>>> stream 'add's to Solr.  Although at a maxDocs of 100,000, I shouldn't (I
>>>> think) get _overlapping_ warming queries, the warming queries should be done
>>>> before the next commit. I think. But nonetheless, just the fact that warming
>>>> queries are happening at the same time 'add's are continuing to stream,
>>>> could that be enough to somehow increase memory usage enough to run into
>>>> OOM?
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>>
>>>
>>


Re: Maximum of length of a Dismax Query?

2010-10-29 Thread Swapnonil Mukherjee
I am using the SolrJ client to post my query; the query length is roughly 
10,000 characters. I am using GET, like this:

int page = 1;
int resultsPerPage = 24;
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", query);
params.set("start", "" + (page - 1) * resultsPerPage);
params.set("rows", resultsPerPage);
try
{
    QueryResponse response =
        QueryServerManager.getSolrServer().query(params, SolrRequest.METHOD.GET);
    assertNotNull(response);
}
catch (SolrServerException e)
{
    e.printStackTrace();
}
This hits the exception block with the following exception

org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: 
Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:122)
at 
com.getty.search.tests.DismaxQueryTestCase.testAssetQuery(DismaxQueryTestCase.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.textui.TestRunner.doRun(TestRunner.java:116)
at 
com.intellij.junit3.JUnit3IdeaTestRunner.doRun(JUnit3IdeaTestRunner.java:108)
at junit.textui.TestRunner.doRun(TestRunner.java:109)
at 
com.intellij.junit3.JUnit3IdeaTestRunner.startRunnerWithArgs(JUnit3IdeaTestRunner.java:42)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)

Swapnonil Mukherjee



On 29-Oct-2010, at 12:44 PM, Swapnonil Mukherjee wrote:

> Hi Everybody,
> 
> It seems that the maximum query length supported by the Dismax Query Handler 
> is 3534 characters. Is there anyway I can set this limit to be around 12,000?
> 
> If I fire a query beyond 3534 characters, I don't even get error messages in 
> the catalina.XXX log files.
> 
> Swapnonil Mukherjee
> +91-40092712
> +91-9007131999
> 
> 
> 



Re: Natural string sorting

2010-10-29 Thread Toke Eskildsen
On Fri, 2010-10-29 at 10:18 +0200, RL wrote:
> Executing a query and sorting by this field leads to unnatural sorting of :
> string1
> string10
> string2

That's very much natural. Numbers are not treated any differently from
words made up of letters. You have to use alignment if you want to use
natural sorting:
string01
string02
string10

> (Some time ago i used Lucene and i was pretty sure that Lucene used a
> natural sort, thus i expected the same from solr)

Lucene sorts the same way, if you just use standard sort.

> Is there a way to sort in a natural order? Config option? Plugin? Expected
> output would be:
> string1
> string2
> string10

I don't know how to do this in Solr, sorry. To do it in Lucene without
changing the terms, you could use a custom comparator that tokenizes the
strings into numbers vs. everything else and does the comparison
token-by-token, alternating between lexicographic and numeric sort
depending on the token type.
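A rough Java sketch of such a comparator - just the comparison logic, not tied
to any Lucene/Solr API, and ignoring overflow for digit runs longer than a long:

  import java.util.Comparator;

  public class NaturalOrderComparator implements Comparator<String> {
      public int compare(String a, String b) {
          int i = 0, j = 0;
          while (i < a.length() && j < b.length()) {
              char ca = a.charAt(i), cb = b.charAt(j);
              if (Character.isDigit(ca) && Character.isDigit(cb)) {
                  // consume whole digit runs and compare them numerically
                  int si = i, sj = j;
                  while (i < a.length() && Character.isDigit(a.charAt(i))) i++;
                  while (j < b.length() && Character.isDigit(b.charAt(j))) j++;
                  long na = Long.parseLong(a.substring(si, i));
                  long nb = Long.parseLong(b.substring(sj, j));
                  if (na != nb) return na < nb ? -1 : 1;
              } else {
                  // otherwise compare character by character
                  if (ca != cb) return ca - cb;
                  i++;
                  j++;
              }
          }
          // if one string is a prefix of the other, the shorter one sorts first
          return (a.length() - i) - (b.length() - j);
      }
  }

Used as Collections.sort(values, new NaturalOrderComparator()), this orders
string1, string2, string10 as requested.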



Re: Overriding Tika's field processing

2010-10-29 Thread Lance Norskog
If you change 'title' to be single-valued, the Extracting thing may or
may not override it. I remember a go-round on this problem. But the
ExtractingWhatsIt has code that explicitly checks for single-valued
vs. multi-valued.

And this may all be different in different Solr versions. The
DataImportHandler has Tika support in 3.x and trunk, and the DIH gives
a lot more control about what field has what value.
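For reference, the kind of curl call Tod describes would look something like
this (URL, id and title values are placeholders, not taken from his setup):

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.title=My+CMS+Title&stream.url=http://cms.example.com/doc1.pdf"

literal.<field> supplies a value for that field directly; whether Tika's own
extracted title also ends up in the field depends on the field being single- or
multi-valued and on the extracting handler's field mappings, which is exactly
the behaviour discussed above.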

On Thu, Oct 28, 2010 at 8:53 AM, Tod  wrote:
> I'm reading my document data from a CMS and indexing it using calls to curl.
>  The curl call includes 'stream.url' so Tika will also index the actual
> document pointed to by the CMS' stored url.  This works fine.
>
> Presentation side I have a dropdown with the title of all the indexed
> documents such that when a user clicks one of them it opens in a new window.
>  Using js, I've been parsing the json returned from Solr to create the
> dropdown.  The problem is I can't get the titles sorted alphabetically.
>
> If I use a facet.sort on the title field I get back ALL the sorted titles in
> the facet block, but that doesn't include the associated URL's.  A sorted
> query won't work because title is a multivalued field.
>
> The one option I can think of is to make the title single valued so that I
> have a one to one relationship to the returned url.  To do that I'd need to
> be able to *not* index the Tika returned values.
>
> If I read right, my understanding was that I could use 'literal.title' in
> the curl call to limit what would be included in the index from Tika.  That
> doesn't seem to be working as a test facet query returns more than I have in
> the CMS.
>
> Am I understanding the 'literal.title' processing correctly?  Does anybody
> have experience/suggestions on how to handle this?
>
>
> Thanks - Tod
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: RAM increase

2010-10-29 Thread satya swaroop
Hi All,

 Thanks for your reply. I have a doubt: should I increase the RAM or
heap size for Java, or for Tomcat where Solr is running?


Regards,
satya


Re: Looking for Developers

2010-10-29 Thread Toke Eskildsen
On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote:
> For me, I simply deleted the original email, but I'm now quite  
> enjoying the irony of the complaints causing more noise on the list  
> than the original email!  ;-)

He he. An old classic. Next in line is the meta-meta-discussion about
whether meta-discussions belong on the list or if they should be moved
to solr-user-meta. Repeat ad nauseam.

Job postings are on-topic IMHO and, unless their volume grows
significantly, I see no reason to create a new mailing list.



Re: Upgrading from Solr 1.2 to 1.4.1

2010-10-29 Thread Lance Norskog
Yes, from Solr 1.2 to 1.3/Lucene 2.4.1 to 2.9 there was a change in
the Porter stemmer for English. I don't know what it was. It may also
affect the other language variants of the stemmer.

If stemming is important for your users, you might want to try the
Solr 3.x branch instead, or find Lucid's KStem implementation for
1.4.1. 3.x has a lot of work on better stemmers for many languages.

On Thu, Oct 28, 2010 at 2:23 PM, Robert Muir  wrote:
> On Thu, Oct 28, 2010 at 4:44 PM,   wrote:
>>
>> I'm using Solr 1.2.  If I upgrade to 1.4.1, must I re-index because of 
>> LUCENE-1142?  If so, how will this affect me if I don't re-index (I'm using 
>> EnglishPorterFilterFactory)?  What about when I'm using non-English stemmers 
>> from Snowball?
>>
>> Besides the brief note "IMPORTANT UPGRADE NOTE" about this in CHANGES.txt, 
>> where can I read more about this?  I looked in JIRA, LUCENE-1142, there 
>> isn't much.
>
> I haven't looked in detail regarding these changes, but the snowball
> was upgraded to revision 500 here.
> you can see the revisions/logs of the various algorithms here:
> http://svn.tartarus.org/snowball/trunk/snowball/algorithms/?pathrev=500
>
> One problem being, i don't know the previous revision you were
> using...but since it had no Hungarian before LUCENE-1142, it couldn't
> have possibly been any *later* than revision 385:
>
>    Revision 385 - Directory Listing
>    Added Mon Sep 4 14:06:56 2006 UTC (4 years, 1 month ago) by martin
>    New Hungarian stemmer
>
> This means, for example, that you would certainly be affected by
> changes in the english stemmer such as revision 414, among others:
>
>    Revision 414 - Directory Listing
>    Modified Mon Nov 20 10:49:29 2006 UTC (3 years, 11 months ago) by martin
>    'arsen' as exceptional p1 position, to prevent 'arsenic' and
> 'arsenal' conflating
>
> In my opinion, it would be best to re-index.
>



-- 
Lance Norskog
goks...@gmail.com


Re: No response from Solr on complex request after several days

2010-10-29 Thread Lance Norskog
There are a few problems that can happen. This is usually a sign of
garbage collection problems.
You can monitor the Tomcat instance with JConsole or one of the other
java monitoring tools and see if there is a memory leak.

Also, most people don't need to do it, but you can automatically
restart it once a day.

On Thu, Oct 28, 2010 at 2:20 AM, Xavier Schepler
 wrote:
> Hi,
>
> We are in a beta testing phase, with several users a day.
>
> After several days of waiting, the solr server didn't respond to requests
> that require a lot of processing time.
>
> I'm using Solr inside Tomcat.
>
> This is the request that had no response from the server :
>
> wt=json&omitHeader=true&q=qiAndMSwFR%3A%28transport%29&q.op=AND&start=0&rows=5&fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN,&sort=score%20desc&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionId&facet.sort=count&f.studyDecade.facet.sort=lex&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMFR&spellcheck.q=transport&hl=on&hl.fl=qSwFR,iHLSwFR,mHLSwFR&hl.fragsize=0&hl.snippets=1&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false
>
> It involves highlighting on a multivalued field with more than 600 short
> values inside. It takes 200 or 300 ms because of highlighting.
>
> After restarting tomcat all went fine again.
>
> I'm trying to understand why I had to restart tomcat and solr and what
> should I do to have it working 7/7 24/24.
>
> Xavier
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Sorting and filtering on fluctuating multi-currency price data?

2010-10-29 Thread Lance Norskog
ExternalFileField can only be used for boosting. It is not a
"first-class" field.

On Thu, Oct 28, 2010 at 11:07 AM, Chris Hostetter
 wrote:
>
> : Another approach would be to use ExternalFileField and keep the price data,
> : normalized to USD, outside of the index. Every time the currency rates
> : changed, we would calculate new normalized prices for every document in the
> : index.
>
> ...that is the approach i would normally suggest.
>
> : Still another approach would be to do the currency conversion at IndexReader
> : warmup time. We would index native price and currency code and create a
> : normalized currency field on the fly. This would be somewhat like
> : ExternalFileField in that it involved data from outside the index, but it
> : wouldn't need to be scoped to the parent SolrIndexReader, but could be
> : per-segment. Perhaps a custom poly-field could accomplish something like
> : this?
>
> ...that would essentially be what ExternalFileField should start doing, it
> just hasn't had anyone bite the bullet to implement it yet -- if you want
> to tackle that, then i would suggest/request/encourage you to look at
> doing it as a patch to ExternalFileField that could be contributed back
> and reused by all.
>
> With all of that said: there has also been a recent contribution of a
> "MoneyFieldType" for dealing precisely with multicurrency
> sorting/filtering issues that you should definitely take a look at...
>
> https://issues.apache.org/jira/browse/SOLR-2202
>
> -Hoss
>



-- 
Lance Norskog
goks...@gmail.com


Re: Looking for Developers

2010-10-29 Thread Lance Norskog
Then, Godwin!

On Fri, Oct 29, 2010 at 3:04 AM, Toke Eskildsen  
wrote:
> On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote:
>> For me, I simply deleted the original email, but I'm now quite
>> enjoying the irony of the complaints causing more noise on the list
>> than the original email!  ;-)
>
> He he. An old classic. Next in line is the meta-meta-discussion about
> whether meta-discussions belong on the list or if they should be moved
> to solr-user-meta. Repeat ad nauseam.
>
> Job-postings are on-topic IMHO and unless their volume grows
> significantly, I see no reason to create a new mail lists.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: No response from Solr on complex request after several days

2010-10-29 Thread Xavier Schepler

On 29/10/2010 12:08, Lance Norskog wrote:

There are a few problems that can happen. This is usually a sign of
garbage collection problems.
You can monitor the Tomcat instance with JConsole or one of the other
java monitoring tools and see if there is a memory leak.

Also, most people don't need to do it, but you can automatically
restart it once a day.

On Thu, Oct 28, 2010 at 2:20 AM, Xavier Schepler
  wrote:
   

Hi,

We are in a beta testing phase, with several users a day.

After several days of waiting, the solr server didn't respond to requests
that require a lot of processing time.

I'm using Solr inside Tomcat.

This is the request that had no response from the server :

wt=json&omitHeader=true&q=qiAndMSwFR%3A%28transport%29&q.op=AND&start=0&rows=5&fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN,&sort=score%20desc&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionId&facet.sort=count&f.studyDecade.facet.sort=lex&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMFR&spellcheck.q=transport&hl=on&hl.fl=qSwFR,iHLSwFR,mHLSwFR&hl.fragsize=0&hl.snippets=1&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false

It involves highlighting on a multivalued field with more than 600 short
values inside. It takes 200 or 300 ms because of highlighting.

After restarting tomcat all went fine again.

I'm trying to understand why I had to restart tomcat and solr and what
should I do to have it working 7/7 24/24.

Xavier



 



   

Thanks for your response.
Today, I've increased the Tomcat JVM heap size from 128-256 to 
1024-2048. I will see if it helps.





Re: RAM increase

2010-10-29 Thread Lance Norskog
When you start the Tomcat app, you tell it how much memory to allocate
to the JVM. I don't remember where, probably in catalina.sh.
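One common way to do that (file names vary per install; on many setups you
create bin/setenv.sh, or edit catalina.sh directly) is something like:

  export CATALINA_OPTS="-Xms512m -Xmx2048m"

so the heap settings are picked up when catalina.sh starts Tomcat.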

On Fri, Oct 29, 2010 at 2:56 AM, satya swaroop  wrote:
> Hi All,
>
>         Thanks for your reply.I have a doubt whether to increase the ram or
> heap size to java or to tomcat where the solr is running
>
>
> Regards,
> satya
>



-- 
Lance Norskog
goks...@gmail.com


Re: QueryElevation Component is so slow

2010-10-29 Thread Lance Norskog
I do not know if this is accurate. There are direct tools to monitor
these problems: jconsole, visualgc/visualvm, YourKit, etc. Often these
counts allot many things to one place that should be spread out.

On Fri, Oct 29, 2010 at 12:27 AM, Chamnap Chhorn
 wrote:
> anyone has some suggestions to improve the search?
> thanks
>
> On 10/28/10, Chamnap Chhorn  wrote:
>> Sorry for very bad pasting. I paste it again.
>>
>> Slowest Components                                      Count   Exclusive          Total
>> QueryElevationComponent                                  1       506,858 ms 100%    506,858 ms 100%
>> SolrIndexSearcher                                        1       2.0 ms 0%          2.0 ms 0%
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter()    1       1.0 ms 0%          506,862 ms 100%
>> QueryComponent                                           1       1.0 ms 0%          1.0 ms 0%
>> DebugComponent                                           1       0.0 ms 0%          0.0 ms 0%
>> FacetComponent                                           1       0.0 ms 0%          0.0 ms 0%
>>
>> On Thu, Oct 28, 2010 at 4:57 PM, Chamnap Chhorn
>> wrote:
>>
>>> Hi,
>>>
>>> I'm using solr 1.4 and using QueryElevation Component for guaranteed
>>> search
>>> position. I have around 700,000 documents with 1 Mb elevation file. It
>>> turns
>>> out it is quite slow on the newrelic monitoring website:
>>>
>>> Slowest Components Count Exclusive Total   QueryElevationComponent 1
>>> 506,858 ms 100% 506,858 ms 100% SolrIndexSearcher 1 2.0 ms 0% 2.0 ms 0%
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter() 1 1.0 ms 0% 506,862
>>> ms
>>> 100% QueryComponent 1 1.0 ms 0% 1.0 ms 0% DebugComponent 1 0.0 ms 0% 0.0
>>> ms
>>> 0% FacetComponent 1 0.0 ms 0% 0.0 ms 0%
>>>
>>> As you could see, QueryElevationComponent takes quite a lot of time. Any
>>> suggestion how to improve this?
>>>
>>> --
>>> Chhorn Chamnap
>>> http://chamnapchhorn.blogspot.com/
>>>
>>
>>
>>
>> --
>> Chhorn Chamnap
>> http://chamnapchhorn.blogspot.com/
>>
>
>
> --
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>



-- 
Lance Norskog
goks...@gmail.com


Influencing scores on values in multiValue fields

2010-10-29 Thread Imran
Hi All

We've got an index in which we have a multiValued field per document.

Assume the multivalue field values in each document to be;

Doc1:
bar lifters

Doc2:
truck tires
back drops
bar lifters

Doc 3:
iron bar lifters

Doc 4:
brass bar lifters
iron bar lifters
tire something
truck something
oil gas

Now when we search for 'bar lifters' the expectation (based on the
requirements) is that we get results in the order of Doc1, Doc 2, Doc4 and
Doc3.
Doc 1 - since there's an exact match (and only one) for the search terms
Doc 2 - since there's an exact match amongst the values
Doc 4 - since there's a partial match on the values but the number of
matches is higher than in Doc 3
Doc 3 - since there's a partial match

However, the results come out as Doc1, Doc3, Doc2, Doc4. Looking at the
explanation of the result, it appears Doc 2 is losing to Doc3 and Doc 4 is
losing to Doc3 based on length normalisation.

We think we can see the reason for that - the field length in doc2 is
greater than in doc3, and doc4's is greater than doc3's.
However, is there any mechanism by which we can force doc2 to beat doc3 and doc4 to
beat doc3 with this structure?

We did look at using omitNorms=true, but that messes up the scores for all
docs. The result comes out as Doc4, Doc1, Doc2, Doc3 (where Doc1, Doc2 and
Doc3 get the same score).
This is because the fieldNorm is not taken into account anymore (as
expected) and the term frequency becomes the only contributing factor. So
trying to avoid length normalisation through omitNorms is not helping.

Is there any way we can make an exact match of a value in a
multiValued field add to the overall score whilst keeping the length
normalisation?

Hope that makes sense.

Cheers
-- Imran


Re: Exception while processing: attach document

2010-10-29 Thread Bac Hoang

 Could anyone shed some light on this, please?

I saw in the log a message as below, but I don't think it's the root 
cause, because in my dataSource readOnly is true.


Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are 
the only valid transaction levels


A newbie Solr user

=

On 10/29/2010 1:49 PM, Bac Hoang wrote:

Hello all,

I'm getting stuck when trying to import an Oracle DB into the Solr index; could 
any one of you give a hand? Thanks a million.


Below is some short info. that might be a question

My Solr: 1.4.1

 *LOG *
INFO: Starting Full Import
Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties

INFO: Read dataimport.properties
Oct 29, 2010 1:19:35 PM 
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity attach with URL: 
jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22
Oct 29, 2010 1:19:36 PM org.apache.solr.handler.dataimport.DocBuilder 
buildDocument
*SEVERE: Exception while processing: attach document *: 
SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: *Unable 
to execute query: *select * from A.B Processing Document # 1


where A: a schema
B: a table

 *dataSource *===
 url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22" user="abc" password="xyz"
 readOnly="true" autoCommit="false" batchSize="1"/>
 format="text">

 [the rest of the entity/field definitions were stripped by the archive]

where TOPIC is a field of table B

Thanks again



RE: Influencing scores on values in multiValue fields

2010-10-29 Thread Michael Sokolov
How about creating another field for doing exact matches (a string);
searching both and boosting the string match?
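A minimal sketch of that idea (field names and types assumed): keep the
tokenized field for normal matching, copy the raw values into a string field,
and boost exact matches on it, e.g. via a boost query:

  <field name="tags" type="text" indexed="true" stored="true" multiValued="true"/>
  <field name="tags_exact" type="string" indexed="true" stored="false" multiValued="true"/>
  <copyField source="tags" dest="tags_exact"/>

  ...&defType=dismax&qf=tags&bq=tags_exact:"bar lifters"^5

so a document containing the exact value "bar lifters" gets an extra boost on
top of the normal scoring.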

-Mike 

> -Original Message-
> From: Imran [mailto:imranboho...@gmail.com] 
> Sent: Friday, October 29, 2010 6:25 AM
> To: solr-user@lucene.apache.org
> Subject: Influencing scores on values in multiValue fields
> 
> Hi All
> 
> We've got an index in which we have a multiValued field per document.
> 
> Assume the multivalue field values in each document to be;
> 
> Doc1:
> bar lifters
> 
> Doc2:
> truck tires
> back drops
> bar lifters
> 
> Doc 3:
> iron bar lifters
> 
> Doc 4:
> brass bar lifters
> iron bar lifters
> tire something
> truck something
> oil gas
> 
> Now when we search for 'bar lifters' the expectation (based on the
> requirements) is that we get results in the order of Doc1, 
> Doc 2, Doc4 and Doc3.
> Doc 1 - since there's an exact match (and only one) for the 
> search terms Doc 2 - since ther'e an exact match amongst the 
> values Doc 4 - since there's a partial match on the values 
> but the number of matches are more than Doc 3 Doc 3 - since 
> there's a partial match
> 
> However, the results come out as Doc1, Doc3, Doc2, Doc4. 
> Looking at the explaination of the result it appears Doc 2 is 
> loosing to Doc3 and Doc 4 is loosing to Doc3 based on length 
> normalisation.
> 
> We think we can see the reason for that - the field length in 
> doc2 is greater than doc3 and doc 4 is greater doc3.
> However, is there any mechanism I can force doc2 to beat doc3 
> and doc4 to beat doc3 with this structure.
> 
> We did look at using omitNorms=true, but that messes up the 
> scores for all docs. The result comes out as Doc4, Doc1, 
> Doc2, Doc3 (where Doc1, Doc2 and
> Doc3 gets the same score)
> This is because the fieldNorm is not taken into account anymore (as
> expected) and the termFrequence being the only contributing 
> factor. So trying to avoid length normalisation through 
> omitNorms is not helping.
> 
> Is there anyway where we can influence an exact match of a 
> value in a multiValue field to add on to the overall score 
> whilst keeping the lenght normalisation?
> 
> Hope that makes sense.
> 
> Cheers
> -- Imran
> 



Re: Reverse range query

2010-10-29 Thread kenf_nc

I modified the text of this, hopefully to make it clearer; I wasn't sure what
I was asking was coming across well. And I'm adding this comment in a
shameless attempt to boost my question back to the top for people to see.
Before I write a messy workaround, I just wanted to check with the community to
see if this was already handled; it seems like a useful, common data type.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Reverse-range-query-tp1789135p1792126.html
Sent from the Solr - User mailing list archive at Nabble.com.


eDismax result differs from Dismax

2010-10-29 Thread Ryan Walker

We are launching a new version of our job board helping returning veterans find 
a civilian job, and we chose Solr and Sunspot[1] to power our search. We really 
didn't consider the power users in the HR world who are trained to use boolean 
search, for example:

"Engineer" AND ("Electrical" OR "Mechanical")

Sunspot supports the Dismax request handler, which unfortunately does not 
handle the query above properly. So we read about eDismax and that it was baked 
into Solr 1.5. At the same time, Sunspot has switched from LocalSolr 
integration to storing a geohash in a full-text searchable field.

We're having some problems with some complex queries that Sunspot generates:

INFO: [] webapp=/solr path=/select 
params={fl=+score&start=0&q=query:"{!dismax+qf%3D'title_text+description_text'}Ruby+on+Rails+Developer"+(location_details_s:dngythdb25fu^1.0+OR+location_details_s:dngythdb25f^0.0625+OR+location_details_s:dngythdb25*^0.00391+OR+location_details_s:dngythdb2*^0.000244+OR+location_details_s:dngythdb*^0.153+OR+location_details_s:dngythd*^0.00954+OR+location_details_s:dngyth*^0.000596+OR+location_details_s:dngyt*^0.373+OR+location_details_s:dngy*^0.0233+OR+location_details_s:dng*^0.00146)&wt=ruby&fq=type:Job&defType=edismax&rows=20}
 hits=1 status=0 QTime=13

Under Dismax no results are returned for this query; however, as you can see 
above, with eDismax a result is returned -- the only difference between the two 
queries is '&defType=edismax' vs '&defType=dismax'.

Debug Output Solr 1.5 eDismax:
https://gist.github.com/32f3a52064ec300fdca0

Debug Output Solr 1.5 Dismax:
https://gist.github.com/d82b82a026878ecce36b

My question is whether you have any ideas why the query above returns a record that 
doesn't match under eDismax.

We are at a crossroads where we have to decide if we want to forge ahead with 
Sunspot 1.2rc4 and Solr 1.5, or we may fall back to Sunspot 1.1 and Solr 1.4 
until Solr 3.1/4.0 come out, hopefully with eDismax support and better location 
search support.

I plan to do a blog posting on this issue when we figure it out, I'll give you 
props if you can help us out :)

Best regards,

Ryan Walker
Chief Experience Officer
http://www.recruitmilitary.com
513.677.7078

[1] http://outoftime.github.com/sunspot/

Re: eDismax result differs from Dismax

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 9:30 AM, Ryan Walker  wrote:
>
> We are launching a new version of our job board helping returning veterans 
> find a civilian job, and we chose Solr and Sunspot[1] to power our search. We 
> really didn't consider the power users in the HR world who are trained to use 
> boolean search, for example:
>
> "Engineer" AND ("Electrical" OR "Mechanical")
>
> Sunspot supports the Dismax request handler, which unfortunately does not 
> handle the query above properly. So we read about eDismax and that it was 
> baked into Solr 1.5. At the same time, Sunspot has switched from LocalSolr 
> integration to storing a geohash in a full-text searchable field.
>
> We're having some problems with some complex queries that Sunspot generates:
>
> INFO: [] webapp=/solr path=/select 
> params={fl=+score&start=0&q=query:"{!dismax+qf%3D'title_text+description_text'}Ruby+on+Rails+Developer"+(location_details_s:dngythdb25fu^1.0+OR+location_details_s:dngythdb25f^0.0625+OR+location_details_s:dngythdb25*^0.00391+OR+location_details_s:dngythdb2*^0.000244+OR+location_details_s:dngythdb*^0.153+OR+location_details_s:dngythd*^0.00954+OR+location_details_s:dngyth*^0.000596+OR+location_details_s:dngyt*^0.373+OR+location_details_s:dngy*^0.0233+OR+location_details_s:dng*^0.00146)&wt=ruby&fq=type:Job&defType=edismax&rows=20}
>  hits=1 status=0 QTime=13
>
> Under Dismax no results are returned for this query, however, as you can see 
> above with eDismax a result is returned -- the only difference between the 
> two queries are '&defType=edismax' vs '&defType=dismax'

That's to be expected.  Dismax doesn't even support fielded queries
(where you specify the fieldname in the query itself) so this clause
is treated all as text:

(location_details_s:dngythdb25fu^1.0

and dismax QP will be looking for tokens like "location_details_s"
"dngythdb25fu" (assuming tokenization would split on the
non-alphanumeric chars) in your text fields.

-Yonik
http://www.lucidimagination.com


Re: Maximum of length of a Dismax Query?

2010-10-29 Thread Swapnonil Mukherjee
Solved this issue by setting maxHttpHeaderSize to 65536 in the 
tomcat/conf/server.xml file.

Otherwise Tomcat was not responding.
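For reference, that attribute goes on the HTTP connector in server.xml; the
other attributes below are just the stock ones and will differ per install:

  <Connector port="8080" protocol="HTTP/1.1" maxHttpHeaderSize="65536"
             connectionTimeout="20000" redirectPort="8443"/>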

Swapnonil Mukherjee



On 29-Oct-2010, at 2:43 PM, Swapnonil Mukherjee wrote:

I am using the SOLRJ client to post my query, The query length is roughly 
10,000 characters. I am using GET like this.

int page = 1;
   int resultsPerPage = 24;
   ModifiableSolrParams params = new ModifiableSolrParams();
   params.set("q", query);
   params.set("start", "" + (page - 1) * resultsPerPage);
   params.set("rows", resultsPerPage);
   try
   {
   QueryResponse response = 
QueryServerManager.getSolrServer().query(params, SolrRequest.METHOD.GET);
   assertNotNull(response);
   }
   catch (SolrServerException e)
   {
   e.printStackTrace();
   }
This hits the exception block with the following exception

org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: 
Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:122)
at 
com.getty.search.tests.DismaxQueryTestCase.testAssetQuery(DismaxQueryTestCase.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.textui.TestRunner.doRun(TestRunner.java:116)
at com.intellij.junit3.JUnit3IdeaTestRunner.doRun(JUnit3IdeaTestRunner.java:108)
at junit.textui.TestRunner.doRun(TestRunner.java:109)
at 
com.intellij.junit3.JUnit3IdeaTestRunner.startRunnerWithArgs(JUnit3IdeaTestRunner.java:42)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)

Swapnonil Mukherjee



On 29-Oct-2010, at 12:44 PM, Swapnonil Mukherjee wrote:

Hi Everybody,

It seems that the maximum query length supported by the Dismax Query Handler is 
3534 characters. Is there anyway I can set this limit to be around 12,000?

If I fire a query beyond 3534 characters, I don't even get error messages in 
the catalina.XXX log files.

Swapnonil Mukherjee
+91-40092712
+91-9007131999







Re: QueryElevation Component is so slow

2010-10-29 Thread Chamnap Chhorn
Thanks for the reply.

I'm looking for how to improve the speed of the search query. The
QueryElevationComponent is taking too much time, which is
unacceptable. The size of the elevation file is only 1 Mb. I wonder whether
other people use this component without problems (related to speed)? Am I
using it the wrong way, or is there a limit when using this component?

On 10/29/10, Lance Norskog  wrote:
> I do not know if this is accurate. There are direct tools to monitor
> these problems: jconsole, visualgc/visualvm, YourKit, etc. Often these
> counts allot many things to one place that should be spread out.
>
> On Fri, Oct 29, 2010 at 12:27 AM, Chamnap Chhorn
>  wrote:
>> anyone has some suggestions to improve the search?
>> thanks
>>
>> On 10/28/10, Chamnap Chhorn  wrote:
>>> Sorry for very bad pasting. I paste it again.
>>>
>>> Slowest Components                                      Count   Exclusive          Total
>>> QueryElevationComponent                                  1       506,858 ms 100%    506,858 ms 100%
>>> SolrIndexSearcher                                        1       2.0 ms 0%          2.0 ms 0%
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter()    1       1.0 ms 0%          506,862 ms 100%
>>> QueryComponent                                           1       1.0 ms 0%          1.0 ms 0%
>>> DebugComponent                                           1       0.0 ms 0%          0.0 ms 0%
>>> FacetComponent                                           1       0.0 ms 0%          0.0 ms 0%
>>>
>>> On Thu, Oct 28, 2010 at 4:57 PM, Chamnap Chhorn
>>> wrote:
>>>
 Hi,

 I'm using solr 1.4 and using QueryElevation Component for guaranteed
 search
 position. I have around 700,000 documents with 1 Mb elevation file. It
 turns
 out it is quite slow on the newrelic monitoring website:

 Slowest Components Count Exclusive Total   QueryElevationComponent 1
 506,858 ms 100% 506,858 ms 100% SolrIndexSearcher 1 2.0 ms 0% 2.0 ms 0%
 org.apache.solr.servlet.SolrDispatchFilter.doFilter() 1 1.0 ms 0%
 506,862
 ms
 100% QueryComponent 1 1.0 ms 0% 1.0 ms 0% DebugComponent 1 0.0 ms 0% 0.0
 ms
 0% FacetComponent 1 0.0 ms 0% 0.0 ms 0%

 As you could see, QueryElevationComponent takes quite a lot of time. Any
 suggestion how to improve this?

 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/

>>>
>>>
>>>
>>> --
>>> Chhorn Chamnap
>>> http://chamnapchhorn.blogspot.com/
>>>
>>
>>
>> --
>> Chhorn Chamnap
>> http://chamnapchhorn.blogspot.com/
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/




RE: Natural string sorting

2010-10-29 Thread Bob Sandiford
Well, you could do a magnitude notation approach.  Depends on how complex the 
strings are, but based on your examples, this would work:

1) Identify each series of digits in the string.  (This assumes each series is 
no more than 9 digits long.)

2) Insert the number of digits into the string before the digit series 
itself.

So - for sorting - you would have:

string1 --> string11
string10 --> string210
string2 --> string12

which will then sort as string11, string12, string210, but use the original 
strings as the displays you want.
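A small Java sketch of that transform for building the sort key (assuming, as
above, digit runs of at most 9 digits):

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class MagnitudeSortKey {
      private static final Pattern DIGITS = Pattern.compile("\\d+");

      // "string1" -> "string11", "string10" -> "string210"
      public static String toSortKey(String s) {
          Matcher m = DIGITS.matcher(s);
          StringBuffer sb = new StringBuffer();
          while (m.find()) {
              // prefix each digit run with its length so shorter numbers sort first
              m.appendReplacement(sb, m.group().length() + m.group());
          }
          m.appendTail(sb);
          return sb.toString();
      }
  }

The transformed value would go into a separate untokenized field used only for
sorting, while the original string is kept for display.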

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 

> -Original Message-
> From: Savvas-Andreas Moysidis
> [mailto:savvas.andreas.moysi...@googlemail.com]
> Sent: Friday, October 29, 2010 4:33 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Natural string sorting
> 
> I think string10 is before string2 in lexicographic order?
> 
> On 29 October 2010 09:18, RL  wrote:
> 
> >
> > Just a quick question about natural sorting of strings.
> >
> > I've a simple dynamic field in my schema:
> >
> >  > omitNorms="true"/>
> >  > omitNorms="true"/>
> >
> > There are 3 indexed strings for example
> > string1,string2,string10
> >
> > Executing a query and sorting by this field leads to unnatural
> sorting of :
> > string1
> > string10
> > string2
> >
> > (Some time ago i used Lucene and i was pretty sure that Lucene used a
> > natural sort, thus i expected the same from solr)
> > Is there a way to sort in a natural order? Config option? Plugin?
> Expected
> > output would be:
> > string1
> > string2
> > string10
> >
> >
> > Thanks in advance.
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Natural-string-sorting-
> tp1791227p1791227.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >


RE: spellchecker results not as desired

2010-10-29 Thread Dyer, James
You should be building your index on a field that creates tokens on whitespace. 
 So your dictionary would have "iphone" and "case" as separate terms instead of 
"iphone case" as one term.  And if you query on something like "iphole case", 
it will give suggestions for "iphole" but not for "case", because the latter is 
in the dictionary.  (The spellchecker will always assume a term is correctly 
spelled if it is in the dictionary.)

If you set collate=true, in addition to getting word-by-word suggestions, it 
will return a re-written query (aka a "collation").  SOLR 1.4 will always use 
the top suggestions for each word to form the collation.  In this example, the 
collation would be "iphone case".  You can then requery SOLR with the collation 
and hope to get better hits.  While 1.4 doesn't check to see if the collation 
is going to return any hits, an enhancement to 3.x and 4.0 allows you to 
guarantee that collations will always give you hits if you requery them.
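As a concrete example (handler and parameter values assumed), a request such as

  ...&q=iphole case&spellcheck=true&spellcheck.count=10&spellcheck.collate=true

returns per-word suggestions plus a collation entry like "iphone case" that the
client can re-issue as a follow-up query.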

As for your second question, likely "ipj" is close enough to "ipad" to warrant 
a suggestion but the others are not considered close enough.  You can tweak 
this by setting spellcheck.accuracy.  However, I do not believe this option is 
available in 1.4.  The wiki indicates it is 3.x/4.0 only.

For more information, look at the "SpellCheckComponent" page on the wiki.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: abhayd [mailto:ajdabhol...@hotmail.com] 
Sent: Thursday, October 28, 2010 4:34 PM
To: solr-user@lucene.apache.org
Subject: spellchecker results not as desired


hi 

I added spellchecker to request handler. Spellchecker is indexed based.
Terms in index are like
iphone
iphone 4
iphone case
phone
gophoe

when i set q=iphole i get suggestions like
iphone
phone
gophone
ipad

Not sure how would i get iphone, iphone 4, iphone case, phone. Any thoughts?

At the same time when i type ipj
i get result as ipad, why not iphone, iphone 4 , ipad
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/spellchecker-results-not-as-desired-tp1789192p1789192.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: RAM increase

2010-10-29 Thread Tommaso Teofili
Hello Lance,
from the command line run:

> export JAVA_OPTS='-d64 -Xms128m -Xmx5g'

adjusting the values of Xms and Xmx as needed.
Hope this helps.
Tommaso

2010/10/29 Lance Norskog 

> When you start the Tomcat app, you tell it how much memory to allocate
> to the JVM. I don't remember where, probably in catalina.sh.
>
> On Fri, Oct 29, 2010 at 2:56 AM, satya swaroop 
> wrote:
> > Hi All,
> >
> > Thanks for your reply.I have a doubt whether to increase the ram
> or
> > heap size to java or to tomcat where the solr is running
> >
> >
> > Regards,
> > satya
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Something for the weekend - Lily 0.2 is OUT ! :)

2010-10-29 Thread Steven Noels
Dear all,

three months after the highly anticipated proof of architecture release,
we're living up to our promises, and are releasing Lily 'CR' 0.2 today - a
fully-distributed, highly scalable and highly available content repository,
marrying best-of-breed database and search technology into a powerful,
productive and easy-to-use solution for contemporary internet-scale content
applications.
For whom

You're building content applications (content management, archiving, asset
management, DMS, WCMS, portals, ...) that scale well, either as a product, a
project or in the cloud. You need a trustworthy underlying content
repository that provides a flexible and easy-to-use content model you can
adapt to your requirements. You have a keen interest in NoSQL/HBase
technology but need a higher-level API, and scalable indexing and search as
well.
Foundations

Lily builds further upon Apache HBase and Apache SOLR. HBase is a faithful
implementation of the Google BigTable database, and provides infinite
elastic scaling and high-performance access to huge amounts of data. SOLR is
the server version of Lucene, the industry-standard search library. Lily
joins HBase and SOLR in a single, solidly packaged content repository
product, with automated sharding (making use of multiple hardware nodes to
provide scaling of volume and performance) and automatic index maintenance.
Lily adds a sophisticated, yet flexible and surprisingly practical content
schema on top of this, providing the structuredness of more classic
databases, versioning, secondary indexing, queuing: all the stuff developers
care for when fixing real-world problems.
Key features of this release

   - Fully distributed: Lily has a fully-distributed architecture making
   maximum use of all available hardware for scalability and availability.
   ZooKeeper is used for distributed process coordination, configuration and
   locking. Index maintenance is based on an HBase-backed RowLog mechanism
   allowing fast but reliable updating of SOLR indexes.
   - Index maintenance: Lily offers all the features and functionality of
   SOLR, but makes index maintenance a breeze, both for interactive as-you-go
   updating and MapReduce-based full index rebuilds
   - Multi-indexers: for high-load situations, multiple indexers can work in
   parallel and talk to a sharded SOLR setup
   - REST interface: a flexible and platform-neutral access method for all
   Lily operations using HTTP and JSON
   - Improved content model: we added URI as a base Lily type as a (small)
   indication of our interest in semantic technology

More importantly, we commit ourselves to take care of API compatibility and
data format layout from this release onwards - as much as humanly possible.
Lily 0.2 offers the API we want to support in the final release. Lily 0.2 is
our contract for content application developers, upgrading to Lily final
should require them to do as little code or data changes as possible.
From where

Download Lily from www.lilyproject.org. It's Apache Licensed Open Source. No
strings attached.
Enterprise support

Together with this release, we're rolling out our commercial support
services  (and signed up a
first customer, yay!) that allows you to use Lily with peace of mind. Also,
this release has been fully tested and depends on the latest Cloudera
Distribution for Hadoop  (CDH3 beta3).
Next up

Lily 1.0 is planned for March 2011, with an interim release candidate in
January. We'll be working on performance enhancements, feature additions,
and are happily - eagerly - awaiting your feedback and comments. We'll post
a roadmap for Lily 0.3 and onwards by mid November.
Follow us

If you want to keep track of Lily's ongoing development, join the Lily
discussion list or follow our company Twitter @outerthought.
Thank you

I'd like to thank Bruno and Evert for their hard work so far, the HBase and
SOLR community for their help, the IWT government fund for their partial
financial support, and all of our early Lily adopters and enthusiasts for
their much valued feedback. You guys rock!

Steven.
-- 
Steven Noels
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily


Re: Exception while processing: attach document

2010-10-29 Thread Tommaso Teofili
I think this is a JDBC warning message, since some isolation levels (e.g.
READ_UNCOMMITTED) may not be supported by the actual (Oracle) implementation.
Could your issue be related to transactions updating/inserting/deleting
records on your Oracle DB while you are trying to run DIH?
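
For illustration only - this is plain JDBC, not the actual DIH code, and the
connection details are simply the ones from your config - asking Oracle's
driver for READ_UNCOMMITTED (which a readOnly data source might reasonably
request) produces exactly the message you quoted:

import java.sql.Connection;
import java.sql.DriverManager;

public class IsolationCheck {
  public static void main(String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection c = DriverManager.getConnection(
        "jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22", "abc", "xyz");
    c.setReadOnly(true);
    // Oracle only implements READ_COMMITTED and SERIALIZABLE, so this call fails
    // with: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the only
    // valid transaction levels
    c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
    c.close();
  }
}
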
Regards,
Tommaso

2010/10/29 Bac Hoang 

>  Could anyone shed some light, please?
>
> I saw the message below in the log, but I don't think it's the root cause,
> because in my dataSource readOnly is set to true.
>
> Caused by: java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the
> only valid transaction levels
>
> A newbie Solr user
>
> =
>
>
> On 10/29/2010 1:49 PM, Bac Hoang wrote:
>
>> Hello all,
>>
>> I'm getting stuck when trying to import an Oracle DB into the Solr index;
>> could any one of you give a hand? Thanks a million.
>>
>> Below is some short info that might be relevant to the question.
>>
>> My Solr: 1.4.1
>>
>> ===== LOG =====
>> INFO: Starting Full Import
>> Oct 29, 2010 1:19:35 PM org.apache.solr.handler.dataimport.SolrWriter
>> readIndexerProperties
>> INFO: Read dataimport.properties
>> Oct 29, 2010 1:19:35 PM
>> org.apache.solr.handler.dataimport.JdbcDataSource$1 call
>> INFO: Creating a connection for entity attach with URL:
>> jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22
>> Oct 29, 2010 1:19:36 PM org.apache.solr.handler.dataimport.DocBuilder
>> buildDocument
>> SEVERE: Exception while processing: attach document :
>> SolrInputDocument[{}]
>> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
>> execute query: select * from A.B Processing Document # 1
>> 
>> where A: a schema
>> B: a table
>>
>> ===== dataSource =====
>> <dataSource url="jdbc:oracle:thin:@192.168.72.7:1521:OFIRDS22" user="abc"
>> password="xyz"
>>  readOnly="true" autoCommit="false" batchSize="1"/>
>> 
>> 
>> > format="text">
>> 
>> 
>> 
>> 
>> 
>> where TOPIC is a field of table B
>>
>> Thanks again
>>
>>


Re: Multiple indexes inside a single core

2010-10-29 Thread Valli Indraganti
Here's the Jira issue for the distributed search issue.
https://issues.apache.org/jira/browse/SOLR-1632

I tried applying this patch but get the same error that is posted in the
discussion section for that issue. I will be glad to help on this one too.

On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson wrote:

> Ah, I should have read more carefully...
>
> I remember this being discussed on the dev list, and I thought there might
> be
> a Jira attached but I sure can't find it.
>
> If you're willing to work on it, you might hop over to the solr dev list
> and
> start
> a discussion, maybe ask for a place to start. I'm sure some of the devs
> have
> thought about this...
>
> If nobody on the dev list says "There's already a JIRA on it", then you
> should
> open one. The Jira issues are generally preferred when you start getting
> into
> design because the comments are preserved for the next person who tries
> the idea or makes changes, etc
>
> Best
> Erick
>
> On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess 
> wrote:
>
> > Thanks Erick.  The problem with multiple cores is that the documents are
> > scored independently in each core.  I would like to be able to search
> > across
> > both cores and have the scores 'normalized' in a way that's similar to
> > what
> > Lucene's MultiSearcher would do.  As far as I understand, multiple cores
> > would likely result in seriously skewed scores in my case since the
> > documents are not distributed evenly or randomly.  I could have one
> > core/index with 20 million docs and another with 200.
> >
> > I've poked around in the code and this feature doesn't seem to exist.  I
> > would be happy with finding a decent place to try to add it.  I'm not
> sure
> > if there is a clean place for it.
> >
> > Ben
> >
> > On Oct 20, 2010, at 8:36 PM, Erick Erickson 
> > wrote:
> >
> > > It seems to me that multiple cores are along the lines of what you
> > > need: a single instance of Solr that can search across multiple
> > > sub-indexes that do not necessarily share schemas, and are
> > > independently maintainable.
> > >
> > > This might be a good place to start:
> > http://wiki.apache.org/solr/CoreAdmin
> > >
> > > HTH
> > > Erick
> > >
> > > On Wed, Oct 20, 2010 at 3:23 PM, ben boggess 
> > wrote:
> > >
> > >> We are trying to convert a Lucene-based search solution to a
> > >> Solr/Lucene-based solution.  The problem we have is that we currently
> > have
> > >> our data split into many indexes and Solr expects things to be in a
> > single
> > >> index unless you're sharding.  In addition to this, our indexes
> wouldn't
> > >> work well using the distributed search functionality in Solr because
> the
> > >> documents are not evenly or randomly distributed.  We are currently
> > using
> > >> Lucene's MultiSearcher to search over subsets of these indexes.
> > >>
> > >> I know this has been brought up a number of times in previous posts
> and
> > the
> > >> typical response is that the best thing to do is to convert everything
> > into
> > >> a single index.  One of the major reasons for having the indexes split
> > up
> > >> the way we do is because different types of data need to be indexed at
> > >> different intervals.  You may need one index to be updated every 20
> > minutes
> > >> and another is only updated every week.  If we move to a single index,
> > then
> > >> we will constantly be warming and replacing searchers for the entire
> > >> dataset, and will essentially render the searcher caches useless.  If
> we
> > >> were able to have multiple indexes, they would each have a searcher
> and
> > >> updates would be isolated to a subset of the data.
> > >>
> > >> The other problem is that we will likely need to shard this large
> single
> > >> index and there isn't a clean way to shard randomly and evenly across
> > >> the whole
> > >> of
> > >> the data.  We would, however, like to shard a single data type.  If we
> > could
> > >> use multiple indexes, we would likely be also sharding a small sub-set
> > of
> > >> them.
> > >>
> > >> Thanks in advance,
> > >>
> > >> Ben
> > >>
> >
>


Re: Stored or indexed?

2010-10-29 Thread Elizabeth L. Murnane
Hi Ron,

In a nutshell - an indexed field is searchable, and a stored field has its 
content stored in the index so it is retrievable. Here are some examples that 
will hopefully give you a feel for how to set the indexed and stored options:

indexed="true" stored="true"
Use this for information you want to search on and also display in search 
results - for example, book title or author.

indexed="false" stored="true"
Use this for fields that you want displayed with search results but that don't 
need to be searchable - for example, destination URL, file system path, time 
stamp, or icon image.

indexed="true" stored="false"
Use this for fields you want to search on but don't need to get their values in 
search results. Here are some of the common reasons you would want this:

Large fields and a database: Storing a field makes your index larger, so set 
stored to false when possible, especially for big fields. For this case a 
database is often used, as the previous responder said. Use a separate 
identifier field to get the field's content from the database.

Ordering results: Say you define <field name="bookName" type="text"
indexed="true" stored="true"/> that is tokenized and used for searching. If you
want to sort results based on book name, you could copy the field into a
separate nonretrievable, nontokenized field that can be used just for sorting:
<field name="bookSort" type="string" indexed="true" stored="false"/>
<copyField source="bookName" dest="bookSort"/>

Easier searching: You can define a catch-all field and copy all of the other
text fields into it. Since Solr looks in a default field when given a text
query without field names, you can support this type of general query by
making the catch-all field the default field.

indexed="false" stored="false"
Use this when you want to ignore fields. For example, the following will ignore
unknown fields that don't match a defined field, rather than throwing an error
(the default behavior):
<fieldtype name="ignored" stored="false" indexed="false"/>
<dynamicField name="*" type="ignored"/>


Elizabeth Murnane
emurn...@architexa.com
Architexa Lead Developer - www.architexa.com
Understand & Document Code In Seconds


--- On Thu, 10/28/10, Savvas-Andreas Moysidis 
 wrote:

From: Savvas-Andreas Moysidis 
Subject: Re: Stored or indexed?
To: solr-user@lucene.apache.org
Date: Thursday, October 28, 2010, 4:25 AM

In our case, we just store a database id and do a secondary db query when
displaying the results.
This is handy and leads to a more centralised architecture when you need to
display properties of a domain object which you don't index/search.

On 28 October 2010 05:02, kenf_nc  wrote:

>
> Interesting wiki link, I hadn't seen that table before.
>
> And to answer your specific question about indexed=true, stored=false, this
> is most often done when you are using analyzers/tokenizers on your field.
> This field is for search only; you would never retrieve its contents for
> display. It may in fact be an amalgam of several fields into one 'content'
> field. You have your display copy stored in another field marked
> indexed=false, stored=true and optionally compressed. I also have simple
> string fields set to lowercase so searching is case-insensitive, and have a
> duplicate field where the string is normal case. the first one is
> indexed/not stored, the second is stored/not indexed.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


How can I disable fsync()?

2010-10-29 Thread Igor Chudov
Thanks to all; I made Solr work very well on one newer machine.

Now I am setting up Solr on an older server with an IDE hard drive.

Unfortunately, populating the index takes FOREVER due to
Solr/Lucene/Tomcat calling fsync() a lot after every write.

I would like to know how to disable fsync.

I am very aware of the risks of not having fsync() and I DO NOT CARE
ABOUT THEM AND DO NOT WANT TO BE REMINDED.

I just want to know how can I disable fsync() when adding to Solr index.

Thanks, guys!

Igor


Re: documentCache clarification

2010-10-29 Thread Jay Luker
On Thu, Oct 28, 2010 at 7:27 PM, Chris Hostetter
 wrote:

> The queryResultCache is keyed on  and the
> value is a "DocList" object ...
>
> http://lucene.apache.org/solr/api/org/apache/solr/search/DocList.html
>
> Unlike the Document objects in the documentCache, the DocLists in the
> queryResultCache never get modified (techincally Solr doesn't actually
> modify the Documents either, the Document just keeps track of it's fields
> and updates itself as Lazy Load fields are needed)
>
> if a DocList containing results 0-10 is put in the cache, it's not
> going to be of any use for a query with start=50.  but if it contains 0-50
> it *can* be used if start < 50 and rows < 50 -- that's where the
> queryResultWindowSize comes in.  if you use start=0&rows=10, but your
> window size is 50, SolrIndexSearcher will (under the covers) use
> start=0&rows=50 and put that in the cache, returning a "slice" from 0-10
> for your query.  the next query asking for 10-20 will be a cache hit.

This makes sense but still doesn't explain what I'm seeing in my cache
stats. When I issue a request with rows=10 the stats show an insert
into the queryResultCache. If I send the same query, this time with
rows=1000, I would not expect to see a cache hit but I do. So it seems
like there must be something useful in whatever gets cached on the
first request for rows=10 for it to be re-used by the request for
rows=1000.

--jay


Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 2:31 PM, Jay Luker  wrote:
> This makes sense but still doesn't explain what I'm seeing in my cache
> stats. When I issue a request with rows=10 the stats show an insert
> into the queryResultCache. If I send the same query, this time with
> rows=1000, I would not expect to see a cache hit but I do.

This is a limitation in the SolrCache API.
The key into the cache does not contain rows, so the cache returns the
first 10 docs and increments its hit count.  Then the cache user
(SolrIndexSearcher) looks at the entry and determines it can't use it.
 One way to fix this would be to add a method that says "that was
actually a miss" to the cache API.

-Yonik
http://www.lucidimagination.com


Custom Sorting in Solr

2010-10-29 Thread Ezequiel Calderara
Hi all guys!
I'm in a weird situation here.
We have indexed a set of documents which are ordered using a linked list (each
document has a reference to the previous and the next one).

Is there a way, when sorting in the Solr search, to use the linked list to sort?


If that is not possible, how can I use the DIH to access a Service in WCF or
a Webservice? Should I develop my own DIH?


-- 
__
Ezequiel.

Http://www.ironicnet.com


RE: Custom Sorting in Solr

2010-10-29 Thread Jonathan Rochkind
There's no way I know of to make Solr use that kind of data to create the sort 
order you want. 

Generally for 'custom' sorts, you want to create a field in your Solr index 
with possibly artificially constructed values that will 'naturally' sort the 
way you want. 

How to do that with a linked list seems kind of tricky: before you index, you
may have to write code to analyze your whole graph order and then just supply
sort order keys.  And then if you sometimes update just a few documents, but
not your whole set... Geez, I'm not really sure. It's kind of a tricky
problem.  That kind of data is not really the expected use case for Solr
sorting.

Sorry, I'm not sure what this means or how it would help: "use the DIH to 
access a Service in WCF or a Webservice?"  Maybe someone else will know exactly 
what you mean. Or maybe if you rephrase with more specificity as to how you 
think this will help you solve your problem, it will be more clear. 

Recall that you don't need to use DIH to index at all; it's just one of several
methods that simplifies things for common patterns. It's possible you fall out
of the common pattern and it would be simpler not to use DIH.  Although even
without DIH, I can't think of a particularly simple way to solve your problem.

Just curious, but is your _entire_ corpus, your entire document set, part of a 
_single_ linked list?  Or do you have several different linked lists in there? 
If several, what do you want to happen with sort if two documents in the result 
set aren't even part of the same linked list?   This kind of thing is one 
reason translating the sort of data you have to a solr sort order starts to 
seem kind of confusing to me. 


From: Ezequiel Calderara [ezech...@gmail.com]
Sent: Friday, October 29, 2010 3:39 PM
To: Solr Mailing List
Subject: Custom Sorting in Solr

Hi all guys!
I'm in a weird situation here.
We have indexed a set of documents which are ordered using a linked list (each
document has a reference to the previous and the next one).

Is there a way, when sorting in the Solr search, to use the linked list to sort?


If that is not possible, how can I use the DIH to access a Service in WCF or
a Webservice? Should I develop my own DIH?


--
__
Ezequiel.

Http://www.ironicnet.com


Re: documentCache clarification

2010-10-29 Thread Chris Hostetter

: This is a limitation in the SolrCache API.
: The key into the cache does not contain rows, so the cache returns the
: first 10 docs and increments its hit count.  Then the cache user
: (SolrIndexSearcher) looks at the entry and determines it can't use it.

Wow, I never realized that.

Why don't we just include the start & rows (modulo the window size) in 
the cache key?

-Hoss


Re: Custom Sorting in Solr

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 3:39 PM, Ezequiel Calderara  wrote:
> Hi all guys!
> I'm in a weird situation here.
> We have index a set of documents which are ordered using a linked list (each
> documents has the reference of the previous and the next).
>
> Is there a way when sorting in the solr search, Use the linked list to sort?

It seems like you should be able to encode this linked list as an
integer instead, and sort by that?
If there are multiple linked lists in the index, it seems like you
could even use the high bits of the int to designate which list the
doc belongs to, and the low order bits as the order in that list.
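
Something along these lines (a completely untested sketch - the field name
"listOrder" and the bit split are just assumptions, here allowing up to 2048
lists of fewer than ~1M documents each):

public class ListOrderEncoder {

  /** Pack the list id into the high bits and the position within that list
   *  into the low bits, giving a single int you can index and sort on. */
  public static int encode(int listId, int positionInList) {
    return (listId << 20) | (positionInList & 0xFFFFF);
  }

  public static void main(String[] args) {
    // Index this value in an int field (e.g. "listOrder") on each document,
    // then sort searches with &sort=listOrder asc
    System.out.println(encode(3, 0)); // first doc of list 3
    System.out.println(encode(3, 1)); // its "next" doc sorts immediately after
  }
}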

-Yonik
http://www.lucidimagination.com


Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 3:49 PM, Chris Hostetter
 wrote:
>
> : This is a limitation in the SolrCache API.
> : The key into the cache does not contain rows, so the cache returns the
> : first 10 docs and increments its hit count.  Then the cache user
> : (SolrIndexSearcher) looks at the entry and determines it can't use it.
>
> Wow, I never realized that.
>
> Why don't we just include the start & rows (modulo the window size) in
> the cache key?

The implementation of equals() would be rather difficult... actually
impossible w/o abusing the semantics.
It would also be impossible w/o the Map implementation guaranteeing
what object was on the LHS vs the RHS when equals was called.

Unless I'm missing something obvious?

-Yonik
http://www.lucidimagination.com


Re: documentCache clarification

2010-10-29 Thread Chris Hostetter

: > Why don't we just include the start & rows (modulo the window size) in
: > the cache key?
: 
: The implementation of equals() would be rather difficult... actually
: impossible w/o abusing the semantics.
: It would also be impossible w/o the Map implementation guaranteeing
: what object was on the LHS vs the RHS when equals was called.
: 
: Unless I'm missing something obvious?

You've totally confused me.

What I'm saying is that SolrIndexSearcher should consult the window size 
before consulting the cache -- the start param should be rounded down to 
the nearest multiple of the window size, and start+rows (i.e. end) should 
be rounded up to one less than the nearest multiple of the window size, 
and then that should be looked up in the cache.

equality on the cache key is straight forward...
   this.q==that.q && this.start==that.start && this.end==that.end && 
   this.sort == that.sort && this.filters == that.filters

so if the window size is "50" and SolrIndexSearcher gets a request like 
q=x&start=33&rows=10&sort=y&fq=... it should  
generate a cache key where start=0 and end=49.  (if start=33&rows=42, then 
the key would contain start=0 and end=99 ... which could result in some 
overlap, but that's why people are supposed to pick a window size greater 
than the largest number of rows typically requested)



-Hoss


Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 4:21 PM, Chris Hostetter
 wrote:
>
> : > Why don't we just include the start & rows (modulo the window size) in
> : > the cache key?
> :
> : The implementation of equals() would be rather difficult... actually
> : impossible w/o abusing the semantics.
> : It would also be impossible w/o the Map implementation guaranteeing
> : what object was on the LHS vs the RHS when equals was called.
> :
> : Unless I'm missing something obvious?
>
> You've totally confused me.
>
> What i'm saying is that SolrIndexSearcher should consult the window size
> before consulting the cache -- the start param should be rounded down to
> the nearest multiple of hte window size, and start+rows (ie: end) should
> be rounded up to one less then the nearest multiple of the windows size,
> and then that should be looked up in the cache.

That's already done.
In "example", do
q=*:*&rows=12
q=*:*&rows=16
and you should see a queryResultCache hit since queryResultWindowSize
is 20 and both requests round up to that.

*but* if you do this (with an index with more than 20  docs in it)
q=*:*&rows=25

Currently that query will round up to 40, but since nResults
(start+rows) isn't in the key, it will still get a cache hit but then
not be usable.
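
Back-of-the-envelope version of that rounding (just arithmetic to illustrate
the behavior, not the actual SolrIndexSearcher code):

public class WindowRounding {
  public static void main(String[] args) {
    int windowSize = 20; // queryResultWindowSize in the example config
    for (int rows : new int[] {12, 16, 25}) {
      int superset = ((rows + windowSize - 1) / windowSize) * windowSize;
      System.out.println("rows=" + rows + " -> superset of " + superset + " docs");
    }
    // prints 20, 20 and 40: rows=12 and rows=16 can share a cache entry,
    // while rows=25 needs a 40-doc superset
  }
}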

Now, if your proposal is to put nResults into the key, we then have a
worse problem.
Assume we're starting over with a clean cache.
q=*:*&rows=25   // cached under a key including nResults=40
q=*:*&rows=15  // looked up under a key including nResults=20... not found!

> but that's why people are supposed to pick a window size greater
> than the largest number of rows typically requested)

Hmmm, I don't think so.  If that were the case, there would be no need
for two parameters (no need for queryResultWindowSize) since we would
always just pick queryResultMaxDocsCached.

-Yonik
http://www.lucidimagination.com


SolrCore.getSearcher() and postCommit()

2010-10-29 Thread Grant Ingersoll
Is it OK to call and increment a Searcher ref (i.e. SolrCore.getSearcher()) in 
a SolrEventListener.postCommit() hook as long as I decrement it when I am done? 
 I need to get a handle on an IndexReader so I can dump out a portion of the 
index to an external process.

Thanks,
Grant

Re: How can I disable fsync()?

2010-10-29 Thread Grant Ingersoll

On Oct 29, 2010, at 2:11 PM, Igor Chudov wrote:

> Thanks to all and I made Solr work very well on one newer machine.
> 
> Now I am setting up Solr on an older server with an IDE hard drive.
> 
> Unfortunately, populating the index takes FOREVER due to
> Solr/Lucene/Tomcat calling fsync() a lot after every write.
> 
> I would like to know how to disable fsync.
> 
> I am very aware of the risks of not having fsync() and I DO NOT CARE
> ABOUT THEM AND DO NOT WANT TO BE REMINDED.
> 
> I just want to know how can I disable fsync() when adding to Solr index.

Have a look at FSDirectory.fsync().  That's at least a starting point. YMMV.
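
If you really want to go down that road, something along these lines might do
it (an untested sketch against the Lucene 2.9.x API bundled with Solr 1.4.1;
you would still have to wire it in yourself, e.g. via a custom DirectoryFactory
or a small patch):

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.NIOFSDirectory;

public class NoSyncDirectory extends NIOFSDirectory {

  public NoSyncDirectory(File path) throws IOException {
    super(path);
  }

  @Override
  public void sync(String name) throws IOException {
    // Deliberately a no-op: skip the fsync() that FSDirectory would normally
    // perform. A crash or power loss can leave the index corrupt.
  }
}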




Re: SolrCore.getSearcher() and postCommit()

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 5:36 PM, Grant Ingersoll  wrote:
> Is it OK to call and increment a Searcher ref (i.e. SolrCore.getSearcher()) 
> in a SolrEventListener.postCommit() hook as long as I decrement it when I am 
> done?  I need to get a handle on an IndexReader so I can dump out a portion 
> of the index to an external process.

Yes, just be aware that the searcher you will get will not contain the
recently committed documents.
If you want that, look at the newSearcher hook instead.
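
Roughly like this (a sketch, not tested - the important part is the try/finally
around decref(); how you get hold of the SolrCore depends on how your listener
is wired up):

import org.apache.lucene.index.IndexReader;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public class IndexDumpListener implements SolrEventListener {

  private final SolrCore core;

  public IndexDumpListener(SolrCore core) {
    this.core = core;
  }

  public void init(NamedList args) {}

  public void postCommit() {
    RefCounted<SolrIndexSearcher> ref = core.getSearcher(); // increments the ref count
    try {
      IndexReader reader = ref.get().getReader();
      // ... dump the portion of the index you need to the external process ...
    } finally {
      ref.decref(); // always release, even if the dump fails
    }
  }

  public void newSearcher(SolrIndexSearcher newSearcher,
                          SolrIndexSearcher currentSearcher) {}
}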

-Yonik
http://www.lucidimagination.com


Re: NOT keyword - doesn't work with dismax?

2010-10-29 Thread Scott K
I couldn't even get the bq= to work with negated queries, although
with edismax, negated queries work with just q=-term

Works:
/solr/select?qt=edismax&q=-red

Here is the failed attempt with dismax
/solr/select?qt=dismax&rows=1&indent=true&q=-red&bq=*:*^0.001&echoParams=all&debugQuery=true

{
  "responseHeader":{
"status":0,
"QTime":20,
"params":{
  "mm":"2<-1 5<-2 6<90%",
  "pf":"title^10.0 sbody^2.0",
  "echoParams":"all",
  "tie":"0.01",
  "qf":"title^10.0 sbody^2.0 tags^1.0 text^1.0",
  "q.alt":"*:*",
  "hl.fl":"body",
  "wt":"json",
  "ps":"100",
  "defType":"dismax",
  "bq":"*:*^0.001",
  "echoParams":"all",
  "debugQuery":"true",
  "indent":"true",
  "q":"-red",
  "qt":"dismax",
  "rows":"1"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
"rawquerystring":"-red",
"querystring":"-red",
"parsedquery":"+(-DisjunctionMaxQuery((tags:red | text:red |
title:red^10.0 | sbody:red^2.0)~0.01))
DisjunctionMaxQuery((title:red^10.0 | sbody:red^2.0)~0.01)
MatchAllDocsQuery(*:*^0.0010)",
"parsedquery_toString":"+(-(tags:red | text:red | title:red^10.0 |
sbody:red^2.0)~0.01) (title:red^10.0 | sbody:red^2.0)~0.01
*:*^0.0010",
"explain":{},
"QParser":"DisMaxQParser",
"altquerystring":null,
"boost_queries":["*:*^0.001"],
"parsed_boost_queries":["MatchAllDocsQuery(*:*^0.0010)"],
"boostfuncs":null,
"timing":{
  "time":20.0,
  "prepare":{
"time":19.0,
"org.apache.solr.handler.component.QueryComponent":{
  "time":19.0},
"org.apache.solr.handler.component.FacetComponent":{
  "time":0.0},
"org.apache.solr.handler.component.MoreLikeThisComponent":{
  "time":0.0},
"org.apache.solr.handler.component.HighlightComponent":{
  "time":0.0},
"org.apache.solr.handler.component.StatsComponent":{
  "time":0.0},
"org.apache.solr.handler.component.DebugComponent":{
  "time":0.0}},
  "process":{
"time":1.0,
"org.apache.solr.handler.component.QueryComponent":{
  "time":0.0},
"org.apache.solr.handler.component.FacetComponent":{
  "time":0.0},
"org.apache.solr.handler.component.MoreLikeThisComponent":{
  "time":0.0},
"org.apache.solr.handler.component.HighlightComponent":{
  "time":0.0},
"org.apache.solr.handler.component.StatsComponent":{
  "time":0.0},
"org.apache.solr.handler.component.DebugComponent":{
  "time":1.0}


On Wed, Apr 28, 2010 at 23:35, Chris Hostetter  wrote:
>
> : Ah, dismax doesn't support top-level NOT query.
>
> Hmm, yeah i don' think support for purely negated queries was ever added
> to dismax.
>
> I'm pretty sure that as a workaround you can add
> something like...
>        bq=*:*^0.001
> ...to your query.  based on the dismax structure, that should allow purely
> negative queries to work.
>
>
>
> -Hoss
>
>


Solr + Zookeeper Integration

2010-10-29 Thread Claudio Devecchi
Hi people,

I'm trying to configure a little solr cluster but I need to shard the
documents.

I configured my solr with core0 (/opt/solr/core0) and installed
zookeeper (/opt/zookeeper).

1. In my solrconfig.xml I added the lines below:


host1:2181
http://host1:8983/solr/core0
5000
/solr_domain/nodes
 


2. In my /opt/zookeeper/conf/zoo.cfg I configured it this way:

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181

And start it with zkServer.sh


After starting the zookeeper, my dir "/solr_domain/nodes" remains empty.
Following the documentation, I didn't find any extra step to take, but
nothing is working.

Could somebody tell me what is missing or wrong, please?


Thanks


Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields

2010-10-29 Thread Ron Mayer
I have some documents with a bunch of attachments (images, thumbnails
for them, audio clips, word docs, etc); I am currently dealing with
them by just storing a filesystem path to them in solr, and then
jumping through hoops to keep them in sync with solr.

Would it be nuts to stick the image data itself in solr?

More specifically - if I have a bunch of large stored fields,
would it significantly impact search performance in the
cases when those fields aren't fetched?

Searches are very common in this system, and it's very rare
that someone actually opens up one of these attachments
so I'm not really worried about the time it takes to fetch
them when someone does actually want one.



Re: replication not working between 1.4.1 and 3.1-dev

2010-10-29 Thread Shawn Heisey

On 10/27/2010 8:34 PM, Shawn Heisey wrote:
I started to upgrade my slave servers from 1.4.1 to 3.1-dev checked 
out this morning.  Because of SOLR-2034 (new javabin version) the 
replication fails.


Asking about it in comments on SOLR-2034 brought up the suggestion of 
switching to XML instead of javabin, but so far I have not been able 
to figure out how to do this.  I filed a new Jira (SOLR-2204) on the 
replication failure.


Is there any way (through either a config change or minor code 
changes) to make the replication handler use XML?  If I have to make 
small edits to the 1.4.1 source as well as 3.1, that would be OK.


Talking to yourself is probably a sign of mental instability, but I'm 
doing it anyway.  There's been deafening silence from everyone else!


The recommended method of safely upgrading Solr that I've read about is 
to upgrade slave servers, keeping your production application pointed 
either at another set of slave servers or your master servers.  Then you 
test it with a dev copy of your application, and once you're sure it's 
working, you can switch production traffic over to the upgraded set.  If 
it falls over, you just switch back to the old version.  Once you're 
sure it's TRULY working, you upgrade everything else.  To convert fully 
to the new index format, you have the option of reindexing or optimizing 
your existing indexes.


I like this method, and this is the way I want to do it, except that the 
new javabin format makes it impossible.  I need a viable way to 
replicate indexes from a set of 1.4.1 master servers to 3.1-dev slaves.  
Delving into the source and tackling the problem myself is something I 
would truly love to do, but I lack the necessary skills.


I believe this will be a showstopper problem if 3.1 is released in its 
current state.


Are there any clever workarounds that would let me proceed with my 
upgrade now?


Thanks,
Shawn



Re: Looking for Developers

2010-10-29 Thread Dennis Gearon
LOL!

We ARE programmers, and we do like absolutes :-)
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 10/29/10, Lance Norskog  wrote:

> From: Lance Norskog 
> Subject: Re: Looking for Developers
> To: solr-user@lucene.apache.org, t...@statsbiblioteket.dk
> Date: Friday, October 29, 2010, 3:14 AM
> Then, Godwin!
> 
> On Fri, Oct 29, 2010 at 3:04 AM, Toke Eskildsen 
> wrote:
> > On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote:
> >> For me, I simply deleted the original email, but
> I'm now quite
> >> enjoying the irony of the complaints causing more
> noise on the list
> >> than the original email!  ;-)
> >
> > He he. An old classic. Next in line is the
> meta-meta-discussion about
> > whether meta-discussions belong on the list or if they
> should be moved
> > to solr-user-meta. Repeat ad nauseam.
> >
> > Job-postings are on-topic IMHO and unless their volume
> grows
> > significantly, I see no reason to create a new mailing
> > list.
> >
> >
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
>


Re: Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields

2010-10-29 Thread Shashi Kant
On Fri, Oct 29, 2010 at 6:00 PM, Ron Mayer  wrote:

> I have some documents with a bunch of attachments (images, thumbnails
> for them, audio clips, word docs, etc); and am currently dealing with
> them by just putting a path on a filesystem to them in solr; and then
> jumping through hoops of keeping them in sync with solr.
>
>

Not sure why that is an issue. Keeping them in sync with solr would be the
same as storing them within a file-system. Why would storing them within solr
be any different?


> Would it be nuts to stick the image data itself in solr?
>
> More specifically - if I have a bunch of large stored fields,
> would it significantly impact search performance in the
> cases when those fields aren't fetched.
>
>
Hard to say. I assume you mean storing them by converting to a base64 format. If
you do not retrieve the field when fetching, AFAIK it should not affect search
performance significantly, if at all.
So if you manage your retrieval carefully, you should be fine.
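
Something like this, for instance (a SolrJ sketch; "doc_id" and "attachment_data"
are made-up field names - attachment_data would be declared indexed="false"
stored="true" in your schema, and you would leave it out of fl= on normal
searches; assumes commons-codec is on the classpath):

import java.io.File;
import java.io.FileInputStream;

import org.apache.commons.codec.binary.Base64;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AttachmentIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // read the attachment into memory
    File f = new File(args[0]);
    byte[] bytes = new byte[(int) f.length()];
    FileInputStream in = new FileInputStream(f);
    try {
      int off = 0;
      while (off < bytes.length) {
        int n = in.read(bytes, off, bytes.length - off);
        if (n < 0) break;
        off += n;
      }
    } finally {
      in.close();
    }

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("doc_id", args[1]);
    // store the attachment body as base64 text in a stored-only field
    doc.addField("attachment_data", new String(Base64.encodeBase64(bytes), "US-ASCII"));

    solr.add(doc);
    solr.commit();
  }
}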


> Searches are very common in this system, and it's very rare
> that someone actually opens up one of these attachments
> so I'm not really worried about the time it takes to fetch
> them when someone does actually want one.
>
>