Re: Custom sort (score + custom value)

2008-11-04 Thread George
Todd: Yes, I looked into these arguments before I found the problem I
described in the first email.

Yonik: It's exactly what I was looking for.

George

On Mon, Nov 3, 2008 at 7:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Mon, Nov 3, 2008 at 12:37 PM, George <[EMAIL PROTECTED]> wrote:
> > Ok Yonik, thank you.
> >
> > I've tried to execute the following query: "{!boost b=log(myrank)
> > defType=dismax}q" and it works great.
> >
> > Do you know if I can do the same (combine a DisjunctionMaxQuery with a
> > BoostedQuery) in solrconfig.xml?
>
> Do you mean set it as a default for a handler in solrconfig.xml?  That
> should work.
> You could set a default of q={!boost b=log(myrank) defType=dismax v=$uq};
> then all the client would have to pass in is uq (the user query).
>
> -Yonik
>
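
To illustrate Yonik's suggestion, a handler default along these lines could go in solrconfig.xml (a sketch only; the handler name and the myrank field are placeholders taken from the thread, not tested against a specific release):

```xml
<requestHandler name="/boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- wrap the user query ($uq) in a boost query, parsed by dismax -->
    <str name="q">{!boost b=log(myrank) defType=dismax v=$uq}</str>
  </lst>
</requestHandler>
```

Clients would then pass only uq=their+query and get the boosted dismax behaviour by default.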


moreLikeThis scores and ranking

2008-11-04 Thread Scurtu Vitalie
Hi all,
I am having some issues with Solr's moreLikeThis function.
First, I have no idea how to view scores. What should I do to make
Solr show me the tf-idf scores of each document retrieved by the
moreLikeThis function?

Another issue with mlt: how can I
weight terms inside documents? For example, I have a field with terms,
and I want to give some terms a high weight, since I know they are
important for, and representative of, the document.

When I set a boost value instead, it behaves very strangely.

Here is more detail, from the XML data files.

Query:
http://localhost/solr/blog1/select/?q=Id:1917&mlt=true&mlt.fl=tagsb2&fl=Id,tagsb2&mlt.count=100&mlt.match.include=true
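
One small thing to try for the score question (a sketch only; whether MLT responses honour these parameters varies across 1.x releases) is requesting the score pseudo-field in fl:

```
http://localhost/solr/blog1/select/?q=Id:1917&mlt=true&mlt.fl=tagsb2&fl=Id,tagsb2,score&mlt.count=100
```

Adding debugQuery=on to the request will additionally print the full scoring explanation for the main result list.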

Internal representation of document

Target document
---
Id=1917


One of the documents retrieved by moreLikeThis, at position 6

Id=1075
---



Now, I change the boost value for the document with Id=1075, which is in 6th
position. The new value in the document is:


After querying Solr, the document with Id=1075 is in the same position: still
6th.
I do the same for the target document, changing its boost value to 1002.5.

After
querying Solr, the similar document is in 8th position, no longer in
6th. I expected that after raising the boost values of both
documents to very high values, the ranking would change radically,
with the document with Id=1075 on top.

What did I do wrong?
Thank you



RE: SOLR Performance

2008-11-04 Thread Feak, Todd
Most desktops nowadays have at least a dual-core CPU and 1GB of RAM, so you
may be able to get a semi-realistic feel for performance on a local desktop.
If you have access to something meaty in a desktop, you may not have to
spend a dime to find out what it's going to take in a server.

-T

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 03, 2008 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Performance

If you never execute any queries, a gig should be more than enough.

Of course, I've never played around with a .8 billion doc corpus on  
one machine.

-Mike

On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:

> in terms of RAM -- how to size that on the indexer?
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>
> On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:
>
>> The indexing box can be much smaller, especially in terms of CPU.
>> It just needs one fast thread and enough disk.
>>
>> wunder
>>
>> On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:
>>
>>> I was afraid of that.  Was hoping not to need another big fat box  
>>> like
>>> this one...
>>>
>>> ---
>>> Alok K. Dhir
>>> Symplicity Corporation
>>> www.symplicity.com
>>> (703) 351-0200 x 8080
>>> [EMAIL PROTECTED]
>>>
>>> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
>>>
 I believe this is one of the reasons that a master/slave  
 configuration
 comes in handy. Commits to the Master don't slow down queries on  
 the
 Slave.

 -Todd

 -Original Message-
 From: Alok Dhir [mailto:[EMAIL PROTECTED]
 Sent: Monday, November 03, 2008 1:47 PM
 To: solr-user@lucene.apache.org
 Subject: SOLR Performance

 We've moved past this issue by reducing date precision -- thanks to
 all for the help.  Now we're at another problem.

 There is relatively constant updating of the index -- new log  
 entries
 are pumped in from several applications continuously.  Obviously,  
 new
 entries do not appear in searches until after a commit occurs.

 The problem is, issuing a commit causes searches to come to a
 screeching halt for up to 2 minutes.  We're up to around 80M docs.
 Index size is 27G.  The number of docs will soon be 800M, which
 doesn't bode well for these "pauses" in search performance.

 I'd appreciate any suggestions.

 ---
 Alok K. Dhir
 Symplicity Corporation
 www.symplicity.com
 (703) 351-0200 x 8080
 [EMAIL PROTECTED]

 On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:

> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core  
> machine.
>
> Fairly simple schema -- no large text fields, standard request
> handler.  4 small facet fields.
>
> The index is an event log -- a primary search/retrieval  
> requirement
> is date range queries.
>
> A simple query without a date range subquery is ridiculously  
> fast -
> 2ms.  The same query with a date range takes up to 30s (30,000ms).
>
> Concrete example, this query just took 18s:
>
> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z
 TO
> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>
> The exact same query without the date range took 2ms.
>
> I saw a thread from Apr 2008 which explains the problem being  
> due to
> too much precision on the DateField type, and the range expansion
> leading to far too many elements being checked.  Proposed solution
> appears to be a hack where you index date fields as strings and
> hacking together date functions to generate proper queries/format
> results.
>
> Does this remain the recommended solution to this issue?
>
> Thanks
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>


>>>
>>
>
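
For reference, the "reducing date precision" fix the thread converged on amounts to truncating dt values at index time (for example to midnight), so that a range query only has to enumerate one indexed term per day rather than one per distinct timestamp. The query then uses matching coarse endpoints (a sketch; field name from the thread, truncation granularity is an assumption):

```
dt:[2008-10-01T00:00:00Z TO 2008-10-30T00:00:00Z]
```

With day-truncated indexed values, a one-month range expands to roughly 30 terms instead of millions, which is what turns the 30s query back into a fast one.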




Re: combining negative queries and OR

2008-11-04 Thread Yonik Seeley
On Tue, Nov 4, 2008 at 1:21 AM, Joe Pollard <[EMAIL PROTECTED]> wrote:
> I am trying to decide if this is a solr or a lucene problem, using solr
> 1.3:
>
> take this example --
>
> (-productName:"whatever") OR (anotherField:"Johnny")
>
> I would expect to get back records that have anotherField=Johnny, but
> also, any records that don't have 'whatever' as the productName.

Lucene can't really do pure negative queries... Solr currently only
allows negative queries at the top level... so the sub-query
(-productName:"whatever") matches nothing.  Try

(*:* -productName:"whatever") OR (anotherField:"Johnny")

-Yonik


Re: Question about score...

2008-11-04 Thread Yonik Seeley
On Mon, Nov 3, 2008 at 6:52 PM, Craig Stadler <[EMAIL PROTECTED]> wrote:
> BluesBrothers01.mp3
> Breaux_Brothers_Tiger_Rag_Blues.mp3
> Blues Brothers - Theme From Rawhide V1.mp3
>
> Why in the world is result 2 higher in score than #3 ???
> Is there something we can set in our schema or solr config to change this?
> Ideally we want all the Blues Brothers to appear with higher score because of 
> word order and proximity to the beginning of the string, etc.

Boolean queries have no implicit proximity... you either need to add
it yourself or use a query parser like dismax that can add it for you.
If you want to add it yourself, use something like "blues brothers"~100.

-Yonik
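
As a sketch of the dismax route Yonik mentions, the defaults would live in solrconfig.xml; the handler name and field names below are placeholders, not from the thread:

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- query fields searched for individual terms -->
    <str name="qf">filename</str>
    <!-- phrase fields: boost documents matching the whole query as a phrase -->
    <str name="pf">filename^10</str>
    <!-- phrase slop, analogous to "blues brothers"~100 -->
    <str name="ps">100</str>
  </lst>
</requestHandler>
```

With pf/ps set, "Blues Brothers - Theme From Rawhide" would pick up the phrase boost that the boolean query alone never applies.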


How to use multicore feature in JBOSS

2008-11-04 Thread con

Hi all

I have deployed the sample instance that comes with Solr in JBOSS.
But I actually want to use the multicore feature, and I am not able to find
resources on how to use the multicore feature in JBOSS.
1) Which files do I need to edit to use the multicore feature?
2) Also, where can I specify the index directory, so that we can point the
indexed documents to a custom folder instead of jboss/bin?

I'd really appreciate any suggestion, at least which file to edit, or other
tips to solve this problem.
Thanking in advance,
con
-- 
View this message in context: 
http://www.nabble.com/How-to-use-multicore-feature-in-JBOSS-tp20327580p20327580.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: MySql / Solr 1.3 / Tomcat55 - Full Import for 8,5M of data >> Exception in thread "Thread-33"

2008-11-04 Thread sunnyfr

I can't do that, we don't have another server to manage that.


Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> From the data-config.xml it is obvious that your indexing will
> take a lot of time. MySql has very poor join performance. It is not a
> very good idea to run this on a production database.
> 
> I would suggest configuring another mysql server, setting up mysql
> replication to it, and running the import from there.
> 
> 
> 
> On Mon, Nov 3, 2008 at 10:06 PM, sunnyfr <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I've set the batchsize parameter to -1 and it works fine; the point is that
>> I will monopolize the MySQL database for 10 hours.
>> Other requests on it, like updates, and other processes will be stuck. And
>> if
>> I don't use batchsize -1 I get an OOM error like the one below. I tried
>> setting
>> batchsize to 1000 or 1 but neither works.
>>
>> What would you reckon?
>> I can't have another server just for that; I have to use the MySQL
>> production
>> database.
>>
>>> Should I try to generate CSV files from MySQL and then do a full
>>> import
>>> from them, and delta imports from the database?
>>
>>> Should I modify the connection code in Solr, to try to manage this
>>> memory?
>>
>>> Do you have another idea?
>>
>> If anyone has any suggestions or needs more info, I'd be very grateful.
>>
>>
>>
>> FYI : Linux server / 8G Mem
>>
>> TomCat55 :
>> JAVA_OPTS="-Xms2000m -Xmx4000m -XX:+HeapDumpOnOutOfMemoryError
>> -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -Xloggc:gc.log
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>
>> data-config.xml:
>> http://www.nabble.com/file/p20305986/data-config.xml data-config.xml
>>
>>
>> My Error:
>> Nov  3 16:45:43 solr-test jsvc.exec[29099]: [PSYoungGen:
>> 227583K->0K(455104K)] [PSOldGen: 3413375K->142574K(1706688K)]
>> 3640959K->142574K(2161792K)
>> Nov  3 16:45:43 solr-test jsvc.exec[29099]: [PSPermGen:
>> 20751K->20751K(21504K)], 1.0853010 secs] [Times: user=0.99 sys=0.10,
>> real=1.09 secs]
>> Nov  3 16:45:43 solr-test jsvc.exec[29099]: Nov 3, 2008 4:45:43 PM
>> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor
>> processChildren SEVERE: Exception invoking periodic operation:
>> java.lang.OutOfMemoryError: Java heap space
>>   at sun.nio.cs.US_ASCII.newDecoder(US_ASCII.java:39)
>>   at java.nio.charset.CharsetEncoder.isLegalReplacement(CharsetEncoder.java:311)
>>   at java.nio.charset.CharsetEncoder.replaceWith(CharsetEncoder.java:267)
>>   at java.nio.charset.CharsetEncoder.<init>(CharsetEncoder.java:186)
>>   at java.nio.charset.CharsetEncoder.<init>(CharsetEncoder.java:209)
>>   at sun.nio.cs.US_ASCII$Encoder.<init>(US_ASCII.java:121)
>>   at sun.nio.cs.US_ASCII$Encoder.<init>(US_ASCII.java:118)
>>   at sun.nio.cs.US_ASCII.newEncoder(US_ASCII.java:43)
>>   at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:215)
>>   at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:207)
>>   at java.lang.StringCoding.encode(StringCoding.java:266)
>>   at java.lang.String.getBytes(String.java:947)
>>   at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
>>   at java.io.File.isDirectory(File.java:754)
>>   at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:873)
>>   at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:492)
>>   at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206)
>>   at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
>>   at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
>>   at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1306)
>>   at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1570)
>>   at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1579)
>>   at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1559)
>>   at java.lang.Thread.run(Thread.java:619)
>>
>> --
>> View this message in context:
>> http://www.nabble.com/MySql---Solr-1.3---Tomcat55-Full-Import-for-8%2C5M-of-data-%3E%3E-Exception-in-thread-%22Thread-33%22-tp20305986p20305986.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/MySql---Solr-1.3---Tomcat55-Full-Import-for-8%2C5M-of-data-%3E%3E-Exception-in-thread-%22Thread-33%22-tp20305986p20319273.html
Sent from the Solr - User mailing list archive at Nabble.com.
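
For reference, the batchSize being discussed is set on the DataImportHandler data source in data-config.xml; with the MySQL driver, -1 makes the driver stream rows instead of buffering the whole result set in memory (connection details below are placeholders):

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/db"
            user="user" password="pass"
            batchSize="-1"/>
```

Internally a batchSize of -1 corresponds to setting the JDBC fetch size to Integer.MIN_VALUE, which is MySQL's documented signal for row-by-row streaming.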



Solr Autowarming

2008-11-04 Thread Manepalli, Kalyan
Hi all,

I am working on a smartfill solution using Solr. To
increase its speed, I want to warm the cache at startup using a large
number of queries.

Is it possible to use a custom class to fire these queries, instead of
listing the queries in solrConfig?

Any suggestions will be helpful.

Thanks,

Kalyan Manepalli



How to index special char "§"?

2008-11-04 Thread felizimm

Hi,

for a German law-search engine, I need to index the char "§". Do I
have to change the filter factory? If yes, where?

Thanks a lot,
Felix.
-- 
View this message in context: 
http://www.nabble.com/How-to-index-special-char-%22%C2%A7%22--tp20332277p20332277.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR Performance

2008-11-04 Thread Walter Underwood
Funny, that is exactly what Infoseek did back in 1996. A big index that
changed rarely and a small index with real-time changes. Once each week,
merge to make a new big index and start over with the small one.

You also need to handle deletes specially.

wunder

On 11/3/08 6:44 PM, "Lance Norskog" <[EMAIL PROTECTED]> wrote:

> The logistics of handling giant index files hit us before search
> performance. We switched to a set of indexes running inside one server
> (tomcat) instance with the Multicore+Distributed Search tools, with a frozen
> old index and a new index actively taking updates. The smaller new index
> takes much less time to recover after a commit.
> 
> The DS code does not handle cases where the new and old index have different
> versions of the same document. We wrote a custom distributed search that
> favored the "new" index over the "old".
> 
> Lance
> 
> -Original Message-
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 03, 2008 4:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Performance
> 
> If you never execute any queries, a gig should be more than enough.
> 
> Of course, I've never played around with a .8 billion doc corpus on one
> machine.
> 
> -Mike
> 
> On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:
> 
>> in terms of RAM -- how to size that on the indexer?
>> 
>> ---
>> Alok K. Dhir
>> Symplicity Corporation
>> www.symplicity.com
>> (703) 351-0200 x 8080
>> [EMAIL PROTECTED]
>> 
>> On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:
>> 
>>> The indexing box can be much smaller, especially in terms of CPU.
>>> It just needs one fast thread and enough disk.
>>> 
>>> wunder
>>> 
>>> On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:
>>> 
 I was afraid of that.  Was hoping not to need another big fat box
 like this one...
 
 ---
 Alok K. Dhir
 Symplicity Corporation
 www.symplicity.com
 (703) 351-0200 x 8080
 [EMAIL PROTECTED]
 
 On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
 
> I believe this is one of the reasons that a master/slave
> configuration comes in handy. Commits to the Master don't slow down
> queries on the Slave.
> 
> -Todd
> 
> -Original Message-
> From: Alok Dhir [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 03, 2008 1:47 PM
> To: solr-user@lucene.apache.org
> Subject: SOLR Performance
> 
> We've moved past this issue by reducing date precision -- thanks to
> all for the help.  Now we're at another problem.
> 
> There is relatively constant updating of the index -- new log
> entries are pumped in from several applications continuously.
> Obviously, new entries do not appear in searches until after a
> commit occurs.
> 
> The problem is, issuing a commit causes searches to come to a
> screeching halt for up to 2 minutes.  We're up to around 80M docs.
> Index size is 27G.  The number of docs will soon be 800M, which
> doesn't bode well for these "pauses" in search performance.
> 
> I'd appreciate any suggestions.
> 
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
> 
> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
> 
>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core
>> machine.
>> 
>> Fairly simple schema -- no large text fields, standard request
>> handler.  4 small facet fields.
>> 
>> The index is an event log -- a primary search/retrieval
>> requirement is date range queries.
>> 
>> A simple query without a date range subquery is ridiculously fast
>> - 2ms.  The same query with a date range takes up to 30s
>> (30,000ms).
>> 
>> Concrete example, this query just took 18s:
>> 
>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z
> TO
>> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>> 
>> The exact same query without the date range took 2ms.
>> 
>> I saw a thread from Apr 2008 which explains the problem being due
>> to too much precision on the DateField type, and the range
>> expansion leading to far too many elements being checked.
>> Proposed solution appears to be a hack where you index date fields
>> as strings and hacking together date functions to generate proper
>> queries/format results.
>> 
>> Does this remain the recommended solution to this issue?
>> 
>> Thanks
>> 
>> ---
>> Alok K. Dhir
>> Symplicity Corporation
>> www.symplicity.com
>> (703) 351-0200 x 8080
>> [EMAIL PROTECTED]
>> 
> 
> 
 
>>> 
>> 
> 
> 



Re: How to index special char "§"?

2008-11-04 Thread Ryan McKinley

have you tried yet?

solr supports UTF-8... so I don't see why there would be a problem...

you should even be able to put a synonym mapping § => section (or the  
other way around)
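
In synonyms.txt, the mapping Ryan suggests would look like the following (the target token "section" is only an example; a German index might prefer "paragraph"):

```
§ => section
```

The file is referenced from the SynonymFilterFactory entry in the field type's analyzer chain in schema.xml.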


Check the utf8-example.xml to see some examples of working with utf8  
chars.


ryan


On Nov 4, 2008, at 5:06 PM, felizimm wrote:



Hi,

for a German law-search engine, I need to index the char "§". Do I

have to change the filter factory? If yes, where?

Thanks a lot,
Felix.
--
View this message in context: 
http://www.nabble.com/How-to-index-special-char-%22%C2%A7%22--tp20332277p20332277.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: How to use multicore feature in JBOSS

2008-11-04 Thread Norberto Meijome
On Tue, 4 Nov 2008 09:55:38 -0800 (PST)
con <[EMAIL PROTECTED]> wrote:

> 1) Which files do I need to edit to use the multicore feature?
> 2) Also, where can I specify the index directory so that we can point the
> indexed documents to a custom folder instead of jboss/bin?

Con, please check the wiki - the answers should be there 

(
 1) = solr.xml ( previously multicore.xml)
2) look in solrconfig.xml for each core
)
_
{Beto|Norberto|Numard} Meijome

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Choosing Which Branch To Use

2008-11-04 Thread Chris Harris
My current pre-production Solr install is a 1.3 pre-release build, and
I think I'm going to update to a more recent version before an
upcoming product release. Actually, "release" is probably a bit of an
exaggeration; it's more of an alpha test, or perhaps a beta test.
Anyway, the question is which more recent version of Solr I should be
running. I'm not under pressure from on high to stick with an official
Solr release, so all these seem like legit possibilities for me:

Run the 1.3.0 release
Run a more recent build from the 1.3 branch
Run a nightly build of the trunk

Obviously, I would attempt to do sufficient testing before putting
Solr live regardless of which route I chose.

One factor is that I need to run a slightly modified Solr, as opposed
to a 100% out-of-the-box install. Currently I'm using these patches:

https://issues.apache.org/jira/browse/SOLR-538 (copyField maxLength property)
https://issues.apache.org/jira/browse/SOLR-284 (parsing rich document types)
https://issues.apache.org/jira/browse/SOLR-744 /
https://issues.apache.org/jira/browse/LUCENE-1370 (Patch to make
output a unigram if no ngrams can be generated)

I also may need to have a custom query parser plugin.

Any ideas?

Cheers,
Chris


Throughput Optimization

2008-11-04 Thread wojtekpia

I've been running load tests over the past week or two, and I can't figure out
which bottleneck in my system prevents me from increasing throughput. First
I'll describe my Solr setup, then what I've tried to optimize the system.

I have 10 million records and 59 fields (all are indexed, 37 are stored, 17
have termVectors, 33 are multi-valued) which takes about 15GB of disk space.
Most field values are very short (single word or number), and usually about
half the fields have any data at all. I'm running on an 8-core, 64-bit, 32GB
RAM Redhat box. I allocate about 24GB of memory to the java process, and my
filterCache size is 700,000. I'm using a version of Solr between 1.3 and the
current trunk (including the latest SOLR-667 (FastLRUCache) patch), and
Tomcat 6.0.

I'm running a ramp-test, increasing the number of users every few minutes. I
measure the maximum number of requests that Solr can handle per second with
a fixed response time, and call that my throughput. I'd like to see a single
physical resource maxed out at some point during my test so I know it is
my bottleneck. I generated random queries for my dataset representing a
more or less realistic scenario. The queries include faceting on up to 6
fields, and querying on up to 8 fields.

I ran a baseline on the un-optimized setup, and saw peak CPU usage of about
50%, IO usage around 5%, and negligible network traffic. Interestingly, the
CPU peaked when I had 8 concurrent users, and actually dropped down to about
40% when I increased the users beyond 8. Is that because I have 8 cores?

I changed a few settings and observed the effect on throughput:

1. Increased filterCache size, and throughput increased by about 50%, but it
seems to peak.
2. Put the entire index on a RAM disk, and significantly reduced the average
response time, but my throughput didn't change (i.e. even though my response
time was 10X faster, the maximum number of requests I could make per second
didn't increase). This makes no sense to me, unless there is another
bottleneck somewhere.
3. Reduced the number of records in my index. The throughput increased, but
the shape of all my graphs stayed the same, and my CPU usage was identical.

I have a few questions:
1. Can I get more than 50% CPU utilization?
2. Why does CPU utilization fall when I make more than 8 concurrent
requests?
3. Is there an obvious bottleneck that I'm missing?
4. Does Tomcat have any settings that affect Solr performance?

Any input is greatly appreciated. 

-- 
View this message in context: 
http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html
Sent from the Solr - User mailing list archive at Nabble.com.
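
On question 4: the Tomcat setting most often tuned for Solr throughput is the HTTP connector's thread pool in conf/server.xml. The values below are illustrative only, not recommendations for this workload:

```xml
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           acceptCount="100"
           connectionTimeout="20000"
           enableLookups="false"/>
```

If maxThreads is at its default and concurrent request counts exceed it, extra requests queue in the connector, which can cap observed throughput even while CPU sits idle.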



Re: Choosing Which Branch To Use

2008-11-04 Thread Erik Hatcher
Personally, I'd go with a nightly build for your situation - just  
makes it easier (for me at least) to support and fix if there are any  
issues, and you'll benefit from great new features as well (stats  
component, java replication, etc).


The one drawback is if any of those patches don't keep with trunk and  
become a hassle to apply.  The SOLR-284 patch is not likely (-0.5 from  
me as-is) to get committed anywhere near where it is now, so consider  
it a risky one to rely on too strongly.


As for the custom query parser plugin... you can drop that in within a  
JAR in /lib so no need to patch Solr locally for that.


Erik

On Nov 4, 2008, at 7:20 PM, Chris Harris wrote:


My current pre-production Solr install is a 1.3 pre-release build, and
I think I'm going to update to a more recent version before an
upcoming product release. Actually, "release" is probably a bit of an
exaggeration; it's more of an alpha test, or perhaps a beta test.
Anyway, the question is which more recent version of Solr I should be
running. I'm not under pressure from on high to stick with an official
Solr release, so all these seem like legit possibilities for me:

Run the 1.3.0 release
Run a more recent build from the 1.3 branch
Run a nightly build of the trunk

Obviously, I would attempt to do sufficient testing before putting
Solr live regardless of which route I chose.

One factor is that I need to run a slightly modified Solr, as opposed
to a 100% out-of-the-box install. Currently I'm using these patches:

https://issues.apache.org/jira/browse/SOLR-538 (copyField maxLength  
property)
https://issues.apache.org/jira/browse/SOLR-284 (parsing rich  
document types)

https://issues.apache.org/jira/browse/SOLR-744 /
https://issues.apache.org/jira/browse/LUCENE-1370 (Patch to make
output a unigram if no ngrams can be generated)

I also may need to have a custom query parser plugin.

Any ideas?

Cheers,
Chris




Re: Solr Autowarming

2008-11-04 Thread Shalin Shekhar Mangar
Yes, you can extend QuerySenderListener to do this.

Also see https://issues.apache.org/jira/browse/SOLR-784

On Wed, Nov 5, 2008 at 3:22 AM, Manepalli, Kalyan <
[EMAIL PROTECTED]> wrote:

> Hi all,
>
>I am working on smartfill solution using Solr. For
> increasing the speed, I want to warm the cache at startup, using large
> number of queries.
>
> Is it possible to use a custom class to fire these queries instead of
> listing the queries in solrConfig
>
>
>
> Any suggestions will be helpful
>
>
>
> Thanks,
>
> Kalyan Manepalli
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.
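
For comparison, the configuration-file approach being replaced looks like this in solrconfig.xml; a subclass of QuerySenderListener would simply be named in the class attribute instead (the queries shown are placeholders):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- fire representative queries to warm caches at startup -->
    <lst><str name="q">popular terms</str><str name="rows">10</str></lst>
    <lst><str name="q">another warming query</str></lst>
  </arr>
</listener>
```

A corresponding newSearcher listener warms caches after each commit rather than only at startup.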


Re: Query on distributed search ...

2008-11-04 Thread Shalin Shekhar Mangar
Yes, StatsComponent can be used in a distributed Solr environment.

StatsComponent is in the 1.4 nightly builds (not yet released).
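
A StatsComponent request looks like the following (field names here are hypothetical, standing in for the processing-time and request-type fields described below):

```
.../select?q=*:*&stats=true&stats.field=processing_time&stats.facet=request_type
```

The response then includes min, max, sum, count, mean, and stddev for processing_time, broken down per request_type value.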

On Tue, Nov 4, 2008 at 2:40 AM, souravm <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm new to Solr. Here is a query on distributed search.
>
> I have a huge volume of log files which I would like to search. Apart from
> generic text search, I would also like to get statistics - say each record
> has a field giving the request processing time, and I would like to get the
> average processing time for a given type of request. For this I'm planning
> to use StatsComponent.
>
> Since the log file volume is huge, I plan to distribute it across multiple
> physical boxes and use distributed search. However, I'm not sure
> whether StatsComponent can be used in a distributed search scenario.
>
> Any pointer on this query would be really helpful.
>
> Regards,
> Sourav
>
>



-- 
Regards,
Shalin Shekhar Mangar.


Need to write a start.jar file

2008-11-04 Thread Muhammed Sameer
Salaam,

I read somewhere that it is better to write a new start.jar file than to use
the one provided in the example directory. Can someone please point me to
some documentation that will help me achieve this and write my own
start.jar file?

Regards,
Muhammed Sameer


  


Re: How to use multicore feature in JBOSS

2008-11-04 Thread con



Thanks, Norberto, for your reply.
It was my mistake to forget the basics. :-(
But on the first question I am still not clear.
I think that to use the multicore feature we have to inform the server. With
the Jetty server, we start it using:   java
-Dsolr.solr.home=multicore -jar start.jar
Once the server is started, I think it takes its parameters from
multicore/solr.xml.

But I am confused about how and where to pass this argument to JBOSS.

Looking forward to a positive reply.
Thanking in advance,
coN



Norberto Meijome-6 wrote:
> 
> On Tue, 4 Nov 2008 09:55:38 -0800 (PST)
> con <[EMAIL PROTECTED]> wrote:
> 
>> 1) Which files do I need to edit to use the multicore feature?
>> 2) Also, where can I specify the index directory so that we can point the
>> indexed documents to a custom folder instead of jboss/bin?
> 
> Con, please check the wiki - the answers should be there 
> 
> (
>  1) = solr.xml ( previously multicore.xml)
> 2) look in solrconfig.xml for each core
> )
> _
> {Beto|Norberto|Numard} Meijome
> 
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
> 
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet. Reading disclaimers makes you go blind. Writing them is worse. You
> have been Warned.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-use-multicore-feature-in-JBOSS-tp20327580p20337321.html
Sent from the Solr - User mailing list archive at Nabble.com.
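
For JBoss specifically, there are two common ways to supply the solr home that Jetty's -D flag provides; both are sketched below with placeholder paths. One is a JNDI env-entry added to WEB-INF/web.xml inside the deployed solr.war:

```xml
<!-- WEB-INF/web.xml inside solr.war; the path is a placeholder -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-type>java.lang.String</env-entry-type>
  <env-entry-value>/opt/solr/multicore</env-entry-value>
</env-entry>
```

The other is appending -Dsolr.solr.home=/opt/solr/multicore to JAVA_OPTS in JBoss's bin/run.conf, which mirrors the Jetty command line; with a solr.xml present in that directory, Solr starts in multicore mode.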