Re: DIH - rdbms to index confusion

2010-12-07 Thread Stefan Matheis
Hi,

Have a look at the MySQL query log - it will tell you which queries are
executed by the Solr DIH. That way you'll see which variables are empty or not
set as expected, and therefore maybe missing from the result.
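If the general query log isn't already on, it can usually be enabled at runtime, e.g. (a sketch for MySQL 5.1+; the log file path is just an example and the SUPER privilege is required):

```sql
-- Turn on the general query log so every statement DIH sends is recorded
SET GLOBAL general_log_file = '/var/log/mysql/mysql-general.log';
SET GLOBAL general_log = 'ON';
```

Remember to turn it off afterwards, since it logs every statement and grows quickly.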

Otherwise (for the rest of the list) it would be easier to help you if
you used real queries and not pseudo-queries :)

Regards
Stefan


Re: Solr Newbie - need a point in the right direction

2010-12-07 Thread Gora Mohanty
On Tue, Dec 7, 2010 at 9:12 AM, Mark  wrote:
[...]
> What I'm trying to do is extract some (presumably) structured information
> from non-uniform data (eg, prices from a nutch crawl) that needs to show in
> search queries, and I've come up against a wall.
>
> I've been unable to figure out where is the best place to begin.
>
> I had a look through the solr wiki and did a search via Lucid's search tool
> and I'm guessing this is handled at index time through my schema? But I've
> also seen dismax being thrown around as a possible solution and this has
> confused me.
>
> Basically, if you guys could point me in the right direction for resources
> (even as much as saying, you need X, it's over there) that would be a huge
> help.
[...]

Sorry, the above is a little unclear, at least to me. The basic steps in running
Solr are:
* Installing, configuring, and getting Solr running
* Indexing data, as well as updating and deleting: The best way to do this
  depends on where your data are coming from. Since you mention Nutch,
  that already integrates with Solr, although by default in a manner that
  dumps the entire content from a crawl into a Solr field. You will probably
  need to write a custom Nutch parser plugin in order to extract a subset
  from the content. Please see http://wiki.apache.org/nutch/RunningNutchAndSolr
* Searching through Solr
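For the Nutch step, the usual flow is to crawl and then push the crawl data into Solr. Roughly (a sketch for Nutch 1.x; directory names, depth, and the Solr URL are placeholders for illustration):

```sh
# Crawl starting from the URLs in ./urls, writing crawl data to ./crawl
bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

# Index the crawled segments into a running Solr instance
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb \
    crawl/linkdb crawl/segments/*
```

The exact solrindex arguments vary a little between Nutch versions, so check the wiki page above against your install.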

A good way of getting started is by going through the Solr tutorial:
http://lucene.apache.org/solr/tutorial.html . The Solr Wiki is also fairly
extensive: http://wiki.apache.org/solr/FrontPage . Finally, searching
Google for "solr getting started" turns up many likely-looking links.

Regards,
Gora


Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Sven Almgren
Have you run any optimize requests yet?

/Sven

On Tue, Dec 7, 2010 at 08:40, Hamid Vahedi  wrote:
> Hi,
>
> I am using multi-core tomcat on 2 servers. 3 language per server.
>
> I am adding documents to solr up to 200 doc/sec. when updating process is
> started, every thing is fine (update performance is max 200 ms/doc. with about
> 800 MB memory used with minimal cpu usage).
>
> After 15-17 hours it's became so slow  (more that 900 sec for update), used 
> heap
> memory is about 15GB, GC time is became more than one hour.
>
>
> I don't know what's wrong with it? Can anyone describe me what's the problem?
> Is that came from Solr or JVM?
>
> Note: when i stop updating, CPU busy within 15-20 min. and when start updating
> again i have same issue. but when stop tomcat service and start it again, all
> thing is OK.
>
> I am using tomcat 6 with 18 GB memory on windows 2008 server x64. Solr 1.4.1
>
> thanks in advanced
> Hamid
>
>
>


Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Peter Karich

 Hi Hamid,

try to avoid autowarming while indexing (see solrconfig.xml:
the caches' autowarmCount settings + newSearcher + maxWarmingSearchers).

If you need to query and index at the same time,
you'll probably need one read-only core and one for writing, with no
autowarming configured.

See: http://wiki.apache.org/solr/NearRealtimeSearchTuning

Or replicate from the indexing-core to a different core with different 
settings.
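A replication setup along those lines might look roughly like this in solrconfig.xml (a sketch in the Solr 1.4 style; host, port, and core name are placeholders):

```xml
<!-- On the indexing (master) core -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On the read-only (slave) core -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```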


Regards,
Peter.



Hi,

I am using multi-core tomcat on 2 servers. 3 language per server.

I am adding documents to solr up to 200 doc/sec. when updating process is
started, every thing is fine (update performance is max 200 ms/doc. with about
800 MB memory used with minimal cpu usage).

After 15-17 hours it's became so slow  (more that 900 sec for update), used heap
memory is about 15GB, GC time is became more than one hour.


I don't know what's wrong with it? Can anyone describe me what's the problem?
Is that came from Solr or JVM?

Note: when i stop updating, CPU busy within 15-20 min. and when start updating
again i have same issue. but when stop tomcat service and start it again, all
thing is OK.

I am using tomcat 6 with 18 GB memory on windows 2008 server x64. Solr 1.4.1

thanks in advanced
Hamid



--
http://jetwick.com twitter search prototype



Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Hamid Vahedi
hi Sven

no, only auto commit:

<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>1000</maxTime>
</autoCommit>

From: Sven Almgren 
To: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 1:54:40 PM
Subject: Re: Solr & JVM performance issue after 2 days

Have you run any optimize requests yet?

/Sven

On Tue, Dec 7, 2010 at 08:40, Hamid Vahedi  wrote:
> Hi,
>
> I am using multi-core tomcat on 2 servers. 3 language per server.
>
> I am adding documents to solr up to 200 doc/sec. when updating process is
> started, every thing is fine (update performance is max 200 ms/doc. with about
> 800 MB memory used with minimal cpu usage).
>
> After 15-17 hours it's became so slow  (more that 900 sec for update), used 
>heap
> memory is about 15GB, GC time is became more than one hour.
>
>
> I don't know what's wrong with it? Can anyone describe me what's the problem?
> Is that came from Solr or JVM?
>
> Note: when i stop updating, CPU busy within 15-20 min. and when start updating
> again i have same issue. but when stop tomcat service and start it again, all
> thing is OK.
>
> I am using tomcat 6 with 18 GB memory on windows 2008 server x64. Solr 1.4.1
>
> thanks in advanced
> Hamid
>
>
>




Re: only index synonyms

2010-12-07 Thread lee carroll
Hi tom

This seems to place in the index:
"This is a scenic line of words"
I just want "scenic" and "words" in the index

I'm not at a terminal at the moment but will try again to make sure. I'm
sure I'm missing the obvious

Cheers lee
On 7 Dec 2010 07:40, "Tom Hill"  wrote:
> Hi Lee,
>
>
> On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
>  wrote:
>> Hi Erik
>
> Nope, Erik is the other one. :-)
>
>> thanks for the reply. I only want the synonyms to be in the index
>> how can I achieve that ? Sorry probably missing something obvious in the
>> docs
>
> Exactly what he said, use the => syntax. You've already got it. Add the
lines
>
> pretty => scenic
> text => words
>
> to synonyms.txt, and it will do what you want.
>
> Tom
>
>> On 7 Dec 2010 01:28, "Erick Erickson"  wrote:
>>> See:
>>>
>>
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>
>>> with the => syntax, I think that's what you're looking for
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <
lee.a.carr...@googlemail.com
>>>wrote:
>>>
 Hi Can the following usecase be achieved.

 value to be analysed at index time "this is a pretty line of text"

 synonym list is pretty => scenic , text => words

 valued placed in the index is "scenic words"

 That is to say only the matching synonyms. Basically i want to produce
a
 normalised set of phrases for faceting.

 Cheers Lee C

>>


Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Upayavira
Also, reduce your commit frequency if you are doing an initial import.
You only need to commit (manually) once all of your content has been
imported.
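That final manual commit can be sent over HTTP once the import is done, e.g. (host and port assumed):

```sh
curl "http://localhost:8983/solr/update" \
     -H "Content-Type: text/xml" --data-binary "<commit/>"
```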

I gave a talk about this sort of thing last week at the Online
Information Show in London, and am attempting to get the slides put
online, when I can get access to the account I need to do it!

Upayavira


On Tue, 07 Dec 2010 11:36 +0100, "Peter Karich" 
wrote:
>   Hi Hamid,
> 
> try to avoid autowarming when indexing (see solrconfig.xml: 
> caches->autowarm + newSearcher + maxSearcher).
> If you need to query and indexing at the same time,
> then probably you'll need one read-only core and one for writing with no 
> autowarming configured.
> See: http://wiki.apache.org/solr/NearRealtimeSearchTuning
> 
> Or replicate from the indexing-core to a different core with different 
> settings.
> 
> Regards,
> Peter.
> 
> 
> > Hi,
> >
> > I am using multi-core tomcat on 2 servers. 3 language per server.
> >
> > I am adding documents to solr up to 200 doc/sec. when updating process is
> > started, every thing is fine (update performance is max 200 ms/doc. with 
> > about
> > 800 MB memory used with minimal cpu usage).
> >
> > After 15-17 hours it's became so slow  (more that 900 sec for update), used 
> > heap
> > memory is about 15GB, GC time is became more than one hour.
> >
> >
> > I don't know what's wrong with it? Can anyone describe me what's the 
> > problem?
> > Is that came from Solr or JVM?
> >
> > Note: when i stop updating, CPU busy within 15-20 min. and when start 
> > updating
> > again i have same issue. but when stop tomcat service and start it again, 
> > all
> > thing is OK.
> >
> > I am using tomcat 6 with 18 GB memory on windows 2008 server x64. Solr 1.4.1
> >
> > thanks in advanced
> > Hamid
> 
> 
> -- 
> http://jetwick.com twitter search prototype
> 
> 


Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Hamid Vahedi
Hi Peter

Thanks a lot for the reply. Actually I need real-time indexing and querying at
the same time.

Here it was said:
"You can run multiple Solr instances in separate JVMs, with both having their
solr.xml configured to use the same index folder."

Now:
Q1: I'm using Tomcat now. Could you please tell me how to run separate JVMs
with Tomcat?

Q2: What should I set for LockType?

Thanks in advance





From: Peter Karich 
To: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 2:06:49 PM
Subject: Re: Solr & JVM performance issue after 2 days

  Hi Hamid,

try to avoid autowarming when indexing (see solrconfig.xml: 
caches->autowarm + newSearcher + maxSearcher).
If you need to query and indexing at the same time,
then probably you'll need one read-only core and one for writing with no 
autowarming configured.
See: http://wiki.apache.org/solr/NearRealtimeSearchTuning

Or replicate from the indexing-core to a different core with different 
settings.

Regards,
Peter.


> Hi,
>
> I am using multi-core tomcat on 2 servers. 3 language per server.
>
> I am adding documents to solr up to 200 doc/sec. when updating process is
> started, every thing is fine (update performance is max 200 ms/doc. with about
> 800 MB memory used with minimal cpu usage).
>
> After 15-17 hours it's became so slow  (more that 900 sec for update), used 
>heap
> memory is about 15GB, GC time is became more than one hour.
>
>
> I don't know what's wrong with it? Can anyone describe me what's the problem?
> Is that came from Solr or JVM?
>
> Note: when i stop updating, CPU busy within 15-20 min. and when start updating
> again i have same issue. but when stop tomcat service and start it again, all
> thing is OK.
>
> I am using tomcat 6 with 18 GB memory on windows 2008 server x64. Solr 1.4.1
>
> thanks in advanced
> Hamid


-- 
http://jetwick.com twitter search prototype



Field Collapsing - sort by group count, get total groups

2010-12-07 Thread ssetem

Hi,

I wondered if it is possible to sort groups by the total within each group,
and to bring back the total number of groups?

I'm trying to build a reporting system which shows the highest aggregates
first, then allows pagination through this list.

Cheers
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Collapsing-sort-by-group-count-get-total-groups-tp2033086p2033086.html
Sent from the Solr - User mailing list archive at Nabble.com.


MultiCore config less stable than SingleCore?

2010-12-07 Thread Jan Simon Winkelmann
Hi,

I have recently moved Solr at one of our customers to a MultiCore environment 
running 2 indexes. Since then, we seem to be having problems with locks not 
being removed properly: .lock files keep sticking around in the index 
directory. 
Hence, any updates to the index keep returning 500 errors with the following 
stack trace:

Error 500 Lock obtain timed out: 
NativeFSLock@/data/jetty/solr/index1/data/index/lucene-96165c19c16f26b93de3954f6891-write.lock

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
NativeFSLock@/data/jetty/solr/index1/data/index/lucene-96165c19c16f26b93de3954f6891-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
at 
org.apache.lucene.index.IndexWriter.(IndexWriter.java:1402)
at 
org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:190)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:425)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:457)
at 
org.eclipse.jetty.server.session.SessionHandler.handle(SessionHandler.java:182)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:933)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:362)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:867)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:245)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
at org.eclipse.jetty.server.Server.handle(Server.java:334)
at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:559)
at 
org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1007)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:747)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:209)
at 
org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:406)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:462)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
at java.lang.Thread.run(Thread.java:662)

All our other installations with a similar SingleCore config are running very 
smoothly.
Does anyone have an idea what the problem is? Could I have missed something 
when configuring the MultiCore environment?

Regards,
Jan


Re: Out of memory error

2010-12-07 Thread Erick Erickson
Have you seen this page? http://wiki.apache.org/solr/DataImportHandlerFaq
See especially batchSize,
but it looks like you're already on to that.
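For reference, the batchSize setting discussed in the FAQ lives on the DIH data source definition; with the MySQL JDBC driver a value of -1 makes the driver stream rows instead of buffering the whole result set in memory (driver class and URL below are illustrative):

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            batchSize="-1"/>
```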

Do you have any idea how big the records in the database are? You might
try adjusting ramBufferSizeMB down - what is it at now?

In general, what are your Solr commit options?

Does anything get to Solr or is the OOM when the SQL is executed?
The first question to answer is whether you index anything at all...

There's a little-known DIH debug page you can access at:
.../solr/admin/dataimport.jsp that might help, and progress can be monitored
at:
.../solr/dataimport

DIH can be "interesting"; you get finer control with SolrJ and a direct
JDBC connection if you don't get anywhere with DIH.

Scattergun response, but things to try...

Best
Erick

On Tue, Dec 7, 2010 at 12:03 AM, sivaprasad wrote:

>
> Hi,
>
> When i am trying to import the data using DIH, iam getting Out of memory
> error.The below are the configurations which i have.
>
> Database:Mysql
> Os:windows
> No Of documents:15525532
> In Db-config.xml i made batch size as "-1"
>
> The solr server is running on Linux machine with tomcat.
> i set tomcat arguments as ./startup.sh -Xms1024M -Xmx2048M
>
> Can anybody has idea, where the things are going wrong?
>
> Regards,
> JS
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Out-of-memory-error-tp2031761p2031761.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Field Collapsing - sort by group count, get total groups

2010-12-07 Thread Yonik Seeley
On Tue, Dec 7, 2010 at 7:03 AM, ssetem  wrote:
> I wondered if it is possible to sort groups by the total within the group,
> and to bring back total amount groups?

That is planned, but not currently implemented.
You can use faceting to get both totals and sort by highest total though.

Total number of groups is a different problem - we don't return it
because we don't know.
It will take a different algorithm (that's more memory intensive) to
find out the total number of groups.
If the number is unlikely to be too large, you could just return all
groups (or use faceting to do that more efficiently).
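The faceting approach could look like this as a plain request (the field name is a placeholder): facet.sort=count orders groups by highest total, and facet.mincount=1 drops the empty ones:

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
    &facet.field=group_field&facet.sort=count
    &facet.limit=-1&facet.mincount=1
```

facet.limit=-1 returns all facet values, which is the "request them all" option mentioned above.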

-Yonik
http://www.lucidimagination.com


Re: Solr Newbie - need a point in the right direction

2010-12-07 Thread Erick Erickson
Solr is downstream of what I think you want. There's nothing in Solr
that allows you to take an arbitrary page and extract specific info
from it. I suspect the Nutch folks have dealt with this kind of question,
looking over the user's list there might give some insight.

Basically, once you have the page, you extract the information to
put into your structured Solr document. "Extracting the information"
is the hard part, and there's nothing built into Solr that I know of
that helps with that...

Best
Erick

On Mon, Dec 6, 2010 at 10:42 PM, Mark  wrote:

> Hi,
>
> First time poster here - I'm not entirely sure where I need to look for
> this
> information.
>
> What I'm trying to do is extract some (presumably) structured information
> from non-uniform data (eg, prices from a nutch crawl) that needs to show in
> search queries, and I've come up against a wall.
>
> I've been unable to figure out where is the best place to begin.
>
> I had a look through the solr wiki and did a search via Lucid's search tool
> and I'm guessing this is handled at index time through my schema? But I've
> also seen dismax being thrown around as a possible solution and this has
> confused me.
>
> Basically, if you guys could point me in the right direction for resources
> (even as much as saying, you need X, it's over there) that would be a huge
> help.
>
> Cheers
>
> Mark
>


Re: only index synonyms

2010-12-07 Thread Erick Erickson
OK, the light finally dawns

*If* you have a defined list of words to remove, you can put them in
with your stopwords and add a stopword filter to the field in
schema.xml.

Otherwise, you'll have to do some pre-processing and only send Solr
the words you want. I'm assuming you have a list of valid words
(i.e. the words in your synonyms file) and could pre-filter the input
to remove everything else. In that case you don't need a synonyms
filter, since you're controlling the whole process anyway
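Put together in schema.xml, a field type along these lines might look roughly like the sketch below (the type name is a placeholder; the assumption is that stopwords.txt lists every word you want stripped after synonym mapping):

```xml
<fieldType name="normalised_phrases" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt contains e.g.: pretty => scenic / text => words -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- drop every remaining non-synonym word -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```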

Best
Erick

On Tue, Dec 7, 2010 at 6:07 AM, lee carroll wrote:

> Hi tom
>
> This seems to place in the index
> This is a scenic line of words
> I just want scenic and words in the index
>
> I'm not at a terminal at the moment but will try again to make sure. I'm
> sure I'm missing the obvious
>
> Cheers lee
> On 7 Dec 2010 07:40, "Tom Hill"  wrote:
> > Hi Lee,
> >
> >
> > On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
> >  wrote:
> >> Hi Erik
> >
> > Nope, Erik is the other one. :-)
> >
> >> thanks for the reply. I only want the synonyms to be in the index
> >> how can I achieve that ? Sorry probably missing something obvious in the
> >> docs
> >
> > Exactly what he said, use the => syntax. You've already got it. Add the
> lines
> >
> > pretty => scenic
> > text => words
> >
> > to synonyms.txt, and it will do what you want.
> >
> > Tom
> >
> >> On 7 Dec 2010 01:28, "Erick Erickson"  wrote:
> >>> See:
> >>>
> >>
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> >>>
> >>> with the => syntax, I think that's what you're looking for
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <
> lee.a.carr...@googlemail.com
> >>>wrote:
> >>>
>  Hi Can the following usecase be achieved.
> 
>  value to be analysed at index time "this is a pretty line of text"
> 
>  synonym list is pretty => scenic , text => words
> 
>  valued placed in the index is "scenic words"
> 
>  That is to say only the matching synonyms. Basically i want to produce
> a
>  normalised set of phrases for faceting.
> 
>  Cheers Lee C
> 
> >>
>


Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Erick Erickson
Your autocommit is unrealistic. You're telling the server to commit
every 5 seconds and you're overloading the system. At least that's
my guess. Every time you do this, you're causing all the caches to
be thrown away, any autowarming to be triggered, etc, etc, etc.

There have been significant improvements in this area on trunk, but you
haven't told us what version of Solr you're using, so I'm not sure
whether that helps. See NRT (Near Real Time) discussions on the JIRA.

Here's what I'd do to test this:
Increase the autocommit parameters significantly and see if the problem
goes away (something like 10,000 docs and 60,000 milliseconds). And
don't forget ramBufferSizeMB. That'll point you in the right direction.
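In solrconfig.xml terms, that test would look something like:

```xml
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime> <!-- milliseconds -->
</autoCommit>
```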

You really need to consider a master/slave architecture here I think, where
the master only indexes and the slaves periodically get the updates. There
will be some latency.

Best
Erick



On Tue, Dec 7, 2010 at 7:01 AM, Hamid Vahedi  wrote:

> Hi Peter
>
> Thanks a lot for reply. Actually I need real time indexing and query at the
> same
> time.
>
> Here  told:
> "You  can run multiple Solr instances in separate JVMs, with both having
>  their
> solr.xml configured to use the same index folder."
>
> Now
> Q1: I'm using Tomcat now, Could you please tell me how to have separate
> JVMs
> with Tomcat?
>
> Q2:What should  I set for LockType?
>
> Thanks in advanced
>
>
>
>
> 
> From: Peter Karich 
> To: solr-user@lucene.apache.org
> Sent: Tue, December 7, 2010 2:06:49 PM
> Subject: Re: Solr & JVM performance issue after 2 days
>
>   Hi Hamid,
>
> try to avoid autowarming when indexing (see solrconfig.xml:
> caches->autowarm + newSearcher + maxSearcher).
> If you need to query and indexing at the same time,
> then probably you'll need one read-only core and one for writing with no
> autowarming configured.
> See: http://wiki.apache.org/solr/NearRealtimeSearchTuning
>
> Or replicate from the indexing-core to a different core with different
> settings.
>
> Regards,
> Peter.
>
>
> > Hi,
> >
> > I am using multi-core tomcat on 2 servers. 3 language per server.
> >
> > I am adding documents to solr up to 200 doc/sec. when updating process is
> > started, every thing is fine (update performance is max 200 ms/doc. with
> about
> > 800 MB memory used with minimal cpu usage).
> >
> > After 15-17 hours it's became so slow  (more that 900 sec for update),
> used
> >heap
> > memory is about 15GB, GC time is became more than one hour.
> >
> >
> > I don't know what's wrong with it? Can anyone describe me what's the
> problem?
> > Is that came from Solr or JVM?
> >
> > Note: when i stop updating, CPU busy within 15-20 min. and when start
> updating
> > again i have same issue. but when stop tomcat service and start it again,
> all
> > thing is OK.
> >
> > I am using tomcat 6 with 18 GB memory on windows 2008 server x64. Solr
> 1.4.1
> >
> > thanks in advanced
> > Hamid
>
>
> --
> http://jetwick.com twitter search prototype
>
>
>
>


Re: MultiCore config less stable than SingleCore?

2010-12-07 Thread Erick Erickson
Could you tell us what version of Solr you're running?
And what OS you're concerned about?
And what file system you're operating on?
And anything else you can think of that'd help us help you?

Best
Erick

On Tue, Dec 7, 2010 at 4:56 AM, Jan Simon Winkelmann <
jansimon.winkelm...@newsfactory.de> wrote:

> Hi,
>
> i have recently moved Solr at one of our customers to a MultiCore
> environment running 2 indexes. Since then, we seem to be having problems
> with locks not being removed properly, .lock files keep sticking around in
> the index directory.
> Hence, any updates to the index keep returning 500 errors with the
> following stack trace:
>
> Error 500 Lock obtain timed out: NativeFSLock@
> /data/jetty/solr/index1/data/index/lucene-96165c19c16f26b93de3954f6891-write.lock
>
> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
> NativeFSLock@
> /data/jetty/solr/index1/data/index/lucene-96165c19c16f26b93de3954f6891-write.lock
>at org.apache.lucene.store.Lock.obtain(Lock.java:85)
>at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
>at
> org.apache.lucene.index.IndexWriter.(IndexWriter.java:1402)
>at
> org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:190)
>at
> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
>at
> org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
>at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
>at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>at
> org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
>at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)
>at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:425)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:457)
>at
> org.eclipse.jetty.server.session.SessionHandler.handle(SessionHandler.java:182)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:933)
>at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:362)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:867)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:245)
>at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
>at org.eclipse.jetty.server.Server.handle(Server.java:334)
>at
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:559)
>at
> org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1007)
>at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:747)
>at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:209)
>at
> org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:406)
>at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:462)
>at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
>at java.lang.Thread.run(Thread.java:662)
>
> All our other installations with a similar SingleCore config are running
> very smoothly.
> Does anyone have an idea what the problem is? Could I have missed something
> when configuring the MultiCore environment?
>
> Regards,
> Jan
>


Re: Taxonomy and Faceting

2010-12-07 Thread webdev1977

Can someone enlighten me on how to get started with this patch? I am running
Solr 1.4.1, and I need to download the latest trunk and apply the patch,
obviously. But after that I am sort of clueless. I am assuming there are
some things that have to happen in the Solr config and schema files.

Reading the code now... 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2033563.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Upayavira


On Tue, 07 Dec 2010 04:01 -0800, "Hamid Vahedi" 
wrote:
> Hi Peter
> 
> Thanks a lot for reply. Actually I need real time indexing and query at
> the same 
> time. 

What do you mean by real time? The answer to that is going to heavily
influence your architecture and the amount of effort you are going to
have to put in.

If you can accept a five minute lead time, then you're not going to have
too much difficulty. If you want it to be, say less than 1 minute, then
you're likely going to have to investigate the near real time stuff in
newer (unreleased) Solr/Lucene.

> Here  told: 
> "You  can run multiple Solr instances in separate JVMs, with both having 
> their 
> solr.xml configured to use the same index folder."
> 
> Now
> Q1: I'm using Tomcat now, Could you please tell me how to have separate
> JVMs 
> with Tomcat? 
> 
> Q2:What should  I set for LockType?

Sounds dangerous to me, as Lucene assumes it has complete control over
its files. Unless there's a specific way to set up a 'read only' solr?

Upayavira


Re: Taxonomy and Faceting

2010-12-07 Thread Tommaso Teofili
Hi,
as I made the patch I can guide you through the Solr-UIMA integration
configuration; just give me some more time, as I am really busy at the moment
and can't dig into it right now. There was a mini tutorial but it's outdated;
I'll update it and let you know here in a few hours.
Cheers,
Tommaso

2010/12/7 webdev1977 

>
> Can someone enlighten me on how to get started with this patch?  I am
> running
> solr 1.4.1 and I need to download the latest trunk and apply the patch
> obviously.. But after that, I am sort of clueless.. I am assuming there are
> some things that have to happen in solr config and schema files.
>
> Reading the code now...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2033563.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to config DataImport Scheduling

2010-12-07 Thread Ahmet Arslan
> I want to config DataImport Scheduling, but not know, how
> to do it.

I do it with a cronjob.  

curl "http://localhost:8080/solr/dataimport?command=delta-import&optimize=false"
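As a crontab entry that could be, for example (the schedule is up to you; the URL must stay quoted so the shell doesn't interpret the & characters):

```
# Run a delta-import every 15 minutes
*/15 * * * * curl -s "http://localhost:8080/solr/dataimport?command=delta-import&optimize=false" > /dev/null
```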




Re: Taxonomy and Faceting

2010-12-07 Thread webdev1977

That would be AMAZING!! And much appreciated ;-)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2033657.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Newbie - need a point in the right direction

2010-12-07 Thread webdev1977

In my experience, the hardest (but most flexible) part is exactly what was
mentioned: processing the data.  Nutch does have a really easy plugin
interface that you can use, and the example plugin is a great place to
start.  Once you have the raw parsed text, you can do whatever you want
with it.  For example, I wrote a plugin to add geospatial information to my
NutchDocument.  You then map the fields you added in the NutchDocument to
something you want to have Solr index.  In my case I created a geography
field where I put lat, lon info.  Then you create that same geography field
in the nutch to solr mapping file as well as your solr schema.xml file. 
Then, when you run the crawl and tell it to use "solrindex" it will send the
document to solr to be indexed.  Since you have your new field in the
schema, it knows what to do with it at index time.  Now you can build a user
interface around what you want to do with that field.  


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html


Re: Field Collapsing - sort by group count, get total groups

2010-12-07 Thread ssetem

Thanks for the reply,

How would I get the total number of possible facets (non-zero)? I've searched
around but had no luck.

Cheers
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Collapsing-sort-by-group-count-get-total-groups-tp2033086p2033645.html


Re: Field Collapsing - sort by group count, get total groups

2010-12-07 Thread Yonik Seeley
On Tue, Dec 7, 2010 at 9:07 AM, ssetem  wrote:
> Thanks for the reply,
>
> How would i get the total amount of possible facets(non zero), I've searched
> around but have no luck.

Only current way would be to request them all.

Just like field collapsing, this is a number we don't (generally)
have.  There are optimizations like short-circuiting on the docfreq
that would need to be disabled to generate that count.

-Yonik
http://www.lucidimagination.com


Re: Solr Newbie - need a point in the right direction

2010-12-07 Thread Mark
Thanks to everyone who responded, no wonder I was getting confused, I was
completely focusing on the wrong half of the equation.

I had a cursory look through some of the Nutch documentation available and
it is looking promising.

Thanks everyone.

Mark

On Tue, Dec 7, 2010 at 10:19 PM, webdev1977  wrote:

>
> I my experience, the hardest (but most flexible part) is exactly what was
> mentioned.. processing the data.  Nutch does have a really easy plugin
> interface that you can use, and the example plugin is a great place to
> start.  Once you have the raw parsed text, you can do what ever you want
> with it.  For example, I wrote a  plugin to add geospatial information to
> my
> NutchDocument.  You then map the fields you added in the NutchDocument to
> something you want to have Solr index.  In my case I created a geography
> field where I put lat, lon info.  Then you create that same geography field
> in the nutch to solr mapping file as well as your solr schema.xml file.
> Then, when you run the crawl and tell it to use "solrindex" it will send
> the
> document to solr to be indexed.  Since you have your new field in the
> schema, it knows what to do with it at index time.  Now you can build a
> user
> interface around what you want to do with that field.
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Severe NoClassDefFoundError Spell StringDistance Nightly 20101207

2010-12-07 Thread Dan Hertz (Insight 49, LLC)
Whilst running java -jar start.jar from the latest nightly build example 
directory, I get the following...any ideas how to fix this? Thanks! Dan.


Dec 7, 2010 8:46:56 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: 
org/apache/lucene/search/spell/StringDistance
at 
org.apache.solr.search.ValueSourceParser.&lt;clinit&gt;(ValueSourceParser.java:297)
at 
org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:1517)

at org.apache.solr.core.SolrCore.&lt;init&gt;(SolrCore.java:554)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)

at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)

at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)

at org.mortbay.jetty.Server.doStart(Server.java:224)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)

at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.ClassNotFoundException: 
org.apache.lucene.search.spell.StringDistance

at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at 
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:401)
at 
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)

... 33 more


DataDevRoom at the 2011 edition of the FOSDEM

2010-12-07 Thread Isabel Drost
Hello,

We (Olivier, Nicolas and I) are organizing a Data Analytics DevRoom
that will take place during the next edition of the FOSDEM in Brussels
on Feb. 5. Here is the CFP:

  http://datadevroom.couch.it/CFP

You might be interested in attending the event and taking the
opportunity to speak about your projects.

Important Dates (all dates in GMT +2):

Submission deadline:  2010-12-17
Notification of accepted speakers: 2010-12-20
Publication of final schedule:  2011-01-10
Meetup: 2011-02-05

The event will comprise presentations on scalable data processing. We
invite you to submit talks on the topics: Information Retrieval / Search,
Large Scale Data Processing, Machine Learning, Text Mining, Computer
Vision, Linked Open Data.

High quality, technical submissions are called for, ranging from
principles to practice. We are looking for presentations on the
implementation of the systems themselves, real world applications and
case studies.

Submissions should be based on free software solutions.

Looking forward to meeting you face to face in Brussels,
Isabel


Re: Index version on slave nodes

2010-12-07 Thread Markus Jelsma
But why? I'd expect valid version numbers although the replication handler's 
source code seems to agree with you judging from the comments.

On Monday 06 December 2010 17:49:16 Xin Li wrote:
> I think this is expected behavior. You have to issue the "details"
> command to get the real indexversion for slave machines.
> 
> Thanks,
> Xin
> 
> On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma
> 
>  wrote:
> > Hi,
> > 
> > The indexversion command in the replicationHandler on slave nodes returns
> > 0 for indexversion and generation while the details command does return
> > the correct information. I haven't found an existing ticket on this one
> > although https://issues.apache.org/jira/browse/SOLR-1573 has
> > similarities.
> > 
> > Cheers,
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


highlighting encoding issue

2010-12-07 Thread getagrip

Hi,
when I query Solr (trunk) I get "numeric character references" instead
of regular UTF-8 strings for special characters in the
highlighting section; in the result section the characters are presented
fine.


e.g. instead of the German umlaut ä I get its numeric character reference (&#228;)

Example (the XML markup was stripped by the archive): the highlighting
section contains

Vielfachmessger&#228;t

while the result section contains

Vielfachmessgerät


Any hints are welcome.


Re: Index version on slave nodes

2010-12-07 Thread Xin Li
I read it somewhere (sorry for not remembering the source).. the
indexversion command returns the "replicable" index version number. Since it
is a slave machine, the result is 0.

Thanks,

On Tue, Dec 7, 2010 at 11:06 AM, Markus Jelsma
 wrote:
> But why? I'd expect valid version numbers although the replication handler's
> source code seems to agree with you judging from the comments.
>
> On Monday 06 December 2010 17:49:16 Xin Li wrote:
>> I think this is expected behavior. You have to issue the "details"
>> command to get the real indexversion for slave machines.
>>
>> Thanks,
>> Xin
>>
>> On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma
>>
>>  wrote:
>> > Hi,
>> >
>> > The indexversion command in the replicationHandler on slave nodes returns
>> > 0 for indexversion and generation while the details command does return
>> > the correct information. I haven't found an existing ticket on this one
>> > although https://issues.apache.org/jira/browse/SOLR-1573 has
>> > similarities.
>> >
>> > Cheers,
>> >
>> > --
>> > Markus Jelsma - CTO - Openindex
>> > http://www.linkedin.com/in/markus17
>> > 050-8536620 / 06-50258350
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>


Re: Index version on slave nodes

2010-12-07 Thread Markus Jelsma
Yes, I read that too in the replication request handler's source comments. But
I would find it convenient if it just used the same values as we see with
the details command.

Any devs agree? Then I'd open a ticket for this one.

On Tuesday 07 December 2010 17:14:09 Xin Li wrote:
> I read it somewhere (sorry for not remembering the source).. the
> indexversion command gets the "replicable" index version #. Since it
> is a slave machine, so the result is 0.
> 
> Thanks,
> 
> On Tue, Dec 7, 2010 at 11:06 AM, Markus Jelsma
> 
>  wrote:
> > But why? I'd expect valid version numbers although the replication
> > handler's source code seems to agree with you judging from the comments.
> > 
> > On Monday 06 December 2010 17:49:16 Xin Li wrote:
> >> I think this is expected behavior. You have to issue the "details"
> >> command to get the real indexversion for slave machines.
> >> 
> >> Thanks,
> >> Xin
> >> 
> >> On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma
> >> 
> >>  wrote:
> >> > Hi,
> >> > 
> >> > The indexversion command in the replicationHandler on slave nodes
> >> > returns 0 for indexversion and generation while the details command
> >> > does return the correct information. I haven't found an existing
> >> > ticket on this one although
> >> > https://issues.apache.org/jira/browse/SOLR-1573 has
> >> > similarities.
> >> > 
> >> > Cheers,
> >> > 
> >> > --
> >> > Markus Jelsma - CTO - Openindex
> >> > http://www.linkedin.com/in/markus17
> >> > 050-8536620 / 06-50258350
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Peter Karich

 Am 07.12.2010 13:01, schrieb Hamid Vahedi:

Hi Peter

Thanks a lot for the reply. Actually I need real-time indexing and querying at
the same time.

Here it says:
"You  can run multiple Solr instances in separate JVMs, with both having  their
solr.xml configured to use the same index folder."

Now
Q1: I'm using Tomcat now, Could you please tell me how to have separate JVMs
with Tomcat?


Are you sure you don't want two servers, and that you really want real time?
Slowing down indexing + less cache should do the trick, I think.

I wouldn't recommend indexing AND querying on the same machine unless
you have a lot of RAM and CPU.


You could even deploy two indices into one Tomcat... the read-only index
refers to the data dir via:

<dataDir>/path/to/index/data</dataDir>

Then issue an empty (!!) commit to the read-only index every minute, so
that the read-only index sees the changes from the feeding index.
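A sketch of that periodic empty commit as a cron job (the host, port, and read-only core path are assumptions):

```shell
# Every minute, POST an empty commit to the read-only core so its
# searcher reopens and sees segments written by the feeding core.
* * * * * curl -s "http://localhost:8080/solr-readonly/update" -H "Content-Type: text/xml" --data-binary "<commit/>" > /dev/null 2>&1
```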

(again: see the wikipage!)

Setting up two Tomcats on one server I wouldn't recommend either, but it's
possible by copying tomcat into, say, tomcat2, and changing the shutdown
and 8080 ports in tomcat2/conf/server.xml


Q2:What should  I set for LockType?


I'm using simple, but native should also be ok.


Thanks in advance





From: Peter Karich
To: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 2:06:49 PM
Subject: Re: Solr&  JVM performance issue after 2 days

   Hi Hamid,

try to avoid autowarming when indexing (see solrconfig.xml:
caches->autowarm + newSearcher + maxSearcher).
If you need to query and indexing at the same time,
then probably you'll need one read-only core and one for writing with no
autowarming configured.
See: http://wiki.apache.org/solr/NearRealtimeSearchTuning

Or replicate from the indexing-core to a different core with different
settings.

Regards,
Peter.



Hi,

I am using multi-core Tomcat on 2 servers, 3 languages per server.

I am adding documents to Solr at up to 200 docs/sec. When the updating process
starts, everything is fine (update time is at most 200 ms/doc, with about
800 MB of memory used and minimal CPU usage).

After 15-17 hours it becomes very slow (more than 900 sec per update), used
heap memory is about 15 GB, and GC time grows to more than one hour.


I don't know what's wrong with it. Can anyone tell me what the problem is?
Does it come from Solr or the JVM?

Note: when I stop updating, the CPU stays busy for 15-20 min., and when I start
updating again I have the same issue; but when I stop the Tomcat service and
start it again, everything is OK.

I am using tomcat 6 with 18 GB memory on windows 2008 server x64. Solr 1.4.1

Thanks in advance
Hamid





--
http://jetwick.com twitter search prototype



solrj & http client 4

2010-12-07 Thread Stevo Slavić
Hello solr users and developers,

Are there any plans to upgrade the http client dependency in solrj from 3.x to
4.x? Found this ticket -
judging by its comments, the upgrade might help fix the issue. I have a project
in jar hell, getting different versions of http client as a transitive
dependency...

Regards,
Stevo.


Re: solrj & http client 4

2010-12-07 Thread Yonik Seeley
On Tue, Dec 7, 2010 at 12:32 PM, Stevo Slavić  wrote:
> Hello solr users and developers,
>
> Are there any plans to upgraded http client dependency in solrj from 3.x to
> 4.x?

I'd certainly be for moving to 4.x (and I think everyone else would too).
The issue is that it's not a drop-in replacement, so someone needs to
do the work.

-Yonik
http://www.lucidimagination.com

> Found this  ticket -
> judging by comments in it upgrade might help fix the issue. I have a project
> in jar hell, getting different versions of http client as transitive
> dependency...
>
> Regards,
> Stevo.


Re: Out of memory error

2010-12-07 Thread Fuad Efendi
Related: SOLR-846

Sent on the TELUS Mobility network with BlackBerry

-Original Message-
From: Erick Erickson 
Date: Tue, 7 Dec 2010 08:11:41 
To: 
Reply-To: solr-user@lucene.apache.org
Subject: Re: Out of memory error

Have you seen this page? http://wiki.apache.org/solr/DataImportHandlerFaq
See especially batchsize,
but it looks like you're already on to that.
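For reference, a DIH data source sketch using that batch size; with the MySQL driver, batchSize="-1" makes DIH request the row-streaming fetch mode (the connection details are assumptions):

```xml
<dataConfig>
  <!-- batchSize="-1" makes DIH ask MySQL for a streaming result set,
       so the whole table is not buffered in memory at once -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"
              batchSize="-1"/>
</dataConfig>
```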

Do you have any idea how big the records are in the database? You might
try adjusting ramBufferSizeMB down; what is it at now?

In general, what are your Solr commit options?

Does anything get to Solr or is the OOM when the SQL is executed?
The first question to answer is whether you index anything at all...

There's a little-known DIH debug page you can access at:
.../solr/admin/dataimport.jsp that might help, and progress can be monitored
at:
.../solr/dataimport

DIH can be "interesting"; you get finer control with SolrJ and a direct
JDBC connection if you don't get anywhere with DIH.

Scattergun response, but things to try...

Best
Erick

On Tue, Dec 7, 2010 at 12:03 AM, sivaprasad wrote:

>
> Hi,
>
> When i am trying to import the data using DIH, iam getting Out of memory
> error.The below are the configurations which i have.
>
> Database:Mysql
> Os:windows
> No Of documents:15525532
> In Db-config.xml i made batch size as "-1"
>
> The solr server is running on Linux machine with tomcat.
> i set tomcat arguments as ./startup.sh -Xms1024M -Xmx2048M
>
> Can anybody has idea, where the things are going wrong?
>
> Regards,
> JS
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Out-of-memory-error-tp2031761p2031761.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



Terms component with shards?

2010-12-07 Thread bbarani

Hi,

Will terms component work along with Shards?

I have 3 cores and I am using shards to do distributed search.

I have an autosuggest feature implemented using the terms component (when I had
just one core) and it's working fine as long as I have just one core.
It doesn't seem to work when I use shards.

Can someone let me know if terms component will work fine when using with
shards?

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Terms-component-with-shards-tp2035735p2035735.html


Re: only index synonyms

2010-12-07 Thread lee carroll
ok thanks for your response

To summarise the solution then:

To only index synonyms you must only send words that will match the synonym
list. If words without synonym matches are in the field to be indexed, these
words will be indexed. There is no way to avoid this using schema.xml config.

thanks lee c

On 7 December 2010 13:21, Erick Erickson  wrote:

> OK, the light finally dawns
>
> *If* you have a defined list of words to remove, you can put them in
> with your stopwords and add a stopword filter to the field in
> schema.xml.
>
> Otherwise, you'll have to do some pre-processing and only send to
> solr words you want. I'm assuming you have a list of valid words
> (i.e. the words in your synonyms file) and could pre-filter the input
> to remove everything else. In that case you don't need a synonyms
> filter since you're controlling the whole process anyway
>
> Best
> Erick
>
> On Tue, Dec 7, 2010 at 6:07 AM, lee carroll  >wrote:
>
> > Hi tom
> >
> > This seems to place in the index
> > This is a scenic line of words
> > I just want scenic and words in the index
> >
> > I'm not at a terminal at the moment but will try again to make sure. I'm
> > sure I'm missing the obvious
> >
> > Cheers lee
> > On 7 Dec 2010 07:40, "Tom Hill"  wrote:
> > > Hi Lee,
> > >
> > >
> > > On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
> > >  wrote:
> > >> Hi Erik
> > >
> > > Nope, Erik is the other one. :-)
> > >
> > >> thanks for the reply. I only want the synonyms to be in the index
> > >> how can I achieve that ? Sorry probably missing something obvious in
> the
> > >> docs
> > >
> > > Exactly what he said, use the => syntax. You've already got it. Add the
> > lines
> > >
> > > pretty => scenic
> > > text => words
> > >
> > > to synonyms.txt, and it will do what you want.
> > >
> > > Tom
> > >
> > >> On 7 Dec 2010 01:28, "Erick Erickson" 
> wrote:
> > >>> See:
> > >>>
> > >>
> >
> >
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> > >>>
> > >>> with the => syntax, I think that's what you're looking for
> > >>>
> > >>> Best
> > >>> Erick
> > >>>
> > >>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <
> > lee.a.carr...@googlemail.com
> > >>>wrote:
> > >>>
> >  Hi Can the following usecase be achieved.
> > 
> >  value to be analysed at index time "this is a pretty line of text"
> > 
> >  synonym list is pretty => scenic , text => words
> > 
> >  valued placed in the index is "scenic words"
> > 
> >  That is to say only the matching synonyms. Basically i want to
> produce
> > a
> >  normalised set of phrases for faceting.
> > 
> >  Cheers Lee C
> > 
> > >>
> >
>


Re: only index synonyms

2010-12-07 Thread Tom Hill
Hi Lee,

Sorry, I think Erick and I both thought the issue was converting the
synonyms, not removing the other words.

To keep only a set of words that match a list, use the
KeepWordFilterFactory, with your list of synonyms.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeepWordFilterFactory

I'd put the synonym filter first in your configuration for the field,
then the keep words filter factory.
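A sketch of such an analyzer chain in schema.xml (the field type name and word-list file names are assumptions; keepwords.txt would list the synonym target terms, e.g. scenic and words):

```xml
<fieldType name="synonyms_only" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- map source terms to their normalized synonyms first -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- then drop every token that is not in the keep list -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```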

Tom




On Tue, Dec 7, 2010 at 12:06 PM, lee carroll
 wrote:
> ok thanks for your response
>
> To summarise the solution then:
>
> To only index synonyms you must only send words that will match the synonym
> list. If words with out synonym ,atches are in the field to be indexed these
> words will be indexed. No way to avoid this by using schema.xml config.
>
> thanks lee c
>
> On 7 December 2010 13:21, Erick Erickson  wrote:
>
>> OK, the light finally dawns
>>
>> *If* you have a defined list of words to remove, you can put them in
>> with your stopwords and add a stopword filter to the field in
>> schema.xml.
>>
>> Otherwise, you'll have to do some pre-processing and only send to
>> solr words you want. I'm assuming you have a list of valid words
>> (i.e. the words in your synonyms file) and could pre-filter the input
>> to remove everything else. In that case you don't need a synonyms
>> filter since you're controlling the whole process anyway
>>
>> Best
>> Erick
>>
>> On Tue, Dec 7, 2010 at 6:07 AM, lee carroll > >wrote:
>>
>> > Hi tom
>> >
>> > This seems to place in the index
>> > This is a scenic line of words
>> > I just want scenic and words in the index
>> >
>> > I'm not at a terminal at the moment but will try again to make sure. I'm
>> > sure I'm missing the obvious
>> >
>> > Cheers lee
>> > On 7 Dec 2010 07:40, "Tom Hill"  wrote:
>> > > Hi Lee,
>> > >
>> > >
>> > > On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
>> > >  wrote:
>> > >> Hi Erik
>> > >
>> > > Nope, Erik is the other one. :-)
>> > >
>> > >> thanks for the reply. I only want the synonyms to be in the index
>> > >> how can I achieve that ? Sorry probably missing something obvious in
>> the
>> > >> docs
>> > >
>> > > Exactly what he said, use the => syntax. You've already got it. Add the
>> > lines
>> > >
>> > > pretty => scenic
>> > > text => words
>> > >
>> > > to synonyms.txt, and it will do what you want.
>> > >
>> > > Tom
>> > >
>> > >> On 7 Dec 2010 01:28, "Erick Erickson" 
>> wrote:
>> > >>> See:
>> > >>>
>> > >>
>> >
>> >
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>> > >>>
>> > >>> with the => syntax, I think that's what you're looking for
>> > >>>
>> > >>> Best
>> > >>> Erick
>> > >>>
>> > >>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <
>> > lee.a.carr...@googlemail.com
>> > >>>wrote:
>> > >>>
>> >  Hi Can the following usecase be achieved.
>> > 
>> >  value to be analysed at index time "this is a pretty line of text"
>> > 
>> >  synonym list is pretty => scenic , text => words
>> > 
>> >  valued placed in the index is "scenic words"
>> > 
>> >  That is to say only the matching synonyms. Basically i want to
>> produce
>> > a
>> >  normalised set of phrases for faceting.
>> > 
>> >  Cheers Lee C
>> > 
>> > >>
>> >
>>
>


Re: Terms component with shards?

2010-12-07 Thread Shawn Heisey

On 12/7/2010 12:53 PM, bbarani wrote:

Hi,

Will terms component work along with Shards?

I have 3 cores and I am using shards to to distributed search.


Yes - but not in Solr 1.4.x.  You'll need branch_3x or trunk.

https://issues.apache.org/jira/browse/SOLR-1177
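Once on such a version, a distributed terms request would look roughly like the sketch below; the host names, field, and handler path are assumptions (shards.qt must point at a handler that exposes the terms component):

```shell
# hosts, field name, and the /terms handler path are assumptions
curl "http://host1:8983/solr/terms?terms.fl=name&terms.prefix=ba&shards=host1:8983/solr,host2:8983/solr&shards.qt=/terms"
```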

Shawn



Re: highlighting encoding issue

2010-12-07 Thread Koji Sekiguchi

(10/12/08 1:12), getagrip wrote:

Hi,
when I query solr (trunk) I get "numeric character references" instead of 
regular UTF-8 strings in
case of special characters in the highlighting section, in the result section 
the characters are
presented fine.

e.g. instead of the German umlaut ä I get its numeric character reference (&#228;)

Example (the XML markup was stripped by the archive): the highlighting
section contains Vielfachmessger&#228;t while the result section contains
Vielfachmessgerät


Any hints are welcome.


It may be due to an HtmlEncoder setting in solrconfig.xml (the snippet was
stripped by the archive).

Try removing that setting, or use DefaultEncoder (a "null" encoder)
instead of HtmlEncoder.

Koji
--
http://www.rondhuit.com/en/


Spatial search - Solr 4.0

2010-12-07 Thread Jae Joo
Hi,

I am implementing spatial search and have found something odd. As I understand
it, returning the distance is still being implemented, so I implemented an
algorithm to calculate the actual distance based on the lat and long returned.
When I do, I find that the sort is not working properly. Anything I
missed?

Jae


Re: only index synonyms

2010-12-07 Thread lee carroll
That's ace, Tom.
Will give it a go, but it sounds spot on.
On 7 Dec 2010 20:49, "Tom Hill"  wrote:
> Hi Lee,
>
> Sorry, I think Erick and I both thought the issue was converting the
> synonyms, not removing the other words.
>
> To keep only a set of words that match a list, use the
> KeepWordFilterFactory, with your list of synonyms.
>
>
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeepWordFilterFactory
>
> I'd put the synonym filter first in your configuration for the field,
> then the keep words filter factory.
>
> Tom
>
>
>
>
> On Tue, Dec 7, 2010 at 12:06 PM, lee carroll
>  wrote:
>> ok thanks for your response
>>
>> To summarise the solution then:
>>
>> To only index synonyms you must only send words that will match the
synonym
>> list. If words with out synonym ,atches are in the field to be indexed
these
>> words will be indexed. No way to avoid this by using schema.xml config.
>>
>> thanks lee c
>>
>> On 7 December 2010 13:21, Erick Erickson  wrote:
>>
>>> OK, the light finally dawns
>>>
>>> *If* you have a defined list of words to remove, you can put them in
>>> with your stopwords and add a stopword filter to the field in
>>> schema.xml.
>>>
>>> Otherwise, you'll have to do some pre-processing and only send to
>>> solr words you want. I'm assuming you have a list of valid words
>>> (i.e. the words in your synonyms file) and could pre-filter the input
>>> to remove everything else. In that case you don't need a synonyms
>>> filter since you're controlling the whole process anyway
>>>
>>> Best
>>> Erick
>>>
>>> On Tue, Dec 7, 2010 at 6:07 AM, lee carroll <
lee.a.carr...@googlemail.com
>>> >wrote:
>>>
>>> > Hi tom
>>> >
>>> > This seems to place in the index
>>> > This is a scenic line of words
>>> > I just want scenic and words in the index
>>> >
>>> > I'm not at a terminal at the moment but will try again to make sure.
I'm
>>> > sure I'm missing the obvious
>>> >
>>> > Cheers lee
>>> > On 7 Dec 2010 07:40, "Tom Hill"  wrote:
>>> > > Hi Lee,
>>> > >
>>> > >
>>> > > On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
>>> > >  wrote:
>>> > >> Hi Erik
>>> > >
>>> > > Nope, Erik is the other one. :-)
>>> > >
>>> > >> thanks for the reply. I only want the synonyms to be in the index
>>> > >> how can I achieve that ? Sorry probably missing something obvious
in
>>> the
>>> > >> docs
>>> > >
>>> > > Exactly what he said, use the => syntax. You've already got it. Add
the
>>> > lines
>>> > >
>>> > > pretty => scenic
>>> > > text => words
>>> > >
>>> > > to synonyms.txt, and it will do what you want.
>>> > >
>>> > > Tom
>>> > >
>>> > >> On 7 Dec 2010 01:28, "Erick Erickson" 
>>> wrote:
>>> > >>> See:
>>> > >>>
>>> > >>
>>> >
>>> >
>>>
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>> > >>>
>>> > >>> with the => syntax, I think that's what you're looking for
>>> > >>>
>>> > >>> Best
>>> > >>> Erick
>>> > >>>
>>> > >>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <
>>> > lee.a.carr...@googlemail.com
>>> > >>>wrote:
>>> > >>>
>>> >  Hi Can the following usecase be achieved.
>>> > 
>>> >  value to be analysed at index time "this is a pretty line of
text"
>>> > 
>>> >  synonym list is pretty => scenic , text => words
>>> > 
>>> >  valued placed in the index is "scenic words"
>>> > 
>>> >  That is to say only the matching synonyms. Basically i want to
>>> produce
>>> > a
>>> >  normalised set of phrases for faceting.
>>> > 
>>> >  Cheers Lee C
>>> > 
>>> > >>
>>> >
>>>
>>


Warming searchers/Caching

2010-12-07 Thread Mark
Is there any plugin or easy way to auto-warm/cache a new searcher with a 
bunch of searches read from a file? I know this can be accomplished 
using the EventListeners (newSearcher, firstSearcher) but I'd rather not 
add 100+ queries to my solrconfig.xml.


If there is no hook/listener available, is there some sort of Handler 
that performs this sort of function? Thanks!


Re: Spatial search - Solr 4.0

2010-12-07 Thread Erick Erickson
What version of solr are you using? What is your configuration?
What query are you using?

Best
Erick

On Tue, Dec 7, 2010 at 5:40 PM, Jae Joo  wrote:

> Hi,
>
> I am implementing spatial search and found some odd things. As I know that
> the returning distance is still being implemented, so I have implement
> algorithm to calculate the actual distance based on lat and long returned.
> when I do it, I have found the sort is not working properly. Any thing I
> missed?
>
> Jae
>


Re: Warming searchers/Caching

2010-12-07 Thread Erick Erickson
Warning: I haven't used this personally, but Xinclude looks like what
you're after, see: http://wiki.apache.org/solr/SolrConfigXml#XInclude



Best
Erick

On Tue, Dec 7, 2010 at 6:33 PM, Mark  wrote:

> Is there any plugin or easy way to auto-warm/cache a new searcher with a
> bunch of searches read from a file? I know this can be accomplished using
> the EventListeners (newSearcher, firstSearcher) but I rather not add 100+
> queries to my solrconfig.xml.
>
> If there is no hook/listener available, is there some sort of Handler that
> performs this sort of function? Thanks!
>


custom ping response

2010-12-07 Thread Tri Nguyen
Can I have a custom xml response for the ping request?

thanks,

Tri

Re: Warming searchers/Caching

2010-12-07 Thread Markus Jelsma
XInclude works fine, but that's not what you're looking for, I guess. Having the 
top 100 queries is overkill anyway, and it can take too long for a new searcher 
to warm up.

Depending on the type of requests, I usually limit warming to popular 
filter queries only, as they generate a very high hit ratio and make caching 
useful [1].

If there are very popular user-entered queries with a high initial latency, 
I'd have them warmed up as well.

[1]: http://wiki.apache.org/solr/SolrCaching#Tradeoffs

> Warning: I haven't used this personally, but Xinclude looks like what
> you're after, see: http://wiki.apache.org/solr/SolrConfigXml#XInclude
> 
> 
> 
> Best
> Erick
> 
> On Tue, Dec 7, 2010 at 6:33 PM, Mark  wrote:
> > Is there any plugin or easy way to auto-warm/cache a new searcher with a
> > bunch of searches read from a file? I know this can be accomplished using
> > the EventListeners (newSearcher, firstSearcher) but I rather not add 100+
> > queries to my solrconfig.xml.
> > 
> > If there is no hook/listener available, is there some sort of Handler
> > that performs this sort of function? Thanks!


Re: customer ping response

2010-12-07 Thread Markus Jelsma
Of course! The ping request handler behaves like any other request handler and
accepts at least the wt parameter [1]. Use XSLT [2] to transform the output into
any desirable form, or use other response writers [3].

Why anyway, is it a load balancer that only wants an OK output or something?

[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter
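
A hypothetical XSLT sketch of the idea — the output element names are assumptions, but the input matches Solr's standard XML response (a responseHeader with a status int):

```xml
<!-- example.xsl: reduce any Solr XML response to a terse status document -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/response">
    <status>
      <xsl:choose>
        <xsl:when test="lst[@name='responseHeader']/int[@name='status'] = 0">ok</xsl:when>
        <xsl:otherwise>down</xsl:otherwise>
      </xsl:choose>
    </status>
  </xsl:template>
</xsl:stylesheet>
```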
> Can I have a custom xml response for the ping request?
> 
> thanks,
> 
> Tri


Re: customer ping response

2010-12-07 Thread Tri Nguyen
I need to return this:




Server
ok







From: Markus Jelsma 
To: solr-user@lucene.apache.org
Cc: Tri Nguyen 
Sent: Tue, December 7, 2010 4:27:32 PM
Subject: Re: customer ping response

Of course! The ping request handler behaves like any other request handler and 
accepts at last the wt parameter [1]. Use xslt [2] to transform the output to 
any desirable form or use other response writers [1].

Why anyway, is it a load balancer that only wants an OK output or something?

[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter
> Can I have a custom xml response for the ping request?
> 
> thanks,
> 
> Tri


Re: customer ping response

2010-12-07 Thread Markus Jelsma
Well, you can go a long way with XSLT, but I wouldn't know how to embed the
server name in the response, as Solr simply doesn't return that information.

You'd have to patch the response Solr is giving, or put a small script in front
that can embed the server name.
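
A minimal sketch of the "small script in front" idea, assuming Python and made-up element names; Solr's ping handler is only probed for an HTTP 200 here:

```python
import socket
import urllib.request
import xml.etree.ElementTree as ET

def build_custom_ping_xml(server_name: str, healthy: bool) -> str:
    """Build the custom response; the element names are assumptions."""
    root = ET.Element("response")
    ET.SubElement(root, "server").text = server_name
    ET.SubElement(root, "status").text = "ok" if healthy else "down"
    return ET.tostring(root, encoding="unicode")

def ping_solr(base_url: str = "http://localhost:8983/solr") -> str:
    """Probe Solr's ping handler and wrap the result with the host name."""
    try:
        with urllib.request.urlopen(base_url + "/admin/ping", timeout=2) as resp:
            healthy = resp.status == 200
    except OSError:
        healthy = False
    return build_custom_ping_xml(socket.gethostname(), healthy)
```

The wrapper, not Solr, supplies the server name, which sidesteps the problem above.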

> I need to return this:
> 
> 
> 
> 
> Server
> ok
> 
> 
> 
> 
> 
> 
> 
> From: Markus Jelsma 
> To: solr-user@lucene.apache.org
> Cc: Tri Nguyen 
> Sent: Tue, December 7, 2010 4:27:32 PM
> Subject: Re: customer ping response
> 
> Of course! The ping request handler behaves like any other request handler
> and accepts at last the wt parameter [1]. Use xslt [2] to transform the
> output to any desirable form or use other response writers [1].
> 
> Why anyway, is it a load balancer that only wants an OK output or
> something?
> 
> [1]: http://wiki.apache.org/solr/CoreQueryParameters
> [2]: http://wiki.apache.org/solr/XsltResponseWriter
> [3]: http://wiki.apache.org/solr/QueryResponseWriter
> 
> > Can I have a custom xml response for the ping request?
> > 
> > thanks,
> > 
> > Tri


Re: customer ping response

2010-12-07 Thread Tri Nguyen
Hi,

I'm reading the wiki.

What does q=apache mean in the url?

http://localhost:8983/solr/select/?stylesheet=&q=apache&wt=xslt&tr=example.xsl

thanks,

tri

 




From: Markus Jelsma 
To: Tri Nguyen 
Cc: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 4:35:28 PM
Subject: Re: customer ping response

Well, you can go a long way with xslt but i wouldn't know how to embed the 
server name in the response as Solr simply doesn't return that information.

You'd have to patch the response Solr's giving or put a small script in front 
that can embed the server name.

> I need to return this:
> 
> 
> 
> 
> Server
> ok
> 
> 
> 
> 
> 
> 
> 
> From: Markus Jelsma 
> To: solr-user@lucene.apache.org
> Cc: Tri Nguyen 
> Sent: Tue, December 7, 2010 4:27:32 PM
> Subject: Re: customer ping response
> 
> Of course! The ping request handler behaves like any other request handler
> and accepts at last the wt parameter [1]. Use xslt [2] to transform the
> output to any desirable form or use other response writers [1].
> 
> Why anyway, is it a load balancer that only wants an OK output or
> something?
> 
> [1]: http://wiki.apache.org/solr/CoreQueryParameters
> [2]: http://wiki.apache.org/solr/XsltResponseWriter
> [3]: http://wiki.apache.org/solr/QueryResponseWriter
> 
> > Can I have a custom xml response for the ping request?
> > 
> > thanks,
> > 
> > Tri


Re: customer ping response

2010-12-07 Thread Erick Erickson
That's the query term being sent to the server.

On Tue, Dec 7, 2010 at 8:50 PM, Tri Nguyen  wrote:

> Hi,
>
> I'm reading the wiki.
>
> What does q=apache mean in the url?
>
>
> http://localhost:8983/solr/select/?stylesheet=&q=apache&wt=xslt&tr=example.xsl
>
> thanks,
>
> tri
>
>
>
>
>
> 
> From: Markus Jelsma 
> To: Tri Nguyen 
> Cc: solr-user@lucene.apache.org
> Sent: Tue, December 7, 2010 4:35:28 PM
> Subject: Re: customer ping response
>
> Well, you can go a long way with xslt but i wouldn't know how to embed the
> server name in the response as Solr simply doesn't return that information.
>
> You'd have to patch the response Solr's giving or put a small script in
> front
> that can embed the server name.
>
> > I need to return this:
> >
> > 
> > 
> > 
> > Server
> > ok
> > 
> > 
> >
> >
> >
> >
> > 
> > From: Markus Jelsma 
> > To: solr-user@lucene.apache.org
> > Cc: Tri Nguyen 
> > Sent: Tue, December 7, 2010 4:27:32 PM
> > Subject: Re: customer ping response
> >
> > Of course! The ping request handler behaves like any other request
> handler
> > and accepts at last the wt parameter [1]. Use xslt [2] to transform the
> > output to any desirable form or use other response writers [1].
> >
> > Why anyway, is it a load balancer that only wants an OK output or
> > something?
> >
> > [1]: http://wiki.apache.org/solr/CoreQueryParameters
> > [2]: http://wiki.apache.org/solr/XsltResponseWriter
> > [3]: http://wiki.apache.org/solr/QueryResponseWriter
> >
> > > Can I have a custom xml response for the ping request?
> > >
> > > thanks,
> > >
> > > Tri
>


Re: customer ping response

2010-12-07 Thread Tom Hill
Hi Tri,

Well, I wouldn't really recommend this, but I just tried making a
custom XMLResponseWriter that wrote the response you wanted. So you can
use it with any request handler you want. Works fine, but it's pretty
hacky.

The downside is, you are writing code, and you have to modify
SolrCore. But it's trivial to do.

So, I wouldn't recommend it, but it was fun to play around with. :)

It's probably easier to fix the load balancer, which is almost
certainly just looking for any string you specify. Just change what
it's expecting. They are built so you can configure this.

Tom

On Tue, Dec 7, 2010 at 5:56 PM, Erick Erickson  wrote:
> That's the query term being sent to the server.
>
> On Tue, Dec 7, 2010 at 8:50 PM, Tri Nguyen  wrote:
>
>> Hi,
>>
>> I'm reading the wiki.
>>
>> What does q=apache mean in the url?
>>
>>
>> http://localhost:8983/solr/select/?stylesheet=&q=apache&wt=xslt&tr=example.xsl
>>
>> thanks,
>>
>> tri
>>
>>
>>
>>
>>
>> 
>> From: Markus Jelsma 
>> To: Tri Nguyen 
>> Cc: solr-user@lucene.apache.org
>> Sent: Tue, December 7, 2010 4:35:28 PM
>> Subject: Re: customer ping response
>>
>> Well, you can go a long way with xslt but i wouldn't know how to embed the
>> server name in the response as Solr simply doesn't return that information.
>>
>> You'd have to patch the response Solr's giving or put a small script in
>> front
>> that can embed the server name.
>>
>> > I need to return this:
>> >
>> > 
>> > 
>> > 
>> > Server
>> > ok
>> > 
>> > 
>> >
>> >
>> >
>> >
>> > 
>> > From: Markus Jelsma 
>> > To: solr-user@lucene.apache.org
>> > Cc: Tri Nguyen 
>> > Sent: Tue, December 7, 2010 4:27:32 PM
>> > Subject: Re: customer ping response
>> >
>> > Of course! The ping request handler behaves like any other request
>> handler
>> > and accepts at last the wt parameter [1]. Use xslt [2] to transform the
>> > output to any desirable form or use other response writers [1].
>> >
>> > Why anyway, is it a load balancer that only wants an OK output or
>> > something?
>> >
>> > [1]: http://wiki.apache.org/solr/CoreQueryParameters
>> > [2]: http://wiki.apache.org/solr/XsltResponseWriter
>> > [3]: http://wiki.apache.org/solr/QueryResponseWriter
>> >
>> > > Can I have a custom xml response for the ping request?
>> > >
>> > > thanks,
>> > >
>> > > Tri
>>
>


Re: Warming searchers/Caching

2010-12-07 Thread Mark

Maybe I should explain my problem a little more in detail.

The problem we are experiencing is that after a delta-import we notice an
extremely high load time on the slave machines that have just replicated. It
goes away after a minute or so of production traffic, once everything is cached.


I already have before/after hooks in place around replication. The before
hook removes the slave from the cluster and then starts the replication.
When it's done, it calls the after hook, and that is where I would like to
warm up the cache so no users experience extremely long wait times.
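
One way to do that from the after hook, sketched under assumptions (file name, Solr URL, and the 0-rows trick are all choices, not requirements), is to replay queries from a file against the slave before rejoining it:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def warm_urls(base_url, queries):
    """Build the request URLs used to warm the new searcher."""
    return [base_url + "/select?" + urlencode({"q": q, "rows": 0})
            for q in queries]

def warm(base_url="http://localhost:8983/solr", path="warm-queries.txt"):
    """Replay one query per line from the file; bodies are discarded,
    the point is only to populate the caches."""
    with open(path) as f:
        queries = [line.strip() for line in f if line.strip()]
    for url in warm_urls(base_url, queries):
        urlopen(url, timeout=30).read()
```

Running this between "replication done" and "rejoin cluster" means the first real users never hit a cold searcher.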


On 12/7/10 4:22 PM, Markus Jelsma wrote:

XInclude works fine, but that's not what you're looking for, I guess. Having the
top 100 queries is overkill anyway, and it can take too long for a new searcher
to warm up.

Depending on the type of requests, I usually limit warming to popular
filter queries only, as they generate a very high hit ratio and make caching
useful [1].

If there are very popular user-entered queries with a high initial latency,
I'd have them warmed up as well.

[1]: http://wiki.apache.org/solr/SolrCaching#Tradeoffs


Warning: I haven't used this personally, but Xinclude looks like what
you're after, see: http://wiki.apache.org/solr/SolrConfigXml#XInclude



Best
Erick

On Tue, Dec 7, 2010 at 6:33 PM, Mark  wrote:

Is there any plugin or easy way to auto-warm/cache a new searcher with a
bunch of searches read from a file? I know this can be accomplished using
the EventListeners (newSearcher, firstSearcher) but I rather not add 100+
queries to my solrconfig.xml.

If there is no hook/listener available, is there some sort of Handler
that performs this sort of function? Thanks!


Re: Terms component with shards?

2010-12-07 Thread bbarani

Hey Shawn,

Thanks for your reply.

I tried using the shards and shards.qt parameters, and it's working like a charm.

I included both of these in the Terms request handler, and it seems to
work fine even in Solr 1.4.
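
For reference, a sketch of such a handler configuration — the shard host names are placeholders:

```xml
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <!-- fan distributed term requests out to every shard -->
    <str name="shards">solr1:8983/solr,solr2:8983/solr</str>
    <str name="shards.qt">/terms</str>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```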

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Terms-component-with-shards-tp2035735p2038319.html
Sent from the Solr - User mailing list archive at Nabble.com.


complex boolean filtering in fq queries

2010-12-07 Thread Andy
I have a facet query that requires some complex boolean filtering. Something 
like:

fq=location:national OR (fq=location:CA AND fq=city:"San Francisco")

1) How do I turn the above filters into a REST query string?
2) Do I need the double quotes around "San Francisco"?
3) Will complex boolean filters like this substantially slow down query 
performance?

Thanks


  


Re: complex boolean filtering in fq queries

2010-12-07 Thread Andy
Forgot to add, my defaultOperator is "AND".

--- On Wed, 12/8/10, Andy  wrote:

> From: Andy 
> Subject: complex boolean filtering in fq queries
> To: solr-user@lucene.apache.org
> Date: Wednesday, December 8, 2010, 1:21 AM
> I have a facet query that requires
> some complex boolean filtering. Something like:
> 
> fq=location:national OR (fq=location:CA AND fq=city:"San
> Francisco")
> 
> 1) How do I turn the above filters into a REST query
> string?
> 2) Do I need the double quotes around "San Francisco"?
> 3) Will complex boolean filters like this substantially
> slow down query performance?
> 
> Thanks
> 
> 
>       
> 





Re: complex boolean filtering in fq queries

2010-12-07 Thread Tom Hill
For one thing, you wouldn't have fq= in there, except at the beginning.

fq=location:national OR (location:CA AND city:"San Francisco")

more below...

On Tue, Dec 7, 2010 at 10:25 PM, Andy  wrote:
> Forgot to add, my defaultOperator is "AND".
>
> --- On Wed, 12/8/10, Andy  wrote:
>
>> From: Andy 
>> Subject: complex boolean filtering in fq queries
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, December 8, 2010, 1:21 AM
>> I have a facet query that requires
>> some complex boolean filtering. Something like:
>>
>> fq=location:national OR (fq=location:CA AND fq=city:"San
>> Francisco")
>>
>> 1) How do I turn the above filters into a REST query
>> string?

Do you mean URL encoding it? You can just type your query into the
search box in the admin UI, and copy from the resulting URL.
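
For illustration, the corrected filter can be built and URL-encoded with a few lines of hypothetical client code:

```python
from urllib.parse import urlencode

# Tom's corrected filter: one fq parameter, boolean logic inside it
params = {
    "q": "foo",
    "facet": "on",
    "facet.field": ["location", "city"],  # doseq=True repeats the parameter
    "fq": 'location:national OR (location:CA AND city:"San Francisco")',
}
query_string = urlencode(params, doseq=True)
url = "http://localhost:8983/solr/select?" + query_string
```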

>> 2) Do I need the double quotes around "San Francisco"?

Yes. Otherwise it will be parsed as
(city:San) (Francisco)
which is probably not what you want.

>> 3) Will complex boolean filters like this substantially
>> slow down query performance?

That's not very complex, and the filter may be cached. Probably won't
be a problem.

Tom

>>
>> Thanks
>>
>>
>>
>>
>
>
>
>


Re: Index version on slave nodes

2010-12-07 Thread Tom Hill
Just off the top of my head, aren't you able to use a slave as a
repeater, so it's configured as both a master and a slave?

http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

This would seem to require that the slave return the same values as
its master for indexversion. What happens if you configure your slave
as a master, also? Does that get the behavior you want?
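
A sketch of a repeater configuration along the lines of that wiki page — the master URL, poll interval, and conf files are placeholders:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- master section: lets downstream slaves replicate from this node -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- slave section: this node itself pulls from the real master -->
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```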

Tom



On Tue, Dec 7, 2010 at 8:16 AM, Markus Jelsma
 wrote:
> Yes, I read that too in the replication request handler's source comments. But
> I would find it convenient if it just used the same values as we see using
> the details command.
>
> Any devs agree? Then i'd open a ticket for this one.
>
> On Tuesday 07 December 2010 17:14:09 Xin Li wrote:
>> I read it somewhere (sorry for not remembering the source).. the
>> indexversion command gets the "replicable" index version #. Since it
>> is a slave machine, so the result is 0.
>>
>> Thanks,
>>
>> On Tue, Dec 7, 2010 at 11:06 AM, Markus Jelsma
>>
>>  wrote:
>> > But why? I'd expect valid version numbers although the replication
>> > handler's source code seems to agree with you judging from the comments.
>> >
>> > On Monday 06 December 2010 17:49:16 Xin Li wrote:
>> >> I think this is expected behavior. You have to issue the "details"
>> >> command to get the real indexversion for slave machines.
>> >>
>> >> Thanks,
>> >> Xin
>> >>
>> >> On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma
>> >>
>> >>  wrote:
>> >> > Hi,
>> >> >
>> >> > The indexversion command in the replicationHandler on slave nodes
>> >> > returns 0 for indexversion and generation while the details command
>> >> > does return the correct information. I haven't found an existing
>> >> > ticket on this one although
>> >> > https://issues.apache.org/jira/browse/SOLR-1573 has
>> >> > similarities.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > --
>> >> > Markus Jelsma - CTO - Openindex
>> >> > http://www.linkedin.com/in/markus17
>> >> > 050-8536620 / 06-50258350
>> >
>> > --
>> > Markus Jelsma - CTO - Openindex
>> > http://www.linkedin.com/in/markus17
>> > 050-8536620 / 06-50258350
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>


Re: complex boolean filtering in fq queries

2010-12-07 Thread Andy

--- On Wed, 12/8/10, Tom Hill  wrote:
> 
> fq=location:national OR (location:CA AND city:"San
> Francisco")

> Do you mean URL encoding it? You can just type your query
> into the
> search box in the admin UI, and copy from the resulting
> URL.

Thanks Tom.

I wasn't referring to URL encoding. I was just unsure about the syntax of the 
fq filter. I didn't know how to express "AND", "OR" in queries or whether I 
could use parentheses in fq.

Let me make sure I got this right. So I could just do something like:

q=foo
&facet=on
&facet.field=location
&facet.field=city
&fq=location:national OR (location:CA AND city:"San Francisco")

And it would do what I want? Thanks.




  


Re: How to update solr index.

2010-12-07 Thread Anurag

Can you clarify your question?

-
Kumar Anurag

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-update-solr-index-tp2038480p2038580.html
Sent from the Solr - User mailing list archive at Nabble.com.