Re: Multi CPU Cores

2011-10-17 Thread Robert Brown
Where exactly do you set this up?  We're running Solr 3.4 under Tomcat,
OpenJDK 1.6.0_20.

btw, is the JRE just a different name for the VM?  Apologies for such a
newbie Java question.



On Sun, 16 Oct 2011 12:51:44 -0400, Johannes Goll
 wrote:
> we use the following in production
> 
> java -server -XX:+UseParallelGC -XX:+AggressiveOpts
> -XX:+DisableExplicitGC -Xms3G -Xmx40G -Djetty.port=<port>
> -Dsolr.solr.home=<solr home dir> -jar start.jar
> 
> more information
> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
> 
> Johannes



Solr indexing plugin: skip single faulty document?

2011-10-17 Thread samuele.mattiuzzo
Hi all, as far as I know, when Solr finds a faulty document (inside an XML
file containing, let's say, 1000 docs) it skips the whole file and the indexing
process exits with an exception (am I correct?)

I'm using a custom indexing plugin, and I can trap the exception. Instead of
using "default" values when that exception is raised, I would like to skip the
document raising the error (example: sometimes I try to insert a string
into a "string" field, but Solr exits saying it's expecting a multiValued
field... I guess it's because of some ASCII chars within the text, something
like \n or the sort), maybe logging it somewhere, and pass on to the next one.
We're indexing millions of them, and we don't care much if we lose 10-20%
of them, so the best solution is to skip the single faulty doc and continue
with the rest.

I guess I have to work on the super.processAdd() call, but I don't know
where I can find info about it. Can anybody help me? Is there a book on
advanced Solr plugin development I could read?

Thanks!
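
For reference, a minimal sketch of that idea against the Solr 3.x UpdateRequestProcessor API (the class name, logger, and "id" field are illustrative assumptions, and the factory wiring is omitted):

import java.io.IOException;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical processor: log and skip a faulty document instead of
// letting its exception abort the whole update request.
public class SkipFaultyDocProcessor extends UpdateRequestProcessor {
    private static final Logger log =
        LoggerFactory.getLogger(SkipFaultyDocProcessor.class);

    public SkipFaultyDocProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        try {
            super.processAdd(cmd); // hand the doc to the rest of the chain
        } catch (Exception e) {
            // "id" is an assumed uniqueKey field name; adjust to your schema.
            Object id = cmd.solrDoc.getFieldValue("id");
            log.warn("Skipping faulty doc {}: {}", id, e.getMessage());
        }
    }
}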



Re: multiple document types in a core

2011-10-17 Thread lee carroll
Just as a follow-up:

it looks like stored fields are stored verbatim for every doc.

hotel index: dest attributes indexed and stored
index size: 131M
number of records: 49147

hotel index: dest attributes indexed only
index size: 111M
number of records: 49147

~400 chars (bytes) of destination data * 49147 (number of hotel docs) = ~19M

basically everything is being stored

No difference in time to index (very rough and not scientific :-) )

So it does seem an ok strategy to denormalise docs with indexed fields
but normalise with stored fields?
Or have I missed some problems with this?

cheers lee c



On 16 October 2011 11:54, lee carroll  wrote:
> Hi Chris thanks for the response
>
>> It's an inverted index, so *terms* exist once (per segment) and those terms
>> "point" to the documents -- so having the same terms (in the same fields)
>> for multiple types of documents in one index is going to take up less
>> overall space than having distinct collections for each type of document.
>
> I'm not asking about the indexed terms but rather the stored values.
> By having two doc types are we gaining anything by "storing"
> attributes only for that doc type?
>
> cheers lee c
>


RE: document update / nested documents / document join

2011-10-17 Thread Kai Gülzau
Nobody?

SOLR-139 seems to be the most popular issue but I don't think it will be 
resolved in the near future (this year). Right?

So I will try SOLR-2272 as a workaround: split my documents up into "static" and
"frequently updated" parts and join them at query time.

What is the exact join query to do a query like "category:bugfixes AND 
body:answer"
  matching "category:bugfixes" in doc1 and
  matching "body:answer" in doc3
  with just returning "doc 1"??

I adapted the field names of
doc 3:
type: out
out_ticketid: 1001
out_body: this is my answer
out_category: other

q={!join+from=out_ticketid+to=ticketid}(category:bugfixes+OR+out_category:bugfixes)+AND+(body:answer+OR+out_body:answer)


Writing this, I doubt this syntax is even possible!?
Additionally I'm not sure if trunk with SOLR-2272 is "production ready".

The only way to do what I want in a released 3.x version is to do several 
searches and joining the results manually.
e.g. 
q=category:bugfixes -> doc1 -> ticketid: 1001
q=body:answer -> doc3 -> ticketid: 1001
-> result ticketid:1001
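
(As a rough illustration of that manual join, a SolrJ sketch; the core URL and the fixed row limit are assumptions, not from this setup:)

import java.util.HashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class ManualTicketJoin {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Collect the ticketids matching each half of the query, then intersect.
        Set<Object> bugfixTickets = ticketIds(server, "category:bugfixes");
        Set<Object> answerTickets = ticketIds(server, "body:answer");
        bugfixTickets.retainAll(answerTickets); // tickets matching both halves
        System.out.println("matching tickets: " + bugfixTickets);
    }

    private static Set<Object> ticketIds(SolrServer server, String q) throws Exception {
        SolrQuery query = new SolrQuery(q);
        query.setFields("ticketid");
        query.setRows(1000); // illustration only; real code would page through results
        Set<Object> ids = new HashSet<Object>();
        for (SolrDocument doc : server.query(query).getResults()) {
            ids.add(doc.getFieldValue("ticketid"));
        }
        return ids;
    }
}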

This way I would lose benefits like faceted search etc. :-\

Any suggestions?


Regards,

Kai Gülzau

-Original Message-
From: Kai Gülzau [mailto:kguel...@novomind.com] 
Sent: Thursday, October 13, 2011 4:52 PM
To: solr-user@lucene.apache.org
Subject: document update / nested documents / document join

Hi *,

I am a bit confused about the best way to achieve my requirements.

We have a mail ticket system. A ticket is created when a mail is received by 
the system:

doc 1:
uid: 1001_in
ticketid: 1001
type: in
body: I have a problem
category: bugfixes
date: 201110131955

This incoming document is static. While the ticket is in progress there is 
another document representing the current/last state of the ticket. Some fields 
of this document are updated frequently:

doc 2:
uid: 1001_out
ticketid: 1001
type: out
body:
category: bugfixes
date: 201110132015

a bit later (doc 2 is deleted/updated):
doc 3:
uid: 1001_out
ticketid: 1001
type: out
body: this is my answer
category: other
date: 201110140915

I would like to do a boolean search spanning multiple documents like 
"category:bugfixes AND body:answer".

I think it's the same what was proposed by:
http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

So I dug into the depths of the Lucene and Solr tickets and now I am stuck 
choosing the "right" way:

https://issues.apache.org/jira/browse/LUCENE-2454 Nested Document query support
https://issues.apache.org/jira/browse/LUCENE-3171 BlockJoinQuery/Collector
https://issues.apache.org/jira/browse/LUCENE-1879 Parallel incremental indexing
https://issues.apache.org/jira/browse/SOLR-139 Support updateable/modifiable 
documents
https://issues.apache.org/jira/browse/SOLR-2272 Join


If it were easily possible to update one field in a document I would just merge 
the two logical documents into one representing the whole ticket. But I can't 
see that this is possible yet.

SOLR-2272 seems to be the best solution for now, but it feels like a workaround:
"I can't update a document field, so I split it up into static and dynamic 
content and join both at query time."

SOLR-2272 is committed to trunk/solr 4.
Are there any planned release dates for solr 4 or a possible backport for 
SOLR-2272 in 3.x?


I would appreciate any suggestions.

Regards,

Kai Gülzau







millions of records problem

2011-10-17 Thread Jesús Martín García

Hi,

I've got 500 million documents in Solr, each with the same number
of fields and similar width. The version of Solr I use is 1.4.1
with Lucene 2.9.3.

I don't have the option to use shards, so the whole index has to be on a
single machine...

The size of the index is about 50 GB and the RAM is 8 GB... Everything is
working, but the searches are very slow, although I tried different
configurations in solrconfig.xml such as:

- configure a firstSearcher with the most used searches
- configure the caches (query, filter and document) with large sizes...

but everything still works slowly, so do you have any ideas to
speed up the searches without the penalty of using much more RAM?


Thanks in advance,

Jesús

--
...
  __
/   /   Jesús Martín García
C E / S / C A   Tècnic de Projectes
  /__ / Centre de Serveis Científics i Acadèmics de Catalunya

Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 551 6213 · F. 93 205 6979 · jmar...@cesca.cat
...



Re: Performance issue: Frange with geodist()

2011-10-17 Thread roySolr
Hi Yonik,

I have used your suggestion to implement a better radius search:

&facet.query={!geofilt d=10 key=d10}
&facet.query={!geofilt d=20 key=d20}
&facet.query={!geofilt d=50 key=d50} 

It is a little bit faster than with geodist(), but still a bottleneck I
think.



Re: millions of records problem

2011-10-17 Thread Jan Høydahl
Hi,

What exactly do you mean by "slow" search? 1s? 10s?
Which operating system, how many CPUs, which servlet container, and how much RAM 
have you allocated to your JVM? (-Xmx)
What kind and size of docs? Your numbers indicate about 100 bytes per doc?
What kind of searches? Facets? Sorting? Wildcards?
Have you tried to "slim down" your schema by setting indexed="false" and 
stored="false" wherever possible?

First thought is that it's really impressive if you've managed to get 500 million 
docs into one index with only 8 GB RAM!! I would expect that to fail or, best 
case, be veery slow. If you have a beefy server I'd first try putting in 64 GB 
RAM, slimming down your schema, and perhaps even switching to Solr 4.0 (trunk), 
which is more RAM efficient.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 17. okt. 2011, at 12:19, Jesús Martín García wrote:

> [...]



Re: document update / nested documents / document join

2011-10-17 Thread Thijs

Hi,

First, I'm not sure you know, but the join isn't like a join in a database;
it's more like
  select * from (set of documents that match query)
  where exists (set of documents that match join query)

I have some complex queries (multiple join fq's) in one call and they are fine, 
so I think this query may work too.

otherwise you could try something like:
q=*:*&fq={!join+from=out_ticketid+to=ticketid}(category:bugfixes+OR+out_category:bugfixes)&fq={!join+from=out_ticketid+to=ticketid}(body:answer+OR+out_body:answer)

My wish would also be that this were backported to 3.x. But if not 
we'll probably go live on 4.x.


Thijs
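
For what it's worth, the same request issued from SolrJ might look roughly like this (hypothetical core URL; the '+' signs in the URL above are encoded spaces, written out here):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class JoinQueryExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        // Each filter joins the "out" documents back onto the ticket via ticketid.
        q.addFilterQuery("{!join from=out_ticketid to=ticketid}(category:bugfixes OR out_category:bugfixes)");
        q.addFilterQuery("{!join from=out_ticketid to=ticketid}(body:answer OR out_body:answer)");

        System.out.println("found: " + server.query(q).getResults().getNumFound());
    }
}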


On 17-10-2011 11:46, Kai Gülzau wrote:

[...]



Re: Multi CPU Cores

2011-10-17 Thread Johannes Goll
Yes, same thing. This was for the Jetty servlet container, not Tomcat. I would 
refer to the Tomcat documentation on how to modify/configure the Java runtime 
environment (JRE) arguments for your running instance.
Johannes

On Oct 17, 2011, at 4:01 AM, Robert Brown  wrote:

> [...]


RE: document update / nested documents / document join

2011-10-17 Thread Kai Gülzau
I just found another feature/ticket for being able to update fields:
https://issues.apache.org/jira/browse/SOLR-2753
https://issues.apache.org/jira/browse/LUCENE-1231

-> CSF (Column Stride Fields)

This should work well with simple fields like category/date/...!?

So I have 2 options:
1.)
Introduce rather complex logic on the client side to form the right join query 
(or do the join manually),
which should, as you stated, work even with complex queries.

2.)
Or do it the straightforward way: combine all docs into one and WAIT for one of 
the various "update field/doc"
features to be realized.


I think I'll give 1.) a try and wait for 2.) if I get into trouble.


Regards,

Kai Gülzau
  

-Original Message-
From: Thijs [mailto:vonk.th...@gmail.com] 
Sent: Monday, October 17, 2011 1:22 PM
To: solr-user@lucene.apache.org
Subject: Re: document update / nested documents / document join

[...]

Re: millions of records problem

2011-10-17 Thread Nick Veenhof
You could use this technique; I'm currently reading up on it:
http://khaidoan.wikidot.com/solr-common-gram-filter


On 17 October 2011 12:57, Jan Høydahl  wrote:
> [...]


Getting errors thrown from sun.nio.ch.FileDispatcher with native or simple or single lock. Please, I need help in resolving the issue.

2011-10-17 Thread Anitha Muppural (amuppura)
Hi,

My name is Anitha Muppural and I work as a software engineer at Cisco.

I have Solr/Lucene 3.3.0 installed, with a single core.
The Solr web application is deployed to WebSphere Application Server 6.1
in a cluster (2 JVMs).
I do delta imports programmatically using SolrJ and an application listener
as per the instructions given here:
http://wiki.apache.org/solr/DataImportHandler#Scheduling

1.   Here are my solrconfig.xml snippets:

  <lockType>simple</lockType>

  1
  6
  6

  <requestHandler name="/qis/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">qiscore-data-config.xml</str>
    </lst>
  </requestHandler>

2.   Here is the code where I call the delta import inside a timer task:

SolrServer server = new CommonsHttpSolrServer(solrCore);
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/qis/dataimport");
params.set("command", "delta-import");
params.set("commit", "true");
params.set("clean", "false");
params.set("optimize", "true");
QueryResponse factSummaryResponse = server.query(params);

3.   I have set the timer to run every hour.
Once the delta import is done I get this error intermittently. I have
to restart the Solr war for it to go away.

4.   I installed Solr/Lucene in our development environment and
deployed the Solr war to WAS, but with a single JVM. I do not see this
error there. Not even once.

5.   The differences between my development and stage environments are:

1.   a single JVM versus multiple JVMs;

2.   the owner of the files created in the 2 environments differs in
the sense that development is set to my userid but stage is set to a
generic id.

I appreciate all your help.

Regards,

Anitha Muppural



 



Re: Getting errors thrown from sun.nio.ch.FileDispatcher with native or simple or single lock. Please, I need help in resolving the issue.

2011-10-17 Thread Otis Gospodnetic
Anitha,

I don't know about others, but your image didn't come through.  Try describing 
and pasting the text of the error instead.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Anitha Muppural (amuppura) 
>To: solr-user@lucene.apache.org
>Sent: Monday, October 17, 2011 8:57 AM
>Subject: Getting errors thrown from sun.nio.ch.FileDispatcher  with native or 
>simple or single lock .Please , i need help in resolving the issue.
>[...]

Re: Multi CPU Cores

2011-10-17 Thread Otis Gospodnetic
Robert,

You have to add (some of) that stuff to the command for starting Java/Tomcat.  
Likely in a catalina.sh script.

That said, I do NOT recommend you use those parameters at all because they may 
be completely unneeded or even unsuitable for your environment.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




>
>From: Robert Brown 
>To: solr-user@lucene.apache.org
>Sent: Monday, October 17, 2011 4:01 AM
>Subject: Re: Multi CPU Cores
>
>[...]

Re: millions of records problem

2011-10-17 Thread Otis Gospodnetic
Hi Jesús,

Others have already asked a number of relevant questions.  If I had to guess, 
I'd guess this is simply a disk IO issue, but of course there may be room for 
improvement without getting more RAM or SSDs, so tell us more about your 
queries, about disk IO you are seeing, etc.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Jesús Martín García 
>To: solr-user@lucene.apache.org
>Sent: Monday, October 17, 2011 6:19 AM
>Subject: millions of records problem
>
>[...]

Re: Multi CPU Cores

2011-10-17 Thread Robert Brown
Thanks Otis,

I certainly won't be copying & pasting - good to know such options are
available though.



On Mon, 17 Oct 2011 07:01:24 -0700 (PDT), Otis Gospodnetic
 wrote:
> [...]



Re: Question about near query order

2011-10-17 Thread Ahmet Arslan
> I have some near query like "analyze term"~2.
> That is matched in that order.
> But I want to search regardless of order.
> So far, I just queried "analyze term"~2 OR "term
> analyze"~2.
> Is there a better way than what i did?

I think PhraseQuery should be unordered with slop values greater than 0
(but it is ordered with a slop value of 0).
The following two queries should return the same result set:

"analyze term"~2 
"term analyze"~2

Isn't that the case for you?
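
A quick way to test this locally is a self-contained Lucene 3.x snippet like the following (the field name and analyzer are arbitrary choices for the demo):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SlopOrderTest {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(
            Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33)));
        Document doc = new Document();
        // Index the terms in reversed order relative to the query below.
        doc.add(new Field("body", "term analyze", Field.Store.NO, Field.Index.ANALYZED));
        w.addDocument(doc);
        w.close();

        IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("body", "analyze"));
        pq.add(new Term("body", "term"));
        pq.setSlop(2); // a transposition of adjacent terms costs 2, so slop >= 2 matches
        System.out.println("hits: " + searcher.search(pq, 10).totalHits); // expect 1
        searcher.close();
    }
}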



feeding while solr is running ?

2011-10-17 Thread lorenlai
Hello Solr experts,

I'm a newbie regarding Solr.

1) I would like to know if it is possible to import data (feeding) while
Solr is still running?
Or should Solr be "shut down" before I can start my feeding process?

Any LINKS regarding this topic? :-)

2) How can I import my data into the index? Via HTTP? And is it
possible to "automate" this feeding process?

Any LINKS regarding this topic? :-)

3) Is it possible to write a "Batch-Loader" (batch jobs) which imports the
data into the index?

Any LINKS regarding this topic? :-)

Thank you very much in advance. ;-)

with best regards

Loren




Re: feeding while solr is running ?

2011-10-17 Thread Alireza Salimi
Well,

I'm not a Solr expert, but the first thing that you should
start reading is the Solr tutorial and then Solr wiki.
It won't take long to read both of them.

Regarding your questions:
1) It's possible
2 and 3) There are different ways to update (HTTP or Java). You can create a
CRON job to send the HTTP Command.

On Mon, Oct 17, 2011 at 11:15 AM, lorenlai  wrote:

> [...]



-- 
Alireza Salimi
Java EE Developer


Query with star returns double type values equal 0

2011-10-17 Thread romain
Hello,

I am experiencing an unexpected behavior using solr 3.4.0.

if my query includes a star, all the properties of type 'long' or 'LatLon'
have 0 as value
(ex: select/?start=0&q=way*&rows=10&version=2)

Though the same request without stars returns correct values
(ex: select/?start=0&q=way&rows=10&version=2)

Does anyone have an idea?

Romain.



Re: feeding while solr is running ?

2011-10-17 Thread Robert Stewart
See below...

On Oct 17, 2011, at 11:15 AM, lorenlai wrote:

> 1) I would like to know if it is possible to import data (feeding) while
> Solr is still running ?

Yes.  You can search and index new content at the same time.  But typically in 
production systems you may have one or more "master" SOLR instances accepting 
new documents, and then set up SOLR replication with multiple "slave" instances 
behind a load balancer in order to handle search requests.

> 2) How can I import my data as into the index ? Via HTTP ? And is it
> possible to "automate" this feeding process ?

You can post new documents via HTTP POST (as single documents, or as a batch of 
documents).
You can also use a data import handler (DIH) to pull data from some repository 
such as a SQL database, and then initiate such imports via HTTP request called 
by a cron job, etc. 

> 
> Any LINKS regarding to this topic? :-)

http://wiki.apache.org/solr/UpdateXmlMessages
http://wiki.apache.org/solr/DataImportHandler

> 
> 3) Is it possible to write a "Batch-Loader" (Batch Jobs) which import the
> data into the index ?

Yes.  You can use any of the available SOLR clients from some program or script 
(solrj for java, various python clients, etc.).  Or you could write some java 
code that embeds SOLR (or even Lucene) directly to build indexes.  There are 
many possibilities in that case.

> 
> Any LINKS regarding to this topic? :-)

http://wiki.apache.org/solr/IntegratingSolr
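
As a tiny illustration of the HTTP/Java route, a SolrJ sketch for feeding a running instance (URL and field names are made up for the example):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FeedExample {
    public static void main(String[] args) throws Exception {
        // Point at a running Solr instance; no shutdown is needed to index.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "fed while Solr is running");
        server.add(doc);  // a batch loader would pass many docs per add() call
        server.commit();  // make the new doc visible to searches
    }
}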



Re: millions of records problem

2011-10-17 Thread Vadim Kisselmann
Hi,
a number of relevant questions has been asked already.
I have another one:
what type of docs do you have? Do you add new docs every day, or is it
a stable number of docs (500 Mio.)?
What about replication?

Regards Vadim


2011/10/17 Otis Gospodnetic 

> [...]


Re: NRT and replication

2011-10-17 Thread Esteban Donato
thanks Yonik.  Any idea of when this should be completed?  In the
meantime I think I will have to add docs to every replica, possibly
implementing an update processor.  Something similar to SOLR-2355?

On Fri, Oct 14, 2011 at 7:31 PM, Yonik Seeley
 wrote:
> On Fri, Oct 14, 2011 at 5:49 PM, Esteban Donato
>  wrote:
>>  I found soft commits very useful for NRT search requirements.
>> However I couldn't figure out how replication works with this feature.
>>  I mean, if I have N replicas of an index for load balancing purposes,
>> when I soft commit a doc in one of this nodes, is there any way that
>> those "in-memory" docs get replicated to the rest of replicas?
>
> Nope.  Index replication isn't really that compatible with NRT.
> But the new distributed indexing features we're working on will be!
> The parent issue for this effort is SOLR-2358.
>
> -Yonik
> http://www.lucene-eurocon.com - The Lucene/Solr User Conference
>


OS Cache - Solr

2011-10-17 Thread Sujatha Arun
Hello

I am trying to understand the OS cache utilization of Solr. Our setup has
several Solr instances on one server. The total combined index size of all
instances is about 14 GB, and the size of the largest single index is about 2.5
GB.

Our server has a quad processor with 32 GB RAM, of which 20 GB has been
assigned to the JVM. We are running Solr 1.3 on Tomcat 5.5 and Java 1.6.

Our current statistics indicate that Solr uses 18-19 GB of the 20 GB RAM
assigned to the JVM. However, the free physical memory seems to remain
constant, as below:
Free physical memory = 163 MB
Total physical memory = 32,232 MB

The server also serves as a backup server for MySQL, where the application DB
is backed up and restored. During this activity we see a lot of queries
that take even 10+ minutes to execute, but otherwise the
maximum query time is less than 1-2 secs.

Why does the free physical memory stay constant, and how will it be shared
between the MySQL backup and Solr while the backup activity is happening?
How much free physical memory should be available to the OS given our stats?

Any pointers would be helpful.

Regards
Sujatha


Re: Callback on starting solr?

2011-10-17 Thread Jithin
How do I configure solr with a ping request?
http://localhost:8983/solr/admin/ping/ gives HTTP 404.

On Mon, Oct 17, 2011 at 1:06 AM, Jan Høydahl / Cominvent [via Lucene] <
ml-node+s472066n3426539...@n3.nabble.com> wrote:

> Your app-server will start listening to the port some time before the Solr
> webapp is ready, so you should check directly with Solr. You could also use
> JMX to check Solr's status. If you want help with your reindex failing
> issue, please provide more context. 25Mb is very low, please try give your
> VM more memory and see if indexing succeeds then.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 16. okt. 2011, at 20:38, Jithin wrote:
>
> > I am doing something similar to that. checking netstat for any connection
> on
> > port. Wanted to know if there is anything solr can do built in.
> >
> > Also I notice that my reindex is failing when I have to reindex some 7k+
> > docs. Solr is giving error in logs -
> >
> >
> > Caused by: java.net.SocketException: Broken pipe
> >at java.net.SocketOutputStream.socketWrite0(Native Method)
> >at
> > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> >at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> >at
> org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
> >at
> org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
> >at
> org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
> >at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
> >... 25 more
> >
> > 2011-10-16 18:05:05.431:WARN::Committed before 500
> > null||org.mortbay.jetty.EofException
> >     at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
> >     at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
> >     at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
> >     at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:296)
> >     at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
> >     at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
> >     at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
> >     at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >     at org.mortbay.jetty.Server.handle(Server.java:326)
> >     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
> >     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
> >     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> >     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > Caused by: java.net.SocketException: Broken pipe
> >     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> >     at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> >     at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
> >     at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
> >     at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
> >     at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
> >     ... 25 more
> > 2011-10-16 18:05:05.432:WARN::/solr/core0/update/
> > java.lang.IllegalStateException: Committed
> >
> >
> > Is it a case where solr is not able to handle the load? Currently solr is
> > running with a max memory setting of 25MB. All the docs are very small. Each
> > one contains just a few words.
> >
> > On Sun, Oct 16, 2011 at 11:52 PM, Jan Høydahl / Cominvent [via Lucene] <
> > [hidden email] >
> wrote:
> >
> >> Hi,
> >>
> >> This depends on your application server and

RE: Error loading class 'solr.extraction.ExtractingRequestHandler'

2011-10-17 Thread Jaeger, Jay - DOT
It sounds like maybe you either have not told Solr where the Solr home 
directory is or, more likely, have not copied the jar files for this 
particular class into the right directory (typically a "lib" directory), so 
Tomcat cannot find that class.  There is other correspondence on this list that 
you can look for that discusses the options for defining the Solr home 
directory.

JRU

-Original Message-
From: Sina Fakhraee [mailto:dx3...@wayne.edu] 
Sent: Wednesday, October 12, 2011 4:27 PM
To: solr-user@lucene.apache.org
Subject: Error loading class 'solr.extraction.ExtractingRequestHandler'

Dear Sir/Mam,

I am trying to use curl 
"http://localhost:8080/solr/update/extract?literal.id=doc1&commit=true" -F 
"myfile=@somefile.pdf" from the wiki site... but I get this error: 
Caused by: org.apache.solr.common.SolrException: Error loading class 
'solr.extraction.ExtractingRequestHandler'

With the jetty and the provided example, I have no problem. It all happens when 
I use tomcat and solr.

My setup is as follows: 

I downloaded apache-solr-3.3.0 and unpacked it. I am using the 
"apache-solr-3.3.0" folder as my solr-home folder. Inside the "dist" folder I 
have apache-solr-3.3.0.war, and I copied everything from 
contrib/extraction/lib into dist.

I would greatly appreciate it if you can possibly point me to the right 
direction. I have read everything on the wiki page and the documentation but no 
luck!

regards,
Sina


-- 


Sina Fakhraee , PhD  candidate 
Department of Computer Science 
Wayne State University 
5057 Woodward Avenue 
3rd floor, Suite 3105 
Detroit, Michigan 48202 
(517)974-8437(Cell) 
http://uwerg.cs.wayne.edu/ShowPage.aspx?node=0c5b13ef-2d8e-4abd-a216-a2037d947b63&acc=1
 



RE: Xsl for query output

2011-10-17 Thread Jaeger, Jay - DOT
It depends upon whether you want Solr to do the XSL processing, or the browser. 
 After fussing a bit, and doing some reading and thinking, we decided it was 
best to let the browser do the work, at least in our case.

If the browser is doing the processing, you don't need to modify solrconfig.xml 
for that, and where you put it depends a bit on where the HTML page that points 
to the xsl came from.

What we did was create a separate directory called "search" in the webapp 
directory (parallel to admin, WEB-INF and the like).  In there we placed three 
things:  our HTML page, our .css file and our xsl.  (That way, when Solr 
updates, we know exactly how to handle it).

The HTML page refers to the xsl thusly:

  

With a value of

/solr/search/ourXSLStyleSheet.xsl

(We use javascript to generate the HTML, so it isn't in the tag initially).

We started with a copy of example.xsl and modified it *extensively*.  If you 
are not a programmer, trying to edit the xsl to produce what you want may be 
more adventure than you want to tackle.

JRU


-Original Message-
From: Jeremy Cunningham [mailto:jeremy.cunningham.h...@statefarm.com] 
Sent: Thursday, October 13, 2011 2:43 PM
To: solr-user@lucene.apache.org
Subject: Xsl for query output

I am new to solr and not a web developer.  I am a data warehouse guy trying to 
use solr for the first time.  I am familiar with xsl but I can't figure out how 
to get the example.xsl to be applied to my xml results.  I am running tomcat 
and have solr working.  I copied over the solr multiple core example to the 
conf directory on my tomcat server. I also added the war file and the search is 
fine.  I can't seem to figure out what I need to add to the solrconfig.xml or 
where ever so that the example.xsl is used.  Basically can someone tell me 
where to put the xsl and where to configure its usage?

Thanks


Re: OS Cache - Solr

2011-10-17 Thread Jan Høydahl
Hi Sujatha,

Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or 
similar? Try with 15Gb and see how it goes. The reason why this is beneficial 
is that you WANT your OS to have available memory for disk caching. If you have 
17Gb free after starting Solr, your OS will be able to cache all index files in 
memory and you get very high search performance. With your current settings, 
there is only 12Gb free for both caching the index and for your MySql 
activities.  Chances are that when you backup MySql, the cached part of your 
Solr index gets flushed from disk caches and need to be re-cached later.

How to interpret memory stats varies between OSes, and seeing 163 MB free may 
simply mean that your OS has used most RAM for various caches and paging, but 
will flush it once an application asks for more memory. Have you seen 
http://wiki.apache.org/solr/SolrPerformanceFactors ?

You should also slim down your index maximally by setting stored=false and 
indexed=false wherever possible. I would also upgrade to a more current Solr 
version.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 17. okt. 2011, at 19:51, Sujatha Arun wrote:

> [...]



Re: Callback on starting solr?

2011-10-17 Thread Jan Høydahl
Check your solrconfig.xml to see whether your ping handler is configured: 
http://wiki.apache.org/solr/SolrConfigXml#The_Admin.2BAC8-GUI_Section
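
Once the ping handler answers, client code can poll it to detect startup; a minimal SolrJ sketch (URL and retry interval are illustrative):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class WaitForSolr {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Poll the ping handler until Solr is actually ready to serve requests.
        while (true) {
            try {
                if (server.ping().getStatus() == 0) break; // 0 means the ping succeeded
            } catch (Exception e) {
                // not up yet (connection refused, or 404 until the webapp deploys)
            }
            Thread.sleep(1000);
        }
        System.out.println("Solr is up");
    }
}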

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 17. okt. 2011, at 20:07, Jithin wrote:

> How do I configure solr with a ping request?
> http://localhost:8983/solr/admin/ping/ gives HTTP 404.
> 
> [...]

Transformer is applied on deltaQuery rather than deltaImportQuery

2011-10-17 Thread Jeff Zhang
Hi all,

I have a custom transformer, and I am confused that the custom
transformer is applied to the deltaQuery rather than the deltaImportQuery.
My understanding is that Solr first executes the deltaQuery and then the
deltaImportQuery, so I think the output format of the deltaImportQuery should be
the same as the output of the query (full import).

Could anyone help me figure out what's wrong here? Thanks



-- 
Best Regards

Jeff Zhang


Can you please guide me through a step-by-step installation of Solr Cell?

2011-10-17 Thread Sina Fakhraee
Dear Sir/Mam,

I am trying to use curl
"http://localhost:8080/solr/update/extract?literal.id=doc1&commit=true" -F
"myfile=@somefile.pdf" from the wiki site, but I get an error:
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.extraction.ExtractingRequestHandler'

With Jetty and the provided example I have no problem. It only happens when
I use Tomcat and Solr.

My setup is as follows: 

I downloaded apache-solr-3.3.0 and unpacked it. I am using the
"apache-solr-3.3.0" folder as my Solr home folder. Inside the "dist" folder I
have apache-solr-3.3.0.war, and I copied everything from
contrib/extraction/lib into dist.
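
For reference, a sketch of the solrconfig.xml pieces this setup typically
needs; the relative lib paths are an assumption based on the layout described
above and must resolve, relative to the core, to wherever the jars actually
live:

  <!-- make the extraction contrib jars visible to the core -->
  <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
  <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />

  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">text</str>
    </lst>
  </requestHandler>

An "Error loading class" message usually means the <lib/> directives are
missing from solrconfig.xml or point at the wrong directory.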

I would greatly appreciate it if you could point me in the right direction. I
have read everything on the wiki page and the documentation, but no luck!

Can you please guide me through a step-by-step installation and usage of Solr Cell?

regards,
Sina

-- 


Sina Fakhraee , PhD  candidate 
Department of Computer Science 
Wayne State University 
5057 Woodward Avenue 
3rd floor, Suite 3105 
Detroit, Michigan 48202 
(517)974-8437(Cell) 
http://uwerg.cs.wayne.edu/ShowPage.aspx?node=0c5b13ef-2d8e-4abd-a216-a2037d947b63&acc=1
 




Re: Query with star returns double type values equal 0

2011-10-17 Thread Ahmet Arslan
> I am experiencing an unexpected behavior using solr 3.4.0.
> 
> if my query includes a star, all the properties of type
> 'long' or 'LatLon'
> have 0 as value
> (ex: select/?start=0&q=way*&rows=10&version=2)
> 
> Though the same request without stars returns correct
> values
> (ex: select/?start=0&q=way&rows=10&version=2)
> 
> Does anyone have an idea?

Please keep in mind that wildcard queries are not analyzed. 

What query parser are you using? lucene, dismax, edismax?
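
To illustrate the point (a hypothetical field whose index-time analyzer
lowercases, so a document containing "Wayside" is indexed as the term
"wayside"):

  q=Wayside  -> analyzed (lowercased) at query time; matches
  q=Way*     -> not analyzed; looks for raw terms starting with "Way"; no match
  q=way*     -> matches terms starting with "way", e.g. "wayside"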




score based on unique words matching

2011-10-17 Thread Craig Stadler

Here's my problem:

field1 (text) - subject
q=david bowie changes

Problem: if a record mentions david bowie a lot, it beats out something
more relevant (with more unique matches)...


A. (now appearing david bowie at the cineplex 7pm david bowie goes on 
stage, then mr. bowie will sign autographs)

B. song :david bowie - changes

(A) ends up more relevant because of the term frequency and the number of
words in it... not cool.

I want the number of distinct query words matched to trump term
density/weight.

Thanks, I'm a newbie.
-Craig 
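
One common fix, sketched against the Lucene/Solr 3.x API (the class and
package names below are made up): subclass DefaultSimilarity so term
frequency stops rewarding repetition, which lets the coord factor, the part
of the score that rewards matching more distinct query terms, dominate.

  import org.apache.lucene.search.DefaultSimilarity;

  /**
   * Caps the term-frequency factor: a document repeating "bowie" many
   * times scores no higher per term than one mentioning it once, so
   * documents matching more distinct query words win via coord().
   */
  public class UniqueWordSimilarity extends DefaultSimilarity {
      @Override
      public float tf(float freq) {
          return freq > 0 ? 1.0f : 0.0f;
      }
  }

Registered globally in schema.xml:

  <similarity class="com.example.UniqueWordSimilarity"/>

Setting omitNorms="true" on the field may also be worth a try, so that
document length stops influencing the score.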




Re: Selective Result Grouping

2011-10-17 Thread entdeveloper
Not necessarily collapse.type=adjacent. That only applies when two docs with
the same field value appear next to each other. I'm more concerned with the
case where we only want to group documents of a certain type (no matter where
the subsequent docs appear), leaving the rest of the documents ungrouped.

The current grouping functionality using group.field is basically
all-or-nothing: either all documents are grouped by the field value or none
are. So there is no way to, for example, collapse just the videos or images
the way Google does.
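
For reference, the existing field-collapsing syntax (Solr 3.3+; contentType
is a hypothetical field here) collapses every document in the result set:

  q=david+bowie&group=true&group.field=contentType&group.limit=1

There is no parameter that restricts collapsing to, say, contentType:video
while leaving the other documents ungrouped, which is the gap being
discussed.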

You're correct that it would be difficult to support this in a sharded
environment, but like most other features it could be made available for a
single shard first, with sharded support following later.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3429618.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Controlling the order of partial matches based on the position

2011-10-17 Thread aronitin
Guys,

It's been almost a week, but there are no replies to the question I posted.

If it's a small problem that has already been answered somewhere, please
point me to that post. Otherwise, please suggest any pointers for handling
the requirement described in the question.

Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Controlling-the-order-of-partial-matches-based-on-the-position-tp3413867p3429823.html
Sent from the Solr - User mailing list archive at Nabble.com.


Word de-compounding using the terms in the index?

2011-10-17 Thread mtraynham
Say, for instance, that at query time I would like to use the terms within my
index to de-compound query terms. The solution I'm aiming for is to build a
suggester-like component into the query pipeline using TSTLookups. Since all
Lookups need to be SolrCoreAware (that is, rebuilt when commits happen), is
it feasible to keep a Lookup hash for certain fields stored in a query-parser
attribute and pass it down to a particular stage in the pipeline, which would
then de-compound the terms as needed?

Is this feasible, or will the hash blow my memory out of the water? If I
could achieve the same result with a field filter factory I would, but I'm
not sure whether those are allowed to be SolrCoreAware.
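
For comparison, the stock dictionary-based approach does the de-compounding
as an analysis filter rather than in the query parser; a sketch (the
dictionary file name and size limits are placeholders):

  <fieldType name="text_decomp" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
              dictionary="compound-dictionary.txt"
              minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
              onlyLongestMatch="true"/>
    </analyzer>
  </fieldType>

The catch, and presumably why the index-driven approach is attractive, is
that this filter reads a static dictionary file rather than the live terms
in the index.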

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Word-de-compounding-using-the-terms-in-the-index-tp3429873p3429873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about near query order

2011-10-17 Thread Jason, Kim
"analyze term"~2
"term analyze"~2 

In my case, the two queries return different result sets.
Isn't that what you see in your case?
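
For reference, slop counts the token moves needed to line the query phrase
up with the document. Against a document containing "analyze term", the
query "analyze term"~2 needs 0 moves, while "term analyze"~2 needs 2,
because swapping two adjacent terms costs two moves. The reversed form
therefore only matches when the slop is at least 2, and it scores lower
since sloppy-phrase scoring penalizes the larger edit distance.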

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-near-query-order-tp3427312p3429916.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OS Cache - Solr

2011-10-17 Thread Sujatha Arun
Hello Jan,

Thanks for your response and  clarification.

We are monitoring the JVM heap utilization, and we are currently using about
18 GB of the 20 GB assigned to the JVM. Our total index size is about 14 GB.

Regards
Sujatha

On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl  wrote:

> Hi Sujatha,
>
> Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or
> similar? Try with 15Gb and see how it goes. The reason why this is
> beneficial is that you WANT your OS to have available memory for disk
> caching. If you have 17Gb free after starting Solr, your OS will be able to
> cache all index files in memory and you get very high search performance.
> With your current settings, there is only 12Gb free for both caching the
> index and for your MySql activities. Chances are that when you back up
> MySQL, the cached part of your Solr index gets flushed from the disk cache
> and needs to be re-cached later.
>
> How to interpret memory stats varies between OSes; seeing 163 MB free may
> simply mean that your OS has used most RAM for various caches and paging,
> but will release it once an application asks for more memory. Have you seen
> http://wiki.apache.org/solr/SolrPerformanceFactors ?
>
> You should also slim down your index maximally by setting stored=false and
> indexed=false wherever possible. I would also upgrade to a more current Solr
> version.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 17. okt. 2011, at 19:51, Sujatha Arun wrote:
>
> > Hello
> >
> > I am trying to understand Solr's OS cache utilization. Our server hosts
> > several Solr instances. The total combined index size of all instances
> > is about 14 GB, and the largest single index is about 2.5 GB.
> >
> > Our server has a quad processor and 32 GB RAM, of which 20 GB has been
> > assigned to the JVM. We are running Solr 1.3 on Tomcat 5.5 and Java 1.6.
> >
> > Our current statistics indicate that Solr uses 18-19 GB of the 20 GB
> > assigned to the JVM. However, free physical memory seems to remain
> > constant, as below:
> > Free physical memory = 163 Mb
> > Total physical memory = 32,232 Mb
> >
> > The server is also a backup server for MySQL, where the application DB
> > is backed up and restored. During this activity we see a lot of queries
> > that take 10+ minutes to execute, but otherwise the maximum query time
> > is under 1-2 seconds.
> >
> > The free physical memory seems to be constant. Why is it constant, and
> > how is it shared between the MySQL backup and Solr while the backup is
> > running? How much free physical memory should be available to the OS,
> > given our stats?
> >
> > Any pointers would be helpful.
> >
> > Regards
> > Sujatha
>
>


Solr scraping: Nutch and other alternatives.

2011-10-17 Thread Luis Cappa Banda
Hello everyone.

I've been thinking about a way to retrieve information from a domain (for
example, http://www.ign.com) to process and index, using Solr as the search
layer. I'm familiar with Apache Nutch and I know that the latest version has
a gateway for indexing crawled content into Solr. I tried it and it worked
fine, but developing plugins to process the content and index it into a
desired new field is somewhat complex. Perhaps some of you have tried another
(and better) alternative for mining web data. What is your recommendation?
Can you give me any scraping suggestions?

Thank you very much.

Luis Cappa.


Re: feeding while solr is running ?

2011-10-17 Thread lorenlai
Hello Alireza,

thank you for your reply. I will read the solr tutorial ;-)

Cheers

Loren

--
View this message in context: 
http://lucene.472066.n3.nabble.com/feeding-while-solr-is-running-tp3428500p3430478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: feeding while solr is running ?

2011-10-17 Thread lorenlai
Hello Robert,

Also, many thanks to you for the links and the short explanation. ;-)

*hug* & cheers

Loren




--
View this message in context: 
http://lucene.472066.n3.nabble.com/feeding-while-solr-is-running-tp3428500p3430483.html
Sent from the Solr - User mailing list archive at Nabble.com.