Solr Security

2009-08-07 Thread Francis Yakin

Has anyone had experience setting up Solr security?

http://wiki.apache.org/solr/SolrSecurity

I would like to implement it using HTTP authentication or path-based
authentication.

So in webdefault.xml I set the following:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/core1/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>core1-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>

What should I put in "url-pattern" and "web-resource-name"?

Then I set up realm.properties like this:

guest: guest, core1-role
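(For reference: "web-resource-name" is just a human-readable label for the
protected resources, while "url-pattern" is the servlet path prefix to
protect, so /core1/* protects everything under that core and /* would
protect the whole webapp. The realm.properties format used by Jetty's
HashUserRealm is "username: password[,rolename ...]", which is what the
guest line above follows.)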


Francis




Example dir

2009-08-12 Thread Francis Yakin

Right now when I install and configure Solr, I get an "example" dir (like
/opt/apache-solr-1.3.0/example).

How can I change that to something else? "example" to me doesn't sound like a
real deployment.


Thanks

Francis




RE: Example dir

2009-08-12 Thread Francis Yakin

Does anyone have any input on this? I'd really appreciate it.

Thanks

Francis

-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: Wednesday, August 12, 2009 3:39 PM
To: 'solr-user@lucene.apache.org'
Subject: Example dir


Right now when I install and configure Solr, I get an "example" dir (like
/opt/apache-solr-1.3.0/example).

How can I change that to something else? "example" to me doesn't sound like a
real deployment.


Thanks

Francis




RE: Example dir

2009-08-12 Thread Francis Yakin
Thanks!

How about for WebLogic?

Francis

-Original Message-
From: ant [mailto:dormant.m...@gmail.com]
Sent: Wednesday, August 12, 2009 8:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Example dir

http://wiki.apache.org/solr/SolrJetty#head-5663a826c263727cad83bc58cac0cb02f53d6a80
SolrJetty
and others
http://wiki.apache.org/solr/SolrInstall#head-ec97d15a70656e9c0308009db70d71af3efc7cd2
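(There is no WebLogic-specific wiki page, but the same idea applies: point
Solr at a differently named home directory via the solr.solr.home system
property in the server start script, e.g. -Dsolr.solr.home=/opt/solr/home --
the same flag that shows up in the ps output later in these threads. The
path here is only an example.)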


2009/8/13 Francis Yakin 

>
> Does anyone have any input on this? I'd really appreciate it.
>
> Thanks
>
> Francis
>
> -Original Message-----
> From: Francis Yakin [mailto:fya...@liquid.com]
> Sent: Wednesday, August 12, 2009 3:39 PM
> To: 'solr-user@lucene.apache.org'
> Subject: Example dir
>
>
> Right now when I install and configure Solr, I get an "example" dir (like
> /opt/apache-solr-1.3.0/example).
>
> How can I change that to something else? "example" to me doesn't sound
> like a real deployment.
>
>
> Thanks
>
> Francis
>
>
>


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
I have the same situation now.

If I don't want to use an HTTP connection, then EmbeddedSolrServer is what I
think I need, correct?
We have a master/slaves Solr setup; the applications use the slaves for
search. The master only takes the new index from the database, and the slaves
pull the new index using snappuller/snapinstaller.

I am trying not to use an HTTP connection from the database to the Solr
master because of network latency (very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients) becomes 
visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses that 
Solr.

EmbeddedSolrServer is something that few people should actually use.  It's
mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin"  wrote:

Is Solr like an RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a CommonsHttpSolrServer, or SolrJ with an EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin
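A minimal SolrJ sketch of the commit-visibility behavior David describes;
the URL, field name, and class name here are placeholder assumptions against
the 1.3-era SolrJ API:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitVisibility {
        public static void main(String[] args) throws Exception {
            // Two independent clients pointed at the same Solr instance.
            SolrServer writer = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrServer reader = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            writer.add(doc);   // buffered on the server; not visible to searches yet

            writer.commit();   // opens a new searcher; now visible to every client

            long hits = reader.query(new SolrQuery("id:doc-1"))
                              .getResults().getNumFound();
            System.out.println("hits after commit: " + hits);   // prints 1
        }
    }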



RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
Thanks.

Actually, the issue we have is more likely a firewall issue than network
latency; that's why we are trying to avoid an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (first
initial load only), and after the initial load we will be actively adding new
docs to Solr. We would prefer to use a JDBC connection, so if SolrJ used JDBC
that would be useful. I also like the multithreading option of SolrJ. So,
since we want the Solr master running as a server as well, EmbeddedSolrServer
is not a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more important.
With separate SOLR instance on a separate box, and with separate java
application (SOLR-bridge) querying database and using SolrJ, letency will be
1 second (for instance), but you can fine-tune performance by allocating
necessary amount of threads (depends on latency of SOLR and Oracle, average
doc size, etc), JDBC connections, etc. - and you can reach thousands docs
per second throughput. DIHs only simplify some staff for total beginners...

In addition, you will have nice Admin screen of standalone SOLR-master.

-Fuad
http://www.tokenizer.org



-Original Message-----
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use an HTTP connection, then EmbeddedSolrServer is what I
think I need, correct?
We have a master/slaves Solr setup; the applications use the slaves for
search. The master only takes the new index from the database, and the slaves
pull the new index using snappuller/snapinstaller.

I am trying not to use an HTTP connection from the database to the Solr
master because of network latency (very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients)
becomes visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses
that Solr.

EmbeddedSolrServer is something that few people should actually use.
It's mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin"  wrote:

Is Solr like an RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a CommonsHttpSolrServer, or SolrJ with an EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin





RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
We already opened port 80 from Solr to the DB, so that's not the issue, but
httpd (port 80) is very flaky when there is a firewall between Solr and the
DB.
We have a Solr master/slaves env; clients access search through the slaves
(the master only accepts the new index from the DB, and the slaves pull the
new indexes from the Solr master).

We have someone on the development team who knows Java and can implement
JDBC.

We don't share the Solr master and the DB on the same box; they are separate
boxes on separate networks, with port 80 opened between them.

It looks like CommonsHttpSolrServer is a better approach than
EmbeddedSolrServer, since we want the Solr master acting as a Solr server as
well.
I am just worried that HTTP will be a bottleneck; that's why I prefer the
JDBC connection method.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have a firewall between the DB and the prospective SOLR master
instance? Do you have a firewall between the client application and the DB?
Such a configuration is strange... By default, firewalls allow access to port
80; try setting port 80 for SOLR-Tomcat and/or configuring an AJP mapping for
the front-end HTTPD you might have. BTW, Apache HTTPD in front of SOLR
supports HTTP caching for SOLR slaves...

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement the
multithreaded application themselves.
2. SolrJ does not use JDBC; developers need to implement that...

It requires some Java coding; it is not the out-of-the-box DataImportHandler.

Suppose you have two quad-cores: why run single-threaded when you can run 8
threads... or why wait 5 seconds for a response from SOLR when 32 additional
threads could be working against the DB at the same time... and why share
I/O between SOLR and the DB?

Diversify and lower risks; having SOLR and the DB on the same box is
extremely unsafe...

-Fuad
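A rough sketch of the bridge application Fuad describes: one shared,
thread-safe CommonsHttpSolrServer fed by several workers reading disjoint
slices of a table over JDBC. The URLs, credentials, table, columns, and
thread count are all assumptions for illustration (an Oracle JDBC driver is
assumed on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class DbToSolrBridge {
        public static void main(String[] args) throws Exception {
            // One thread-safe SolrJ client shared by all workers.
            final CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://solr-master:8983/solr");

            final int threads = 8;
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                final int slice = i;
                pool.submit(new Callable<Void>() {
                    public Void call() throws Exception {
                        // Each worker opens its own JDBC connection and reads a
                        // disjoint slice of the table (rows where id % threads = slice).
                        Connection db = DriverManager.getConnection(
                            "jdbc:oracle:thin:@db-host:1521:ORCL", "user", "pass");
                        PreparedStatement ps = db.prepareStatement(
                            "SELECT id, title FROM docs WHERE MOD(id, " + threads + ") = ?");
                        ps.setInt(1, slice);
                        ResultSet rs = ps.executeQuery();
                        while (rs.next()) {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", rs.getLong("id"));
                            doc.addField("title", rs.getString("title"));
                            solr.add(doc);
                        }
                        db.close();
                        return null;
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.DAYS);
            solr.commit();   // one commit at the end of the initial load
        }
    }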


-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

Actually, the issue we have is more likely a firewall issue than network
latency; that's why we are trying to avoid an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (first
initial load only), and after the initial load we will be actively adding new
docs to Solr. We would prefer to use a JDBC connection, so if SolrJ used JDBC
that would be useful. I also like the multithreading option of SolrJ. So,
since we want the Solr master running as a server as well, EmbeddedSolrServer
is not a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more important.
With separate SOLR instance on a separate box, and with separate java
application (SOLR-bridge) querying database and using SolrJ, letency will be
1 second (for instance), but you can fine-tune performance by allocating
necessary amount of threads (depends on latency of SOLR and Oracle, average
doc size, etc), JDBC connections, etc. - and you can reach thousands docs
per second throughput. DIHs only simplify some staff for total beginners...

In addition, you will have nice Admin screen of standalone SOLR-master.

-Fuad
http://www.tokenizer.org



-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use an HTTP connection, then EmbeddedSolrServer is what I
think I need, correct?
We have a master/slaves Solr setup; the applications use the slaves for
search. The master only takes the new index from the database, and the slaves
pull the new index using snappuller/snapinstaller.

I am trying not to use an HTTP connection from the database to the Solr
master because of network latency (very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients)
becomes visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses
that Solr.

EmbeddedSolrServer is something that few people should actually use.
It's mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
No, we don't want to put it on the same box as the database box.

Agreed that indexing/committing/merging/optimizing is the bottleneck.

I think it's worth trying the SolrJ CommonsHttpSolrServer option for now;
let's see what happens when we load 3 million docs.

Thanks

Francis
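One note on keeping HTTP overhead down when trying that: SolrJ can send
documents in batches, so each request carries many docs and the commit
happens once at the end. A hedged sketch (the method name and batch size of
1000 are assumptions):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchAdd {
        // Send docs in batches of 1000 so each HTTP request carries many docs.
        static void addInBatches(SolrServer solr, Iterable<SolrInputDocument> docs)
                throws Exception {
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (SolrInputDocument doc : docs) {
                batch.add(doc);
                if (batch.size() == 1000) {
                    solr.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) solr.add(batch);
            solr.commit();   // single commit after the whole load
        }
    }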

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 1:34 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

With this configuration, the preferred method is probably to run the
standalone Java application on the same box as the DB, or very close to it
(in the same network segment).

HTTP is not a bottleneck; the main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as a sample: if you submit a batch of large documents to SOLR, expect a
5-55 second response time (even with EmbeddedSolr or pure Lucene), but
nothing related to network latency or firewalling... uploading 1MB over a
100Mbps network takes less than 0.1 seconds, but indexing it may take more
than 0.5 secs...

A standalone application with SolrJ is also good because you can schedule
batch updates etc.; automated...


P.S.
In theory, if you are using Oracle, you could even try implementing triggers
written in Java that cause a SOLR update on each row update (transactional);
but I haven't heard of anyone using stored procs in Java - too risky and
slow, with specific dependencies...




-Original Message-----
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 4:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

We already opened port 80 from Solr to the DB, so that's not the issue, but
httpd (port 80) is very flaky when there is a firewall between Solr and the
DB.
We have a Solr master/slaves env; clients access search through the slaves
(the master only accepts the new index from the DB, and the slaves pull the
new indexes from the Solr master).

We have someone on the development team who knows Java and can implement
JDBC.

We don't share the Solr master and the DB on the same box; they are separate
boxes on separate networks, with port 80 opened between them.

It looks like CommonsHttpSolrServer is a better approach than
EmbeddedSolrServer, since we want the Solr master acting as a Solr server as
well.
I am just worried that HTTP will be a bottleneck; that's why I prefer the
JDBC connection method.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have a firewall between the DB and the prospective SOLR master
instance? Do you have a firewall between the client application and the DB?
Such a configuration is strange... By default, firewalls allow access to port
80; try setting port 80 for SOLR-Tomcat and/or configuring an AJP mapping for
the front-end HTTPD you might have. BTW, Apache HTTPD in front of SOLR
supports HTTP caching for SOLR slaves...

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement the
multithreaded application themselves.
2. SolrJ does not use JDBC; developers need to implement that...

It requires some Java coding; it is not the out-of-the-box DataImportHandler.

Suppose you have two quad-cores: why run single-threaded when you can run 8
threads... or why wait 5 seconds for a response from SOLR when 32 additional
threads could be working against the DB at the same time... and why share
I/O between SOLR and the DB?

Diversify and lower risks; having SOLR and the DB on the same box is
extremely unsafe...

-Fuad


-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

Actually, the issue we have is more likely a firewall issue than network
latency; that's why we are trying to avoid an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (first
initial load only), and after the initial load we will be actively adding new
docs to Solr. We would prefer to use a JDBC connection, so if SolrJ used JDBC
that would be useful. I also like the multithreading option of SolrJ. So,
since we want the Solr master running as a server as well, EmbeddedSolrServer
is not a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more important.
With separate SOLR instance on a separate box, and with separate java
application (SOLR-bridge) querying database and using SolrJ, letency will be
1 second (for instance), but you can fine-

RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin

Thanks for the response.

I will try CommonsHttpSolrServer for now.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 1:34 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

With this configuration, the preferred method is probably to run the
standalone Java application on the same box as the DB, or very close to it
(in the same network segment).

HTTP is not a bottleneck; the main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as a sample: if you submit a batch of large documents to SOLR, expect a
5-55 second response time (even with EmbeddedSolr or pure Lucene), but
nothing related to network latency or firewalling... uploading 1MB over a
100Mbps network takes less than 0.1 seconds, but indexing it may take more
than 0.5 secs...

A standalone application with SolrJ is also good because you can schedule
batch updates etc.; automated...


P.S.
In theory, if you are using Oracle, you could even try implementing triggers
written in Java that cause a SOLR update on each row update (transactional);
but I haven't heard of anyone using stored procs in Java - too risky and
slow, with specific dependencies...




-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 4:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

We already opened port 80 from Solr to the DB, so that's not the issue, but
httpd (port 80) is very flaky when there is a firewall between Solr and the
DB.
We have a Solr master/slaves env; clients access search through the slaves
(the master only accepts the new index from the DB, and the slaves pull the
new indexes from the Solr master).

We have someone on the development team who knows Java and can implement
JDBC.

We don't share the Solr master and the DB on the same box; they are separate
boxes on separate networks, with port 80 opened between them.

It looks like CommonsHttpSolrServer is a better approach than
EmbeddedSolrServer, since we want the Solr master acting as a Solr server as
well.
I am just worried that HTTP will be a bottleneck; that's why I prefer the
JDBC connection method.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have a firewall between the DB and the prospective SOLR master
instance? Do you have a firewall between the client application and the DB?
Such a configuration is strange... By default, firewalls allow access to port
80; try setting port 80 for SOLR-Tomcat and/or configuring an AJP mapping for
the front-end HTTPD you might have. BTW, Apache HTTPD in front of SOLR
supports HTTP caching for SOLR slaves...

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement the
multithreaded application themselves.
2. SolrJ does not use JDBC; developers need to implement that...

It requires some Java coding; it is not the out-of-the-box DataImportHandler.

Suppose you have two quad-cores: why run single-threaded when you can run 8
threads... or why wait 5 seconds for a response from SOLR when 32 additional
threads could be working against the DB at the same time... and why share
I/O between SOLR and the DB?

Diversify and lower risks; having SOLR and the DB on the same box is
extremely unsafe...

-Fuad


-----Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

Actually, the issue we have is more likely a firewall issue than network
latency; that's why we are trying to avoid an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (first
initial load only), and after the initial load we will be actively adding new
docs to Solr. We would prefer to use a JDBC connection, so if SolrJ used JDBC
that would be useful. I also like the multithreading option of SolrJ. So,
since we want the Solr master running as a server as well, EmbeddedSolrServer
is not a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more important.
With separate SOLR instance on a separate box, and with separate java
application (SOLR-bridge) querying database and using SolrJ, letency will be
1 second (for instance), but you can fine-tune performance by allocating
necessary amount of threads (depends on latency of SOLR and Oracle, average
doc size, etc), JDBC connections, etc. - and you can reach thousands docs
per second throughput. DIHs only

OutOfMemory issue after upgrade to 1.3 solr

2009-09-09 Thread Francis Yakin

Our slave servers are having an issue with the following error after we
upgraded to Solr 1.3.

Any suggestions?

Thanks

Francis

INFO: [] webapp=/solr path=/select/ 
params={q=(type:artist+AND+alphaArtistSort:"forever+in+terror")} hits=1 
status=0 QTime=1
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 
14140776, Num elements: 3535189
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 
442984, Num elements: 55371


RE: OutOfMemory error on solr 1.3

2009-09-09 Thread Francis Yakin
Xms is 1.5GB, Xmx is 1.5GB, and Xns is 128MB. Physical memory is 4GB.

We are running JRockit version 1.5.0_15 on WebLogic 10.

./java -version
java version "1.5.0_15"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
BEA JRockit(R) (build R27.6.0-50_o-100423-1.5.0_15-20080626-2104-linux-x86_64, 
compiled mode)

4 S root  7532  7487  8  75   0 - 804721 184466 05:10 ?   00:07:18 
/opt/bea/jrmc-3.0.3-1.5.0/bin/java -Xms1536m -Xmx1536m -Xns:128m -Xgc:gencon 
-Djavelin.jsp.el.elcache=4096 
-Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr

Francis

-Original Message-
From: Constantijn Visinescu [mailto:baeli...@gmail.com]
Sent: Wednesday, September 09, 2009 11:35 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error on solr 1.3

Just wondering, how much memory are you giving your JVM ?

On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin  wrote:

>
> I am having an OutOfMemory error on our slave servers; I would like to know
> if someone has had the same issue and has a solution for it.
>
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@96cd2ffc:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
> 441216, Num elements: 55150
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@519116e0:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@74dc52fa:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@d0dd3e28:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@b6dfa5bc:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@482b13ef:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@2309438c:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@277bd48c:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> Exception in thread "[ACTIVE] ExecuteThread: '7' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '8' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '10' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '11' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@41405463:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 751552, Num elements: 187884
>  java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
> Num elements: 8192
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
> Num elements: 8192
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096,
> Num elements: 2539
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400,
> Num elements: 2690
>
>  deployment service message for request id "-1" from server "AdminServer".
> Exception is: "java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
> size: 4368, Num elements: 2174
> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
> 14140768, Num elements: 3535188
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@8dbcc7ab:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5320,
> Num elements: 2649

RE: OutOfMemory error on solr 1.3

2009-09-10 Thread Francis Yakin
So, do you think increasing the JVM heap will help? We also have
<queryResultMaxDocsCached>500</queryResultMaxDocsCached> in solrconfig.xml;
originally it was set to 200.

Currently we give Solr 1.5GB for Xms and Xmx; we use JRockit version 1.5.0_15.

4 S root 12543 12495 16  76   0 - 848974 184466 Jul20 ?   8-11:12:03 
/opt/bea/jrmc-3.0.3-1.5.0/bin/java -Xms1536m -Xmx1536m -Xns:128m -Xgc:gencon 
-Djavelin.jsp.el.elcache=4096 
-Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr

Francis

-Original Message-
From: Constantijn Visinescu [mailto:baeli...@gmail.com]
Sent: Wednesday, September 09, 2009 11:35 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error on solr 1.3

Just wondering, how much memory are you giving your JVM ?

On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin  wrote:

>
> I am having an OutOfMemory error on our slave servers; I would like to know
> if someone has had the same issue and has a solution for it.
>
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@96cd2ffc:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
> 441216, Num elements: 55150
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@519116e0:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@74dc52fa:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@d0dd3e28:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@b6dfa5bc:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@482b13ef:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@2309438c:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@277bd48c:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> Exception in thread "[ACTIVE] ExecuteThread: '7' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '8' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '10' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '11' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@41405463:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 751552, Num elements: 187884
>  java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
> Num elements: 8192
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
> Num elements: 8192
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096,
> Num elements: 2539
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400,
> Num elements: 2690
>
>  deployment service message for request id "-1" from server "AdminServer".
> Exception is: "java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
> size: 4368, Num elements: 2174
> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
> 14140768, Num elements: 3535188
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@8dbcc7ab:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5320,
> Num elements: 2649
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@4d0c6fc5:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray ...

RE: OutOfMemoryError due to auto-warming

2009-09-24 Thread Francis Yakin
You can also increase the JVM heap size if you have enough physical memory;
for example, if you have 4GB physical, give the JVM a 2GB or 2.5GB heap.

Francis
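(With the JRockit flags shown elsewhere in these threads, that would look
something like: -Xms2048m -Xmx2048m -Xns:256m -Xgc:gencon. The larger nursery
size here is an assumption, scaled up with the heap.)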

-Original Message-
From: didier deshommes [mailto:dfdes...@gmail.com] 
Sent: Thursday, September 24, 2009 3:32 PM
To: solr-user@lucene.apache.org
Cc: Andrew Montalenti
Subject: OutOfMemoryError due to auto-warming

Hi there,
We are running solr and allocating  1GB to it and we keep having
OutOfMemoryErrors. We get messages like this:

Error during auto-warming of
key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError:
Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.(String.java:216)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at 
org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at 
org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
at org.apache.lucene.search.Searcher.search(Searcher.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51)
at 
org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
at 
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

And like this:
   Error during auto-warming of
key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError:
Java heap space

We've searched and one suggestion was to reduce the size of the
various caches that do sorting in solrconfig.xml
(http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html).
Does this solution generally work?  Can anyone think of any other
cause for this problem?

didier


RE: OutOfMemoryError due to auto-warming

2009-09-24 Thread Francis Yakin
 
I reduced the size of the queryResultCache in solrconfig.xml; that seems to
fix the issue as well:

  <queryResultCache ... size="200" .../>

down from

  <queryResultCache ... size="500" .../>

Francis

-Original Message-
From: didier deshommes [mailto:dfdes...@gmail.com] 
Sent: Thursday, September 24, 2009 3:32 PM
To: solr-user@lucene.apache.org
Cc: Andrew Montalenti
Subject: OutOfMemoryError due to auto-warming

Hi there,
We are running solr and allocating  1GB to it and we keep having
OutOfMemoryErrors. We get messages like this:

Error during auto-warming of
key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError:
Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at 
org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at 
org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
at org.apache.lucene.search.Searcher.search(Searcher.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51)
at 
org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
at 
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

And like this:
   Error during auto-warming of
key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError:
Java heap space

We've searched and one suggestion was to reduce the size of the
various caches that do sorting in solrconfig.xml
(http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html).
Does this solution generally work?  Can anyone think of any other
cause for this problem?

didier


RE: cleanup old index directories on slaves

2009-10-05 Thread Francis Yakin
I use it in our env (prod); it has been working fine for years now. It only
cleans up the snapshots, not the index.

I added it to cron to run once a day to clean up.

-francis
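(A hypothetical crontab entry for that, with an assumed install path and
retention policy: "0 2 * * * /opt/solr/bin/snapcleaner -D 7" would remove
snapshots more than 7 days old every night at 2am.)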

-Original Message-
From: Feak, Todd [mailto:todd.f...@smss.sony.com] 
Sent: Monday, October 05, 2009 2:34 PM
To: solr-user@lucene.apache.org
Subject: RE: cleanup old index directories on slaves

We use the snapcleaner script.

http://wiki.apache.org/solr/SolrCollectionDistributionScripts#snapcleaner

Will that do the job?

-Todd

-Original Message-
From: solr jay [mailto:solr...@gmail.com] 
Sent: Monday, October 05, 2009 1:58 PM
To: solr-user@lucene.apache.org
Subject: cleanup old index directories on slaves

Is there a reliable way to safely clean up index directories? This is needed
mainly on the slave side: in several situations an old index directory is
replaced with a new one, and I'd like to remove those that are no longer in
use.

Thanks,

-- 
J



RE: Solr over DRBD

2009-10-13 Thread Francis Yakin
You should set up heartbeat and have a virtual IP for the active instance.
So in haresources you can set something like this:

node1  IPaddr::10.2.0.11 drbddisk::r0 Filesystem::/dev/drbd0::/cluster/Solr::ext3::defaults,noatime  httpd

Are you running an active/active cluster or active/passive?

Francis

-Original Message-
From: Pieter Steyn [mailto:pieter...@gmail.com] 
Sent: Monday, October 12, 2009 8:39 AM
To: solr-user@lucene.apache.org
Subject: Solr over DRBD

Hi there,

I have a 2-node cluster running Apache and Solr over a shared
partition on top of DRBD.   Think of it like a SAN.

I'm curious as to how I should do load balancing / sharing with Solr in
this setup.  I'm already using DNS round robin for Apache.

My Solr installation is on /cluster/Solr.  I've been starting an
instance of Solr on each server out of the same installation / working
directory.
Is this safe?  I haven't noticed any problems so far.

Does this mean they'll share the same index?  Is there a better way to
do this?  Should I perhaps only do commits on one of the servers (and
set up heartbeat to determine which server to run the commit on)?

I'm running Solr 1.3, but I'm not against upgrading if that provides
me with a better way of load balancing.

Kind regards,
Pieter


OutOfMemory error

2009-05-05 Thread Francis Yakin

I am having frequent OutOfMemory errors on our slave servers.

SEVERE: Error during auto-warming of 
key:org.apache.solr.search.queryresult...@aca6b9cb:java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 34279632, Num elements: 8569904
SEVERE: Error during auto-warming of 
key:org.apache.solr.search.queryresult...@f9947c35:java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
SEVERE: Error during auto-warming of 
key:org.apache.solr.search.queryresult...@d938cfa3:java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
Exception in thread "[ACTIVE] ExecuteThread: '2' for queue: 
'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread "[ACTIVE] ExecuteThread: '5' for queue: 
'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread "[ACTIVE] ExecuteThread: '8' for queue: 
'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread "[STANDBY] ExecuteThread: '3' for queue: 
'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread "[ACTIVE] ExecuteThread: '13' for queue: 
'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192


We are running WebLogic, and the Java version is 1.5.

We set the heap size to 1.5GB.

What's the recommendation for this issue?

Thanks

Francis



Upgrading from 1.2.0 to 1.3.0

2009-05-05 Thread Francis Yakin

What's the best way to upgrade Solr from 1.2.0 to 1.3.0?

We have a current index that our users search, running on Solr version 1.2.0.

We would like to upgrade it to 1.3.0.

We have a master/slaves env.

What's the best way to upgrade without affecting search? Should we do it on
the master or the slaves first?



Thanks

Francis




RE: OutOfMemory error

2009-05-05 Thread Francis Yakin

Here is the cache configuration in solrconfig.xml and the relevant fields in
schema.xml.

[The pasted solrconfig.xml cache sections and schema.xml field definitions
were stripped of their XML tags by the list archive; only fragments survive,
among them the comments "...into cached filters if the number of docs
selected by the clause exceeds...", "1-2 for read-only slaves, higher for
masters w/o cache warming", "every xsltCacheLifetimeSeconds" (set to 5), and
the schema comment "Sort artist name used by mp3 store to sort artist title
for search". The queryResultCache settings are quoted intact in a later
thread: class="solr.LRUCache" size="512" initialSize="512"
autowarmCount="256".]


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, May 05, 2009 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error


Hi Francis,

How big are your caches?  Please paste the relevant part of the config.
Which of your fields do you sort by?  Paste definitions of those fields from 
schema.xml, too.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Francis Yakin 
> To: "solr-user@lucene.apache.org" 
> Sent: Tuesday, May 5, 2009 1:00:07 PM
> Subject: OutOfMemory error
>
>
> I am having frequent OutOfMemory errors on our slave servers.
>
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@aca6b9cb:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 34279632, Num elements: 8569904
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@f9947c35:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@d938cfa3:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
> Exception in thread "[ACTIVE] ExecuteThread: '2' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '5' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '8' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[STANDBY] ExecuteThread: '3' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '13' for queue:
> 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
>
>
> We are running WebLogic, and the Java version is 1.5.
>
> We set the heap size to 1.5GB.
>
> What's the recommendation for this issue?
>
> Thanks
>
> Francis



Solrconfig.xml

2009-05-06 Thread Francis Yakin

I just upgraded Solr from 1.2.0 to 1.3.0.
We have an existing data/index from 1.2.0 that I will keep using with 1.3.0,
and I use the default solrconfig.xml that comes with 1.3.0.

For some reason, when I use the solrconfig.xml from 1.2.0 it works and I can
see the index and data, but when I use the solrconfig.xml from 1.3.0 I don't
see the data and index.

What did I do wrong?

Thanks

Francis




RE: Solrconfig.xml

2009-05-06 Thread Francis Yakin

No error. Attached are the solrconfig.xml files (one from 1.2.0 that works,
the other from 1.3.0 that doesn't).

Thanks in advance.

Francis


-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Wednesday, May 06, 2009 4:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solrconfig.xml

Is there an error in the logs?

On May 6, 2009, at 2:12 PM, Francis Yakin wrote:

>
> I just upgraded Solr from 1.2.0 to 1.3.0.
> We have an existing data/index from 1.2.0 that I will keep using with
> 1.3.0, and I use the default solrconfig.xml that comes with 1.3.0.
>
> For some reason, when I use the solrconfig.xml from 1.2.0 it works and I
> can see the index and data, but when I use the solrconfig.xml from 1.3.0 I
> don't see the data and index.
>
> What did I do wrong?
>
> Thanks
>
> Francis
>
>

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search



Java OutOfmemory error during autowarming

2009-05-28 Thread Francis Yakin
During auto-warming of a Solr searcher on a QueryResultKey, our production
Solr slaves throw an OutOfMemory error and the application needs to be
bounced.

Here are the error logs:

SEVERE: Error during auto-warming of
key:org.apache.solr.search.queryresult...@9acd5d67:java.lang.OutOfMemoryError:
allocLargeObjectOrArray - Object size: 25291544, Num elements: 6322881
SEVERE: Error during auto-warming of
key:org.apache.solr.search.queryresult...@84554284:java.lang.OutOfMemoryError:
allocLargeObjectOrArray - Object size: 579368, Num elements: 144837
SEVERE: Error during auto-warming of
key:org.apache.solr.search.queryresult...@8c1e63e4:java.lang.OutOfMemoryError:
allocLargeObjectOrArray - Object size: 4280976, Num elements: 1070240


I read a suggestion found via Google that increasing the size of the
queryResultCache in solrconfig.xml will help; here is the suggestion:


This is not directly related to the JVM heap size. During slave replication,
the current Searcher is used as the source of auto-warming. When a new
searcher is opened, its caches may be prepopulated or "autowarmed" using data
from caches in the old searcher. It sounds like a caching configuration
problem to me, as is evident from the error message "Error during
auto-warming of key". I suggest increasing the size of the query cache on one
of the slave servers, monitoring it over the next week or so, and then
incrementally rolling the change out to the rest of the slave boxes. This is
the change you need to make in solrconfig.xml.

Right now this is what I have:

  <queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="256"/>

Would increasing "size" to 1024 help?

Any input?

Thanks

Francis




RE: Java OutOfmemory error during autowarming

2009-05-29 Thread Francis Yakin

There are no "FieldCache" entries in solrconfig.xml (BTW, we are running
version 1.2.0).


-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Friday, May 29, 2009 9:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Java OutOfmemory error during autowarming

It's probably not the size of the query cache, but the size of the FieldCache 
entries that are used for sorting and function queries (that's the only thing 
that should be allocating huge arrays like that).

What fields do you sort on or use function queries on?  There may be a way to 
decrease the memory consumption.

-Yonik
http://www.lucidimagination.com

On Fri, May 29, 2009 at 1:02 AM, Francis Yakin  wrote:
> During auto-warming of a Solr searcher on a QueryResultKey, our production
> Solr slaves throw an OutOfMemory error and the application needs to be
> bounced.
>
> Here are the error logs:
>
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@9acd5d67:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 25291544, Num elements: 6322881
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@84554284:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 579368, Num elements: 144837
> SEVERE: Error during auto-warming of
> key:org.apache.solr.search.queryresult...@8c1e63e4:java.lang.OutOfMemoryError:
> allocLargeObjectOrArray - Object size: 4280976, Num elements: 1070240
>
>
> I read a suggestion found via Google that increasing the size of the
> queryResultCache in solrconfig.xml will help; here is the suggestion:
>
>
> This is not directly related to the JVM heap size. During slave
> replication, the current Searcher is used as the source of
> auto-warming. When a new searcher is opened, its caches may be
> prepopulated or "autowarmed" using data from caches in the old
> searcher. It sounds like a caching configuration problem to me, as is
> evident from the error message "Error during auto-warming of key". I
> suggest increasing the size of the query cache on one of the slave
> servers, monitoring it over the next week or so, and then incrementally
> rolling the change out to the rest of the slave boxes. This is the change
> you need to make in solrconfig.xml.
>
> Right now this is what I have:
>
>   <queryResultCache
>     class="solr.LRUCache"
>     size="512"
>     initialSize="512"
>     autowarmCount="256"/>
>
> Would increasing "size" to 1024 help?
>
> Any input?
>
> Thanks
>
> Francis
>
>
>


RE: Java OutOfmemory error during autowarming

2009-05-29 Thread Francis Yakin

I know, but the FieldCache is not in the solrconfig.xml


-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Friday, May 29, 2009 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Java OutOfmemory error during autowarming

On Fri, May 29, 2009 at 1:44 PM, Francis Yakin  wrote:
>
> There is no "FieldCache" entries in solrconfig.xml ( BTW we are
> running version 1.2.0)

Lucene FieldCache entries are created when you sort on a field or when you use 
a field in a function query.

-Yonik
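(To put rough numbers on Yonik's point, using a figure from the logs earlier
in these threads: a FieldCache entry used for sorting needs at least one int
per document, so an allocation of "Num elements: 8569904" works out to
roughly 8,569,904 x 4 bytes, i.e. about 33 MB, per sorted field per searcher,
and a string sort field also holds the term values themselves. During
auto-warming the old and the new searcher are both alive, so those entries
briefly exist twice, which is why a 1.5GB heap can tip over exactly at
warm-up time.)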


RE: Java OutOfmemory error during autowarming

2009-06-01 Thread Francis Yakin

Hi Chris,

I am new to Solr.

When it is initialized for the first time, how can I change it?

Thanks

Francis

-Original Message-
From: Chris Harris [mailto:rygu...@gmail.com]
Sent: Sunday, May 31, 2009 3:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Java OutOfmemory error during autowarming

Solr offers no configuration for FieldCache, neither in solrconfig.xml nor 
anywhere else; rather, that cache gets populated automatically in the depths of 
Lucene when you do a sort (or also apparently, as Yonik says, when you use a 
field in a function query).

From the wiki: 'Lucene has a low level "FieldCache" which is used for sorting
(and in some cases faceting). This cache is not managed by Solr, it has no
configuration options and cannot be autowarmed -- it is initialized the first
time it is used for each Searcher.'
(http://wiki.apache.org/solr/SolrCaching)

2009/5/29 Francis Yakin 

>
> I know, but the FieldCache is not in the solrconfig.xml
>
>
> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Friday, May 29, 2009 10:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Java OutOfmemory error during autowarming
>
> On Fri, May 29, 2009 at 1:44 PM, Francis Yakin  wrote:
> >
> > There are no "FieldCache" entries in solrconfig.xml (BTW, we are
> > running version 1.2.0)
>
> Lucene FieldCache entries are created when you sort on a field or when
> you use a field in a function query.
>
> -Yonik
>


Solr.war

2009-06-01 Thread Francis Yakin

We are planning to upgrade Solr from 1.2.0 to 1.3.0.

Under 1.3.0, which war file do I need to use and deploy in my application?

We are using WebLogic.

There are two war files: /opt/apache-solr-1.3.0/dist/apache-solr-1.3.0.war
and /opt/apache-solr-1.3.0/example/webapps/solr.war.
Which one are we supposed to use?


Thanks

Francis




RE: Solr.war

2009-06-02 Thread Francis Yakin
 Thank You!

Francis

-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Monday, June 01, 2009 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr.war

They are identical. solr.war is a copy of apache-solr-1.3.0.war.
You may want to look at the example target in build.xml:

[the build.xml snippet - the <copy> task in the example target that copies
the dist war to example/webapps/solr.war - was stripped by the list archive]

Koji

Francis Yakin wrote:
> We are planning to upgrade Solr from 1.2.0 to 1.3.0.
>
> Under 1.3.0, which war file do I need to use and deploy in my application?
>
> We are using WebLogic.
>
> There are two war files: /opt/apache-solr-1.3.0/dist/apache-solr-1.3.0.war
> and /opt/apache-solr-1.3.0/example/webapps/solr.war.
> Which one are we supposed to use?
>
>
> Thanks
>
> Francis
>
>
>
>



Example folder - can we change it?

2009-06-08 Thread Francis Yakin

When I install Solr, by default it installs under /opt/apache-solr-1.3.0/.

The bin, config files, and data are under /opt/apache-solr-1.3.0/example/solr.

Is there any way we can change "example" to something else?
"example" can be interpreted the wrong way (like a sample, so not real).


Francis




RE: Upgrading 1.2.0 to 1.3.0 solr

2009-06-11 Thread Francis Yakin

Do you have experience upgrading from 1.2.0 to 1.3.0?
In other words, do you have any suggestions, or better yet, any docs or
instructions for doing this?

I appreciate if you can help me.

Thanks

Francis


-Original Message-
From: Ryan Grange [mailto:rgra...@dollardays.com]
Sent: Thursday, June 11, 2009 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Upgrading 1.2.0 to 1.3.0 solr

I disagree with waiting that month.  At this point, most of the kinks in the 
upgrade from 1.2 to 1.3 have been worked out.  Waiting for 1.4 to come out 
risks you becoming a guinea pig for the upgrade procedure.
Plus, if any show-stoppers come along delaying 1.4, you delay implementation of 
your auto-complete function.  When 1.4 comes out, if it has any features you 
feel compel an upgrade, you can begin another round of testing and migration, 
but don't upgrade a production system just for the sake of being bleeding edge.

Ryan T. Grange, IT Manager
DollarDays International, Inc.
rgra...@dollardays.com (480)922-8155 x106



Otis Gospodnetic wrote:
> Francis,
>
> If you can wait another month or so, you could skip 1.3.0, and jump to 1.4 
> which will be released soon.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
>> From: Francis Yakin 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Wednesday, June 10, 2009 1:17:25 AM
>> Subject: Upgrading 1.2.0 to 1.3.0 solr
>>
>>
>> I am in process to upgrade our solr 1.2.0 to solr 1.3.0
>>
>> Our Solr 1.2.0 is working fine now; we just want to upgrade because we
>> have an application that requires some functionality from 1.3.0 (we call
>> it autocomplete).
>>
>> Currently our config files on 1.2.0 are as follow:
>>
>> Solrconfig.xml
>> Schema.xml ( we wrote this in house)
>> Index_synonyms.txt ( we also modified and wrote this in house)
>> Scripts.conf Protwords.txt Stopwords.txt Synonyms.txt
>>
>> I understand that 1.3.0 has a new solrconfig.xml.
>>
>> My questions are:
>>
>> 1) Which config files can I reuse from 1.2.0 in 1.3.0?
>>    Can I use the same schema.xml?
>> 2) Solrconfig.xml: can I use the 1.2.0 version, or do I have to stick with 1.3.0's?
>>    If I need to stick with 1.3.0's, what do I need to change?
>>
>> As of right now I am testing it in my sandbox, and so far it doesn't work.
>>
>> Please advise; if you have any docs for upgrading 1.2.0 to 1.3.0, let me know.
>>
>> Thanks in advance
>>
>> Francis
>>
>> Note: I attached my solrconfig and schema.xml to this email
>>
>>
>>
>> -Inline Attachment Follows-
>> {edited out by Ryan for brevity}
>>


OutOfMemory error on solrslaves

2009-06-17 Thread Francis Yakin

We are frequently experiencing "OutOfMemory" errors on our slaves; this is the
error:

SEVERE: Error during auto-warming of 
key:org.apache.solr.search.queryresult...@a8c6f867:java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 5120080, Num elements: 1280015
java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5968, Num 
elements: 2974
java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 3736, Num 
elements: 1859
Exception in thread "DynamicListenThread[Default]" java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192

We need to bounce our WebLogic and Solr application when that happens to clear
the Java "OutOfMemory" error.

Some people suggested I make a change to "queryResultCache".

Currently it is set to:

<queryResultCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="256"/>

So the recommendation is:

<queryResultCache
  class="solr.LRUCache"
  size="1024"
  initialSize="512"
  autowarmCount="256"/>

We are still running Solr 1.2.0; would upgrading to a higher version also
resolve the issue?

Any inputs will be much appreciated.

Thanks

Francis




RE: OutOfMemory error on solrslaves

2009-06-17 Thread Francis Yakin

Ok Thanks Koji!

We have a test machine that is currently running 1.3.0, and I see the
<queryResultMaxDocsCached>200</queryResultMaxDocsCached> setting.

This is set to "200" by default; should I increase it, and if so, what should I set it to?

Regards,

Francis

-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Wednesday, June 17, 2009 8:28 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error on solrslaves

Francis Yakin wrote:
> We are frequently experiencing "OutOfMemory" errors on our slaves; this is the
> error:
>
> SEVERE: Error during auto-warming of 
> key:org.apache.solr.search.queryresult...@a8c6f867:java.lang.OutOfMemoryError:
>  allocLargeObjectOrArray - Object size: 5120080, Num elements: 1280015
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5968, Num 
> elements: 2974
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 3736, Num 
> elements: 1859
> Exception in thread "DynamicListenThread[Default]" 
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num 
> elements: 8192
>
> We need to bounce our WebLogic and Solr application when that happens to
> clear the Java "OutOfMemory" error.
>
> Some people suggested I make a change to "queryResultCache".
>
> Currently it is set to:
>
> <queryResultCache
>   class="solr.LRUCache"
>   size="512"
>   initialSize="512"
>   autowarmCount="256"/>
>
> So the recommendation is:
>
> <queryResultCache
>   class="solr.LRUCache"
>   size="1024"
>   initialSize="512"
>   autowarmCount="256"/>
>
>
I don't think the change solves your problem... Since the error occurred
during auto-warming,
autowarmCount="0" might help. But...

> We are still running Solr 1.2.0; would upgrading to a higher version also
> resolve the issue?
>
>
Solr 1.3 has the queryResultMaxDocsCached parameter in solrconfig.xml.
I'd expect it to work for you.

Koji
> Any inputs will be much appreciated.
>
> Thanks
>
> Francis
>
>
>
>



Can I use the same index from 1.2.0 to 1.3.0?

2009-06-18 Thread Francis Yakin


Can I transport the index from Solr 1.2 to Solr 1.3 without
resubmitting/reloading it again from the database?

Francis




Slowness during submit the index

2009-06-19 Thread Francis Yakin

We are experiencing slowness when reloading/resubmitting the index from the
database to the master.

We have two environments:

QA and Prod.

The slowness happens only in Production, not in QA.

It takes only one hour to reload 2.5 million documents in QA, compared to 5-6
hours to load the same amount of data in Prod.

I checked both the config files in QA and Prod, they are all identical, except:


In QA:
false
In Prod:
true

I believe that we use the "http" protocol to reload/submit the index from the
database to the Solr master.
I did test copying big files over the network from the database to the Solr
box, and I don't see any issue there.

We are running solr 1.2

Any inputs will be much appreciated.




RE: Java OutOfmemory error during autowarming

2009-06-19 Thread Francis Yakin

Thanks Chris for the update.
Right now we are still having constant issues with our Production slaves.
Do you think upgrading Solr to 1.3.0 will fix the issue? I heard that
in 1.3.0 there is a parameter that you can set:

<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

Francis


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Friday, June 19, 2009 5:49 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Java OutOfmemory error during autowarming



: Date: Mon, 1 Jun 2009 11:34:08 -0700
: From: Francis Yakin
: Subject: RE: Java OutOfmemory error during autowarming
...
: When it is initialized for the first time, how can I change it?

Francis, I'm not sure if you already figured this out, but the point of
the documentation that Chris referred you to is that it is a completely
under-the-covers caching mechanism of Lucene on a per-field basis -- there
is no way to change it, or configure it.  It's sized according to the
number of documents in the index and the number of terms in the field.

If you execute an operation that requires Lucene to use that data structure
for a field, then Lucene will build it, and cache it for as long as the
IndexReader is open.
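
For illustration (a hedged example; the host, port, and field name are hypothetical), any sorted request like the one below is enough to make Lucene build and cache the FieldCache entry for that field against the current IndexReader:

curl 'http://localhost:8983/solr/select?q=*:*&sort=artist_name+asc'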


: -Original Message-
: From: Chris Harris [mailto:rygu...@gmail.com]
: Sent: Sunday, May 31, 2009 3:00 PM
: To: solr-user@lucene.apache.org
: Subject: Re: Java OutOfmemory error during autowarming
:
: Solr offers no configuration for FieldCache, neither in solrconfig.xml nor 
anywhere else; rather, that cache gets populated automatically in the depths of 
Lucene when you do a sort (or also apparently, as Yonik says, when you use a 
field in a function query).
:
: >From the wiki: 'Lucene has a low level "FieldCache" which is used for 
sorting (and in some cases faceting). This cache is not managed by Solr it has 
no configuration options and cannot be autowarmed -- it is initialized the 
first time it is used for each Searcher.' (
: http://wiki.apache.org/solr/SolrCaching)
:
: 2009/5/29 Francis Yakin 
:
: >
: > I know, but the FieldCache is not in the solrconfig.xml
: >
: >
: > -Original Message-
: > From: Yonik Seeley [mailto:ysee...@gmail.com]
: > Sent: Friday, May 29, 2009 10:47 AM
: > To: solr-user@lucene.apache.org
: > Subject: Re: Java OutOfmemory error during autowarming
: >
: > On Fri, May 29, 2009 at 1:44 PM, Francis Yakin  wrote:
: > >
: > > There is no "FieldCache" entries in solrconfig.xml ( BTW we are
: > > running version 1.2.0)
: >
: > Lucene FieldCache entries are created when you sort on a field or when
: > you use a field in a function query.
: >
: > -Yonik
: >
:



-Hoss



RE: Slowness during submit the index

2009-06-19 Thread Francis Yakin
 * is the java version the same on both machines (QA vs. PROD)  - YES
* are the same java parameters being used on both machines  - YES
* is the connection to the DB the same on both machines - Not sure, 
need to ask the network guy
* are both the PROD and QA DB servers the same and are both DB instances the 
same - they are not from the same DB

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, June 19, 2009 6:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Slowness during submit the index


Francis,

I'm not sure if I understood your email correctly, but I think you are saying 
you are indexing your DB content into a Solr index.  If this is correct, here 
are things to look at:
* is the java version the same on both machines (QA vs. PROD)
* are the same java parameters being used on both machines
* is the connection to the DB the same on both machines
* are both the PROD and QA DB servers the same and are both DB instances the 
same
...


 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Francis Yakin 
> To: "solr-user@lucene.apache.org" 
> Sent: Friday, June 19, 2009 5:27:59 PM
> Subject: Slowness during submit the index
>
>
> We are experiencing slowness during reloading/resubmitting index from Database
> to the master.
>
> We have two environments:
>
> QA and Prod.
>
> The slowness happens only in Production, not in QA.
>
> It takes only one hour to reload 2.5 million documents in QA, compared to 5-6
> hours to load the same amount of data in Prod.
>
> I checked both the config files in QA and Prod, they are all identical, 
> except:
>
>
> In QA:
> false
> In Prod:
> true
>
> I believe that we use the "http" protocol to reload/submit the index from the
> database to the Solr master.
> I did test copying big files over the network from the database to the Solr
> box, and I don't see any issue there.
>
> We are running solr 1.2
>
> Any inputs will be much appreciated.



RE: Slowness during submit the index

2009-06-19 Thread Francis Yakin
 The amount of data in Prod is about 20% more than in QA.
We tested the network, and its speed is fine. The hardware in Prod is larger and
more powerful than in QA.
But QA is faster during reload: it takes QA only one hour versus 6 hours in Prod.

That's why we don't understand the reason; the amount of data is only 20% more,
and 20% more data should not make loading 5 times slower.

So we looked into the Solr config files, and they are not much different,
except that Prod has a master/slave environment while QA has only a master.

Thanks for the response.

Francis


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, June 19, 2009 8:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Slowness during submit the index


Francis,

So it could easily be that your QA and PROD DBs are really just simply 
different (different amount of data, different network speed, different 
hardware...)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message ----
> From: Francis Yakin 
> To: "solr-user@lucene.apache.org" 
> Sent: Friday, June 19, 2009 10:39:48 PM
> Subject: RE: Slowness during submit the index
>
> * is the java version the same on both machines (QA vs. PROD)  - YES
> * are the same java parameters being used on both machines  - YES
> * is the connection to the DB the same on both machines - Not sure, 
> need
> to ask the network guy
> * are both the PROD and QA DB servers the same and are both DB instances the
> same - they are not from the same DB
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Friday, June 19, 2009 6:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Slowness during submit the index
>
>
> Francis,
>
> I'm not sure if I understood your email correctly, but I think you are saying
> you are indexing your DB content into a Solr index.  If this is correct, here
> are things to look at:
> * is the java version the same on both machines (QA vs. PROD)
> * are the same java parameters being used on both machines
> * is the connection to the DB the same on both machines
> * are both the PROD and QA DB servers the same and are both DB instances the
> same
> ...
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: Francis Yakin
> > To: "solr-user@lucene.apache.org"
> > Sent: Friday, June 19, 2009 5:27:59 PM
> > Subject: Slowness during submit the index
> >
> >
> > We are experiencing slowness during reloading/resubmitting index from 
> > Database
> > to the master.
> >
> > We have two environments:
> >
> > QA and Prod.
> >
> > The slowness happens only in Production, not in QA.
> >
> > It takes only one hour to reload 2.5 million documents in QA, compared to
> > 5-6 hours to load the same amount of data in Prod.
> >
> > I checked both the config files in QA and Prod, they are all identical,
> except:
> >
> >
> > In QA:
> > false
> > In Prod:
> > true
> >
> > I believe that we use the "http" protocol to reload/submit the index from
> > the database to the Solr master.
> > I did test copying big files over the network from the database to the Solr
> > box, and I don't see any issue there.
> >
> > We are running solr 1.2
> >
> > Any inputs will be much appreciated.



RE: Slowness during submit the index

2009-06-22 Thread Francis Yakin
No VM.

-Original Message-
From: Bruno [mailto:brun...@gmail.com]
Sent: Saturday, June 20, 2009 10:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Slowness during submit the index

We were having performance issues using servers running on VM. Are you
running QA or Prod in a VM?

2009/6/21, Stephen Weiss :
> Isn't it possible that the production equipment is simply under much
> higher load (given that, since it's in production, your various users
> are all actually using it), vs the QA equipment, which is only in use
> by the people doing QA?
>
> We've found the same thing at one point - we had a very small index (<
> 4 rows), so small it didn't seem worth the effort to do delta
> updates.  So we would just refresh the whole thing every time - or so
> we planned.  In the test environment it updated within a minute.  In
> production, it would take as long as 15 minutes.  What we finally
> realized was, because the DB was under much higher load in production
> than in the test environment, especially considering the amount of
> joins that needed to take place to pull out the data properly, various
> writes from the users to the affected tables would slow down the data
> selection process dramatically as the indexer would have to wait for
> locks to clear.  Now of course we do delta updates and everything's
> fine (and blazingly fast in both environments).
>
> Try simulating higher load (involving a "normal" amount of writes to
> the DB) against your QA equipment and then building the index.  See if
> the QA equipment still runs so quickly.
>
> --
> Steve
>
> On Jun 20, 2009, at 11:29 PM, Otis Gospodnetic wrote:
>
>>
>> Hi Francis,
>>
>> I can't tell what the problem is from the information you've
>> provided so far.  My gut instinct is that this is due to some
>> difference in QA vs. PROD environments that isn't Solr-specific.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message 
>>> From: Francis Yakin 
>>> To: "solr-user@lucene.apache.org" 
>>> Sent: Saturday, June 20, 2009 2:18:07 AM
>>> Subject: RE: Slowness during submit the index
>>>
>>> The amount of data in Prod is about 20% more than in QA.
>>> We tested the network, and its speed is fine. The hardware in Prod is larger
>>> and more
>>> powerful than in QA.
>>> But QA is faster during reload: it takes QA only one hour versus 6
>>> hours in Prod.
>>>
>>> That's why we don't understand the reason; the amount of
>>> data is only 20%
>>> more, and 20% more data should not make loading 5 times
>>> slower.
>>>
>>> So we looked into the Solr config files, and they are not much
>>> different, except
>>> that Prod has a master/slave environment while QA has only a master.
>>>
>>> Thanks for the response.
>>>
>>> Francis
>>>
>>>
>>> -Original Message-
>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>> Sent: Friday, June 19, 2009 8:58 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Slowness during submit the index
>>>
>>>
>>> Francis,
>>>
>>> So it could easily be that your QA and PROD DBs are really just
>>> simply different
>>> (different amount of data, different network speed, different
>>> hardware...)
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>> - Original Message 
>>>> From: Francis Yakin
>>>> To: "solr-user@lucene.apache.org"
>>>> Sent: Friday, June 19, 2009 10:39:48 PM
>>>> Subject: RE: Slowness during submit the index
>>>>
>>>> * is the java version the same on both machines (QA vs. PROD)  - YES
>>>> * are the same java parameters being used on both machines  -
>>>> YES
>>>> * is the connection to the DB the same on both machines -
>>>> Not sure,
>>> need
>>>> to ask the network guy
>>>> * are both the PROD and QA DB servers the same and are both DB
>>>> instances the
>>>> same - they are not from the same DB
>>>>
>>>> -Original Message-
>>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>>> Sent: Friday, June 19, 2009 6:23 PM
>>>> To: solr-user@luce

Is there any other way to load the index beside using "http" connection?

2009-07-01 Thread Francis Yakin

We have several thousand XML files in a database that we load into the Solr
master.
The database uses an "http" connection to transfer those files to the Solr
master; Solr then translates the XML files into its Lucene index.

We are experiencing issues with connections being opened/closed through the
firewall, and it is very, very slow.

Is there any other way to load the data/index from the database to the Solr
master besides using an http connection? That is, could we just scp/ftp the XML
files from the database system to the Solr master and let Solr convert them to
Lucene indexes?

Any input or help will be much appreciated.


Thanks

Francis





RE: Is there any other way to load the index beside using "http" connection?

2009-07-01 Thread Francis Yakin

Otis,

Do you have documentation on how to do those things you mentioned?

How about if I don't want to use HTTP at all? Or do we have no other option but
to use HTTP to transfer the XML files to the Solr master from the DB box?

Thanks

Francis

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, July 01, 2009 8:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?


Francis,

There are a number of things you can do to make indexing over HTTP faster.
You can also import documents as csv data/file.
Finally, you can use EmbeddedSolrServer.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Francis Yakin 
> To: "solr-user@lucene.apache.org" 
> Sent: Wednesday, July 1, 2009 6:07:12 PM
> Subject: Is there any other way to load the index beside using "http" 
> connection?
>
>
> We have several thousand XML files in a database that we load into the Solr
> master. The database uses an "http" connection to transfer those files to
> the Solr master; Solr then translates the XML files into its Lucene index.
>
> We are experiencing issues with connections being opened/closed through the
> firewall, and it is very, very slow.
>
> Is there any other way to load the data/index from the database to the Solr
> master besides using an http connection? That is, could we just scp/ftp the
> XML files from the database system to the Solr master and let Solr convert
> them to Lucene indexes?
>
> Any input or help will be much appreciated.
>
>
> Thanks
>
> Francis



RE: Is there any other way to load the index beside using "http" connection?

2009-07-01 Thread Francis Yakin

Glen,

The database we use is Oracle. I am not the database administrator, so I am not
familiar with their script.
So, basically, we have an Oracle SQL script that loads the XML files over an
HTTP connection to our Solr master.

My question: is there any other way, instead of using an HTTP connection, to
load the XML files to our Solr master?

You mentioned LuSql; I am not familiar with that. Can you provide us the docs
or something? Again, I am not the database guy, I am only the Solr guy. The
database is on a different box than the Solr master, and both are running
Linux (RedHat).

Thanks

Francis

-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com]
Sent: Wednesday, July 01, 2009 8:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

You can directly load to the backend Lucene using LuSql[1]. It is
faster than Solr, sometimes as much as an order of magnitude faster.

Disclosure: I am the author of LuSql

-Glen
http://zzzoot.blogspot.com/

[1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

2009/7/1 Francis Yakin :
>
> We have several thousand XML files in a database that we load into the Solr
> master. The database uses an "http" connection to transfer those files to
> the Solr master; Solr then translates the XML files into its Lucene index.
>
> We are experiencing issues with connections being opened/closed through the
> firewall, and it is very, very slow.
>
> Is there any other way to load the data/index from the database to the Solr
> master besides using an http connection? That is, could we just scp/ftp the
> XML files from the database system to the Solr master and let Solr convert
> them to Lucene indexes?
>
> Any input or help will be much appreciated.
>
>
> Thanks
>
> Francis
>
>
>
>



--

-


RE: Is there any other way to load the index beside using "http" connection?

2009-07-01 Thread Francis Yakin

Glen,

Are you saying that we have to use LuSql to replace our Solr?

Francis

-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com]
Sent: Wednesday, July 01, 2009 8:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

You can directly load to the backend Lucene using LuSql[1]. It is
faster than Solr, sometimes as much as an order of magnitude faster.

Disclosure: I am the author of LuSql

-Glen
http://zzzoot.blogspot.com/

[1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

2009/7/1 Francis Yakin :
>
> We have several thousand XML files in a database that we load into the Solr
> master. The database uses an "http" connection to transfer those files to
> the Solr master; Solr then translates the XML files into its Lucene index.
>
> We are experiencing issues with connections being opened/closed through the
> firewall, and it is very, very slow.
>
> Is there any other way to load the data/index from the database to the Solr
> master besides using an http connection? That is, could we just scp/ftp the
> XML files from the database system to the Solr master and let Solr convert
> them to Lucene indexes?
>
> Any input or help will be much appreciated.
>
>
> Thanks
>
> Francis
>
>
>
>



--

-


RE: Is there any other way to load the index beside using "http" connection?

2009-07-01 Thread Francis Yakin

How do you import the documents as CSV data/files from the Oracle database to
the Solr master (they are two different machines)?

And do you have the docs for using EmbeddedSolrServer?

Thanks Otis!

Francis
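
A hedged sketch of the CSV route's moving parts (host names, paths, and the export step are assumptions, not a tested recipe):

# 1) On the Oracle box, export the rows to CSV with your DB tool of choice
#    (e.g. a SQL*Plus spool script); the header row must match your Solr
#    field names.
# 2) Ship the file to the Solr master:
scp /tmp/docs.csv solrmaster:/tmp/docs.csv
# 3) On the Solr master, load it locally through the CSV update handler:
curl 'http://localhost:7001/solr/update/csv?stream.file=/tmp/docs.csv&commit=true'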

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, July 01, 2009 8:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?


Francis,

There are a number of things you can do to make indexing over HTTP faster.
You can also import documents as csv data/file.
Finally, you can use EmbeddedSolrServer.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message ----
> From: Francis Yakin 
> To: "solr-user@lucene.apache.org" 
> Sent: Wednesday, July 1, 2009 6:07:12 PM
> Subject: Is there any other way to load the index beside using "http" 
> connection?
>
>
> We have several thousand XML files in a database that we load into the Solr
> master. The database uses an "http" connection to transfer those files to
> the Solr master; Solr then translates the XML files into its Lucene index.
>
> We are experiencing issues with connections being opened/closed through the
> firewall, and it is very, very slow.
>
> Is there any other way to load the data/index from the database to the Solr
> master besides using an http connection? That is, could we just scp/ftp the
> XML files from the database system to the Solr master and let Solr convert
> them to Lucene indexes?
>
> Any input or help will be much appreciated.
>
>
> Thanks
>
> Francis



RE: Is there any other way to load the index beside using "http" connection?

2009-07-01 Thread Francis Yakin
 Thanks Noble!

Is this only for version 1.3.0? We are currently running 1.2.0.

Francis


-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble
Paul
Sent: Wednesday, July 01, 2009 9:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

did you explore DIH http://wiki.apache.org/solr/DataImportHandler

it has features to import from Db, xml files etc

On Thu, Jul 2, 2009 at 3:37 AM, Francis Yakin wrote:
>
> We have several thousand XML files in a database that we load into the Solr
> master. The database uses an "http" connection to transfer those files to
> the Solr master; Solr then translates the XML files into its Lucene index.
>
> We are experiencing issues with connections being opened/closed through the
> firewall, and it is very, very slow.
>
> Is there any other way to load the data/index from the database to the Solr
> master besides using an http connection? That is, could we just scp/ftp the
> XML files from the database system to the Solr master and let Solr convert
> them to Lucene indexes?
>
> Any input or help will be much appreciated.
>
>
> Thanks
>
> Francis
>
>
>
>



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


RE: Is there any other way to load the index beside using "http" connection?

2009-07-02 Thread Francis Yakin
Norberto, Thanks for your input.

What do you mean with "Have you tried connecting to  SOLR over HTTP from 
localhost, therefore avoiding any firewall issues and network latency ? it 
should work a LOT faster than from a remote site." ?

Here are how our servers lay out:

1) Database ( Oracle ) is running on separate machine
2) Solr master is running on separate machine by itself
3) 6 Solr slaves (these 6 pull the index from the master using rsync)

We have a SQL (Oracle) script to post the data/index from the Oracle database
machine to the Solr master over http.
We wrote that script (someone on the Oracle database administration side wrote it).

In the Solr master configuration we have a scripts.conf that looks like this:

user=
solr_hostname=localhost
solr_port=7001
rsyncd_port=18983
data_dir=
webapp_name=solr
master_host=localhost
master_data_dir=solr/snapshot
master_status_dir=solr/status

So basically, from the Oracle system we launch the Oracle/SQL script, posting
the data to the Solr master using
http://solrmaster/solr/update (we put this inside the SQL script).

We cannot use localhost, since Solr is not running on the Oracle machine.

Another alternative we are thinking of is to transform the XML into CSV and
import/export that.

How about LuSql, which some mentioned? Is it a free (open source) application?
Do you have any experience with it?

Thanks All for your valuable suggestions!

Francis


-Original Message-
From: Norberto Meijome [mailto:numard...@gmail.com]
Sent: Thursday, July 02, 2009 3:01 AM
To: solr-user@lucene.apache.org
Cc: Francis Yakin
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

On Wed, 1 Jul 2009 15:07:12 -0700
Francis Yakin  wrote:

>
> We have several thousand XML files in a database that we load into the Solr
> master. The database uses an "http" connection to transfer those files to
> the Solr master; Solr then translates the XML files into its Lucene index.
>
> We are experiencing issues with connections being opened/closed through the
> firewall, and it is very, very slow.
>
> Is there any other way to load the data/index from the database to the Solr
> master besides using an http connection? That is, could we just scp/ftp the
> XML files from the database system to the Solr master and let Solr convert
> them to Lucene indexes?
>

Francis,
after reading the whole thread, it seems you have :
  - Data source : Oracle DB, on separate location to your SOLR.
  - Data format : XML output.

definitely DIH is a great option, but since you are on 1.2, not available to 
you (you should look into upgrading if you can!).

Have you tried connecting to  SOLR over HTTP from localhost, therefore avoiding 
any firewall issues and network latency ? it should work a LOT faster than from 
a remote site. Also make sure not to commit until you really needed.

Other alternatives are to transform the XML into csv and import it that way. Or 
write a simple app that will parse the xml and post it directly using the 
embedded solr method.

plenty of options, all of them documented @ solr's site.

good luck,
b
_
{Beto|Norberto|Numard} Meijome

"People demand freedom of speech to make up for the freedom of thought which 
they avoid. "
  Soren Aabye Kierkegaard

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


RE: Is there any other way to load the index beside using "http" connection?

2009-07-02 Thread Francis Yakin

Glen,

Is LuSql free? Is it open source?
Does it require a machine separate from the Solr master?

I forgot to tell you that we have a master/slaves environment for Solr.

The database is running Oracle on a separate machine, in a different network
than the Solr master and slaves (there is a firewall between the Oracle machine
and the Solr machines).
If we have a LuSql machine, do you think it's better to put it in the same
network as the database machine or as the Solr machines?
Do I need to create a SQL script to get the data from Oracle, load it using
LuSql, and convert it to a Lucene index? And how will the Solr master get that
data?


Thanks

Francis


-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com]
Sent: Thursday, July 02, 2009 8:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

LuSql can be found here:
 http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
User Manual:
 http://cuvier.cisti.nrc.ca/~gnewton/lusql/v0.9/lusqlManual.pdf.html

LuSql can communicate directly with Oracle and create a Lucene index for you.
Of course - as mentioned by other posters - you need to make sure the
versions of Lucene and Solr are compatible (use same jars), you use
the same Analyzers, and you create the appropriate 'schema' that Solr
understands.

-glen

2009/7/2 Francis Yakin :
>
> Glen,
>
> The database we use is Oracle. I am not the database administrator, so I am
> not familiar with their script.
> So, basically, we have an Oracle SQL script that loads the XML files over an
> HTTP connection to our Solr master.
>
> My question: is there any other way, instead of using an HTTP connection, to
> load the XML files to our Solr master?
>
> You mentioned LuSql; I am not familiar with that. Can you provide us the
> docs or something? Again, I am not the database guy, I am only the Solr guy.
> The database is on a different box than the Solr master, and both are
> running Linux (RedHat).
>
> Thanks
>
> Francis
>
> -Original Message-
> From: Glen Newton [mailto:glen.new...@gmail.com]
> Sent: Wednesday, July 01, 2009 8:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Is there any other way to load the index beside using "http" 
> connection?
>
> You can directly load to the backend Lucene using LuSql[1]. It is
> faster than Solr, sometimes as much as an order of magnitude faster.
>
> Disclosure: I am the author of LuSql
>
> -Glen
> http://zzzoot.blogspot.com/
>
> [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
>
> 2009/7/1 Francis Yakin :
>>
>> We have several thousand XML files in a database that we load into the Solr
>> master. The database uses an "http" connection to transfer those files to
>> the Solr master; Solr then translates the XML files into its Lucene index.
>>
>> We are experiencing issues with connections being opened/closed through the
>> firewall, and it is very, very slow.
>>
>> Is there any other way to load the data/index from the database to the Solr
>> master besides using an http connection? That is, could we just scp/ftp the
>> XML files from the database system to the Solr master and let Solr convert
>> them to Lucene indexes?
>>
>> Any input or help will be much appreciated.
>>
>>
>> Thanks
>>
>> Francis
>>
>>
>>
>>
>
>
>
> --
>
> -
>



--

-


RE: Is there any other way to load the index beside using "http" connection?

2009-07-05 Thread Francis Yakin
 Norberto,

Yes, DIH is one of the options we are thinking of using, but it requires 1.3.0
and above, and currently we are running Solr 1.2.0.

I am thinking of using a CSV file (converting the XML to CSV format on the
database machine), then transporting that CSV file to the Solr box.
In Solr we would run an update to convert the CSV file into the Lucene index.

Also, we are considering the one you suggested; note my questions below:

>
>why not generate your SQL output directly into your oracle server as a file,
  Question: what type of file is this (XML or CSV)?

>upload the file to your SOLR server? Then the data file is local to your SOLR
>server , you will bypass any WAN and firewall you may be having. (or some
>variation of it, sql -> SOLR server as file, etc..)

How do we upload the file? Do we need to convert the data file to a Lucene
index first? And is there documentation on how to do this?

>Any speed issues that are rooted in the fact that you are posting via
>HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
>approach without changing too much of your current setup.


-Original Message-
From: Norberto Meijome [mailto:numard...@gmail.com]
Sent: Sunday, July 05, 2009 3:57 AM
To: Francis Yakin
Cc: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

On Thu, 2 Jul 2009 11:02:28 -0700
Francis Yakin  wrote:

> Norberto, Thanks for your input.
>
> What do you mean with "Have you tried connecting to  SOLR over HTTP from
> localhost, therefore avoiding any firewall issues and network latency ? it
> should work a LOT faster than from a remote site." ?
>
>
> Here are how our servers lay out:
>
> 1) Database ( Oracle ) is running on separate machine
> 2) Solr master is running on separate machine by itself
> 3) 6 Solr slaves (these 6 pull the index from the master using rsync)
>
> We have a SQL (Oracle) script to post the data/index from the Oracle database
> machine to the Solr master over http. We wrote that script (someone on the
> Oracle database administration side wrote it).

You said in your other email you are having issues with slow transfers between
1) and 2). Your subject relates to the data transfer between 1) and 2, - 2) and
3) is irrelevant to this part.

My question (what you quoted above) relates to the point you made about it
being slow ( WHY is it slow?), and issues with opening so many connections
through firewall. so, I'll rephrase my question (see below...)

[]
>
> We cannot use localhost, since Solr is not running on the Oracle machine.

why not generate your SQL output directly into your oracle server as a file,
upload the file to your SOLR server? Then the data file is local to your SOLR
server , you will bypass any WAN and firewall you may be having. (or some
variation of it, sql -> SOLR server as file, etc..)

Any speed issues that are rooted in the fact that you are posting via
HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
approach without changing too much of your current setup.


> Another alternative we are thinking of is to transform the XML into CSV and
> import/export that.
>
> How about LuSql, which some mentioned? Is it a free (open source)
> application? Do you have any experience with it?

Not i, sorry.

Have you looked into DIH? It's designed for this kind of work.

B
_
{Beto|Norberto|Numard} Meijome

"Great spirits have often encountered violent opposition from mediocre minds."
  Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.



RE: Is there any other way to load the index beside using "http" connection?

2009-07-05 Thread Francis Yakin
 Thanks Marcus,

I will give it a try on a test machine first.

Francis

-Original Message-
From: Marcus Herou [mailto:marcus.he...@tailsweep.com]
Sent: Sunday, July 05, 2009 12:37 PM
To: solr-user@lucene.apache.org
Cc: Norberto Meijome
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

Sharing some of our exports from DB to solr. Note: many of the statements
below might not work due to clip-clip.

$SOLR_HOME/conf/dataConfig.xml (entity and field definitions clipped; the
general shape is):

<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity ...>
      <field ... />
    </entity>
  </document>
</dataConfig>


# Then add this to solrconfig.xml (tags clipped; the standard DIH registration
# looks like this):

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

restart solr

Issue a mysql dump
mysql --xml -uXXX -pXXX -hXXX -DXXX -e "select MD5(link) as
uid,DATE_FORMAT(publishedDate, \"%Y-%m-%dT%H:%i:%sZ\") as publishedDate from
X" > $dumpdir/dump.xml

# Warning: Note the clean command which will wipe your index...
GET "http://
$server:$port/$path/dataimport?command=full-import&clean=true&optimize=true"

Hope this helps out some.

Cheers

//Marcus


On Sun, Jul 5, 2009 at 7:28 PM, Francis Yakin  wrote:

>  Norberto,
>
> Yes, DIH is one of the options we are thinking of using, but it requires
> 1.3.0 and above, and currently we are running Solr 1.2.0.
>
> I am thinking of using a CSV file (converting the XML to CSV format on the
> database machine), then transporting that CSV file to the Solr box.
> In Solr we would run an update to convert the CSV file into the Lucene index.
>
> Also, we are considering the one you suggested; note my questions below:
>
> >
> >why not generate your SQL output directly into your oracle server as a
> file,
>   Question: what type of file is this (XML or CSV)?
>
> >upload the file to your SOLR server? Then the data file is local to your
> SOLR
> >server , you will bypass any WAN and firewall you may be having. (or some
> >variation of it, sql -> SOLR server as file, etc..)
>
> How do we upload the file? Do we need to convert the data file to a Lucene
> index first? And is there documentation on how to do this?
>
> >Any speed issues that are rooted in the fact that you are posting via
> >HTTP (vs embedded solr or DIH) aren't going to go away. But it's the
> simpler
> >approach without changing too much of your current setup.
>
>
> -Original Message-
> From: Norberto Meijome [mailto:numard...@gmail.com]
> Sent: Sunday, July 05, 2009 3:57 AM
> To: Francis Yakin
> Cc: solr-user@lucene.apache.org
> Subject: Re: Is there any other way to load the index beside using "http"
> connection?
>
> On Thu, 2 Jul 2009 11:02:28 -0700
> Francis Yakin  wrote:
>
> > Norberto, Thanks for your input.
> >
> > What do you mean with "Have you tried connecting to  SOLR over HTTP from
> > localhost, therefore avoiding any firewall issues and network latency ?
> it
> > should work a LOT faster than from a remote site." ?
> >
> >
> > Here are how our servers lay out:
> >
> > 1) Database ( Oracle ) is running on separate machine
> > 2) Solr master is running on separate machine by itself
> > 3) 6 Solr slaves (these 6 pull the index from the master using rsync)
> >
> > We have a SQL (Oracle) script to post the data/index from the Oracle
> > database machine to the Solr master over http. We wrote that script
> > (someone on the Oracle database administration side wrote it).
>
> You said in your other email you are having issues with slow transfers
> between
> 1) and 2). Your subject relates to the data transfer between 1) and 2, - 2)
> and
> 3) is irrelevant to this part.
>
> My question (what you quoted above) relates to the point you made about it
> being slow ( WHY is it slow?), and issues with opening so many connections
> through firewall. so, I'll rephrase my question (see below...)
>
> []
> >
> > We cannot use localhost, since Solr is not running on the Oracle machine.
>
> why not generate your SQL output directly into your oracle server as a
> file,
> upload the file to your SOLR server? Then the data file is local to your
> SOLR
> server , you will bypass any WAN and firewall you may be having. (or some
> variation of it, sql -> SOLR server as file, etc..)
>
> Any speed issues that are rooted in the fact that you are posting via
> HTTP (vs embedded solr or DIH) aren't going to go away. But it's the
> simpler
> approach without changing too much of your current setup.
>
>
> > Another alternative that we think of is to transform XML into CSV and
> > import/export it.
> >
> > How about if LUSQL, some mentioned about this? Is this apps free(open
> source)
> > application? Do you have any exper

RE: Is there any other way to load the index beside using "http" connection?

2009-07-06 Thread Francis Yakin
 Norberto,

Thanks, I think my question is:

>>why not generate your SQL output directly into your oracle server as a file

What type of file is this?


Thanks again for your help.


Francis

-Original Message-
From: Norberto Meijome [mailto:numard...@gmail.com]
Sent: Monday, July 06, 2009 4:33 AM
To: Francis Yakin
Cc: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

On Sun, 5 Jul 2009 10:28:16 -0700
Francis Yakin  wrote:

[...]>
> >upload the file to your SOLR server? Then the data file is local to your SOLR
> >server , you will bypass any WAN and firewall you may be having. (or some
> >variation of it, sql -> SOLR server as file, etc..)
>
> How we upload the file? Do we need to convert the data file to Lucene Index
> first? And Documentation how we do this?

pick your poison... rsync? ftp? scp ?

B
_
{Beto|Norberto|Numard} Meijome

"The freethinking of one age is the common sense of the next."
   Matthew Arnold

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


RE: Is there any other way to load the index beside using "http" connection?

2009-07-06 Thread Francis Yakin

Ok, I have a CSV file(called it test.csv) from database.

When I tried to upload this file to solr using this cmd, I got 
"stream.contentType=text/plain: No such file or directory" error

curl 
http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8

-bash: stream.contentType=text/plain: No such file or directory
 undefined field cat

What did I do wrong?

Francis

-Original Message-
From: Norberto Meijome [mailto:numard...@gmail.com]
Sent: Monday, July 06, 2009 11:01 AM
To: Francis Yakin
Cc: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

On Mon, 6 Jul 2009 09:56:03 -0700
Francis Yakin  wrote:

>  Norberto,
>
> Thanks, I think my question is:
>
> >>why not generate your SQL output directly into your oracle server as a file
>
> What type of file is this?
>
>

a file in a format that you can then import into SOLR.

_
{Beto|Norberto|Numard} Meijome

"Gravity cannot be blamed for people falling in love."
  Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


RE: Is there any other way to load the index beside using "http" connection?

2009-07-06 Thread Francis Yakin
Yes, I uploaded the CSV file that I got from the database; then I ran that cmd
and I got the error.

Any suggestions?

Thanks

Francis

-Original Message-
From: NitinMalik [mailto:malik.ni...@yahoo.com]
Sent: Monday, July 06, 2009 11:32 AM
To: solr-user@lucene.apache.org
Subject: RE: Is there any other way to load the index beside using "http" 
connection?


Hi Francis,

I have experienced that the update stream handler (for an XML file in my case)
worked only for Solr running on the same machine. I also got the same error
when I tried to update documents on a remote Solr instance.

Regards
Nitin


Francis Yakin wrote:
>
>
> Ok, I have a CSV file(called it test.csv) from database.
>
> When I tried to upload this file to solr using this cmd, I got
> "stream.contentType=text/plain: No such file or directory" error
>
> curl
> http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8
>
> -bash: stream.contentType=text/plain: No such file or directory
>  undefined field cat
>
> What did I do wrong?
>
> Francis
>
> -Original Message-
> From: Norberto Meijome [mailto:numard...@gmail.com]
> Sent: Monday, July 06, 2009 11:01 AM
> To: Francis Yakin
> Cc: solr-user@lucene.apache.org
> Subject: Re: Is there any other way to load the index beside using "http"
> connection?
>
> On Mon, 6 Jul 2009 09:56:03 -0700
> Francis Yakin  wrote:
>
>>  Norberto,
>>
>> Thanks, I think my question is:
>>
>> >>why not generate your SQL output directly into your oracle server as a
>> file
>>
>> What type of file is this?
>>
>>
>
> a file in a format that you can then import into SOLR.
>
> _
> {Beto|Norberto|Numard} Meijome
>
> "Gravity cannot be blamed for people falling in love."
>   Albert Einstein
>
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet. Reading disclaimers makes you go blind. Writing them is worse. You
> have been Warned.
>
>

--
View this message in context: 
http://www.nabble.com/Is-there-any-other-way-to-load-the-index-beside-using-%22http%22-connection--tp24297934p24360603.html
Sent from the Solr - User mailing list archive at Nabble.com.



Creating DataSource for DIH to Oracle Database

2009-07-06 Thread Francis Yakin

Has anyone had experience creating a datasource for DIH to an Oracle database?

Also, on the Solr side we are running WebLogic and deploy the application using
WebLogic.
I know in WebLogic we can create a datasource that can connect to an Oracle
database; has anyone had experience with this?


Thanks

Francis




RE: Is there any other way to load the index beside using "http" connection?

2009-07-07 Thread Francis Yakin

I did try:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'

It doesn't work

Francis

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, July 07, 2009 4:59 AM
To: solr-user@lucene.apache.org
Cc: Norberto Meijome
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

Look at the error - it's bash (your command line shell) complaining.
The '&' terminates one command and puts it in the background.
Surrounding the command with quotes will get you one step closer:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

-Yonik
http://www.lucidimagination.com



On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakin wrote:
>
> Ok, I have a CSV file(called it test.csv) from database.
>
> When I tried to upload this file to solr using this cmd, I got 
> "stream.contentType=text/plain: No such file or directory" error
>
> curl 
> http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8
>
> -bash: stream.contentType=text/plain: No such file or directory
>  undefined field cat
>
> What did I do wrong?
>
> Francis
>
> -----Original Message-
> From: Norberto Meijome [mailto:numard...@gmail.com]
> Sent: Monday, July 06, 2009 11:01 AM
> To: Francis Yakin
> Cc: solr-user@lucene.apache.org
> Subject: Re: Is there any other way to load the index beside using "http" 
> connection?
>
> On Mon, 6 Jul 2009 09:56:03 -0700
> Francis Yakin  wrote:
>
>>  Norberto,
>>
>> Thanks, I think my question is:
>>
>> >>why not generate your SQL output directly into your oracle server as a file
>>
>> What type of file is this?
>>
>>
>
> a file in a format that you can then import into SOLR.
>
> _
> {Beto|Norberto|Numard} Meijome
>
> "Gravity cannot be blamed for people falling in love."
>  Albert Einstein
>
> I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
> Reading disclaimers makes you go blind. Writing them is worse. You have been 
> Warned.
>


RE: Is there any other way to load the index beside using "http" connection?

2009-07-07 Thread Francis Yakin
 With
curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

No errors now.

But how can I verify that the update is happening?

Thanks

Francis

-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: Tuesday, July 07, 2009 10:37 AM
To: 'solr-user@lucene.apache.org'; 'yo...@lucidimagination.com'
Cc: Norberto Meijome
Subject: RE: Is there any other way to load the index beside using "http" 
connection?


I did try:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'

It doesn't work

Francis

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, July 07, 2009 4:59 AM
To: solr-user@lucene.apache.org
Cc: Norberto Meijome
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

Look at the error - it's bash (your command line shell) complaining.
The '&' terminates one command and puts it in the background.
Surrounding the command with quotes will get you one step closer:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'

-Yonik
http://www.lucidimagination.com



On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakin wrote:
>
> Ok, I have a CSV file(called it test.csv) from database.
>
> When I tried to upload this file to solr using this cmd, I got 
> "stream.contentType=text/plain: No such file or directory" error
>
> curl 
> http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8
>
> -bash: stream.contentType=text/plain: No such file or directory
>  undefined field cat
>
> What did I do wrong?
>
> Francis
>
> -Original Message-
> From: Norberto Meijome [mailto:numard...@gmail.com]
> Sent: Monday, July 06, 2009 11:01 AM
> To: Francis Yakin
> Cc: solr-user@lucene.apache.org
> Subject: Re: Is there any other way to load the index beside using "http" 
> connection?
>
> On Mon, 6 Jul 2009 09:56:03 -0700
> Francis Yakin  wrote:
>
>>  Norberto,
>>
>> Thanks, I think my question is:
>>
>> >>why not generate your SQL output directly into your oracle server as a file
>>
>> What type of file is this?
>>
>>
>
> a file in a format that you can then import into SOLR.
>
> _
> {Beto|Norberto|Numard} Meijome
>
> "Gravity cannot be blamed for people falling in love."
>  Albert Einstein
>
> I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
> Reading disclaimers makes you go blind. Writing them is worse. You have been 
> Warned.
>


RE: Is there any other way to load the index beside using "http" connection?

2009-07-07 Thread Francis Yakin
Yeah, it works now.

How can I verify that the new CSV file got uploaded?

Thanks

Francis
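
One hedged way to check (assuming the stock example port): issue a commit so the loaded rows become searchable, then look at numFound in a match-all query before and after the load:

curl 'http://localhost:8983/solr/update' --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
curl 'http://localhost:8983/solr/select?q=*:*&rows=0'

The numFound attribute in the second response is the total document count.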

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, July 07, 2009 10:49 AM
To: solr-user@lucene.apache.org
Cc: Norberto Meijome
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

The double quotes around the ampersand don't belong there.
I think that UTF8 should also be the default, so the following should also work:

curl 
'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv'

-Yonik
http://www.lucidimagination.com

On Tue, Jul 7, 2009 at 1:37 PM, Francis Yakin wrote:
>
> I did try:
>
> curl 
> 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv"&"stream.contentType=text/plain;charset=utf-8'
>
> It doesn't work
>
> Francis
>
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Tuesday, July 07, 2009 4:59 AM
> To: solr-user@lucene.apache.org
> Cc: Norberto Meijome
> Subject: Re: Is there any other way to load the index beside using "http" 
> connection?
>
> Look at the error - it's bash (your command line shell) complaining.
> The '&' terminates one command and puts it in the background.
> Surrounding the command with quotes will get you one step closer:
>
> curl 
> 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8'
>
> -Yonik
> http://www.lucidimagination.com


RE: Is there any other way to load the index beside using "http" connection?

2009-07-07 Thread Francis Yakin
 Norberto,

You said last week:

"why not generate your SQL output directly into your oracle server as a file,
upload the file to your SOLR server? Then the data file is local to your SOLR
server , you will bypass any WAN and firewall you may be having. (or some
variation of it, sql -> SOLR server as file, etc..)"

I think this is the best solution for us to go with, without changing too much
of our setup.

As I said, we have a file named "test.xml" which comes from the SQL output; we
put it locally on the Solr server under "/opt/test.xml".

So I need to execute the commands on the Solr system to add/update this in the
Solr data/indexes.

What commands do I have to use, for example for the XML file named "/opt/test.xml"?


Thanks

Francis
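
A minimal sketch of that sequence, assuming the file is already in Solr's <add> update-XML format, Solr listens locally on WebLogic's port 7001, and "oracle-box" stands in for the database host:

# copy the exported file from the Oracle box to the Solr master
scp oracle-box:/exports/test.xml /opt/test.xml
# post it to the local Solr instance, then commit
curl http://localhost:7001/solr/update --data-binary @/opt/test.xml -H 'Content-type:text/xml; charset=utf-8'
curl http://localhost:7001/solr/update --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'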


-Original Message-
From: Norberto Meijome [mailto:numard...@gmail.com]
Sent: Sunday, July 05, 2009 3:57 AM
To: Francis Yakin
Cc: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http" 
connection?

On Thu, 2 Jul 2009 11:02:28 -0700
Francis Yakin  wrote:

> Norberto, Thanks for your input.
>
> What do you mean with "Have you tried connecting to  SOLR over HTTP from
> localhost, therefore avoiding any firewall issues and network latency ? it
> should work a LOT faster than from a remote site." ?
>
>
> Here are how our servers lay out:
>
> 1) Database ( Oracle ) is running on separate machine
> 2) Solr master is running on separate machine by itself
> 3) 6 Solr slaves (these 6 pull the index from the master using rsync)
>
> We have a SQL (Oracle) script to post the data/index from the Oracle database
> machine to the Solr master over http. We wrote that script (someone on the
> Oracle database administration side wrote it).

You said in your other email you are having issues with slow transfers between
1) and 2). Your subject relates to the data transfer between 1) and 2, - 2) and
3) is irrelevant to this part.

My question (what you quoted above) relates to the point you made about it
being slow ( WHY is it slow?), and issues with opening so many connections
through firewall. so, I'll rephrase my question (see below...)

[]
>
> We cannot use localhost, since Solr is not running on the Oracle machine.

why not generate your SQL output directly into your oracle server as a file,
upload the file to your SOLR server? Then the data file is local to your SOLR
server , you will bypass any WAN and firewall you may be having. (or some
variation of it, sql -> SOLR server as file, etc..)

Any speed issues that are rooted in the fact that you are posting via
HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
approach without changing too much of your current setup.


> Another alternative we are thinking of is to transform the XML into CSV and
> import/export that.
>
> How about LuSql, which some mentioned? Is it a free (open source)
> application? Do you have any experience with it?

Not i, sorry.

Have you looked into DIH? It's designed for this kind of work.

B
_
{Beto|Norberto|Numard} Meijome

"Great spirits have often encountered violent opposition from mediocre minds."
  Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Updating Solr index from XML files

2009-07-07 Thread Francis Yakin

I have the following "curl" cmds to update and commit to Solr (I have 10 XML
files, just for testing):

 curl http://solr00:7001/solr/update --data-binary @xml_Artist-100170.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101062.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101238.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101400.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101513.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101517.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101572.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101691.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101694.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @xml_Artist-101698.txt -H 
'Content-type:text/plain; charset=utf-8'
 curl http://solr00:7001/solr/update --data-binary @commit.txt -H 
'Content-type:text/plain; charset=utf-8'

It works so far. But I will have  3 xml files.

What's the most efficient way to do this? I can script it with a for loop 
using a regular shell script or perl.

I am also looking into solr.pm from this:

http://wiki.apache.org/solr/IntegratingSolr

BTW: We are using weblogic to deploy solr.war, and by default solr in 
weblogic uses port 7001, not 8983.

Thanks

Francis




RE: Updating Solr index from XML files

2009-07-07 Thread Francis Yakin
 Otis,

What is the difference or advantage of using solr.pm?

http://search.cpan.org/~garafola/Solr-0.03/lib/Solr.pm

Thanks

Francis


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, July 07, 2009 10:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Updating Solr index from XML files


If Perl is your choice:
http://search.cpan.org/~bricas/WebService-Solr-0.07/lib/WebService/Solr.pm

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Francis Yakin 
> To: "solr-user@lucene.apache.org" 
> Sent: Wednesday, July 8, 2009 1:16:04 AM
> Subject: Updating Solr index from XML files
>
> [... original message quoted in full; snipped — it repeats the message at the 
> top of this thread verbatim ...]



Using curl comparing with using WebService::Solr

2009-07-09 Thread Francis Yakin

I have about 1000 folders; each folder contains 2581 xml files. The total number 
of xml files is ~2.6 million.

I developed a perl script; inside, it executes this command:

 curl http://localhost:7001/solr/update --data-binary "@0039000.xml" -H 
'Content-type:text/plain; charset=utf-8'

It took about 4 1/2 hrs to load and commit.

I would like to know the advantages of using curl to post/add/update the xml 
files to solr compared with using the WebService::Solr module.

Is using WebService::Solr faster?

The XML files are local on the Solr Master box, so I post them locally (not 
over the wan or lan).

Any input will be much appreciated.

Thanks

Francis




RE: Using curl comparing with using WebService::Solr

2009-07-09 Thread Francis Yakin
Yes, the xml files are in complete add format.

This is my code:

#!/usr/bin/perl

# Expects one argument: the sub-folder (under $topdir) whose xml files to load.
if (($#ARGV + 1) <= 0) {
    print "Usage: perl prod.pl <folder>\n\n";
    exit(1);
}

## -- CHANGE accordingly
$timeout    = 300;
$topdir     = "/opt/Test/xml-file/";
#$topdir    = "/opt/Test/";
$dir        = $topdir . $ARGV[0];
$commit_dir = "/opt/commit";
$curl       = "/usr/bin/curl";

print "Loading xml files in $dir in progress \n";
opendir(BIN, $dir) or die "Can't open $dir: $!";

# One commit command, run after all files in this folder have been posted.
$commitCmd = '(cd ' . $commit_dir . '; ' . $curl
           . ' http://localhost:7001/solr/update --data-binary @commit.txt'
           . ' -H \'Content-type:text/plain; charset=utf-8\')';

# Post each xml file in the folder with its own curl call.
while (defined($file = readdir BIN)) {
    next if $file =~ /^\./;    # skip dot entries and hidden files
    $insertCmd = '(cd ' . $dir . '; ' . $curl
               . ' http://localhost:7001/solr/update --data-binary @' . $file
               . ' -H \'Content-type:text/plain; charset=utf-8\')';
    system($insertCmd);
}

system($commitCmd);
closedir(BIN);

Thanks

Francis

-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul
Sent: Thursday, July 09, 2009 10:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Using curl comparing with using WebService::Solr

Are these xml files in the solr add xml format?

When you post using curl, I guess it opens as many http connections as
there are files. If you can write a small program to post all these
files in one request, you should be able to get better perf.

the following can be the pseudo-code

open connection
write "<add>"
for each file
  write filecontent
write "</add>"
close connection




On Fri, Jul 10, 2009 at 10:23 AM, Francis Yakin wrote:
> [... message quoted in full; snipped — it repeats "Using curl comparing with 
> using WebService::Solr" above verbatim ...]



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
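
In Perl, Noble's pseudo-code above might look like the sketch below — an untested 
outline, assuming every file is a complete add document whose outer <add> tags can 
be stripped before the contents are concatenated into a single request (the URL 
and folder argument follow this thread's examples):

#!/usr/bin/perl
# Sketch: batch all add-format xml files in one folder into a single POST.
use strict;
use warnings;
use LWP::UserAgent;

my $dir = shift or die "Usage: perl batchpost.pl <folder>\n";
opendir(my $dh, $dir) or die "Can't open $dir: $!";

my $body = "<add>";
while (defined(my $file = readdir $dh)) {
    next if $file =~ /^\./;
    open(my $fh, '<', "$dir/$file") or die "Can't read $file: $!";
    my $xml = do { local $/; <$fh> };   # slurp the whole file
    close($fh);
    $xml =~ s{</?add[^>]*>}{}g;         # drop each file's own <add> wrapper
    $body .= $xml;
}
closedir($dh);
$body .= "</add>";

# One POST for all documents, then one commit.
my $ua = LWP::UserAgent->new;
for my $content ($body, '<commit/>') {
    my $res = $ua->post('http://localhost:7001/solr/update',
                        Content_Type => 'text/xml; charset=utf-8',
                        Content      => $content);
    die $res->status_line, "\n" unless $res->is_success;
}

For millions of documents you would still want to send the docs in chunks of a 
few thousand per POST rather than build one giant string in memory.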


RE: Using curl comparing with using WebService::Solr

2009-07-09 Thread Francis Yakin
I also commit too many times, I guess: since we have 1000 folders, each loop 
executes the load and a commit. So 1000 loops means 1000 commits. I think it 
would help if I only committed once after all 1000 loops completed.

Any inputs?

Thanks

Francis


-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: Thursday, July 09, 2009 11:13 PM
To: 'solr-user@lucene.apache.org'; 'noble.p...@gmail.com'
Subject: RE: Using curl comparing with using WebService::Solr

[... quoted text snipped — it repeats the two previous messages in this thread 
verbatim ...]


RE: Using curl comparing with using WebService::Solr

2009-07-10 Thread Francis Yakin
How do you batch all documents in one curl call? Do you have a sample, so I can 
modify my script and try it again?

Right now I run curl on each document (I have 1000 docs in each folder, and 1000 
folders), using:

 curl http://localhost:7001/solr/update --data-binary @abc.xml -H 
'Content-type:text/plain; charset=utf-8'

abc.xml is one doc; we have another 999 files ending with ".xml".

Please advise.

Thanks

Francis

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Friday, July 10, 2009 12:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Using curl comparing with using WebService::Solr

On Fri, Jul 10, 2009 at 11:50 AM, Francis Yakin  wrote:

> I also commit too many I guess, since we have 1000 folders, so each loop
> will executed the load and commit.
> So 1000 loops with 1000 commits. I think it will be help if I only commit
> once after the 1000 loops completed.
>
> Any inputs?
>

Commit only when you must show changes to searchers. It is best to commit
once you are done with all documents. Also, batching documents in one curl
call will help a lot (saves on http overhead).

--
Regards,
Shalin Shekhar Mangar.
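
Folding Shalin's advice back into the earlier script — a hedged sketch, assuming 
the /opt/Test/xml-file layout from this thread — the loop below posts every file 
from every folder and issues exactly one commit at the very end:

#!/usr/bin/perl
# Sketch: post every xml file under every folder, commit once at the end.
use strict;
use warnings;
use File::Find;

my $topdir = "/opt/Test/xml-file";       # layout assumed from earlier messages
my $update = "http://localhost:7001/solr/update";

# Collect every .xml file under every folder.
my @files;
find(sub { push @files, $File::Find::name if /\.xml$/ }, $topdir);

for my $file (@files) {
    system("/usr/bin/curl $update --data-binary \@$file "
         . "-H 'Content-type:text/plain; charset=utf-8'") == 0
        or warn "post failed for $file\n";
}

# A single commit makes the whole batch visible to searchers at once.
system("/usr/bin/curl $update --data-binary '<commit/>' "
     . "-H 'Content-type:text/plain; charset=utf-8'") == 0
    or die "commit failed\n";

Combined with the single-request batching sketched after Noble's pseudo-code, 
this cuts both the commit count and the per-file HTTP overhead.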


Segments_2 and segments.gen under Index folder and spellchecker1, spellchecker2, spellcheckerFile folder

2009-07-14 Thread Francis Yakin

I just upgraded our solr to 1.3.0.

After I deployed the solr apps, I noticed there are:

segments_2 and segments.gen, and there are 3 folders: spellchecker1, spellchecker2 
and spellcheckerFile.

What are these for? When I delete them, I need to bounce the apps again and they 
get regenerated.

Thanks

Francis



Synonyms.txt and index_synonyms.txt

2009-07-21 Thread Francis Yakin

Does anyone know the difference between these two?

From the schema.xml, we have:

[schema.xml excerpt lost in the archive — the list software stripped the XML tags; 
it showed the fieldType's index and query analyzer definitions, one referencing 
index_synonyms.txt and the other synonyms.txt]

Do you know if we need both of them for search to work?

Thanks

Francis