Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> We have an encoding problem with our Solr application. That is, non-ASCII
> chars display fine in Solr, but come out as gobbledegook in our application.
>
> Our Tomcat server.xml file already contains URIEncoding="UTF-8" under the
> relevant <Connector> element.
>
> A google search reveals that I should set the encoding for the JVM, but
> have no idea how to do this. I'm running Windows, and there is no tomcat
> process in my Windows Services.
>

Add the following parameter to the JVM:

-Dfile.encoding=UTF-8

-- 
Regards,
Shalin Shekhar Mangar.


RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access 
the JVM???

Regards
Bern


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, 26 August 2009 5:10 PM
To: solr-user@lucene.apache.org
Subject: Re: encoding problem

On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> We have an encoding problem with our Solr application. That is, non-ASCII
> chars display fine in Solr, but come out as gobbledegook in our application.
>
> Our Tomcat server.xml file already contains URIEncoding="UTF-8" under the
> relevant <Connector> element.
>
> A google search reveals that I should set the encoding for the JVM, but
> have no idea how to do this. I'm running Windows, and there is no tomcat
> process in my Windows Services.
>

Add the following parameter to the JVM:

-Dfile.encoding=UTF-8

-- 
Regards,
Shalin Shekhar Mangar.


Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I
> access the JVM???
>

When you execute the java executable, just add -Dfile.encoding=UTF-8 as a
command line argument to the executable.
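For example, if you start Solr directly with Jetty's start.jar (the jar name
here is just the common example setup; adjust it to your own launch command):

```bat
java -Dfile.encoding=UTF-8 -jar start.jar
```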

How are you consuming Solr? You mentioned there is no tomcat, is your solr
client a desktop java application?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Exact word search

2009-08-26 Thread Shalin Shekhar Mangar
On Tue, Aug 25, 2009 at 10:40 AM, bhaskar chandrasekar  wrote:

> Hi,
>
> Can anyone help me with the below scenario?
>
> Scenario 1:
>
> Assume that I give Google as input string
> I am using Carrot with Solr.
> Carrot is for front-end display purposes.


It seems like Carrot is the one making the queries to Solr? In that case,
this question may be better suited for the Carrot users/developers list.


>
> the issue is:
> Assuming I give "BHASKAR" as input string,
> it should give me search results pertaining to BHASKAR only.
>  Select * from MASTER where name ="Bhaskar";
>  Example:It should not display search results as "ChandarBhaskar" or
>  "BhaskarC".
>  Should display Bhaskar only.
>


That is easy with Solr: make a query like name:"Bhaskar". Make sure
that the field is not tokenized, i.e. it uses the string type in schema.xml.
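As a sketch (the field name here is illustrative, not from your schema), the
schema.xml entry and matching query might look like:

```xml
<!-- schema.xml: an untokenized field, so only whole-value matches succeed -->
<field name="name" type="string" indexed="true" stored="true"/>
```

A query of name:"Bhaskar" then matches documents whose name is exactly
"Bhaskar", but not "ChandarBhaskar" or "BhaskarC".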


>
> Scenario 2:
>  Select * from MASTER where name like "%BHASKAR%";
>  It should display records containing the word BHASKAR
>  Ex: Bhaskar
> ChandarBhaskar
>  BhaskarC
>  Bhaskarabc
>

Leading wildcards are not supported. However, there are alternate ways of
doing it.

Create two fields: keep one as a normal string type, and use a
KeywordTokenizer plus a ReverseStringFilter on the other. Make one field a
copyField of the other. Perform a prefix search on both fields.
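A sketch of such a setup (all field and type names here are illustrative;
ReverseStringFilterFactory is the Solr factory that reverses tokens):

```xml
<!-- "name" holds the value as-is; "name_rev" holds it reversed -->
<fieldType name="string_rev" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ReverseStringFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_rev" type="string_rev" indexed="true" stored="false"/>
<copyField source="name" dest="name_rev"/>
```

A query of name:Bhaskar* then finds values beginning with "Bhaskar", while
name_rev:raksahB* (the search term reversed) finds values ending with
"Bhaskar", e.g. "ChandarBhaskar".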

-- 
Regards,
Shalin Shekhar Mangar.


RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Thanks for your quick reply, Shalin.

Tomcat is running on my Windows machine, but does not appear in Windows 
Services (as I was expecting it should ... am I wrong?). I'm running it from a 
startup.bat on my desktop - see below. Do I add the -Dfile line to the 
startup.bat?

SOLR is part of the repository software that we are running.

Thanks!

BERN

Startup.bat -
@echo off
if "%OS%" == "Windows_NT" setlocal
rem ---
rem Start script for the CATALINA Server
rem
rem $Id: startup.bat 302918 2004-05-27 18:25:11Z yoavs $
rem ---

rem Guess CATALINA_HOME if not defined
set CURRENT_DIR=%cd%
if not "%CATALINA_HOME%" == "" goto gotHome
set CATALINA_HOME=%CURRENT_DIR%
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
cd ..
set CATALINA_HOME=%cd%
cd %CURRENT_DIR%
:gotHome
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
echo The CATALINA_HOME environment variable is not defined correctly
echo This environment variable is needed to run this program
goto end
:okHome

set EXECUTABLE=%CATALINA_HOME%\bin\catalina.bat

rem Check that target executable exists
if exist "%EXECUTABLE%" goto okExec
echo Cannot find %EXECUTABLE%
echo This file is needed to run this program
goto end
:okExec

rem Get remaining unshifted command line arguments and save them in the
set CMD_LINE_ARGS=
:setArgs
if ""%1""== goto doneSetArgs
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setArgs
:doneSetArgs

call "%EXECUTABLE%" start %CMD_LINE_ARGS%

:end





Re: shingle filter

2009-08-26 Thread Shalin Shekhar Mangar
On Tue, Aug 25, 2009 at 4:24 AM, Joe Calderon wrote:

> hello *, I'm currently faceting on a shingled field to obtain popular
> phrases and it's working well. However, I'd like to limit the number of
> shingles that get created. The solr.ShingleFilterFactory supports
> maxShingleSize; can it be made to support a minimum as well? Can
> someone point me in the right direction?
>

There is only maxShingleSize right now. The other configurable attribute is
outputUnigrams, which controls whether or not unigrams are added to the
index.
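For reference, a typical analyzer sketch using the two attributes that exist
today (the field type name is illustrative):

```xml
<fieldType name="shingles" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- shingles of up to 3 words; outputUnigrams="false" drops single words -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="false"/>
  </analyzer>
</fieldType>
```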

If you want to add support for minimum size, I think you can make the
changes in ShingleFilter.fillShingleBuffer(). Create an issue in jira and
someone who knows more about shingles can help out.

-- 
Regards,
Shalin Shekhar Mangar.


Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> Thanks for your quick reply, Shalin.
>
> Tomcat is running on my Windows machine, but does not appear in Windows
> Services (as I was expecting it should ... am I wrong?). I'm running it from
> a startup.bat on my desktop - see below. Do I add the Dfile line to the
> startup.bat?
>
> SOLR is part of the repository software that we are running.
>

Tomcat respects an environment variable called JAVA_OPTS through which you
can pass any jvm argument (e.g. heap size, file encoding). Set
JAVA_OPTS="-Dfile.encoding=UTF-8" either through the GUI or by adding the
following to startup.bat:

set JAVA_OPTS="-Dfile.encoding=UTF-8"
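In a stock Tomcat startup.bat like the one posted above, the line can go near
the top, before catalina.bat is invoked; for example (a sketch, assuming the
variable is not already set elsewhere):

```bat
rem set the encoding for the JVM before handing off to catalina.bat
set JAVA_OPTS=%JAVA_OPTS% -Dfile.encoding=UTF-8
```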

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr 1.4: extending StatsComponent to recognize localparm {!ex}

2009-08-26 Thread Britske

Thanks for that. 
it works now ;-) 


Erik Hatcher-4 wrote:
> 
> 
> On Aug 25, 2009, at 6:35 PM, Britske wrote:
>> Moreover, I can't seem to find the actual code in FacetComponent or  
>> anywhere
>> else for that matter where the {!ex}-param case is treated. I assume  
>> it's in
>> FacetComponent.refineFacets but I can't seem to get a grip on it..  
>> Perhaps
>> it's late here..
>>
> So, someone care to shed some light on how this might be done? (I only  
>> need some
>> general directions I hope..)
> 
> It's in SimpleFacets, which makes a call to QueryParsing.getLocalParams().
> 
>   Erik
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solr-1.4%3A-extending-StatsComponent-to-recognize-localparm-%7B%21ex%7D-tp25143428p25148403.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Create new core from existing

2009-08-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
check this http://wiki.apache.org/solr/CoreAdmin

When you create a core, you are allowed to use the same instance dir as
the old core; just ensure that you give it a different dataDir.
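A sketch of the corresponding CoreAdmin request (the core name and paths are
made up for illustration):

```
http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=/path/to/existing/instanceDir&dataDir=/path/to/new/data
```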

On Wed, Aug 26, 2009 at 3:05 PM, pavan kumar
donepudi wrote:
> Paul,
> Can you please guide me on which option I need to use to do this and, if
> possible, any sample or a wiki link.
> Thanks & Regard's,
> Pavan
>
> 2009/8/26 Noble Paul നോബിള്‍ नोब्ळ् 
>>
>> The coreadmin would not copy your data. However, it is possible to
>> create another core using the same config and schema
>>
>> On Wed, Aug 26, 2009 at 1:51 PM, pavan kumar
>> donepudi wrote:
>> > hi everyone. Is there any way to create a new Solr core from the
>> > existing
>> > core using CoreAdminHandler? I want the instance directory to be created
>> > by
>> > copying the files from the existing core, and the data directory path can
>> > be provided
>> > through the dataDir query string.
>> >
>> > Regard's,
>> > Pavan
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: solr nutch url indexing

2009-08-26 Thread Uri Boness

Do you mean the schema or the solrconfig.xml?

The request handler is configured in the solrconfig.xml and you can find 
out more about this particular configuration in 
http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(CategorySolrRequestHandler)|((CategorySolrRequestHandler)). 



To understand the schema better, you can read 
http://wiki.apache.org/solr/SchemaXml


Uri

last...@gmail.com wrote:

Uri Boness wrote:
Well... yes, it's a tool that Nutch ships with. It also ships with an 
example Solr schema which you can use. 


hi,
is there any documentation to understand what is going on in the schema?


<requestHandler name="/nutch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">content^0.5 anchor^1.0 title^5.2</str>
    <str name="pf">content^0.5 anchor^1.5 title^5.2 site^1.5</str>
    <str name="fl">url</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">title url content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>




Re: Adding cores dynamically

2009-08-26 Thread Licinio Fernández Maurelo
These are the reasons why we are thinking of splitting an index via multi-core:

First of all, we have an index of news whose size is about 9G. As
we will keep aggregating news forever and let users do free-
text search on our system, we think that it will be easier for the IT
crowd to manage fixed-size, read-only indexes, giving
flexibility to the platform (I'm wondering how much performance we
will lose if read-only indexes live on NFS).

Secondly, we plan to store date ranges per core; then, when a
federated search is made, it filters the cores to query on (we plan to
install multiple Solr servers as the info grows).

2009/8/26 Chris Hostetter :
>
> : 1) We found the indexing speed starts dipping once the index grow to a
> : certain size - in our case around 50G. We don't optimize, but we have
> : to maintain a consistent index speed. The only way we could do that
> : was keep creating new cores (on the same box, though we do use
>
> Hmmm... it seems like ConcurrentMergeScheduler should make it possible to
> maintain semi-constant indexing speed by doing merges in background
> threads ... the only other issue would be making sure that an individual
> segment never got too big ... but that seems like it should be manageable
> with the config options
>
> (i'm just hypothesizing, i don't normally worry about indexes of this
> size, and when i do i'm not incrementally adding to them as time goes on
> ... i guess what i'm asking is if you guys ever looked into these ideas
> and dismissed them for some reason)
>
> : 2) Be able to drop the whole core for pruning purposes. We didn't want
>
> that makes a lot of sense ... removing older cores is one of the only
> reasons i could think of for this model to really make a lot of sense for
> performance reasons.
>
> : > One problem is the IT logistics of handling the file set. At 200 million
> : > records you have at least 20G of data in one Lucene index. It takes hours 
> to
> : > optimize this, and 10s of minutes to copy the optimized index around to
> : > query servers.
>
> i get that full optimizes become ridiculous at that point, but you could
> still do partial optimizes ... and isn't the total disk space with this
> strategy still the same?  Aren't you still ultimately copying the same
> amount of data around?
>
>
>
> -Hoss
>
>



-- 
Lici


HTML decoder is splitting tokens

2009-08-26 Thread Anders Melchiorsen
Hi.

When indexing the string "G&uuml;nther" with
HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens,
"Gü" and "nther".

Is this a bug, or am I doing something wrong?

(Using a Solr nightly from 2009-05-29)


Anders.




Reason to change the xml files in solr

2009-08-26 Thread Tamilselvi

For the installation of apache solr integration module in Drupal we need to
install solr. 

The must do thing is we need to change the solr schema.xml and configure.xml
files with the files in apache solr integration module. 

can any body explain the reason behind this change. 
-- 
View this message in context: 
http://www.nabble.com/Reason-to-change-the-xml-files-in-solr-tp25151354p25151354.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: encoding problem

2009-08-26 Thread Fuad Efendi
If you are complaining about a web application (other than Solr, probably
behind Apache HTTPD) having an encoding problem - try to troubleshoot it
with Mozilla Firefox + the Live HTTP Headers plugin.


Look at the "Content-Encoding" HTTP response headers, and don't forget about
the <meta http-equiv="Content-Type"> tag inside the HTML... 
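For completeness, the HTML-side declaration usually looks like this:

```html
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
```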


-Fuad
http://www.tokenizer.org



-Original Message-
From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] 
Sent: August-26-09 12:55 AM
To: 'solr-user@lucene.apache.org'
Subject: encoding problem 

We have an encoding problem with our Solr application. That is, non-ASCII
chars display fine in Solr, but come out as gobbledegook in our application.

Our Tomcat server.xml file already contains URIEncoding="UTF-8" under the
relevant <Connector> element.

A google search reveals that I should set the encoding for the JVM, but have
no idea how to do this. I'm running Windows, and there is no tomcat process
in my Windows Services.

TIA

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: bern_hough...@hotmail.com
Email:
bernadette.hough...@deakin.edu.au
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B
(Vic)

Important Notice: The contents of this email are intended solely for the
named addressee and are confidential; any unauthorised use, reproduction or
storage of the contents is expressly prohibited. If you have received this
email in error, please delete it and any attachments immediately and advise
the sender by return email or telephone.
Deakin University does not warrant that this email and any attachments are
error or virus free





What makes a function query count as a match or not?

2009-08-26 Thread Christophe Biocca
I haven't been able to find out what makes a function query count as a match
when used as part of a boolean query with Occur.MUST.
A term query is simple: if the term is not found, it doesn't count as a
match. What's the equivalent for a function query? A score of zero (or less
than zero, as implied by the source code for explain in Lucene's boolean
query)? Something else?


Re: HTML decoder is splitting tokens

2009-08-26 Thread Koji Sekiguchi

Hi Anders,

Sorry, I don't know whether this is a bug or a feature, but
I'd like to show an alternate way if you'd like.

In Solr trunk, HTMLStripWhitespaceTokenizerFactory is
marked as deprecated. Instead, HTMLStripCharFilterFactory plus
an arbitrary TokenizerFactory are encouraged.
And I'd recommend using MappingCharFilterFactory
to convert character references to real characters.
That is, you have:


  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>


where the contents of mapping.txt:

"ü" => "ü"
"ä" => "ä"
"ï" => "ï"
"ë" => "ë"
"ö" => "ö"
   : :

Then run analysis.jsp and see the result.

Thank you,

Koji


Anders Melchiorsen wrote:

Hi.

When indexing the string "G&uuml;nther" with
HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens,
"Gü" and "nther".

Is this a bug, or am I doing something wrong?

(Using a Solr nightly from 2009-05-29)


Anders.



  




Solr admin url for example gives 404

2009-08-26 Thread Burton-West, Tom
Hello all,

When I start up Solr from the example directory using start.jar, it seems to 
start up, but when I go to the localhost admin url 
(http://localhost:8983/solr/admin) I get a 404 (See message appended below).  
Has the url for the Solr admin changed?


Tom
Tom Burton-West
---
Here is the message I get with the 404:


HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin Powered by 
jetty://
Steps to reproduce the problems:

1 get the latest Solr from svn (R 808058)
2 run ant clean test   (all tests pass)
3 cd ./example
4. start solr
$ java -jar start.jar
2009-08-26 12:08:08.300::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2009-08-26 12:08:08.472::INFO:  jetty-6.1.3
2009-08-26 12:08:08.519::INFO:  Started SocketConnector @ 0.0.0.0:8983
5. go to browser and try to look at admin panel: 
http://localhost:8983/solr/admin



Re: Solr admin url for example gives 404

2009-08-26 Thread Rafał Kuć
Hello!

   Try running ant example and then run Solr.

-- 
Regards,
 Rafał Kuć


> Hello all,

> When I start up Solr from the example directory using start.jar, it
> seems to start up, but when I go to the localhost admin url
> (http://localhost:8983/solr/admin) I get a 404 (See message appended
> below).  Has the url for the Solr admin changed?


> Tom
> Tom Burton-West
> ---
> Here is the message I get with the 404:


> HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin Powered by
> jetty://
> Steps to reproduce the problems:

> 1 get the latest Solr from svn (R 808058)
> 2 run ant clean test   (all tests pass)
> 3 cd ./example
> 4. start solr
> $ java -jar start.jar
> 2009-08-26 12:08:08.300::INFO:  Logging to STDERR via 
> org.mortbay.log.StdErrLog
> 2009-08-26 12:08:08.472::INFO:  jetty-6.1.3
> 2009-08-26 12:08:08.519::INFO:  Started SocketConnector @ 0.0.0.0:8983
> 5. go to browser and try to look at admin panel: 
> http://localhost:8983/solr/admin







JDWP Error

2009-08-26 Thread Licinio Fernández Maurelo
The servlet container (Resin) where I deploy Solr shows:

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)


Then, when we want to stop Resin, it doesn't work. Any advice?

thx

-- 
Lici


SolrJ and Solr web simultaneously?

2009-08-26 Thread Paul Tomblin
Is Solr like an RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with an EmbeddedSolrServer?


-- 
http://www.linkedin.com/in/paultomblin


Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Smiley, David W.
Once a commit occurs, all data added before it (by any & all clients) becomes 
visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses that 
Solr.

EmbeddedSolrServer is something that few people should actually use.  It's 
mostly for embedding Solr without running Solr as a server, which is a somewhat 
rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin"  wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to something explicit to see others
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin



RE: JDWP Error

2009-08-26 Thread Fuad Efendi

JPDA/JDWP are for remote debugging of the Sun JVM...
It shouldn't be Solr-related... check the configs of Resin...
-Fuad
http://www.tokenizer.org



-Original Message-
From: Licinio Fernández Maurelo [mailto:licinio.fernan...@gmail.com] 
Sent: August-26-09 12:49 PM
To: solr-user@lucene.apache.org
Subject: JDWP Error

The servlet container (resin) where i deploy solr shows :

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

ERROR: transport error 202: bind failed: Address already in
use

ERROR: JDWP Transport dt_socket failed to initialize,
TRANSPORT_INIT(510)

JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
initialized
[../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized,
jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)


then, when we want to stop resin it doesn't works, any advice?

thx

-- 
Lici




Pattern matching in Solr

2009-08-26 Thread bhaskar chandrasekar
Hi,
 
Can anyone help me with the below scenario?
 
Scenario 1:
 
Assume that I give Google as input string 
i am using Carrot with Solr 
Carrot is for front end display purpose 
the issue is: 
Assuming I give "BHASKAR" as input string, 
it should give me search results pertaining to BHASKAR only.
 Select * from MASTER where name ="Bhaskar";
 Example:It should not display search results as "ChandarBhaskar" or
 "BhaskarC".
 Should display Bhaskar only.
 
Scenario 2:
 Select * from MASTER where name like "%BHASKAR%";
 It should display records containing the word BHASKAR
 Ex: Bhaskar
ChandarBhaskar
 BhaskarC
 Bhaskarabc

 How to achieve Scenario 1 in Solr?


 
Regards
Bhaskar



  

RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
I have the same situation now.

If I don't want to use an HTTP connection, I need to use EmbeddedSolrServer - 
is that correct?
We have master/slave Solr; the applications use the slaves for search. The 
master only takes the new index from the database, and the slaves will pull 
the new index using snappuller/snapinstaller.

I don't want (or will try not) to use an HTTP connection from the database to 
the Solr master because of network latency (very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients) becomes 
visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses that 
Solr.

SolrEmbeddedSolrServer is something that few people should actually use.  It's 
mostly for embedding Solr without running Solr as a server, which is a somewhat 
rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin"  wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to something explicit to see others
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin



Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Smiley, David W.
You could implement a Data Import Handler "EntityProcessor".  There are at 
least 5 implementations I can see for you to learn from that come with Solr.  
If Solr truly doesn't need to be up and running as a server to serve any 
queries, then EmbeddedSolrServer will be fine.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:29 PM, "Paul Tomblin"  wrote:

On Wed, Aug 26, 2009 at 1:22 PM, Smiley, David W. wrote:
> SolrEmbeddedSolrServer is something that few people should actually use.
>  It's mostly for embedding Solr without running Solr as a server, which is a
> somewhat rare need.

I have a background app that's inserting thousands of documents into
Solr, and it seems like it would be a lot more efficient to do that
directly instead of handing the documents off to a web server that
otherwise isn't doing anything else.

--
http://www.linkedin.com/in/paultomblin



Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Smiley, David W.
See my response to Paul Tomblin.  You could use the existing DataImportHandler 
"SqlEntityProcessor" for DB access.  The DIH framework is fairly extensible.

BTW, I wouldn't immediately dismiss using HTTP to give data to Solr just 
because you believe it will be slow without having tried it.  Using SolrJ with 
StreamingUpdateSolrServer configured with multiple threads and using the 
default binary format is pretty darned fast.  Don't knock it till you've tried 
it.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server


On 8/26/09 1:41 PM, "Francis Yakin"  wrote:

I have the same situation now.

If I don't want to use http connection, so I need to use EmbeddedSolrServer 
that what I think I need correct?
We have Master/slaves solr, the applications use slaves for search. The Master 
only taking the new index from Database and slaves will pull the new index 
using snappuller/snapinstaller.

I don't want or try not to use http connection from Database to Solr Master 
because of network latency( very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients) becomes 
visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses that 
Solr.

SolrEmbeddedSolrServer is something that few people should actually use.  It's 
mostly for embedding Solr without running Solr as a server, which is a somewhat 
rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin"  wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to something explicit to see others
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin




RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more important.
With separate SOLR instance on a separate box, and with separate java
application (SOLR-bridge) querying database and using SolrJ, letency will be
1 second (for instance), but you can fine-tune performance by allocating
necessary amount of threads (depends on latency of SOLR and Oracle, average
doc size, etc), JDBC connections, etc. - and you can reach thousands docs
per second throughput. DIHs only simplify some staff for total beginners...

In addition, you will have nice Admin screen of standalone SOLR-master.

-Fuad
http://www.tokenizer.org



-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com] 
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use http connection, so I need to use EmbeddedSolrServer
that what I think I need correct?
We have Master/slaves solr, the applications use slaves for search. The
Master only taking the new index from Database and slaves will pull the new
index using snappuller/snapinstaller.

I don't want or try not to use http connection from Database to Solr Master
because of network latency( very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients)
becomes visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses
that Solr.

SolrEmbeddedSolrServer is something that few people should actually use.
It's mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin"  wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to something explicit to see others
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin





Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Avlesh Singh
>
> Is Solr like a RDBMS in that I can have multiple programs querying and
> updating the index at once, and everybody else will see the updates after a
> commit, or do I have to something explicit to see others updates?
>
Yes, everyone gets to search the existing index until writes to the index
(core) are committed. None of the searches will fetch uncommitted data.
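For example, over the plain XML update interface the sequence would be (a
sketch; the field name is illustrative):

```xml
<!-- POSTed to /update: the doc is indexed but not yet visible to searchers -->
<add><doc><field name="id">1</field></doc></add>
<!-- POSTed to /update: after this, all pending docs become searchable -->
<commit/>
```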

Does it matter whether they're using the web interface, SolrJ with a
> CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?
>
Absolutely not. These are all just different ways to access the Solr server;
the underlying implementation of searching the index and writing to the index
does not change in either case.

Cheers
Avlesh

On Wed, Aug 26, 2009 at 10:44 PM, Paul Tomblin  wrote:

> Is Solr like a RDBMS in that I can have multiple programs querying and
> updating the index at once, and everybody else will see the updates
> after a commit, or do I have to something explicit to see others
> updates?  Does it matter whether they're using the web interface,
> SolrJ with a
> CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?
>
>
> --
> http://www.linkedin.com/in/paultomblin
>


Problem using replication in 8/25/09 nightly build of 1.4

2009-08-26 Thread Ron Ellis
Hi Everyone,

When trying to utilize the new HTTP based replication built into Solr 1.4 I
encounter a problem. When I view the replication admin page on the slave all
of the master values are null i.e. Replicatable Index Version:null,
Generation: null | Latest Index Version:null, Generation: null. Despite
these missing values the two seem to be talking over HTTP successfully (if I
shutdown the master the slave replication page starts exploding with a NPE).

When I hit http://solr/replication?command=indexversion&wt=xml I get the
following...

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">13</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>

However in the admin/replication UI on the master I see...

 Index Version: 1250525534711, Generation: 1778
Any idea what I'm doing wrong, or how I could begin to diagnose this? I am
using the 8/25 nightly build of Solr with the example solrconfig.xml provided.
The only modifications to the config have been to uncomment the master/slave
replication sections and to remove the data directory location line so it
falls back to solr.home/data. Also, if it's relevant, this index was
originally created in Solr 1.3.
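For reference, the uncommented sections look roughly like this (a sketch of the Solr 1.4 example solrconfig.xml from memory; the masterUrl host is an assumption):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- on the master -->
  <lst name="master">
    <!-- replicate after "commit" and/or "startup"; with only "commit",
         the reported index version can stay at 0 until the first commit
         after the master starts up -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- on the slave -->
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```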

Thanks,
Ron Ellis


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
Thanks.

The issue we have is actually more likely a firewall issue than network
latency; that's why we try to avoid an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (first
initial load only), and after the initial load we will be actively adding new
docs to Solr. We would prefer to use a JDBC connection, so if SolrJ used JDBC
that would be useful. I also like the multi-threading option of SolrJ. So,
since we want the Solr master running as a server as well, is
EmbeddedSolrServer not a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"Network latency" does not play any role here; throughput is more important.
With a separate SOLR instance on a separate box, and a separate Java
application (a SOLR bridge) querying the database and using SolrJ, latency may
be 1 second (for instance), but you can fine-tune performance by allocating
the necessary number of threads (depending on the latency of SOLR and Oracle,
average doc size, etc.), JDBC connections, etc. - and you can reach a
throughput of thousands of docs per second. DIH only simplifies some stuff for
total beginners...

In addition, you will have nice Admin screen of standalone SOLR-master.

-Fuad
http://www.tokenizer.org



-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use an HTTP connection, then I need to use
EmbeddedSolrServer - is that correct?
We have Master/slaves solr, the applications use slaves for search. The
Master only taking the new index from Database and slaves will pull the new
index using snappuller/snapinstaller.

I am trying not to use an HTTP connection from the database to the Solr master
because of network latency (very slow).

Any suggestions?

Francis

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients)
becomes visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses
that Solr.

EmbeddedSolrServer is something that few people should actually use.
It's mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin"  wrote:

Is Solr like a RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to something explicit to see others
updates?  Does it matter whether they're using the web interface,
SolrJ with a
CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin





Re: Pattern matching in Solr

2009-08-26 Thread Avlesh Singh
You could have used your previous thread itself (
http://www.lucidimagination.com/search/document/31c1ebcedd4442b/exact_pattern_search_in_solr),
Bhaskar.

In your scenario one, you need an exact token match, right? The matches you
are seeing are expected if your field type is "text". Look for the
"WordDelimiterFilterFactory" in that field type's definition inside
schema.xml; you'll find an attribute splitOnCaseChange="1". Because of this,
"ChandarBhaskar" is split into two tokens, "Chandar" and "Bhaskar", and hence
the matches. You may choose to remove this attribute if the behaviour is not
desired.

For your scenario two, you may want to look at the KeywordTokenizerFactory
and EdgeNGramFilterFactory on Solr wiki.

Generally, for all such use cases people create multiple fields in their
schema storing the same data analyzed in different ways.
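Such a setup can be sketched in schema.xml along these lines (a hedged example: the field-type names are made up, and the attributes follow the Solr 1.4 conventions):

```xml
<!-- Scenario 1: exact whole-value match (one token, lowercased) -->
<fieldType name="exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Scenario 2 (prefix flavour): index edge n-grams, query as one token -->
<fieldType name="prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that EdgeNGramFilterFactory only matches prefixes ("Bhaskar*"); a true "%BHASKAR%" contains-match would need plain NGramFilterFactory instead.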

Cheers
Avlesh

On Wed, Aug 26, 2009 at 10:58 PM, bhaskar chandrasekar  wrote:

> Hi,
>
> Can any one help me with the below scenario?.
>
> Scenario 1:
>
> I am using Carrot with Solr; Carrot is for front-end display purposes.
> Assuming I give "BHASKAR" as the input string, it should give me search
> results pertaining to BHASKAR only, i.e.
>  Select * from MASTER where name = "Bhaskar";
> Example: it should not display search results such as "ChandarBhaskar" or
> "BhaskarC". It should display "Bhaskar" only.
>
> Scenario 2:
>  Select * from MASTER where name like "%BHASKAR%";
>  It should display records containing the word BHASKAR
>  Ex: Bhaskar
> ChandarBhaskar
>  BhaskarC
>  Bhaskarabc
>
>  How do I achieve Scenario 1 in Solr?
>
>
>
> Regards
> Bhaskar
>
>
>
>


Re: Incremental Deletes to Index

2009-08-26 Thread Jason Rutherglen
You'll probably want to call Solr commit, however you'll want to
call IW.flush underneath (via a new Solr commit flag?).

Yes, the Solr caches would be somewhat useless if you're calling
Solr commit/flush rapidly. See SOLR-1308 on improving caches for
NRT.

On Tue, Aug 25, 2009 at 7:22 PM, KaktuChakarabati wrote:
>
> So basically the idea is to replace the underlying IndexReader currently
> associated with a searcher/solrCore following an update without calling
> commit explicitly? This will also have the effect of bringing in inserts
> btw? or is it just usable for deletes?
> In terms of cache invalidation etc. there are probably some issues, i.e.
> with respect to documents which are cached as part of some result set and
> need to be expunged due to a deletion?
>
>
> Jason Rutherglen-2 wrote:
>>
>> I can give an overview, IW.getReader replaces IR.reopen. So
>> you'd replace in SolrCore.getSearcher. However as per another
>> discussion IW isn't public yet, so all you'd need to do is
>> expose it from UpdateHandler. Then it should work as you want,
>> though there would need to be a new method to create a new
>> searcher from IW.getReader without calling IW.commit.
>>
>> On Tue, Aug 25, 2009 at 4:37 PM, KaktuChakarabati
>> wrote:
>>>
>>> Jason,
>>> sounds like a very promising change to me - so much so that I would
>>> gladly work toward creating a patch myself.
>>> Are there any specific points in the code you could point me to if I
>>> want to look at how to start off implementing it?
>>> Lucene/Solr classes involved etc.? I'll start looking myself anyhow,
>>> but any tips would be helpful.. :)
>>>
>>> Thanks,
>>> -Chak
>>>
>>>
>>> Jason Rutherglen-2 wrote:

 This will be implemented as you're stating when
 IndexWriter.getReader is incorporated. This will carry over
 deletes in RAM until IW.commit is called (i.e. Solr commit).
 It's a fairly simple change though perhaps too late for 1.4
 release?

 On Tue, Aug 25, 2009 at 3:10 PM, KaktuChakarabati
 wrote:
>
> Hey,
> I was wondering - is there a mechanism in lucene and/or solr to mark a
> document in the index
> as deleted and then have this change reflect in query serving without
> performing the whole
> commit/warmup cycle? this seems to me largely appealing as it allows a
> kind
> of solution
> where deletes are simply processed by marking them in a bitmap or some
> such
> structure
> and then intersecting search results with those on a per-shard basis.
>
> Anything in that direction? Otherwise, is there any critical issue
> preventing such an implementation?
>
> Thanks
> -Chak
> --
> View this message in context:
> http://www.nabble.com/Incremental-Deletes-to-Index-tp25143093p25143093.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Incremental-Deletes-to-Index-tp25143093p25144124.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Incremental-Deletes-to-Index-tp25143093p25145535.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
Do you have a firewall between the DB and the prospective SOLR master
instance? Do you have a firewall between the client application and the DB?
Such a configuration is strange... By default, firewalls allow access to port
80; try setting port 80 for SOLR's Tomcat, and/or configure an AJP mapping for
a front-end HTTPD, which you might already have. Btw, Apache HTTPD in front of
SOLR supports HTTP caching for SOLR slaves...

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe; developers need to implement the
multithreaded application themselves.
2. SolrJ does not use JDBC; developers need to implement that part too...

It requires some Java coding; it is not an out-of-the-box DataImportHandler.
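A minimal sketch of such a bridge, using only the JDK (the design is an assumption, not an official API: the BatchIndexer callback stands in for a shared thread-safe CommonsHttpSolrServer, and in a real application the rows would come from JDBC):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of a multi-threaded DB-to-SOLR bridge: worker threads take row
 * batches and hand them to one shared indexer. In a real application the
 * indexer would wrap a single thread-safe CommonsHttpSolrServer.
 */
public class SolrBridge {
    /** Stand-in for the SolrJ call, e.g. server.add(docs). */
    interface BatchIndexer {
        void index(List<String> batch);
    }

    static int run(List<String> rows, int batchSize, int threads, BatchIndexer indexer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger batches = new AtomicInteger();
        // partition the rows into batches and index them concurrently
        for (int i = 0; i < rows.size(); i += batchSize) {
            final List<String> batch =
                new ArrayList<>(rows.subList(i, Math.min(i + batchSize, rows.size())));
            pool.submit(() -> {
                indexer.index(batch);       // shared, thread-safe server in real code
                batches.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batches.get();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add("doc" + i);
        int sent = run(rows, 100, 8, batch -> { /* server.add(...) would go here */ });
        System.out.println(sent + " batches indexed");  // prints "10 batches indexed"
    }
}
```

The point of the pattern: the expensive parts (DB latency, Solr indexing) overlap across threads, so throughput scales with the pool size rather than being bound by one round-trip at a time.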

Suppose you have two quad-cores: why run single-threaded when we could run 8
threads... or why wait 5 seconds for a response from SOLR when an additional
32 threads could be working with the DB at the same time... and why share I/O
between SOLR and the DB?

Diversify, lower risks; having SOLR and the DB on the same box is extremely
unsafe...

-Fuad









${solr.abortOnConfigurationError:false} - does it default to false

2009-08-26 Thread djain101

I have one quick question...

If in solrconfig.xml it says...

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean <abortOnConfigurationError> defaults to false if it is not set
as a system property?

Thanks,
Dharmveer
-- 
View this message in context: 
http://www.nabble.com/%24%7Bsolr.abortOnConfigurationError%3Afalse%7D---does-it-defaults-to-false-tp25155213p25155213.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ${solr.abortOnConfigurationError:false} - does it default to false

2009-08-26 Thread Ryan McKinley


On Aug 26, 2009, at 3:33 PM, djain101 wrote:



I have one quick question...

If in solrconfig.xml it says...

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean <abortOnConfigurationError> defaults to false if it is not set
as a system property?
as system property?



correct
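The `${property:default}` syntax in solrconfig.xml substitutes a Java system property when one is set, and otherwise falls back to the literal after the colon. For example:

```xml
<!-- resolves to "false" unless -Dsolr.abortOnConfigurationError=... is passed -->
<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>
```

To override it at startup (assuming the example Jetty launcher): `java -Dsolr.abortOnConfigurationError=true -jar start.jar`.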



Searching and Displaying Different Logical Entities

2009-08-26 Thread wojtekpia

I'm trying to figure out if Solr is the right solution for a problem I'm
facing. I have 2 data entities: P(arent) & C(hild). P contains up to 100
instances of C. I need to expose an interface that searches attributes of
entity C, but displays them grouped by parent entity, P. I need to include
facet counts in the result, and the counts are based on P.

My first solution was to create 2 Solr instances: one for each entity. I
would have to execute 2 queries each time: 1) get a list of matching P's
based on a query of the C instance (facet by P ID in C instance to get
unique list of P's), then 2) get all P's by ID, including facet counts, etc.
The problem I face with this solution is that I can have many matching P's
(10,000+), so my second query will have many (10,000+) constraints. 

My second (and current) solution is to create a single instance, and flatten
all C attributes into the appropriate P record using dynamic fields. For
example, if C has an attribute CA, then I have a dynamic field in P called
CA*. I name this field incrementally based on the number of C's per P (CA1,
CA2, ...).  This works, except that each query is very long (CA1:condition
OR CA2: condition ...). 

Neither solution is ideal. I'm wondering if I'm missing something obvious,
or if I'm using the wrong solution for this problem.

Any insight is appreciated.

Wojtek
-- 
View this message in context: 
http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25156301.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Searching and Displaying Different Logical Entities

2009-08-26 Thread Fuad Efendi
> then 2) get all P's by ID, including facet counts, etc. The problem I face
> with this solution is that I can have many matching P's (10,000+), so my
> second query will have many (10,000+) constraints.


SOLR can automatically provide you with the P's and their counts, and they
will be _unique_...

Even if the cardinality of P is 10,000+, SOLR is very fast now (expect a
response time of a few seconds for the initial request). You need a single
query with "faceting"...


(!) You do not need the P's IDs.

Each document will have a unique ID and fields such as P and C (with possible
attributes). Do not think in terms of an RDBMS... Lucene does all the
'normalization' behind the scenes, and SOLR will give you the Ps with their
Cs...
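A single faceted request along these lines would do it (the field names here are assumptions based on the description above):

```
http://localhost:8983/solr/select?q=CA:condition&facet=true&facet.field=P&facet.mincount=1&rows=0
```

Each entry in the facet_counts section of the response is then a distinct P value with the number of matching documents, so no second query with 10,000+ constraints is needed.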








RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
We already opened port 80 from Solr to the DB, so that's not the issue, but
httpd (port 80) is very flaky when there is a firewall between Solr and the
DB. We have a Solr master/slaves environment; clients access search through
the slaves (the master only accepts the new index from the DB, and the slaves
pull the new indexes from the Solr master).

We have someone on the development team who knows Java and can implement JDBC.

We don't share the Solr master and the DB on the same box; they are separate
boxes on separate networks, with port 80 opened between them.

It looks like CommonsHttpSolrServer is a better approach than
EmbeddedSolrServer, since we want the Solr master acting as a Solr server as
well. I'm just worried that HTTP will be a bottleneck; that's why I prefer
the JDBC connection method.

Francis


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
With this configuration, probably the preferred method is to run a standalone
Java application on the same box as the DB, or very close to the DB (in the
same network segment).

HTTP is not the bottleneck; the main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as a sample: if you submit a batch of large documents to SOLR, expect a
5-55 second response time (even with EmbeddedSolr or pure Lucene), but nothing
of that is related to network latency or firewalling... uploading 1 MB over a
100 Mbps network takes less than 0.1 seconds, but indexing it may take > 0.5
secs...

A standalone application with SolrJ is also good because you can schedule
batch updates etc.; automated...


P.S.
In theory, if you are using Oracle, you could even try implementing triggers
written in Java that cause a SOLR update on each row update (transactional);
but I haven't heard of anyone using stored procedures written in Java - too
risky and slow, with specific dependencies...





Re: What makes a function query count as a match or not?

2009-08-26 Thread Yonik Seeley
On Wed, Aug 26, 2009 at 11:27 AM, Christophe
Biocca wrote:
> I haven't been able to find what makes a function query count as a match
> when used a part of a boolean query with Occur.MUST.

A function query matches all non-deleted documents.

-Yonik
http://www.lucidimagination.com


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
>I just worried that http will be a bottle neck, that's why I prefer JDBC
connection method.

- JDBC is a library for a Java application; it connects to the database, in
most cases using a proprietary protocol provided by the DB vendor and a
specific port number.
- SolrJ is a library for a Java application; it connects to SOLR using the
HTTP protocol.




RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin
No, we don't want to put it on the same box as the database.

Agreed that indexing/committing/merging and optimizing is the bottleneck.

I think it's worth trying SolrJ with the CommonsHttpSolrServer option for now,
and we'll see what happens when loading 3 million docs.

Thanks

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 1:34 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

With this configuration probably preferred method is to run standalone Java
application on same box as DB, or very close to DB (in same network
segment).

HTTP is not a bottleneck; main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as a sample, if  you submit to SOLR batch of large documents, - expect
5-55 seconds response time (even with EmbeddedSolr or pure Lucene), but
nothing related to network latency nor to firewalling... upload 1Mb over
100Mbps network takes less than 0.1 seconds, but indexing it may take > 0.5
secs...

Standalone application with SolrJ is also good because you may schedule
batch updates etc; automated...


P.S.
In theory, if you are using Oracle, you may even try to implement triggers
written in Java causing SOLR update on each row update (transactional); but
I haven't heard anyone uses stored procs in Java, too risky and slow, with
specific dependencies...




-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 4:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

 We already opened port 80 from solr to DB so that's not the issue, but
httpd(port 80) is very flaky if there is firewall between Solr and DB.
We have Solr master/slaves env, client access the search thru slaves( master
only accept the new index from DB and slaves will pull the new indexes from
Solr master).

We have someone in Development team knows Java and implement JDBC.

We don't share Solr master and DB on the same box, it's separate box and
separate network, port 80 opened between these.

It looks like CommonsHttpSolrServer is better approach than
EmbeddedSolrServer, since we want the Solr Master acting as a solr server as
well.
I just worried that http will be a bottle neck, that's why I prefer JDBC
connection method.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have a firewall between the DB and a possible SOLR master instance? Do
you have a firewall between the client application and the DB? Such a
configuration is strange... by default firewalls allow access to port 80; try
to set port 80 for SOLR-Tomcat and/or configure an AJP mapping for a front-end
HTTPD, which you might have. BTW, Apache HTTPD with SOLR supports HTTP caching
for SOLR slaves...

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement the
multithreaded application.
2. SolrJ does not use JDBC; developers need to implement that...

It requires some Java coding; it is not the out-of-the-box Data Import
Handler.

Suppose you have 2 quad-cores: why run single-threaded if we can run 8
threads... or why wait 5 seconds for a response from SOLR if we can use an
additional 32 threads doing DB work at the same time... and why share I/O
between SOLR and DB?

Diversify and lower risks; having SOLR and the DB on the same box is extremely
unsafe...

-Fuad
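The multithreaded bridge described in the thread can be sketched with plain java.util.concurrent (a rough sketch against a modern JDK; the actual SolrJ call, e.g. add() on a single shared CommonsHttpSolrServer, is abstracted behind a callback, and all class and method names here are illustrative):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ParallelIndexer {
    // Fans DB rows out to a fixed pool of workers. In a real bridge each
    // worker would build a SolrInputDocument from its row and call add() on
    // one shared CommonsHttpSolrServer instance (which is thread-safe).
    public static void indexAll(List<String> rows, int threads, Consumer<String> indexRow) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String row : rows) {
            pool.submit(() -> indexRow.accept(row));
        }
        pool.shutdown();
        try {
            // Wait for all submitted rows to be indexed.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The thread count can then be tuned independently of SOLR and DB latency, which is the point being made above.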


-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

The issue we have actually, it could be firewall issue more likely than
network latency, that's why we try to avoid to use http connection.
Fixing the firewall is not an option right now.
We have around 3 millions docs to load from DB to Solr master( first initial
load only) and subsequently we actively adding the new docs to Solr after
the initial load. We prefer to use JDBC connection , so if solrj uses JDBC
connection that might usefull. I also like the multi-threading option from
Solrj. So, since we want the solr Master running as server also
EmbedderSolrServer is not a good better approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more important.
With a separate SOLR instance on a separate box, and with a separate Java
application (a SOLR bridge) querying the database and using SolrJ, latency will
be 1 second (for instance), but you can fine-tune performance by allocating
the necessary number of threads (depending on the latency of SOLR

RE: Solr Replication

2009-08-26 Thread J G

Thanks for the response.

It's interesting because when I run jconsole all I can see is one 
ReplicationHandler JMX mbean. It looks like it is defaulting to the first slice 
it finds on its path. Is there any way to have multiple replication handlers, or 
at least obtain replication information per "slice"/"instance" via JMX, like how 
you can see attributes for each "slice"/"instance" via each replication admin 
jsp page? 

Thanks again.

> From: noble.p...@corp.aol.com
> Date: Wed, 26 Aug 2009 11:05:34 +0530
> Subject: Re: Solr Replication
> To: solr-user@lucene.apache.org
> 
> The ReplicationHandler is not enforced as a singleton, but for all
> practical purposes it is a singleton for one core.
> 
> If an instance (a slice, as you say) is set up as a repeater, it can
> act as both a master and a slave
> 
> in the repeater the configuration should be as follows
> 
> MASTER
>   |_SLAVE (I am a slave of MASTER)
>   |
> REPEATER (I am a slave of MASTER and master to my slaves )
>  |
>  |
> REPEATER_SLAVE( of REPEATER)
> 
> 
> the point is that REPEATER will have a slave section with a masterUrl
> which points to the master, and REPEATER_SLAVE will have a slave section
> with a masterUrl pointing to the repeater
> 
> 
> 
> 
> 
> 
> On Wed, Aug 26, 2009 at 12:40 AM, J G wrote:
> >
> > Hello,
> >
> > We are running multiple slices in our environment. I have enabled JMX and I 
> > am inspecting the replication handler mbean to obtain some information 
> > about the master/slave configuration for replication. Is the replication 
> > handler mbean a singleton? I only see one mbean for the entire server and 
> > it's picking an arbitrary slice to report on. So I'm curious if every slice 
> > gets its own replication handler mbean? This is important because I have no 
> > way of knowing in this specific server any information about the other 
> > slices, in particular, information about the master/slave value for the 
> > other slices.
> >
> > Reading through the Solr 1.4 replication strategy, I saw that a slice can 
> > be configured to be a master and a slave, i.e. a repeater. I'm wondering 
> > how repeaters work because let's say I have a slice named 'A' and the 
> > master is on server 1 and the slave is on server 2 then how are these two 
> > servers communicating to replicate? Looking at the jmx information I have 
> > in the MBean both the isSlave and isMaster is set to true for my repeater 
> > so how does this solr slice know if it's the master or slave? I'm a bit 
> > confused.
> >
> > Thanks.
> >
> >
> >
> >
> > _
> > With Windows Live, you can organize, edit, and share your photos.
> > http://www.windowslive.com/Desktop/PhotoGallery
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com

_
Hotmail® is up to 70% faster. Now good news travels really fast. 
http://windowslive.com/online/hotmail?ocid=PID23391::T:WLMTAGL:ON:WL:en-US:WM_HYGN_faster:082009
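To make the repeater layout above concrete: a repeater declares both roles in one ReplicationHandler config. A sketch (the URL and interval are placeholders, not taken from the thread):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- act as a master for my own slaves -->
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <!-- ...while also pulling the index from the real master -->
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With both sections present, isMaster and isSlave are both true in JMX, which matches what was observed above.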

SortableFloatFieldSource not accessible? (1.3)

2009-08-26 Thread Christophe Biocca
The class SortableFloatFieldSource cannot be accessed from outside its
package. So it can't be used as part of a FunctionQuery.
Is there a workaround to this, or should I roll my own? Will it be fixed in
1.4?


RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Francis Yakin

Thanks for the response.

I will try CommonsHttpSolrServer for now.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 1:34 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

With this configuration probably preferred method is to run standalone Java
application on same box as DB, or very close to DB (in same network
segment).

HTTP is not a bottleneck; main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as a sample, if you submit a batch of large documents to SOLR, expect a
5-55 second response time (even with EmbeddedSolr or pure Lucene), but
nothing related to network latency nor to firewalling... uploading 1MB over a
100Mbps network takes less than 0.1 seconds, but indexing it may take > 0.5
secs...

A standalone application with SolrJ is also good because you can schedule
automated batch updates, etc...


P.S.
In theory, if you are using Oracle, you may even try to implement triggers
written in Java causing a SOLR update on each row update (transactional); but
I haven't heard of anyone using stored procs in Java; too risky and slow, with
specific dependencies...




-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 4:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

 We already opened port 80 from Solr to the DB so that's not the issue, but
httpd (port 80) is very flaky when there is a firewall between Solr and the DB.
We have a Solr master/slaves environment; clients access search through the
slaves (the master only accepts new indexes from the DB, and the slaves pull
the new indexes from the Solr master).

We have someone in the Development team who knows Java and can implement JDBC.

We don't run the Solr master and DB on the same box; they are separate boxes
on separate networks, with port 80 opened between them.

It looks like CommonsHttpSolrServer is a better approach than
EmbeddedSolrServer, since we want the Solr master acting as a Solr server as
well.
I'm just worried that HTTP will be a bottleneck; that's why I prefer the JDBC
connection method.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have a firewall between the DB and a possible SOLR master instance? Do
you have a firewall between the client application and the DB? Such a
configuration is strange... by default firewalls allow access to port 80; try
to set port 80 for SOLR-Tomcat and/or configure an AJP mapping for a front-end
HTTPD, which you might have. BTW, Apache HTTPD with SOLR supports HTTP caching
for SOLR slaves...

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement the
multithreaded application.
2. SolrJ does not use JDBC; developers need to implement that...

It requires some Java coding; it is not the out-of-the-box Data Import
Handler.

Suppose you have 2 quad-cores: why run single-threaded if we can run 8
threads... or why wait 5 seconds for a response from SOLR if we can use an
additional 32 threads doing DB work at the same time... and why share I/O
between SOLR and DB?

Diversify and lower risks; having SOLR and the DB on the same box is extremely
unsafe...

-Fuad


-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

The issue we have is actually more likely a firewall issue than network
latency; that's why we try to avoid using an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (initial
load only), and subsequently we will be actively adding new docs to Solr after
the initial load. We prefer to use a JDBC connection, so if SolrJ used a JDBC
connection that would be useful. I also like the multi-threading option in
SolrJ. So, since we want the Solr master running as a server as well, is
EmbeddedSolrServer not a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want or try not to use http connection from Database to Solr
Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more important.
With a separate SOLR instance on a separate box, and with a separate Java
application (a SOLR bridge) querying the database and using SolrJ, latency will
be 1 second (for instance), but you can fine-tune performance by allocating
the necessary number of threads (depending on the latency of SOLR and Oracle,
average doc size, etc.), JDBC connections, etc. - and you can reach thousands
of docs per second throughput. DIH only simplifies some stuff for total
beginners...

In addition, you will have nic

Re: Using Lucene's payload in Solr

2009-08-26 Thread Bill Au
While testing my code I discovered that my copyField with the
PatternTokenizerFactory does not do what I want.  This is what I am indexing
into Solr:

2.0|Solr In Action

My copyField is simply:

   

field titleRaw is of type title_raw:


  

  
  

  


For my example input, "Solr In Action" is indexed into the titleRaw field
without the payload.  But the payload is still stored.  So when I retrieve
the field titleRaw I still get back "2.0|Solr In Action", whereas what I really
want is just "Solr In Action".

Is it possible to have the copyField strip off the payload while it is
copying since doing it in the analysis phrase is too late?  Or should I
start looking into using UpdateProcessors as Chris had suggested?

Bill
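The "first occurrence of the delimiter" rule described in the quoted message below is straightforward in plain Java. A minimal sketch (class and method names are illustrative, not from Solr):

```java
public class PayloadSplitter {
    // Input has the form "<payload>|<value>", e.g. "2.0|Solr In Action".
    // The payload is numeric and required, so the first '|' always ends it;
    // any later '|' characters belong to the value and are left alone.
    public static String[] split(String raw) {
        int i = raw.indexOf('|');
        if (i < 0) {
            throw new IllegalArgumentException("missing payload delimiter: " + raw);
        }
        // [0] = payload, [1] = the bare value to store/copy without payload
        return new String[] { raw.substring(0, i), raw.substring(i + 1) };
    }
}
```

Whatever mechanism ends up doing the stripping (an UpdateProcessor or a custom tokenizer), this is the whole transformation it needs to apply before the value is stored.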

On Fri, Aug 21, 2009 at 12:04 PM, Bill Au  wrote:

> I ended up not using an XML attribute for the payload since I need to
> return the payload in query response.  So I ended up going with:
>
> 2.0|Solr In Action
>
> My payload is numeric so I can pick a non-numeric delimiter (ie '|').
> Putting the payload in front means I don't have to worry about the delimiter
> appearing in the value.  The payload is required in my case so I can simply
> look for the first occurrence of the delimiter and ignore the possibility of
> the delimiter appearing in the value.
>
> I ended up writing a custom Tokenizer and a copy field with a
> PatternTokenizerFactory to filter out the delimiter and payload.  That's is
> straight forward in terms of implementation.  On top of that I can still use
> the CSV loader, which I really like because of its speed.
>
> Bill.
>
> On Thu, Aug 20, 2009 at 10:36 PM, Chris Hostetter <
> hossman_luc...@fucit.org> wrote:
>
>>
>> : of the field are correct but the delimiter and payload are stored so
>> they
>> : appear in the response also.  Here is an example:
>> ...
>> : I am thinking maybe I can do this instead when indexing:
>> :
>> : XML for indexing:
>> : Solr In Action
>> :
>> : This will simplify indexing as I don't have to repeat the payload for
>> each
>>
>> but now you're into a custom request handler for the updates to deal with
>> the custom XML attribute so you can't use DIH, or CSV loading.
>>
>> It seems like it might be simpler have two new (generic) UpdateProcessors:
>> one that can clone fieldA into fieldB, and one that can do regex mutations
>> on fieldB ... neither needs to know about payloads at all, but the first
>> can made a copy of "2.0|Solr In Action" and the second can strip off the
>> "2.0|" from the copy.
>>
>> then you can write a new NumericPayloadRegexTokenizer that takes in two
>> regex expressions -- one that knows how to extract the payload from a
>> piece of input, and one that specifies the tokenization.
>>
>> those three classes seem easier to implemnt, easier to maintain, and more
>> generally reusable then a custom xml request handler for your updates.
>>
>>
>> -Hoss
>>
>>
>


Sorting by Unindexed Fields

2009-08-26 Thread Isaac Foster
Hi,

I have a situation where a particular kind of document can be categorized in
different ways, and depending on the categories it is in it will have
different fields that describe it (in practice the number of fields will be
fairly small, but whatever). These documents will each have a full-text
field that Solr is perfect for, and it seems like Solr's dynamic fields
ability makes it an even more perfect solution.

I'd like to be able to sort by any of the fields, but indexing them all
seems somewhere between unwise and impossible. Will Solr sort by fields that
are unindexed?

iSac


Manual facet sorting - possible?

2009-08-26 Thread Matthew Painter
Hi,
 
I am attempting to perform a faceted distributed search with manual
sorting of the value of a facet. Is this something which is possible
through a Solr query or would I be better off inserting a manual
weighting field and sort by that?
 
To clarify - I am performing a distributed search over three Solr
instances. Each instance returns results from a single site. I am then
grouping by site (through a manual grouping field, although I suspect I
could have done this more elegantly). I want the sites to be returned in a
specific order.
 
Ideally, I'd want something like:

+(search query) grouping_site:mainwebsite^1000
grouping_site:otherwebsite^100
 
however, this doesn't appear to work and instead returns 0 results. I'm
not sure if this is syntactically incorrect or whether it's related to
my earlier unresolved (and frustrating!) issue in which non-trivial
queries aren't working for me when using distributed searches.
 
Any help would be most appreciated.

Thanks,
Matt
This e-mail message and any attachments are CONFIDENTIAL to the addressee(s) 
and may also be LEGALLY PRIVILEGED.  If you are not the intended addressee, 
please do not use, disclose, copy or distribute the message or the information 
it contains.  Instead, please notify me as soon as possible and delete the 
e-mail, including any attachments.  Thank you.


master/slave replication issue

2009-08-26 Thread J G







Hello,

I'm having an issue getting the master to replicate its index to the slave. 
Below you will find my configuration settings. Here is what is happening: I can 
access the replication dashboard for both the slave and master and I can 
successfully execute HTTP commands against both of these urls through my 
browser. Now, my slave is configured to use the same URL as the one I am using 
in my browser when I query the master, yet when I do a tail -f /logs/catalina.out on the slave server all I see is :


Master - server1.xyz.com Aug 27, 2009 12:13:29 AM org.apache.solr.core.SolrCore 
execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

Aug 27, 2009 12:13:32 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

Aug 27, 2009 12:13:34 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

Aug 27, 2009 12:13:36 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

Aug 27, 2009 12:13:39 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=4

Aug 27, 2009 12:13:42 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=8

Aug 27, 2009 12:13:44 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=null path=null params={command=details} status=0 QTime=


For some reason, the webapp and the path are being set to null, and I think 
this is affecting the replication. I am running Solr as the WAR file, and it's 
a 1.4 build from a few weeks ago.






<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">optimize</str>
    <str name="replicateAfter">optimize</str>
    <!-- <str name="confFiles">...</str> -->
  </lst>
</requestHandler>





Notice that I commented out the replication of the configuration files. I 
didn't think this was important for the initial attempt to get replication 
working. However, is it good practice to have these files replicated?


Slave - server2.xyz.com





<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>

 




Thanks for your help!





Re: SortableFloatFieldSource not accessible? (1.3)

2009-08-26 Thread Yonik Seeley
SortableFloatField works in function queries... it's just that
everyone goes through SortableFloatField.getValueSource() to create
them.  Will that work for you?

-Yonik
http://www.lucidimagination.com


On Wed, Aug 26, 2009 at 6:23 PM, Christophe
Biocca wrote:
> The class SortableFloatFieldSource cannot be accessed from outside its
> package. So it can't be used as part of a FunctionQuery.
> Is there a workaround to this, or should I roll my own? Will it be fixed in
> 1.4?
>


Re: Sorting by Unindexed Fields

2009-08-26 Thread Avlesh Singh
>
> Will Solr sort by fields that are unindexed?
>
Unfortunately, No.

Cheers
Avlesh

On Thu, Aug 27, 2009 at 4:03 AM, Isaac Foster wrote:

> Hi,
>
> I have a situation where a particular kind of document can be categorized
> in
> different ways, and depending on the categories it is in it will have
> different fields that describe it (in practice the number of fields will be
> fairly small, but whatever). These documents will each have a full-text
> field that Solr is perfect for, and it seems like Solr's dynamic fields
> ability makes it an even more perfect solution.
>
> I'd like to be able to sort by any of the fields, but indexing them all
> seems somewhere between unwise and impossible. Will Solr sort by fields
> that
> are unindexed?
>
> iSac
>


Re: Sorting by Unindexed Fields

2009-08-26 Thread Isaac Foster
Is it also the case that it will not narrow by them?

Isaac

On Wed, Aug 26, 2009 at 8:59 PM, Avlesh Singh  wrote:

> >
> > Will Solr sort by fields that are unindexed?
> >
> Unfortunately, No.
>
> Cheers
> Avlesh
>
> On Thu, Aug 27, 2009 at 4:03 AM, Isaac Foster  >wrote:
>
> > Hi,
> >
> > I have a situation where a particular kind of document can be categorized
> > in
> > different ways, and depending on the categories it is in it will have
> > different fields that describe it (in practice the number of fields will
> be
> > fairly small, but whatever). These documents will each have a full-text
> > field that Solr is perfect for, and it seems like Solr's dynamic fields
> > ability makes it an even more perfect solution.
> >
> > I'd like to be able to sort by any of the fields, but indexing them all
> > seems somewhere between unwise and impossible. Will Solr sort by fields
> > that
> > are unindexed?
> >
> > iSac
> >
>


Re: Sorting by Unindexed Fields

2009-08-26 Thread Avlesh Singh
>
> Is it also the case that it will not narrow by them?

If "narrowing" means faceting, then again a no.

Cheers
Avlesh

On Thu, Aug 27, 2009 at 6:36 AM, Isaac Foster wrote:

> Is it also the case that it will not narrow by them?
>
> Isaac
>
> On Wed, Aug 26, 2009 at 8:59 PM, Avlesh Singh  wrote:
>
> > >
> > > Will Solr sort by fields that are unindexed?
> > >
> > Unfortunately, No.
> >
> > Cheers
> > Avlesh
> >
> > On Thu, Aug 27, 2009 at 4:03 AM, Isaac Foster  > >wrote:
> >
> > > Hi,
> > >
> > > I have a situation where a particular kind of document can be
> categorized
> > > in
> > > different ways, and depending on the categories it is in it will have
> > > different fields that describe it (in practice the number of fields
> will
> > be
> > > fairly small, but whatever). These documents will each have a full-text
> > > field that Solr is perfect for, and it seems like Solr's dynamic fields
> > > ability makes it an even more perfect solution.
> > >
> > > I'd like to be able to sort by any of the fields, but indexing them all
> > > seems somewhere between unwise and impossible. Will Solr sort by fields
> > > that
> > > are unindexed?
> > >
> > > iSac
> > >
> >
>


Re: Seattle / NW Hadoop, HBase Lucene, etc. Meetup , Wed August 26th, 6:45pm

2009-08-26 Thread Bradford Stephens

Hello,

My apologies, but there was a mix-up reserving our meeting location,  
and we don't have access to it.


I'm very sorry, and beer is on me next month. Promise :)

Sent from my Internets

On Aug 25, 2009, at 4:21 PM, Bradford Stephens wrote:



Hey there,

Apologies for this not going out sooner -- apparently it was sitting
as a draft in my inbox. A few of you have pinged me, so thanks for
your vigilance.

It's time for another Hadoop/Lucene/Apache Stack meetup! We've had
great attendance in the past few months, let's keep it up! I'm always
amazed by the things I learn from everyone.

We're back at the University of Washington, Allen Computer Science
Center (not Computer Engineering)
Map: http://www.washington.edu/home/maps/?CSE

Room: 303 -or- the Entry level. If there are changes, signs will be  
posted.


More Info:

The meetup is about 2 hours: we'll have two in-depth talks of 15-20
minutes each, and then several "lightning talks" of 5 minutes. If no
one offers to speak, we'll just have general discussion and 'social time'.
Let me know if you're interested in speaking or attending.
or attending. We'd like to focus on education, so every presentation
*needs* to ask some questions at the end. We can talk about these
after the presentations, and I'll record what we've learned in a wiki
and share that with the rest of us.

Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

--
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: SolrJ and Solr web simultaneously?

2009-08-26 Thread Erik Hatcher
With a relational database, the approach that has been working for us  
and many customers is to first give DataImportHandler a go.  It's  
powerful and fast.  3M docs should index in about an hour or less, I'd  
speculate.  But using DIH does require making access from Solr to the  
DB server solid, of course.


Erik
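A minimal DIH data-config for the JDBC case looks roughly like this (a sketch only; the driver, URL, credentials, and column names are placeholders, not taken from the thread):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/orcl"
              user="user" password="pass"/>
  <document>
    <entity name="doc" query="SELECT id, title FROM documents">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>
    </entity>
  </document>
</dataConfig>
```

A full import is then just a request to /dataimport?command=full-import on the handler that references this file.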

On Aug 26, 2009, at 6:26 PM, Francis Yakin wrote:



Thanks for the response.

I will try CommonsHttpSolrServer for now.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 1:34 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

With this configuration probably preferred method is to run  
standalone Java

application on same box as DB, or very close to DB (in same network
segment).

HTTP is not a bottleneck; main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as a sample, if  you submit to SOLR batch of large documents, -  
expect
5-55 seconds response time (even with EmbeddedSolr or pure Lucene),  
but
nothing related to network latency nor to firewalling... upload 1Mb  
over
100Mbps network takes less than 0.1 seconds, but indexing it may  
take > 0.5

secs...

Standalone application with SolrJ is also good because you may  
schedule

batch updates etc; automated...


P.S.
In theory, if you are using Oracle, you may even try to implement  
triggers
written in Java causing SOLR update on each row update  
(transactional); but
I haven't heard anyone uses stored procs in Java, too risky and  
slow, with

specific dependencies...




-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 4:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

We already opened port 80 from Solr to the DB so that's not the issue, but
httpd (port 80) is very flaky when there is a firewall between Solr and the DB.
We have Solr master/slaves env, client access the search thru  
slaves( master
only accept the new index from DB and slaves will pull the new  
indexes from

Solr master).

We have someone in Development team knows Java and implement JDBC.

We don't share Solr master and DB on the same box, it's separate box  
and

separate network, port 80 opened between these.

It looks like CommonsHttpSolrServer is a better approach than
EmbeddedSolrServer, since we want the Solr master acting as a Solr  
server as
well.
I'm just worried that HTTP will be a bottleneck; that's why I prefer  
the JDBC
connection method.

Francis

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have firewall between DB and possible SOLR-Master instance?  
Do you

have firewall between Client application and DB? Such configuration is
strange... by default firewalls allow access to port 80, try to set  
port 80
for SOLR-Tomcat and/or configure AJP mapping for front-end HTTPD  
which you

might have; btw  Apache HTTPD with SOLR supports HTTP caching for
SOLR-slaves...

1. SolrJ does not provide multithreading, but instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement
multithreaded application.
2. SolrJ does not use JDBC; developers need to implement...

It requires some Java coding; it is not the out-of-the-box Data Import
Handler.

Suppose you have 2 quad-cores: why run single-threaded if we can run 8
threads... or why wait 5 seconds for a response from SOLR if we can use an
additional 32 threads doing DB work at the same time... and why share
I/O between SOLR and DB?

Diversify, lower risks, having SOLR and DB on same box is extremely
unsafe...

-Fuad


-Original Message-
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

The issue we have is actually more likely a firewall issue than network
latency; that's why we try to avoid using an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (initial
load only), and subsequently we will be actively adding new docs to Solr after
the initial load. We prefer to use a JDBC connection, so if SolrJ used a JDBC
connection that would be useful. I also like the multi-threading option in
SolrJ. So, since we want the Solr master running as a server as well, is
EmbeddedSolrServer not a good approach for this?

Francis





-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?


I don't want or try not to use http connection from Database to Solr

Master because of network latency( very slow).

"network latency" does not play any role here; throughput is more  
important.

With separate SOLR instance on a se

Fwd: Lucene Search Performance Analysis Workshop

2009-08-26 Thread Erik Hatcher
While Andrzej's talk will focus on things at the Lucene layer, I'm  
sure there'll be some great tips and tricks useful to Solrians too.   
Andrzej is one of the sharpest folks I've met, and he's also a very  
impressive presenter.  Tune in if you can.


Erik


Begin forwarded message:


From: Andrzej Bialecki 
Date: August 26, 2009 5:44:40 PM EDT
To: java-u...@lucene.apache.org
Subject: Lucene Search Performance Analysis Workshop
Reply-To: java-u...@lucene.apache.org

Hi all,

I am giving a free talk/workshop next week on how to analyze and  
improve Lucene search performance for native Lucene apps. If you've  
ever been challenged to get your Java Lucene search apps running  
faster, I think you might find the talk of interest.


Free online workshop:
Thursday, September 3rd 2009
11:00-11:30AM PDT / 14:00-14:30 EDT

Follow this link to sign up:
http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About:
Lucene Performance Workshop:
Understanding Lucene Search Performance
with Andrzej Bialecki

Experienced Java developers know how to use the Apache Lucene  
library to build powerful search applications natively in Java.
LucidGaze for Lucene from Lucid Imagination, just released this  
week, provides a powerful utility for making transparent the  
underlying indexing and search operations, and analyzing their  
impact on search performance.


Agenda:
* Understanding sources of variability in Lucene search performance
* LucidGaze for Lucene APIs for performance statistics
* Applying LucidGaze for Lucene performance statistics to real-world  
performance problems


Join us for a free online workshop. Sign up via the link below:
http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About the Presenter:
Andrzej Bialecki, Apache Lucene PMC Member, is on the Lucid  
Imagination Technical Advisory Board; he also serves as the project  
lead for Nutch, and as committer in the Lucene-java, Nutch and  
Hadoop projects. He has broad expertise, across domains as diverse  
as information retrieval, systems architecture, embedded systems  
kernels, networking and business process/e-commerce modeling. He's  
also the author of the popular Luke index inspection utility.  
Andrzej holds a master's degree in Electronics from Warsaw Technical  
University, speaks four languages and programs in many, many more.



--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





Re: Sorting by Unindexed Fields

2009-08-26 Thread Erik Hatcher
Solr sorts on indexed fields only, currently.  And only a single value  
per document per sort field (careful with analyzed fields, and no  
multiValued fields).


Unwise and impossible - of course this depends on the scale you're  
speaking of.  How many documents?  What types of fields?   How small  
is fairly small number of fields?


Erik
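One common pattern for the dynamic-field situation described above is to route each sortable value into an indexed, single-valued, unanalyzed dynamic field in schema.xml. A sketch (field names are illustrative):

```xml
<!-- any *_sort field is indexed, single-valued, and not analyzed -->
<dynamicField name="*_sort" type="string" indexed="true" stored="false"/>
<!-- e.g. copy a category-specific field into its sortable twin -->
<copyField source="price" dest="price_sort"/>
```

Only the sortable copies need to be indexed; the originals can stay stored-only.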


On Aug 26, 2009, at 6:33 PM, Isaac Foster wrote:


Hi,

I have a situation where a particular kind of document can be categorized in
different ways, and depending on the categories it is in it will have
different fields that describe it (in practice the number of fields will be
fairly small, but whatever). These documents will each have a full-text
field that Solr is perfect for, and it seems like Solr's dynamic fields
ability makes it an even more perfect solution.

I'd like to be able to sort by any of the fields, but indexing them all
seems somewhere between unwise and impossible. Will Solr sort by fields that
are unindexed?

iSac




Total count of records

2009-08-26 Thread bhaskar chandrasekar
Hi,
 
When Solr retrieves records based on an input match, it gives a total count
of records.
For example, it displays: 1 out of 20,000 for the particular search
string.

How is the total count of records fetched in Solr? Does it refer to any
schema or XML file?
 
 
Regards
Bhaskar
 


  

Re: Total count of records

2009-08-26 Thread Avlesh Singh
>
> How is the total count of records fetched in Solr? Does it refer to any
> schema or XML file?
>
Sorry, but I did not get you. What does that mean? The total count is not
stored anywhere; it is computed based on how many documents you have in your
index matching the query.
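
[Editor's note: the count in question is the numFound attribute on the result element of a standard Solr response; a sketch with illustrative values:]

```xml
<response>
  <!-- numFound is computed per query from the index; it is not stored in any file -->
  <result name="response" numFound="20000" start="0">
    <!-- first page of matching documents appears here -->
  </result>
</response>
```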

Cheers
Avlesh

On Thu, Aug 27, 2009 at 7:36 AM, bhaskar chandrasekar
wrote:

> Hi,
>
> When Solr retrieves records based on an input match, it gives a total count
> of records.
> For example, it displays: 1 out of 20,000 for the particular search
> string.
>
> How is the total count of records fetched in Solr? Does it refer to any
> schema or XML file?
>
>
> Regards
> Bhaskar
>
>
>
>


RE: Lucene Search Performance Analysis Workshop

2009-08-26 Thread Fuad Efendi
I am wondering... are the new Solr filtering features faster than a standard
Lucene query like
{query} AND {filter}?

Why can't we improve Lucene then?

Fuad


P.S. 
https://issues.apache.org/jira/browse/SOLR-1169
https://issues.apache.org/jira/browse/SOLR-1179
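
[Editor's note: for context, the Solr feature being compared is the fq (filter query) parameter, whose result is cached separately from the main query; a sketch of the two idioms, with illustrative query values:]

```
# Lucene-style: filter folded into the scored query, nothing reusable is cached
q=name:bhaskar AND category:books

# Solr-style: filter kept separate, cached in the filterCache and reused across queries
q=name:bhaskar&fq=category:books
```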





-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: August-26-09 8:50 PM
To: solr-user@lucene.apache.org
Subject: Fwd: Lucene Search Performance Analysis Workshop

While Andrzej's talk will focus on things at the Lucene layer, I'm  
sure there'll be some great tips and tricks useful to Solrians too.   
Andrzej is one of the sharpest folks I've met, and he's also a very  
impressive presenter.  Tune in if you can.

Erik


Begin forwarded message:

> From: Andrzej Bialecki 
> Date: August 26, 2009 5:44:40 PM EDT
> To: java-u...@lucene.apache.org
> Subject: Lucene Search Performance Analysis Workshop
> Reply-To: java-u...@lucene.apache.org
>
> Hi all,
>
> I am giving a free talk/ workshop next week on how to analyze and  
> improve Lucene search performance for native lucene apps. If you've  
> ever been challenged to get your Java Lucene search apps running  
> faster, I think you might find the talk of interest.
>
> Free online workshop:
> Thursday, September 3rd 2009
> 11:00-11:30AM PDT / 14:00-14:30 EDT
>
> Follow this link to sign up:
>
http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP
>
> About:
> Lucene Performance Workshop:
> Understanding Lucene Search Performance
> with Andrzej Bialecki
>
> Experienced Java developers know how to use the Apache Lucene  
> library to build powerful search applications natively in Java.
> LucidGaze for Lucene from Lucid Imagination, just released this  
> week, provides a powerful utility for making transparent the  
> underlying indexing and search operations, and analyzing their  
> impact on search performance.
>
> Agenda:
> * Understanding sources of variability in Lucene search performance
> * LucidGaze for Lucene APIs for performance statistics
> * Applying LucidGaze for Lucene performance statistics to real-world  
> performance problems
>
> Join us for a free online workshop. Sign up via the link below:
>
> http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP
>
> About the Presenter:
> Andrzej Bialecki, Apache Lucene PMC Member, is on the Lucid  
> Imagination Technical Advisory Board; he also serves as the project  
> lead for Nutch, and as committer in the Lucene-java, Nutch and  
> Hadoop projects. He has broad expertise, across domains as diverse  
> as information retrieval, systems architecture, embedded systems  
> kernels, networking and business process/e-commerce modeling. He's  
> also the author of the popular Luke index inspection utility.  
> Andrzej holds a master's degree in Electronics from Warsaw Technical  
> University, speaks four languages and programs in many, many more.
>
>
> -- 
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>





RE: SolrJ and Solr web simultaneously?

2009-08-26 Thread Fuad Efendi
Frankly, I never tried DIH... it is probably the best option for this
specific case (they have a Java developer) - but one should be knowledgeable
enough to design the Solr schema... And I noticed here (and also on the HBase
mailing list) that many first-time users are still thinking in terms of a
relational DBMS, trying to index their tables with relations (and different
PKs) as-is, instead of indexing their documents... I constantly index 1000+
docs per second now, at 5%-15% CPU - small docs, 5KB in size on average, 7
fields... yes, correct, 3M+ docs in an hour... and it could be 10 times
more (at the current 5%-15% CPU).
Fuad

>With a relational database, the approach that has been working for us  
>and many customers is to first give DataImportHandler a go.  It's  
>powerful and fast.  3M docs should index in about an hour or less, I'd  
>speculate.  But using DIH does require making access from Solr to the  
>DB server solid, of course.
>
>   Erik
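
[Editor's note: for readers weighing DIH, a minimal data-config.xml sketch; the driver, connection details, table and column names are all illustrative assumptions, not from this thread:]

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="reader" password="secret"/>
  <document>
    <!-- Each row becomes one Solr document: think documents, not relations -->
    <entity name="item" query="SELECT id, name, description FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```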





Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-26 Thread Aaron Aberg
Hey Guys,

Ok, I found this:

Troubleshooting Errors
It's possible that you get an error related to the following:

SEVERE: Exception starting filter SolrRequestFilter
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.solr.core.SolrConfig
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:76)
.
Caused by: java.lang.RuntimeException: XPathFactory#newInstance()
failed to create an XPathFactory for the default object model:
http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException:
javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory implementation
found for the object model: http://java.sun.com/jaxp/xpath/dom
at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)

This is due to your tomcat instance not having the xalan jar file in
the classpath. It took me some digging to find this, and thought it
might be useful for others. The location varies from distribution to
distribution, but I essentially just added (via a symlink) the jar
file to the shared/lib directory under the tomcat directory.

I am a java n00b. How can I set this up?

On Tue, Aug 18, 2009 at 10:16 PM, Chris
Hostetter wrote:
>
> : -Dsolr.solr.home='/some/path'
> :
> : Should I be putting that somewhere? Or is that already taken care of
> : when I edited the web.xml file in my solr.war file?
>
> No ... you do not need to set that system property if you already have it
> working because of modifications to the web.xml ... according to the log
> you posted earlier, Solr is seeing your solr home dir set correctly...
>
> Aug 17, 2009 11:16:15 PM org.apache.solr.core.SolrResourceLoader 
> locateInstanceDir
> INFO: Using JNDI solr.home: /usr/share/solr
> Aug 17, 2009 11:16:15 PM org.apache.solr.core.CoreContainer$Initializer 
> initialize
> INFO: looking for solr.xml: /usr/share/solr/solr.xml
> Aug 17, 2009 11:16:15 PM org.apache.solr.core.SolrResourceLoader 
> INFO: Solr home set to '/usr/share/solr/'
>
> ...that's were you want it to point, correct?
>
> (don't be confused by the later message of "Check solr/home property" ...
> that's just a hint because 9 times out of 10 an error initializing solr
> comes from solr needing to *guess* about the solr home dir)
>
> The crux of your error is being able to load an XPathFactory, the fact
> that it can't load an XPath factory prevents the your
> classloader from even being able to load the SolrConfig class -- note this
> also in the log you posted earlier...
>
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.solr.core.SolrConfig
>
> ...the root of the problem is here...
>
> Caused by: java.lang.RuntimeException: XPathFactory#newInstance()
> failed to create an XPathFactory for the default object model:
> http://java.sun.com/jaxp/xpath/dom with the
> XPathFactoryConfigurationException:
> javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory
> implementation found for the object model:
> http://java.sun.com/jaxp/xpath/dom
>        at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)
>        at org.apache.solr.core.Config.<init>(Config.java:41)
>
> XPathFactory.newInstance() is used to construct an instance of an
> XPathFactory where the concrete type is unknown by the caller (in this
> case: Solr).  There is an alternate form (XPathFactory.newInstance(String
> uri)) which allows callers to specify *which* model they want, and it can
> throw an exception if the model isn't available in the current JVM using
> reflection, but if you read the javadocs for the method being called...
>
> http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/xpath/XPathFactory.html#newInstance()
>   Get a new XPathFactory instance using the default object model,
>   DEFAULT_OBJECT_MODEL_URI, the W3C DOM.
>
>   This method is functionally equivalent to:
>
>      newInstance(DEFAULT_OBJECT_MODEL_URI)
>
>   Since the implementation for the W3C DOM is always available, this
>   method will never fail.
>
> ...except that in your case, it is in fact clearly failing.  Which
> suggests that your hosting provider has given you a crappy JVM.  I have no
> good suggestions for debugging this, other than this google link...
>
> http://www.google.com/search?q=+No+XPathFctory+implementation+found+for+the+object+model%3A+http%3A%2F%2Fjava.sun.com%2Fjaxp%2Fxpath%2Fdom
>
> The good news is, there isn't anything Solr-specific about this problem.
> Any servlet container giving you that error when you load Solr should
> cause the exact same error with a servlet as simple as this...
>
>  public class TestServlet extends javax.servlet.http.HttpServlet {
>    public static Object X = javax.xml.xpath.XPathFactory.newInstance();
>    public void doGet (javax.servlet.http.HttpServletRequest req,
>                       javax.servlet.http.HttpServletResponse res) {
>       // NOOP
>    }
>  }
>
> ...which should provide you with a nice short bug report for your hosting
> provider.
>
> One last important note (becau
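
[Editor's note: the same check can be run outside any servlet container; a minimal standalone sketch, class name illustrative, that fails on the same broken JAXP setup:]

```java
import javax.xml.xpath.XPathFactory;

// Standalone probe for the same root cause: if the JVM's JAXP setup is
// broken, newInstance() throws here too, with no Solr or Tomcat involved.
public class XPathCheck {
    public static void main(String[] args) {
        XPathFactory factory = XPathFactory.newInstance();
        System.out.println("XPathFactory OK: " + factory.getClass().getName());
    }
}
```

Compile and run it with the same JVM Tomcat uses; a healthy JVM prints the factory class name.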

RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-26 Thread Fuad Efendi
Looks like you totally ignored my previous post...

> Who is the vendor of this "openjdk-1.6.0.0"? Who is the vendor of the JVM
> which this JDK runs on?
> ... such installs for Java are a total mess; you may have an incompatible
> Servlet API loaded by the bootstrap classloader before the Tomcat classes.

First of all, please try to install the standard Java from Sun on your
development box, and run some samples...

> This is due to your tomcat instance not having the xalan jar file in
> the classpath

P.S.
Don't rely on CentOS 'approved' Java libraries.





Re: master/slave replication issue

2009-08-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
The log messages are shown when you hit the admin page, so don't worry
about that. Keep a minimal replication configuration. All you need
is masterUrl and pollInterval.
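
[Editor's note: a minimal slave section along those lines might look like this; the masterUrl and pollInterval are the poster's values, and this is a sketch rather than a drop-in config:]

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
```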


On Thu, Aug 27, 2009 at 5:52 AM, J G wrote:
>
>
>
>
>
>
>
> Hello,
>
> I'm having an issue getting the master to replicate its index to the slave. 
> Below you will find my configuration settings. Here is what is happening: I 
> can access the replication dashboard for both the slave and master and I can 
> successfully execute HTTP commands against both of these urls through my 
> browser. Now, my slave is configured to use the same URL as the one I am 
> using in my browser when I query the master, yet when I do a tail -f <tomcat home>/logs/catalina.out on the slave server all I see is:
>
>
> Master - server1.xyz.com Aug 27, 2009 12:13:29 AM 
> org.apache.solr.core.SolrCore execute
>
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
>
> Aug 27, 2009 12:13:32 AM org.apache.solr.core.SolrCore execute
>
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
>
> Aug 27, 2009 12:13:34 AM org.apache.solr.core.SolrCore execute
>
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
>
> Aug 27, 2009 12:13:36 AM org.apache.solr.core.SolrCore execute
>
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
>
> Aug 27, 2009 12:13:39 AM org.apache.solr.core.SolrCore execute
>
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
>
> Aug 27, 2009 12:13:42 AM org.apache.solr.core.SolrCore execute
>
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
>
> Aug 27, 2009 12:13:44 AM org.apache.solr.core.SolrCore execute
>
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=
>
>
> For some reason, the webapp and the path are being set to null and I "think" 
> this is affecting the replication?!? I am running Solr as the WAR file and 
> it's 1.4 from a few weeks ago.
>
>
>
> 
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>     <lst name="master">
>         <str name="replicateAfter">optimize</str>
>         <str name="backupAfter">optimize</str>
>         <!-- confFiles replication commented out -->
>     </lst>
> </requestHandler>
> Notice that I commented out the replication of the configuration files. I
> didn't think this was important for getting replication itself
> working. However, is it good to have these files replicated?
>
>
> Slave - server2.xyz.com
>
> 
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>     <lst name="slave">
>         <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>
>         <str name="pollInterval">00:00:20</str>
>         <str name="compression">internal</str>
>         <str name="httpConnTimeout">5000</str>
>         <str name="httpReadTimeout">1</str>
>         <str name="httpBasicAuthUser">username</str>
>         <str name="httpBasicAuthPassword">password</str>
>     </lst>
> </requestHandler>
>
>
>
> Thanks for your help!
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Max limit on number of cores?

2009-08-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is no hard limit; it will be decided by your hardware. You
will be limited by the number of files that can be kept open by your
system.
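
[Editor's note: to see where that ceiling sits on a given box, a quick sketch; the limits, and the commands to raise them, vary by OS and distribution:]

```shell
# Soft limit on open file descriptors for the current shell
ulimit -n
# Hard limit: the ceiling the soft limit can be raised to without root
ulimit -Hn
```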

On Thu, Aug 27, 2009 at 1:06 AM, djain101 wrote:
>
> Hi,
>
> Is there any maximum limit on the number of cores one solr webapp can have
> without compromising on its performance? If yes, what is that limit?
>
> Thanks,
> Dharmveer
> --
> View this message in context: 
> http://www.nabble.com/Max-limit-on-number-of-cores--tp25155334p25155334.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Problem using replication in 8/25/09 nightly build of 1.4

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 11:53 PM, Ron Ellis  wrote:

> Hi Everyone,
>
> When trying to utilize the new HTTP based replication built into Solr 1.4 I
> encounter a problem. When I view the replication admin page on the slave, all
> of the master values are null, i.e. Replicatable Index Version: null,
> Generation: null | Latest Index Version: null, Generation: null.


If the master has just been started, it has no index which can be replicated
to the slave. If you do a commit on the master, then a replicatable index
version will be shown on the slave and replication will proceed. Alternately,
you can add the following to the master configuration:

<str name="replicateAfter">startup</str>


> Despite these missing values, the two seem to be talking over HTTP
> successfully (if I shut down the master, the slave replication page starts
> exploding with an NPE).


The slave replication page should not show a NPE if the master is down. I'll
look into it.


>
> When I hit http://solr/replication?command=indexversion&wt=xml I get the
> following...
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">13</int>
>   </lst>
>   <long name="indexversion">0</long>
>   <long name="generation">0</long>
> </response>
>
> However in the admin/replication UI on the master I see...
>
> Index Version: 1250525534711, Generation: 1778
>
> Any idea what I'm doing wrong or how I could begin to diagnose? I am using
> the 8/25 nightly build of Solr with the example solrconfig.xml provided. The
> only modifications to the config have been to uncomment the master/slave
> replication sections and remove the data directory location line so it falls
> back to solr.home/data. Also, if it's relevant, this index was originally
> created in Solr 1.3.
>

I think that should be fine. I assume both master and slave are the same Solr
version, 1.4?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Pattern matching in Solr

2009-08-26 Thread bhaskar chandrasekar
 
Hi,

In the schema.xml file, I am not able to find splitOnCaseChange="1".
I am not looking for case-sensitive search.
Let me know what file you are referring to.
I am looking for exact match search only.

Moreover, for scenario 2, which link on the Solr wiki covers the
KeywordTokenizerFactory and EdgeNGramFilterFactory?
 
Regards
Bhaskar

--- On Wed, 8/26/09, Avlesh Singh  wrote:


From: Avlesh Singh 
Subject: Re: Pattern matching in Solr
To: solr-user@lucene.apache.org
Date: Wednesday, August 26, 2009, 11:31 AM


You could have used your previous thread itself (
http://www.lucidimagination.com/search/document/31c1ebcedd4442b/exact_pattern_search_in_solr),
Bhaskar.

In your scenario one, you need an exact token match, right? You are getting
expected results if your field type is "text". Look for the
"WordDelimiterFilterFactory" in your field type definition for the text
field inside schema.xml. You'll find an attribute splitOnCaseChange="1".
Because of this, "ChandarBhaskar" is converted into two tokens "Chandar" and
"Bhaskar" and hence the matches. You may choose to remove this attribute if
the behaviour is not desired.

For your scenario two, you may want to look at the KeywordTokenizerFactory
and EdgeNGramFilterFactory on Solr wiki.

Generally, for all such use cases people create multiple fields in their
schema storing the same data analyzed in different ways.
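
[Editor's note: a hedged sketch of such a schema.xml field type for scenario 1, with an illustrative type name: KeywordTokenizerFactory keeps the whole value as one token, and with no WordDelimiterFilterFactory in the chain, "ChandarBhaskar" is never split:]

```xml
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- The entire field value becomes a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

For scenario 2, the same tokenizer followed by EdgeNGramFilterFactory is the usual starting point.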

Cheers
Avlesh

On Wed, Aug 26, 2009 at 10:58 PM, bhaskar chandrasekar  wrote:

> Hi,
>
> Can any one help me with the below scenario?.
>
> Scenario 1:
>
> Assume that I give Google as input string
> i am using Carrot with Solr
> Carrot is for front end display purpose
> the issue is
> Assuming i give "BHASKAR" as input string
> It should give me search results pertaining to BHASKAR only.
>  Select * from MASTER where name ="Bhaskar";
>  Example:It should not display search results as "ChandarBhaskar" or
>  "BhaskarC".
>  Should display Bhaskar only.
>
> Scenario 2:
>  Select * from MASTER where name like "%BHASKAR%";
>  It should display records containing the word BHASKAR
>  Ex: Bhaskar
> ChandarBhaskar
>  BhaskarC
>  Bhaskarabc
>
>  How to achieve Scenario 1 in Solr ?.
>
>
>
> Regards
> Bhaskar
>
>
>
>



Re: ${solr.abortOnConfigurationError:false} - does it defaults to false

2009-08-26 Thread Shalin Shekhar Mangar
On Thu, Aug 27, 2009 at 1:05 AM, Ryan McKinley  wrote:

>
> On Aug 26, 2009, at 3:33 PM, djain101 wrote:
>
>
>> I have one quick question...
>>
>> If solrconfig.xml says...
>>
>> <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>
>>
>> does it mean abortOnConfigurationError defaults to false if it is not set
>> as a system property?
>>
>>
> correct
>
>
Should that be changed to be true by default in the example solrconfig?

-- 
Regards,
Shalin Shekhar Mangar.
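
[Editor's note: for reference, the substitution syntax in play is ${property:default}; a sketch of how it appears in solrconfig.xml and how it is overridden at startup:]

```xml
<!-- Uses the system property if set, otherwise the default after the colon -->
<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>
```

Starting the JVM with -Dsolr.abortOnConfigurationError=true overrides the default after the colon.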