searching while importing

2010-10-13 Thread Tri Nguyen
Hi,
 
Can I perform searches against the index while it is being imported?
 
Does importing add 1 document at a time or will solr make a temporary index and 
switch to that index when indexing is done?
 
Thanks,
 
Tri

Re: searching while importing

2010-10-13 Thread Tri Nguyen
Hi,

As long as I can search on the current ("older") index while importing, I'm 
good.  I've tested this and I can search the older index while data-importing 
the newer index.

So you can search the older index during your 5-hour wait?

Thanks,

Tri





From: Shawn Heisey 
To: solr-user@lucene.apache.org
Sent: Wed, October 13, 2010 3:38:48 PM
Subject: Re: searching while importing

If you are using the DataImportHandler, you will not be able to search new data 
until the full-import or delta-import is complete and the update is committed.  
When I do a full reindex, it takes about 5 hours, and until it is finished, I 
cannot search it.

I have not tried to issue a manual commit in the middle of an import to see 
whether that makes data inserted up to that point searchable, but I would not 
expect that to work.

If you need this kind of functionality, you may need to change your build 
system so that a full import clears the index manually and then does a series 
of delta-import batches.
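
A rough sketch of that sequence in Java, assuming default /update and 
/dataimport handler URLs (adjust both, and the batching, to your own setup):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Clear the index with an update message, then run committed delta-import
    // batches so newly indexed data becomes searchable as each batch finishes.
    public class IncrementalRebuild {
        static String fetch(String url, String postBody) throws Exception {
            HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
            if (postBody != null) {
                con.setDoOutput(true);
                con.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
                OutputStream os = con.getOutputStream();
                os.write(postBody.getBytes("UTF-8"));
                os.close();
            }
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), "UTF-8"));
            StringBuilder sb = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) sb.append(line);
            in.close();
            return sb.toString();
        }

        public static void main(String[] args) throws Exception {
            String update = "http://localhost:8983/solr/update";
            String dih = "http://localhost:8983/solr/dataimport";
            fetch(update, "<delete><query>*:*</query></delete>"); // manual clear
            fetch(update, "<commit/>");
            // each committed delta batch is searchable once it completes
            fetch(dih + "?command=delta-import&commit=true", null);
        }
    }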


On 10/13/2010 3:51 PM, Tri Nguyen wrote:
> Hi,
>  Can I perform searches against the index while it is being imported?
>  Does importing add 1 document at a time or will solr make a temporary index and
> switch to that index when indexing is done?
>  Thanks,
>  Tri

scheduling imports and heartbeats

2010-11-09 Thread Tri Nguyen
Hi,
 
Can I configure solr to schedule imports at a specified time (say once a day, 
once an hour, etc)?
 
Also, does solr have some sort of heartbeat mechanism?
 
Thanks,
 
Tri

Re: scheduling imports and heartbeats

2010-11-10 Thread Tri Nguyen
I'm looking for a solution other than a cron job.

can i configure solr to schedule imports?





From: Ranveer Kumar 
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:13:03 PM
Subject: Re: scheduling imports and heartbeats

You should use cron for that..

On 10 Nov 2010 08:47, "Tri Nguyen"  wrote:

Hi,

Can I configure solr to schedule imports at a specified time (say once a
day,
once an hour, etc)?

Also, does solr have some sort of heartbeat mechanism?

Thanks,

Tri


Re: scheduling imports and heartbeats

2010-11-10 Thread Tri Nguyen
Thanks for the tip, Ken.  I tried that but I don't see the import happening 
when I check the status.

Below is what's in my dataimport.properties.

#Wed Nov 10 11:36:28 PST 2010
metadataObject.last_index_time=2010-09-20 11\:12\:47
interval=1
port=8080
server=localhost
params=/select?qt\=/dataimport&command\=full-import&clean\=true&commit\=true
webapp=solr
id.last_index_time=2010-11-10 11\:36\:27
syncEnabled=1
last_index_time=2010-11-10 11\:36\:27



 




From: Ken Stanley 
To: solr-user@lucene.apache.org
Sent: Wed, November 10, 2010 4:41:17 AM
Subject: Re: scheduling imports and heartbeats

On Tue, Nov 9, 2010 at 10:16 PM, Tri Nguyen  wrote:
> Hi,
>
> Can I configure solr to schedule imports at a specified time (say once a day,
> once an hour, etc)?
>
> Also, does solr have some sort of heartbeat mechanism?
>
> Thanks,
>
> Tri

Tri,

If you use the DataImportHandler (DIH), you can set up a
dataimport.properties file that can be configured to import on
intervals.

http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example

As for "heartbeat", you can use the ping handler (default is
/admin/ping) to check the status of the servlet.

- Ken
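
On the heartbeat side, a minimal sketch of polling the default ping handler 
from Java; the URL and the "OK" marker match a stock setup but should be 
treated as assumptions:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Returns true if the ping handler answers with an OK status.
    public class SolrHeartbeat {
        public static boolean isAlive(String baseUrl) {
            try {
                BufferedReader in = new BufferedReader(new InputStreamReader(
                        new URL(baseUrl + "/admin/ping").openStream(), "UTF-8"));
                StringBuilder sb = new StringBuilder();
                for (String line; (line = in.readLine()) != null; ) sb.append(line);
                in.close();
                return sb.indexOf("OK") >= 0;  // e.g. <str name="status">OK</str>
            } catch (Exception e) {
                return false;                  // connection refused, 500, etc.
            }
        }
    }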


data import scheduling

2010-11-11 Thread Tri Nguyen
Hi,

Has anyone gotten solr to schedule data imports at a certain time interval 
through configuring solr?

I tried setting interval=1, which should import every minute, but I don't see 
it happening.

I'm trying to avoid cron jobs.

Thanks,

Tri

importing from java

2010-11-11 Thread Tri Nguyen
Hi,

I'm restricted to the following with regard to importing.

I have access to a list (Iterator) of Java objects I need to import into solr.

Can I import the java objects as part of solr's data import interface (whenever 
an http request tells solr to do a dataimport, it will call my java class to 
get the objects)?


Before, I had direct read-only access to the db and specified the column 
mappings, and things were fine with the data import.

But now I am restricted to using a .jar file that has an api to get the records 
in the database, and I need to publish those records into solr.  I do see 
solrj, but solrj is separate from the solr webapp.

Can I write my own dataimporthandler?

Thanks,

Tri

Re: importing from java

2010-11-11 Thread Tri Nguyen
another question is, can I write my own DataImportHandler class?

thanks,

Tri





From: Tri Nguyen 
To: solr user 
Sent: Thu, November 11, 2010 7:01:25 PM
Subject: importing from java

Hi,

I'm restricted to the following with regard to importing.

I have access to a list (Iterator) of Java objects I need to import into solr.

Can I import the java objects as part of solr's data import interface (whenever 
an http request tells solr to do a dataimport, it will call my java class to 
get the objects)?


Before, I had direct read-only access to the db and specified the column 
mappings, and things were fine with the data import.

But now I am restricted to using a .jar file that has an api to get the records 
in the database, and I need to publish those records into solr.  I do see 
solrj, but solrj is separate from the solr webapp.

Can I write my own dataimporthandler?

Thanks,

Tri

solr response xsd

2010-11-22 Thread Tri Nguyen
Hi,
 
I'm trying to look for the solr response xsd.
 
Is this it here?
 
https://issues.apache.org/jira/browse/SOLR-17
 
I'd basically want to know if the data import passed or failed.  I can get the 
xml string and search for "completed", but I was wondering if I can use an xsd 
to parse the response.
 
Or is there another way?
 
Here's the response I have and I don't see in the xsd the lst element for 
statusMessages.
 
<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2010-11-22 17:20:42</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2010-11-22 17:20:43</str>
    <str name="Optimized">2010-11-22 17:20:43</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken ">0:0:0.375</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
 
Thanks,
 
Tri
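
Short of a formal xsd, a small XPath check over that response is usually 
enough; the element names below are taken from the output quoted above, but 
since the format is declared experimental they are assumptions that may change 
between versions:

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;

    // Pulls the DIH status and the summary line out of the status response.
    public class DihStatusCheck {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse("http://localhost:8983/solr/dataimport");
            XPath xp = XPathFactory.newInstance().newXPath();
            String status = xp.evaluate("/response/str[@name='status']", doc);
            String summary = xp.evaluate(
                    "/response/lst[@name='statusMessages']/str[@name='']", doc);
            boolean done = "idle".equals(status)
                    && summary.startsWith("Indexing completed");
            System.out.println(status + " / " + summary + " / done=" + done);
        }
    }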

dataimports response returns before done?

2010-12-03 Thread Tri Nguyen
Hi,
 
After issuing a dataimport, I've noticed solr returns a response prior to 
finishing the import.  Is this correct?  Is there any way I can make solr not 
return until it finishes?
 
If not, how do I ping for the status whether it finished or not?
 
thanks,
 
tri
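
A blunt way to get "wait until finished" behavior is to poll the status 
yourself after kicking off the import; the URL, sleep interval, and the 
"busy"/"idle" markers are assumptions from a typical DataImportHandler setup:

    import java.io.InputStream;
    import java.net.URL;
    import java.util.Scanner;

    // Starts a full import, then polls until the handler stops reporting busy.
    public class WaitForImport {
        static String read(String url) throws Exception {
            InputStream in = new URL(url).openStream();
            Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A");
            String body = s.hasNext() ? s.next() : "";
            in.close();
            return body;
        }

        public static void main(String[] args) throws Exception {
            String dih = "http://localhost:8983/solr/dataimport";
            read(dih + "?command=full-import&commit=true");
            while (read(dih).contains("busy")) {  // flips to "idle" when done
                Thread.sleep(5000);
            }
            System.out.println("import finished");
        }
    }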

custom ping response

2010-12-07 Thread Tri Nguyen
Can I have a custom xml response for the ping request?

thanks,

Tri

Re: custom ping response

2010-12-07 Thread Tri Nguyen
I need to return this:

[XML response: the server name plus an "ok" status]


From: Markus Jelsma 
To: solr-user@lucene.apache.org
Cc: Tri Nguyen 
Sent: Tue, December 7, 2010 4:27:32 PM
Subject: Re: custom ping response

Of course! The ping request handler behaves like any other request handler and 
accepts at least the wt parameter [1]. Use xslt [2] to transform the output to 
any desirable form or use other response writers [3].

Why anyway, is it a load balancer that only wants an OK output or something?

[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter
> Can I have a custom xml response for the ping request?
> 
> thanks,
> 
> Tri


Re: custom ping response

2010-12-07 Thread Tri Nguyen
Hi,

I'm reading the wiki.

What does q=apache mean in the url?

http://localhost:8983/solr/select/?stylesheet=&q=apache&wt=xslt&tr=example.xsl

thanks,

tri

 




From: Markus Jelsma 
To: Tri Nguyen 
Cc: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 4:35:28 PM
Subject: Re: custom ping response

Well, you can go a long way with xslt but I wouldn't know how to embed the 
server name in the response as Solr simply doesn't return that information.

You'd have to patch the response Solr's giving or put a small script in front 
that can embed the server name.

> I need to return this:
> 
> [XML response: the server name plus an "ok" status]
> 
> ________
> From: Markus Jelsma 
> To: solr-user@lucene.apache.org
> Cc: Tri Nguyen 
> Sent: Tue, December 7, 2010 4:27:32 PM
> Subject: Re: custom ping response
> 
> Of course! The ping request handler behaves like any other request handler
> and accepts at least the wt parameter [1]. Use xslt [2] to transform the
> output to any desirable form or use other response writers [3].
> 
> Why anyway, is it a load balancer that only wants an OK output or
> something?
> 
> [1]: http://wiki.apache.org/solr/CoreQueryParameters
> [2]: http://wiki.apache.org/solr/XsltResponseWriter
> [3]: http://wiki.apache.org/solr/QueryResponseWriter
> 
> > Can I have a custom xml response for the ping request?
> > 
> > thanks,
> > 
> > Tri


solr immediate response on data import

2010-12-09 Thread Tri Nguyen
Hi,

I do a data import with commit=false.  I get the response back saying it's idle 
and:

Total number of rows skipped = -1
Total number of rows processed = -1

This is the very first time after i start solr.  Subsequent times it doesn't 
return -1 but the rows it read from the datasource.

Why does it return -1?

And how would I interpret this?  Did the dataimport fail?

thank,

Tri

master master, repeaters

2010-12-19 Thread Tri Nguyen
Hi,

In the master-slave configuration, I'm trying to figure out how to configure 
the 
system setup for master failover.

Does solr support master-master setup?  From my readings, solr does not.

I've read about repeaters as well where the slave can act as a master.  When 
the 
main master goes down, do the other slaves switch to the repeater?

Barring better solutions, I'm thinking about putting 2 masters behind  a load 
balancer.

If this is not implemented already, perhaps solr can be updated to support a 
list of masters for fault tolerance.

Tri

shard versus core

2010-12-19 Thread Tri Nguyen
Hi,

Was wondering about the pros and cons of using sharding versus cores.

An index can be split across multiple cores or multiple shards.

So why one over the other?

Thanks,


tri

Re: master master, repeaters

2010-12-19 Thread Tri Nguyen
How do we tell the slaves to point to the new master without modifying the 
config files?  Can we do this while the slave is up, issuing a command to it?
 
Thanks,
 
Tri

--- On Sun, 12/19/10, Upayavira  wrote:


From: Upayavira 
Subject: Re: master master, repeaters
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 10:13 AM


We had a (short) thread on this late last week. 

Solr doesn't support automatic failover of the master, at least in
1.4.1. I've been discussing with my colleague (Tommaso) about ways to
achieve this.

There's ways we could 'fake it', scripting the following:

* set up a 'backup' master, as a replica of the actual master
* monitor the master for 'up-ness'
* if it fails:
   * tell the master to start indexing to the backup instead
   * tell the slave(s) to connect to a different master (the backup)
* then, when the master is back:
   * wipe its index (backing up dir first?)
   * configure it to be a backup of the new master
   * make it pull a fresh index over

But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
how that might work in that thread.

Upayavira


On Sun, 19 Dec 2010 00:20 -0800, "Tri Nguyen" 
wrote:
> Hi,
> 
> In the master-slave configuration, I'm trying to figure out how to
> configure the 
> system setup for master failover.
> 
> Does solr support master-master setup?  From my readings, solr does not.
> 
> I've read about repeaters as well where the slave can act as a master. 
> When the 
> main master goes down, do the other slaves switch to the repeater?
> 
> Barring better solutions, I'm thinking about putting 2 masters behind  a
> load 
> balancer.
> 
> If this is not implemented already, perhaps solr can be updated to
> support a 
> list of masters for fault tolerance.
> 
> Tri


Re: shard versus core

2010-12-20 Thread Tri Nguyen
Hi Erick,
 
Thanks for the explanation.
 
At what point does the index get big enough that it affects performance and 
sharding becomes appropriate?
 
Tri

--- On Sun, 12/19/10, Erick Erickson  wrote:


From: Erick Erickson 
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM


Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
assumed to be running on different machines as a scaling technique,
whereas multiple cores are run on a single Solr instance.

So using multiple cores is very similar to running multiple "virtual" Solr
servers on a single machine, each independent of the other. This can make
sense if, for instance, you wanted to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance
on a single machine, so you'll split it into parts. That assumption really
implies that it makes no sense to put multiple shards on the #same# machine.

So really, the answer to your question is that you choose the right
technique
for the problem you're trying to solve. They aren't really different
solutions to
the same problem...

Hope this helps.
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen  wrote:

> Hi,
>
> Was wondering about the pros and cons of using sharding versus cores.
>
> An index can be split across multiple cores or multiple shards.
>
> So why one over the other?
>
> Thanks,
>
>
> tri


Re: shard versus core

2010-12-20 Thread Tri Nguyen
I thought about it some more and did some reading.  I suppose the answer 
depends on what kind of response time is expected to be good enough.
 
I can do some stress testing and see if disk i/o is the bottleneck as the index 
grows.  I can also look into optimizing/configuring solr parameters to help 
performance.  One thing I've read is my disk should be at least 2 times the 
size of the index.
 
 


--- On Mon, 12/20/10, Tri Nguyen  wrote:


From: Tri Nguyen 
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Monday, December 20, 2010, 4:04 AM


Hi Erick,
 
Thanks for the explanation.
 
At what point does the index get big enough that it affects performance and 
sharding becomes appropriate?
 
Tri

--- On Sun, 12/19/10, Erick Erickson  wrote:


From: Erick Erickson 
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM


Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
assumed to be running on different machines as a scaling technique,
whereas multiple cores are run on a single Solr instance.

So using multiple cores is very similar to running multiple "virtual" Solr
servers on a single machine, each independent of the other. This can make
sense if, for instance, you wanted to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance
on a single machine, so you'll split it into parts. That assumption really
implies that it makes no sense to put multiple shards on the #same# machine.

So really, the answer to your question is that you choose the right
technique
for the problem you're trying to solve. They aren't really different
solutions to
the same problem...

Hope this helps.
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen  wrote:

> Hi,
>
> Was wondering about the pros and cons of using sharding versus cores.
>
> An index can be split across multiple cores or multiple shards.
>
> So why one over the other?
>
> Thanks,
>
>
> tri


exception obtaining write lock on startup

2010-12-30 Thread Tri Nguyen
Hi,
 
I'm getting this exception when I have 2 cores as masters.  Seems like one of 
the cores obtains a lock (file) and then the other tries to obtain the same 
one.   However, the first one is not deleted.
 
How do I fix this?
 
Dec 30, 2010 4:34:48 PM org.apache.solr.handler.ReplicationHandler inform
WARNING: Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
NativeFSLock@..\webapps\solr\tnsolr\data\index\lucene-fe3fc928a4bbfeb55082e49b32a70c10-write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:85)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1565)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1421)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:191)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
    at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
    at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.
 
 
Tri

solr benchmarks

2010-12-31 Thread Tri Nguyen
Hi,
 
I remember going through some page that had graphs of response times based on 
index size for solr.
 
Anyone know of such pages?
 
Internally, we have some requirements for response times and I'm trying to 
figure out when to shard the index.
 
Thanks,
 
Tri

abort data import on errors

2011-01-04 Thread Tri Nguyen
Hi,
 
Is there a way to specify to abort (rollback) the data import should there be 
an error/exception?
 
If everything runs smoothly, commit the data import.
 
Thanks,
 
Tri

Re: abort data import on errors

2011-01-04 Thread Tri Nguyen
I didn't want to issue the rollback command, but have solr automatically detect 
exceptions and roll back when they occur.
 
Probably there's an attribute I can configure to specify this for solr to 
understand.
 
Tri

--- On Tue, 1/4/11, Markus Jelsma  wrote:


From: Markus Jelsma 
Subject: Re: abort data import on errors
To: solr-user@lucene.apache.org
Date: Tuesday, January 4, 2011, 4:57 PM


http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22

> Hi,
>  
> Is there a way to specify to abort (rollback) the data import should there
> be an error/exception? 
> If everything runs smoothly, commit the data import.
>  
> Thanks,
>  
> Tri


Re: HTTP Status 400 - org.apache.lucene.queryParser.ParseException

2011-01-18 Thread Tri Nguyen
what's the alternative?

--- On Tue, 1/18/11, Erick Erickson  wrote:


From: Erick Erickson 
Subject: Re: HTTP Status 400 - org.apache.lucene.queryParser.ParseException
To: solr-user@lucene.apache.org
Date: Tuesday, January 18, 2011, 5:24 AM


Why do you want to do this? Because toString has never been
guaranteed to be re-parsable, even in Lucene, so it's not
surprising that taking a Lucene toString() clause and submitting
it to Solr doesn't work.

Best
Erick

On Tue, Jan 18, 2011 at 4:49 AM, kun xiong  wrote:

> -- Forwarded message --
> From: kun xiong 
> Date: 2011/1/18
> Subject: HTTP Status 400 - org.apache.lucene.queryParser.ParseException
> To: solr-user@lucene.apache.org
>
>
> Hi all,
>  I got a ParseException when I query solr with Lucene BooleanQuery
> expression (toString()).
>
> I use the default parser: LuceneQParserPlugin, which should support the whole
> lucene syntax, right?
>
> Java Code:
>
> BooleanQuery bq = new BooleanQuery();
> Query q1 = new TermQuery(new Term("I_NAME_ENUM", "KFC"));
>  Query q2 = new TermQuery(new Term("I_NAME_ENUM", "MCD"));
> bq.add(q1, Occur.SHOULD);
>  bq.add(q2, Occur.SHOULD);
> bq.setMinimumNumberShouldMatch(1);
> String solrQuery = bq.toString();
>
> query string is : q=(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1
>
> Exceptions :
>
> *message* *org.apache.lucene.queryParser.ParseException: Cannot parse
> '(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1': Encountered "<FUZZY_SLOP>
> "~1 "" at line 1, column 42. Was expecting one of: <EOF> <AND> ... <OR> ...
> <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ...
> <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... *
>
> *description* *The request sent by the client was syntactically incorrect
> (org.apache.lucene.queryParser.ParseException: Cannot parse
> '(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1': Encountered "<FUZZY_SLOP>
> "~1 "" at line 1, column 42. Was expecting one of: <EOF> <AND> ... <OR> ...
> <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ...
> <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... ).*
>
> Anyone could help?
>
>
> Thanks
>
> Kun
>
>


using dismax

2011-01-18 Thread Tri Nguyen
Hi,
 
Maybe I'm missing something obvious.
 
I'm trying to use the dismax parser and it doesn't seem like I'm using it 
properly.
 
When I do this:
http://localhost:8080/solr/cs/select?q=(poi_id:3)
 
I get a row returned.
 
When I incorporate dismax and say mm=1, no results get returned.
http://localhost:8080/solr/cs/select?q=(poi_id:3)&defType=dismax&mm=1
 
What I wanted to do when I specify mm=1 is to require that at least 1 query 
clause matches.
 
What am I missing?
 
Thanks,
 
Tri
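
The likely culprit is that dismax does not parse field:value syntax; q is 
treated as plain text searched across the fields named in qf. A SolrJ sketch of 
that shape (the core URL is an assumption, the field name is from the mail):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    // With dismax, put the bare term in q and name the fields in qf.
    public class DismaxExample {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr =
                    new CommonsHttpSolrServer("http://localhost:8080/solr/cs");
            SolrQuery q = new SolrQuery("3");  // the raw term, not poi_id:3
            q.set("defType", "dismax");
            q.set("qf", "poi_id");             // fields dismax searches
            q.set("mm", "1");                  // at least one clause must match
            System.out.println(solr.query(q).getResults().getNumFound());
        }
    }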

performance during index switch

2011-01-19 Thread Tri Nguyen
Hi,
 
Are there performance issues during the index switch?
 
As the size of the index gets bigger, does response time slow down?  Are there 
any studies on this?
 
Thanks,
 
Tri

Re: performance during index switch

2011-01-19 Thread Tri Nguyen
Yes, during a commit.
 
I'm planning to do as you suggested, having a master do the indexing and 
replicating the index to a slave, which leads to my next question.
 
While the slave replicates the index files from the master, how does that 
impact performance on the slave?
 
Tri


--- On Wed, 1/19/11, Jonathan Rochkind  wrote:


From: Jonathan Rochkind 
Subject: Re: performance during index switch
To: "solr-user@lucene.apache.org" 
Date: Wednesday, January 19, 2011, 11:30 AM


During commit?

A commit (and especially an optimize) can be expensive in terms of both CPU and 
RAM as your index grows larger, leaving less CPU for querying, and possibly 
less RAM which can cause Java GC slowdowns in some cases.

A common suggestion is to use Solr replication to separate out a Solr index 
that you index to, and then replicate to a slave index that actually serves 
your queries. This should minimize any performance problems on your 'live' Solr 
while indexing, although there's still something that has to be done for the 
actual replication of course. Haven't tried it yet myself.  Plan to -- my plan 
is actually to put them both on the same server (I've only got one), but in 
separate JVMs, and on a server with enough CPU cores that hopefully the 
indexing won't steal CPU the querying needs.

On 1/19/2011 2:23 PM, Tri Nguyen wrote:
> Hi,
>   Are there performance issues during the index switch?
>   As the size of the index gets bigger, does response time slow down?  Are
> there any studies on this?
>   Thanks,
>   Tri


response when using my own QParserPlugin

2011-02-03 Thread Tri Nguyen
Hi,

I wrote a QParserPlugin.  When I hit solr and use this QParserPlugin, the 
response does not have the column names associated with the data such as:

0 29 0 {!tnav} faketn1 CA city san francisco US 10 - - 495,496,497 
500,657,498,499 us:ca:san francisco faketn,fakeregression 037.74 -122.49 
faketn1 
faketn1 faketn1 faketn1 faketn1 99902837 
+3774-12250|+3774-12250@1|+3772-12252@2 94116:us 495,496,497 
fakecs,fakeatti,fakevenable 500,657,498,499 San Francisco 667 US 37.742369 
-122.491240 Main Dishes Pancakes faketn1 2.99 Enjoy 
best chinese food. faketn1 1;0:0:0:0:8:20% off.0:0:0:3:0.0 4158281775 94116 
ACTION_MODEL TN CA 2350 Taraval St Enjoy best chinese food 40233 - 
5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3
 2027 - 



How do I get the data to be associated with the index columns so I can parse it 
and know the context of the data (such as: this value is the business name, 
this value is the address, etc.)?

---


I was hoping it would return something like this, or some sort of structure:

[a structured Solr XML response: a responseHeader echoing the parameters 
(status 0, QTime 1, indent=on, q=I_NAME_EXACT:faketn1, rows=10, version=2.2) 
followed by a result doc in which each value is wrapped in a named field 
element]
Tri

solr current working directory or reading config files

2011-02-09 Thread Tri Nguyen
Hi,

I have a class (in a jar) that reads from properties (text) files.  I have 
these 
files in the same jar file as the class.

However, when my class reads those properties files, they cannot be found, 
since solr resolves relative paths against tomcat's bin directory (its working 
directory).

I don't really want to put the config files in tomcat's bin directory.

How do I reconcile this?

Tri
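
One way out is to load the files through the classloader instead of the 
filesystem, which sidesteps the working-directory problem entirely; a sketch, 
assuming the properties file sits at the root of one of the jars on the webapp 
classpath:

    import java.io.InputStream;
    import java.util.Properties;

    // Resolves the file via the classloader, not the current directory.
    public class ConfigLoader {
        public static Properties load(String name) throws Exception {
            InputStream in = ConfigLoader.class.getClassLoader()
                    .getResourceAsStream(name);  // e.g. "myservice.properties"
            if (in == null) {
                throw new IllegalStateException(name + " not on classpath");
            }
            Properties props = new Properties();
            props.load(in);
            in.close();
            return props;
        }
    }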

pre and post processing when building index

2011-02-09 Thread Tri Nguyen
Hi,

I'm scheduling solr to build every hour or so.

I'd like to do some pre and post processing for each index build.  The 
preprocessing would do some checks and perhaps skip the build.

For post processing, I will do some checks and either commit or rollback the 
build.

Can I write some class and plugin into solr for this?

Thanks,

Tri
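
If the builds go through the DataImportHandler, one hook worth checking is its 
import event listeners, wired up via the onImportStart/onImportEnd attributes 
of <document> in data-config.xml. A sketch; whether it is flexible enough for 
skip-the-build checks is worth verifying on your Solr version:

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EventListener;

    // Invoked by DIH at the start or end of an import, depending on which
    // attribute this class is registered under, e.g.
    //   <document onImportStart="com.example.BuildChecks" ...>
    public class BuildChecks implements EventListener {
        public void onEvent(Context ctx) {
            // pre/post checks go here
        }
    }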

communication between entity processor and solr DataImporter

2011-02-09 Thread Tri Nguyen
Hi,

I'd like to communicate errors between my entity processor and the DataImporter 
in case of error.

Should there be an error in my entity processor, I'd like the index build to 
rollback. How can I do this?

I want to throw an exception of some sort.  The only thing I can think of is to 
force a runtime exception to be thrown in nextRow() of the entityprocessor, 
since runtime exceptions are not checked and do not have to be declared in the 
nextRow() method signature.

How can I request the nextRow() method signature be updated to throw 
Exception?  
Would it even make sense?

Tri

Re: solr current working directory or reading config files

2011-02-09 Thread Tri Nguyen
Wanted to add some more details to my problem.  I have many jars that have 
their 
own config files.  So I'd have to copy files for every jar.  Can solr read from 
the classpath (jar files)?

Yes my war is always deployed to the same location under webapps.  I do already 
have solr/home defined in web.xml.  I'll try copying my files into there, but I 
would have to extract every jar file and do this manually.





From: "Wilkes, Chris" 
To: solr-user@lucene.apache.org
Sent: Wed, February 9, 2011 3:44:03 PM
Subject: Re: solr current working directory or reading config files

Is your war always deployed to the same location, i.e. 
"/usr/mycomp/myapplication/webapps/myapp.war"?  If so then on startup copy the 
files out of your directory and put them under CATALINA_BASE/solr 
(usr/mycomp/myapplication/solr) and in your war file have the 
META-INF/context.xml JNDI setting point to that.

[context.xml snippet: a <Context> element with an <Environment> entry pointing 
solr/home at that directory]

If you know of a way to reference CATALINA_BASE in the context.xml that would 
make it easier.

On Feb 9, 2011, at 12:00 PM, Tri Nguyen wrote:

> Hi,
> 
> I have a class (in a jar) that reads from properties (text) files.  I have
> these files in the same jar file as the class.
> 
> However, when my class reads those properties files, they cannot be found,
> since solr resolves relative paths against tomcat's bin directory.
> 
> I don't really want to put the config files in tomcat's bin directory.
> 
> How do I reconcile this?
> 
> Tri

Re: communication between entity processor and solr DataImporter

2011-02-09 Thread Tri Nguyen
I can throw DataImportHandlerException (a runtime exception) from my 
entityprocessor which will force a rollback.

Tri
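
A sketch of what that looks like inside a custom entity processor; the fetch 
logic is a placeholder:

    import java.util.Map;
    import org.apache.solr.handler.dataimport.DataImportHandlerException;
    import org.apache.solr.handler.dataimport.EntityProcessorBase;

    // A SEVERE DataImportHandlerException aborts the import and triggers a
    // rollback; being unchecked, it fits the nextRow() signature as-is.
    public class MyEntityProcessor extends EntityProcessorBase {
        public Map<String, Object> nextRow() {
            try {
                return fetchNext();  // hypothetical: pull the next record
            } catch (Exception e) {
                throw new DataImportHandlerException(
                        DataImportHandlerException.SEVERE, "fetch failed", e);
            }
        }

        private Map<String, Object> fetchNext() throws Exception {
            return null;  // null tells DIH this entity has no more rows
        }
    }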





From: Tri Nguyen 
To: solr-user@lucene.apache.org
Sent: Wed, February 9, 2011 3:50:05 PM
Subject: communication between entity processor and solr DataImporter

Hi,

I'd like to communicate errors between my entity processor and the DataImporter 
in case of error.

Should there be an error in my entity processor, I'd like the index build to 
rollback. How can I do this?

I want to throw an exception of some sort.  The only thing I can think of is to 
force a runtime exception to be thrown in nextRow() of the entityprocessor, 
since runtime exceptions are not checked and do not have to be declared in the 
nextRow() method signature.

How can I request the nextRow() method signature be updated to throw 
Exception?  

Would it even make sense?

Tri

running optimize on master

2011-02-10 Thread Tri Nguyen
Hi,

I've read running optimize is similar to running defrag on a hard disk.  
Deleted 
docs are removed and segments are reorganized for faster searching.

I have a couple questions.

Is optimize necessary if  I never delete documents?  I build the index every 
hour but we don't delete in between builds.

Secondly, what kind of reorganizing of segments is done to make searches faster?

Thanks,

Tri

Re: running optimize on master

2011-02-10 Thread Tri Nguyen
Does optimize merge all segments into 1 segment on the master after the build?

Or, after the build, is there already only 1 segment?

thanks,

Tri





From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Thu, February 10, 2011 5:08:44 PM
Subject: Re: running optimize on master

Optimizing isn't necessary in your scenario, as you don't delete
documents and rebuild the whole thing each time anyway.

As for faster searches, this has largely been made obsolete
by recent changes in how indexes are built in the first place. Especially
as you can build your index in an hour, it's likely not big enough to
benefit from optimizing even under the old scenario.

So, unless you have some evidence that your queries are performing
poorly, I would just leave the optimize step off.

Best
Erick


On Thu, Feb 10, 2011 at 7:09 PM, Tri Nguyen  wrote:
> Hi,
>
> I've read running optimize is similar to running defrag on a hard disk.
> Deleted docs are removed and segments are reorganized for faster searching.
>
> I have a couple questions.
>
> Is optimize necessary if  I never delete documents?  I build the index every
> hour but we don't delete in between builds.
>
> Secondly, what kind of reorganizing of segments is done to make searches
> faster?
>
> Thanks,
>
> Tri


slave out of sync

2011-02-14 Thread Tri Nguyen
Hi,

We're thinking of having a master-slave configuration where there are multiple 
slaves.  Let's say during replication, one of the slaves does not replicate 
properly.

How will we detect that the one slave is out of sync?

Tri

rollback to other versions of index

2011-02-14 Thread Tri Nguyen
Hi,

Does solr version each index build?  

We'd like to be able to roll back not just to the previous version but maybe to 
a few versions before the current one.

Thanks,

Tri

Re: rollback to other versions of index

2011-02-15 Thread Tri Nguyen
Hi,

Wanted to explain my situation in more detail.

I have a master which never adds or deletes documents incrementally.  I just 
run 
the dataimport with autocommit.

Seems like I'll need to make a custom DeletionPolicy to keep more than one 
index 
around.

I'm accessing indices from Solr.  How do I tell solr to use a particular index?

Thanks,

Tri





From: Michael McCandless 
To: solr-user@lucene.apache.org
Sent: Tue, February 15, 2011 5:36:49 AM
Subject: Re: rollback to other versions of index

Lucene is able to do this, if you make a custom DeletionPolicy (which
controls when commit points are deleted).

By default Lucene only saves the most recent commit
(KeepOnlyLastCommitDeletionPolicy), but if your policy keeps more
around, then you can open an IndexReader or IndexWriter on any
IndexCommit.

Any changes (including optimize, and even opening a new IW with
create=true) are safe within a commit; Lucene is fully transactional.

For example, I use this for benchmarking: I save 4 commit points in a
single index.  First is a multi-segment index, second is the same
index with 5% deletions, third is an optimized index, and fourth is
the optimized index with 5% deletions.  This gives me a single index
w/ 4 different commit points, so I can then benchmark searching
against any of those 4.

Mike
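
A bare-bones sketch of such a policy against the Lucene interface (commit 
points arrive oldest-first, so everything before the last N is dropped). Note 
that Solr also ships its own SolrDeletionPolicy, configurable in solrconfig.xml 
with a maxCommitsToKeep setting, which may already cover this:

    import java.util.List;
    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexDeletionPolicy;

    // Keeps the newest n commit points and deletes the rest.
    public class KeepLastNDeletionPolicy implements IndexDeletionPolicy {
        private final int n;

        public KeepLastNDeletionPolicy(int n) { this.n = n; }

        public void onInit(List<? extends IndexCommit> commits) {
            onCommit(commits);
        }

        public void onCommit(List<? extends IndexCommit> commits) {
            for (int i = 0; i < commits.size() - n; i++) {
                commits.get(i).delete();  // mark old commit point for deletion
            }
        }
    }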

On Tue, Feb 15, 2011 at 4:43 AM, Jan Høydahl  wrote:
> Yes and no. The index grows like an onion, adding new segments for each commit.
> There is no API to remove the newly added segments, but I guess you could hack
> something.
> The other problem is that as soon as you trigger an optimize() all history is
> gone as the segments are merged into one. Optimize normally happens
> automatically behind the scenes. You could turn off merging but that will badly
> hurt your performance after some time and ultimately crash your OS.
>
> Since you only need a few versions back, you COULD write your own custom
> mergePolicy, always preserving at least N versions. But beware that a "version"
> may be ONE document or thousands of documents, depending on how you commit or
> if autoCommit is active, so if you go this route you also need strict control
> over your commits.
>
> Perhaps the best option is to handle this on the feeding client side, where you
> keep a buffer of the N last docs. Then you can freely roll back or re-index as
> you choose, based on time, number of docs etc.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 15. feb. 2011, at 01.21, Tri Nguyen wrote:
>
>> Hi,
>>
>> Does solr version each index build?
>>
>> We'd like to be able to roll back not just to the previous version but maybe
>> to a few versions before the current one.
>>
>> Thanks,
>>
>> Tri
>
>


adding a TimerTask

2011-02-18 Thread Tri Nguyen
Hi,

How can I add a TimerTask to Solr?

Tri

Re: adding a TimerTask

2011-02-19 Thread Tri Nguyen
Seems like one way is to write a servlet whose init method creates a TimerTask.
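
A sketch of that idea, using a ServletContextListener rather than a servlet so 
the timer also gets cancelled on shutdown; register it with a <listener> 
element in the webapp's web.xml:

    import java.util.Timer;
    import java.util.TimerTask;
    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;

    // Creates a daemon Timer on webapp startup and cancels it on shutdown.
    public class ImportScheduler implements ServletContextListener {
        private Timer timer;

        public void contextInitialized(ServletContextEvent e) {
            timer = new Timer("import-scheduler", true);
            timer.schedule(new TimerTask() {
                public void run() {
                    // e.g. hit /dataimport?command=full-import here
                }
            }, 0, 60 * 60 * 1000L);  // every hour
        }

        public void contextDestroyed(ServletContextEvent e) {
            if (timer != null) timer.cancel();
        }
    }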





From: Tri Nguyen 
To: solr user 
Sent: Fri, February 18, 2011 6:02:44 PM
Subject: adding a TimerTask

Hi,

How can I add a TimerTask to Solr?

Tri

Re: slave out of sync

2011-02-19 Thread Tri Nguyen
There is an http api where I can look at the latest replication and check 
whether there is an "ERROR" keyword.  If so, the latest replication failed.





From: Otis Gospodnetic 
To: solr-user@lucene.apache.org
Sent: Wed, February 16, 2011 11:31:26 AM
Subject: Re: slave out of sync

Hi Tri,

You could look at the stats page for each slave and compare the number of docs 
in them.  The one(s) that are off from the rest/majority are out of sync.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message ----
> From: Tri Nguyen 
> To: solr-user@lucene.apache.org
> Sent: Mon, February 14, 2011 7:19:58 PM
> Subject: slave out of sync
> 
> Hi,
> 
> We're thinking of having a master-slave configuration where there are multiple
> slaves.  Let's say during replication, one of the slaves does not replicate
> properly.
> 
> How will we detect that the one slave is out of sync?
> 
> Tri


class not found

2011-04-07 Thread Tri Nguyen
Hi,

I wrote my own parser plugin.

I'm getting a NoClassDefFoundError.  Any ideas why?

Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.solr.search.QParserPlugin
    at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:548)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)

    at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)

    at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)

    at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)

    at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
    at 
org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
    at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
    at 
org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
    at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
    at 
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
    at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)

    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at 
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at 
org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at 
org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

Tri

Re: class not found

2011-04-07 Thread Tri Nguyen
yes.





From: Ahmet Arslan 
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:23:56 PM
Subject: Re: class not found

> I wrote my own parser plugin.
> 
> I'm getting a NoClassDefFoundError.  Any ideas why?

Did you put jar file - that contains you custom code - into /lib directory?
http://wiki.apache.org/solr/SolrPlugins


Re: class not found

2011-04-07 Thread Tri Nguyen
The jar containing the class is in here:

/usr/local/apache-tomcat-6.0.20/webapps/solr/WEB-INF/lib

for my setup.

Tri





From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:24:14 PM
Subject: Re: class not found

Can you give us some more details? I suspect the jar file containing
your plugin isn't in the Solr lib directory and/or you don't have a lib
directive in your solrconfig.xml file pointing to where your jar is.

But that's a guess since you haven't provided any information about
what you did to try to use your plugin, like how you deployed it, how
you compiled it, how

Best
Erick

On Thu, Apr 7, 2011 at 4:43 PM, Tri Nguyen  wrote:

> Hi,
>
> I wrote my own parser plugin.
>
> I'm getting a NoClassDefFoundError.  Any ideas why?
>
> Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.solr.search.QParserPlugin
> [remainder of the stack trace quoted verbatim from the message above]
>
> Tri


parsing many documents takes too long

2011-08-11 Thread Tri Nguyen
Hi,
 
My query to solr returns about 982 documents and I use jaxb to parse them into 
java objects, which takes about 469 ms, over my 150-200ms threshold.
 
Is there a solution around this?  Can I store the java objects in the index, 
return them in the solr response, and then deserialize them back into java 
objects?  Would this take less time?
 
Any other ideas?
 
Thanks,
 
Tri
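
One avenue worth measuring before restructuring the index: skip XML+JAXB 
entirely and let SolrJ fetch the response in its binary (javabin) format, which 
is much cheaper to parse than XML. A sketch; the URL and field name are 
assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    // SolrJ uses the binary response parser by default, avoiding XML parsing.
    public class FastFetch {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("*:*").setRows(1000);
            for (SolrDocument doc : solr.query(q).getResults()) {
                Object name = doc.getFieldValue("name");  // hypothetical field
            }
        }
    }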

sorting distance in solr 1.4.1

2011-08-12 Thread Tri Nguyen
Hi,
 
We are using solr 1.4.1 and we need to sort our results by distance. We have 
lat lons for each document in the response and our reference point.
 
Is it possible?  I read about the spatial plugin, but that does range searching:
 
http://blog.jayway.com/2010/10/27/geo-search-with-spatial-solr-plugin/
 
It doesn't talk about sorting the results by distance (as supported in solr 3.1).
 
Tri
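
Lacking spatial support in 1.4.1, one common workaround is to sort client-side: 
since the lat/lon of each document comes back in the response, compute the 
great-circle distance from the reference point and sort on it. A sketch, with a 
hypothetical Place holder class:

    import java.util.Comparator;

    // Haversine great-circle distance plus a comparator for client-side sorting.
    public class DistanceSort {
        static final double EARTH_RADIUS_KM = 6371.0;

        static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
            double dLat = Math.toRadians(lat2 - lat1);
            double dLon = Math.toRadians(lon2 - lon1);
            double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                     + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                     * Math.sin(dLon / 2) * Math.sin(dLon / 2);
            return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        }

        static Comparator<Place> byDistanceFrom(final double lat, final double lon) {
            return new Comparator<Place>() {
                public int compare(Place a, Place b) {
                    return Double.compare(haversineKm(lat, lon, a.lat, a.lon),
                                          haversineKm(lat, lon, b.lat, b.lon));
                }
            };
        }

        static class Place { double lat, lon; }
    }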