Re: cron job update index

2008-09-17 Thread sunnyfr
Is it this one? http://wiki.apache.org/solr/CollectionDistribution#head-9f393ae2a6230fe23e422f1583f31edbff7b1007 Otis Gospodnetic wrote: > Hi Sunny, > There is a very detailed page about this on the Wiki. Have you seen it? > Otis -- > Sematext -- http://sematext.com/ -- Lucene -

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 1:37 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > is it this one ? > > > http://wiki.apache.org/solr/CollectionDistribution#head-9f393ae2a6230fe23e422f1583f31edbff7b1007 > Yes. -- Regards, Shalin Shekhar Mangar.

scripts.conf

2008-09-17 Thread sunnyfr
Hi, just to be sure: is scripts.conf used when I run snappuller or snapinstaller, so that I don't have to pass the values directly on the command line? Is it like a conf file with parameters inside? Does somebody have an example of scripts.conf and of the command line that uses its parameters? Thanks, Sunny

How to set term frequency given a term and a value stating the frequency?

2008-09-17 Thread ristretto . rb
Hello, I'm looking through the wiki, so if it's there, I'll find it, and you can ignore this post. If this isn't documented, can anyone explain how to achieve this? Suppose I have two docs A and B that I want to index. I want to index these documents so that A has the equivalent of 100 copies of

Re: scripts.conf

2008-09-17 Thread sunnyfr
There is also solr/conf/rsyncd.conf: what is the difference from scripts.conf? And should it be in every Solr instance, like solr/user/conf and solr/books/conf? Cheers,

Re: scripts.conf

2008-09-17 Thread sunnyfr
OK, so rsyncd.conf is apparently generated automatically by rsync. Does somebody have an example of scripts.conf?

Re: scripts.conf

2008-09-17 Thread Koji Sekiguchi
> Ok, obviously rsyncd.conf is generated automaticly by rsync. > Somebody has an exemple of scripts.conf ? Have you read this? http://wiki.apache.org/solr/SolrCollectionDistributionScripts Koji

Can Solr be used to search public websites(Newbie).

2008-09-17 Thread convoyer
Hi all. I am quite new to Solr. I am just checking whether this tool suits my application. I am developing a search application that searches all publicly available websites and also some selected websites. Can I use Solr for this purpose? If yes, how can I get started? All the tutorials are po

Re: scripts.conf

2008-09-17 Thread sunnyfr
Yes, I did; it's just not clear how it works with several instances. So far, as I explained, my tree looks like solr/user/bin (snappuller rsyncd...) solr/user/conf (scripts.conf) solr/user/logs (rsyncd-enable ...) solr/books/bin (snappuller rsyncd...) solr/books/conf (scripts.conf) solr/

Re: Can Solr be used to search public websites(Newbie).

2008-09-17 Thread Ryan McKinley
Solr only manages the indexing/search side, it does not do any crawling like nutch. For crawling a small site, you may want to check out: http://aperture.sourceforge.net/ (mature, but RDF heavy) Or Droids: http://people.apache.org/~thorsten/droids/ Droids is new, and will change a lot soon,

Re: Can Solr be used to search public websites(Newbie).

2008-09-17 Thread George Everitt
Dear Con, Searching the entire Internet is a non-trivial computer science problem. It's kind of like asking a brain surgeon the best way to remove a tumor. The answer should be "First, spend 16 years becoming a neurosurgeon". My point is, there is a whole lot you need to know beyond "i

help >> rsyncd-enable

2008-09-17 Thread sunnyfr
[EMAIL PROTECTED]:/solr/user/bin# bash rsyncd-enable rsyncd-enable: line 21: cd: rsyncd-enable/..: Not a directory rsyncd-enable: line 26: /solr/user/bin/bin/scripts-util: No such file or directory rsyncd-enable: line 60: fixUser: command not found rsyncd-enable: line 62: setStartTime: command no

Re: scripts.conf

2008-09-17 Thread Bill Au
All the scripts dot in (".") the utility script scripts-util, which in turn dots in scripts.conf. Why are you running several instances, multiple ports, multiple webapps, or multiple cores? http://wiki.apache.org/solr/MultipleIndexes Bill On Wed, Sep 17, 2008 at 8:50 AM, sunnyfr <[EMAIL PROTE

Re: help >> rsyncd-enable

2008-09-17 Thread Bill Au
Try this command line instead: /solr/user/bin/rsyncd-enable The scripts do not like to be "bashed". Bill On Wed, Sep 17, 2008 at 9:24 AM, sunnyfr <[EMAIL PROTECTED]> wrote: > > [EMAIL PROTECTED]:/solr/user/bin# bash rsyncd-enable > rsyncd-enable: line 21: cd: rsyncd-enable/..: Not a director
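A guess at why `bash rsyncd-enable` fails (Sunny reports further down that his actual problem was the user account, so this only illustrates Bill's advice): the first error, `cd: rsyncd-enable/..: Not a directory`, looks like the script deriving its own directory from $0 with something like `cd ${0%/*}/..`. I haven't checked the real line 21, so treat that as an assumption. The Python sketch below only emulates that shell expansion to show why calling the script by its path behaves differently:

```python
def strip_last_component(arg0: str) -> str:
    """Emulate the shell expansion ${0%/*}.

    The pattern "/*" is removed from the end using the shortest match, i.e.
    everything from the last "/" onward. With no "/" there is no match and
    the value comes back unchanged.
    """
    idx = arg0.rfind("/")
    return arg0 if idx == -1 else arg0[:idx]


def guessed_cd_target(arg0: str) -> str:
    # Hypothetical: what a line such as `cd ${0%/*}/..` would try to enter.
    return strip_last_component(arg0) + "/.."


for invocation in ("rsyncd-enable", "/solr/user/bin/rsyncd-enable"):
    print(f"$0={invocation!r} -> cd {guessed_cd_target(invocation)}")
```

With `bash rsyncd-enable`, $0 has no slash, so the cd fails and the script stays in /solr/user/bin, which would also explain the later /solr/user/bin/bin/scripts-util path.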

Re: scripts.conf

2008-09-17 Thread sunnyfr
I created several instances for a multi-core setup to manage users and books independently. I didn't get: "All the scripts dot in (".") the utility script scripts-util, which in turn dots in scripts.conf."

Re: help >> rsyncd-enable

2008-09-17 Thread sunnyfr
Thanks, my bad: it was a problem with my user.

Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-17 Thread Erick Erickson
You *might* be able to reconstruct enough of the "original" documents from your indexes to create another without recrawling. I know Luke can reconstruct documents from an index, but for unstored data it's slow and may be lossy. But it may suit your needs given how long it takes to make your index

Re: scripts.conf

2008-09-17 Thread Bill Au
The "." (dot) command executes a shell script in the current shell environment. Do you have a separate instance directory for each instance? http://wiki.apache.org/solr/CoreAdmin Each separate instance directory will have its own conf and data directory. So each one has its own scripts.conf: ht
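To make the "dot in" chain concrete: each instance's scripts read the scripts.conf under that instance's own conf directory, because scripts-util is sourced into the running script and in turn sources scripts.conf. The Python sketch below mimics that lookup for the solr/user and solr/books layout from this thread. The sample variable names are what I remember from the example scripts.conf and the values are invented, so compare with your own copy:

```python
import os
import shlex

# Hypothetical contents for solr/user/conf/scripts.conf; names as I recall them
# from the example file, values invented for illustration.
SAMPLE_USER_CONF = """\
user=
solr_hostname=localhost
solr_port=8983
rsyncd_port=18983  # a second rsyncd on the same host needs another port
data_dir="/solr/user/data"
"""


def parse_scripts_conf(text: str) -> dict:
    """Parse plain shell assignments (name=value), ignoring comments and blank lines.

    The real scripts source scripts.conf with the shell's "." command, so the
    shell can do more than this sketch handles (expansions, exports, etc.).
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        words = shlex.split(value, comments=True)  # drops quotes and trailing comments
        values[name.strip()] = words[0] if words else ""
    return values


def conf_for_instances(solr_home: str, instances: list) -> dict:
    """Read <solr_home>/<instance>/conf/scripts.conf for each instance directory."""
    result = {}
    for name in instances:
        path = os.path.join(solr_home, name, "conf", "scripts.conf")
        with open(path, encoding="utf-8") as fh:
            result[name] = parse_scripts_conf(fh.read())
    return result


print(parse_scripts_conf(SAMPLE_USER_CONF))
```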

snappuller / rsync

2008-09-17 Thread sunnyfr
Hi, sorry again. Can you please clear up one point: snappuller should be run on a slave server to check for new snapshots and pull them, and rsyncd is run on the master. I don't really get what rsyncd's role is. Thanks

Highlighter throws StringIndexOutOfBoundsException on multivalued fields

2008-09-17 Thread dojolava
Hi, if I want to highlight a multivalued field I get the following exception: String index out of range: 21 java.lang.StringIndexOutOfBoundsException: String index out of range: 21 at java.lang.String.substring(Unknown Source) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(

Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-17 Thread Brian Carmalt
It wouldn't be that bad to merge the index externally and then reindex the results, if it is as simple as your example. Search for id:[1 TO *] and a fq for the category, increment the slice of the results you need to process until you have covered all of the docs in the category. Request the content
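Brian's paging loop, sketched in Python: one query string per slice, advancing `start` by `rows` until every doc in the category is covered. The field names `category`, `id` and `content` are hypothetical, and `num_found` would come from the first response:

```python
from urllib.parse import urlencode


def page_queries(category: str, num_found: int, rows: int = 100):
    """Yield one query string per page, covering every doc in a category.

    num_found would come from the first response; "category", "id" and
    "content" are hypothetical field names.
    """
    for start in range(0, num_found, rows):
        yield urlencode({
            "q": "id:[1 TO *]",
            "fq": f"category:{category}",
            "start": start,
            "rows": rows,
            "fl": "id,content",
        })


for qs in page_queries("books", 250):
    print("/solr/select?" + qs)
```

Using the fq for the category keeps the main query identical across pages, which lets Solr reuse the cached filter for every slice.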

Re: admin/logging page and "Effective" level

2008-09-17 Thread Sean Timm
Chris-- Sorry, your e-mail got lost in the noise. You're right, there does appear to be a problem. I can reproduce this by setting the "root" level to "OFF" and then setting it back to "INFO". I'll take a look into it. Have you opened a JIRA issue for this? -Sean Chris Hostetter wrote:

Re: Highlighter throws StringIndexOutOfBoundsException on multivalued fields

2008-09-17 Thread dojolava
I forgot: this concerns the Solr 1.3.0 release. On Wed, Sep 17, 2008 at 4:15 PM, dojolava <[EMAIL PROTECTED]> wrote: > Hi, > > if I want to highlight a mutivalued field I get the following exception: > > String index out of range: 21 java.lang.StringIndexOutOfBoundsException: > String index out o

Re: scripts.conf

2008-09-17 Thread sunnyfr
OK, that's exactly what I've done.

Re: cron job update index

2008-09-17 Thread sunnyfr
Hi, the wiki says a Collection is a Lucene collection, i.e. a directory of files that comprise the indexed and returnable data of a Solr search repository. I just want to be sure, because this page talks about: http://wiki.apache.org/solr/CollectionDistribution#head-9f393ae2a6230fe23

RE: snappuller / rsync

2008-09-17 Thread Kashyap, Raghu
Hi, rsyncd is the rsync (http://samba.anu.edu.au/rsync/) daemon. You need to make sure that rsyncd is running on both the master and the slave machines. You use snapshooter on the master server to create the snapshot and run snappuller on the slave machines to pull those snapshots from the master server an

Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Hi All, I am using Solr 1.3 with Tomcat 5.5.20 as the servlet container. I need to create multiple processes of the same Solr instance to process the incoming indexes effectively. Can you point me to how (and where :-) ) to do it? Thanks and regards, Mohit Ranka

RE: snappuller / rsync

2008-09-17 Thread sunnyfr
Hi Raghu, thanks, it's clear now.

Re: admin/logging page and "Effective" level

2008-09-17 Thread Sean Timm
I didn't see a bug on this issue, so I opened SOLR-774 with a patch to fix this. -Sean

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 8:12 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > > According to the fact that a Collection is a Lucene collection is a > directory of files. These comprise the indexed and returnable data of a > Solr > search repository. > > I just want to be sure because this page speak about

Re: [SPAM] Multiple Process of the SAME solr instance

2008-09-17 Thread Matthew Runo
I'm not 100% sure on what you mean, but if you're asking if you can run two or more solr webapps and use them all to build up one index, then you can't. You'll end up with a corrupted index. Only one solr.war webapp can write to an index at a time. Thanks for your time! Matthew Runo Softwa

Re: What's the bottleneck?

2008-09-17 Thread Sean Timm
The HitCollector used by the Searcher is wrapped by a TimeLimitedCollector which times out search requests that take longer than the maximum allowed search time limit during the
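Sean's explanation is cut off here, but the idea of wrapping a collector with a deadline fits in a few lines. This is a Python analogy written for illustration, not Lucene's TimeLimitedCollector code; the clock is injectable so the behaviour is easy to test:

```python
import time


class TimeExceeded(Exception):
    """Raised when collection runs past the allowed time budget."""


class TimeLimitedCollector:
    """Wrap a hit-collecting callback and abort once the budget is spent."""

    def __init__(self, inner, budget_seconds, clock=time.monotonic):
        self.inner = inner
        self.clock = clock
        self.deadline = clock() + budget_seconds

    def collect(self, doc, score):
        # Check the deadline before every hit; the wrapped collector never
        # sees hits that arrive after the time limit.
        if self.clock() > self.deadline:
            raise TimeExceeded(f"time limit exceeded while collecting doc {doc}")
        self.inner(doc, score)
```

In this sketch the caller can catch TimeExceeded and keep whatever the inner collector gathered before the deadline.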

Re: cron job update index

2008-09-17 Thread sunnyfr
No, actually I worked on replication as well, so both answers are interesting. OK, I just saw that I have to create a cron job that uses wget to hit the delta-import every 5 minutes or so. Am I doing something wrong? Every time I start delta-import manually (.../dataimport?command=delta-import) and

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 9:14 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Am I doing something wrong or not? > Every time I start (manually) delta-import > (.../dataimport?command=delta-import) > and then I go back to check the statut : http://.../solr/books/dataimport, > it's still running like it
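A small Python sketch of the trigger-and-check cycle discussed here: build the delta-import URL, then read the "status" string from the handler's XML response. I'm assuming the response carries a `<str name="status">` element with values such as busy and idle (that is what I remember seeing; check your own output), and the host, port and core name are placeholders:

```python
import urllib.request
import xml.etree.ElementTree as ET


def dataimport_url(core_url: str, command: str = "") -> str:
    """Build the DataImportHandler URL, e.g. for command=delta-import."""
    url = core_url.rstrip("/") + "/dataimport"
    return f"{url}?command={command}" if command else url


def import_status(xml_text: str):
    """Return the <str name="status"> value ("busy" while an import runs, "idle" after)."""
    root = ET.fromstring(xml_text)
    for node in root.iter("str"):
        if node.get("name") == "status":
            return node.text
    return None


def trigger_delta_import(core_url: str):
    # Equivalent of the wget call a cron job would make; not executed here.
    with urllib.request.urlopen(dataimport_url(core_url, "delta-import")) as resp:
        return import_status(resp.read().decode("utf-8"))
```

A crontab entry such as `*/5 * * * * wget -q -O /dev/null 'http://localhost:8983/solr/books/dataimport?command=delta-import'` covers the "every 5 minutes" part; a status that stays busy for a while just means many rows changed since the last run.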

Re: cron job update index

2008-09-17 Thread sunnyfr
Thanks, it's clear now: it just means lots of documents have changed. Sorry, but a silly question about "Then the main "query" is executed for each primary key identified by the deltaQuery. This main query is used to create the documents and index them." I don't see in the code the link between the

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Otis Gospodnetic
Hi Mohit, I think we'll need a bit more info before we can help. What kinds of processes do you need and what are you trying to achieve? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: mohitranka <[EMAIL PROTECTED]> > To: solr-user@lucene

Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Tue, Sep 16, 2008 at 10:12 AM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: >> SQL database such as H2 > Mainly to offer joins and be able to perform hierarchical queries. Can you define or give an example of what you mean by "hierarchical" queries? A downside of any type of cross-document quer

Re: How to set term frequency given a term and a value stating the frequency?

2008-09-17 Thread Otis Gospodnetic
There are Lucene field term Payloads that can be associated with each token, which I think you could use for this type of boosting, but there is not much built-in support for Payloads in Solr yet. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message >

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 9:42 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Sorry but silly question about "Then the main "query" is > executed for each primary key identified by the deltaQuery. This main query > is used to create the documents and index them." > > I don't see in the code the link be
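The two-step delta described above can be mimicked with sqlite3: the deltaQuery returns only the primary keys of changed rows, then the main query runs once per key to build each document. Table and column names are invented, and how DIH actually wires the key into the main query depends on the DIH version, so this only shows the control flow:

```python
import sqlite3


def delta_documents(conn: sqlite3.Connection, last_index_time: str) -> list:
    """Return documents for rows modified after last_index_time."""
    # Step 1, the "deltaQuery": only primary keys of changed rows.
    changed_ids = [row[0] for row in conn.execute(
        "SELECT id FROM book WHERE last_modified > ?", (last_index_time,))]
    # Step 2, the main "query", executed once for each changed key.
    docs = []
    for pk in changed_ids:
        book_id, title = conn.execute(
            "SELECT id, title FROM book WHERE id = ?", (pk,)).fetchone()
        docs.append({"id": book_id, "title": title})
    return docs
```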

Re: snappuller / rsync

2008-09-17 Thread Bill Au
You only need to run the rsync daemon on the master. Bill On Wed, Sep 17, 2008 at 10:54 AM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Hi Raghu, > > Thanks it's clear now; > > > Kashyap, Raghu wrote: > > > > Hi, > > > > Rsyncd is the rsync(http://samba.anu.edu.au/rsync/) daemon. You need > > to m

Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server. Also I would like the configuration classes to just contain data and not have so many methods that operate on the filesys

Re: admin/logging page and "Effective" level

2008-09-17 Thread Chris Hostetter
: : I didn't see a bug on this issue, so I opened SOLR-774 with a patch to fix : this. thanks sean, i hadn't opened a bug yet because i assumed i was misunderstanding something and i hadn't had time to dig into the code to double check. -Hoss

Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
> Can you define or give an example of what you mean by "hierarchical" queries? Good question, I think Erik Hatcher had more ideas on that. I was imagining joins or sub queries like SQL does. Clearly they won't be efficient, but it's easier than implementing joins (or is it) in SOLR? Joins lim

Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > If the configuration code is going to be rewritten then I would like > to see the ability to dynamically update the configuration and schema > without needing to reboot the server. Exactly. Actually, multi-core allows

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Thanks for your replies. Actually the Solr instance will have many indexes to be updated simultaneously (say 100). Now I want to create 10 threads/processes, so that I can process 10 indexes at a time instead of 1. I hope my requirement is clearer now. :-) Thanks and regards, Mohit Ranka

RE: Some new SOLR features

2008-09-17 Thread Lance Norskog
My vote is for dynamically scanning a directory of configuration files. When a new one appears, or an existing file is touched, load it. When a configuration disappears, unload it. This model works very well for servlet containers. Lance
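Lance's scan-a-directory model boils down to comparing two snapshots of file names and modification times. A minimal Python sketch of that comparison, polling rather than using filesystem notifications; the file names in any usage are hypothetical:

```python
import os


def snapshot(directory: str) -> dict:
    """Map each regular file in `directory` to its modification time (ns)."""
    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            result[name] = os.stat(path).st_mtime_ns
    return result


def diff(previous: dict, current: dict):
    """Return (to_load, to_unload): new or touched files, and vanished files."""
    to_load = sorted(n for n, mtime in current.items() if previous.get(n) != mtime)
    to_unload = sorted(n for n in previous if n not in current)
    return to_load, to_unload
```

Calling snapshot() on a timer and feeding consecutive results to diff() gives the load/unload events Lance describes; what "load" means for a Solr core is left open here.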

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Otis Gospodnetic
Mohit, it sounds like you are looking for http://wiki.apache.org/solr/MultipleIndexes Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: mohitranka <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, September 17, 2008 3:1

Re: How to set term frequency given a term and a value stating the frequency?

2008-09-17 Thread Gene Campbell
I decided to store the word X number of times when indexing the doc. times = 5 value = times * "dog " # "dog dog dog dog dog " gets indexed, of course times is specific to each doc. thanks for the help and advice Otis!! cheers gene On Thu, Sep 18, 2008 at 4:27 AM, Otis Gospodnetic <[EMAIL PRO
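Gene's snippet, written out as runnable Python. The second function is only there to show why repetition is a blunt instrument: as far as I know, Lucene's DefaultSimilarity computes tf as the square root of the raw frequency, so 100 copies weigh 10x one copy on that factor, not 100x:

```python
import math


def repeated_value(term: str, times: int) -> str:
    """Field value in which `term` appears `times` times, e.g. 'dog dog dog'."""
    return " ".join([term] * times)


def default_tf(freq: int) -> float:
    # tf factor of Lucene's DefaultSimilarity: sqrt(raw term frequency).
    return math.sqrt(freq)


print(repeated_value("dog", 5))   # dog dog dog dog dog
print(default_tf(100))            # 10.0
```

Repeating the term also makes the field longer, which lowers the length norm (1/sqrt(number of terms) in DefaultSimilarity), so the net effect on the score is smaller still.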

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Otis, thanks for your reply. I think I misdirected you with my previous message. What I meant was 100 "documents" which should be added to the Solr index. Sorry for the lack of clarity in my question. Thanks and regards, Mohit Ranka

Re: Some new SOLR features

2008-09-17 Thread Henrib
Yonik Seeley wrote: > > ...multi-core allows you to instantiate a completely > new core and swap it for the old one, but it's a bit of a heavyweight > approach > ...a schema object would not be mutable, but > that one could easily swap in a new schema object for an index at any > time... >

Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Wed, Sep 17, 2008 at 4:50 PM, Henrib <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> >> ...multi-core allows you to instantiate a completely >> new core and swap it for the old one, but it's a bit of a heavyweight >> approach >> ...a schema object would not be mutable, but >> that one co

Re: Searching with Wildcards

2008-09-17 Thread dojolava
Hi, I have another question on the wildcard problem: In the previous Solr releases there was a workaround to highlight wildcard queries using the StandardRequestHandler by adding a ? in between: e.g. hou?* would highlight house. But this is not working anymore. Is there maybe another workaround?

Re: Searching with Wildcards

2008-09-17 Thread Mark Miller
Alas no, the queryparser now uses an unhighlightable constantscore query. I'd personally like to make it work at the Lucene level, but not sure how that's going to proceed. The tradeoff is that you won't have max boolean clause issues and wildcard searches should be faster. It is a bummer though
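Since the server-side highlighter can't see constant-score wildcard matches, one possible fallback (my suggestion, not something Solr provides) is to highlight on the client. A Python sketch using fnmatch, which treats * and ? like Lucene wildcards but, unlike Lucene, also treats [...] as a character class:

```python
import fnmatch
import re


def highlight_wildcard(text: str, pattern: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap each word of `text` that matches a wildcard pattern such as 'hou?*'.

    Matching is case-insensitive and word by word (runs of word characters),
    which only roughly approximates how the field was tokenized at index time.
    """
    pat = pattern.lower()

    def mark(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if fnmatch.fnmatchcase(word.lower(), pat) else word

    return re.sub(r"\w+", mark, text)


print(highlight_wildcard("My House is a home", "hou?*"))  # My <em>House</em> is a home
```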

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Otis Gospodnetic
Mohit, Have you tried following the Solr tutorial? Adding multiple documents to Solr is a normal Solr usage and you go through that if you follow the tutorial on the site. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: mohitranka <[EMAI

Hardware config for SOLR

2008-09-17 Thread Andrey Shulinskiy
Hello, We're planning to use SOLR for our project, got some questions. So I asked some Qs yesterday, got no answers whatsoever. Wondering if they didn't make sense, or if the e-mail was too long... :-) Anyway, I'll try to ask them again and hope for some answers this time. It's a very n

Setting request method to post on SolrQuery causes ClassCastException

2008-09-17 Thread syoung
Hi, I need to have queries over a certain length done as a post instead of a get. However, when I set the method to post, I get a ClassCastException. Here is the code: public QueryResponse query(SolrQuery solrQuery) { QueryResponse response = null; try { if (solrQuery.toString(

how to find terms on a page?

2008-09-17 Thread ristretto . rb
Hello, I haven't heard of or found a way to find the number of times a term is found on a page. Lucene uses it in scoring, I believe, (solr scoring: http://tinyurl.com/4tb55r) Basically, for a given page, I would like a list of terms on the page and number of times the terms appear on the page?
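For a quick client-side approximation, counting tokens yourself is easy. The Python below lowercases and splits on non-word characters, which will not match Solr's analyzers (stemming, stopwords, synonyms). On the server side, the TermVectorComponent that ships with Solr 1.3 can, if I remember right, return per-document term frequencies (tv.tf) for fields indexed with termVectors="true":

```python
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Approximate term frequencies for one page: lowercase, split on non-word chars."""
    return Counter(re.findall(r"\w+", text.lower()))


counts = term_counts("The dog saw the other dog.")
print(counts.most_common(2))
```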

problem index accented character with release version of solr 1.3

2008-09-17 Thread Joshua Reedy
I have been using a stable dev version of 1.3 for a few months. Today, I began testing the final release version, and I encountered a strange problem. The only thing that has changed in my setup is the solr code (I didn't make any config change or change the schema). a document has a text field wi

Re: problem index accented character with release version of solr 1.3

2008-09-17 Thread Ryan McKinley
My guess is it has to do with switching the StAX implementation to geronimo API and the woodstox implementation https://issues.apache.org/jira/browse/SOLR-770 I'm not sure what the solution is though... On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote: I have been using a stable dev versio

Special character matching 'x' ?

2008-09-17 Thread Sanjay Suri
Hi, Can someone shed some light on this? One of my field values has the name "Räikkönen" which contains special characters. Strangely, as I see it anyway, it matches on the search query 'x'. Can someone explain or point me to the solution/documentation? Any help appreciated, -Sanjay -- S

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Otis, I understand that one Solr instance can store n documents (one by one). My question was how to create m such instances/processes/threads so that m documents get stored at a time instead of 1 at a time. All the instances should listen on the same port.

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Will having multiple cores, instead of one, serve the purpose?

Field level security

2008-09-17 Thread Geoff Hopson
Hi, First post/question, so please be gentle :-) I am trying to put together a security model around fields in my index. My requirement is that a user may not have permission to view certain fields in the index when he does a search. For example, he may have permission to see the name and address

Solr vs Autonomy

2008-09-17 Thread Geoff Hopson
Hi, I'm under pressure to justify the use of Solr on my project, and others are suggesting that Autonomy be used instead. Apart from price, does anyone have a list of pros/cons around Autonomy compared to Solr? Thanks geoff

Re: Special character matching 'x' ?

2008-09-17 Thread Akshay
You need to configure Tomcat appropriately for recognizing international characters in the URI. Take a look at this to see if it helps, http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4 On Thu, Sep 18, 2008 at 10:53 AM, Sanjay Suri <[EMAIL PROTECTED]> wrote: > H
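To see what a wrong URI encoding does to a query like this one: a client sends "Räikkönen" percent-encoded as UTF-8, and a Tomcat connector that decodes URIs as ISO-8859-1 (its default unless URIEncoding="UTF-8" is set, which is what the wiki page above describes) turns each accented letter into two characters. A Python sketch of just that decoding step:

```python
from urllib.parse import quote, unquote


def decode_like_container(value: str, uri_encoding: str) -> str:
    """Percent-encode `value` as UTF-8, then decode it with the container's URI encoding."""
    return unquote(quote(value), encoding=uri_encoding)


print(decode_like_container("Räikkönen", "utf-8"))       # Räikkönen
print(decode_like_container("Räikkönen", "iso-8859-1"))  # RÃ¤ikkÃ¶nen
```

The garbled form is what would be analyzed and searched, so it is worth ruling this out before looking for other causes of odd matches.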

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Shalin Shekhar Mangar
On Thu, Sep 18, 2008 at 11:03 AM, mohitranka <[EMAIL PROTECTED]> wrote: > > Otis, I understand that 1 solr instance can store n documents (one-by-one). > My query was how to create m such instances/processes/threads so that m > documents get stored at a time, instead of 1 at a time. > > All the in

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Shalin, I understand that :-) My problem is that if one Solr instance processes (saves) 100 documents one by one, it would not be very efficient. I want to create 10 clones (processes/threads/cores) of the same Solr instance, so that 10 documents get processed (saved to Solr) simultaneously. Tha

Re: Solr vs Autonomy

2008-09-17 Thread Otis Gospodnetic
Geoff, Perhaps you can find out the list of features/functionalities that your project requires and we can give you quick yes/no. Or perhaps you can get those "others" to list those Autonomy features that they think they really need, and we can tell you how Solr compares. Otis -- Sematext -- ht

Re: Setting request method to post on SolrQuery causes ClassCastException

2008-09-17 Thread Otis Gospodnetic
A quick work-around is, I think, to tell Solr to use the non-binary response, e.g. &wt=xml (I think that's the "syntax"). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: syoung <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent:
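The ClassCastException itself lives in SolrJ, so Python can't reproduce it; what the sketch below shows is the request shape the work-around implies: a long query sent as a form-encoded POST body with wt=xml added. The URL and parameters are placeholders, and the request is only built, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_query(select_url: str, params: dict) -> Request:
    """Build a POST request carrying the query parameters in a form-encoded body.

    wt=xml asks Solr for the XML response writer instead of a binary one,
    following the work-around suggested above.
    """
    body = dict(params, wt="xml")
    data = urlencode(body, doseq=True).encode("utf-8")
    return Request(
        select_url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    )


req = build_post_query("http://localhost:8983/solr/select", {"q": "title:solr", "rows": 10})
print(req.get_method(), req.data)
```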

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Otis Gospodnetic
Mohit, I think you are thinking too hard - trying to optimize something that doesn't sound like it needs optimizing at this point in your project. I suggest you start with 1 Solr instance and then see if anything needs to be faster after you've pushed that to its limits. Otis -- Sematext -- h
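For what Mohit describes (many documents, added in parallel), the parallelism usually goes in the client: a single Solr instance accepts concurrent update requests, so several threads can each post a batch to the same /update URL, typically followed by one commit at the end. A Python sketch of that pattern; `send` is a placeholder for whatever HTTP call you use, and the batch size and thread count are arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape, quoteattr


def add_message(docs: list) -> str:
    """Build a Solr XML <add> message for a batch of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def index_concurrently(docs: list, send, batch_size: int = 10, workers: int = 10) -> list:
    """Send batches of docs to one Solr instance from several client threads.

    `send` is a hypothetical callable that POSTs one <add> message to the
    instance's /update URL and returns its response.
    """
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda batch: send(add_message(batch)), batches))
```

This keeps a single Solr webapp writing to the index, which avoids the corruption Matthew warns about above.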

Re: Field level security

2008-09-17 Thread Otis Gospodnetic
Hi, I don't understand all the details, but I'll inline a few comments. - Original Message > From: Geoff Hopson <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Thursday, September 18, 2008 1:44:33 AM > Subject: Field level security > > Hi, > > First post/question, so p
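One building block for Geoff's field-level idea is restricting fl per user. A Python sketch with an invented permission map (the "salary" field is hypothetical). Note that fl only controls which stored fields are returned; it does not stop a user from querying a hidden field, so that would need separate checks on q/fq:

```python
def allowed_fl(requested: list, permissions: dict, user: str) -> str:
    """Return an fl value containing only the requested fields this user may see."""
    visible = permissions.get(user, set())
    fields = [f for f in requested if f in visible]
    if not fields:
        raise PermissionError(f"user {user!r} may not see any of {requested}")
    return ",".join(fields)


PERMISSIONS = {  # hypothetical per-user field permissions
    "clerk": {"name", "address"},
    "hr": {"name", "address", "salary"},
}

print(allowed_fl(["name", "address", "salary"], PERMISSIONS, "clerk"))  # name,address
```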