Re: Solr Core Size limit

2008-11-12 Thread Otis Gospodnetic
Right. Of course, in most cases you'd run out of hardware resources before you run out of Integers. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: Norberto Meijome <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday,

Re: Newbie Question - getting search results from dataimport request handler

2008-11-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Nov 13, 2008 at 3:52 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : You need to modify the schema which came with Solr to suit your data. There > > If i'm understanding this thread correctly, DIH ran "successfully", docs > were created, some fields were stored and indexed (because the

Re: DataImportHandler not indexing all the records

2008-11-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
The fact that it got committed in the end suggests there was no error in between. Look at the status URL and see the number of rows returned, etc. It gives a clue as to what really happened. Or you can paste your data-config and status XMLs and we may be able to suggest something. On Thu, Nov

Re: indexing data and deleting from index and database

2008-11-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
The JdbcDataSource can run any query, even updates and deletes. On Thu, Nov 13, 2008 at 9:27 AM, Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]> wrote: > DIH can delete rows from the index. Look at the 'deletedPkQuery' option. > http://wiki.apache.org/solr/DataImportHandler#head-70d3fdda52de9ee4fdb54

Re: indexing data and deleting from index and database

2008-11-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH can delete rows from the index. Look at the 'deletedPkQuery' option: http://wiki.apache.org/solr/DataImportHandler#head-70d3fdda52de9ee4fdb54e1c6f84199f0e1caa76 Deleting from the DB is not possible for DIH, but you can write a Transformer or EntityProcessor which can do that. On Wed, Nov 12

Re: DataImportHandler not indexing all the records

2008-11-12 Thread Giri
Hi Noble, thanks for reply, my comments are below >>why is the id field multivalued? I was just trying various options, yes, this ID is unique, and I check for duplicates, when I did a distinct (id) query to the MySQL database, it returned almost 2 million. >> look at the status host:post/dataim

Re: Newbie Question - getting search results from dataimport request handler

2008-11-12 Thread Shalin Shekhar Mangar
On Thu, Nov 13, 2008 at 3:52 AM, Chris Hostetter <[EMAIL PROTECTED]>wrote: > > : You need to modify the schema which came with Solr to suit your data. > There > > If i'm understanding this thread correctly, DIH ran "successfully", docs > were created, some fields were stored and indexed (because t

Re: DIH and repeated chunked input

2008-11-12 Thread Shalin Shekhar Mangar
It is implemented. We used this feature to ingest data from a REST API quite similar to Solr's own. Our use-case was that the first call to the API returned a token in the xml response. To get to the next set of results, the value of the token in the last response needs to be passed as a request p
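The token-passing loop Shalin describes can be sketched outside of DIH as follows. Every response carries a continuation token, and the next request sends that token back until none is returned. This is a minimal sketch under stated assumptions. `Api`, `Page` and `fakeApi` are hypothetical stand-ins for the REST endpoint. They are not DIH classes, and they do not show how XPathEntityProcessor's transformer-based chunking is configured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of token-based chunking: each page carries a
// continuation token that must be sent back to get the next page.
// Api/Page/fakeApi are illustrative assumptions, not DIH classes.
class ChunkedFetch {
    static final class Page {
        final List<String> rows;
        final String nextToken; // null when there are no more pages
        Page(List<String> rows, String nextToken) {
            this.rows = rows;
            this.nextToken = nextToken;
        }
    }

    interface Api {
        Page fetch(String token); // token == null requests the first page
    }

    // Keep requesting pages until the API stops returning a token.
    static List<String> fetchAll(Api api) {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            Page page = api.fetch(token);
            all.addAll(page.rows);
            token = page.nextToken;
        } while (token != null);
        return all;
    }

    // In-memory fake endpoint: the token is simply the next start offset.
    static Api fakeApi(List<String> data, int pageSize) {
        return token -> {
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + pageSize, data.size());
            String next = (end < data.size()) ? String.valueOf(end) : null;
            return new Page(new ArrayList<>(data.subList(start, end)), next);
        };
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e");
        System.out.println(fetchAll(fakeApi(data, 2)));
    }
}
```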

Re: Solr Core Size limit

2008-11-12 Thread Norberto Meijome
On Tue, 11 Nov 2008 20:39:32 -0800 (PST) Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > With Distributed Search you are limited to # of shards * Integer.MAX_VALUE. Yeah, makes sense. And I would suspect that since this is PER INDEX, it applies to each core only (so you could have n cores in m shards
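Otis's bound is plain arithmetic. A single index addresses documents with a Java int, so distributed search over N shards tops out at N * Integer.MAX_VALUE documents. The small sketch below makes that concrete. The class name is hypothetical. Long arithmetic is needed because the product overflows an int as soon as there are two shards.

```java
// Illustrates the document ceiling quoted in the thread:
// shards * Integer.MAX_VALUE. The class name is hypothetical.
class ShardLimit {
    // Long arithmetic: the product overflows an int for shards >= 2.
    static long maxDocs(int shards) {
        if (shards < 1) {
            throw new IllegalArgumentException("need at least one shard");
        }
        return (long) shards * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 16}) {
            System.out.println(n + " shard(s): up to " + maxDocs(n) + " docs");
        }
    }
}
```

As Otis notes in the first message of this thread, hardware usually runs out long before these numbers matter.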

Re: Solr Core Size limit

2008-11-12 Thread Norberto Meijome
On Tue, 11 Nov 2008 10:25:07 -0800 (PST) Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Doc ID gaps are zapped during segment merges and index optimization. > thanks Otis :) b _ {Beto|Norberto|Numard} Meijome "I didn't attend the funeral, but I sent a nice letter saying
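To picture what "doc ID gaps are zapped during segment merges" means, consider a toy model. Deleted documents leave holes in the ID space, and the merge gives surviving documents new, consecutive IDs starting at 0. This is a simplified sketch of the idea, not Lucene's merge code. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Toy model of doc ID compaction during a merge (not Lucene code):
// survivors are renumbered densely, deleted docs map to -1.
class DocIdCompaction {
    // deleted[i] == true means old doc ID i was deleted.
    static int[] remap(boolean[] deleted) {
        int[] newId = new int[deleted.length];
        int next = 0;
        for (int i = 0; i < deleted.length; i++) {
            newId[i] = deleted[i] ? -1 : next++;
        }
        return newId;
    }

    public static void main(String[] args) {
        boolean[] deleted = {false, true, false, true, false};
        System.out.println(Arrays.toString(remap(deleted)));
    }
}
```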

Re: indexing data and deleting from index and database

2008-11-12 Thread Daniel Gimenez
Hi! I have a similar problem but I don't have the solution for now. I will send my progress. Marc Sturlese wrote: > > Hey there, > Since few weeks ago I am trying to migrate my lucene core app to Solr and > many questions are coming to my mind... > Before being in ApacheCon I thought that my L

Re: Boost Query effect with Standard Request Handler

2008-11-12 Thread Chris Hostetter
: The reason I brought the question back up is that hossman said: ... : I tried it and it didn't work, so I was curious if I was still doing : something wrong. no ... i'm just a foolish foolish man who says things with a lot of authority even though i clearly don't know what i'm talking

RE: FW: Score customization

2008-11-12 Thread Chris Hostetter
: I effectively need to use a multiplication in the sorting of the items. : Something like score*popularity. : It seems the only way to do this is to use a bf parameter. : However how do you use bf in combination with the standard requestHandler? functions are understood by the standard query par

RE: Query Performance while updating teh index

2008-11-12 Thread Chris Hostetter
: How about create a new core, index data, then swap the core? Old core : is still available to handle queries till new core replaces it. a new SolrCore shouldn't help in a situation like this ... with snapshots and commits on a single SolrCore you at least get the benefits of autowarming and

DIH and repeated chunked input

2008-11-12 Thread Norskog, Lance
In http://wiki.apache.org/solr/DataImportHandler there is this paragraph: If an API supports chunking (when the dataset is too large) multiple calls need to be made to complete the process. XPathEntityprocessor supports this with a transformer. If transformer returns a row which contains a fi

Re: Newbie Question - getting search results from dataimport request handler

2008-11-12 Thread Chris Hostetter
: You need to modify the schema which came with Solr to suit your data. There If i'm understanding this thread correctly, DIH ran "successfully", docs were created, some fields were stored and indexed (because they did exist in the schema) but other fields the user was attempting to create didn

Re: Solr 1.3 stack overflow when accessing solr/admin page

2008-11-12 Thread Chris Hostetter
: I get the exception when accessing http://localhost:7001/solr/admin but : http://localhost:7001/solr/admin/luke works fine. i don't have time to really dig into the code right now, but out of curiosity what happens when you hit http://localhost:7001/solr/admin/ and/or http://localhost:7001/so

Re: NIO not working yet

2008-11-12 Thread Yonik Seeley
On Wed, Nov 12, 2008 at 3:53 PM, Feak, Todd <[EMAIL PROTECTED]> wrote: > Is support for setting the FSDirectory this way built into 1.3.0 > release? Or is it necessary to grab a trunk build. It's not in 1.3, you need a very recent trunk build. -Yonik

RE: Query Performance while updating teh index

2008-11-12 Thread Nguyen, Joe
Another way to handle this is not to run the commit script at peak time (still pull snapshots periodically). Keep track of the number of requests, resource utilization, etc. If the number of requests exceeds the threshold, don't commit. Also, how many segments do you see under the index dir? High numb
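Joe's "skip the commit at peak time" idea can be sketched as a sliding-window request counter that decides whether a commit may run now. This is a hypothetical illustration. The class name, window length and threshold are assumptions, not Solr settings, and the hook that would actually call the commit script is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical commit gate: allow a commit only when the number of
// requests seen in the last windowMillis is below maxRequests.
class CommitGate {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowMillis;
    private final int maxRequests;

    CommitGate(long windowMillis, int maxRequests) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
    }

    // Record one incoming search request at time nowMillis.
    synchronized void recordRequest(long nowMillis) {
        timestamps.addLast(nowMillis);
        evict(nowMillis);
    }

    // True when recent traffic is below the threshold.
    synchronized boolean mayCommit(long nowMillis) {
        evict(nowMillis);
        return timestamps.size() < maxRequests;
    }

    // Drop requests that have fallen out of the window.
    private void evict(long nowMillis) {
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.removeFirst();
        }
    }

    public static void main(String[] args) {
        CommitGate gate = new CommitGate(1000, 3);
        for (long t = 0; t < 5; t++) {
            gate.recordRequest(t * 100); // five requests within 0.4 s
        }
        System.out.println("commit at t=500? " + gate.mayCommit(500));
        System.out.println("commit at t=2000? " + gate.mayCommit(2000));
    }
}
```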

Re: Query Performance while updating teh index

2008-11-12 Thread oleg_gnatovskiy
Well we never had 1.2 deployed, so I don't know if it's a new issue or not... Yonik Seeley wrote: > > Warming only uses one CPU, so it shouldn't have that much of an impact > on a multi-CPU box. > > Did this issue begin with Solr 1.3? Perhaps it has something to do > with our use of reopen()

RE: NIO not working yet

2008-11-12 Thread Feak, Todd
Is support for setting the FSDirectory this way built into 1.3.0 release? Or is it necessary to grab a trunk build. -Todd Feak -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Wednesday, November 12, 2008 11:59 AM To: solr-user@lucene.ap

NIO not working yet

2008-11-12 Thread Yonik Seeley
NIO support in the latest Solr development versions does not work yet (I previously advised that some people with possible lock contention problems try it out). We'll let you know when it's fixed, but in the meantime you can always set the system property "org.apache.lucene.FSDirectory.class" to "

Re: Query Performance while updating teh index

2008-11-12 Thread Yonik Seeley
Warming only uses one CPU, so it shouldn't have that much of an impact on a multi-CPU box. Did this issue begin with Solr 1.3? Perhaps it has something to do with our use of reopen() (to share parts of the index that are not in use). This can lead to greater lock contention while reading from th

Re: Query Performance while updating teh index

2008-11-12 Thread Otis Gospodnetic
And you have searcher warming set up? Does it use sort and do your queries use sort? What do your cache settings look like? How big is your index, how much RAM does your machine have, how much heap does the JVM have, what does vmstat output look like during warm-up? ... Otis -- Sematext -- http:

RE: Query Performance while updating teh index

2008-11-12 Thread Nguyen, Joe
How about creating a new core, indexing the data, then swapping the cores? The old core is still available to handle queries until the new core replaces it. -Original Message- From: Lance Norskog [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 12, 2008 11:16 Joe To: solr-user@lucene.apache.org Subject: RE

Re: Synonyms impacting the performance

2008-11-12 Thread Chris Hostetter
two general comments on this thread as a whole... 1) it's hard to compare the timing of a query with no synonyms and a query with a lot of synonyms since the number of terms increases and (most likely) the number of documents matched in increases as well. the more clauses in the query, the mor

Re: Query Performance while updating teh index

2008-11-12 Thread oleg_gnatovskiy
Yonik Seeley wrote: > > On Wed, Nov 12, 2008 at 2:06 PM, oleg_gnatovskiy > <[EMAIL PROTECTED]> wrote: >> The rsync seems to have nothing to do with slowness, because while the >> rsync >> is going on, there isn't any reload occurring, once the files are on the >> system, it tries a curl request

RE: Query Performance while updating teh index

2008-11-12 Thread Lance Norskog
Yes, this is the cache autowarming. We turned this off and staged separate queries that pre-warm our standard queries. We are looking at pulling the query server out of the load balancer during this process; it is the most effective way to give fixed response time. Lance -Original Message---

Re: Query Performance while updating teh index

2008-11-12 Thread Yonik Seeley
On Wed, Nov 12, 2008 at 2:06 PM, oleg_gnatovskiy <[EMAIL PROTECTED]> wrote: > The rsync seems to have nothing to do with slowness, because while the rsync > is going on, there isn't any reload occurring, once the files are on the > system, it tries a curl request to reload the searcher, which at th

Re: Query Performance while updating teh index

2008-11-12 Thread oleg_gnatovskiy
The rsync seems to have nothing to do with slowness, because while the rsync is going on, there isn’t any reload occurring, once the files are on the system, it tries a curl request to reload the searcher, which at that point causes the delays. The file transfer probably has nothing to do with thi

Re: Query Performance while updating teh index

2008-11-12 Thread Yonik Seeley
On Tue, Nov 11, 2008 at 9:31 PM, oleg_gnatovskiy <[EMAIL PROTECTED]> wrote: > Hello. We have an index with 15 million documents working on a distributed > environment, with an index distribution setup. While an index on a slave > server is being updated, query response times become extremely slow (

Re: posting error in solr

2008-11-12 Thread Chris Hostetter
: I am using Solr Lucene - 2.0 Hmmm that doesn't exist. what do you see when you view the /admin/registry.jsp page in your browser, and you look at these values... Solr Specification Version Solr Implementation Version Lucene Specification Version Lucene Imp

RE: FW: Score customization

2008-11-12 Thread Nguyen, Joe
You could use function query with standardRequestHandler to influence the final score and sort result by score. If you want to control how much the function query would affect the original score, you could use the linear function. -Original Message- From: lajkonik86 [mailto:[EMAIL PROTECT
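Joe's suggestion can be sketched by building a standard-handler query that folds a function query into the score through the `_val_` hook, using `linear(x,m,c)` (m*x + c) to scale the field. A function query matches every document. So the user's query is marked required with `+` to keep the result set unchanged. As far as I know, the function's value is then added to the text score rather than multiplied with it. The field name "popularity" comes from the thread. The weights and helper names are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: standard query parser + _val_ function query.
// Field "popularity" is from the thread; weights are assumptions.
class FunctionQueryParams {
    // "+(...)" keeps only docs matching the user query; the _val_
    // clause (which matches all docs) then contributes m*field + c.
    static String buildQ(String userQuery, String field, double m, double c) {
        String fn = "linear(" + field + "," + m + "," + c + ")";
        return "+(" + userQuery + ") _val_:\"" + fn + "\"";
    }

    static String encodeParam(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String q = buildQ("ipod", "popularity", 0.5, 0.0);
        System.out.println(q);
        System.out.println(encodeParam("q", q) + "&" + encodeParam("sort", "score desc"));
    }
}
```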

Re: simple filter query solr processing

2008-11-12 Thread Chris Hostetter
: Cannot parse ' +i_subjects:"Film': Lexical error at line 1, column 19. : Encountered: after : "\"Film" : i do not want it splitting commas and replacing them with fq, but completely : matching on i_subjects:"film,media,mass communication" i'm having trouble interpreting the formating of you

indexing data and deleting from index and database

2008-11-12 Thread Marc Sturlese
Hey there, for a few weeks now I have been trying to migrate my Lucene core app to Solr, and many questions are coming to mind... Before attending ApacheCon I thought that my Lucene index worked fine with my Solr search engine, but after my conversation with Erik in the Solr BootCamp I understood that the

RE: Synonyms impacting the performance

2008-11-12 Thread Nguyen, Joe
Could you elaborate further? 20 synonyms would be translated to 20 BooleanQueries. Are you saying each BooleanQuery requires a disk access? -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 12, 2008 7:46 Joe To: solr-user@lucene.apache.org Sub

Re: MoreLikeThis

2008-11-12 Thread Ryan McKinley
hymmm -- if it does not come out with debugQuery, I don't think there is a way to get it easily Can you create a JIRA issue for this? Adding the 'explain' info for each MLT result should be relatively easy. ryan On Nov 12, 2008, at 11:43 AM, Jeff Newburn wrote: I have also tried de

RE: Synonyms impacting the performance

2008-11-12 Thread Manepalli, Kalyan
Yes, there is a QueryComponent which checks whether there are any results for a query, and if there are none it modifies the Boolean query. So this QueryComponent does call process(). Thanks, Kalyan Manepalli -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTE

Re: Boost Query effect with Standard Request Handler

2008-11-12 Thread CameronL
Ahh shoot! Ok, I copied the original thread at the bottom for context. Basically what I need is the bq functionality with the StandardRequestHandler. I can't use dismax because that requires using qf and doesn't offer as much flexibility as we need. I have used Erik's technique of appending an

Re: MoreLikeThis

2008-11-12 Thread Jeff Newburn
I have also tried debugQuery=true. It outputs a large amount of data but none of it appears related to moreLikeThis information. Continuing to work on it but not sure how it is going to be possible to debug the functionality. Does anybody have any other suggestions on how to extract information

RE: Displaying stdout from postCommit command

2008-11-12 Thread Jerry Mindek
Thanks for the reply Koji. The reason why I asked is because I have a user who wants to post their own updates. When the postCommit is active, after he posts his documents, it appears that the job has stalled because there is a long period with no output. After speaking with me, he now realizes

Re: Displaying stdout from postCommit command

2008-11-12 Thread Koji Sekiguchi
Jerry, > I would like to see the output from snapshooter snapshooter outputs snapshooter.log. But, > Is there a way to send snapshooter's output to stdout of the terminal > which I executed the commit command? I don't think it's possible. (You can modify RunExecutableListener to redirect stdou
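Koji's suggestion to modify RunExecutableListener amounts to reading the child process's output instead of discarding it. Below is a minimal sketch of that capture step using plain `java.lang.ProcessBuilder`, with stderr merged into stdout. It is not RunExecutableListener's code. Where the captured lines should go (a log, a response) is an assumption left to the caller. The demo assumes a Unix-like system with `echo` on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: run an external command (e.g. a snapshot script) and
// capture its merged stdout/stderr so it can be logged or forwarded.
class CaptureOutput {
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with code " + exit);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : run(List.of("echo", "snapshot taken"))) {
            System.out.println(line);
        }
    }
}
```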

Re: Synonyms impacting the performance

2008-11-12 Thread Walter Underwood
If there are twenty synonyms, then a one term query becomes a twenty term query, and that means 20X more disk accesses. wunder On 11/12/08 7:08 AM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote: > > On Nov 12, 2008, at 9:41 AM, Manepalli, Kalyan wrote: >> I did the index time synonyms and results do
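Walter's point can be seen in a sketch of query-time expansion. Each synonym becomes its own optional clause, so one user term with 19 synonyms becomes a 20-clause query, and each clause needs its own term lookup. The plain map below is an assumption for illustration. It is not Solr's SynonymFilter, and the sample synonyms are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of query-time synonym expansion with a simple
// term -> synonyms map (an assumption, not SynonymFilter).
class SynonymExpansion {
    // One clause for the term itself plus one per synonym.
    static List<String> expand(String term, Map<String, List<String>> synonyms) {
        List<String> clauses = new ArrayList<>();
        clauses.add(term);
        clauses.addAll(synonyms.getOrDefault(term, List.of()));
        return clauses;
    }

    static String toQuery(List<String> clauses) {
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        Map<String, List<String>> syn = Map.of("hotel", List.of("inn", "motel", "lodge"));
        List<String> clauses = expand("hotel", syn);
        System.out.println(clauses.size() + " clauses: " + toQuery(clauses));
    }
}
```

Expanding at index time instead, as Kalyan tried, keeps the query at one clause at the cost of a larger index.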

Re: Synonyms impacting the performance

2008-11-12 Thread Erik Hatcher
On Nov 12, 2008, at 9:41 AM, Manepalli, Kalyan wrote: I did the index time synonyms and results do look much better than with query time synonyms. But is there a reason for the searches to be that slow? I understand that we have a pretty long list of synonyms (one word has at least 20

Re: SpellChecker Component

2008-11-12 Thread Grant Ingersoll
See https://issues.apache.org/jira/browse/LUCENE-1417 and http://lucene.markmail.org/message/sktohlgqxcpmpf7z?q=list:org%2Eapache%2Elucene%2Esolr-user+spellchecker+Rennie In short, frequency is the second order sort level. I think it should be made pluggable.A patch would be most welcome.
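Grant's description of the ordering can be written as a pair of comparators. Suggestions are ranked by string similarity first, and frequency only breaks ties. The second comparator shows what a pluggable alternative might look like. The `Suggestion` class, the comparator names and the sample scores are hypothetical, not Lucene's SpellChecker API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of spell-check suggestion ordering; all names are hypothetical.
class SuggestionOrder {
    static final class Suggestion {
        final String word;
        final float similarity; // higher = closer to the misspelled word
        final int frequency;    // how often the word occurs in the index
        Suggestion(String word, float similarity, int frequency) {
            this.word = word;
            this.similarity = similarity;
            this.frequency = frequency;
        }
    }

    // Similarity first, frequency as the second-order sort.
    static final Comparator<Suggestion> DEFAULT_ORDER =
        Comparator.comparingDouble((Suggestion s) -> s.similarity).reversed()
            .thenComparing(Comparator.comparingInt((Suggestion s) -> s.frequency).reversed());

    // A possible pluggable alternative: frequency first.
    static final Comparator<Suggestion> FREQUENCY_FIRST =
        Comparator.comparingInt((Suggestion s) -> s.frequency).reversed()
            .thenComparing(Comparator.comparingDouble((Suggestion s) -> s.similarity).reversed());

    static List<String> rank(List<Suggestion> in, Comparator<Suggestion> order) {
        List<Suggestion> copy = new ArrayList<>(in);
        copy.sort(order);
        List<String> words = new ArrayList<>();
        for (Suggestion s : copy) {
            words.add(s.word);
        }
        return words;
    }

    static List<Suggestion> sample() {
        return List.of(new Suggestion("pain", 0.8f, 10),
                       new Suggestion("rain", 0.8f, 500),
                       new Suggestion("paint", 0.75f, 9000));
    }

    public static void main(String[] args) {
        System.out.println(rank(sample(), DEFAULT_ORDER));
        System.out.println(rank(sample(), FREQUENCY_FIRST));
    }
}
```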

RE: Synonyms impacting the performance

2008-11-12 Thread Manepalli, Kalyan
Hi Erik, I did the index time synonyms and results do look much better than with query time synonyms. But is there a reason for the searches to be that slow? I understand that we have a pretty long list of synonyms (one word has at least 20 words as synonyms). Does this have such an adv

Re: Synonyms impacting the performance

2008-11-12 Thread Erik Hatcher
On Nov 12, 2008, at 9:12 AM, Kashyap, Raghu wrote: {quote}It's hard to tell where exactly the bottleneck is without looking at the server and a few other things. {quote} Can you suggest some areas where we can start looking into this issue? Using &debugQuery=true will output the timings of

RE: Synonyms impacting the performance

2008-11-12 Thread Kashyap, Raghu
Hi Otis, {quote}It's hard to tell where exactly the bottleneck is without looking at the server and a few other things. {quote} Can you suggest some areas where we can start looking into this issue? -Raghu -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tue

Re: Solr 1.3 stack overflow when accessing solr/admin page

2008-11-12 Thread Mike Robins
I'm experiencing the same java.lang.StackOverflowError problem with solr 1.3.0 on Weblogic 10.3 when accessing the admin page. I'm using the distributed war but have added a weblogic.xml file to the WEB-INF directory. I get the exception when accessing http://localhost:7001/solr/admin but http:/

RE: FW: Score customization

2008-11-12 Thread lajkonik86
I effectively need to use a multiplication in the sorting of the items. Something like score*popularity. It seems the only way to do this is to use a bf parameter. However how do you use bf in combination with the standard requestHandler? hossman wrote: > > > : Now I need to know whether the

Re: Boost Query effect with Standard Request Handler

2008-11-12 Thread Erik Hatcher
bq only works with dismax (&defType=dismax). To get the same effect with the lucene/solr query parser, append a clause to the original query (OR'ing it in). Erik On Nov 11, 2008, at 11:52 PM, Otis Gospodnetic wrote: Hi, It's hard to tell what you are replying to since you remove
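Erik's workaround is to append the boost clause to the user's query instead of passing `bq`. A minimal sketch follows, with two variants. A plain OR also lets in documents that match only the boost clause. Marking the user query as required (`+`) keeps the clause purely a ranking factor, which is roughly how dismax applies `bq`. The helper names and the example clause `inStock:true^5` are assumptions.

```java
// Sketch of emulating bq with the standard query parser.
// Helper names and the sample clause are hypothetical.
class BoostClause {
    // Erik's plain form: docs matching only the boost clause also match.
    static String orIn(String userQuery, String boostClause) {
        return "(" + userQuery + ") OR " + boostClause;
    }

    // Variant: user query required, boost clause optional (ranking only).
    static String requireUserQuery(String userQuery, String boostClause) {
        return "+(" + userQuery + ") " + boostClause;
    }

    public static void main(String[] args) {
        System.out.println(orIn("title:solr", "inStock:true^5"));
        System.out.println(requireUserQuery("title:solr", "inStock:true^5"));
    }
}
```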

Re: simple filter query solr processing

2008-11-12 Thread joeMcElroy
tried that and managed to get no results. cheers for the help &fq=i_subjects:Anesthesia&fq=i_subjects:Intensive+Care&fq=i_subjects:Pain+Management ryantxu wrote: > >> >> tried removing the plusses i am inserting but now shows too many >> results >> >> &fq=+i_subjects:Film+i_subjects:+media+
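Two details may explain these results. In a URL, `+` decodes to a space, so `fq=i_subjects:Intensive+Care` reaches the parser as `i_subjects:Intensive Care`, and "Care" is then searched in the default field. Also, separate `fq` parameters are intersected, so a document has to match all three subjects. The sketch below quotes each value as a phrase, which also keeps commas such as "film,media,mass communication" intact. It then URL-encodes the value and offers an any-of variant in a single `fq`. Helper names are hypothetical. The field and values come from the thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

// Sketch of building fq params for multi-word values; helper names
// are hypothetical, field and values are from the thread.
class FilterQueries {
    // Escape backslashes and quotes, then wrap the value in quotes.
    static String quoted(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    // One fq per value: a document must match ALL of them.
    static String allOf(String field, List<String> values) {
        StringJoiner sj = new StringJoiner("&");
        for (String v : values) {
            sj.add("fq=" + URLEncoder.encode(quoted(field, v), StandardCharsets.UTF_8));
        }
        return sj.toString();
    }

    // A single fq with OR'ed clauses: a document may match ANY value.
    static String anyOf(String field, List<String> values) {
        StringJoiner sj = new StringJoiner(" OR ");
        for (String v : values) {
            sj.add(quoted(field, v));
        }
        return "fq=" + URLEncoder.encode(sj.toString(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> subjects = List.of("Anesthesia", "Intensive Care", "Pain Management");
        System.out.println(allOf("i_subjects", subjects));
        System.out.println(anyOf("i_subjects", subjects));
    }
}
```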

Need help with SolrIndexSearcher & CoreContainer

2008-11-12 Thread Kraus, Ralf | pixelhouse GmbH
Hi, I want to use a SolrIndexSearcher for some special searches in my app... I start up my Solr with two cores in it (core_de & core_uk). But when I try this, my Solr server creates a completely new core instead of using the existing one... After 5-6 searches I run out of memory :-( Examp