Re: How to debug ?

2008-06-24 Thread Norberto Meijome
On Wed, 25 Jun 2008 08:37:35 +0200 Brian Carmalt <[EMAIL PROTECTED]> wrote: > There is a plugin for jetty: http://webtide.com/eclipse. Insert this as > an update site and let eclipse install the plugin for you. You can then > start the jetty server from eclipse and debug it. Thanks Brian, good i

Re: How to debug ?

2008-06-24 Thread Brian Carmalt
Hello Beto, There is a plugin for jetty: http://webtide.com/eclipse. Insert this as an update site and let eclipse install the plugin for you. You can then start the jetty server from eclipse and debug it. Brian. Am Mittwoch, den 25.06.2008, 12:48 +1000 schrieb Norberto Meijome: > On Tue, 24 J

Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
it is batchSize="-1" not fetchSize. Or keep it to a very small value. --Noble On Wed, Jun 25, 2008 at 9:31 AM, Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]> wrote: > DIH streams rows one by one. > set the fetchSize="-1" this might help. It may make the indexing a bit > slower but memory consumptio

Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH streams rows one by one. set the fetchSize="-1" this might help. It may make the indexing a bit slower but memory consumption would be low. The memory is consumed by the jdbc driver. try tuning the -Xmx value for the VM --Noble On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar <[EMAIL PRO

RE: UnicodeNormalizationFilterFactory

2008-06-24 Thread Lance Norskog
ISOLatin1AccentFilterFactory works quite well for us. It solves our basic euro-text keyboard searching problem, where "protege" should find protégé. ("protege" with two accents.) -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 24, 2008 4:05 PM To: sol
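The "protege" should find "protégé" behaviour described above can be illustrated outside Solr. The following is a minimal sketch using only the JDK's `java.text.Normalizer`; it is not the code of ISOLatin1AccentFilterFactory, and the class and method names (`AccentFold`, `fold`) are hypothetical. It decomposes each character and drops the combining marks, so accented and unaccented spellings reduce to the same string.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Solr or Lucene.
final class AccentFold {

    // Decompose to NFD so that an accented letter becomes a base letter
    // followed by combining marks, then remove every combining mark.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00e9 is "e" with an acute accent.
        System.out.println(fold("prot\u00e9g\u00e9"));
    }
}
```

Applying the same folding at index time and at query time is what lets the two spellings match; characters that have no canonical decomposition are left unchanged by this sketch.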

Re: How to debug ?

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 19:17:58 -0700 Ryan McKinley <[EMAIL PROTECTED]> wrote: > also, check the LukeRequestHandler > > if there is a document you think *should* match, you can see what > tokens it has actually indexed... right, I will look into that a bit more. I am actually using the lukeall.

Re: DataImportHandler running out of memory

2008-06-24 Thread Shalin Shekhar Mangar
Setting the batchSize to 1 would mean that the JDBC driver will keep 1 row in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the SQL Server driver implements this. Also keep in mind that Solr also needs memory to index docum

Re: How to debug ?

2008-06-24 Thread Ryan McKinley
also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed... On Jun 24, 2008, at 7:12 PM, Norberto Meijome wrote: hi, I'm trying to understand why a search on a field tokenized with the nGram tokenizer, with minGram

How to debug ?

2008-06-24 Thread Norberto Meijome
hi, I'm trying to understand why a search on a field tokenized with the nGram tokenizer, with minGramSize=n and maxGramSize=m doesn't find any matches for queries of length (in characters) of n+1..m (n works fine). analysis.jsp shows that it SHOULD match, but /select doesn't bring anything back. (
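To make the question concrete, it can help to list the grams that a given minGramSize and maxGramSize produce for one value. This is a minimal sketch in plain Java, not Lucene's NGramTokenizer; the class name `NGramSketch`, the method `ngrams`, and the order in which grams are emitted are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the Lucene implementation.
final class NGramSketch {

    // Return every substring of text whose length is between minGram and
    // maxGram inclusive, shortest grams first, left to right.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= text.length(); start++) {
                out.add(text.substring(start, start + size));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With minGram=2 and maxGram=3, prints [so, ol, lr, sol, olr]
        System.out.println(ngrams("solr", 2, 3));
    }
}
```

Comparing such a list with the tokens shown by analysis.jsp, and with the parsed query from debugQuery, is one way to see which grams the index and the query actually share.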

Re: DataImportHandler running out of memory

2008-06-24 Thread Grant Ingersoll
This is a bug in MySQL. Try setting the fetch size of the Statement on the connection to Integer.MIN_VALUE. See http://forums.mysql.com/read.php?39,137457 amongst a host of other discussions on the subject. Basically, it tries to load all the rows into memory, the only alternative is to set
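The fetch-size suggestion above can be sketched with the standard `java.sql` API. This is a rough illustration only: the class and method names (`StreamingStatementSketch`, `streamingStatement`) are hypothetical, the connection is assumed to come from the caller, and whether `Integer.MIN_VALUE` is accepted or honoured depends on the JDBC driver. It is the value the message recommends for MySQL; other drivers may reject it or ignore it.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; the caller supplies an open Connection.
final class StreamingStatementSketch {

    // Create a forward-only, read-only statement and ask the driver to
    // stream rows instead of buffering the whole result set in memory.
    static Statement streamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Driver-specific hint: assumed here to mean "stream row by row".
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

A result set obtained from such a statement would typically be read to the end and closed before other queries are issued on the same connection.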

DataImportHandler running out of memory

2008-06-24 Thread wojtekpia
I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driv

Re: Can I specify the default operator at query time ?

2008-06-24 Thread Chris Hostetter
: Subject: Can I specify the default operator at query time ? : In-Reply-To: <[EMAIL PROTECTED]> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh

Can I add field compression without reindexing?

2008-06-24 Thread Chris Harris
I have an index that I eventually want to rebuild so I can set compressed=true on a couple of fields. It's not really practical to rebuild the whole thing right now, though. If I change my schema.xml to set compressed=true and then keep adding new data to the existing index, will this corrupt the i

Re: UnicodeNormalizationFilterFactory

2008-06-24 Thread Chris Hostetter
: I've seen mention of these filters: : : : Are you asking because you saw these in Robert Haschart's reply to your previous question? I think those are custom Filters that he has in his project ... not open source (but i may be wrong) they are certainly not something that comes out of t

Re: How to use SOLR1.2

2008-06-24 Thread Chris Hostetter
: I am new in SOLR 1.2, configured Admin GUI. Facing problem in using : this. could you pls help me out to configure the nex. the admin GUI isn't really a place where you configure Solr. It's a way to see the status of things -- configuration is done via config files. have you con through t

Re: Wildcard search question

2008-06-24 Thread Jon Drukman
Norberto Meijome wrote: ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options? http://wiki.apache.org/solr/AnalyzersTokeniz

Re: SpellCheckComponent: No file-based suggestions + Location issue

2008-06-24 Thread Ronald K. Braun
Shalin: > The index directory location is being created inside the current working > directory. We should change that. I've opened SOLR-604 and attached a patch > which fixes this. I updated from nightly build to incorporate your fix and it works perfectly, now building the spell indexes in solr/

Nutch <-> Solr latest?

2008-06-24 Thread Jon Baer
Hi, I'm curious, is there a spot / patch for the latest on Nutch / Solr integration? I've found a few pages (a few outdated it seems), it would be nice (?) if it worked as a DataSource type to DataImportHandler, but not sure if that fits w/ how it works. Either way a nice contrib patch the

Re: Attempting dataimport using FileListEntityProcessor

2008-06-24 Thread Shalin Shekhar Mangar
Ok, I got your point. DataImportHandler currently creates documents and adds them one-by-one to Solr. A commit/optimize is called once after all documents are finished. If a document fails to add due to any exception then the import fails. You can still achieve the functionality you want by setti

Re: Attempting dataimport using FileListEntityProcessor

2008-06-24 Thread mike segv
I do want to import all documents. My understanding of the way things work, correct me if I'm wrong, is that there can be a certain number of documents included in a single atomic update. Instead of having all my 16 Million documents be part of a single update (that could more easily fail being

solr-14 help

2008-06-24 Thread Geoffrey Young
hi all :) last week I reworked an older patch for SOLR-14 https://issues.apache.org/jira/browse/SOLR-14 this functionality is actually fairly important for our ongoing migration to solr, so I'd really love to get SOLR-14 into 1.3. but open-source being what it is, my super-important featur

RE: never desallocate RAM...during search

2008-06-24 Thread r.nieto
Hi, I'm having problems with the patch. With this schema.xml: > If I send documents with a content smaller than 3 I have an exception during the indexing. If I change the maxLength to, for example, 30 the documents that before gave the exception are now indexed correctly. The except

Re: SOLR-469 - bad patch?

2008-06-24 Thread Shalin Shekhar Mangar
I've just uploaded a new patch which applies cleanly on the trunk. Thanks! On Tue, Jun 24, 2008 at 7:35 PM, Jon Baer <[EMAIL PROTECTED]> wrote: > It seems the new patch @ https://issues.apache.org/jira/browse/SOLR-469 is > x2 the size but turns out the patch itself might be bad? > > Ie, it dumps

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 16:34:44 +0100 Dave Searle <[EMAIL PROTECTED]> wrote: > I am currently storing the thread id within the message index, however, > although this would allow me to sort, it doesn't help with the grouping of > threads based on relevancy. See the idea is to index message data in

Re: Accented search

2008-06-24 Thread Robert Haschart
climbingrose wrote: Here is how I did it (the code is from memory so it might not be correct 100%): private boolean hasAccents; private Token filteredToken; public final Token next() throws IOException { if (hasAccents) { hasAccents = false; return filteredToken; } Token t = input.nex

RE: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Dave Searle

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 16:04:24 +0100 Dave Searle <[EMAIL PROTECTED]> wrote: > At the moment I have an index of forum messages (each message being a > separate doc). Results are displayed on a per message basis, however, I would > like to group the results via their thread. Apart from using a facet

RE: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Dave Searle

Otis : Re: n-Gram, only works with queries of 2 letters

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 09:10:58 +1000 Norberto Meijome <[EMAIL PROTECTED]> wrote: > On Mon, 23 Jun 2008 05:33:49 -0700 (PDT) > Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > > > When you add &debugQuery=true to the request, what does your query look > > like after parsing? hi Otis

Re: Accented search

2008-06-24 Thread climbingrose
Here is how I did it (the code is from memory so it might not be correct 100%): private boolean hasAccents; private Token filteredToken; public final Token next() throws IOException { if (hasAccents) { hasAccents = false; return filteredToken; } Token t = input.next(); String filte

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Otis Gospodnetic
I don't know if SOLR-139 will make it into 1.3, but from your brief description, I'd say you might want to consider a different schema for your data. Stuffing thread messages in the same doc that represents a thread may not be the best choice. Of course, you may have good reasons for doing tha

SOLR-469 - bad patch?

2008-06-24 Thread Jon Baer
It seems the new patch @ https://issues.apache.org/jira/browse/SOLR-469 is x2 the size but turns out the patch itself might be bad? Ie, it dumps build.xml twice, is it just me? Thanks. - Jon

Re: (Edge)NGram tokenizer interaction with other filters

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 04:54:46 -0700 (PDT) Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > One tokenizer is followed by filters. I think this all might be a bit > clearer if you read the chapter about Analyzers in Lucene in Action if you > have a copy. I think if you try to break down that "the re

SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Dave Searle
Hi, Does anyone know if SOLR-139 (Support updateable/modifiable documents) will make it back into the 1.3 release? I'm looking for a way to append data to a multivalued field in a document over a period of time (in which the document represents a forum thread and the multivalued field represe

Re: (Edge)NGram tokenizer interaction with other filters

2008-06-24 Thread Otis Gospodnetic
One tokenizer is followed by filters. I think this all might be a bit clearer if you read the chapter about Analyzers in Lucene in Action if you have a copy. I think if you try to break down that "the result of all this passed to " into something more concrete and real you will see how things

Re: Parser of Response XML

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
org.apache.solr.client.solrj.impl.XMLResponseParser On Tue, Jun 24, 2008 at 3:06 PM, Ranjeet <[EMAIL PROTECTED]> wrote: > Hi, > > Is any class available in the SOLR API to parse the response XML? > > Regards, > Ranjeet -- --Noble Paul

Parser of Response XML

2008-06-24 Thread Ranjeet
Hi, Is any class available in the SOLR API to parse the response XML? Regards, Ranjeet

Re: several tokenizers in one field type

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 00:14:57 -0700 Ryan McKinley <[EMAIL PROTECTED]> wrote: > best docs are here: > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters yes, I've been reading that already , thanks :) > > > - If I define 2 tokenizers in a fieldtype, only the first one is > > applied, t

(Edge)NGram tokenizer interaction with other filters

2008-06-24 Thread Norberto Meijome
hi everyone, if I define a field as

Re: several tokenizers in one field type

2008-06-24 Thread Ryan McKinley
On Jun 24, 2008, at 12:07 AM, Norberto Meijome wrote: hi all, ( I'm using 1.3 nightly build from 15th June 08.) Is there some documentation about how analysers + tokenizers are applied in fields ? In particular, my question : best docs are here: http://wiki.apache.org/solr/AnalyzersToken

several tokenizers in one field type

2008-06-24 Thread Norberto Meijome
hi all, ( I'm using 1.3 nightly build from 15th June 08.) Is there some documentation about how analysers + tokenizers are applied in fields ? In particular, my question : - If I define 2 tokenizers in a fieldtype, only the first one is applied, the other is ignored. Is that because the 2nd tok