Some sort of join in SOLR?

2008-01-16 Thread Michael Lackhoff
Hello, I have two sources of data for the same "things" to search. It is book data in a library. First there is the usual bibliographic data (author, title...) and then I have scanned and OCRed table of contents data about the same books. Both are updated independently. Now I don't know how t
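
Since Solr offers no SQL-style join for a case like this, a common approach is to denormalize: merge the bibliographic record and its OCRed table of contents into one Solr document keyed by a shared id, and re-post the whole document whenever either source changes (an add with the same id replaces the previous version). A minimal sketch, assuming hypothetical field names (id, author, title, toc_text) and SolrJ's SolrInputDocument:

    import java.util.Map;
    import org.apache.solr.common.SolrInputDocument;

    public class BookMerger {
        /** Builds one Solr document per book from the two independently updated sources. */
        public static SolrInputDocument merge(Map<String, String> bibRecord, String ocrToc) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", bibRecord.get("id"));          // shared key between both sources
            doc.addField("author", bibRecord.get("author"));  // field names are illustrative
            doc.addField("title", bibRecord.get("title"));
            if (ocrToc != null) {
                doc.addField("toc_text", ocrToc);             // OCRed TOC, searchable alongside the bib data
            }
            return doc;  // re-post the whole doc whenever either source changes
        }
    }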

Re: FunctionQuery in a custom request handler

2008-01-16 Thread Chris Hostetter
: How do I access the ValueSource for my DateField? I'd like to use a : ReciprocalFloatFunction from inside the code, adding it alongside others in the : main BooleanQuery. The FieldType API provides a getValueSource method (so every FieldType picks its own best ValueSource implementation). -Hoss
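
A rough illustration of what Hoss describes: ask the field's FieldType for its ValueSource and add a reciprocal function query as an optional scoring clause. Package locations and signatures moved around between Solr/Lucene releases (getValueSource later gained a QParser argument), so treat this as a sketch rather than copy-paste code:

    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.solr.schema.IndexSchema;
    import org.apache.solr.schema.SchemaField;
    import org.apache.solr.search.function.FunctionQuery;
    import org.apache.solr.search.function.ReciprocalFloatFunction;
    import org.apache.solr.search.function.ValueSource;

    public class RecencyBoost {
        /** Combines the user query with a reciprocal of the date field so newer docs score higher. */
        public static Query withRecencyBoost(IndexSchema schema, Query userQuery, String dateField) {
            SchemaField sf = schema.getField(dateField);
            ValueSource dates = sf.getType().getValueSource(sf);    // each FieldType picks its own ValueSource
            // recip(x) = a / (m*x + b); the constants here are placeholders to tune
            ValueSource recency = new ReciprocalFloatFunction(dates, 1.0f, 1000.0f, 1000.0f);
            BooleanQuery combined = new BooleanQuery();
            combined.add(userQuery, Occur.MUST);
            combined.add(new FunctionQuery(recency), Occur.SHOULD); // optional clause: affects score only
            return combined;
        }
    }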

Re: Cache size and Heap size

2008-01-16 Thread Chris Hostetter
: > I know this is a lot and I'm going to decrease it, I was just experimenting, : > but I need some guidelines of how to calculate the right size of the cache. : : Each filter that matches more than ~3000 documents will occupy maxDocs/8 bytes : of memory. Certain kinds of faceting require one en
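
To make the maxDocs/8 figure concrete: with, say, 10 million documents in the index, each cached filter bitset costs about 10,000,000 / 8 bytes, roughly 1.25 MB, so 1,000 cached filters already need about 1.25 GB, and a filterCache sized in the hundreds of thousands of entries could in theory demand hundreds of gigabytes. The 10 million is only an illustrative figure; substitute your own maxDocs when sizing the caches.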

Re: batch indexing takes more time than shown on SOLR output --> something to do with IO?

2008-01-16 Thread Chris Hostetter
: INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498, ...(42 : more) : ]} 0 875 : : However, when timing this instruction on the client-side (I use SolrJ --> : req.process(server)) I get totally different numbers (in the beginning the : client-side measured time is about 2 seconds
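
The 875 in that log line is only the time Solr spent inside its update handler; the SolrJ call additionally pays for building the request, streaming it over HTTP, and parsing the response, so the two numbers measure different spans. A small sketch of timing the client side, using 1.3-era SolrJ class names (batch size and field names are illustrative):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class TimedBatchAdd {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            UpdateRequest req = new UpdateRequest();
            for (int i = 0; i < 50; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "some body text");
                req.add(doc);
            }
            long start = System.currentTimeMillis();
            req.process(server);                        // serialize + transfer + server-side indexing
            long clientMs = System.currentTimeMillis() - start;
            // Compare clientMs with the elapsed time Solr prints in its log for the same add.
            System.out.println("client-side time: " + clientMs + " ms");
        }
    }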

Re: 2D Facet

2008-01-16 Thread Chris Hostetter
: : Hello, is it possible to do this in one query: I have a query which returns : 1000 documents with names and addresses. I can run a facet on the state field : and see how many addresses I have in each state. But I also need to see : how many families live in each state. So as a result I need a matri

Re: Fwd: Solr "Text" field

2008-01-16 Thread Chris Hostetter
: searches. That is fine by me. But I'm still at the first question: : How do I conduct a wildcard search for ARIZONA on a solr.textField? I tried as I said: it really depends on what kind of "index" analyzer you have configured for the field -- the query analyzer isn't used at all when dealin

Re: Wildcard on last char

2008-01-16 Thread Chris Hostetter
: I have encountered a problem concerning the wildcard. When I search for : field:testword I get 50 results. That's OK, but when I search for : field:testwor* I get just 3 hits! I get only words returned without a : whitespace after the char like "testwordtest" but I won't find any single : "testwo

Re: Restrict values in a multivalued field

2008-01-16 Thread Chris Hostetter
: In my schema I have a multivalued field, and the values of that field are : "stored" and "indexed" in the index. I wanted to know if it's possible to : restrict the number of multiple values being returned from that field, on a : search? And how? Because, let's say, if I have thousands of values i

Re: Transactions and Solr Was: Re: Delte by multiple id problem

2008-01-16 Thread Chris Hostetter
: Does anyone have more experience doing this kind of stuff and wants to share? My advice: don't. I work with (or work with people who work with) about two dozen Solr indexes -- we don't attempt to update a single one of them in any sort of transactional way. Some of them are updated "real t

Re: Newbie question: facets and filter query?

2008-01-16 Thread Chris Hostetter
: The problem is that when I use the 'cd' request handler, the facet count for : 'dvd' provided in the response is 0 because of the filter query used to only : show the 'cd' facet results. How do I retrieve facet counts for both : categories while only retrieving the results for one category? the

Re: Fuzziness with DisMaxRequestHandler

2008-01-16 Thread Chris Hostetter
: Is there any way to make the DisMaxRequestHandler a bit more forgiving with : user queries, I'm only getting results when the user enters a close to : perfect match. I'd like to allow near matches if possible, but I'm not sure : how to add something like this when special query syntax isn't allo

Re: DisMax Syntax

2008-01-16 Thread Chris Hostetter
: I may be mistaken, but this is not equivalent to my query. In my query I have : matches for x1, matches for x2 without slop and/or boosting, and then a match : to "x1 x2" (exact match) with slop (~) a and boost (b) in order to have : results with exact matches scoring better. : The total score is the

Re: Solr schema filters

2008-01-16 Thread Chris Hostetter
: For this exact example, use the WordDelimiterFilter exactly as : configured in the "text" fieldType in the example schema that ships : with Solr. The trick is to then use some slop when querying. : : FT-50-43 will be indexed as FT, 50, 43 / 5043 (the last two tokens : are in the same position)
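
In practice that means querying the model number as a phrase with a little slop: with the standard request handler something like q="FT-50-43"~2, and with dismax the analogous knobs are the qs and ps slop parameters. The exact slop value needed depends on which catenate options the WordDelimiterFilter is configured with, so treat these numbers as a starting point.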

Re: conceptual issues with solr

2008-01-16 Thread Norberto Meijome
On Wed, 16 Jan 2008 16:54:56 +0100 "Philippe Guillard" <[EMAIL PROTECTED]> wrote: > Hi here, > > It seems that Lucene accepts any kind of XML document but Solr accepts only > flat name/value pairs inside a document to be indexed. > You'll find below what I'd like to do, Thanks for help of any kin

Logging in Solr

2008-01-16 Thread David Thibault
All, I'm new to Solr and Tomcat and I'm trying to track down some odd errors. How do I set up Tomcat to do fine-grained, Solr-specific logging? I have looked around enough to know that it should be possible to do per-webapp logging in Tomcat 5.5, but the details are hard to follow for a newbie. A
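
One route for Tomcat 5.5 is a per-webapp logging.properties dropped into the Solr webapp's WEB-INF/classes, which makes JULI give that webapp its own log file. The handler properties below follow Tomcat's JULI FileHandler and the logger name is Solr's package prefix, but treat the values as a starting point rather than a verified configuration:

    # solr webapp: WEB-INF/classes/logging.properties (illustrative values)
    handlers = org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
    org.apache.juli.FileHandler.level = FINE
    org.apache.juli.FileHandler.directory = ${catalina.base}/logs
    org.apache.juli.FileHandler.prefix = solr.
    java.util.logging.ConsoleHandler.level = INFO
    # turn up Solr's own loggers
    org.apache.solr.level = FINE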

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I did see that bug, which made me suspect Lucene. In my case, I tracked down the problem. It was my own application. I was using Java's FileChannel.transferTo functions to copy my index from one location to another. One of the files is bigger than 2^31-1 bytes. So, one of my files was corrupted
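
For anyone hitting the same thing: FileChannel.transferTo is allowed to transfer fewer bytes than requested (and on some platforms a single call effectively tops out near 2^31-1 bytes), so a copy has to loop on the returned count instead of assuming one call moves the whole file. A small sketch:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.channels.FileChannel;

    public class LargeFileCopy {
        /** Copies src to dst, looping because transferTo may move fewer bytes than asked for. */
        public static void copy(File src, File dst) throws IOException {
            FileInputStream in = new FileInputStream(src);
            FileOutputStream out = new FileOutputStream(dst);
            try {
                FileChannel from = in.getChannel();
                FileChannel to = out.getChannel();
                long size = from.size();
                long position = 0;
                while (position < size) {
                    position += from.transferTo(position, size - position, to);
                }
            } finally {
                in.close();
                out.close();
            }
        }
    }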

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Thanks, Otis. I will take a look at those profiling tools. Best, Dave On 1/16/08, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > David, > I bet you can quickly identify the source using YourKit or another Java > profiler. The jmap command line tool might also give you some direction. > > Otis >

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Yonik, I pulled SimplePostTool apart, pulled out the main() and the postFiles() and just use it directly in Java via postFile() -> postData(). It seems to work OK. Maybe I should upgrade to v1.3 and try doing things directly through SolrJ. Is 1.3 stable yet? Might that be a better plan altogethe
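
For reference, the SolrJ equivalent of posting a document is only a few lines; the class names below are the 1.3-era ones (CommonsHttpSolrServer was renamed in later releases) and the field names are illustrative:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrjPostExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "file-1");
            doc.addField("body", "text extracted from the file");
            server.add(doc);
            server.commit();
        }
    }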

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Yonik Seeley
This may be a Lucene bug... IIRC, I saw at least one other lucene user with a similar stack trace. I think the latest lucene version (2.3 dev) should fix it if that's the case. -Yonik On Jan 16, 2008 3:07 PM, Kevin Osborn <[EMAIL PROTECTED]> wrote: > I am using the embedded Solr API for my index

dojo and solr

2008-01-16 Thread Sean Laval
Has anyone done any work integrating dojo-based applications with solr? I am pretty new to both, but I wondered if anyone had developed an xsl for solr that returns solr queries in dojo data store format - json, but a specific format of json. I am not even sure if this is sensible/possible.

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
Our basic setup is master/slave. We just want to make sure that we are not syncing against an index that is in the middle of a large rebuild. But, I think these issues are still separate from what I am experiencing. I also tried this same scenario in a different development environment. No prob

Re: Indexing very large files.

2008-01-16 Thread Otis Gospodnetic
David, I bet you can quickly identify the source using YourKit or another Java profiler. The jmap command line tool might also give you some direction. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: David Thibault <[EMAIL PROTECTED]> To: solr-

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin, Perhaps you want to look at how Solr can be used in a master-slave setup. This will separate your indexing from searching. Don't have the URL, but it's on zee Wiki. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Kevin Osborn <[EMA

Re: Indexing very large files.

2008-01-16 Thread Yonik Seeley
From your stack trace, it looks like it's your client running out of memory, right? SimplePostTool was meant as a command-line replacement for curl to remove that dependency, not as a recommended way to talk to Solr. -Yonik On Jan 16, 2008 4:29 PM, David Thibault <[EMAIL PROTECTED]> wrote: > OK,

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
It is more of a file structure thing for our application. We build in one place and do our index syncing in a different place. I doubt it is relevant to this issue, but figured I would include this information anyway. - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: sol

Re: Spell checker index rebuild

2008-01-16 Thread Otis Gospodnetic
Do you trust the spellchecker 100%? (Not looking at its source now.) I'd peek at the index with Luke (Luke I trust :)) and see if that term is really there first. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Doug Steigerwald <[EMAIL PROTECT

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin, Don't have the answer to the EOF, but I'm wondering why the index is moving. You don't need to do that as far as Solr is concerned. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Kevin Osborn <[EMAIL PROTECTED]> To: Solr Sent: Wednesday,

RE: Indexing very large files.

2008-01-16 Thread Timothy Wonil Lee
I think you should try isolating the problem. It may turn out that the problem isn't really to do with Solr, but with file uploading. I'm no expert, but that's what I'd try out in such a situation. Cheers, Timothy Wonil Lee http://timundergod.blogspot.com/ http://www.google.com/reader/shared/1684924941

Re: Indexing very large files.

2008-01-16 Thread David Thibault
OK, I have now bumped my tomcat JVM up to 1024MB min and 1500MB max. For some reason Walter's suggestion helped me get past the 8MB file upload to Solr but it's still choking on a 32MB file. Is there a way to set per-webapp JVM settings in tomcat, or is the overall tomcat JVM sufficient to set?

Re: Big number of conditions of the search

2008-01-16 Thread evgeniy . strokin
I see,.. but I really need to run it on Solr. We have already indexed everything. I don't really want to construct a query with 1K OR conditions, send it to Solr to parse first and then run it. Maybe there is a way to go directly to Lucene, or Solr, and run such a query from Java, passing Ar
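
If you do drop below the query parser (for example in a custom request handler), the clause list can be built programmatically instead of constructing and parsing a 1K-term query string; note that BooleanQuery's default limit of 1024 clauses has to be raised for larger lists. A sketch against the Lucene 2.x-era API, with an illustrative field name:

    import java.util.List;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class IdListQuery {
        /** Builds "id:v1 OR id:v2 OR ..." directly, without going through the query parser. */
        public static BooleanQuery forValues(List<String> values) {
            if (values.size() > BooleanQuery.getMaxClauseCount()) {
                BooleanQuery.setMaxClauseCount(values.size());  // default cap is 1024 clauses
            }
            BooleanQuery q = new BooleanQuery();
            for (String v : values) {
                q.add(new TermQuery(new Term("id", v)), Occur.SHOULD);  // "id" is illustrative
            }
            return q;
        }
    }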

IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I am using the embedded Solr API for my indexing process. I created a brand new index with my application without any problem. I then ran my indexer in incremental mode. This process copies the working index to a temporary Solr location, adds/updates any records, optimizes the index, and then co

Spell checker index rebuild

2008-01-16 Thread Doug Steigerwald
Having another weird spell checker index issue. Starting off from a clean index and spell check index, I'll index everything in example/exampledocs. On the first rebuild of the spellchecker index using the query below, it says the word 'blackjack' exists in the spellchecker index. Great, no proble

Re: Problem with dismax handler when searching Solr along with field

2008-01-16 Thread Mike Klaas
On 16-Jan-08, at 3:15 AM, farhanali wrote: when I search the query, for example http://localhost:8983/solr/select/?q=category&qt=dismax it gives the results, but when I want to search on the basis of a field name like http://localhost:8983/solr/select/?q=maincategory:Cars&qt=dismax it does n
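
DisMax treats q as plain user keywords (plus quotes, + and -) rather than Lucene syntax, so maincategory:Cars is searched as literal terms instead of a fielded query. A common workaround is to keep q as keywords and move the fielded restriction into a filter query, e.g. (URL is illustrative) http://localhost:8983/solr/select/?q=category&qt=dismax&fq=maincategory:Cars, or use the standard request handler when you need full query syntax.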

Re: Cache size and Heap size

2008-01-16 Thread Mike Klaas
On 16-Jan-08, at 11:15 AM, [EMAIL PROTECTED] wrote: I'm using Tomcat. I set Max Size = 5Gb and I checked in a profiler that it actually uses the whole memory. There is no significant memory use by other applications. The whole change was that I increased the size of the cache to: LRU Cache(maxSize=1048576, i

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Mike Klaas
On 15-Jan-08, at 9:23 PM, Srikant Jakilinki wrote: 2) Solr that has to handle a large collective index which has to be split up on multi-machines - The index is ever increasing (TB scale) and dynamic and all of it has to be searched at any point This will require significant development on you

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Mike Klaas
On 16-Jan-08, at 11:09 AM, Srikant Jakilinki wrote: Thanks for that Shalin. Looks like I have to wait and keep track of developments. Forgetting about indexes that cannot be fit on a single machine (distributed search), any links to have Solr running in a 2-machine environment? I want to

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Shalin Shekhar Mangar
Solr provides a few scripts to create a multiple-machine deployment. One box is set up as the master (used primarily for writes) and others as slaves. Slaves are added as per application requirements. The index is transferred using rsync. Look at http://wiki.apache.org/solr/CollectionDistribution fo

Re: Cache size and Heap size

2008-01-16 Thread evgeniy . strokin
I'm using Tomcat. I set Max Size = 5Gb and I checked in a profiler that it actually uses the whole memory. There is no significant memory use by other applications. The whole change was that I increased the size of the cache to: LRU Cache(maxSize=1048576, initialSize=1048576, autowarmCount=524288, [EMAIL PROTECT

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Srikant Jakilinki
Thanks for that Shalin. Looks like I have to wait and keep track of developments. Forgetting about indexes that cannot be fit on a single machine (distributed search), any links to have Solr running in a 2-machine environment? I want to measure how much improvement there will be in performanc

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Walter and all, I had been bumping up the heap for my Java app (running outside of Tomcat) but I hadn't yet tried bumping up my Tomcat heap. That seems to have helped me upload the 8MB file, but it's crashing while uploading a 32MB file now. I just bumped Tomcat to 1024MB of heap, so I'm not sure

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Nice signature...=) On 1/16/08, Erick Erickson <[EMAIL PROTECTED]> wrote: > > The PS really wasn't related to your OOM, and raising that shouldn't > have changed the behavior. All that happens if you go beyond 10,000 > tokens is that the rest gets thrown away. > > But we're beyond my real knowledg

Re: Indexing very large files.

2008-01-16 Thread Walter Underwood
This error means that the JVM has run out of heap space. Increase the heap space. That is an option on the "java" command. I set my heap to 200 Meg and do it this way with Tomcat 6: JAVA_OPTS="-Xmx600M" tomcat/bin/startup.sh wunder On 1/16/08 8:33 AM, "David Thibault" <[EMAIL PROTECTED]> wrote:

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
The PS really wasn't related to your OOM, and raising that shouldn't have changed the behavior. All that happens if you go beyond 10,000 tokens is that the rest gets thrown away. But we're beyond my real knowledge level about SOLR, so I'll defer to others. A very quick-n-dirty test as to whether y

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I tried raising the 10000 limit under maxFieldLength as well, and still no luck. I'm trying to upload a text file that is about 8 MB in size. I think the following stack trace still points to some sort of overflowed String issue. Thoughts? Solr returned an error: Java heap space java.lang.OutOfMemoryError: J

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I think your PS might do the trick. My JVM doesn't seem to be the issue, because I've set it to -Xmx512m -Xms256m. I will track down the solr config parameter you mentioned and try that. Thanks for the quick response! Dave On 1/16/08, Erick Erickson <[EMAIL PROTECTED]> wrote: > > P.S. Lucene by

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
P.S. Lucene by default limits the maximum field length to 10K tokens, so you have to bump that for large files. Erick On Jan 16, 2008 11:04 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > I don't think this is a StringBuilder limitation, but rather your Java > JVM doesn't start with enough memor
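
In Solr that limit is set in solrconfig.xml; something along these lines (the element sits in the index settings section, and default layouts differ a bit between releases):

    <!-- solrconfig.xml: raise the 10,000-token-per-field cap for large documents -->
    <maxFieldLength>2147483647</maxFieldLength>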

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
I don't think this is a StringBuilder limitation, but rather your Java JVM doesn't start with enough memory. i.e. -Xmx. In raw Lucene, I've indexed 240M files Best Erick On Jan 16, 2008 10:12 AM, David Thibault <[EMAIL PROTECTED]> wrote: > All, > I just found a thread about this on the

conceptual issues with solr

2008-01-16 Thread Philippe Guillard
Hi here, It seems that Lucene accepts any kind of XML document but Solr accepts only flat name/value pairs inside a document to be indexed. You'll find below what I'd like to do, Thanks for help of any kind ! Phil I need to index products (hotels) whi

Re: Cache size and Heap size

2008-01-16 Thread Daniel Alheiros
Hi Gene. Have you set your app server / servlet container to allocate some of this memory? You can define the maximum and minimum heap size by adding/replacing some parameters on the app server initialization: -Xmx1536m -Xms1536m Which app server / servlet container are you using?

Cache size and Heap size

2008-01-16 Thread Evgeniy Strokin
Hello,.. I have relatively large RAM (10Gb) on my server which is running Solr. I increased the cache settings and started to see OutOfMemory exceptions, especially on facet search. Does anybody have suggestions on how cache settings relate to memory consumption? What are optimal settings? How they c

Re: Indexing very large files.

2008-01-16 Thread David Thibault
All, I just found a thread about this on the mailing list archives because I'm troubleshooting the same problem. The kicker is that it doesn't take such large files to kill the StringBuilder. I have discovered the following: By using a text file made up of 3,443,464 bytes or less, I get no erro

Re: Solr replication

2008-01-16 Thread Bill Au
my answers inline... On Jan 16, 2008 3:51 AM, Dilip.TS <[EMAIL PROTECTED]> wrote: > Hi Bill, > I have some questions regarding the SOLR collection distribution. > 1) Is it possible to add the index operations on the slave server > using > SOLR collection distribution and still the master serv

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Shalin Shekhar Mangar
Look at http://issues.apache.org/jira/browse/SOLR-303 Please note that it is still a work in progress, so you may not be able to use it immediately. On Jan 16, 2008 10:53 AM, Srikant Jakilinki <[EMAIL PROTECTED]> wrote: > Hi All, > > There is a requirement in our group of indexing and searching s

Indexing two sets of details

2008-01-16 Thread Gavin
Hi, In the web application we are developing we have two sets of details: the personal details and the resume details. We allow 5 different resumes to be available for each user, but we want the personal details to remain the same across all 5 resumes. The problem is when personal details are cha

Problem with dismax handler when searching Solr along with field

2008-01-16 Thread farhanali
When I search the query, for example http://localhost:8983/solr/select/?q=category&qt=dismax it gives the results, but when I want to search on the basis of a field name like http://localhost:8983/solr/select/?q=maincategory:Cars&qt=dismax it does not give results; however http://localhost:8983/

RE: Solr replication

2008-01-16 Thread Dilip.TS
Hi Bill, I have some questions regarding the SOLR collection distribution. 1) Is it possible to add the index operations on the slave server using SOLR collection distribution and still have the master server updated with these changes? 2) I have a requirement of having more than one solr instance