org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:

2011-06-07 Thread bramsreddy
Hi, I have a Solr master/slave setup. On the master I keep updating the index using DIH. I am issuing delta-import requests one after another using a batch job, and my slave polls every 20 min to get the updated index from the master. But for delta-import I am getting the following exception Error creating do

Re: Data not always returned

2011-06-07 Thread Jerome Renard
Hi Erick On Tue, Jun 7, 2011 at 11:42 PM, Erick Erickson wrote: > Well, this is odd. Several questions > > 1> what do your logs show? I'm wondering if somehow some data is getting >     rejected. I have no idea why that would be, but if you're seeing indexing >     exceptions that would explain i

tika integration exception and other related queries

2011-06-07 Thread Naveen Gupta
Hi Can somebody answer this ... 3. can somebody tell me an idea how to do indexing for a zip file ? 1. while sending docx, we are getting following error. java.lang. > > NumberFormatException: For input string: "2011-01-27T07:18:00Z" > at > java.lang.NumberFormatException.forInputString(

RE: 400 MB Fields

2011-06-07 Thread Burton-West, Tom
Hi Otis, Our OCR fields average around 800 KB. My guess is that the largest docs we index (in a single OCR field) are somewhere between 2 and 10MB. We have had issues where the in-memory representation of the document (the in memory index structures being built) is several times the size of t

Boosting result on query.

2011-06-07 Thread Jeff Boul
Hi, I am trying to figure out options for the following problem. I am on Solr 1.4.1 (Lucene 2.9.1). I need to perform a boost on a query related to the value of a multi-valued field. Let's say the result returns the following documents: id name linked_items

Re: How to deal with many files using solr external file field

2011-06-07 Thread Simon Rosenthal
Can you provide a stack trace for the OOM exception? On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven wrote: > Hi all, > > we're using solr 1.4 and external file field ([1]) for sorting our > searchresults. We have about 40.000 Terms, for which we use this sorting > option. > Currently we're runn

Re: 400 MB Fields

2011-06-07 Thread Lance Norskog
The Salesforce book is 2800 pages of PDF, last I looked. What can you do with a field that big? Can you get all of the snippets? On Tue, Jun 7, 2011 at 5:33 PM, Fuad Efendi wrote: > Hi Otis, > > > I am recalling "pagination" feature, it is still unresolved (with default > scoring implementation)

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
Hi Otis, I am recalling "pagination" feature, it is still unresolved (with default scoring implementation): even with small documents, searching-retrieving documents 1 to 10 can take 0 milliseconds, but from 100,000 to 100,010 can take few minutes (I saw it with trunk version 6 months ago, and wi

Re: 400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hi, > I think the question is strange... May be you are wondering about possible > OOM exceptions? No, that's an easier one. I was more wondering whether with 400 MB Fields (indexed, not stored) it becomes incredibly slow to: * analyze * commit / write to disk * search > I think we can pass

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
I think the question is strange... May be you are wondering about possible OOM exceptions? I think we can pass to Lucene single document containing comma separated list of "term, term, ..." (few billion times)... Except "stored" and "TermVectorComponent"... I believe thousands companies already in

Re: 400 MB Fields

2011-06-07 Thread Erick Erickson
From older (2.4) Lucene days, I once indexed the 23 volume "Encyclopedia of Michigan Civil War Volunteers" in a single document/field, so it's probably within the realm of possibility at least ... Erick On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic wrote: > Hello, > > What are the biggest do

400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hello, What are the biggest document fields that you've ever indexed in Solr or that you've heard of? Ah, it must be Tom's Hathi trust. :) I'm asking because I just heard of a case of an index where some documents have a field that can be around 400 MB in size! I'm curious if anyone has an

Re: wildcard search

2011-06-07 Thread Erick Erickson
Yes there is, but you haven't provided enough information to make a suggestion. What is the fieldType definition? What is the field definition? Two resources that'll help you greatly are: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the admin/analysis page... Best Erick On Tue

Re: Solr Coldfusion Search Issue

2011-06-07 Thread Alejandro Delgadillo
Thanks Lee for the quick response. Let me explain it a little better. In the CFSEARCH tag you use the CRITERIA attribute; what it does by default is send the user's search query to Solr via POST, against the field where the text is stored. In this case, since I'm indexing PDF fil

wildcard search

2011-06-07 Thread Thomas Fischer
Hello, I am testing solr 3.2 and have problems with wildcards. I am indexing values like "IA 300; IC 330; IA 317; IA 318" in a field "GOK", and can't find a way to search with wildcards. I want to use a wild card search to match something like "IA 31?" but cannot find a way to do so. GOK:IA\ 38*

Re: Compound word search not what I expected

2011-06-07 Thread Markus Jelsma
You must set catenateWords at index time as well. > I tried setting catenateWords="1" on the Query analyzer and that didn't do > anything. I think what I need is to set my Index Analyzer to have > preserveOriginal="1" and then re-index everything. That will be a pain, so > I'll do a small test to make

Re: Solr Coldfusion Search Issue

2011-06-07 Thread lee carroll
Can you see the query actually presented to Solr in the logs? Maybe capture that and then run it with debug set to true in the admin pages. Sorry I can't help directly with your syntax. On 7 June 2011 23:06, Alejandro Delgadillo wrote: > Hi, > > I'm having some trouble using Solr through Coldfusio

Solr Coldfusion Search Issue

2011-06-07 Thread Alejandro Delgadillo
Hi, I'm having some trouble using Solr through Coldfusion. The problem right now is that when I search for a term in a custom field, the results sometimes have the value that I sent to the custom field and not to the field that contains the text. This is the cfsearch syntax that I'm using: E

Re: Default query parser operator

2011-06-07 Thread lee carroll
Hi Brian could your front end app do this field query logic? (assuming you have an app in front of solr) On 7 June 2011 18:53, Jonathan Rochkind wrote: > There's no feature in Solr to do what you ask, no. I don't think. > > On 6/7/2011 1:30 PM, Brian Lamb wrote: >> >> Hi Jonathan, >> >> Thank

Re: Compound word search not what I expected

2011-06-07 Thread kenf_nc
I tried setting catenateWords="1" on the Query analyzer and that didn't do anything. I think what I need is to set my Index Analyzer to have preserveOriginal="1" and then re-index everything. That will be a pain, so I'll do a small test to make sure first. I'm really surprised preserveOriginal="1"

Re: Compound word search not what I expected

2011-06-07 Thread lee carroll
see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory from the wiki Example of generateWordParts="1" and catenateWords="1": "PowerShot" -> 0:"Power", 1:"Shot" 1:"PowerShot" (where 0,1,1 are token positions) "A's+B's&C's" -> 0:"A", 1:"B", 2:"C", 2:"ABC" "S

Re: Compound word search not what I expected

2011-06-07 Thread Erick Erickson
WordDelimiterFilterFactory is doing this to you. It's not clear to me that you want this in place at all. Look at admin/analysis for that field to see how that filter breaks things up, it's often surprising to people. Best Erick On Tue, Jun 7, 2011 at 3:13 PM, kenf_nc wrote: > I have a field de

Re: Data not always returned

2011-06-07 Thread Erick Erickson
Well, this is odd. Several questions 1> what do your logs show? I'm wondering if somehow some data is getting rejected. I have no idea why that would be, but if you're seeing indexing exceptions that would explain it. 2> on the admin/stats page, are maxDocs and numDocs the same in the su

Available Solr Indexing strategies

2011-06-07 Thread zarni aung
Hi, I am very new to Solr and my client is trying to implement full text searching capabilities to their product by using Solr. They will also have master storage that would be the Authoritative data store which will also provide meta data searches. Can you please point me in the right direction

How to deal with many files using solr external file field

2011-06-07 Thread Bohnsack, Sven
Hi all, we're using solr 1.4 and external file field ([1]) for sorting our search results. We have about 40.000 Terms, for which we use this sorting option. Currently we're running into massive OutOfMemory problems and we're not quite sure what's the matter. It seems that the garbage collector s

Re: Compound word search not what I expected

2011-06-07 Thread Markus Jelsma
catenateWords should be set to true. Same goes for the index analyzer. preserveOriginal would also work. > I have a field defined as: > termVectors="true" multiValued="true" /> > where "text" is unmodified from the schema.xml example that came with Solr > 1.4.1. > > I have documents with so

Compound word search not what I expected

2011-06-07 Thread kenf_nc
I have a field defined as: where "text" is unmodified from the schema.xml example that came with Solr 1.4.1. I have documents with some compound words indexed, words like Sandstone. And in several cases words that are camel case like MaxSize. If I query using all lower case, sandstone or maxs

Solr Cloud and Range Facets

2011-06-07 Thread Jamie Johnson
I have a solr cloud setup with 2 servers, when executing a query against them of the form: http://localhost:8983/solr/select/?distrib=true&q=*:*&facet=true&facet.mincount=1&facet.range=dateTime&f.dateTime.facet.range.gap=%2B1MONTH&f.dateTime.facet.range.start=2011-06-01T00%3A00%3A00Z-1YEAR&f.dateT

Re: Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 12:34 PM, Luis Cappa Banda wrote: > *Expression*: A B C D E F G H I As written, this is equivalent to *Expression*: A default_field:B default_field:C default_field:D default_field:E default_field:F default_field:G default_field:H default_field:I Try *Expression*:( A B C D

Re: Default query parser operator

2011-06-07 Thread Jonathan Rochkind
There's no feature in Solr to do what you ask, no. I don't think. On 6/7/2011 1:30 PM, Brian Lamb wrote: Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1.

Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 1:01 PM, Jamie Johnson wrote: > Thanks Yonik.  I have a follow on now, how does Solr ensure consistent > results across pages?  So for example if we had my 3 theoretical solr > instances again and a, b and c each returned 100 documents with the same > score and the user only

Re: Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Tomás Fernández Löbbe
My first guess would be that you are using AND as default operator? you can see the generated query by using the parameter debugQuery=true On Tue, Jun 7, 2011 at 1:34 PM, Luis Cappa Banda wrote: > Hello! > > My problem is as follows: I've got a field (indexed and stored setted as > true) tokenize

Re: Default query parser operator

2011-06-07 Thread Brian Lamb
Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1. Then field1:foo field2:bar field1:baz field2:bom would be written as http://localhost:8983/solr/?q=field

Re: Solr Cloud Query Question

2011-06-07 Thread Jamie Johnson
Thanks Yonik. I have a follow on now, how does Solr ensure consistent results across pages? So for example if we had my 3 theoretical solr instances again and a, b and c each returned 100 documents with the same score and the user only requested 100 documents, how are those 100 documents chosen f

Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Luis Cappa Banda
Hello! My problem is as follows: I've got a field (with indexed and stored set to true) tokenized by whitespace and other patterns, with a gap of value 100. For example, if I index the following expression for the field that I mentioned: *Expression*: A B C D E -> *Index*: tokenAt

Data not always returned

2011-06-07 Thread Jerome Renard
Hi all, I have a problem with my index. Even though I always index the same data over and over again, whenever I try a couple of searches (they are always the same as they are issued by a unit test suite) I do not get the same results, sometimes I get 3 successes and 2 failures and sometimes it is

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Jonathan Rochkind
Okay, if you're using a custom similarity, I'm not sure what's going on, I'm not familiar with that. But ordinarily, you are right, you would require k1 with "+k1". What you say about the "+" being lost suggests something is going wrong. Either you are not sending your query to Solr properly e

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
You are right, Lucene will return based on my scoring function implementation (Similarity class ): score(q,d) = coord(q,d)

Re: Solr Custom Installation

2011-06-07 Thread Tomás Fernández Löbbe
Hi Federico, you can take a look at this wiki page: http://wiki.apache.org/solr/EmbeddedSolr Solr also has some Maven support, see the ant target "generate-maven-artifacts"; I don't know if that's what you need. Regards, Tomás On Tue, Jun 7, 2011 at 12:17 PM

Re: Default query parser operator

2011-06-07 Thread Jonathan Rochkind
Nope, not possible. I'm not even sure what it would mean semantically. If you had default operator "OR" ordinarily, but default operator "AND" just for "field2", then what would happen if you entered: field1:foo field2:bar field1:baz field2:bom Where the heck would the ANDs and ORs go? The

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Jonathan Rochkind
Um, normally that would never happen, because, well, like you say, the inverted index doesn't have docC for term K1, because doc C didn't include term K1. If you search on q=K1, then how/why would docC ever be in your result set? Are you seeing it in your result set? The question then would b

Solr Custom Installation

2011-06-07 Thread Federico Czerwinski
Hey there. I was wondering if Solr can be embedded into my Java Web App. As far as I know, Solr comes as a war or bundled with Jetty if you don't have a container. I've opened the war's web.xml and found out that it only has a couple of servlets, filters and that's it. So, would it be possible to

Re: Default query parser operator

2011-06-07 Thread Brian Lamb
I feel like this should be fairly easy to do but I just don't see anywhere in the documentation on how to do this. Perhaps I am using the wrong search parameters. On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb wrote: > Hi all, > > Is it possible to change the query parser operator for a specific fie

Re: Debugging a Solr/Jetty Hung Process

2011-06-07 Thread Chris Cowan
OK... The fix I thought would fix it didn't fix it (which was to use the commitWithin feature). What I can gather from `ps` is that the thread has pages locked in memory. Currently I'm using native locking for Solr. Would switching to simple help alleviate this problem? Chris On Jun 4, 2011, a

Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-07 Thread Stefan Moises
Hi Yonik, thanks, it's working in trunk again now... I had to re-index though because of exceptions at startup. Did the index format change again between the trunk of early/mid May and current trunk? Best regards, Stefan On 03.06.2011 15:32, Yonik Seeley wrote: This bug was introduced d

RE: SpellCheckComponent performance

2011-06-07 Thread Dyer, James
Demian, If you omit "spellcheckIndexDir" from the configuration, it will create an in-memory spelling dictionary. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Demian Katz [mailto:demian.k...@villanova.edu] Sent: Tuesday, June 07, 2011 7:

Re: function queries scope

2011-06-07 Thread Marco Martinez
Thanks, but it's not what I'm looking for, because the BoostQParserPlugin multiplies the score of the query by the function queries defined in the b param of the BoostQParserPlugin, and I can't use edismax because we have our own qparser. It seems that I have to code another qparser. Thanks

Re: function queries scope

2011-06-07 Thread Yonik Seeley
One way is to use the boost qparser: http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html q={!boost b=productValueField}shops in madrid Or you can use the edismax parser which has a "boost" parameter that does the same thing: defType=edismax&q=shops in madrid&boost=produc

Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 9:35 AM, Jamie Johnson wrote: > I am currently experimenting with the Solr Cloud code on trunk and just had > a quick question.  Lets say my setup had 3 nodes a, b and c.  Node a has > 1000 results which meet a particular query, b has 2000 and c has 3000.  When > executing t

Solr Cloud Query Question

2011-06-07 Thread Jamie Johnson
I am currently experimenting with the Solr Cloud code on trunk and just had a quick question. Lets say my setup had 3 nodes a, b and c. Node a has 1000 results which meet a particular query, b has 2000 and c has 3000. When executing this query and asking for row 900 what specifically happens? F

Re: java.lang.AbstractMethodError at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)

2011-06-07 Thread idivad
Finally figured out the problem. -- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-AbstractMethodError-at-org-apache-solr-handler-ContentStreamHandlerBase-handleRequestBody--tp3026470p3034456.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-07 Thread roySolr
Hello, I have some problems with the installation of the new PECL package solr-1.0.1. I run this command: pecl uninstall solr-beta ( to uninstall old version, 0.9.11) pecl install solr The installing is running but then it gives the following error message: /tmp/tmpKUExET/solr-1.0.1/solr_funct

RE: SpellCheckComponent performance

2011-06-07 Thread Demian Katz
As I may have mentioned before, VuFind is actually doing two Solr queries for every search -- a base query that gets basic spelling suggestions, and a supplemental spelling-only query that gets shingled spelling suggestions. If there's a way to get two different spelling responses in a single q

Re: problem: zooKeeper Integration with solr

2011-06-07 Thread Mohammad Shariq
How is this method (http://localhost:8983/solr/select?shards=*:/,**:/*&indent=true&q=) better than ZooKeeper? Could you please point to any performance doc? On 7 June 2011 08:18, bmdakshinamur...@gmail.com wrote: > Instead of integrating zookeeper, you could create shards over multiple > machines

Re: Commit taking very long

2011-06-07 Thread Erick Erickson
Are you optimizing? That is unnecessary when committing, and is often the culprit. Best Erick On Tue, Jun 7, 2011 at 5:42 AM, Rohit Gupta wrote: > Hi, > > My commit seems to be taking too much time, if you notice from the Dataimport > status given below to commit 1000 docs its taking longer tha

clustering problems on 3.1

2011-06-07 Thread bryan rasmussen
I added the following to my configuration explicit true default true title all_text all_text title 150 clustering default org.carrot2.clustering.lingo.LingoClusteringAlgorithm 20 which ended up wit

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread lee carroll
Gabriele, Lucene uses a combination of boolean and VSM for its IR. A straightforward query for a keyword will only match docs with that keyword. Now things quickly get subtle and complex the more sugar you add, more complicated queries across fields and more complex analysis chains but I think th

Re: Documents update

2011-06-07 Thread Denis Kuzmenok
Created file, reloaded solr - externalfilefield works fine. If I change the external files and do "curl http://127.0.0.1:4900/solr/site/update -H "Content-Type: text/xml" --data-binary ''" then no changes are made. If I start solr without external files and

solr 3.1 java.lang.NoClassDEfFoundError org/carrot2/core/ControllerFactory

2011-06-07 Thread bryan rasmussen
As per the subject I am getting java.lang.NoClassDefFoundError org/carrot2/core/ControllerFactory when I try to run clustering. I am using Solr 3.1: I get the following error: java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory at org.apache.solr.handler.clustering.carrot

Indexing Mediawiki

2011-06-07 Thread Tod
I have a need to index an internal instance of Mediawiki. I'd like to use DIH if I can since I have access to the database but the example provided on the Solr wiki uses a Mediawiki dump XML file. Does anyone have any experience using DIH in this manner? Am I barking up the wrong tree and wo

function queries scope

2011-06-07 Thread Marco Martinez
Hi, I need to use function query operations with the score of a given query, but only on the docset that I get from the query, and I don't know if this is possible. Example: q=shops in madrid returns 1 docs with a specific score for each doc but now I need to do some stuff like q=

How many fields can SOLR handle?

2011-06-07 Thread roySolr
Hello, I have a SOLR implementation with 1m products. Every product has some information: let's say a television has some information about pixels and inches, and a computer has information about harddisk, cpu, gpu. When a user searches for computer I want to show the correct facets. An example: User s

getting numberformat exception while using tika

2011-06-07 Thread Naveen Gupta
Hi, we are using the ExtractingRequestHandler and we are getting the following error when we give it a Microsoft docx file for indexing. I think that this has something to do with the date field definition, but I'm not very sure. What field type should we use? 2. We are trying to index jpg (when we search over t

Commit taking very long

2011-06-07 Thread Rohit Gupta
Hi, My commit seems to be taking too much time. As you can see from the DataImport status given below, committing 1000 docs is taking longer than 24 minutes: busy A command is still running... 0:24:43.156 1001 1658 0 2011-06-07 09:15:17 Indexing completed. Added/Updated: 1000 documents. Dele

Re: Master Slave help

2011-06-07 Thread Rohit Gupta
thanks Jayendra.. From: Jayendra Patil To: solr-user@lucene.apache.org Sent: Tue, 7 June, 2011 6:55:58 AM Subject: Re: Master Slave help Do you mean the replication happens everytime you restart the server ? If so, you would need to modify the events you want

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
On Tue, Jun 7, 2011 at 8:43 AM, pravesh wrote: > >k0 --> A | C > >k1 --> A | B > >k2 --> A | B | C > >k3 --> B | C > >Now let q=k1, how do I make sure C doesn't appear as a result since it > doesn't contain any occurence of k1? > Do we bother to do that. Now that's what lucene does :) > > Lucene/