Re: counter field

2012-04-05 Thread Manish Bafna
Yes, before indexing we check whether the document is already in the index, because along with the document we also have metadata that needs to be appended. We have a few multivalued metadata fields, which we update if the same document is found again. On Fri,

Re: counter field

2012-04-05 Thread Walter Underwood
So you will need to do a search for each document before adding it to the index, in case it is already there. That will be slow. And where do you store the last-assigned number? And there are plenty of other problems, like reloading after a corrupted index (disk failure), or deleted documents w

Re: counter field

2012-04-05 Thread Manish Bafna
Actually, no. If I am updating an existing document, I need to keep its old number. Maybe we can do it this way: if we pass a number for the field, it takes that value; if we don't pass one, it auto-increments. When we update, I will have the old number and I will pass it as a
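The scheme proposed here (keep the number passed in for an existing document, otherwise take the next value) can be sketched in plain Python, outside Solr. This is a hypothetical in-memory illustration keyed by the MD5 unique key mentioned in the thread; the class and method names are invented, and it deliberately ignores the problems Walter raises: where the last-assigned number is persisted, concurrent indexers, and rebuilding after a corrupted index.

```python
import hashlib
from typing import Dict, Optional


class SequenceAssigner:
    """Hypothetical sketch: sequential numbers keyed by each document's MD5.

    The counter and the key-to-number map live only in memory, so they are
    lost on restart; persisting them is the open question from the thread.
    """

    def __init__(self) -> None:
        self._by_key: Dict[str, int] = {}
        self._last = 0  # last number handed out

    def assign(self, content: bytes, existing: Optional[int] = None) -> int:
        key = hashlib.md5(content).hexdigest()
        if existing is not None:
            # Update case: the caller passes the old number, so keep it.
            self._by_key[key] = existing
            self._last = max(self._last, existing)
            return existing
        if key in self._by_key:
            # Same document seen again: reuse its number instead of incrementing.
            return self._by_key[key]
        # New document: auto-increment.
        self._last += 1
        self._by_key[key] = self._last
        return self._last


assigner = SequenceAssigner()
print(assigner.assign(b"first doc"))              # 1
print(assigner.assign(b"second doc"))             # 2
print(assigner.assign(b"first doc"))              # 1 (reused)
print(assigner.assign(b"third doc", existing=7))  # 7 (number passed in)
print(assigner.assign(b"fourth doc"))             # 8
```

Note that the lookup in `_by_key` stands in for the per-document search Walter describes as slow; in a real deployment that check would be a query against the index or an external store.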

Re: counter field

2012-04-05 Thread Walter Underwood
Why? When you reindex, is it OK if they all change? If you reindex one document, is it OK if it gets a new sequential number? wunder On Apr 5, 2012, at 9:23 PM, Manish Bafna wrote: > We already have a unique key (We use md5 value). > We need another id (sequential numbers). > > On Fri, Apr 6,

Re: counter field

2012-04-05 Thread Manish Bafna
We already have a unique key (We use md5 value). We need another id (sequential numbers). On Fri, Apr 6, 2012 at 9:47 AM, Chris Hostetter wrote: > > : We need to have a document id available for every document (Per core). > > : We can pass docid as one of the parameter for fq, and it will return

Re: counter field

2012-04-05 Thread Chris Hostetter
: We need to have a document id available for every document (Per core). : We can pass docid as one of the parameter for fq, and it will return the : docid in the search result. So it sounds like you need a *unique* id, but nothing you described requires that it be a counter. Take a look at th

Re: counter field

2012-04-05 Thread Manish Bafna
We need a document id available for every document (per core). There is a docid in the Lucene index, but I did not find any API to expose it through Solr. Maybe we can alter Solr to optionally return the docid (which is unique); then we can pass the docid as one of the parameters for fq, and it will retur

Re: A tool for frequent re-indexing...

2012-04-05 Thread Ahmet Arslan
> I am considering writing a small tool that would read from one solr core and write to another as a means of quick re-indexing of data. I have a large-ish set (hundreds of thousands) of documents that I've already parsed with Tika and I keep changing bits and pieces in schema and co

Re: It cost some many memory with solrj 3.5 & how to decrease it?

2012-04-05 Thread a sd
Hi Erick, thanks first. I watched the JVM's status at runtime with "jconsole" and "jmap". 1. When "Xmx" was not set, the "Old Gen" area was full, at up to 1.5 GB, and its main content was instances of "String"; when the whole heap size was up to the maxim

A little confusion with maxPosAsterisk

2012-04-05 Thread neosky
maxPosAsterisk - maximum position (1-based) of the asterisk wildcard ('*') that triggers the reversal of the query term. An asterisk that occurs at a position higher than this value will not cause the reversal of the query term. Defaults to 2, meaning that asterisks at positions 1 and 2 will cause a reversal.
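The rule quoted above can be checked with a few concrete terms. The following is a simplified sketch of the maxPosAsterisk rule only, written from the wording above rather than from Solr's source; the function name is hypothetical, and the other ReversedWildcardFilterFactory options (such as maxPosQuestion, maxFractionAsterisk and minTrailing) are ignored.

```python
def triggers_reversal(term: str, max_pos_asterisk: int = 2) -> bool:
    """Simplified sketch of the maxPosAsterisk rule only.

    Returns True when the first '*' sits at a 1-based position no higher
    than max_pos_asterisk. Only the first asterisk matters, because it has
    the lowest position of any asterisk in the term.
    """
    idx = term.find("*")  # 0-based index of the first '*', or -1 if absent
    return idx != -1 and idx + 1 <= max_pos_asterisk


for t in ["*brary", "l*brary", "li*rary", "library"]:
    print(t, triggers_reversal(t))
# *brary True / l*brary True / li*rary False / library False
```

With the default of 2, "li*rary" (asterisk at position 3) is not reversed; raising the parameter to 3 would make it qualify.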

schema design question

2012-04-05 Thread N. Tucker
Apologies if this is a very straightforward schema design problem that should be fairly obvious, but I'm not seeing a good way to do it. Let's say I have an index that wants to model Albums and Tracks, and they all have arbitrary tags attached to them (represented by multivalued string fields).

EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-05 Thread pcrao
Hi, I am using EmbeddedSolrServer for full indexing (Multi core) and StreamingUpdateSolrServer for incremental indexing. The steps involved are mentioned below. Full indexing (Daily) 1) Start EmbeddedSolrServer 2) Delete all docs 3) Add all docs 4) Commit and optimize collection 5) Stop Embedde

RE: waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-05 Thread Mike O'Leary
First of all, what I was seeing was different from what I thought I was seeing because a few weeks ago I uncommented the block in the solrconfig.xml file and I didn't realize it until yesterday just before I went home, so that was controlling the commits more than the add and commit calls that

How to store the secondary results from the Solr

2012-04-05 Thread neosky
Because the first query's results don't meet my requirements, I have to run a secondary process manually on the full results of the first query. Only after I finish the secondary process do I begin showing results to the end user, a fixed number of records at a time (for instance, 10 records at a time, as Solr does) one
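Once the secondary process has produced its final list, paging it for the end user is just slicing with the same start/rows semantics Solr uses. A minimal sketch, assuming the full first-query result set has already been fetched (for example with a large rows value) and that the post-processed list is cached somewhere (such as per user session) so the work is not repeated for every page; the `page` helper and the odd-id filter are hypothetical stand-ins.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page(results: Sequence[T], start: int = 0, rows: int = 10) -> List[T]:
    """Return one page of already post-processed results, mirroring Solr's start/rows."""
    if start < 0 or rows < 0:
        raise ValueError("start and rows must be non-negative")
    return list(results[start:start + rows])


# Stand-in for the full first-query result set.
first_query_ids = list(range(1, 26))
# Stand-in for the secondary process: here, just keep odd ids.
processed = [doc_id for doc_id in first_query_ids if doc_id % 2 == 1]

print(page(processed, start=0))   # first 10 processed ids
print(page(processed, start=10))  # [21, 23, 25]
```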

Re: It cost some many memory with solrj 3.5 & how to decrease it?

2012-04-05 Thread Erick Erickson
"What's memory"? Really, how are you measuring it? If it's virtual, you don't need to worry about it. Is this causing you a real problem or are you just nervous about the difference? Best Erick On Wed, Apr 4, 2012 at 11:23 PM, a sd wrote: > hi,all. >    I have write a program which send data to

Re: Is there any performance cost of using lots of OR in the solr query

2012-04-05 Thread Erick Erickson
Of course putting more clauses in an OR query will have a performance cost; there's more work to do. OK, being a smart-alec aside, you will probably be fine with a few hundred clauses. The question is simply whether the performance hit is acceptable. I'm afraid that question can't be answered in
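For reference, a query with many OR clauses is usually built programmatically. A minimal sketch, assuming a simple string field such as `id` and using 1024 as the split point because that is the default `maxBooleanClauses` value in Solr's example solrconfig.xml; the helper name is hypothetical, and if the values are split across several queries, their results would have to be merged by the client.

```python
from typing import Iterable, List


def or_queries(field: str, values: Iterable[str], max_clauses: int = 1024) -> List[str]:
    """Build field:("v1" OR "v2" ...) query strings, split so that no single
    query carries more than max_clauses OR'ed terms."""
    if max_clauses < 1:
        raise ValueError("max_clauses must be at least 1")
    vals = list(values)
    queries = []
    for i in range(0, len(vals), max_clauses):
        chunk = vals[i:i + max_clauses]
        # Quote each value (escaping backslashes and quotes) so spaces or
        # special characters do not break the query syntax.
        quoted = ['"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"' for v in chunk]
        queries.append(f"{field}:({' OR '.join(quoted)})")
    return queries


print(or_queries("id", ["a1", "b2", "c3"], max_clauses=2))
# ['id:("a1" OR "b2")', 'id:("c3")']
```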

Re: waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-05 Thread Erick Erickson
Solr version? I suspect your outlier is due to merging segments; if so, this should have happened quite some time into the run. See Simon Willnauer's blog post on the DocumentsWriterPerThread (trunk) code. What commitWithin time are you using? Best Erick On Wed, Apr 4, 2012 at 7:50 PM, Mike O'Leary wr

Re: Choosing tokenizer based on language of document

2012-04-05 Thread Erick Erickson
This is really difficult to imagine working well. Even if you do choose the appropriate analysis chain (and it must be a chain here), and manage to appropriately tokenize for each language, what happens at query time? How do you expect to get matches on, say, Ukrainian when the tokens of the query

Re: How to return a result with multiple query?

2012-04-05 Thread Erick Erickson
The default query operator is pretty much ignored with (e)dismax-style parsers. You can get there by varying the "mm" parameter. See: http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29 Best Erick On Tue, Apr 3, 2012 at 10:58 PM, neosky wrote: > 1.I did 5 gram t
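To see how "mm" stands in for the default operator: mm=100% behaves like AND across the optional clauses, and mm=1 behaves like OR. The sketch below shows how the simple mm forms described on the DisMax wiki page translate into a required clause count. It is a sketch under stated assumptions: the function name is hypothetical, conditional specs such as "3<90%" and multi-part specs are not handled, and clamping to the range [0, clause count] is this sketch's choice.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Required number of optional clauses for a simple mm spec:
    "2" (at least 2), "-1" (all but 1), "75%", or "-25%"."""
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        # The percentage of the clause count is rounded down.
        amount = optional_clauses * abs(pct) // 100
        required = amount if pct >= 0 else optional_clauses - amount
    else:
        n = int(spec)
        required = n if n >= 0 else optional_clauses + n
    return max(0, min(required, optional_clauses))


print(min_should_match("75%", 4))   # 3
print(min_should_match("-25%", 4))  # 3
print(min_should_match("-1", 3))    # 2
```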

upgrade solr from 1.4 to 3.5 not working

2012-04-05 Thread Robert Petersen
Hi folks, I'm a little stumped here. I have an existing solr 1.4 setup which is well configured. I want to upgrade to the latest solr release, and after reading release notes, the wiki, etc, I concluded the correct path would be to not change any config items and just replace the solr.war file

Re: custom field default qf of requestHandler

2012-04-05 Thread Chris Hostetter
I'm pretty sure what you are seeing here is a variation on the "stopwords" confusion people tend to have about dismax (and edismax). Just like the Lucene qparser, "whitespace" in the query string is significant, and is

Re: counter field

2012-04-05 Thread Chris Hostetter
> Is it possible to define a field as "Counter Column" which can be auto-incremented? A feature like this does not exist in Solr at the moment, but it would be possible to implement this fairly easily in an UpdateProcessor; however, it would only be functional in very limited situations

JSP support not configured

2012-04-05 Thread Joseph Werner
I worked through the Solr tutorial and everything worked like a charm; I figured I would go ahead and install Jetty and try to install Solr and get a functional prototype search engine up. Unfortunately, my Jetty installation seems to be broken: HTTP ERROR 500 Problem accessing /solr/admin/index.

Re: Search for "library" returns 0 results, but search for "marion library" returns many results

2012-04-05 Thread Sean Adams-Hiett
Thanks for all the replies on this. It turns out that the reason I wasn't getting the expected results is that I was not properly indexing one of the fields. My content type display settings for that field were set to hidden in Drupal. After I corrected this and re-indexed, I started getting

Re: Search for "library" returns 0 results, but search for "marion library" returns many results

2012-04-05 Thread Erik Hatcher
> It looks like somehow the query is getting converted from "library" to "librari". Any idea how that would happen? Yeah, that happens from having stemming involved in your query-time analysis (look at your field type; you've surely got Snowball in there). Also, you're using the dismax query pa
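The "library" to "librari" change comes from a single rule in the Snowball English (Porter2) stemmer: a final "y" after a consonant becomes "i". The toy function below implements only that one rule (step 1c), not the whole stemmer, to show why the query term no longer looks like the word that was typed. It only causes a mismatch when index-time and query-time analysis differ; Solr's field analysis admin page shows the real chain's output for a given field type.

```python
VOWELS = set("aeiouy")


def porter2_step_1c(word: str) -> str:
    """Toy version of one Snowball English (Porter2) rule, step 1c:
    replace a final 'y' with 'i' if it follows a non-vowel that is not
    the first letter of the word. Expects a lowercase word."""
    if len(word) > 2 and word.endswith("y") and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


for w in ["library", "cry", "by", "say"]:
    print(w, "->", porter2_step_1c(w))
# library -> librari, cry -> cri, by -> by, say -> say
```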

Re: SolrCloud replica and leader out of Sync somehow

2012-04-05 Thread Yonik Seeley
On Thu, Apr 5, 2012 at 12:19 AM, Jamie Johnson wrote: > Not sure if this got lost in the shuffle, were there any thoughts on this? Sorting by "id" could be pretty expensive (memory-wise), so I don't think it should be default or anything. We also need a way for a client to hit the same set of ser

Large numbers of executeWithRetry INFO messages

2012-04-05 Thread Shubham Srivastava
Hi, I am getting the logs below: Apr 5, 2012 6:27:59 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server 192.168.6.135 failed to respond Apr 5, 2012 6:27:59 PM

Re: Solr: Highlighting word parts in excerpt does not work

2012-04-05 Thread Koji Sekiguchi
(12/04/05 15:34), Thomas Werthmüller wrote: Hi, I configured Solr so that word parts are also found. When I search "Monday" or "Mond", the right document is found. This is done with the following configuration in the schema.xml:. Now, when I add hl=true to the query string, the excerpt for "Monday"

An endless loop in new SolrCloud probably

2012-04-05 Thread Jam Luo
Hi, I deployed a Solr cluster; the code version is "NightlyBuilds apache-solr-4.0-2012-03-19_09-25-37". The cluster has 4 nodes named "A", "B", "C", "D", with "num_shards=2": A and C are in shard1, B and D are in shard2, and A and B are the leaders of their shards. It has run for 2 days and added 20m docs, all of th

Re: query time customized boosting

2012-04-05 Thread Monmohan Singh
The problem is how to determine the degree of separation for each document and then apply boosting. For example, say there is a user A with friends X, Y, Z and another user B with friends L, M. If there is a doc D1 in the index with author field Z, and another doc D2 with author L, I
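One way to frame the question: compute the degree of separation between the searching user and each candidate author with a breadth-first search over the friend graph, then turn the degrees into boost clauses. The sketch below assumes the friend graph is available to the application outside Solr and that author values are single tokens. All function names, the `author` field, and the doubling boost curve are hypothetical. The resulting string is the kind of thing one might pass as a dismax/edismax `bq` parameter.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def degree_of_separation(friends: Dict[str, Set[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search over a friend graph; returns None if dst is unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        user, dist = queue.popleft()
        for friend in friends.get(user, set()):
            if friend == dst:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


def boost_for(degree: Optional[int], max_degree: int = 3) -> float:
    """Hypothetical boost curve: each step closer doubles the boost; strangers get 1.0."""
    if degree is None or degree > max_degree:
        return 1.0
    return float(2 ** (max_degree - degree + 1))


def boost_query(friends: Dict[str, Set[str]], user: str, authors: Iterable[str],
                field: str = "author") -> str:
    """Build space-separated field:value^boost clauses for authors near `user`."""
    clauses = []
    for author in sorted(set(authors)):
        b = boost_for(degree_of_separation(friends, user, author))
        if b > 1.0:
            clauses.append(f"{field}:{author}^{b:g}")
    return " ".join(clauses)


# The example from the thread: A is friends with X, Y, Z; B with L, M.
friends = {"A": {"X", "Y", "Z"}, "B": {"L", "M"}}
print(boost_query(friends, "A", ["Z", "L"]))  # author:Z^8  (D1 boosted, D2 not)
```

Running a search per author at query time gets expensive as the number of candidate authors grows, so in practice the set of nearby users would likely be computed once per user and reused.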

Re: alt attribute img tag

2012-04-05 Thread Marcelo Carvalho Fernandes
Hi Manuel, Why don't you create a program to parse the HTML files, maybe using XSLT, and then submit the output to Solr? --- Marcelo On Thursday, April 5, 2012, Manuel Antonio Novoa Proenza < mano...@estudiantes.uci.cu> wrote: > Hello, > > I would like to know the method of extracting from the i
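Marcelo's suggestion (parse the HTML yourself, then submit the result) can be sketched with Python's standard-library `html.parser` in place of XSLT. This only covers the extraction step. The collected alt texts would then be sent to Solr in a field of your own schema (any field name, such as a hypothetical `img_alt`, is an assumption) using whatever client you already use.

```python
from html.parser import HTMLParser
from typing import List


class AltTextExtractor(HTMLParser):
    """Collects the alt attribute of every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.alts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <img ... />
        # also arrives here via the default handle_startendtag.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.alts.append(value.strip())


def extract_alts(html: str) -> List[str]:
    parser = AltTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.alts


print(extract_alts('<p><IMG SRC="a.png" alt="Campus map"><img src="b.png"/><img alt="Logo" /></p>'))
# ['Campus map', 'Logo']
```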

counter field

2012-04-05 Thread Manish Bafna
Hi, is it possible to define a field as "Counter Column" which can be auto-incremented? Thanks, Manish.