Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-10 Thread Bertie Shen
No. I did not check the logs. But even after I successfully index data using http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true, do solr search which returns meaningful results, and then visit http://host:port/solr-example/dataimport?command=status, I can see the

Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-10 Thread Avlesh Singh
> > But even after I successfully index data using > http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true, > do solr search which returns meaningful results > I am not sure what "meaningful" means. The full-import command starts an asynchronous process to start re-i
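Because the import runs asynchronously, a client usually has to issue the command and then poll the status URL. The following is a minimal sketch of that pattern, not code from the thread. The helper names `dih_url` and `wait_until_idle` are hypothetical, the `fetch` callable stands in for whatever HTTP client is used, and the check for the word "busy" in the status response is an assumption about the response body that should be verified against your own Solr instance.

```python
from urllib.parse import urlencode


def dih_url(base, command, **params):
    """Build a DataImportHandler URL such as .../dataimport?command=status."""
    query = {"command": command}
    query.update(params)
    return f"{base}/dataimport?{urlencode(query)}"


def wait_until_idle(fetch, status_url, max_polls=10):
    """Call fetch(status_url) until the body no longer mentions 'busy'.

    `fetch` is any callable that returns the response body as a string.
    Treating 'busy' as the running marker is an assumption.
    """
    for _ in range(max_polls):
        body = fetch(status_url)
        if "busy" not in body:
            return body
    raise TimeoutError("import still running after max_polls checks")


base = "http://host:port/solr-example"
start_url = dih_url(base, "full-import", commit="true", clean="true")
status_url = dih_url(base, "status")
```

A real client would sleep between polls; the sketch leaves that out so the control flow stays visible.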

Re: How TEXT field make sortable?

2009-11-10 Thread Lucas F. A. Teixeira
That's correct. You can use copyField to copy this field's content to another field of a different type (string) and sort by that one. []s, Lucas Frare Teixeira .·. - lucas...@gmail.com - lucastex.com.br - blog.lucastex.com - twitter.com/lucastex On Tue, Nov 10, 2009 at 5:36 AM, Avlesh Singh wrote:

Re: A question about how to make schema.xml change take effect

2009-11-10 Thread Chantal Ackermann
Did the schema browser really show a different type after restarting? I would think you'd have to reindex before the change gets applied to the actual data. Or is your index/import process launched on Tomcat startup? (schema.xml != schema browser ?!) Chantal Bertie Shen wrote: Oh. Sorry

Configuring 1.4 - multi master setup?

2009-11-10 Thread Kevin Jackson
Hi all, We have a situation where we would like to have 1 Master server (creates the index) 1 input slave server (which receives the updated index from the master) n slaves (which receive the updated index from the input slave server) This is to prevent each of the n slaves polling the master ser

distributed facet dates

2009-11-10 Thread Marc Sturlese
Hey there, I am thinking of developing facet dates for distributed search but I don't know exactly where to start. I am familiar with facet dates source code and I think if I could understand how distributed facet queries work it shouldn't be that difficult. I have read http://wiki.apache.org/solr/Writ

Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
see the setting up a repeater section in this page http://wiki.apache.org/solr/SolrReplication On Tue, Nov 10, 2009 at 5:17 PM, Kevin Jackson wrote: > Hi all, > > We have a situation where we would like to have > 1 Master server (creates the index) > 1 input slave server (which receives the upda

Re: distributed facet dates

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 7:09 AM, Marc Sturlese wrote: > Hey there, > I am thinking of developing facet dates for distributed search but I don't > know exactly where to start. I am familiar with facet dates source code and I > think if I could understand how distributed facet queries work shouldn't b

Re: distributed facet dates

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 7:54 AM, Yonik Seeley wrote: > On Tue, Nov 10, 2009 at 7:09 AM, Marc Sturlese > wrote: >> Hey there, >> I am thinking of developing facet dates for distributed search but I don't >> know exactly where to start. I am familiar with facet dates source code and I >> think if I c

Re: tracking solr response time

2009-11-10 Thread bharath venkatesh
Otis, This means we have to leave enough space for the OS cache to cache the whole index. So in the case of a 16 GB index, if I am not wrong, at least 16 GB of memory must be left unallocated to any application so that the OS cache can use it. >> The operating systems are very good at maintaining t

Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-10 Thread Eugene Dzhurinsky
On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote: > The DIH has improved a great deal from Solr 1.3 to 1.4. You will be > much better off using the DIH from this. > > This is the current Solr release candidate binary: > http://people.apache.org/~gsingers/solr/1.4.0/ In fact we are pr

Re: tracking solr response time

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 8:07 AM, bharath venkatesh wrote: > how much ram would be good enough for the Solr JVM  to run comfortably. It really depends on how much stuff is cached, what fields you facet and sort on, etc. It can be easier to measure than to try and calculate it. Run jconsole to see

Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Kevin Jackson
Hi, 2009/11/10 Noble Paul നോബിള്‍ नोब्ळ् : > see the setting up a repeater section in this page > > http://wiki.apache.org/solr/SolrReplication Doh! Sorry for the noise Thanks, Kev

Re: sanizing/filtering query string for security

2009-11-10 Thread michael8
Thanks guys for your input and suggestions! Michael Otis Gospodnetic wrote: > > Word of warning: > Careful with q.alt=*:* if you are dealing with large indices! :) > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA,

Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-10 Thread Israel Ekpo
On Tue, Nov 10, 2009 at 8:26 AM, Eugene Dzhurinsky wrote: > On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote: > > The DIH has improved a great deal from Solr 1.3 to 1.4. You will be > > much better off using the DIH from this. > > > > This is the current Solr release candidate binary

Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Walter Underwood
Replication creates very little load on the master, so you should not need to have a separate machine just to handle the replication. Why do you think you need that? wunder On Nov 10, 2009, at 5:37 AM, Kevin Jackson wrote: Hi, 2009/11/10 Noble Paul നോബിള്‍ नोब्ळ् : see the setting up a

Re: Slow Commits

2009-11-10 Thread Jim Murphy
Just an update to the list. It appears that memory was the culprit. I attached a JMX console to the running Tomcat instance and monitored memory usage. Used Total memory stayed ~900MB till a commit then jumped to m Xmx setting of 1.2GB where the "peak" flatlined and fell down likely after an OO

Re: de-boosting certain facets during search

2009-11-10 Thread Paul Rosen
Thanks Erik, Your suggestion below works great. And we do want a particularly relevant Citation to appear higher in the list. I'm guessing that the value of the boost (you've given "5" in your example) is important to getting the Citations to be just high enough. Is there a way for me to d

Converting SortableIntField to Integer (Externalizing)

2009-11-10 Thread Chantal Ackermann
Hi all, has anyone some code snippet on how to convert the String representation of a SortableIntField (or SortableLongField or else) to a java.lang.Integer or int? Input: String (cryptic, non human readable, value of a sint field) Output: Integer or int I would appreciate if anyone could gi

Re: tracking solr response time

2009-11-10 Thread bharath venkatesh
Thanks yonik .. will consider Jconsole On Tue, Nov 10, 2009 at 7:01 PM, Yonik Seeley wrote: > On Tue, Nov 10, 2009 at 8:07 AM, bharath venkatesh > wrote: > > how much ram would be good enough for the Solr JVM to run comfortably. > > It really depends on how much stuff is cached, what fields you

understanding how solr/lucene handles a select query (to analyze where solr/lucene is taking time )

2009-11-10 Thread bharath venkatesh
Hi, As mentioned in my previous post, we are experiencing a delay (latency) for 15% of the requests to Solr. The delay is about 2-4 sec; sometimes it even reaches 10 sec (noticed from the Apache Tomcat logs where Solr is running

[ANN] Solr 1.4.0 Released

2009-11-10 Thread Grant Ingersoll
Apache Solr 1.4 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/ Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highl

Selection of terms for MoreLikeThis

2009-11-10 Thread Andrew Clegg
Hi, If I run a MoreLikeThis query like the following: http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1 one of the hits in the results is "and" (I don't do any stopword removal on this field). Howev

Re: Highlighting is very slow

2009-11-10 Thread Nicolas Dessaigne
I'm afraid there is no perfect solution for this problem, as you may always have very long documents that will result in long response times, even with a faster implementation (see https://issues.apache.org/jira/browse/SOLR-1268 ). The only way to avoid confusion for users and to ensure correct re

Re: Converting SortableIntField to Integer (Externalizing)

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 10:26 AM, Chantal Ackermann wrote: > has anyone some code snippet on how to convert the String representation of > a SortableIntField (or SortableLongField or else) to a java.lang.Integer or > int? FieldType.indexedToReadable() -Yonik http://www.lucidimagination.com

[Fwd: [ANN] Solr 1.4.0 Released]

2009-11-10 Thread Sean Timm
--- Begin Message --- Apache Solr 1.4 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/ Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful ful

Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Nov 10, 2009 at 7:58 PM, Walter Underwood wrote: > Replication creates very little load on the master, so you should not need > to have a separate machine just to handle the replication. > > Why do you think you need that? Correct. A repeater is set up when your main master is not located

Re: Configuring 1.4 - multi master setup?

2009-11-10 Thread Walter Underwood
If the master and slaves are separated by a WAN, sure, but Kev wants all the slaves to go to a single repeater in order to "reduce polling", so I doubt this is a WAN issue. Just trying to keep the configuration simple. Only use a repeater if you actually need it. wunder On Nov 10, 2009,

HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
Hey Guys, I have HTMLStripCharFilterFactory char filter declared in my schema.xml for fieldType text (code below). I am using this field type for body field of my schema. I am seeing different behavior when I use SolrJ to post a document (code below) and when I use the analysis.jsp. The text I am p

Re: [Fwd: [ANN] Solr 1.4.0 Released]

2009-11-10 Thread Sean Timm
Apologies. Meant to forward the message to a corporate internal list. I blame my e-mail address auto-complete. ;-) Sean Timm wrote: Subject: [ANN] Solr 1.4.0 Released From: Grant Ingersoll Date: Tue, 10 Nov 2009 11:0

How to Post Search Query

2009-11-10 Thread deepak agrawal
Hi All, My Solr search query is too long, so I am not able to send it through the GET method. So I want to send it through the POST method. Is there any way I can send the search query through the POST method? -- DEEPAK AGRAWAL +91-9379433455 GOOD LUCK.

RE: How to Post Search Query

2009-11-10 Thread Ankit Bhatnagar
Hi Deepak, You can specify - METHOD.POST -Ankit -Original Message- From: deepak agrawal [mailto:dk.a...@gmail.com] Sent: Tuesday, November 10, 2009 3:08 PM To: solr-user@lucene.apache.org Subject: How to Post Search Query Hi All, My Solr Search query is too long so i am not able to put
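For readers not using SolrJ, the same idea can be sketched over plain HTTP: the query parameters move from the URL into a form-encoded request body. This is a minimal illustration rather than anything posted in the thread. The function name `build_post_query`, the host, and the `/select` path are assumptions, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_query(select_url, params):
    """Return a POST request carrying the search parameters in its body."""
    body = urlencode(params).encode("utf-8")
    return Request(
        select_url,
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical endpoint and a deliberately long query string.
long_query = " OR ".join(f"id:{n}" for n in range(1000))
req = build_post_query("http://localhost:8983/solr/select", {"q": long_query})
```

Passing `req` to `urllib.request.urlopen` would send it; the URL itself stays short no matter how long the query grows.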

any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Peter Wolanin
This fairly recent blog post: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ describes the use of the solr.EdgeNGramFilterFactory as the tokenizer for the index. I don't see any mention of that tokenizer on the Solr wiki - is it just waiting t

Re: HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
I printed the UpdateRequest object (getXML) and the XML is: http://haha.com
content
I can see that the issue is because the HTML/XML <> are replaced by &lt; &gt;. I understand that it is required to do so to keep them from interfering with the solr xml document, but how do I accomplish wh
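The escaping described here is a property of XML transport in general, which a small round trip can illustrate. The sketch below is not SolrJ code and makes no claim about the cause of the poster's problem. The helper `field_xml` and the sample markup are hypothetical. It only shows that markup escaped on the way out is decoded again by an XML parser on the way in.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def field_xml(name, value):
    """Serialize one field element, escaping markup characters in the value."""
    return f'<field name="{name}">{escape(value)}</field>'


raw = "<b>content</b>"            # hypothetical HTML body text
wire = field_xml("body", raw)     # what travels inside the XML document
parsed = ET.fromstring(wire).text  # what an XML parser hands back
```

Here `wire` contains `&lt;b&gt;content&lt;/b&gt;`, while `parsed` is the original `<b>content</b>` again, so the entity-encoded form seen when printing the request is not necessarily what the receiving side works with.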

Field settings for best highlighting performance

2009-11-10 Thread Jake Brownell
Hi, I've seen the use case for highlighting on: http://wiki.apache.org/solr/FieldOptionsByUseCase I just wanted to confirm that for best performance Indexed=true Stored=true termVectors=true termPositions=true is the way to go for highlighting for Solr 1.4. Note that I'm not doing anything el

RE: Segment file not found error - after replicating

2009-11-10 Thread Maduranga Kannangara
Thanks Otis, I did the du -s for all three index directories as you said right after replicating and when I find errors. All three gave me the exact same value. This time I found the error in a rather small index too (31Mb). BTW, if I copy the segment_x file to what Solr is looking for, and re

Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Turner, Robbin J
I've been looking through all the documentation. I've set up a single Solr instance, and one multicore instance. If someone would be willing to share some configuration examples and/or advice for setting up Solr for distributing the search, I would really appreciate it. I've read that there i

Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Otis Gospodnetic
RJ, You may want to take a simpler step - single Solr core (no solr.xml needed) per machine. Then distributed search really only requires that you specify shard URLs in the URL of the search requests. In practice/production you rarely benefit from distributed search against multiple cores on
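Specifying shard URLs in the search request can be sketched as follows. This is a minimal illustration: the hosts, ports, and the helper name `distributed_query_url` are assumptions, and `shards` is the request parameter described on the DistributedSearch wiki page, given as a comma-separated list of shard addresses without the `http://` prefix.

```python
from urllib.parse import urlencode


def distributed_query_url(base, query, shards):
    """Build a select URL that asks one Solr instance to query several shards."""
    params = {"q": query, "shards": ",".join(shards)}
    # Keep ':' '/' and ',' unescaped so the shard list stays readable.
    return f"{base}/select?{urlencode(params, safe=':/,')}"


url = distributed_query_url(
    "http://host1:8983/solr",                 # instance that receives the request
    "solr",                                   # the user's query
    ["host1:8983/solr", "host2:8983/solr"],   # every shard to search
)
```

The instance that receives the request fans the query out to each listed shard and merges the results, so the client still talks to a single URL.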

Re: Segment file not found error - after replicating

2009-11-10 Thread Otis Gospodnetic
It sounds like your index is not being fully replicated. I can't tell why, but I can suggest you try the new Solr 1.4 replication. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message >

RE: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Turner, Robbin J
I've already done the single Solr, that's why my request. I read on some site that there is a way to setup the configuration so I can send a query to one solr instance and have it pass it on or distribute it across all the instances? Btw, thanks for the quick reply. RJ -Original Message--

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Otis Gospodnetic
Peter, For CJK and n-grams, I think you don't want the *Edge* n-grams, but just n-grams. Before you take the n-gram route, you may want to look at the smart Chinese analyzer in Lucene contrib (I think it works only for Simplified Chinese) and Sen (on java.net). I also spotted a Korean analyzer

Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Otis Gospodnetic
Right, that's http://wiki.apache.org/solr/DistributedSearch Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: "Turner, Robbin J" > To: "solr-user@lucene.apache.org" > Sent: Tu

Re: understanding how solr/lucene handles a select query (to analyze where solr/lucene is taking time )

2009-11-10 Thread Otis Gospodnetic
Hi, I don't think there is anything inside Lucene/Solr that will give you granular timing information. The only thing I can think of is using &debugQuery=true and looking at timing info for different search components. You're better off using a profiler, though such slow queries tend to be the

Re: tracking solr response time

2009-11-10 Thread Otis Gospodnetic
Hello, - Original Message > From: bharath venkatesh > To: solr-user@lucene.apache.org > Sent: Tue, November 10, 2009 8:07:59 AM > Subject: Re: tracking solr response time > > Otis, > >This means we have to leave enough space for os cache to cache the whole > index . so In case o

Re: HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
HTMLStripCharFilterFactory class has a constructor that accepts escapedTags. I believe this will solve my problem. But I am not sure how to pass this from schema.xml file. I have tried but that didn't work. Anybody? Thanks On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema wrote: > Hey Guys, > I hav

RE: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Turner, Robbin J
Thanks, I had already read through this URL. I guess my question was whether there is a way to set up something that is already part of Solr itself to pass the URL[shard...] rather than having to create a custom handler. thanks -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]

Re: Are subqueries possible in Solr? If so, are they performant?

2009-11-10 Thread Otis Gospodnetic
No, I don't think you can do that with Solr. Somebody will correct me if I'm wrong. :) What you are describing are SQL sub-queries and the closest things I can think of are using AND as I mentioned, and maybe using filter queries (the "fq" parameter). Otis -- Sematext is hiring -- http://se

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Peter Wolanin
So, this is the normal N-gram one? NGramTokenizerFactory Digging deeper - there are actually CJK and Chinese tokenizers in the Solr codebase: http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseToken

Apache Hadoop Get Together Berlin - December 2009

2009-11-10 Thread Isabel Drost
As announced at ApacheCon US, the next Apache Hadoop Get Together Berlin is scheduled for December 2009. When: Wednesday December 16, 2009 at 5:00pm Where: newthinking store, Tucholskystr. 48, Berlin As always there will be slots of 20min each for talks on your Hadoop topic. After each talk

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Otis Gospodnetic
Yes, that's the n-gram one. I believe the existing CJK one in Lucene is really just an n-gram tokenizer, so no different than the normal n-gram tokenizer. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR -

Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-10 Thread Otis Gospodnetic
Hm, I don't follow. You don't need to create a custom (request) handler to make use of Solr's distributed search. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: "Turner, Ro

Re: long startup time

2009-11-10 Thread Otis Gospodnetic
I'm not sure if anyone answered this. The "2 minutes" makes me think it's a DNS lookup timeout. Is something trying to look up some host name? (say from the top of some XML file) Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIM

memory size

2009-11-10 Thread Jörg Agatz
Hello, I have a problem with the memory size, but I don't know how to fix it. Maybe it is a PHP problem, but I don't know. My error: Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 16515072 bytes) I hope you can help me. KinGArtus