Re: Questions about XPath in data import handler

2009-08-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
yes. look at the 'flatten' attribute in the field. It should give you all the text (not attributes) under a given node. On Thu, Aug 13, 2009 at 8:02 PM, Andrew Clegg wrote: > > > > Noble Paul നോബിള്‍  नोब्ळ्-2 wrote: >> >> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg >> wrote: >> >>> Does the s

Re: defaultOperator="AND" and queries with "("

2009-08-13 Thread Shalin Shekhar Mangar
On Thu, Aug 13, 2009 at 5:31 AM, Subbacharya, Madhu < madhu.subbacha...@corp.aol.com> wrote: > > Hello, > > We have Solr running with the defaultOperator set to "AND". Am not able > to get any results for queries like q=( Ferrari AND ( "599 GTB Fiorano" OR > "612 Scaglietti" OR F430 )) , whic

Re: [OT] Solr Webinar

2009-08-13 Thread Lukáš Vlček
Hello, they [Lucid Imagination guys] said it should be published on their blog. I hope I understood it correctly. Regards, Lukas http://blog.lukas-vlcek.com/ On Fri, Aug 14, 2009 at 7:52 AM, Mani Kumar wrote: > if anyone has any pointer to this webinar, please share it. > thanks! > mani > > On

Re: [OT] Solr Webinar

2009-08-13 Thread Mani Kumar
if anyone has any pointer to this webinar, please share it. thanks! mani On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed wrote: > I also registered to attend but I am not going to because here at work a > last minute meeting has been scheduled at the same time. > > Is it possible in the future

Solr 1.4 Replication scheme

2009-08-13 Thread KaktuChakarabati
Hello, I've recently switched over to solr1.4 (recent nightly build) and have been using the new replication. Some questions come to mind: In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so tha

Re: Lock timed out 2 worker running

2009-08-13 Thread renz052496
Yes, I misunderstood your question (re: the crash). Solr did not crash, but we shut down the JVM (Tomcat) gracefully after we killed all our workers. But upon restarting, Solr just throws the error. Regards, /Renz 2009/8/11 Chris Hostetter > > : > 5) are these errors appearing after Solr crashes

Re: Facets with an IDF concept

2009-08-13 Thread wojtekpia
Hi Asif, Did you end up implementing this as a custom sort order for facets? I'm facing a similar problem, but not related to time. Given 2 terms: A: appears twice in half the search results B: appears once in every search result I think term A is more "interesting". Using facets sorted by freque

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Fuad Efendi
I upgraded "master" to 1.4-dev from trunk 3 days ago BTW such performance broke my "commodity hardware", most probably network card... can't SSH to check stats; need to check onsite what happened... -Original Message- From: Grant Ingersoll Sent: August-13-09 4:20 PM To: solr-user@lucene

HTTP ERROR: 500 No default field name specified

2009-08-13 Thread Kevin Miller
I have a different error once I direct the curl to look in the correct folder for the file. I am getting an HTTP ERROR: 500 No default field name specified. I am using a test word document in the exampledocs folder. I am issuing the curl command from the exampledocs folder. Following is the com

RE: Curl error 26 failed creating formpost data

2009-08-13 Thread Kevin Miller
I figured out what was causing this error. I was directing the information for the myfile into the wrong directory. Kevin Miller Web Services -Original Message- From: Kevin Miller [mailto:kevin.mil...@oktax.state.ok.us] Sent: Thursday, August 13, 2009 10:08 AM To: solr-user@lucene.apa

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Grant Ingersoll
On Aug 13, 2009, at 1:29 PM, Mark Bennett wrote: * mlb: comments On Thu, Aug 13, 2009 at 9:39 AM, Stanislaw Osinski wrote: Hi, On Tue, Aug 11, 2009 at 22:19, Mark Bennett wrote: Carrot2 has several pluggable algorithms to choose from, though I have no evidence that they're "better

Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Grant Ingersoll
BTW, what version of Solr are you on? On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote: UPDATE: I have 100,000,000 new documents in 24 hours, including possible updates OR possibly adding same document several times. I have two segments now (30Gb total), and network is overloaded (I use web

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Right, I haven't used SOLR-475 yet and am more familiar with Bobo. I believe there are differences but I haven't gone into them yet. As I'm using Solr 1.4 now, maybe I'll test the UnInvertedField modality. Feel free to report back results as I don't think I've seen much yet? On Thu, Aug 13, 2009

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to be); check this http://issues.apache.org/jira/browse/SOLR-475 (and probably http://issues.apache.org/jira/browse/SOLR-711) -Original Message- From: Jason Rutherglen Yeah we need a performance comparison, I haven't

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Fuad Efendi
UPDATE: I have 100,000,000 new documents in 24 hours, including possible updates OR possibly adding same document several times. I have two segments now (30Gb total), and network is overloaded (I use web crawler to generate documents). I never had more than 25,000,000 within a month before... I r

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Mark Bennett
* mlb: comments On Thu, Aug 13, 2009 at 9:39 AM, Stanislaw Osinski wrote: > Hi, > > On Tue, Aug 11, 2009 at 22:19, Mark Bennett wrote: > > Carrot2 has several pluggable algorithms to choose from, though I have no > > evidence that they're "better" than Lucene's. Where TF/IDF is sort of a > > on

RE: JVM Heap utilization & Memory leaks with Solr

2009-08-13 Thread Fuad Efendi
Most OutOfMemoryException (if not 100%) happening with SOLR are because of http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/FieldCache.html - it is used internally in Lucene to cache Field value and document ID. My very long-term observations: SOLR can run without any problems fe

Re: Issue with Collection & Distribution

2009-08-13 Thread Bill Au
Have you checked the solr log on the slave to see if there was any commit done? It looks to me like you are still using an older version of the commit script that is not compatible with the newer Solr response format. If that's the case, the commit was actually performed. It is just that the script fai

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Yeah we need a performance comparison, I haven't had time to put one together. If/when I do I'll compare Bobo performance against Solr bitset intersection based facets, compare memory consumption. For near realtime Solr needs to cache and merge bitsets at the SegmentReader level, and Bobo needs to

Re: Using Lucene's payload in Solr

2009-08-13 Thread Bill Au
I need to boost a field differently according to the content of the field. Here is an example: Solr information retrieval webapp xml Tomcat webapp XMLSpy xml ide A search on category:webapp should return Tomcat before Solr. A search on category:xml should return XMLSpy befor

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
Interesting, it has "BoboRequestHandler implements SolrRequestHandler" - easy to try it; and shards support [Fuad Efendi] It seems BOBO-Browse is alternate faceting engine; would be interesting to compare performance with SOLR... Distributed? [Jason Rutherglen] For your fields with many terms

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
It seems BOBO-Browse is alternate faceting engine; would be interesting to compare performance with SOLR... Distributed? -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: August-12-09 6:12 PM To: solr-user@lucene.apache.org Subject: Re: facet performance

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
I took 1.4 from trunk three days ago, it seems Ok for production (at least for my Master instance which is doing writes-only). I use the same config files. 500 000 terms are Ok too; I am using several millions with pre-1.3 SOLR taken from trunk. However, do not try to "facet" (probably outdated

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Stanislaw Osinski
Hi, On Tue, Aug 11, 2009 at 22:19, Mark Bennett wrote: Carrot2 has several pluggable algorithms to choose from, though I have no > evidence that they're "better" than Lucene's. Where TF/IDF is sort of a > one > step algebraic calculation, some clustering algorithms use iterative > approaches, e

Issue with Collection & Distribution

2009-08-13 Thread william pink
Hello, I am having a few problems with the snapinstaller/commit on the slave, I have a pull_from_master script which is the following #!/bin/bash cd /opt/solr/solr/bin -v ./snappuller -v -P 18983 ./snapinstaller -v I have been executing snapshooter manually on the master then running the above

Re: Using Lucene's payload in Solr

2009-08-13 Thread Grant Ingersoll
On Aug 13, 2009, at 11:58 AM, Bill Au wrote: Thanks for the tip on BFTQ. I have been using a nightly build before that was committed. I have upgrade to the latest nightly build and will use that instead of BTQ. I got DelimitedPayloadTokenFilter to work and see that the terms and payloa

Boosting relevance as terms get nearer to each other

2009-08-13 Thread Michael _
Hello, I'd like to score documents higher that have the user's search terms nearer each other. For example, if a user searches for a AND b AND c the standard query handler should return all documents with [a] [b] and [c] in them, but documents matching the phrase "a b c" should get a boost ove
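One possible way to express the request above, sketched in Python: keep the required `a AND b AND c` clause and OR it with a sloppy phrase clause (`"a b c"~10` in Lucene query syntax), so that documents whose terms sit close together match both clauses and pick up extra score. This is only an illustration, not an answer taken from the thread. The function name `proximity_boosted_query` and the slop value of 10 are assumptions, and the terms are assumed to be plain words that need no query-syntax escaping.

```python
from urllib.parse import urlencode


def proximity_boosted_query(terms, slop=10):
    """Build a query that requires every term and adds an optional
    sloppy-phrase clause (hypothetical helper, not part of Solr)."""
    # Required part: every term must be present.
    required = "(" + " AND ".join(terms) + ")"
    # Optional part: the same terms within `slop` positions of each other.
    phrase = '"' + " ".join(terms) + '"~' + str(slop)
    return required + " OR " + phrase


q = proximity_boosted_query(["a", "b", "c"])
print(q)                    # the raw query string
print(urlencode({"q": q}))  # URL-encoded form for a request parameter
```

Because the sloppy phrase can only match documents that already contain all three terms, the OR does not widen the result set; it only changes the ranking.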

Re: Using Lucene's payload in Solr

2009-08-13 Thread Bill Au
Thanks for the tip on BFTQ. I have been using a nightly build before that was committed. I have upgraded to the latest nightly build and will use that instead of BTQ. I got DelimitedPayloadTokenFilter to work and see that the terms and payload of the field are correct but the delimiter and payloa

RE: [OT] Solr Webinar

2009-08-13 Thread Chenini, Mohamed
I also registered to attend but I am not going to because here at work a last minute meeting has been scheduled at the same time. Is it possible in the future to schedule such webinars starting 5-6 PM ET? Thanks, Mohamed -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.or

RE: Using Lucene's payload in Solr

2009-08-13 Thread Ensdorf Ken
> > It looks like things have changed a bit since this subject was last > > brought > > up here. I see that there are support in Solr/Lucene for indexing > > payload > > data (DelimitedPayloadTokenFilterFactory and > > DelimitedPayloadTokenFilter). > > Overriding the Similarity class is straight f

Curl error 26 failed creating formpost data

2009-08-13 Thread Kevin Miller
I am trying to use the curl command located on the Extracting Request Handler on the Solr Wiki. I am using the command in the following way: curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr&map.content=attr_content&commit=true" -F "myfile=@../../BadNews.doc" echo12 is t

Re: I think this is a "bug"

2009-08-13 Thread Chris Male
Hi Paul, Yes the comment does look very wrong. I'll open a JIRA issue and include a fix. On Thu, Aug 13, 2009 at 4:43 PM, Paul Tomblin wrote: > I don't want to join yet another mailing list or register for JIRA, > but I just noticed that the Javadocs for > SolrInputDocument.addField(String nam

I think this is a "bug"

2009-08-13 Thread Paul Tomblin
I don't want to join yet another mailing list or register for JIRA, but I just noticed that the Javadocs for SolrInputDocument.addField(String name, Object value, float boost) is incredibly wrong - it looks like it was copied from a "deleteAll" method. -- http://www.linkedin.com/in/paultomblin

Re: Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg
Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: > > On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg > wrote: > >> Does the second one mean "select the value of the attribute called >> qualifier >> in the /a/b/subject element"? > > yes you are right. Isn't that the semantics of standard xpath syntax? > Ye

Re: Distributed query returns time consumed by each Solr shard?

2009-08-13 Thread Grant Ingersoll
Not that I am aware of. I think there is a patch for timing out shards and returning partial results if a shard takes too long. I believe it is slated for 1.4, but it doesn't have any unit tests at the moment. On Aug 12, 2009, at 7:12 PM, Jason Rutherglen wrote: Is there a way to do thi

Re: Query with no cache without editing solrconfig?

2009-08-13 Thread Koji Sekiguchi
Jason Rutherglen wrote: Is there a way to do this via a URL? I think - no there isn't. Koji

Re: Questions about XPath in data import handler

2009-08-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg wrote: > > A couple of questions about the DIH XPath syntax... > > The docs say it supports: > >   xpath="/a/b/subject[@qualifier='fullTitle']" >   xpath="/a/b/subject/@qualifier" >   xpath="/a/b/c" > > Does the second one mean "select the value of the

Re: Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg
Andrew Clegg wrote: > > > Sorry, Nabble swallowed my XML example. That was supposed to be [a] [b] [subject qualifier="some text" /] [/b] [/a] ... but in XML. Andrew. -- View this message in context: http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p249
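The standard XPath reading of the example above can be checked with a short Python sketch. It uses Python's `xml.etree.ElementTree`, not the DIH XPath processor, so it only illustrates the standard semantics and says nothing about what DIH itself supports. ElementTree has no `/@attribute` step, so the attribute is read with `.get()` instead; the helper name `attribute_values` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The document from the message, written as real XML.
doc = ET.fromstring('<a><b><subject qualifier="some text" /></b></a>')


def attribute_values(root, path, attr):
    """Return the value of `attr` for each element matching `path`
    (hypothetical helper standing in for an XPath /@attr step)."""
    return [el.get(attr) for el in root.findall(path) if el.get(attr) is not None]


# Stand-in for /a/b/subject/@qualifier; the root is <a>, so the path is relative.
qualifiers = attribute_values(doc, "./b/subject", "qualifier")
print(qualifiers)

# Stand-in for /a/b/subject[@qualifier='fullTitle']: a predicate on the attribute.
full_titles = doc.findall("./b/subject[@qualifier='fullTitle']")
print(len(full_titles))
```

With this input the first query yields the attribute value `some text`, and the second matches nothing because the qualifier is not `fullTitle`.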

Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg
A couple of questions about the DIH XPath syntax... The docs say it supports: xpath="/a/b/subject[@qualifier='fullTitle']" xpath="/a/b/subject/@qualifier" xpath="/a/b/c" Does the second one mean "select the value of the attribute called qualifier in the /a/b/subject element"? e.g. For

Re: facet performance tips

2009-08-13 Thread Jérôme Etévé
Thanks everyone for your advice. I increased my filterCache, and the faceting performance improved greatly. My faceted field can have at the moment ~4 different terms, so I did set a filterCache size of 5 and it works very well. However, I'm planning to increase the number of terms to