Re: facet performance tips
Thanks everyone for your advice. I increased my filterCache, and faceting performance improved greatly. My faceted field can have at the moment ~4 different terms, so I set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around 500 000, so I guess this approach won't work anymore, as I doubt a 500 000-sized filterCache would work. So I guess my best move would be to upgrade to the soon-to-be-released 1.4 version of Solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea of when 1.4 will become an official release? Also, is the current trunk OK for production? Is it compatible with 1.3 configuration files?

Thanks!

Jerome.

2009/8/13 Stephen Duncan Jr :
> Note that depending on the profile of your field (full text and how many
> unique terms on average per document), the improvements from 1.4 may not
> apply, as you may exceed the limits of the new faceting technique in Solr 1.4.
> -Stephen
>
> On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher wrote:
>> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>>
>> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
>> performance.
>>
>> Erik
>>
>> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>>> Hi everyone,
>>>
>>> I'm using some faceting on a solr index containing ~160K documents.
>>> I perform facets on multivalued string fields. The number of possible
>>> different values is quite large.
>>>
>>> Enabling facets degrades the performance by a factor of 3.
>>>
>>> Because I'm using Solr 1.3, I guess the faceting makes use of the
>>> filter cache to work. My filterCache is set to a size of 2048. I also
>>> noticed in my Solr stats a very small cache hit ratio (~0.01%).
>>>
>>> Can it be the reason why the faceting is slow? Does it make sense to
>>> increase the filterCache size so it matches more or less the number
>>> of different possible values for the faceted fields? Would that not
>>> make the memory usage explode?
>>>
>>> Thanks for your help!
>>>
>>> --
>>> Jerome Eteve.
>>> Chat with me live at http://www.eteve.net
>>> jer...@eteve.net

--
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net
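For reference, the filterCache being tuned in this thread lives in solrconfig.xml; a minimal sketch (the sizes here are the poster's numbers, not recommendations):

<!-- solrconfig.xml: cache used for fq filters and, in Solr 1.3, for
     per-term faceting; for the 1.3 faceting method to stay effective,
     size should roughly cover the number of distinct facet terms -->
<filterCache class="solr.LRUCache"
             size="2048"
             initialSize="512"
             autowarmCount="128"/>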
Questions about XPath in data import handler
A couple of questions about the DIH XPath syntax...

The docs say it supports:

xpath="/a/b/subject[@qualifier='fullTitle']"
xpath="/a/b/subject/@qualifier"
xpath="/a/b/c"

Does the second one mean "select the value of the attribute called qualifier in the /a/b/subject element"?

e.g. For this document:

...

I would get the result "some text"?

Also... Can I select a non-leaf node and get *ALL* the text underneath it? e.g. /a/b in this example?

Thanks!

Andrew.
Re: Questions about XPath in data import handler
Andrew Clegg wrote:
>
>

Sorry, Nabble swallowed my XML example. That was supposed to be:

[a]
  [b]
    [subject qualifier="some text" /]
  [/b]
[/a]

... but in XML.

Andrew.
Re: Questions about XPath in data import handler
On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg wrote:
>
> A couple of questions about the DIH XPath syntax...
>
> The docs say it supports:
>
> xpath="/a/b/subject[@qualifier='fullTitle']"
> xpath="/a/b/subject/@qualifier"
> xpath="/a/b/c"
>
> Does the second one mean "select the value of the attribute called qualifier
> in the /a/b/subject element"?
>
> e.g. For this document:
>
> ...
>
> ... I would get the result "some text"?

Yes, you are right. Isn't that the semantics of standard XPath syntax?

> Also... Can I select a non-leaf node and get *ALL* the text underneath it?
> e.g. /a/b in this example?
>
> Thanks!
>
> Andrew.

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Query with no cache without editing solrconfig?
Jason Rutherglen wrote:
> Is there a way to do this via a URL?

I think - no, there isn't.

Koji
Re: Distributed query returns time consumed by each Solr shard?
Not that I am aware of. I think there is a patch for timing out shards and returning partial results if a shard takes too long. I believe it is slated for 1.4, but it doesn't have any unit tests at the moment.

On Aug 12, 2009, at 7:12 PM, Jason Rutherglen wrote:
> Is there a way to do this currently? If a shard takes an inordinate
> amount of time compared to the other shards, it's useful to see the
> various qtimes per shard, with the aggregated results.
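The timeout patch referred to here is presumably SOLR-502, which adds a timeAllowed parameter; a sketch of how a request would look once it is committed (hostnames are illustrative):

http://host:8983/solr/select?q=foo
    &shards=shard1:8983/solr,shard2:8983/solr
    &timeAllowed=1000

When the limit is hit, the responseHeader is expected to carry a partialResults flag. Note this bounds total search time; it does not report per-shard QTimes, which is what the original question asked for.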
Re: Questions about XPath in data import handler
Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg wrote:
>
>> Does the second one mean "select the value of the attribute called
>> qualifier in the /a/b/subject element"?
>
> yes you are right. Isn't that the semantics of standard xpath syntax?

Yes, just checking since the DIH XPath engine is a little different.

Do you know what I would get in this case?

>> Also... Can I select a non-leaf node and get *ALL* the text underneath it?
>> e.g. /a/b in this example?

Cheers,

Andrew.
I think this is a "bug"
I don't want to join yet another mailing list or register for JIRA, but I just noticed that the Javadoc for SolrInputDocument.addField(String name, Object value, float boost) is incredibly wrong - it looks like it was copied from a "deleteAll" method.

--
http://www.linkedin.com/in/paultomblin
Re: I think this is a "bug"
Hi Paul,

Yes, the comment does look very wrong. I'll open a JIRA issue and include a fix.

On Thu, Aug 13, 2009 at 4:43 PM, Paul Tomblin wrote:
> I don't want to join yet another mailing list or register for JIRA,
> but I just noticed that the Javadoc for
> SolrInputDocument.addField(String name, Object value, float boost) is
> incredibly wrong - it looks like it was copied from a "deleteAll"
> method.
>
> --
> http://www.linkedin.com/in/paultomblin
Curl error 26 failed creating formpost data
I am trying to use the curl command located on the Extracting Request Handler page on the Solr Wiki. I am using the command in the following way:

curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr&map.content=attr_content&commit=true" -F "myfile=@../../BadNews.doc"

echo12 is the server where Solr is located, and BadNews.doc is located in the exampledocs directory. When I execute this command I get the following error message:

curl: (26) failed creating formpost data.

Can someone please direct me to where I can find the way to correct this error?

Kevin Miller
Web Services
RE: Using Lucene's payload in Solr
> > It looks like things have changed a bit since this subject was last
> > brought up here. I see that there is support in Solr/Lucene for
> > indexing payload data (DelimitedPayloadTokenFilterFactory and
> > DelimitedPayloadTokenFilter). Overriding the Similarity class is
> > straight forward. So the last piece of the puzzle is to use a
> > BoostingTermQuery when searching. I think all I need to do is to
> > subclass Solr's LuceneQParserPlugin, which uses SolrQueryParser under
> > the cover. I think all I need to do is to write my own query parser
> > plugin that uses a custom query parser, with the only difference being
> > in the getFieldQuery() method where a BoostingTermQuery is used
> > instead of a TermQuery.
>
> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery,
> which gives some more flexibility in terms of how the spans in a
> single document are scored.
>
> > Am I on the right track?
>
> Yes.
>
> > Has anyone done something like this already?

I wrote a QParserPlugin that seems to do the trick. This is minimally tested - we're not actually using it at the moment, but it should get you going. Also, as Grant suggested, you may want to sub BFTQ for BTQ below:

package com.zoominfo.solr.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.solr.common.params.*;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.*;

public class BoostingTermQParserPlugin extends QParserPlugin {
  public static String NAME = "zoom";

  public void init(NamedList args) {
  }

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    System.out.print("BoostingTermQParserPlugin::createParser\n");
    return new BoostingTermQParser(qstr, localParams, params, req);
  }
}

class BoostingTermQueryParser extends QueryParser {
  public BoostingTermQueryParser(String f, Analyzer a) {
    super(f, a);
    System.out.print("BoostingTermQueryParser::BoostingTermQueryParser\n");
  }

  @Override
  protected Query newTermQuery(Term term) {
    // the only change vs. the stock parser: payload-scoring term queries
    System.out.print("BoostingTermQueryParser::newTermQuery\n");
    return new BoostingTermQuery(term);
  }
}

class BoostingTermQParser extends QParser {
  String sortStr;
  QueryParser lparser;

  public BoostingTermQParser(String qstr, SolrParams localParams,
                             SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
    System.out.print("BoostingTermQParser::BoostingTermQParser\n");
  }

  public Query parse() throws ParseException {
    System.out.print("BoostingTermQParser::parse\n");
    String qstr = getString();

    String defaultField = getParam(CommonParams.DF);
    if (defaultField == null) {
      defaultField = getReq().getSchema().getSolrQueryParser(null).getField();
    }
    lparser = new BoostingTermQueryParser(defaultField,
        getReq().getSchema().getQueryAnalyzer());

    // these could either be checked & set here, or in the SolrQueryParser constructor
    String opParam = getParam(QueryParsing.OP);
    if (opParam != null) {
      lparser.setDefaultOperator("AND".equals(opParam)
          ? QueryParser.Operator.AND
          : QueryParser.Operator.OR);
    } else {
      // try to get default operator from schema
      lparser.setDefaultOperator(
          getReq().getSchema().getSolrQueryParser(null).getDefaultOperator());
    }

    return lparser.parse(qstr);
  }

  public String[] getDefaultHighlightFields() {
    return new String[]{lparser.getField()};
  }
}
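To wire a plugin like this in, it would be registered in solrconfig.xml and then selected per request; a sketch (package and class names taken from the code above, the "zoom" name is what the class declares):

<!-- solrconfig.xml -->
<queryParser name="zoom" class="com.zoominfo.solr.analysis.BoostingTermQParserPlugin"/>

Queries can then opt in with defType=zoom, or inline as q={!zoom}solr payloads, so every term query is built as a payload-scoring BoostingTermQuery.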
RE: [OT] Solr Webinar
I also registered to attend but I am not going to, because here at work a last-minute meeting has been scheduled at the same time.

Is it possible in the future to schedule such webinars starting 5-6 PM ET?

Thanks,
Mohamed

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Wednesday, August 12, 2009 6:22 PM
To: solr-user@lucene.apache.org
Subject: Re: [OT] Solr Webinar

I believe it will be, but am not sure of the procedure for distributing. I think if you register, but don't show, you will get a notification.

-Grant

On Aug 10, 2009, at 12:26 PM, Lucas F. A. Teixeira wrote:
> Hello Grant,
> Will the webinar be recorded and available to download later someplace?
> Unfortunately, I can't watch this time.
>
> Thanks,
>
> []s,
>
> Lucas Frare Teixeira .*.
> - lucas...@gmail.com
> - blog.lucastex.com
> - twitter.com/lucastex
>
> On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll wrote:
>> I will be giving a free one hour webinar on getting started with Apache
>> Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT
>>
>> You can sign up @
>> http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP
>>
>> I will present and demo:
>> * Getting started with LucidWorks for Solr
>> * Getting better, faster results using Solr's findability and relevance
>> improvement tools
>> * Deploying Solr in production, including monitoring performance and trends
>> with the LucidGaze for Solr performance profiler
>>
>> -Grant
Re: Using Lucene's payload in Solr
Thanks for the tip on BFTQ. I have been using a nightly build from before that was committed. I have upgraded to the latest nightly build and will use that instead of BTQ.

I got DelimitedPayloadTokenFilter to work and see that the terms and payload of the field are correct, but the delimiter and payload are stored so they appear in the response also. Here is an example:

XML for indexing:
Solr|2.0 In|2.0 Action|2.0

XML response:
Solr|2.0 In|2.0 Action|2.0

I want to set a payload on a field that has a variable number of words. So I guess I can use a copy field with a PatternTokenizerFactory to filter out the delimiter and payload.

I am thinking maybe I can do this instead when indexing:

XML for indexing:
Solr In Action

This will simplify indexing as I don't have to repeat the payload for each word in the field. I do have to write a payload-aware update handler. It looks like I can use Lucene's NumericPayloadTokenFilter in my custom update handler to

Any thoughts/comments/suggestions?

Bill

On Wed, Aug 12, 2009 at 7:13 AM, Grant Ingersoll wrote:
> On Aug 11, 2009, at 5:30 PM, Bill Au wrote:
>> It looks like things have changed a bit since this subject was last
>> brought up here. I see that there is support in Solr/Lucene for indexing
>> payload data (DelimitedPayloadTokenFilterFactory and
>> DelimitedPayloadTokenFilter). Overriding the Similarity class is
>> straight forward. So the last piece of the puzzle is to use a
>> BoostingTermQuery when searching. I think all I need to do is to
>> subclass Solr's LuceneQParserPlugin, which uses SolrQueryParser under
>> the cover. I think all I need to do is to write my own query parser
>> plugin that uses a custom query parser, with the only difference being
>> in the getFieldQuery() method where a BoostingTermQuery is used instead
>> of a TermQuery.
>
> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery, which
> gives some more flexibility in terms of how the spans in a single document
> are scored.
>
>> Am I on the right track?
>
> Yes.
>
>> Has anyone done something like this already?
>
> I intend to, but haven't started.
>
>> Since Solr already has indexing support for payload, I was hoping that
>> query support is already in the works if not available already. If not,
>> I am willing to contribute but will probably need some guidance since my
>> knowledge in Solr query parser is weak.
>
> https://issues.apache.org/jira/browse/SOLR-1337
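For context, the delimited-payload analysis being discussed is declared in schema.xml; a minimal sketch (type name is illustrative, modeled on the example schema in the 1.4 nightlies):

<fieldtype name="payloads" class="solr.TextField" indexed="true" stored="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- turns "Solr|2.0" into the token "Solr" carrying a float payload
         of 2.0; the *stored* value remains the raw "Solr|2.0" text, which
         is why the delimiter and payload show up in the response above -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldtype>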
Boosting relevance as terms get nearer to each other
Hello,

I'd like to score documents higher that have the user's search terms nearer to each other. For example, if a user searches for

a AND b AND c

the standard query handler should return all documents with [a] [b] and [c] in them, but documents matching the phrase "a b c" should get a boost over those with "a x b c", over those with "b x y c z a", etc.

To accomplish this, I thought I might replace the user's query with

"a b c"~10

hoping that the slop term gets a higher and higher score the closer together [a] [b] and [c] appear. This doesn't seem to be the case in my experiments; when I debug the query, there's no component of the score based on how close together [a] [b] and [c] are. And I'm suspicious that this would make my queries a whole lot slower -- in reality my users' queries get expanded quite a bit already, and I'd thus need to add many slop terms.

Perhaps instead I could modify the standard query handler to examine the distance between all ANDed tokens, and boost proportionally to the inverse of their average distance apart. I've never modified a query handler before, so I have no idea if this is possible.

Any suggestions on what approach I should take? The less I have to modify Solr, the better -- I'd prefer a query-side solution over writing a plugin over forking the standard query handler.

Thanks in advance!
Michael
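One query-side option along these lines is the dismax handler's pf (phrase fields) and ps (phrase slop) parameters, which add an optional sloppy phrase clause over the user's terms so documents where the terms sit closer together score higher; a sketch (the field name "text" is illustrative):

http://localhost:8983/solr/select?q=a+b+c
    &defType=dismax
    &qf=text
    &pf=text
    &ps=10

All of [a] [b] [c] are still required via qf; the pf/ps clause only adds score when the terms fall within the given slop, and Lucene's sloppy phrase scoring weights tighter matches higher, degrading with distance.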
Re: Using Lucene's payload in Solr
On Aug 13, 2009, at 11:58 AM, Bill Au wrote:
> Thanks for the tip on BFTQ. I have been using a nightly build from before
> that was committed. I have upgraded to the latest nightly build and will
> use that instead of BTQ.
>
> I got DelimitedPayloadTokenFilter to work and see that the terms and
> payload of the field are correct, but the delimiter and payload are stored
> so they appear in the response also. Here is an example:
>
> XML for indexing:
> Solr|2.0 In|2.0 Action|2.0
>
> XML response:
> Solr|2.0 In|2.0 Action|2.0

Correct.

> I want to set a payload on a field that has a variable number of words.
> So I guess I can use a copy field with a PatternTokenizerFactory to filter
> out the delimiter and payload.
>
> I am thinking maybe I can do this instead when indexing:
>
> XML for indexing:
> Solr In Action

Hmmm, interesting, what's your motivation vs. boosting the field?

> This will simplify indexing as I don't have to repeat the payload for each
> word in the field. I do have to write a payload-aware update handler. It
> looks like I can use Lucene's NumericPayloadTokenFilter in my custom update
> handler to
>
> Any thoughts/comments/suggestions?
>
> Bill

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Issue with Collection & Distribution
Hello,

I am having a few problems with the snapinstaller/commit on the slave. I have a pull_from_master script which is the following:

#!/bin/bash
cd /opt/solr/solr/bin -v
./snappuller -v -P 18983
./snapinstaller -v

I have been executing snapshooter manually on the master then running the above script to test, but I am getting the following:

2009/08/13 17:18:16 notifing Solr to open a new Searcher
2009/08/13 17:18:16 failed to connect to Solr server
2009/08/13 17:18:17 snapshot installed but Solr server has not open a new Searcher

Commit logs:

2009/08/13 17:18:16 started by user
2009/08/13 17:18:16 command: /opt/solr/solr/bin/commit
2009/08/13 17:18:16 commit request to Solr at http://slave-server:8983/solr/update failed:
2009/08/13 17:18:16 <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">28</int></lst></response>
2009/08/13 17:18:16 failed (elapsed time: 0 sec)

Snapinstaller logs:

2009/08/13 17:18:16 started by user
2009/08/13 17:18:16 command: ./snapinstaller -v
2009/08/13 17:18:16 installing snapshot /opt/solr/solr/data/snapshot.20090813171835
2009/08/13 17:18:16 notifing Solr to open a new Searcher
2009/08/13 17:18:16 failed to connect to Solr server
2009/08/13 17:18:17 snapshot installed but Solr server has not open a new Searcher
2009/08/13 17:18:17 failed (elapsed time: 1 sec)

Is there a way of telling why it is failing?

Many Thanks,
Will
Re: Solr 1.4 Clustering / mlt AS search?
Hi,

On Tue, Aug 11, 2009 at 22:19, Mark Bennett wrote:
> Carrot2 has several pluggable algorithms to choose from, though I have no
> evidence that they're "better" than Lucene's. Where TF/IDF is sort of a one
> step algebraic calculation, some clustering algorithms use iterative
> approaches, etc.

I'm not sure if I completely follow the way in which you'd like to use Carrot2 for scoring -- would you cluster the whole index? Carrot2 was designed to be a post-retrieval clustering algorithm and optimized to cluster small sets of documents (up to ~1000) in real time. All processing is performed in-memory, which limits Carrot2's applicability to really large sets of documents.

S.
RE: facet performance tips
I took 1.4 from trunk three days ago; it seems OK for production (at least for my master instance, which is doing writes only). I use the same config files.

500 000 terms are OK too; I am using several million with pre-1.3 SOLR taken from trunk. However, do not try to "facet" (probably an outdated term after SOLR-475) on generic queries such as [* TO *] (with a huge resultset). For smaller query results (100,000 instead of 100,000,000), "counting terms" is fast enough (a few milliseconds at http://www.tokenizer.org).

-----Original Message-----
From: Jérôme Etévé [mailto:jerome.et...@gmail.com]
Sent: August-13-09 5:38 AM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Thanks everyone for your advice. I increased my filterCache, and faceting performance improved greatly. My faceted field can have at the moment ~4 different terms, so I set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around 500 000, so I guess this approach won't work anymore, as I doubt a 500 000-sized filterCache would work. So I guess my best move would be to upgrade to the soon-to-be-released 1.4 version of Solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea of when 1.4 will become an official release? Also, is the current trunk OK for production? Is it compatible with 1.3 configuration files?

Thanks!

Jerome.

--
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net
RE: facet performance tips
It seems BOBO-Browse is an alternate faceting engine; it would be interesting to compare performance with SOLR... Distributed?

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: August-12-09 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your case.
RE: facet performance tips
Interesting - it has "BoboRequestHandler implements SolrRequestHandler", so it's easy to try; and it has shards support.

[Fuad Efendi] It seems BOBO-Browse is an alternate faceting engine; it would be interesting to compare performance with SOLR... Distributed?

[Jason Rutherglen] For your fields with many terms you may want to try Bobo http://code.google.com/p/bobo-browse/ which could work well with your case.
Re: Using Lucene's payload in Solr
I need to boost a field differently according to the content of the field. Here is an example:

Solr: information retrieval, webapp, xml
Tomcat: webapp
XMLSpy: xml, ide

A search on category:webapp should return Tomcat before Solr. A search on category:xml should return XMLSpy before Solr.

Bill

On Thu, Aug 13, 2009 at 12:13 PM, Grant Ingersoll wrote:
> On Aug 13, 2009, at 11:58 AM, Bill Au wrote:
>> Thanks for the tip on BFTQ. I have been using a nightly build from before
>> that was committed. I have upgraded to the latest nightly build and will
>> use that instead of BTQ.
>>
>> I got DelimitedPayloadTokenFilter to work and see that the terms and
>> payload of the field are correct, but the delimiter and payload are stored
>> so they appear in the response also. Here is an example:
>>
>> XML for indexing:
>> Solr|2.0 In|2.0 Action|2.0
>>
>> XML response:
>> Solr|2.0 In|2.0 Action|2.0
>
> Correct.
>
>> I want to set a payload on a field that has a variable number of words.
>> So I guess I can use a copy field with a PatternTokenizerFactory to filter
>> out the delimiter and payload.
>>
>> I am thinking maybe I can do this instead when indexing:
>>
>> XML for indexing:
>> Solr In Action
>
> Hmmm, interesting, what's your motivation vs. boosting the field?
>
>> This will simplify indexing as I don't have to repeat the payload for each
>> word in the field. I do have to write a payload-aware update handler. It
>> looks like I can use Lucene's NumericPayloadTokenFilter in my custom update
>> handler to
>>
>> Any thoughts/comments/suggestions?
>>
>> Bill
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
Re: facet performance tips
Yeah, we need a performance comparison; I haven't had time to put one together. If/when I do, I'll compare Bobo performance against Solr bitset-intersection-based facets, and compare memory consumption.

For near-realtime, Solr needs to cache and merge bitsets at the SegmentReader level, and Bobo needs to be upgraded to work with Lucene 2.9's searching at the segment level (currently it uses a MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
> It seems BOBO-Browse is an alternate faceting engine; it would be
> interesting to compare performance with SOLR... Distributed?
>
> -----Original Message-----
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For your fields with many terms you may want to try Bobo
> http://code.google.com/p/bobo-browse/ which could work well with your case.
Re: Issue with Collection & Distribution
Have you checked the solr log on the slave to see if there was any commit done?

It looks to me like you are still using an older version of the commit script that is not compatible with the newer Solr response format. If that's the case, the commit was actually performed; it is just that the script failed to handle the Solr response. See:

https://issues.apache.org/jira/browse/SOLR-463
https://issues.apache.org/jira/browse/SOLR-426

Bill

On Thu, Aug 13, 2009 at 12:28 PM, william pink wrote:
> Hello,
>
> I am having a few problems with the snapinstaller/commit on the slave.
> I have a pull_from_master script which is the following:
>
> #!/bin/bash
> cd /opt/solr/solr/bin -v
> ./snappuller -v -P 18983
> ./snapinstaller -v
>
> I have been executing snapshooter manually on the master then running the
> above script to test, but I am getting the following:
>
> 2009/08/13 17:18:16 notifing Solr to open a new Searcher
> 2009/08/13 17:18:16 failed to connect to Solr server
> 2009/08/13 17:18:17 snapshot installed but Solr server has not open a new
> Searcher
>
> Commit logs:
>
> 2009/08/13 17:18:16 started by user
> 2009/08/13 17:18:16 command: /opt/solr/solr/bin/commit
> 2009/08/13 17:18:16 commit request to Solr at
> http://slave-server:8983/solr/update failed:
> 2009/08/13 17:18:16 <response><lst name="responseHeader"><int
> name="status">0</int><int name="QTime">28</int></lst></response>
> 2009/08/13 17:18:16 failed (elapsed time: 0 sec)
>
> Snapinstaller logs:
>
> 2009/08/13 17:18:16 started by user
> 2009/08/13 17:18:16 command: ./snapinstaller -v
> 2009/08/13 17:18:16 installing snapshot
> /opt/solr/solr/data/snapshot.20090813171835
> 2009/08/13 17:18:16 notifing Solr to open a new Searcher
> 2009/08/13 17:18:16 failed to connect to Solr server
> 2009/08/13 17:18:17 snapshot installed but Solr server has not open a new
> Searcher
> 2009/08/13 17:18:17 failed (elapsed time: 1 sec)
>
> Is there a way of telling why it is failing?
>
> Many Thanks,
> Will
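To confirm it is only the script's response parsing that fails (and not the commit itself), the commit can be issued by hand against the slave; a sketch (hostname as in the logs above):

curl http://slave-server:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'

A status of 0 in the returned responseHeader, which is already visible in the log above, indicates the commit actually succeeded.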
RE: JVM Heap utilization & Memory leaks with Solr
Most OutOfMemoryExceptions (if not 100%) happening with SOLR are because of
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/FieldCache.html
- it is used internally in Lucene to cache field values by document ID.

My very long-term observations: SOLR can run without any problems for days/months, and an unpredictable OOM happens just because someone tried a sorted search, which will populate an array with the IDs of ALL documents in the index. The only solution: calculate exactly the amount of RAM needed for the FieldCache... For instance, for 100,000,000 documents a single instance of FieldCache may require 8*100,000,000 bytes (8 bytes per document ID?), which is almost 1Gb (at least!).

I didn't notice any memory leaks after I started to use 16Gb RAM for the SOLR instance (almost a year without any restart!)

-----Original Message-----
From: Rahul R [mailto:rahul.s...@gmail.com]
Sent: August-13-09 1:25 AM
To: solr-user@lucene.apache.org
Subject: Re: JVM Heap utilization & Memory leaks with Solr

*You should try to generate heap dumps and analyze the heap using a tool like the Eclipse Memory Analyzer. Maybe it helps spotting a group of objects holding a large amount of memory*

The tool that I used also allows capturing heap snapshots. Eclipse had a lot of pre-requisites; you need to apply some three or five patches before you can start using it. My observations with this tool were that some HashMaps were taking up a lot of space, although I could not pin it down to the exact HashMap. These would either be weblogic's or Solr's. I will anyway give Eclipse's a try and see how it goes. Thanks for your input.

Rahul

On Wed, Aug 12, 2009 at 2:15 PM, Gunnar Wagenknecht wrote:
> Rahul R schrieb:
>> I tried using a profiling tool - Yourkit. The trial version was free for
>> 15 days. But I couldn't find anything of significance.
>
> You should try to generate heap dumps and analyze the heap using a tool
> like the Eclipse Memory Analyzer. Maybe it helps spotting a group of
> objects holding a large amount of memory.
>
> -Gunnar
>
> --
> Gunnar Wagenknecht
> gun...@wagenknecht.org
> http://wagenknecht.org/
Re: Solr 1.4 Clustering / mlt AS search?
* mlb: comments inline

On Thu, Aug 13, 2009 at 9:39 AM, Stanislaw Osinski wrote:
> Hi,
>
> On Tue, Aug 11, 2009 at 22:19, Mark Bennett wrote:
>> Carrot2 has several pluggable algorithms to choose from, though I have no
>> evidence that they're "better" than Lucene's. Where TF/IDF is sort of a one
>> step algebraic calculation, some clustering algorithms use iterative
>> approaches, etc.
>
> I'm not sure if I completely follow the way in which you'd like to use
> Carrot2 for scoring -- would you cluster the whole index? Carrot2 was
> designed to be a post-retrieval clustering algorithm and optimized to
> cluster small sets of documents (up to ~1000) in real time. All processing
> is performed in-memory, which limits Carrot2's applicability to really
> large sets of documents.
>
> S.

* mlb: I agree with all of your assertions, but... There are comments in the Solr materials about having an option to cluster based on the entire document set, and some warning about this being atypical and possibly slow. And from what you're saying, for a big enough docset, it might go from "slow" to "impossible", I'm not sure.

And so my question was, *if* you were willing to spend that much time and effort to cluster all the text of all the documents (and if it were even possible), would the result perform better than the standard TF/IDF techniques?

In the application I'm considering, the queries tend to be longer than average, more like full sentences or more. And they tend to be of a question-and-answer nature. I've seen references in several search engines that Q-and-A search sometimes benefits from alternative search techniques. And, from a separate email, the IDF part of the standard similarity may be causing a problem, so I'm casting a wide net for other ideas. Just brainstorming here... :-)

So, given that, did you have any thoughts on it Stanislaw?

Mark
RE: Performance Tuning: segment_merge:index_update=5:1 (timing)
UPDATE:

I have 100,000,000 new documents in 24 hours, including possible updates OR possibly adding the same document several times. I have two segments now (30Gb total), and the network is overloaded (I use a web crawler to generate documents). I never had more than 25,000,000 within a month before...

I read that a high mergeFactor improves performance of updates; however, it didn't work (it delays all merges... commit/optimize took similar timing). A high ramBufferSizeMB does the job.

[Fuad Efendi] > Looks like I temporarily solved the problem with not-so-obvious settings:
[Fuad Efendi] > ramBufferSizeMB=8192
[Fuad Efendi] > mergeFactor=10

> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>
> During segment merge 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches 100-1000 docs) only 5-15% CPU
>
> constant rate 5:1 is very suspicious...
>
>> In a heavily loaded write-only master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>> updates.
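These settings live in the index section of solrconfig.xml; a sketch of the configuration being described (the values are the poster's, not general recommendations):

<indexDefaults>
  <!-- flush the in-memory buffer to a new segment only after ~8 GB of
       buffered documents; fewer, larger flushes mean fewer merges -->
  <ramBufferSizeMB>8192</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>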
RE: facet performance tips
SOLR-1.4-trunk uses term counting instead of bitset intersects (it seems); check
http://issues.apache.org/jira/browse/SOLR-475
(and probably http://issues.apache.org/jira/browse/SOLR-711)

-----Original Message-----
From: Jason Rutherglen

Yeah, we need a performance comparison; I haven't had time to put one together. If/when I do, I'll compare Bobo performance against Solr bitset-intersection-based facets, and compare memory consumption.

For near-realtime, Solr needs to cache and merge bitsets at the SegmentReader level, and Bobo needs to be upgraded to work with Lucene 2.9's searching at the segment level (currently it uses a MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
> It seems BOBO-Browse is an alternate faceting engine; it would be
> interesting to compare performance with SOLR... Distributed?
>
> -----Original Message-----
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For your fields with many terms you may want to try Bobo
> http://code.google.com/p/bobo-browse/ which could work well with your case.
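The SOLR-475 work surfaces in 1.4 as UnInvertedField-based term counting for multi-valued fields, selectable per request; a sketch (field name is illustrative):

http://localhost:8983/solr/select?q=*:*
    &facet=true
    &facet.field=category
    &facet.method=fc

facet.method=fc uses the new field-cache/UnInvertedField counting, while facet.method=enum falls back to the older per-term filterCache intersections being compared in this thread.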
Re: facet performance tips
Right, I haven't used SOLR-475 yet and am more familiar with Bobo. I believe there are differences, but I haven't gone into them yet. As I'm using Solr 1.4 now, maybe I'll test the UnInvertedField modality. Feel free to report back results, as I don't think I've seen much yet?

On Thu, Aug 13, 2009 at 10:51 AM, Fuad Efendi wrote:
> SOLR-1.4-trunk uses term counting instead of bitset intersects (it seems);
> check http://issues.apache.org/jira/browse/SOLR-475
> (and probably http://issues.apache.org/jira/browse/SOLR-711)
>
> -----Original Message-----
> From: Jason Rutherglen
>
> Yeah, we need a performance comparison; I haven't had time to put one
> together. If/when I do, I'll compare Bobo performance against Solr bitset
> intersection based facets, and compare memory consumption.
>
> For near-realtime, Solr needs to cache and merge bitsets at the
> SegmentReader level, and Bobo needs to be upgraded to work with Lucene
> 2.9's searching at the segment level (currently it uses a MultiSearcher).
>
> Distributed search on either should be fairly straightforward?
>
>> It seems BOBO-Browse is an alternate faceting engine; it would be
>> interesting to compare performance with SOLR... Distributed?
>>
>> For your fields with many terms you may want to try Bobo
>> http://code.google.com/p/bobo-browse/ which could work well with your case.
Re: Performance Tuning: segment_merge:index_update=5:1 (timing)
BTW, what version of Solr are you on?

On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote:
> UPDATE:
>
> I have 100,000,000 new documents in 24 hours, including possible updates OR
> possibly adding the same document several times. I have two segments now
> (30Gb total), and the network is overloaded (I use a web crawler to generate
> documents). I never had more than 25,000,000 within a month before...
>
> I read that a high mergeFactor improves performance of updates; however, it
> didn't work (it delays all merges... commit/optimize took similar timing).
> A high ramBufferSizeMB does the job.
>
> [Fuad Efendi] > Looks like I temporarily solved the problem with
> not-so-obvious settings:
> [Fuad Efendi] > ramBufferSizeMB=8192
> [Fuad Efendi] > mergeFactor=10
>
>> Never tried profiling;
>> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>>
>> During segment merge 99% CPU, no disk swap; I can't suspect I/O...
>>
>> During document updates (small batches 100-1000 docs) only 5-15% CPU
>>
>> constant rate 5:1 is very suspicious...
>>
>>> In a heavily loaded write-only master SOLR, I have 5 minutes of RAM
>>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>>> updates.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: Solr 1.4 Clustering / mlt AS search?
On Aug 13, 2009, at 1:29 PM, Mark Bennett wrote:
> * mlb: I agree with all of your assertions, but... There are comments in
> the Solr materials about having an option to cluster based on the entire
> document set, and some warning about this being atypical and possibly slow.
> And from what you're saying, for a big enough docset, it might go from
> "slow" to "impossible", I'm not sure.

Those comments are referring to a yet unimplemented feature that will allow for pluggable background clustering, using something like Mahout to cluster the whole collection and then return back the results later upon request.

> And so my question was, *if* you were willing to spend that much time and
> effort to cluster all the text of all the documents (and if it were even
> possible), would the result perform better than the standard TF/IDF
> techniques?
>
> In the application I'm considering, the queries tend to be longer than
> average, more like full sentences or more. And they tend to be of a
> question-and-answer nature. I've seen references in several search engines
> that Q-and-A search sometimes benefits from alternative search techniques.
> And, from a separate email, the IDF part of the standard similarity may be
> causing a problem, so I'm casting a wide net for other ideas. Just
> brainstorming here... :-)

QA has a lot of factors at play, but I can't recall anyone using clustering as a way of doing the initial passage retrieval; it's been a few years since I kept up with that literature, though. You of course can turn off or downplay IDF if that is an issue. I think payloads can also play a useful hand in QA (or Lucene's new Attribute capabilities, but I won't quite go there yet) because you could store term-level information (often POS plays a role in helping QA, as well as parsing information).

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
RE: Curl error 26 failed creating formpost data
I figured out what was causing this error: I was directing the information for myfile into the wrong directory.

Kevin Miller
Web Services

-----Original Message-----
From: Kevin Miller [mailto:kevin.mil...@oktax.state.ok.us]
Sent: Thursday, August 13, 2009 10:08 AM
To: solr-user@lucene.apache.org
Subject: Curl error 26 failed creating formpost data

I am trying to use the curl command located on the Extracting Request Handler page on the Solr Wiki. I am using the command in the following way:

curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr&map.content=attr_content&commit=true" -F "myfile=@../../BadNews.doc"

echo12 is the server where Solr is located, and BadNews.doc is located in the exampledocs directory. When I execute this command I get the following error message:

curl: (26) failed creating formpost data.

Can someone please direct me to where I can find the way to correct this error?

Kevin Miller
Web Services
HTTP ERROR: 500 No default field name specified
I have a different error once I direct curl to look in the correct folder for the file. I am getting an HTTP ERROR: 500 "No default field name specified". I am using a test Word document in the exampledocs folder, and I am issuing the curl command from the exampledocs folder. Following is the command I am using:

c:\curl\bin\curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true" -F "myfile=@BadNews.doc"

curl is installed on my machine at c:\curl and the .exe file is located at c:\curl\bin.

Can someone please direct me to where I can look to find out how to set a default field name?

Kevin Miller
Oklahoma Tax Commission
Web Services
RE: Performance Tuning: segment_merge:index_update=5:1 (timing)
I upgraded "master" to 1.4-dev from trunk 3 days ago.

BTW, such performance broke my "commodity hardware", most probably the network card... I can't SSH in to check stats; need to check onsite what happened...

-----Original Message-----
From: Grant Ingersoll
Sent: August-13-09 4:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

BTW, what version of Solr are you on?

On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote:
> UPDATE:
>
> I have 100,000,000 new documents in 24 hours, including possible updates OR
> possibly adding the same document several times. I have two segments now
> (30Gb total), and the network is overloaded (I use a web crawler to generate
> documents). I never had more than 25,000,000 within a month before...
>
> I read that a high mergeFactor improves performance of updates; however, it
> didn't work (it delays all merges... commit/optimize took similar timing).
> A high ramBufferSizeMB does the job.
>
> [Fuad Efendi] > Looks like I temporarily solved the problem with
> not-so-obvious settings:
> [Fuad Efendi] > ramBufferSizeMB=8192
> [Fuad Efendi] > mergeFactor=10

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: Facets with an IDF concept
Hi Asif,

Did you end up implementing this as a custom sort order for facets? I'm facing a similar problem, but not related to time. Given 2 terms:

A: appears twice in half the search results
B: appears once in every search result

I think term A is more "interesting". Using facets sorted by frequency, term B is more important (since it shows up first). To me, terms that appear in all documents aren't really that interesting.

I'm thinking of using a combination of document count (in the result set, not globally) and term frequency (in the result set, not globally) to come up with a facet sort order.

Wojtek
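Worked through with invented numbers, the tf/df combination Wojtek describes would behave like this over a 100-document result set:

A: tf = 2 x 50  = 100, df = 50   ->  tf/df = 2.0
B: tf = 1 x 100 = 100, df = 100  ->  tf/df = 1.0

Both terms occur 100 times in total, but dividing by the in-results document count pushes the "twice in half the results" term A above the appears-everywhere term B.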
Re: Lock timed out 2 worker running
Yes, I misunderstood your question (re: the crash). Solr did not crash, but we shut down the JVM (Tomcat) gracefully after we killed all our workers. Upon restarting, though, Solr just throws the error.

Regards,
/Renz

2009/8/11 Chris Hostetter
>
> : > 5) are these errors appearing after Solr crashes and you restart it?
> :
> : Yep, I can't find the logs but it's something like can't obtain lock for
> : .lck Need to delete that file in order to start the solr properly
>
> wait ... either you misunderstood my question, or you just explained
> what's happening.
>
> If you are using SimpleFSLock, and solr crashes (OOM, kill -9, yank the
> power cord) then it's possible the lock file will get left around, in
> which case this is the expected behavior. There's a config option you
> can set to tell solr that on start up you want it to clean up any old lock
> files, but if you switch to the "single" lock manager mode your life gets
> a lot easier anyway.
>
> But you never mentioned anything about the server crashing in your
> original message, so i'm wondering if you really meant to answer "yep" when
> i asked "are these errors appearing *after* Solr crashes"
>
> -Hoss
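The two options Hoss describes correspond to settings in solrconfig.xml; a sketch (1.3-era element names):

<mainIndex>
  <!-- "single" assumes only one Solr process ever opens the index,
       sidestepping stale lock files entirely -->
  <lockType>single</lockType>
  <!-- with simple/native locking, remove any lock file left behind
       by a crashed JVM at startup -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>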
Solr 1.4 Replication scheme
Hello,

I've recently switched over to Solr 1.4 (a recent nightly build) and have been using the new replication. Some questions come to mind.

In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that production load-balanced query serving would always be consistent. With the new system it seems that I have no control over syncing them; rather, each slave polls every few minutes and then decides the next cycle based on the last time it *finished* updating, so in any case I lose control over the synchronization of snap installation across multiple slaves.

Also, I noticed the default poll interval is 60 seconds. It would seem that for such a rapid interval, what I mentioned above is a non-issue; however, I am not clear how this works vis-a-vis the new searcher warmup. For a considerable index size (20 million docs+), the warmup itself is an expensive and somewhat lengthy process, and if a new searcher opens and warms up every minute, I am not at all sure I'll be able to serve queries with reasonable QTimes.

Anyone else come across these issues? Any advice/comments will be appreciated!

Thanks,
-Chak
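For reference, the poll interval being discussed is set on the slave side of the 1.4 ReplicationHandler; a minimal sketch (master URL is illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <!-- hh:mm:ss between polls; the next poll is scheduled relative to
         when the previous fetch finished, hence the drift across slaves
         described above -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>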
Re: [OT] Solr Webinar
Hello,

they [the Lucid Imagination guys] said it should be published on their blog. I hope I understood it correctly.

Regards,
Lukas
http://blog.lukas-vlcek.com/

On Fri, Aug 14, 2009 at 7:52 AM, Mani Kumar wrote:
> if anyone has any pointer to this webinar, please share it.
> thanks!
> mani
>
> On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed wrote:
>> I also registered to attend but I am not going to, because here at work a
>> last-minute meeting has been scheduled at the same time.
>>
>> Is it possible in the future to schedule such webinars starting 5-6 PM ET?
>>
>> Thanks,
>> Mohamed
Re: defaultOperator="AND" and queries with "("
On Thu, Aug 13, 2009 at 5:31 AM, Subbacharya, Madhu <madhu.subbacha...@corp.aol.com> wrote:
>
> Hello,
>
> We have Solr running with the defaultOperator set to "AND". Am not able
> to get any results for queries like q=( Ferrari AND ( "599 GTB Fiorano" OR
> "612 Scaglietti" OR F430 )), which contain "(" for grouping. Anyone have
> any ideas for a workaround?

Can you try adding debugQuery=on to the request and post the details here?

--
Regards,
Shalin Shekhar Mangar.
Re: [OT] Solr Webinar
if anyone has any pointer to this webinar, please share it.
thanks!
mani

On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed wrote:
> I also registered to attend but I am not going to, because here at work a
> last-minute meeting has been scheduled at the same time.
>
> Is it possible in the future to schedule such webinars starting 5-6 PM ET?
>
> Thanks,
> Mohamed
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> Sent: Wednesday, August 12, 2009 6:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: [OT] Solr Webinar
>
> I believe it will be, but am not sure of the procedure for distributing.
> I think if you register, but don't show, you will get a notification.
>
> -Grant
>
> On Aug 10, 2009, at 12:26 PM, Lucas F. A. Teixeira wrote:
>> Hello Grant,
>> Will the webinar be recorded and available to download later someplace?
>> Unfortunately, I can't watch this time.
>>
>> Thanks,
>>
>> []s,
>>
>> Lucas Frare Teixeira .*.
>> - lucas...@gmail.com
>> - blog.lucastex.com
>> - twitter.com/lucastex
>>
>> On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll wrote:
>>> I will be giving a free one hour webinar on getting started with Apache
>>> Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT
>>>
>>> You can sign up @
>>> http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP
>>>
>>> I will present and demo:
>>> * Getting started with LucidWorks for Solr
>>> * Getting better, faster results using Solr's findability and relevance
>>> improvement tools
>>> * Deploying Solr in production, including monitoring performance and
>>> trends with the LucidGaze for Solr performance profiler
>>>
>>> -Grant
Re: Questions about XPath in data import handler
yes. Look at the 'flatten' attribute on the field. It should give you all the text (not attributes) under a given node.

On Thu, Aug 13, 2009 at 8:02 PM, Andrew Clegg wrote:
>
> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>>
>> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg wrote:
>>
>>> Does the second one mean "select the value of the attribute called
>>> qualifier in the /a/b/subject element"?
>>
>> yes you are right. Isn't that the semantics of standard xpath syntax?
>
> Yes, just checking since the DIH XPath engine is a little different.
>
> Do you know what I would get in this case?
>
>>> Also... Can I select a non-leaf node and get *ALL* the text underneath it?
>>> e.g. /a/b in this example?
>
> Cheers,
>
> Andrew.

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
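A sketch of what that looks like in a DIH data-config.xml (entity name, column names, and the elided url are invented for the example):

<entity name="doc" processor="XPathEntityProcessor" url="..." forEach="/a">
  <!-- attribute value: yields "some text" for <subject qualifier="some text"/> -->
  <field column="qualifier" xpath="/a/b/subject/@qualifier"/>
  <!-- flatten="true" concatenates all text beneath /a/b, ignoring attributes -->
  <field column="body" xpath="/a/b" flatten="true"/>
</entity>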