Re: 'Minimum Should Match' on subquery level
Thanks a lot for the reply. But I've already figured out that nested queries can help me implement what I was looking for.

-Myron

2010/5/28 Chris Hostetter:
> : I need to use Lucene's `minimum number should match` option of BooleanQuery
> : on Solr.
>
> Unfortunately, the Lucene QueryParser doesn't support any way of manipulating the minNumberShouldMatch property of BooleanQueries specified in that syntax.
>
> I'm not sure of any way to do what you're looking for without some custom code (either customizing the QueryParser, or writing a QParser that modifies the BooleanQueries produced).
>
> -Hoss
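For anyone landing on this thread: minNumberShouldMatch just sets how many of a BooleanQuery's optional (SHOULD) clauses must match for a document to qualify. A toy sketch of that matching rule (plain Python, not Lucene's actual API; all names here are illustrative):

```python
def matches(doc_terms, should_terms, min_should_match):
    """Count how many SHOULD clauses hit the document and compare to the minimum."""
    hits = sum(1 for t in should_terms if t in doc_terms)
    return hits >= min_should_match

# A document matching 2 of 3 optional terms passes with a minimum of 2.
doc = {"solr", "lucene", "faceting"}
print(matches(doc, ["solr", "lucene", "dismax"], 2))  # True
print(matches(doc, ["solr", "lucene", "dismax"], 3))  # False
```

This is the behaviour a custom QParser would have to wire up by calling the setter on the BooleanQueries it produces.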
Re: Sites with Innovative Presentation of Tags and Facets
NP ;-). Just to explain: with tooltips I meant JS tooltips (not the native web-browser tooltips). Since sliders require JS anyway, presenting additional info in a JS tooltip on drag doesn't limit the nr of people able to view it. I think this is OK from a usability standpoint, since I don't consider the 'nr of items left' info 100% essential (after all, lots of sites do well without it at the moment). Call it graceful degradation ;-)

As for mobile, I never realized that 'hover' is an issue on mobile, but on-drag is supported on mobile touch displays... Moreover, having a navigationally complex site like kayak.com / tripadvisor.com work well on mobile (from a usability perspective) is pretty much a utopia anyway. For these types of sites, specialized mobile sites (or apps, as is the case for the above brands) are the way to go in my opinion.

Geert-Jan

2010/5/28 Mark Bennett:
> Haha! Important tooltips are now "deprecated" in Web Applications.
>
> This is nothing "official", of course.
>
> But it's being advised to avoid important UI tasks that require cursor tracking, mouse-over, hovering, etc. in web applications.
>
> Why? Many touch-centric mobile devices don't support "hover". For me, I'm used to my laptop, where the touch pad or stylus *is* able to measure the pressure. But the finger-based touch devices generally can't differentiate it, I guess.
>
> They *can* tell one gesture from another, but only by looking at the timing and shape. And hapless hover ain't one of them.
>
> With that said, I'm still a fan of tooltips in desktop IDEs like Eclipse, or even in web applications when I'm on a desktop.
>
> I guess the point is that, if it's a really important thing, then you need to expose it in another way on mobile.
>
> Just passing this on, please don't shoot the messenger. ;-)
>
> Mark
>
> --
> Mark Bennett / New Idea Engineering, Inc.
> / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
> On Thu, May 27, 2010 at 2:55 PM, Geert-Jan Brits wrote:
> > Perhaps you could show the 'nr of items left' as a tooltip of sorts when the user actually drags the slider. If the user doesn't drag (or hover over) the slider, 'nr of items left' isn't shown.
> >
> > Moreover, initially a slider doesn't limit the results, so 'nr of items left' shown for the slider would be the same as the overall number of items left (thereby being redundant).
> >
> > I must say I haven't seen this implemented, but it would be rather easy to adapt a slider implementation to show the nr on drag/hover. (They exist for jQuery, script.aculo.us and a bunch of other libs.)
> >
> > Geert-Jan
> >
> > 2010/5/27 Lukas Kahwe Smith
> > > On 27.05.2010, at 23:32, Geert-Jan Brits wrote:
> > > > Something like sliders perhaps? Of course only numerical ranges can be put into sliders (or a concept that may be logically presented as some sort of ordering, such as "bad, hmm, good, great").
> > > >
> > > > Use Solr's StatsComponent to show the min and max values.
> > > >
> > > > Have a look at tripadvisor.com for good uses/implementations of sliders (price and review score are presented as sliders).
> > > >
> > > > My 2c: try to make the possible input values discrete (like at tripadvisor), which gives a better user experience and limits the potential nr of queries (cache-wise advantage).
> > >
> > > yeah i have been pondering something similar. but i now realized that this way the user doesn't get an overview of the distribution without actually applying the filter. that being said, it would be nice to display 3 numbers with the sliders: the count of items that were filtered out on the lower and upper boundaries, as well as the number of items still left (*).
> > >
> > > aside from this i just put a little tweak to my faceting online:
> > > http://search.un-informed.org/search?q=malaria&tm=any&s=Search
> > >
> > > if you deselect any of the checkboxes, it updates the counts. however i display both the count without and with those additional checkbox filters applied (actually i only display two numbers if they are not the same):
> > > http://screencast.com/t/MWUzYWZkY2Yt
> > >
> > > regards,
> > > Lukas Kahwe Smith
> > > m...@pooteeweet.org
> > >
> > > (*) if anyone has a slider that can do the above i would love to integrate that and replace the adoption year checkboxes with that
Re: Sites with Innovative Presentation of Tags and Facets
On 5/28/2010 9:31 PM, Chris Hostetter wrote:
: Perhaps you could show the 'nr of items left' as a tooltip of sorts when the
: user actually drags the slider.

Years ago, when we were first working on building Solr, a coworker of mine suggested using double-bar sliders (ie: pick a range using a min and a max) for all numeric facets and putting "sparklines" above them to give the user a visual indication of the "spread" of documents across the numeric spectrum.

it was a little more complicated than anything we needed -- and seemed like a real pain in the ass to implement. i still don't know of anyone doing anything like that, but it's definitely an interesting idea. The hard part is really just deciding what "quantum" interval you want to use along the x-axis to decide how to count the docs for the y-axis.

http://en.wikipedia.org/wiki/Sparkline
http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR

-Hoss

I love the idea of a sparkline at range sliders. I think if I have time, I might add them to the range sliders on our site. I already have all the data, since I show the count for a range while the user is dragging by storing the facet counts for each interval in javascript.
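The "quantum" interval Hoss mentions is just fixed-width bucketing: pick how many buckets span the x-axis, and the per-bucket document counts become the y-values of the sparkline. A small illustrative sketch (Python; function and parameter names are invented here, this is not Solr code):

```python
def sparkline_counts(values, lo, hi, buckets):
    """Bucket document values into fixed 'quantum' intervals along the x-axis;
    the per-bucket counts become the y-axis of the sparkline."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) // width)] += 1
        elif v == hi:  # fold the top edge into the last bucket
            counts[-1] += 1
    return counts

prices = [5, 10, 15, 20, 95, 100]
print(sparkline_counts(prices, 0, 100, 4))  # [4, 0, 0, 2]
```

Choosing `buckets` is exactly the hard decision described above: too few and the spread is invisible, too many and most buckets are empty.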
Re: Sites with Innovative Presentation of Tags and Facets
Interesting.. say you have a double slider with a discrete range (like tripadvisor et al.); perhaps it would be a good guideline to use these discrete points as the quantum interval for the sparkline as well? Of course it then becomes the question which discrete values to use for the slider. I tend to follow what tripadvisor does for its price slider: set a cap for the max price, and set a fixed interval ($25) for the discrete steps. (Of course there are edge cases, like when no product hits the maximum capped price.) I have also seen non-linear steps implemented, but I guess this doesn't go well with the notion of sparklines.

Anyway, from an implementation standpoint it would be enough for Solr to return the 'nr of items' per interval. From that, it would be easy to calculate on the application side the 'nr of items' for each possible slider combination. Getting these values from Solr would require (staying with the price example):
- a new discretised price field, and doing a facet.field on it.
- the (continuous) price field already present, and doing 50 facet queries (if you have 50 steps).
- another, more elegant way ;-). Perhaps an addition to StatsComponent that returns all counts within a discrete (to be specified) step? Would this slow the StatsComponent code down a lot, or is the info already (almost) present in StatsComponent for doing things like calculating stddev / means, etc?
- something I'm completely missing...

2010/5/28 Chris Hostetter:
> : Perhaps you could show the 'nr of items left' as a tooltip of sorts when the
> : user actually drags the slider.
>
> Years ago, when we were first working on building Solr, a coworker of mine suggested using double-bar sliders (ie: pick a range using a min and a max) for all numeric facets and putting "sparklines" above them to give the user a visual indication of the "spread" of documents across the numeric spectrum.
>
> it was a little more complicated than anything we needed -- and seemed like a real pain in the ass to implement. i still don't know of anyone doing anything like that, but it's definitely an interesting idea.
>
> The hard part is really just deciding what "quantum" interval you want to use along the x-axis to decide how to count the docs for the y-axis.
>
> http://en.wikipedia.org/wiki/Sparkline
> http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR
>
> -Hoss
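On the application-side calculation mentioned above: once Solr returns one count per discrete step, the count for any min/max slider combination is just a difference of prefix sums, so no extra queries are needed while the user drags. A minimal sketch (Python, illustrative only):

```python
from itertools import accumulate

def range_count(interval_counts, i, j):
    """Number of items between slider stop i (inclusive) and stop j (exclusive),
    computed client-side from a single set of per-interval facet counts."""
    prefix = [0] + list(accumulate(interval_counts))
    return prefix[j] - prefix[i]

counts = [3, 7, 2, 5]             # e.g. items per $25 price step
print(range_count(counts, 1, 3))  # 7 + 2 = 9
```

With 50 steps this covers all ~1275 possible slider combinations from one facet response.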
Re: Sites with Innovative Presentation of Tags and Facets
May I ask how you implemented getting the facet counts for each interval? Do you use a facet-query per interval? And perhaps, for inspiration, a link to the site where you implemented this?

Thanks,
Geert-Jan

> I love the idea of a sparkline at range sliders. I think if I have time, I might add them to the range sliders on our site. I already have all the data since I show the count for a range while the user is dragging by storing the facet counts for each interval in javascript.
Re: Sites with Innovative Presentation of Tags and Facets
On 31.05.2010, at 11:29, Geert-Jan Brits wrote:
> May I ask how you implemented getting the facet counts for each interval? Do you use a facet-query per interval? And perhaps, for inspiration, a link to the site where you implemented this?
>
> Thanks,
> Geert-Jan
>
> > I love the idea of a sparkline at range sliders. I think if I have time, I might add them to the range sliders on our site. I already have all the data since I show the count for a range while the user is dragging by storing the facet counts for each interval in javascript.

i guess the easiest is to do the intervals at index time, obviously less flexible.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
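Doing the intervals at index time, as suggested here, could look like a discretisation step applied to each document before indexing; the bucket label then becomes a plain facet.field value. A Python sketch (the $25 step and the cap are borrowed from the tripadvisor example earlier in the thread; function name is invented):

```python
def price_bucket(price, step=25, cap=500):
    """Assign a discrete facet bucket at index time: fixed steps, capped top end."""
    if price >= cap:
        return f"{cap}+"
    lo = int(price // step) * step
    return f"{lo}-{lo + step}"

print(price_bucket(137))  # "125-150"
print(price_bucket(900))  # "500+"
```

The trade-off is exactly the one Lukas notes: changing the step or cap later means reindexing.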
Re: Luke browser does not show non-String Solr fields?
> > Solr 1.4
> >
> > You haven't identified the version of Luke you're using.
>
> Luke 1.0.1 (2010-04-01)

I think with Solr you need to use release 0.9.9.1 or 0.9.9, because Solr 1.4.0 uses Lucene 2.9.1.
Solr 1.4 query fails against all fields, but succeed if field is specified.
Hi,

I have created an index with several fields. If I query my index in the admin section of Solr (or via HTTP request), I get results for my search if I specify the requested field:

Query: note:Aspergillus (look for "Aspergillus" in field "note")

However, if I query the same word against all fields ("Aspergillus" or "all:Aspergillus"), I get no match in the response from Solr.

Do you have any idea of what can be wrong with my index?

Regards

Olivier
Re: Sites with Innovative Presentation of Tags and Facets
On 5/31/2010 11:29 AM, Geert-Jan Brits wrote:
> May I ask how you implemented getting the facet counts for each interval? Do you use a facet-query per interval? And perhaps, for inspiration, a link to the site where you implemented this?
>
> Thanks,
> Geert-Jan

Hi,

Sorry, it seems I pressed send halfway through my mail and forgot about it. The site I implemented my numerical range faceting on is http://www.mysecondhome.co.uk/search.html and I got the facets by making a small patch for Solr (https://issues.apache.org/jira/browse/SOLR-1240) which does the same thing for numbers that date faceting does for dates.

The biggest issue with range faceting is the double counting of edges (which also happens in date faceting, see https://issues.apache.org/jira/browse/SOLR-397). My patch deals with that by adding an extra parameter which allows you to specify which end of the range query should be exclusive. A secondary issue is that you can't do filter queries with one end inclusive and one end exclusive (i.e. price:[500 TO 1000}). You can get around this by doing "price:({500 TO 1000} OR 500)". I've looked into the JavaCC code of Lucene to see if I could fix it so you could mix [] and {}, but unfortunately I'm not familiar enough with it to get it to work.

Regards,
gwk
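The edge double-counting gwk describes, and the exclusive-end fix, can be illustrated with half-open intervals: a value sitting exactly on a shared boundary lands in exactly one bucket. A Python sketch (the `hardend_upper` flag is an invented name for this sketch, not the patch's actual parameter):

```python
def facet_ranges(values, edges, hardend_upper=True):
    """Count values per range with one end exclusive, so a value on a shared
    edge between two adjacent ranges is counted only once."""
    counts = []
    for lo, hi in zip(edges, edges[1:]):
        if hardend_upper:
            counts.append(sum(lo <= v < hi for v in values))  # [lo, hi)
        else:
            counts.append(sum(lo < v <= hi for v in values))  # (lo, hi]
    return counts

prices = [500, 750, 1000, 1000]
print(facet_ranges(prices, [0, 500, 1000, 1500]))  # [0, 2, 2]
```

With both ends inclusive, the 500 and the two 1000s would each be counted in two adjacent ranges, which is the SOLR-397 problem.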
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
On 31.05.2010 11:50, olivier sallou wrote:
> Hi,
> I have created an index with several fields. If I query my index in the admin section of Solr (or via HTTP request), I get results for my search if I specify the requested field:
> Query: note:Aspergillus (look for "Aspergillus" in field "note")
> However, if I query the same word against all fields ("Aspergillus" or "all:Aspergillus"), I get no match in the response from Solr.

Querying "Aspergillus" without a field only works if you're using the DisMax handler.

Do you have a field "all"?

Try "*:Aspergillus" instead.
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
Check your request handler settings: what do you have in the query fields (qf) entry?

On 5/31/10, olivier sallou wrote:
> Hi,
> I have created an index with several fields. If I query my index in the admin section of Solr (or via HTTP request), I get results for my search if I specify the requested field:
> Query: note:Aspergillus (look for "Aspergillus" in field "note")
> However, if I query the same word against all fields ("Aspergillus" or "all:Aspergillus"), I get no match in the response from Solr.
>
> Do you have any idea of what can be wrong with my index?
>
> Regards
>
> Olivier

--
Abdelhamid ABID
Software Engineer - J2EE / WEB
Re: TikaEntityProcessor not working?
BinFileDataSource will only work with files. Try FieldStreamDataSource.

On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee wrote:
> Hi. I'm trying to get Solr to index a database in which one column is a filename of a PDF document I'd like to index. My configuration looks like this [data-config.xml; most tags were stripped by the list archive, partially reconstructed here]:
>
>   <dataSource type="JdbcDataSource" url="jdbc:mysql://localhost/document_db" user="user" password="password" readOnly="true"/>
>   <dataSource name="ds-file" type="BinFileDataSource"/>
>   ...
>   <entity processor="TikaEntityProcessor" url="/some/path/${document.filename}" dataSource="ds-file" format="text">
>   ...
>
> I'm using Solr from trunk (as of two days ago). The import process completes without errors, and it picks up the columns from the database, but not the content from the PDF file. It is definitely trying to access the PDF file, for if I give it an incorrect path name, it complains. It doesn't seem to be attempting to index the PDF, though, as it completes in about 40ms, whereas if I import the PDF via the ExtractingRequestHandler, it takes about 11 seconds to index it.
>
> I've also tried the tika example in example-DIH and that doesn't seem to index anything, either. Am I doing something wrong, or is this just not working yet?
>
> Cheers,
>
> Brad

--
- Noble Paul | Systems Architect | AOL | http://aol.com
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
On 5/31/2010 11:50 AM, olivier sallou wrote:
> Hi,
> I have created an index with several fields. If I query my index in the admin section of Solr (or via HTTP request), I get results for my search if I specify the requested field:
> Query: note:Aspergillus (look for "Aspergillus" in field "note")
> However, if I query the same word against all fields ("Aspergillus" or "all:Aspergillus"), I get no match in the response from Solr.
>
> Do you have any idea of what can be wrong with my index?
>
> Regards
> Olivier

Look for the <defaultSearchField> tag in your schema.xml. The field defined in it determines the default field which is searched when no explicit field is specified in your query.

Regards,
gwk
AW: strange results with query and hyphened words
i am not very sure whether this helps me. i see the point that there will be problems. but the default config for index is: ... and for query: ...

with these settings i don't find "profiauskunft" when searching for "profi-auskunft" (analyse0.jpg). if i use catenateWords="1", analysis.jsp says that there is a match (analyse1.jpg). but in our live search "profi-auskunft" won't match "profiauskunft"; it only finds "profi-auskunft".

could anyone please clarify the output of analysis.jsp for me: why is there a highlight in analysis.jsp but not a match when doing a search, even from the admin panel? when i have

  profi auskunft
  profiauskunft

does this mean "profi (auskunft profiauskunft)" will match the word "profi" followed by "auskunft" or "profiauskunft"? is this OR the same as i configure with defaultOperator in the solrQueryParser tag? the "OR" thing only applies to the query part, right? what will that mean in the index part?

> -----Original Message-----
> From: Sascha Szott [mailto:sz...@zib.de]
> Sent: Sunday, May 30, 2010 19:01
> To: solr-user@lucene.apache.org
> Subject: Re: strange results with query and hyphened words
>
> Hi Markus,
>
> I was facing the same problem a few days ago and found an explanation in the mail archive that clarifies my question regarding the usage of Solr's WordDelimiterFilterFactory:
>
> http://markmail.org/message/qoby6kneedtwd42h
>
> Best,
> Sascha
>
> markus.rietz...@rzf.fin-nrw.de wrote:
> > i am wondering why a search term with hyphen doesn't match.
> >
> > my search term is "profi-auskunft". in WordDelimiterFilterFactory i have catenateWords, so my understanding is that profi-auskunft would search for profiauskunft. when i use the analysis panel in the solr admin i see that profi-auskunft matches a term "profiauskunft".
> > the analysis will show:
> >
> > Query Analyzer
> > WhitespaceTokenizerFactory:  profi-auskunft
> > SynonymFilterFactory:        profi-auskunft
> > StopFilterFactory:           profi-auskunft
> > WordDelimiterFilterFactory:
> >
> >   term position    1        2
> >   term text        profi    auskunft
> >                             profiauskunft
> >   term type        word     word
> >                             word
> >   start,end        0,5      6,14
> >                             0,15
> >
> > LowerCaseFilterFactory
> > SnowballPorterFilterFactory
> >
> > why are auskunft and profiauskunft in one column? how do they get searched?
> >
> > when i search "profiauskunft" i have 230 hits; when i search for "profi-auskunft" i get fewer hits. when i call the search with debugQuery=on i see
> >
> >   body:"profi (auskunft profiauskunft)"
> >
> > what does this query mean? profi and "auskunft or profiauskunft"?
> >
> > the fieldtype [tags were stripped by the list archive; reconstructed from the surviving attributes]:
> >
> >   <fieldType ... positionIncrementGap="100">
> >     <analyzer type="index">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.StopFilterFactory" ignoreCase="true" words="de/stopwords_de.txt" enablePositionIncrements="true"/>
> >       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.SnowballPorterFilterFactory" language="German" protected="de/protwords_de.txt"/>
> >     </analyzer>
> >     <analyzer type="query">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.SynonymFilterFactory" synonyms="de/synonyms_de.txt" ignoreCase="true" expand="true"/>
> >       <filter class="solr.StopFilterFactory" ignoreCase="true" words="de/stopwords_de.txt" enablePositionIncrements="true"/>
> >       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.SnowballPorterFilterFactory" language="German" protected="de/protwords_de.txt"/>
> >     </analyzer>
> >   </fieldType>
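For readers following along: what catenateWords="1" does to a hyphenated token can be sketched roughly like this (Python; a deliberate simplification of the real WordDelimiterFilter, which handles case changes, numbers and positions too):

```python
def word_delimiter(token, catenate_words=True):
    """Rough sketch of WordDelimiterFilter on a hyphenated token: emit the
    word parts and, with catenateWords=1, also the catenated form."""
    parts = [p for p in token.split("-") if p]
    out = list(parts)
    if catenate_words and len(parts) > 1:
        out.append("".join(parts))
    return out

print(word_delimiter("profi-auskunft"))  # ['profi', 'auskunft', 'profiauskunft']
```

The extra catenated token is why analysis.jsp shows profiauskunft in the same position column as auskunft, and why the parsed query becomes "profi (auskunft profiauskunft)".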
Restricting the values returned by Facet Fields using Filter Query
Hi,

Is it possible to restrict the values returned by facet fields, using filter queries, so that faceting groups on only those documents that pass the filter query given in the filter criteria?

I am under the assumption that fq is disjoint from the facet.field function. Let me know if my assumption is right or wrong.

Regards,
Ninad R
Re: AW: XSLT for JSON
thx for your help. the problem is that our app already has a suggest version implemented, and now i want to use a new "version" of autosuggestion, but the response format isn't the same, so it's not backward compatible. the client cannot change its use of the response format ... =( today i will try it out with velocity and the json.nl parameter. i didn't know about these options. thx =)

--
View this message in context: http://lucene.472066.n3.nabble.com/XSLT-for-JSON-tp845386p858025.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Restricting the values returned by Facet Fields using Filter Query
On 5/31/2010 12:01 PM, Ninad Raut wrote:
> Hi,
> Is it possible to restrict the values returned by facet fields, using filter queries, so that faceting groups on only those documents that pass the filter query given in the filter criteria?
>
> I am under the assumption that fq is disjoint from the facet.field function. Let me know if my assumption is right or wrong.

Hi,

Filter queries do restrict the document set used for faceting. In fact, you have to explicitly turn that off if you don't want it, by using tagging/excluding (see http://wiki.apache.org/solr/SimpleFacetParameters#LocalParams_for_faceting).

Regards,
gwk
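A sketch of the tagging/excluding request described on that wiki page, built as plain request parameters (Python; the field name `price` and tag name `pr` are invented for illustration):

```python
from urllib.parse import urlencode

# Filter on price, but exclude that filter when computing the price facet,
# so the facet still shows counts over the unfiltered document set.
params = [
    ("q", "*:*"),
    ("fq", "{!tag=pr}price:[100 TO 200]"),
    ("facet", "true"),
    ("facet.field", "{!ex=pr}price"),
]
print("select?" + urlencode(params))
```

Without the `{!ex=pr}` local param, the fq would restrict the facet counts as well, which is the default behaviour gwk describes.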
Re: strange results with query and hyphened words
Hi Markus,

> the default-config for index is: ... and for query: ...

That's not true; the default configuration for query-time processing differs. By using that setting, a search for "profi-auskunft" will match "profiauskunft". It's important to note that WordDelimiterFilterFactory's catenate* parameters should only be used in the index-time analysis stack. Otherwise the strange behaviour you mentioned (a search for profi-auskunft being translated into "profi followed by (auskunft or profiauskunft)") will occur.

Best,
Sascha
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
Ok, I use the default, i.e. the standard, request handler. Using "*:Aspergillus" does not work either. I can try with DisMax, but this means that I'd need to know all field names. My schema defines a number of them, but some other fields are defined via dynamic fields (I know the type, but I do not know their names). Is there any way to query all fields, including dynamic ones?

thanks

Olivier

2010/5/31 Michael Kuhlmann:
> Querying "Aspergillus" without a field only works if you're using the DisMax handler.
>
> Do you have a field "all"?
>
> Try "*:Aspergillus" instead.
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
On 31.05.2010 12:36, olivier sallou wrote:
> Is there any way to query all fields including dynamic ones?

Yes, using the *:term query. (Please note that the asterisk should not be quoted.)

To answer your question, we need more details on your Solr configuration, esp. the part of schema.xml that defines your "note" field.

Greetings,
Michael
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
I finally got a solution. As I use dynamic fields, I use copyField to a global indexed attribute, and specify this attribute as the defaultSearchField in my schema. The *:term query with the "standard" query type fails without this...

This solution requires duplicating the indexed data, but it works in all cases...

In my schema I have: ... Some other fields are "lowercase" or "int" types.

Regards

2010/5/31 Michael Kuhlmann:
> Yes, using the *:term query. (Please note that the asterisk should not be quoted.)
>
> To answer your question, we need more details on your Solr configuration, esp. the part of schema.xml that defines your "note" field.
>
> Greetings,
> Michael
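The copyField catch-all approach can be simulated like this: every field's text, dynamic fields included, is merged into one searchable "all" field at index time (Python sketch, illustrative only; the field names are invented):

```python
def build_all_field(doc):
    """Simulate a copyField catch-all: merge every field's text, including
    dynamic fields whose names aren't known in advance, into one field."""
    tokens = []
    for value in doc.values():
        tokens.extend(str(value).lower().split())
    return tokens

doc = {"note": "Aspergillus niger", "strain_s": "ATCC 1015"}  # strain_s: a dynamic field
print("aspergillus" in build_all_field(doc))  # True
```

This is why the approach doubles the indexed data: every value is stored both in its own field and in the catch-all.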
Re: strange results with query and hyphened words
Sorry Markus, I mixed up the index and query fields in analysis.jsp. In fact, I meant that a search for profiauskunft matches profi-auskunft.

I'm not sure whether the case you are dealing with (a search for profi-auskunft should match profiauskunft) is appropriately addressed by the WordDelimiterFilter. What about using the PatternReplaceCharFilter at query time to eliminate all intra-word hyphens?

-Sascha
Question about specifying the query analysis at query time
Hey there,

I am facing a problem related to query analysis and stopwords. I have some ideas how to sort it out, but would like to do it in the cleanest way possible. I am using dismax and I query 3 fields. These fields are defined as "text" this way: ... stopword.txt has the same words in the index and query analyzers.

The thing is, in some search requests (not all of them) I want to add some extra stopwords at query time. The 3 fields would have the same extra stopwords. I want these extra stopwords to be indexed, but some searches should never find them. All documents would be indexed with the same analyzer, but I want a different one at search time depending on a defined criterion, which I already know before executing the query.

What would be the best way to do this?

Thanks in advance
How bad is stopping Solr with SIGKILL?
Hi folks, I had a Solr instance (in Jetty on Linux) taken down by a process monitoring tool (God) with a SIGKILL recently. How bad is this? Can it cause index corruption if it's in the middle of indexing something? Or will it just lose uncommitted changes? What if the signal arrives in the middle of the commit process? Unfortunately I can't tell exactly what it was doing at the time as someone's deleted the logfile :-( Thanks, Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/How-bad-is-stopping-Solr-with-SIGKILL-tp858119p858119.html Sent from the Solr - User mailing list archive at Nabble.com.
AW: strange results with query and hyphened words
> Sorry Markus, I mixed up the index and query fields in analysis.jsp. In fact, I meant that a search for profiauskunft matches profi-auskunft.
>
> I'm not sure whether the case you are dealing with (a search for profi-auskunft should match profiauskunft) is appropriately addressed by the WordDelimiterFilter.

ok, seems like this is the point.

> What about using the PatternReplaceCharFilter at query time to eliminate all intra-word hyphens?

ok, that would be a way. i thought that catenateWords would help at this point, but it doesn't. i wonder then what's the difference between a pattern replacement and catenateWords.

markus
AW: strange results with query and hyphened words
> I'm not sure whether the case you are dealing with (a search for profi-auskunft should match profiauskunft) is appropriately addressed by the WordDelimiterFilter. What about using the PatternReplaceCharFilter at query time to eliminate all intra-word hyphens?

maybe it would be best to have solr search for "profi-auskunft" OR "profiauskunft" if i have "profi-auskunft" as the query. maybe it is not a good idea to remove the hyphen at all.

markus
Re: Restricting the values returned by Facet Fields using Filter Query
Maybe what you're looking for is facet.mincount=1?

Erik

On May 31, 2010, at 6:01 AM, Ninad Raut wrote:
> Hi,
> Is it possible to restrict the values returned by facet fields, using filter queries, so that faceting groups on only those documents that pass the filter query given in the filter criteria?
>
> I am under the assumption that fq is disjoint from the facet.field function. Let me know if my assumption is right or wrong.
>
> Regards,
> Ninad R
Re: TikaEntityProcessor not working?
It is a file. Only the filename is stored in the database.

Brad

On May 31, 2010, at 2:59 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> BinFileDataSource will only work with files. Try FieldStreamDataSource.
Re: Restricting the values returned by Facet Fields using Filter Query
Hi, I tried a small POC and found that filter queries do restrict the document set used for "Group By" on a facet field. It will help me restrict documents on some other filters when I am grouping on the multi-valued "Buzz" field. Thanks Gijs and Erik.

Regards, Ninad R

On 5/31/10, Erik Hatcher wrote:
> Maybe what you're looking for is facet.mincount=1 ?
>
> Erik
>
> On May 31, 2010, at 6:01 AM, Ninad Raut wrote:
>
>> Hi,
>>
>> Is it possible to restrict the values returned by Facet Fields using
>> Filter Queries to Group on only those documents will pass the filter
>> query passed in filter criteria??
>>
>> I am under the assumption that fq is disjoint from facet.field
>> function. Let me know if my assumptions are right or wrong.
>>
>> Regards,
>> Ninad R
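Concretely, the kind of request discussed in this thread might look like the following (the core URL and the field names sector and buzz are illustrative):

```
http://localhost:8983/solr/select?q=*:*&fq=sector:IT&facet=true&facet.field=buzz&facet.mincount=1
```

The fq restricts the document set before facet counts are computed, and facet.mincount=1 then hides facet values with no matches in that restricted set.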
Re: Sites with Innovative Presentation of Tags and Facets
On 5/31/2010 11:50 AM, gwk wrote:

On 5/31/2010 11:29 AM, Geert-Jan Brits wrote:

May I ask how you implemented getting the facet counts for each interval? Do you use a facet-query per interval? And perhaps, for inspiration, a link to the site where you implemented this.. Thanks, Geert-Jan

I love the idea of a sparkline at the range-sliders. I think if I have time, I might add them to the range sliders on our site. I already have all the data, since I show the count for a range while the user is dragging, by storing the facet counts for each interval in javascript.

Hi,

Sorry, seems I pressed send halfway through my mail and forgot about it. The site I implemented my numerical range faceting on is http://www.mysecondhome.co.uk/search.html and I got the facets by making a small patch for Solr (https://issues.apache.org/jira/browse/SOLR-1240) which does for numbers what date faceting does for dates. The biggest issue with range-faceting is the double counting of edges (which also happens in date faceting, see https://issues.apache.org/jira/browse/SOLR-397). My patch deals with that by adding an extra parameter which allows you to specify which end of the range query should be exclusive. A secondary issue is that you can't do filter queries with one end inclusive and one end exclusive (e.g. price:[500 TO 1000}). You can get around this by doing "price:({500 TO 1000} OR 500)". I've looked into the JavaCC code of Lucene to see if I could fix it so you could mix [] and {}, but unfortunately I'm not familiar enough with it to get it to work.

Regards,

gwk

Hi,

I was supposed to work on something else but I just couldn't resist, and just implemented some bar-graphs for the range sliders and I really like it. In my case it was really easy, all the data was already right there in javascript so it's not causing additional server-side load. It's also really nice to see the graph updating when a facet is selected/changed.

Regards, gwk
Re: Sites with Innovative Presentation of Tags and Facets
On 5/31/2010 4:24 PM, gwk wrote: [snip]
Regards, gwk (Tried attaching an image, but it didn't work, so here it is: http://img249.imageshack.us/img249/7766/faceting.png)
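gwk's workaround for mixed-bound ranges can be wrapped in a small helper (a sketch; the field name is illustrative):

```java
public class RangeFilterExample {
    // price:[500 TO 1000} is not parseable by the query parser, but an
    // exclusive range OR-ed with its own lower bound is equivalent:
    // inclusive on the left, exclusive on the right.
    static String halfOpenRange(String field, int minInclusive, int maxExclusive) {
        return field + ":({" + minInclusive + " TO " + maxExclusive + "} OR " + minInclusive + ")";
    }

    public static void main(String[] args) {
        System.out.println(halfOpenRange("price", 500, 1000));
        // prints price:({500 TO 1000} OR 500)
    }
}
```

The resulting string can then be passed as an fq parameter.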
deleteDocByID
Hello. i have a little problem synchronizing my index with my database. we have an extra table for delta-import. we cannot use a modified-field =( so, in this delta-table all ids to update are saved. after a delta-import this table should be emptied. this works fine. but when an item is deleted i save this in my delta-table with a flag "is_deleted". how can i add the row $deleteDocById ? i use a script for docBoosting and i thought to solve this problem the same way, but it doesn't want to work =( this is my script:

function DeleteDoc(row){
    var is_deleted = row.get('is_deleted');
    var id = row.get('update_id');
    if(is_deleted == true){
        row.put('$deleteDocById', 'id');
    }
}

can i use this script in my normal delta-import? or should i create a new entity? what are your solutions? what do you think is the smartest way to delete these docs from the index?

thx =) =) =)

-- View this message in context: http://lucene.472066.n3.nabble.com/deleteDocByID-tp858903p858903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
: > Is there any way to query all fields including dynamic ones?
:
: Yes, using the *:term query. (Please note that the asterisk should not
: be quoted.)

uh ...no, completely incorrect. you can not use "*" to denote 'all fields' in that way. there is no syntax for "find this term in any field" ... every query involves a field of some kind.

if you want to be able to query against the text of "all" fields you need to use copyField to create some kind of "all" or "allText" field (you can name it whatever you want)

-Hoss
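A minimal sketch of the copyField approach Hoss describes (field and type names are illustrative, not from the thread):

```xml
<!-- schema.xml: a catch-all field everything is copied into -->
<field name="allText" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- copy every field, dynamic ones included, into it -->
<copyField source="*" dest="allText"/>
```

Queries can then target allText:term explicitly, or the schema's defaultSearchField can point at allText so bare terms search it.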
Re: deleteDocByID
oh sry, i solve it with this entity: ;) thx nabble -- View this message in context: http://lucene.472066.n3.nabble.com/deleteDocByID-tp858903p858951.html Sent from the Solr - User mailing list archive at Nabble.com.
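For readers landing on this thread: note that the script in the original question puts the literal string 'id' into the row rather than the variable. A corrected sketch (the row object here mimics the map-like row that the DIH ScriptTransformer passes in; the column names follow the question):

```javascript
function deleteDoc(row) {
  var isDeleted = row.get('is_deleted');
  var id = row.get('update_id');
  if (isDeleted == true) {
    // pass the variable id, not the quoted string 'id'
    row.put('$deleteDocById', id);
  }
  return row;
}
```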
Re: Does SOLR Allow q= (A or B) AND (C or D)?
http://lucene.472066.n3.nabble.com/file/n859016/qa.writepublic.com.xml qa.writepublic.com.xml

I modified your schema.xml. You need to restart jetty and re-index your documents. After that, in the solr admin page, if you search prefix_full:"george clo" or prefix_token:(george clo) you will get your documents to use in suggestions. After trying this, can you tell us if this is what you were looking for?

-- View this message in context: http://lucene.472066.n3.nabble.com/Does-SOLR-Allow-q-A-or-B-AND-C-or-D-tp849703p859016.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Luke browser does not show non-String Solr fields?
Thanks for the suggestion. I tried 0.9.9.1 but saw the same problem. I didn't see 0.9.9 on their download page. On Mon, May 31, 2010 at 2:39 AM, Ahmet Arslan wrote: > >> Solr 1.4 >> >> > You haven't identified the version of Luke you're >> using. >> >> Luke 1.0.1 (2010-04-01) >> > > I think with solr you need to use Release 0.9.9.1 or 0.9.9 > Because solr 1.4.0 uses lucene 2.9.1 > > > >
Re: Luke browser does not show non-String Solr fields?
> Thanks for the suggestion. I tried
> 0.9.9.1 but saw the same problem.
> I didn't see 0.9.9 on their download page.

http://www.getopt.org/luke/ has the 0.9.9 version. But that may not be the issue. I suspect the trie-based fields are causing this, because they index each value at various levels of precision. Do you have problems with types other than the trie-based ones (tint, tdate, etc.)?
Re: Luke browser does not show non-String Solr fields?
: 1. Queries like "id:123" which work fine in /solr/admin web interface but
: returns nothing in Luke. Query "*:*" returns all records fine in Luke. I
: expect Luke returns the same result as /solr/admin since it's essentially
: a Lucene query?

you haven't told us what fieldtype you are using for the "id" field -- but i'm going to go out on a limb and guess it's TrieIntFieldType (or possibly a SortedIntFieldType) ... those field types encode their values in such a way that they sort lexicographically and produce faster range queries -- if Luke doesn't know about that special encoding, it can't search on them (or even display the terms properly)

Luke has a "view terms" feature right? ... look at the raw terms in your "id" field and i bet you'll see they look nothing like numbers -- and that's why you can't search on them as numbers in Luke

(when you search on them in Solr, Solr knows about your schema, and knows about your field types, and can do the proper encoding/decoding)

-Hoss
newbie question on how to batch commit documents
I have a newbie question on what is the best way to batch add/commit a large collection of document data via solrj. My first attempt was to write a multi-threaded application that did the following.

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (Widget w : widgets) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", w.getId());
    doc.addField("name", w.getName());
    doc.addField("price", w.getPrice());
    doc.addField("category", w.getCat());
    doc.addField("srcType", w.getSrcType());
    docs.add(doc);

    // commit docs to solr server
    server.add(docs);
    server.commit();
}

And I got this exception.

org.apache.solr.common.SolrException: Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)

The solrj wiki/documents seemed to indicate that this happened because multiple threads were calling SolrServer.commit(), which in turn called CommonsHttpSolrServer.request(), resulting in multiple searchers. My first thought was to change the configs for autowarming. But after looking at the autowarm params, I am not sure what can be changed, or perhaps a different approach is recommended. Your help is much appreciated.
Re: Luke browser does not show non-String Solr fields?
I submitted a patch a few months back for a Solr Document Inspector which allows one to see the indexed values for any document in a Solr index (https://issues.apache.org/jira/browse/SOLR-1837). This is more or less a port of Luke's DocumentReconstructor into Solr, but the tool additionally has access to all the solr schema/field type information for display purposes (i.e. Trie fields are human-readable). This won't help you search for values in an index or inspect anything at a macro level (i.e. term counts across the index), but there are other tools in Solr for that. Given a UniqueID, however, you can view all the indexed values for each field in that particular document. You can always do a search within Solr for the values you are looking for and then use this tool to view the indexed values for any documents which match. This may or may not help you (I can't tell what problem you are trying to solve), but I thought it would be worth mentioning as one tool in your toolbox.

-Trey
NPE error when extending DefaultSolrHighlighter
I was looking at SOLR-386 and thought I would try to create a custom highlighter for something I was doing. I created a class that looks something like this:

public class CustomOutputHighlighter extends DefaultSolrHighlighter {
    @Override
    public NamedList doHighlighting(DocList docs, Query query, SolrQueryRequest req, String[] defaultFields) throws IOException {
        NamedList highlightedValues = super.doHighlighting(docs, query, req, defaultFields);
        // do more stuff here
        return highlightedValues;
    }
}

and have replaced the line in my solrconfig xml so that it looks something like this: and left all the existing default highlighting parameters as-is.

The code compiles with no problem, and should simply perform the normal highlighting (since all I am doing is calling the original doHighlighting code and returning the results). However, when I start Solr, I get an NPE:

java.lang.NullPointerException
    at org.apache.solr.highlight.DefaultSolrHighlighter.init(DefaultSolrHighlighter.java:75)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:437)
    at org.apache.solr.core.SolrCore.initHighLighter(SolrCore.java:612)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:558)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

It doesn't seem to even call my custom highlighter (I put a breakpoint in which did not get hit). Any ideas as to what I am doing wrong? If I use the default highlighter, I don't get this error and have no problems.

I am using a copy of 1.5.0-dev solr ($Id: CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)

thanks for any advice

-- View this message in context: http://lucene.472066.n3.nabble.com/NPE-error-when-extending-DefaultSolrHighlighter-tp859670p859670.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: newbie question on how to batch commit documents
Move the commit outside your loop and you'll be in better shape. Better yet, enable autocommit in solrconfig.xml and don't commit from your multithreaded client, otherwise you still run the risk of too many commits happening concurrently.

Erik

On May 31, 2010, at 5:27 PM, Steve Kuo wrote: [snip]
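The shape of Erik's fix — buffer documents, add in batches, commit once after the loop — can be sketched with stand-ins for the SolrJ calls (addBatch and commit here just count invocations; in real code they would be server.add(docs) and server.commit()):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {
    static final int BATCH_SIZE = 100;
    static int addCalls = 0, commitCalls = 0;

    // stand-ins for server.add(docs) and server.commit()
    static void addBatch(List<String> docs) { addCalls++; }
    static void commit() { commitCalls++; }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<String>();
        for (int i = 0; i < 250; i++) {          // 250 "widgets"
            docs.add("doc-" + i);
            if (docs.size() == BATCH_SIZE) {     // flush a full batch
                addBatch(docs);
                docs.clear();
            }
        }
        if (!docs.isEmpty()) addBatch(docs);     // flush the remainder
        commit();                                // one commit, after the loop
        System.out.println(addCalls + " adds, " + commitCalls + " commit");
        // prints 3 adds, 1 commit
    }
}
```

With autocommit enabled in solrconfig.xml, even the final commit() can be dropped from the client.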
Re: Storing different entities in Solr
Thanks for the replies guys. I am not at work so I don't have the exact schema but here's what it roughly looks like: Request: == id client_id pm_id pm2_id title Advisor: == id person_id address_id bio sector (IT, doctor, etc) There's another table. RequestAdvisor: = id advisor_id request_id The idea of adding a prefix to primary keys does sound good. I can just do advisor_123 and request_12345. On Sun, May 30, 2010 at 9:22 PM, Bill Au wrote: > There is only one primary key in a single index. If the id of your > different document types do collide, you can simply add a prefix or suffix > to make them unique. > > Bill > > On Fri, May 28, 2010 at 1:12 PM, Moazzam Khan wrote: > >> Thanks for all your answers guys. Requests and consultants have a many >> to many relationship so I can't store request info in a document with >> advisorID as the primary key. >> >> Bill's solution and multicore solutions might be what I am looking >> for. Bill, will I be able to have 2 primary keys (so I can update and >> delete documents)? If yes, can you please give me a link or someting >> where I can get more info on this? >> >> Thanks, >> Moazzam >> >> >> >> On Fri, May 28, 2010 at 11:50 AM, Bill Au wrote: >> > You can keep different type of documents in the same index. If each >> > document has a type field. You can restrict your searches to specific >> > type(s) of document by using a filter query, which is very fast and >> > efficient. >> > >> > Bill >> > >> > On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin < >> > knagelb...@globeandmail.com> wrote: >> > >> >> Multi-core is an option, but keep in mind if you go that route you will >> >> need to do two searches to correlate data between the two. 
>> >> >> >> -Kallin Nagelberg >> >> >> >> -Original Message- >> >> From: Robert Zotter [mailto:robertzot...@gmail.com] >> >> Sent: Friday, May 28, 2010 12:26 PM >> >> To: solr-user@lucene.apache.org >> >> Subject: Re: Storing different entities in Solr >> >> >> >> >> >> Sounds like you'll want to use a multiple core setup. One core fore each >> >> type >> >> of "document" >> >> >> >> http://wiki.apache.org/solr/CoreAdmin >> >> -- >> >> View this message in context: >> >> >> http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> > >> >
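The prefixing scheme discussed in this thread can be sketched as follows (the doc_type field name is an assumption, standing in for whatever type field the schema uses):

```java
public class DocIds {
    // build a collision-free uniqueKey for documents from different tables
    static String docId(String type, long dbId) {
        return type + "_" + dbId; // e.g. advisor_123, request_12345
    }

    // filter query restricting a search to one entity type
    static String typeFilter(String type) {
        return "doc_type:" + type;
    }

    public static void main(String[] args) {
        System.out.println(docId("advisor", 123));   // advisor_123
        System.out.println(typeFilter("request"));   // doc_type:request
    }
}
```

The type filter would be sent as an fq parameter, which Bill notes is fast and efficient since filter queries are cached separately.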
Re: NPE error when extending DefaultSolrHighlighter
(10/06/01 6:45), Gerald wrote: [snip]

Try to put a constructor that has an argument SolrCore:

public CustomOutputHighlighter(SolrCore core) {
    super(core);
}

Koji
--
http://www.rondhuit.com/en/
Re: Luke browser does not show non-String Solr fields?
The id field has type "long" in schema.xml. In Luke, they are shown as "hex dump". When viewing a doc (returned by *:*), I pick the ID field and press the "Show" button, Luke pops up a dialog that allows me to change the "Show Content As" value. When I choose "Number", I get an error message: "Some values could not be properly represented in this format. They are marked in grey and presented as a hex dump." So it seems like Luke does not understand Solr's long type. This is not a native Lucene type? On Mon, May 31, 2010 at 9:52 AM, Chris Hostetter wrote: > > : 1. Queries like "id:123" which work fine in /solr/admin web interface but > : returns nothing in Luke. Query "*:*" returns all records fine in Luke. I > : expect Luke returns the same result as /solr/admin since it's essentially > : a Lucene query? > > you haven't told us what fieldtype you are using for the "id" field -- but > i'm going to go out on a limb and guess it's TrieIntFieldType (or possibly > a SortedIntFieldType) ... those field types encode their values in such a > way that they sort lexigraphicaly and produce faster range queries -- if > Luke doesn't kow about that special encoding, it can search on them (or > even display the terms properly) > > Luke has a "view terms" feature right? ... look at the raw terms in your > "id" ifeld and i bet you'll see they look nothing like numbers -- and > that's why you can search on them as numbers in Luke > > (when you serach on them in Solr, SOlr knows about your schema, and knows > about your field types, and can do the proper encoding/decoding) > > > > -Hoss > >
Re: newbie question on how to batch commit documents
Add commit after the loop. I would advise to use commit in a separate thread. I do keep separate timer thread, where every minute I will do commit and at the end of every day I will optimize the index. Regards Aditya www.findbestopensource.com On Tue, Jun 1, 2010 at 2:57 AM, Steve Kuo wrote: > I have a newbie question on what is the best way to batch add/commit a > large > collection of document data via solrj. My first attempt was to write a > multi-threaded application that did following. > > Collection docs = new ArrayList(); > for (Widget w : widges) { >doc.addField("id", w.getId()); >doc.addField("name", w.getName()); > doc.addField("price", w.getPrice()); >doc.addField("category", w.getCat()); >doc.addField("srcType", w.getSrcType()); >docs.add(doc); > >// commit docs to solr server >server.add(docs); >server.commit(); > } > > And I got this exception. > > rg.apache.solr.common.SolrException: > > Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later > > > Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later > >at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424) >at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243) >at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) >at > org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86) > > The solrj wiki/documents seemed to indicate that because multiple threads > were calling SolrServer.commit() which in term called > CommonsHttpSolrServer.request() resulting in multiple searchers. My first > thought was to change the configs for autowarming. But after looking at > the > autowarm params, I am not sure what can be changed or perhaps a different > approach is recommened. 
> <filterCache
>     class="solr.FastLRUCache"
>     size="512"
>     initialSize="512"
>     autowarmCount="0"/>
>
> <queryResultCache
>     class="solr.LRUCache"
>     size="512"
>     initialSize="512"
>     autowarmCount="0"/>
>
> <documentCache
>     class="solr.LRUCache"
>     size="512"
>     initialSize="512"
>     autowarmCount="0"/>
>
> Your help is much appreciated.
>