any docs on using the GeoHashField?
looking at http://wiki.apache.org/solr/SpatialSearchDev I would think I could index a lat,lon pair into a GeoHashField (that works) and then retrieve the field value to see the computed geohash. However, that doesn't seem to work.

If I index: 21.4,33.5 the retrieved value is not a hash, but approximately the same lat,lon: 21.4001527369,33.498472631

If I try to filter on a geohash, &fq=geos_test:sezcd* that works, so I guess the hash is stored internally. What am I missing - how can I retrieve the hash?

-Peter

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
Re: any docs on using the GeoHashField?
When I retrieve the value the lat/lon pair that comes out is not exactly the same as what I indexed, which made me think it was actually stored as the hash and then transformed back?

Anyhow - I'm trying to understand the actual use case for the field as it exists - essentially you are saying I could query with a geohash and use data in this field type to do a distance-based filter from the lat,lon point corresponding to the geohash?

-Peter

On Thu, Sep 8, 2011 at 5:34 PM, Chris Hostetter wrote: > > : I would think I could index a lat,lon pair into a GeoHashField (that > : works) and then retrieve the field value to see the computed geohash. > ... > : What am I missing - how can I retrieve the hash? > > I don't think it's designed to work that way. > > GeoHashField provides GeoHash based search support for lat/lon values > through its internal (indexed) representation -- much like TrieLongField > provides efficient range queries using trie encoding -- but the "stored" > value is still the lat/lon pair (just as a TrieLongField is still the long > value) > > If you want to store/retrieve a raw GeoHash string, I think you have to > compute it yourself (or put the logic in an UpdateProcessor). > > org.apache.lucene.spatial.geohash.GeoHashUtils should take care of all the > heavy lifting for you. > > -Hoss

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
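For reference, computing the hash on the client side with the GeoHashUtils class Hoss points to would look roughly like this - a minimal sketch against the Lucene 3.x spatial contrib, using the values from the original message (the class name here is just for illustration):

import org.apache.lucene.spatial.geohash.GeoHashUtils;

public class GeoHashDemo {
    public static void main(String[] args) {
        // encode the lat,lon pair before (or instead of) sending it to Solr
        String hash = GeoHashUtils.encode(21.4, 33.5);
        System.out.println(hash); // should start with "sezcd", matching the fq above

        // decoding back out is lossy at the hash's fixed precision, which is
        // consistent with the retrieved value being 21.4001527369,33.498472631
        double[] latLon = GeoHashUtils.decode(hash);
        System.out.println(latLon[0] + "," + latLon[1]);
    }
}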
Re: Setting up Solr 3.4 example with Tomcat 7
I've seen a number of users fail to get Solr working correctly in combination with the Drupal client code when using the .deb installer so I have been strongly recommending against it personally. It's also a rather stale version of Solr, generally. -Peter On Sun, Oct 2, 2011 at 4:04 AM, Gora Mohanty wrote: > On Sun, Oct 2, 2011 at 12:22 PM, Stardrive Engineering > wrote: >> Thanks. Since Tomcat and Solr are running already Tomcat oriented samples to >> quickly get up to >> speed would be good to have next. > > I think that the issue is that Jetty is small, and easy to embed and get > running, which is why it is packaged along with Solr. > >> What do you think of this >> site, is it up to date and worth learning? The site seems to get cut off >> prematurely, are there more tutorials of this kind? >> >> http://synapticloop.com/tomes/solr/solr-tutorial/solr-from-whoa-to-go/ > > Just skimmed through this part, > http://synapticloop.com/tomes/solr/solr-tutorial/the-base-solr-install/ > and it looks reasonable. > > What operating system are you using? Some of them, e.g., Debian and > Ubuntu, have packages for Solr (though, probably version 1.4) > running in Tomcat. It might be easiest to look for such a package > for your OS. > > Regards, > Gora > -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com";
Retrieving matched tokens and their payload?
A colleague came to me with a problem that intrigued me. I can see partly how to solve it with Solr, but I'm looking for insight into solving the last step.

The problem:

1) Start from a set of text transcriptions of videos where there is a timestamp associated with each word.
2) Index into Solr with analysis including stemming, so that a user can search for videos based on keywords.
3) When the user clicks into a single video in the search result, retrieve from the corresponding doc in Solr the timestamps of all words matching the keyword(s) (including stemming).

So, obviously #1 and #2 are easy. As part of #2 it would seem one could use the DelimitedPayloadTokenFilterFactory to index the timestamp as a payload for each word. I don't want the payload to influence score, but my understanding is that by default it will not.

Ok, so now for the harder part. For #3 it would seem I need something roughly like the highlighter - to return each matching word and the payload which is the timestamp. I'm not seeing any existing request handler or component that would do this. Is there an easy way to retrieve the indexed words (or analyzed tokens) and their payload?

Thanks,

-Peter

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
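Since no stock component does step #3, the retrieval would have to be custom code against the raw Lucene API. A hedged sketch only (Lucene 3.x, assuming the timestamps were indexed as payloads via DelimitedPayloadTokenFilterFactory with its "identity" encoder; the field name "transcript" is made up):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PayloadDump {
    // print every (docId, timestamp) pair recorded for one analyzed token
    static void dumpPayloads(IndexReader reader, String token) throws IOException {
        TermPositions tp = reader.termPositions(new Term("transcript", token));
        while (tp.next()) {
            for (int i = 0; i < tp.freq(); i++) {
                tp.nextPosition();
                if (tp.isPayloadAvailable()) {
                    byte[] bytes = tp.getPayload(new byte[tp.getPayloadLength()], 0);
                    // the identity encoder stores the raw bytes of the text after
                    // the delimiter, i.e. the timestamp string itself
                    System.out.println(tp.doc() + " @ " + new String(bytes, "UTF-8"));
                }
            }
        }
        tp.close();
    }
}

Wrapping something like this in a custom SearchComponent (analogous to the highlighter) would let the timestamps come back in the response.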
Re: Lucene/Solr
Assuming you are using Drupal for the website, you can have Solr set up and integrated with Drupal in < 5 minutes for local development purposes. See: https://drupal.org/node/1358710 for a pre-configured download. -Peter On Mon, Dec 5, 2011 at 11:46 AM, Achebe, Ike, JCL wrote: > Hi, > My name is Ike Achebe and I am a Developer Analyst with the Johnson County > Library. I'm actually researching better and less expensive alternatives to > "Google Appliance Search " , which is currently our search engine. > Fortunately, I have come across a variety of blogs recommending Lucene/Solr > as one of the best if not the best open source search engine. > Fortunately, I have read a few articles and documentations about Solr, > however , I'm still in awe as to how to go about installing and integrating > this search engine. > Could you in simple terms intimate me on how to go about acquiring or > subscribing to solr? > Thank You. > Sincerely, > Ike Achebe -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com";
Polish language support?
In IRC trying to help someone find Polish-language support for Solr. Seems lucene has nothing to offer? Found one stemmer that looks to be compatibly licensed in case someone wants to take a shot at incorporating it: http://www.getopt.org/stempel/ -Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
access control for spellcheck suggestions?
We have a content access control system that works well for the actual search results, but we see that the spellcheck suggestions include words that are not within the set of documents the current user is allowed to access. Does anyone have an approach to this problem for Solr 1.4.x? Anything new in Solr trunk to address this? Maybe spellcheck..key? Is there something in the Solr API that lets us control which spellcheck index a certain document goes into at index time, since one approach might be to at least obey some gross access control rules per user role by having multiple spellcheck indexes. -Peter -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 978-296-5247 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com";
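One sketch of the multiple-index idea, assuming gross per-role filtering is acceptable: solrconfig.xml can declare several dictionaries in one SpellCheckComponent, each built from a different field, and each request picks one with spellcheck.dictionary. The role names and fields below are made up - you would still have to route each document's text into the right spell_role_* field at index time:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">role_editor</str>
    <str name="field">spell_role_editor</str>
    <str name="spellcheckIndexDir">./spellchecker_role_editor</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">role_anonymous</str>
    <str name="field">spell_role_anonymous</str>
    <str name="spellcheckIndexDir">./spellchecker_role_anonymous</str>
  </lst>
</searchComponent>

and then query with &spellcheck=true&spellcheck.dictionary=role_anonymous for anonymous users.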
Re: access control for spellcheck suggestions?
Thanks for the info - I'll try out this patch. -Peter On Thu, Oct 7, 2010 at 10:43 AM, Dyer, James wrote: > Look at SOLR-2010 which has patches for 1.4.1 and trunk. It works with the > spellcheck "collate" functionality and ensures that collations are returned > only if they can result in hits if requeried (it tests each collation with > any "fq" you put on the original query). This would effectively prevent > users from seeing sensitive data in their spell suggestions. > > James Dyer > E-Commerce Systems > Ingram Content Group > (615) 213-4311 > > > -Original Message- > From: Peter Wolanin [mailto:peter.wola...@acquia.com] > Sent: Thursday, October 07, 2010 9:00 AM > To: solr-user@lucene.apache.org > Subject: access control for spellcheck suggestions? > > We have a content access control system that works well for the actual > search results, but we see that the spellcheck suggestions include > words that are not within the set of documents the current user is > allowed to access. Does anyone have an approach to this problem for > Solr 1.4.x? Anything new in Solr trunk to address this? Maybe > spellcheck..key? > > Is there something in the Solr API that lets us control which > spellcheck index a certain document goes into at index time, since one > approach might be to at least obey some gross access control rules per > user role by having multiple spellcheck indexes. > > -Peter > > -- > Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com : 978-296-5247 > > "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"; > -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 978-296-5247 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com";
mergePolicy element format change in 3.6 vs 3.5?
Trying to maintain the Drupal integration module across multiple versions of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this change to solrconfig:

-  <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
+  <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>

I don't see this mentioned in the release notes - is the second format useable with 3.5, 3.4, etc?

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
Re: mergePolicy element format change in 3.6 vs 3.5?
Ok, thanks for the info. As long as the second one works, we can just use that. I just verified that it works for 3.5 at least.

-Peter

On Fri, Apr 13, 2012 at 1:12 PM, Michael Ryan wrote: > It looks like the first format was removed in 3.6 as part of > https://issues.apache.org/jira/browse/SOLR-1052. The second format works > in all 3.x versions. > > -Michael > > -Original Message- > From: Peter Wolanin [mailto:peter.wola...@acquia.com] > Sent: Friday, April 13, 2012 12:32 PM > To: solr-user@lucene.apache.org > Subject: mergePolicy element format change in 3.6 vs 3.5? > > Trying to maintain the Drupal integration module across multiple versions > of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this > change to solrconfig:
>
> -  <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
> +  <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
>
> I don't see this mentioned in the release notes - is the second format > useable with 3.5, 3.4, etc?

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
Re: Highlighting words with non-ascii chars
Does your servlet container have the URI encoding set correctly, e.g. URIEncoding="UTF-8" for tomcat6? http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Older versions of Jetty use ISO-8859-1 as the default URI encoding, but jetty 6 should use UTF-8 as default: http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings

-Peter

On Sat, Apr 30, 2011 at 6:31 AM, Pavel Kukačka wrote: > Hello, > > I've hit a (probably trivial) roadblock I don't know how to overcome > with Solr 3.1: > I have a document with common fields (title, keywords, content) and I'm > trying to use highlighting. > With queries using ASCII characters there is no problem; it works > smoothly. However, > when I search using a czech word including non-ascii chars (like "slovíčko" > for example - > http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*), > the document is found, but > the response doesn't contain the highlighted snippet in the highlighting node > - there is only an > empty node - like this:
>
> .
> .
> .
> <lst name="highlighting">
>   <lst name="..."/>
> </lst>
>
> When searching for the other keyword ( > http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*), > the resulting response is fine - like this:
>
> <lst name="highlighting">
>   <lst name="...">
>     <arr name="...">
>       <str>... slovíčko ... <em>slovo</em> ...</str>
>     </arr>
>   </lst>
> </lst>
>
> Did anyone come across this problem? > Cheers, > Pavel > > >

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 978-296-5247 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
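For reference, the Tomcat side of that fix is a one-attribute change on the HTTP connector in server.xml (the other attributes here are just the stock ones):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>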
what data type for geo fields?
Looking at the example schema:

http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml

the solr.PointType field type uses double (is this just an example field, or used for geo search?), while the solr.LatLonType field uses tdouble and it's unclear how the geohash is translated into lat/lon values or if the geohash itself might typically be used as a copyfield and used just for matching a query on a geohash?

Is there an advantage in terms of speed to using Trie fields for solr.LatLonType? I would assume so, e.g. for bbox operations.

Thanks,

Peter

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 978-296-5247 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
Re: what data type for geo fields?
Thanks for the feedback. I'll look more at how geohash works.

Looking at the sample schema more closely, I see:

<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

So in fact "double" is also Trie, but just with precisionStep 0 in the example.

-Peter

On Wed, Jul 27, 2011 at 9:57 AM, Yonik Seeley wrote: > On Wed, Jul 27, 2011 at 9:01 AM, Peter Wolanin > wrote: >> Looking at the example schema: >> >> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml >> >> the solr.PointType field type uses double (is this just an example >> field, or used for geo search?) > > While you could possibly use PointType for geo search, it doesn't have > good support for it (it's more of a general n-dimension point) > The LatLonType has all the geo support currently. > >>, while the solr.LatLonType field uses >> tdouble and it's unclear how the geohash is translated into lat/lon >> values or if the geohash itself might typically be used as a copyfield >> and used just for matching a query on a geohash? > > There's no geohash used in LatLonType > It is indexed as a lat and lon under the covers (using the suffix "_d") > >> Is there an advantage in terms of speed to using Trie fields for >> solr.LatLonType? > > Currently only for explicit range queries... like point:[10,10 TO 20,20] > >> I would assume so, e.g. for bbox operations. > > It's a bit of an implementation detail, but bbox doesn't currently use > range queries. > > -Yonik > http://www.lucidimagination.com >

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 978-296-5247 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
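For comparison, the stock 3.x example schema wires LatLonType up like this - the lat and lon halves land in hidden tdouble subfields via the *_coordinate dynamic field, which is the "under the covers" indexing Yonik describes:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<field name="store" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>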
Solr 4.0 - distributed updates without zookeeper?
Looking at how we could upgrade some of our infrastructure to Solr 4.0 - I would really like to take advantage of distributed updates to get NRT, but we want to keep our fixed master and slave server roles since we use different hardware appropriate to the different roles.

Looking at the solr 4.0 distributed update code, it seems really hard-coded and bound to zookeeper. Is there a way to have a solr master distribute updates without using ZK, or a way to mock the ZK interface to provide a fixed cluster topography that will work when sending updates just to the master?

To be clear, if the master goes down I don't want a slave promoted, nor do I want most of the other SolrCloud features - we have already built out a system for managing groups of servers.

Thanks,

Peter
Re: Solr 4.0 - distributed updates without zookeeper?
Yes, basically I want to at least avoid leader election and the other dynamic behaviors. I don't have any experience with ZK, and a lot of "magic" behavior seems baked in now that I'm concerned I'd need to dig into ZK to debug or monitor what's really happening as we scale out.

We also have a somewhat non-typical use case, of lots of small cores/indexes on the same server, rather than large indexes that might need multiple shards.

We have master servers that have persistent (but sometimes slower) storage, and slaves with faster non-persistent disk.

My colleague noticed that there is a param to flag a server as eligible to be a shard leader, so I guess we could enable that for only the preferred master?

I'm also having trouble understanding config handling from the docs. Even browsing the java code I don't see if Solr is creating the instance dirs, or somehow just linking to config files? It sounds as though if I create a core using core admin, it would get associated with a collection of the same name.

-Peter

On Mon, Nov 12, 2012 at 9:41 PM, Otis Gospodnetic wrote: > Hi Peter, > > Not sure I have the answer for you, but are you looking to avoid using ZK > for some reason? > Or are you OK with ZK per se, but just don't want any leader re-election > and any other dynamic/cloudy behaviour? > > Could you not simply treat 1 node as the "master" to which you send all > your updates and let SolrCloud distribute that to the rest of the cluster? > Is your main/only worry around what happens if this 1 node that you > designated as the master goes down? What would you like to happen? You'd > like indexing to start failing, while the search functionality remains up? > > Otis > -- > Search Analytics - http://sematext.com/search-analytics/index.html > Performance Monitoring - http://sematext.com/spm/index.html > > > On Sun, Nov 11, 2012 at 7:42 PM, Peter Wolanin > wrote: > >> Looking at how we could upgrade some of our infrastructure to Solr 4.0 >> - I would really like to take advantage of distributed updates to get >> NRT, but we want to keep our fixed master and slave server roles since >> we use different hardware appropriate to the different roles. >> >> Looking at the solr 4.0 distributed update code, it seems really >> hard-coded and bound to zookeeper. Is there a way to have a solr >> master distribute updates without using ZK, or a way to mock the ZK >> interface to provide a fixed cluster topography that will work when >> sending updates just to the master? >> >> To be clear, if the master goes down I don't want a slave promoted, >> nor do I want most of the other SolrCloud features - we have already >> built out a system for managing groups of servers. >> >> Thanks, >> >> Peter >>

-- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
Re: Solr 4.0 - distributed updates without zookeeper?
So, from looking at the code and talking to some of the Lucid guys today, it seems like there is no good way (currently) to control the shard leader selection, or even to "fail back" if the preferred leader server comes back up. We currently let indexing fail if the one master goes down, but adding HA there would be helpful in some cases.

-Peter

On Tue, Nov 13, 2012 at 9:12 PM, Peter Wolanin wrote: > Yes, basically I want to at least avoid leader election and the other > dynamic behaviors. I don't have any experience with ZK, and a lot of > "magic" behavior seems baked in now that I'm concerned I'd need to > dig into ZK to debug or monitor what's really happening as we scale > out. > > We also have a somewhat non-typical use case, of lots of small > cores/indexes on the same server, rather than large indexes that might > need multiple shards. > > We have master servers that have persistent (but sometimes slower) > storage, and slaves with faster non-persistent disk. > > My colleague noticed that there is a param to flag a server as > eligible to be a shard leader, so I guess we could enable that for > only the preferred master? > > I'm also having trouble understanding config handling from the docs. > Even browsing the java code I don't see if Solr is creating the > instance dirs, or somehow just linking to config files? It sounds as > though if I create a core using core admin, it would get associated > with a collection of the same name. > > -Peter > > On Mon, Nov 12, 2012 at 9:41 PM, Otis Gospodnetic > wrote: >> Hi Peter, >> >> Not sure I have the answer for you, but are you looking to avoid using ZK >> for some reason? >> Or are you OK with ZK per se, but just don't want any leader re-election >> and any other dynamic/cloudy behaviour? >> >> Could you not simply treat 1 node as the "master" to which you send all >> your updates and let SolrCloud distribute that to the rest of the cluster? >> Is your main/only worry around what happens if this 1 node that you >> designated as the master goes down? What would you like to happen? You'd >> like indexing to start failing, while the search functionality remains up? >> >> Otis >> -- >> Search Analytics - http://sematext.com/search-analytics/index.html >> Performance Monitoring - http://sematext.com/spm/index.html >> >> >> On Sun, Nov 11, 2012 at 7:42 PM, Peter Wolanin >> wrote: >> >>> Looking at how we could upgrade some of our infrastructure to Solr 4.0 >>> - I would really like to take advantage of distributed updates to get >>> NRT, but we want to keep our fixed master and slave server roles since >>> we use different hardware appropriate to the different roles. >>> >>> Looking at the solr 4.0 distributed update code, it seems really >>> hard-coded and bound to zookeeper. Is there a way to have a solr >>> master distribute updates without using ZK, or a way to mock the ZK >>> interface to provide a fixed cluster topography that will work when >>> sending updates just to the master? >>> >>> To be clear, if the master goes down I don't want a slave promoted, >>> nor do I want most of the other SolrCloud features - we have already >>> built out a system for managing groups of servers. >>> >>> Thanks, >>> >>> Peter >>> > > > > -- > Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com : 781-313-8322 > > "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com" -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com : 781-313-8322 "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
tika 0.4?
Sadly, I had to miss the meetup in NYC, but looking over the slides (http://files.meetup.com/1482573/YonikSeeley_NYCMeetup_solr14_features.pdf) I see:

Solr Cell: Integrates Apache Tika (v0.4) into Solr

My current checkout of solr still has tika 0.3, and I don't see a jira issue for updating to 0.4. Is this something that's going to be in Solr 1.4 for sure?

-Peter

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: server won't start using configs from Drupal
Looks like we better update our schema for the Drupal module - what rev of Solr incorporates this change? -Peter On Fri, Jul 24, 2009 at 8:38 AM, Koji Sekiguchi wrote: > David, > > Try to change solr.CharStreamAwareWhitespaceTokenizerFactory to > solr.WhitespaceTokenizerFactory > in your schema.xml and reboot Solr. > > Koji > > > david wrote: >> >> >> Otis Gospodnetic wrote: >>> >>> I think the problem is CharStreamAwareWhitespaceTokenizerFactory, which >>> used to live in Solr (when Drupal schema.xml for Solr was made), but has >>> since moved to Lucene. I'm half guessing. :) >>> >>> Otis >>> -- >> >> Thanks unfortunately I have no idea about Java. Do you know when that >> change was made? >> >> regards, >> >> David. >> >> >>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls >>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >>> >>> >>> >>> - Original Message From: david To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 9:59:53 PM Subject: server won't start using configs from Drupal I've downloaded solr-2009-07-21.tgz and followed the instructions at http://drupal.org/node/343467 including retrieving the solrconfig.xml and schema.xml files from the Drupal apachesolr module. The server seems to start properly with the original solrconfig.xml and schema.xml files When I try to start up the server with the Drupal supplied files, I get errors on the command line, and a 500 error from the server. solrconfig.xml http://pastebin.com/m23d14a2 schema.xml http://pastebin.com/m2e79f304 output of http://localhost:8983/solr/admin/: http://pastebin.com/m410fa74d Following looks to me like the important bits, but I'm not a java coder, so I could easily be wrong. command line extract: 22/07/2009 5:58:54 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: analyzer without class or tokenizer & filter list (plus lots of WARN messages) extract from browser at http://localhost:8983/solr/admin/ org.apache.solr.common.SolrException: Unknown fieldtype 'text' specified on field title (snip lots of stuff) org.apache.solr.common.SolrException: analyzer without class or tokenizer & filter list (snip lots of stuff) org.apache.solr.common.SolrException: Error loading class 'solr.CharStreamAwareWhitespaceTokenizerFactory' (snip lots of stuff) Caused by: java.lang.ClassNotFoundException: solr.CharStreamAwareWhitespaceTokenizerFactory Nothing in apache logs... solr logs contain this: 127.0.0.1 - - [22/07/2009:08:01:10 +] "GET /solr/admin/ HTTP/1.1" 500 10292 Any help greatly appreciated. David. >>> >> > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
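For anyone else hitting this, the schema.xml edit Koji describes is just swapping the tokenizer class inside the affected fieldType's analyzer:

<!-- before (class no longer shipped with Solr): -->
<tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>

<!-- after: -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>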
Re: "standard" requestHandler components
I just copied this information to the wiki at http://wiki.apache.org/solr/SolrRequestHandler

-Peter

On Fri, Sep 11, 2009 at 7:43 PM, Jay Hill wrote: > RequestHandlers are configured in solrconfig.xml. If no components are explicitly declared in the request handler config then the defaults are used. They are:
> - QueryComponent
> - FacetComponent
> - MoreLikeThisComponent
> - HighlightComponent
> - StatsComponent
> - DebugComponent
>
> If you wanted to have a custom list of components (either omitting defaults or adding custom) you can specify the components for a handler directly:
>
> <arr name="components">
>   <str>query</str>
>   <str>facet</str>
>   <str>mlt</str>
>   <str>highlight</str>
>   <str>debug</str>
>   <str>someothercomponent</str>
> </arr>
>
> You can add components before or after the main ones like this:
>
> <arr name="first-components">
>   <str>mycomponent</str>
> </arr>
>
> <arr name="last-components">
>   <str>myothercomponent</str>
> </arr>
>
> and that's how the spell check component can be added:
>
> <arr name="last-components">
>   <str>spellcheck</str>
> </arr>
>
> Note that a component (except the defaults) must be configured in solrconfig.xml with the name used in the str element as well.
>
> Have a look at the solrconfig.xml in the example directory (".../example/solr/conf/") for examples on how to set up the spellcheck component, and on how the request handlers are configured.
>
> -Jay
> http://www.lucidimagination.com
>
> On Fri, Sep 11, 2009 at 3:04 PM, michael8 wrote: > >> >> Hi, >> >> I have a newbie question about the 'standard' requestHandler in >> solrconfig.xml. What I like to know is where is the config information for >> this requestHandler kept? When I go to http://localhost:8983/solr/admin, >> I >> see the following info, but am curious where are the supposedly 'chained' >> components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent) >> configured for this requestHandler. I see timing and process debug output >> from these components with "debugQuery=true", so somewhere these components >> must have been configured for this 'standard' requestHandler. >> >> name: standard >> class: org.apache.solr.handler.component.SearchHandler >> version: $Revision: 686274 $ >> description: Search using components: >> >> org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent, >> stats: handlerStart : 1252703405335 >> requests : 3 >> errors : 0 >> timeouts : 0 >> totalTime : 201 >> avgTimePerRequest : 67.0 >> avgRequestsPerSecond : 0.015179728 >> >> >> What I like to do from understanding this is to properly integrate >> spellcheck component into the standard requestHandler as suggested in a >> solr >> spellcheck example. >> >> Thanks for any info in advance. >> Michael >> -- >> View this message in context: >> http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >>

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: dismax + wildcard
There are some open issues (not for 1.4 at this point) to make dismax more flexible or add wildcard handling, e.g: https://issues.apache.org/jira/browse/SOLR-756 https://issues.apache.org/jira/browse/SOLR-758 You might participate in those to try to get this in a future version and/or get a working patch for 1.4 -Peter On Wed, Nov 4, 2009 at 7:04 PM, Koji Sekiguchi wrote: > Jan Kammer wrote: >> >> Hi there, >> >> what is the best way to search all fields AND use wildcards? >> Somewhere I read that there are problems with this combination... (dismax >> + wildcard) >> > It's a feature of dismax. WildcardQuery cannot be used in dismax q > parameter. > > You can copy the "all fields" to a destination field by using > copyField, then search the destination field with wildcards > (without using dismax). > > Koji > > -- > http://www.rondhuit.com/en/ > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
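A minimal sketch of the copyField workaround Koji suggests (the all_text field name and the source fields are illustrative): aggregate the searchable fields into one destination field, then run wildcard queries against that field with the standard handler rather than dismax:

<field name="all_text" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="all_text"/>
<copyField source="body" dest="all_text"/>

e.g. q=all_text:appl* via the standard request handler.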
any docs on solr.EdgeNGramFilterFactory?
This fairly recent blog post: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ describes the use of the solr.EdgeNGramFilterFactory as the tokenizer for the index. I don't see any mention of that tokenizer on the Solr wiki - is it just waiting to be added, or is there any other documentation in addition to the blog post? In particular, there was a thread last year about using an N-gram tokenizer to enable reasonable (if not ideal) searching of CJK text, so I'd be curious to know how people are configuring their schema (with this tokenizer?) for that use case. Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
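For context, the analyzer in that blog post boils down to roughly the following (paraphrased, not verbatim from the post) - the EdgeNGram piece is a filter stacked on a tokenizer, the grams are generated only at index time, and the user's prefix is matched as a whole token at query time:

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>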
Re: any docs on solr.EdgeNGramFilterFactory?
So, this is the normal N-gram one? NGramTokenizerFactory

Digging deeper - there are actually CJK and Chinese tokenizers in the Solr codebase:

http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html

The CJK one uses the lucene CJKTokenizer http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html

and there seems to be another one even that no one has wrapped into Solr: http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html

So seems like the existing options are a little better than I thought, though it would be nice to have some docs on properly configuring these.

-Peter

On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic wrote: > Peter, > > For CJK and n-grams, I think you don't want the *Edge* n-grams, but just > n-grams. > Before you take the n-gram route, you may want to look at the smart Chinese > analyzer in Lucene contrib (I think it works only for Simplified Chinese) and > Sen (on java.net). I also spotted a Korean analyzer in the wild a few months > back. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Peter Wolanin >> To: solr-user@lucene.apache.org >> Sent: Tue, November 10, 2009 4:06:52 PM >> Subject: any docs on solr.EdgeNGramFilterFactory? >> >> This fairly recent blog post: >> >> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ >> >> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer >> for the index. I don't see any mention of that tokenizer on the Solr >> wiki - is it just waiting to be added, or is there any other >> documentation in addition to the blog post? In particular, there was >> a thread last year about using an N-gram tokenizer to enable >> reasonable (if not ideal) searching of CJK text, so I'd be curious to >> know how people are configuring their schema (with this tokenizer?) >> for that use case. >> >> Thanks, >> >> Peter >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com > >

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
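Hooking the CJK tokenizer into a schema would be a one-line fieldType - an untested sketch:

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>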
Re: any docs on solr.EdgeNGramFilterFactory?
It looks like the CJK one actually does 2-grams plus a little separate processing on Latin text.

That's kind of interesting - in general can I build a custom tokenizer from existing tokenizers that treats different parts of the input differently based on the utf-8 range of the characters? E.g. use a Porter stemmer for stretches of Latin text and n-gram or something else for CJK?

-Peter

On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic wrote: > Yes, that's the n-gram one. I believe the existing CJK one in Lucene is > really just an n-gram tokenizer, so no different than the normal n-gram > tokenizer. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Peter Wolanin >> To: solr-user@lucene.apache.org >> Sent: Tue, November 10, 2009 7:34:37 PM >> Subject: Re: any docs on solr.EdgeNGramFilterFactory? >> >> So, this is the normal N-gram one? NGramTokenizerFactory >> >> Digging deeper - there are actually CJK and Chinese tokenizers in the >> Solr codebase: >> >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html >> >> The CJK one uses the lucene CJKTokenizer >> http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html >> >> and there seems to be another one even that no one has wrapped into Solr: >> http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html >> >> So seems like the existing options are a little better than I thought, >> though it would be nice to have some docs on properly configuring >> these. >> >> -Peter >> >> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic >> wrote: >> > Peter, >> > >> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but just >> n-grams. >> > Before you take the n-gram route, you may want to look at the smart Chinese >> analyzer in Lucene contrib (I think it works only for Simplified Chinese) and >> Sen (on java.net). I also spotted a Korean analyzer in the wild a few months >> back. >> > >> > Otis >> > -- >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> > >> > >> > >> > - Original Message >> >> From: Peter Wolanin >> >> To: solr-user@lucene.apache.org >> >> Sent: Tue, November 10, 2009 4:06:52 PM >> >> Subject: any docs on solr.EdgeNGramFilterFactory? >> >> >> >> This fairly recent blog post: >> >> >> >> >> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ >> >> >> >> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer >> >> for the index. I don't see any mention of that tokenizer on the Solr >> >> wiki - is it just waiting to be added, or is there any other >> >> documentation in addition to the blog post? In particular, there was >> >> a thread last year about using an N-gram tokenizer to enable >> >> reasonable (if not ideal) searching of CJK text, so I'd be curious to >> >> know how people are configuring their schema (with this tokenizer?) >> >> for that use case. >> >> >> >> Thanks, >> >> >> >> Peter >> >> >> >> -- >> >> Peter M. Wolanin, Ph.D. >> >> Momentum Specialist, Acquia. Inc. >> >> peter.wola...@acquia.com >> > >> > >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc.
>> peter.wola...@acquia.com > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: any docs on solr.EdgeNGramFilterFactory?
Thanks for the link - there doesn't seem to be a fix version specified, so I guess this will not officially ship with lucene 2.9?

-Peter

On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir wrote: > Peter, here is a project that does this: > http://issues.apache.org/jira/browse/LUCENE-1488 > > >> That's kind of interesting - in general can I build a custom tokenizer >> from existing tokenizers that treats different parts of the input >> differently based on the utf-8 range of the characters? E.g. use a >> Porter stemmer for stretches of Latin text and n-gram or something >> else for CJK? >> >> -Peter >> >> On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic >> wrote: >> > Yes, that's the n-gram one. I believe the existing CJK one in Lucene is >> really just an n-gram tokenizer, so no different than the normal n-gram >> tokenizer. >> > >> > Otis >> > -- >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> > >> > >> > >> > - Original Message >> >> From: Peter Wolanin >> >> To: solr-user@lucene.apache.org >> >> Sent: Tue, November 10, 2009 7:34:37 PM >> >> Subject: Re: any docs on solr.EdgeNGramFilterFactory? >> >> >> >> So, this is the normal N-gram one? NGramTokenizerFactory >> >> >> >> Digging deeper - there are actually CJK and Chinese tokenizers in the >> >> Solr codebase: >> >> >> >> >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html >> >> >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html >> >> >> >> The CJK one uses the lucene CJKTokenizer >> >> >> http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html >> >> >> >> and there seems to be another one even that no one has wrapped into >> Solr: >> >> >> http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html >> >> >> >> So seems like the existing options are a little better than I thought, >> >> though it would be nice to have some docs on properly configuring >> >> these. >> >> >> >> -Peter >> >> >> >> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic >> >> wrote: >> >> > Peter, >> >> > >> >> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but >> just >> >> n-grams. >> >> > Before you take the n-gram route, you may want to look at the smart >> Chinese >> >> analyzer in Lucene contrib (I think it works only for Simplified >> Chinese) and >> >> Sen (on java.net). I also spotted a Korean analyzer in the wild a few >> months >> >> back. >> >> > >> >> > Otis >> >> > -- >> >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> >> > >> >> > >> >> > >> >> > - Original Message >> >> >> From: Peter Wolanin >> >> >> To: solr-user@lucene.apache.org >> >> >> Sent: Tue, November 10, 2009 4:06:52 PM >> >> >> Subject: any docs on solr.EdgeNGramFilterFactory? >> >> >> >> >> >> This fairly recent blog post: >> >> >> >> >> >> >> >> >> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ >> >> >> >> >> >> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer >> >> >> for the index. I don't see any mention of that tokenizer on the Solr >> >> >> wiki - is it just waiting to be added, or is there any other >> >> >> documentation in addition to the blog post? 
In particular, there was >> >> >> a thread last year about using an N-gram tokenizer to enable >> >> >> reasonable (if not ideal) searching of CJK text, so I'd be curious to >> >> >> know how people are configuring their schema (with this tokenizer?) >> >> >> for that use case. >> >> >> >> >> >> Thanks, >> >> >> >> >> >> Peter >> >> >> >> >> >> -- >> >> >> Peter M. Wolanin, Ph.D. >> >> >> Momentum Specialist, Acquia. Inc. >> >> >> peter.wola...@acquia.com >> >> > >> >> > >> >> >> >> >> >> >> >> -- >> >> Peter M. Wolanin, Ph.D. >> >> Momentum Specialist, Acquia. Inc. >> >> peter.wola...@acquia.com >> > >> > >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > > > > > -- > Robert Muir > rcm...@gmail.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
changes to highlighting config or syntax in 1.4?
I'm testing out the final release of Solr 1.4 as compared to the build I have been using from around June.

I'm using the dismax handler for searches. I'm finding that highlighting is completely broken as compared to previously. Much more text is returned than it should be for each string in <lst name="highlighting">, but the search words are never highlighted in that response. Setting usePhraseHighlighter=false makes no difference.

Any pointers appreciated.

-Peter

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: changes to highlighting config or syntax in 1.4?
Apparently one of my conf files was broken - odd that I didn't see any exceptions. Anyhow - excuse my haste, I don't see the problem now.

-Peter

On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin wrote: > I'm testing out the final release of Solr 1.4 as compared to the build > I have been using from around June. > > I'm using the dismax handler for searches. I'm finding that > highlighting is completely broken as compared to previously. Much > more text is returned than it should be for each string in > <lst name="highlighting">, but the search words are never highlighted in > that response. Setting usePhraseHighlighter=false makes no > difference. > > Any pointers appreciated. > > -Peter > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com >

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Newbie Solr questions
Take a look at the example schema - you can have dynamic fields that are used based on wildcard matching to the field name if a field doesn't match the name of an existing field.

-Peter

On Sun, Nov 15, 2009 at 10:50 AM, yz5od2 wrote: > Thanks for the reply: > > I follow the schema.xml concept, but what if my requirement is more dynamic > in nature? I.E. I would like my developers to be able to annotate a POJO and > submit it to the Solr server (embedded) to be indexed according to public > properties OR annotations. Is that possible? > > If that is not possible, can I programmatically define documents and fields > (and the field options) in straight Java? I.E. in pseudo code below... > > // this is made up but this is what I would like to be able to do > SolrDoc document = new SolrDoc(); > SolrField field = new SolrField() > field.isIndexed=true; > field.isStored=true; > field.name = 'myField' > > field.value = myPOJO.getValue(); > > solrServer.index(document); > > > > > > On Nov 15, 2009, at 12:50 AM, Avlesh Singh wrote: > >>> >>> a) Since Solr is built on top of lucene, using SolrJ, can I still >>> directly >>> create custom documents, specify the field specifics etc (indexed, stored >>> etc) and then map POJOs to those documents, similar to just using the >>> straight lucene API? >>> >>> b) I took a quick look at the SolrJ javadocs but did not see anything in >>> there that allowed me to customize if a field is stored, indexed, not >>> indexed etc. How do I do that with SolrJ without having to go directly to >>> the lucene apis? >>> >>> c) The SolrJ beans package. By annotating a POJO with @Field, how exactly >>> does SolrJ treat that field? Indexed/stored, or just indexed? Is there >>> any >>> other way to control this? >>> >> The answer to all your questions above is the magical file called >> schema.xml. For more read here - http://wiki.apache.org/solr/SchemaXml. >> SolrJ is simply a java client to access (read and update from) the solr >> server. >> >> c) If I create a custom index outside of Solr using straight lucene, is it >>> >>> easy to import a pre-exisiting lucene index into a Solr Server? >>> >> As long as the Lucene index matches the definitions in your schema you can >> use the same index. The data however needs to copied into a predictable >> location inside SOLR_HOME. >> >> Cheers >> Avlesh >> >> On Sun, Nov 15, 2009 at 9:26 AM, yz5od2 >> wrote: >> >>> Hi, >>> I am new to Solr but fairly advanced with lucene. >>> >>> In the past I have created custom Lucene search engines that indexed >>> objects in a Java application, so my background is coming from this >>> requirement >>> >>> a) Since Solr is built on top of lucene, using SolrJ, can I still >>> directly >>> create custom documents, specify the field specifics etc (indexed, stored >>> etc) and then map POJOs to those documents, similar to just using the >>> straight lucene API? >>> >>> b) I took a quick look at the SolrJ javadocs but did not see anything in >>> there that allowed me to customize if a field is stored, indexed, not >>> indexed etc. How do I do that with SolrJ without having to go directly to >>> the lucene apis? >>> >>> c) The SolrJ beans package. By annotating a POJO with @Field, how exactly >>> does SolrJ treat that field? Indexed/stored, or just indexed? Is there >>> any >>> other way to control this? >>> >>> c) If I create a custom index outside of Solr using straight lucene, is >>> it >>> easy to import a pre-exisiting lucene index into a Solr Server? >>> >>> thanks! >>> > > -- Peter M. 
Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
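Concretely, the combination looks something like this - a hedged sketch: the *_s and *_i dynamicField patterns are the ones shipped in the example schema, and whether each value is indexed/stored comes from schema.xml, not from the POJO:

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class Product {
    @Field("id") public String id;
    @Field("name_s") public String name;   // caught by <dynamicField name="*_s" .../>
    @Field("price_i") public int price;    // caught by <dynamicField name="*_i" .../>

    public static void main(String[] args) throws IOException, SolrServerException {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Product p = new Product();
        p.id = "1"; p.name = "widget"; p.price = 10;
        server.addBean(p);  // maps the annotated fields onto a SolrInputDocument
        server.commit();
    }
}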
using Xinclude with multi-core
I'm trying to take advantage of the Solr 1.4 Xinclude feature to include a different xml fragment (e.g. a different analyzer chain in schema.xml) for each core in a multi-core setup.

When the Xinclude operates on a relative path, it seems to NOT be acting relative to the xml file with the Xinclude statement. Using the jetty example, it looks for a file in example/. Is this a bug in the way Solr invokes Xinclude? If not, is there a variable that contains the instanceDir that can be used? ${solr.instanceDir} or ${solr/instanceDir}

DOMUtil.substituteProperties(doc, loader.getCoreProperties());

I see that I could potentially specify solrcore.properties, http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution in order to determine the correct base path, but this seems overly complicated in terms of what the usual use case would be for Xinclude?

-Peter

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
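For illustration only, an absolute file URI in the href sidesteps the relative-resolution problem entirely (the path is made up, and hard-coding a machine-specific path per core is exactly why an instanceDir variable would be nicer):

<xi:include href="file:///var/solr/multicore/core1/conf/extra-handlers.xml"
            xmlns:xi="http://www.w3.org/2001/XInclude"/>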
is it possible to use Xinclude in schema.xml?
I'm trying to determine if it's possible to use Xinclude to (for example) have a base schema file and then substitute various pieces.

It seems that the schema fieldTypes throw exceptions if there is an unexpected attribute?

SEVERE: java.lang.RuntimeException: schema fieldtype text(org.apache.solr.schema.TextField) invalid arguments:{xml:base=solr/core2/conf/text-analyzer.xml}

This is what I'm trying to do (details of the analyzer chain omitted - nothing unusual) - so the error occurs when the external xml file is actually included:

<fieldType name="text" class="solr.TextField" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="text-analyzer.xml">
    <xi:fallback>
      <analyzer type="index">
        ...
      </analyzer>
      <analyzer type="query">
        ...
      </analyzer>
    </xi:fallback>
  </xi:include>
</fieldType>

Where (for testing) the text-analyzer.xml file just looks like the fallback:

<analyzer type="index">
  ...
</analyzer>
<analyzer type="query">
  ...
</analyzer>

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: is it possible to use Xinclude in schema.xml?
Follow-up: it seems the schema parser doesn't barf if you use xinclude with a single analyzer element, but so far seems like it's impossible for a field type.

So this seems to work:

<fieldType name="text" class="solr.TextField" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="text-analyzer.xml"/>
  <analyzer type="query">
    ...
  </analyzer>
</fieldType>

with text-analyzer.xml now containing just a single <analyzer type="index"> element.

On Sat, Nov 28, 2009 at 1:40 PM, Peter Wolanin wrote:
> I'm trying to determine if it's possible to use Xinclude to (for
> example) have a base schema file and then substitute various pieces.
>
> It seems that the schema fieldTypes throw exceptions if there is an
> unexpected attribute?
>
> SEVERE: java.lang.RuntimeException: schema fieldtype
> text(org.apache.solr.schema.TextField) invalid
> arguments:{xml:base=solr/core2/conf/text-analyzer.xml}
>
> This is what I'm trying to do (details of the analyzer chain omitted -
> nothing unusual) - so the error occurs when the external xml file is
> actually included:
>
> <fieldType name="text" class="solr.TextField" xmlns:xi="http://www.w3.org/2001/XInclude">
>   <xi:include href="text-analyzer.xml">
>     <xi:fallback>
>       <analyzer type="index">
>         ...
>       </analyzer>
>       <analyzer type="query">
>         ...
>       </analyzer>
>     </xi:fallback>
>   </xi:include>
> </fieldType>
>
> Where (for testing) the text-analyzer.xml file just looks like the fallback:
>
> <analyzer type="index">
>   ...
> </analyzer>
> <analyzer type="query">
>   ...
> </analyzer>
>
> -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
boosting certain terms within one field?
I've recently started working on the Drupal integration module for SOLR, and we are looking for suggestions for how to address this question: how do we boost the importance of a subset of terms within a field.

For example, we are using the standard request handler for queries, and the default field for keyword searches is a concatenation of the title, body, taxonomy terms, etc.

One "hackish" way I can imagine is that terms we want to boost (for example the title, or text inside h2 tags) could simply be included multiple times in the concatenation. Would this be effective and reasonable?

It seems like the alternative is to try to switch to using the dismax handler, storing the terms that we desire to have different boosts into different fields, all of which are in the list of query fields?

Thanks in advance for your suggestions.

-Peter

-- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. [EMAIL PROTECTED]
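For the dismax alternative at the end, the per-field boosts would live in the handler config, something like the following (the boost numbers are arbitrary, and the field names mirror the ones mentioned above):

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^5.0 h2^3.0 taxonomy^2.0 body^1.0</str>
  </lst>
</requestHandler>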
Re: boosting certain terms within one field?
Hi Grant,

Thanks for your feedback. The major short-term downside to switching to dismax with multiple fields would be the required re-writing of our current PHP code - especially our code to handle addition of facet fields to the q parameter. From reading about dismax, it seems we would need to instead use fq to limit the search results to those matching a specific facet value.

Best,

Peter

On Sun, Nov 30, 2008 at 8:43 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Hi Peter, > > What are the downsides to your last alternative approach below? That seems > like the simplest approach and should work as long as the terms within those > fields do not need to be boosted separately. > > If you want to go the boosting terms route, this is handled via a thing > called Payloads in Lucene. Payloads are an array of bytes that are added > during indexing at the term level through the analysis process. To do this > in Solr, you would need to write your own TokenFilter that adds payloads as > needed. Then, during search, you can take advantage of these payloads by > using the BoostingTermQuery from Lucene. The downside to all of this is > Solr doesn't currently support it, so you would be coding it up yourself. > I'm sure, though, that if you were to start a patch on it, there would be > others who are interested. > > Note, on the payloads. The biggest sticking point, I think, is coming up w/ > an efficient way of encoding the byte array and putting it into the XML > format, such that one can send in payloads when indexing. It's not > particularly hard, but no one has done it yet. > > -Grant > > > On Nov 29, 2008, at 10:45 PM, Peter Wolanin wrote: > >> I've recently started working on the Drupal integration module for >> SOLR, and we are looking for suggestions for how to address this >> question: how do we boost the importance of a subset of terms within >> a field. >> >> For example, we are using the standard request handler for queries, >> and the default field for keyword searches is a concatenation of the >> title, body, taxonomy terms, etc. >> >> One "hackish" way I can imagine is that terms we want to boost (for >> example the title, or text inside h2 tags) could simply be included >> multiple times in the concatenation. Would this be effective and reasonable? >> >> It seems like the alternative is to try to switch to using the dismax >> handler, storing the terms that we desire to have different boosts >> into different fields, all of which are in the list of query fields? >> >> Thanks in advance for your suggestions. >> >> -Peter >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> [EMAIL PROTECTED] > > -- > Grant Ingersoll > > Lucene Helpful Hints: > http://wiki.apache.org/lucene-java/BasicsOfPerformance > http://wiki.apache.org/lucene-java/LuceneFAQ > > > > > > > > > > > -- -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. [EMAIL PROTECTED]
Re: problem index accented character with release version of solr 1.3
We have been having this problem also. and have resorted to just stripping control characters before sending the text for indexing: preg_replace('@[\x00-\x08\x0B\x0C\x0E-\x1F]@', '', $text); -Peter On Tue, Dec 9, 2008 at 7:59 AM, knietzie <[EMAIL PROTECTED]> wrote: > > hi joshua, > > i'm having the same problem as yours. > just curious, have you found any fix for this? > > thnks > > > Joshua Reedy wrote: >> >> I have been using a stable dev version of 1.3 for a few months. >> Today, I began testing the final release version, and I encountered a >> strange problem. >> The only thing that has changed in my setup is the solr code (I didn't >> make any config change or change the schema). >> >> a document has a text field with a value that contains: >> "Andr\005é 3000" >> >> Indexing the document by itself or as part of a batch, produces the >> following error: >> Sep 17, 2008 5:00:27 PM org.apache.solr.common.SolrException log >> SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal >> character ((CTRL-CHAR, code 5)) >> at [row,col {unknown-source}]: [5,205] >> at >> com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) >> at >> com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668) >> at >> com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) >> at >> com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) >> at >> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) >> at >> com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) >> at >> org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327) >> at >> org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195) >> at >> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) >> at >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) >> at >> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) >> at >> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) >> at >> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) >> at >> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) >> at >> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) >> at java.lang.Thread.run(Thread.java:595) >> >> The latest version of the solr doesn't seem to like control characters >> (\005, in this case), but previous versions handled them (or at least >> ignored them). 
>> >> These characters shouldn't be in my documents, so there's a bug on my >> end to track down. However, I'm wondering if this was an expected >> change or an unintended consequence of recent work . . . >> >> >> >> >> -- >> - >> Be who you are and say what you feel, >> because those who mind don't matter and >> those who matter don't mind. >> -- Dr. Seuss >> >> > > -- > View this message in context: > http://www.nabble.com/problem-index-accented-character-with-release-version-of-solr-1.3-tp19544660p20914244.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. [EMAIL PROTECTED]
does this break Solr? dynamicField name="*" type="ignored"
I'm seeing a weird effect with a '*' field. In the example schema.xml, there is a commented out sample: <dynamicField name="*" type="ignored" /> We have this un-commented, and in the schema browser via the admin interface I see that all non-dynamic fields get a type of "ignored". I see this in the Solr admin interface: Field: uid Dynamically Created From Pattern: * Field Type: ignored though 'uid' is explicitly defined as its own field in the schema, not via a dynamic pattern. Is this a bug in the admin interface, or a problem with using this '*' in the schema? Thanks, Peter -- -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: does this break Solr? dynamicField name="*" type="ignored"
created issue: https://issues.apache.org/jira/browse/SOLR-929 -Peter On Thu, Dec 18, 2008 at 3:32 PM, Yonik Seeley wrote: > Looks like it's a bug in the schema browser (i.e. just this display, > no the inner workings of Solr). > Could you open a JIRA issue for this? > > -Yonik > > > On Thu, Dec 18, 2008 at 3:20 PM, Peter Wolanin > wrote: >> I'm seeing a weird effect with a '*' field. In the example >> schema.xml, there is a commented out sample: >> >> >> >> >> We have this un-commented, and in the schema browser via the admin >> interface I see that all non-dynamic fields get a type of "ignored". >> >> I see this in the Solr admin interface: >> >> Field: uid >> Dynamically Created From Pattern: * >> Field Type: ignored >> >> though the field definition is: >> >> >> >> Is this a bug in the admin interface, or a problem with using this '*' >> in the schema? >> >> Thanks, >> >> Peter >> >> -- >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > -- -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: How can i omit the illegal characters,when indexing the docs?
For documents we are indexing via the PHP client, we are currently using the following regex to strip control characters from each field that might contain them: function apachesolr_strip_ctl_chars($text) { // See: http://w3.org/International/questions/qa-forms-utf-8.html // Printable utf-8 does not include any of these chars below x7F return preg_replace('@[\x00-\x08\x0B\x0C\x0E-\x1F]@', ' ', $text); } -Peter On Fri, Jan 2, 2009 at 3:41 AM, RaghavPrabhu wrote: > > Hi all, > > I am extracting the word document using Apache POI,then generate the xml > doc,which is the document that i want to indexing in the solr. The problem > which i faced was,it thrown the error in the browser is shown below. > > HTTP Status 500 - Illegal character ((CTRL-CHAR, code 8)) at [row,col > {unknown-source}]: [1,1592] > com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, > code 8)) at [row,col {unknown-source}]: [1,1592] at > com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) at > com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660) at > com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240) > at > com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280) > at > com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824) > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at > org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:321) > at > org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195) > at > org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) > at > org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:179) > at > org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) > at > 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:446) > at java.lang.Thread.run(Thread.java:619) > > The extracted word document contains the special character ( its like a > square box).How can i omit those characters,when i submit the document to > the solr. > > > Thanks in advance, > Regards > Prabhu.K > > > -- > View this message in context: > http://www.nabble.com/How-can-i-omit-the-illegal-characters%2Cwhen-indexing-the-docs--tp21249084p21249084.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
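For completeness, a minimal sketch of how the function above gets applied - the $document array shape here is an assumption for illustration, not the module's actual data structure:

// Strip control characters from every string field before building the update request.
foreach ($document as $field => $value) {
  if (is_string($value)) {
    $document[$field] = apachesolr_strip_ctl_chars($value);
  }
}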
can the TermsComponent be used in combination with fq?
We have been trying to figure out how to construct, for example, a directory page with an overview of available facets for several fields. Looking at the issue and wiki http://wiki.apache.org/solr/TermsComponent https://issues.apache.org/jira/browse/SOLR-877 It would seem like this component would be useful for this. However - we often require that some filtering be applied to search results based on which user is searching (e.g. public vs. private content). Is it possible to apply filtering here, or will we need to do something like running a q=*:*&fq=status:1 and then getting facets? Note - also - the wiki page references a tutorial including this /autocomplete path, but I cannot find any trace of such. I was able to get results similar to the examples on the wiki page by adding a /terms handler to solrconfig.xml with echoParams set to "explicit" and the "terms" component enabled (sketched below). Is this the right way to activate this? Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
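A minimal configuration along the lines of the wiki example (the /terms handler name and the component declaration follow the wiki page, so treat this as a sketch rather than a tested config):

<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>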
Re: Finding total range of dates for date faceting
It *looks* as though Solr supports returning the results of arbitrary calculations: http://wiki.apache.org/solr/SolrQuerySyntax However, I am so far unable to get any example working except in the context of a dismax bf. It seems like one ought to be able to write a query to return the doc matching the max OR the min of a particular field. -Peter On Tue, Feb 17, 2009 at 5:33 AM, Jacob Singh wrote: > Hi, > > I'm trying to write some code to build a facet list for a date field, > but I don't know what the first and last available dates are. I would > adjust the gap param accordingly. If there is a 10yr stretch between > min(date) and max(date) I'd want to facet by year. If it is a 1 month > gap, I'd want to facet by day. > > Is there a way to do this? > > Thanks, > Jacob > > -- > > +1 510 277-0891 (o) > +91 33 7458 (m) > > web: http://pajamadesign.com > > Skype: pajamadesign > Yahoo: jacobsingh > AIM: jacobsingh > gTalk: jacobsi...@gmail.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
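One workaround that should find the endpoints without any special syntax - two rows=1 queries sorted on the field, reading the value off the single returned doc (the field name 'pubdate' is hypothetical, and the field must be indexed/sortable):

select?q=*:*&rows=1&fl=pubdate&sort=pubdate+asc
select?q=*:*&rows=1&fl=pubdate&sort=pubdate+desc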
Re: Store content out of solr
Sure, we are doing essentially that with our Drupal integration module - each search result contains a link to the "real" content, which is stored in MySQL, etc, and presented via the Drupal CMS. http://drupal.org/project/apachesolr -Peter On Tue, Feb 17, 2009 at 11:57 AM, roberto wrote: > Hello, > > We are indexing information from diferent sources so we would like to > centralize the information content so i can retrieve using the ID > provided buy solr? > > Does anyone did something like this, and have some advices ? I > thinking in store the information into a database like mysql ? > > Thanks, > -- > "Without love, we are birds with broken wings." > Morrie > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
make the suggested ignored field multi-valued?
In the example schema.xml, there is a field type 'ignored' which it is suggested can be used with the wildcard * to prevent errors when a document contains fields that don't match any in the schema. My experience recently in using this is that it does not work as desired if the unmatched field is multiValued, and that the suggested * field should be designated multiValued (see the declaration below): https://issues.apache.org/jira/browse/SOLR-1022 Obviously this has no effect out of the box, since the field is commented out. -Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
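For reference, the declaration as it would read with the change proposed in that issue (using the stock 'ignored' field type from the example schema):

<dynamicField name="*" type="ignored" multiValued="true" />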
Re: why don't we have a forum for discussion?
If some stuff is asked over and over again, it would be great to grab some reasonable responses and add them to the wiki. I've edited it a few times when I've struggled with what's there and found something that wasn't covered or was out of date - even the best forum or mailing list will not replicate an organized and maintained doc site in terms of ready access to knowledge. -Peter 2009/2/18 Martin Lamothe : > E-mails wouldn't go away with a discussion forum as they have e-mail > notifications tooit could compliment this mailing list... some stuff is > asked over and over and over ... isn't it? With a forum, it would be > possible to say.. go see this post.. .or that thread.. etc... > > Multi-core could use it's own Topic > Scalling could use it's own too > Indexing > Optimizing Indexes > etc...
Suggested hardening of Solr schema.jsp admin interface
My colleague Paul opened this issue and supplied a patch and I commented on it regarding a potential security weakness in the admin interface: https://issues.apache.org/jira/browse/SOLR-1031 -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
What is the performance impact of a fq that matches all docs?
We are working on integration with the Drupal CMS, and so are writing code that carries out operations that might be relevant for only a small subset of the sites/indexes that might use the integration module. In this regard, I'm wondering if adding to the query (using the dismax or mlt handlers) a fq that matches all documents would have any impact on performance? I gather that there is caching for the fq matches, but it seems like that would still incur some overhead, especially for a large index? As a more concrete example, suppose each document has a string field that names the role of the user that is allowed to see the content, e.g. 'public', 'registered', 'admin'. Most sites have only public content, but because our code is generic, we might add &fq=role:public to every query. What would the expected performance effect be compared to omitting that fq if, for example, we had a way to determine in advance that all site content matches 'public'? Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
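To make the question concrete, the client code in question would look something like this - apachesolr_site_is_all_public() is a hypothetical helper, not something the module has today:

$params = array(
  'q' => $keys,
  'qt' => 'dismax',
);
// Skip the role filter when we can prove in advance that all content is public.
if (!apachesolr_site_is_all_public()) {
  $params['fq'][] = 'role:public';
}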
Re: Error with highlighter and UTF-8 chars?
We are using Solr trunk (1.4) - currently " nightly exported - yonik - 2009-02-05 08:06:00" -Peter On Mon, Feb 23, 2009 at 8:07 AM, Koji Sekiguchi wrote: > Jacob, > > What Solr version are you using? There is a bug in SolrHighlighter of Solr > 1.3, > you may want to look at: > > https://issues.apache.org/jira/browse/SOLR-925 > https://issues.apache.org/jira/browse/LUCENE-1500 > > regards, > > Koji > > > Jacob Singh wrote: >> >> Hi, >> >> We ran into a weird one today. We have a document which is written in >> German and everytime we make a query which matches it, we get the >> following: >> >> java.lang.StringIndexOutOfBoundsException: String index out of range: 2822 >>at java.lang.String.substring(String.java:1935) >>at >> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274) >> >> >> >From source diving it looks like Lucene's highlighter is trying to >> subStr against an offset that is outside the bounds of the body field >> which it is highlighting against. Running a fq against the ID of the >> doucment returns it fine (because no highlighting is done) and I took >> the body and tried to cut the first 2822 chars and while it is near >> the end of the body, it is still in range. >> >> Here is the related code: >> >> startOffset = tokenGroup.matchStartOffset; >> endOffset = tokenGroup.matchEndOffset; >> tokenText = text.substring(startOffset, endOffset); >> >> >> This leads me to believe there is some problem with mb string encoding >> and Lucene's counting. >> >> Any ideas here? Tomcat is configured with UTF-8 btw. >> >> Best, >> Jacob >> >> >> > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Error with highlighter and UTF-8 chars?
Here you can see a manifestation of it when trying to highlight with ?q=daß − − − -Community" einfach nicht mehr wahrnimmt. Hätte mir am letzten Montag Nachmittag jemand gesagt, daß ich am Abend − recht, wenn er sagte, daß das wirklich wertvolle an Drupal schlichtweg seine (Entwickler- und Anwender-) − die Entstehungsgeschichte des Portals) auch dokumentiert worden, denn Ihr vermutet schon richtig, daß da You can see the "strong" tags each get offset one character more from where they are supposed to be. -Peter On Mon, Feb 23, 2009 at 8:24 AM, Peter Wolanin wrote: > We are using Solr trunk (1.4) - currently " nightly exported - yonik > - 2009-02-05 08:06:00" > > -Peter > > On Mon, Feb 23, 2009 at 8:07 AM, Koji Sekiguchi wrote: >> Jacob, >> >> What Solr version are you using? There is a bug in SolrHighlighter of Solr >> 1.3, >> you may want to look at: >> >> https://issues.apache.org/jira/browse/SOLR-925 >> https://issues.apache.org/jira/browse/LUCENE-1500 >> >> regards, >> >> Koji >> >> >> Jacob Singh wrote: >>> >>> Hi, >>> >>> We ran into a weird one today. We have a document which is written in >>> German and everytime we make a query which matches it, we get the >>> following: >>> >>> java.lang.StringIndexOutOfBoundsException: String index out of range: 2822 >>> at java.lang.String.substring(String.java:1935) >>> at >>> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274) >>> >>> >>> >From source diving it looks like Lucene's highlighter is trying to >>> subStr against an offset that is outside the bounds of the body field >>> which it is highlighting against. Running a fq against the ID of the >>> doucment returns it fine (because no highlighting is done) and I took >>> the body and tried to cut the first 2822 chars and while it is near >>> the end of the body, it is still in range. >>> >>> Here is the related code: >>> >>> startOffset = tokenGroup.matchStartOffset; >>> endOffset = tokenGroup.matchEndOffset; >>> tokenText = text.substring(startOffset, endOffset); >>> >>> >>> This leads me to believe there is some problem with mb string encoding >>> and Lucene's counting. >>> >>> Any ideas here? Tomcat is configured with UTF-8 btw. >>> >>> Best, >>> Jacob >>> >>> >>> >> >> > > > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Error with highlighter and UTF-8 chars?
So - something in the highlighting code is counting bytes when it should be counting characters. Looks like a lucene bug, so I'm surprised others have not hit this before. Probably this is it: https://issues.apache.org/jira/browse/LUCENE-1500 -Peter On Tue, Feb 24, 2009 at 2:22 PM, Peter Wolanin wrote: > Here you can see a manifestation of it when trying to highlight with ?q=daß > > > − > > − > > − > > -Community" einfach nicht mehr wahrnimmt. > Hätte mir am letzten Montag Nachmittag jemand gesagt, daß > ich am Abend > > − > > recht, wenn er sagte, daß das wirklich wertvolle an > Drupal schlichtweg seine (Entwickler- und Anwender-) > > − > > die Entstehungsgeschichte des Portals) auch dokumentiert worden, denn > Ihr vermutet schon richtig, daß da > > > > > > > You can see the "strong" tags each get offset one character more from > where they are supposed to be. > > > -Peter > > > > On Mon, Feb 23, 2009 at 8:24 AM, Peter Wolanin > wrote: >> We are using Solr trunk (1.4) - currently " nightly exported - yonik >> - 2009-02-05 08:06:00" >> >> -Peter >> >> On Mon, Feb 23, 2009 at 8:07 AM, Koji Sekiguchi wrote: >>> Jacob, >>> >>> What Solr version are you using? There is a bug in SolrHighlighter of Solr >>> 1.3, >>> you may want to look at: >>> >>> https://issues.apache.org/jira/browse/SOLR-925 >>> https://issues.apache.org/jira/browse/LUCENE-1500 >>> >>> regards, >>> >>> Koji >>> >>> >>> Jacob Singh wrote: >>>> >>>> Hi, >>>> >>>> We ran into a weird one today. We have a document which is written in >>>> German and everytime we make a query which matches it, we get the >>>> following: >>>> >>>> java.lang.StringIndexOutOfBoundsException: String index out of range: 2822 >>>> at java.lang.String.substring(String.java:1935) >>>> at >>>> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274) >>>> >>>> >>>> >From source diving it looks like Lucene's highlighter is trying to >>>> subStr against an offset that is outside the bounds of the body field >>>> which it is highlighting against. Running a fq against the ID of the >>>> doucment returns it fine (because no highlighting is done) and I took >>>> the body and tried to cut the first 2822 chars and while it is near >>>> the end of the body, it is still in range. >>>> >>>> Here is the related code: >>>> >>>> startOffset = tokenGroup.matchStartOffset; >>>> endOffset = tokenGroup.matchEndOffset; >>>> tokenText = text.substring(startOffset, endOffset); >>>> >>>> >>>> This leads me to believe there is some problem with mb string encoding >>>> and Lucene's counting. >>>> >>>> Any ideas here? Tomcat is configured with UTF-8 btw. >>>> >>>> Best, >>>> Jacob >>>> >>>> >>>> >>> >>> >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > > > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Error with highlighter and UTF-8 chars?
Actually, looking at the Lucene source and the trace: java.lang.StringIndexOutOfBoundsException: String index out of range: 2822 at java.lang.String.substring(String.java:1765) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) ... I see now that getBestTextFragments() takes in a token stream - and each token in this stream already has start/end positions set. So, the patch at LUCENE-1500 would mitigate the exception, but looks like the real bug is in Solr. -Peter On Tue, Feb 24, 2009 at 4:28 PM, Peter Wolanin wrote: > So - something in the highlighting code is counting bytes when it > should be counting characters. Looks like a lucene bug, so I'm > surprised others have not hit this before. Probably this is it: > https://issues.apache.org/jira/browse/LUCENE-1500 > > -Peter > > > On Tue, Feb 24, 2009 at 2:22 PM, Peter Wolanin > wrote: >> Here you can see a manifestation of it when trying to highlight with ?q=daß >> >> >> − >> >> − >> >> − >> >> -Community" einfach nicht mehr wahrnimmt. >> Hätte mir am letzten Montag Nachmittag jemand gesagt, daß >> ich am Abend >> >> − >> >> recht, wenn er sagte, daß das wirklich wertvolle an >> Drupal schlichtweg seine (Entwickler- und Anwender-) >> >> − >> >> die Entstehungsgeschichte des Portals) auch dokumentiert worden, denn >> Ihr vermutet schon richtig, daß da >> >> >> >> >> >> >> You can see the "strong" tags each get offset one character more from >> where they are supposed to be. >> >> >> -Peter >> >> >> >> On Mon, Feb 23, 2009 at 8:24 AM, Peter Wolanin >> wrote: >>> We are using Solr trunk (1.4) - currently " nightly exported - yonik >>> - 2009-02-05 08:06:00" >>> >>> -Peter >>> >>> On Mon, Feb 23, 2009 at 8:07 AM, Koji Sekiguchi wrote: >>>> Jacob, >>>> >>>> What Solr version are you using? There is a bug in SolrHighlighter of Solr >>>> 1.3, >>>> you may want to look at: >>>> >>>> https://issues.apache.org/jira/browse/SOLR-925 >>>> https://issues.apache.org/jira/browse/LUCENE-1500 >>>> >>>> regards, >>>> >>>> Koji >>>> >>>> >>>> Jacob Singh wrote: >>>>> >>>>> Hi, >>>>> >>>>> We ran into a weird one today. We have a document which is written in >>>>> German and everytime we make a query which matches it, we get the >>>>> following: >>>>> >>>>> java.lang.StringIndexOutOfBoundsException: String index out of range: 2822 >>>>> at java.lang.String.substring(String.java:1935) >>>>> at >>>>> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274) >>>>> >>>>> >>>>> >From source diving it looks like Lucene's highlighter is trying to >>>>> subStr against an offset that is outside the bounds of the body field >>>>> which it is highlighting against. Running a fq against the ID of the >>>>> doucment returns it fine (because no highlighting is done) and I took >>>>> the body and tried to cut the first 2822 chars and while it is near >>>>> the end of the body, it is still in range. 
>>>>> >>>>> Here is the related code: >>>>> >>>>> startOffset = tokenGroup.matchStartOffset; >>>>> endOffset = tokenGroup.matchEndOffset; >>>>> tokenText = text.substring(startOffset, endOffset); >>>>> >>>>> >>>>> This leads me to believe there is some problem with mb string encoding >>>>> and Lucene's counting. >>>>> >>>>> Any ideas here? Tomcat is configured with UTF-8 btw. >>>>> >>>>> Best, >>>>> Jacob >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >>> -- >>> Peter M. Wolanin, Ph.D. >>> Momentum Specialist, Acquia. Inc. >>> peter.wola...@acquia.com >>> >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > > > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
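A quick illustration of the byte-vs-character mismatch for a string like the one above (PHP, since that's our client side; illustrative only):

$s = "daß";
echo strlen($s);              // 4 - UTF-8 bytes ("ß" is a two-byte character)
echo mb_strlen($s, 'UTF-8');  // 3 - characters

Every multi-byte character ahead of a match pushes the byte offset one further past the character offset, which is exactly the one-character drift visible in the fragments above.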
up/down sides to using compound file format for index?
Trying to set up a server to host multiple Solr cores, we have run into the issue of too many open files a few times. The 2nd ed "Lucene in Action" book suggests using the compound file format to reduce the required number of files when having multiple indexes, but mentions a possible ~10% slow-down when indexing. Are there any other down-sides to this? Seems to work by just changing this line in solrconfig.xml: <useCompoundFile>true</useCompoundFile> -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Query Boosting using both BQ and BF
This doesn't seem to match what I'm seeing in terms of using bq - using any value > 0 increases the score. For example, with no bq: solr title,score,type 2.2 1.6885357 Building a killer search for Drupal wikipage 1.5547959 New Solr module available for testing story 1.408378 Check out the Solr project! story Versus with a bq < 1, the scores of matching docs are still increased compared to using no bq: type:story^0.5 on solr title,score,type 2.2 1.6885297 Building a killer search for Drupal wikipage 1.5585454 New Solr module available for testing story 1.4121282 Check out the Solr project! story On Sun, Mar 8, 2009 at 9:48 AM, Otis Gospodnetic wrote: > > Also note that the following is not doing what you want: > > -is_mp_parent_b:true^50.0 > > You want something like: > > is_mp_parent_b:true^0.20 > > for negative boosting use a boost that is less than 1.0. > > Otis-- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: "Dean Missikowski (Consultant), CLSA" >> To: solr-user@lucene.apache.org >> Sent: Sunday, March 8, 2009 3:30:28 AM >> Subject: RE: Query Boosting using both BQ and BF >> >> Some more experiments have helped me answer my own questions. >> >> > Q1. Can anyone confirm whether bf and bq can both >> > be used together in solrconfig.xml? >> yes >> >> > Q3. Can I have multiple bq parameters? If so, do I >> > space-separate them as a single bq, or provide >> > multiple BQs? >> Yes, multiple bq parameters work, space-separating multiple query terms >> in a single bq also works. >> >> Here's a snippet of my solrconfig.xml: >> >> published_date_d:[NOW-3MONTHS/DAY TO NOW/DAY+1DAY]^5.2 OR >> (published_date_d:[NOW-12MONTHS/DAY TO NOW/DAY+1DAY] AND >> report_type_id_i:1004)^10.0 OR (published_date_d:[NOW-6MONTHS/DAY TO >> NOW/DAY+1DAY] AND is_printed_b:true)^4.0 >> is_mp_parent_b:false^10.0 >> recip(rord(published_date_d),20,5000,5)^5.5 >> >> >> -Original Message- >> From: Dean Missikowski (Consultant), CLSA >> Sent: 08/03/2009 12:01 PM >> To: solr-user@lucene.apache.org >> Subject: Query Boosting using both BQ and BF >> >> Hi, >> >> I have a couple of questions related to query boosting using the dismax >> request handler. I'm using a recent 1.4 build (to take advantage of >> omitTf), and have also tried all of this with 1.3. >> >> To apply a query-time boost to the previous 3 months of documents in my >> index I use: >> >> published_date_d:[NOW-3MONTHS/DAY TO NOW/DAY+1DAY]^10.2 >> >> >> And, to provide a boosting that helps rank recently documents higher I >> use: >> recip(rord(published_date_d),20,5000,5)^5.5 >> >> This seems to be working well. >> >> But I have more boosting requirements. For example, I need to boost >> documents that are tagged as printed. So, I tried to add another bq >> parameter: >> >> is_printed_b:true^4.0 >> >> Also, tried to append this space-separated all in one bq parameter like >> this: >> published_date_d:[NOW-3MONTHS/DAY TO NOW/DAY+1DAY]^10.2 >> is_printed_b:true^4.0 >> >> Lastly, I need to apply a negative boost to documents of a certain type, >> so I use: >> -is_mp_parent_b:true^50.0 >> >> Not sure if it matters, but I have >> defaultOperator="AND"/> in schema.xml >> >> None of those variations return expected results (it's like the bq is >> being applied as a filter instead of just applying boosts). >> >> Q1. Can anyone confirm whether bf and bq can both be used together in >> solrconfig.xml? >> Q2. Is there a way I can do ths using only BF? How? >> Q3. Can I have multiple bq parameters? 
If so, do I space-separate them >> as a single bq, or provide multiple BQs? >> Q3. Am I formulating my BQs that use Boolean fields correctly? >> >> Any help or insights much appreciated, >> >> Thanks Dean >> >> CLSA CLEAN & GREEN: Please consider our environment before printing this >> email. >> The content of this communication is subject to CLSA Legal and Regulatory >> Notices. >> These can be viewed at https://www.clsa.com/disclaimer.html or sent to you >> upon >> request. > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: ExtractingRequestHandler and SolrRequestHandler issue
I had problems with this when trying to set this up with multiple cores - I had to set the shared lib as sharedLib="lib" on the <solr> element in example/solr/solr.xml in order for it to find the jars in example/solr/lib -Peter On Wed, Apr 22, 2009 at 11:43 AM, Grant Ingersoll wrote: > > On Apr 20, 2009, at 12:46 PM, francisco treacy wrote: > >> Additionally, here's what I've got in example/lib: > > These need to be in your Solr home lib, not example/lib. I sometimes get > confused on this one, too, forgetting that I need to go down a few more > directories. The example/lib directory is where the Jetty stuff lives, > example/solr/lib is the lib where the plugins go. In fact, if you run "ant > example" from the top level (or contrib/extraction) it should place the JARs > in the right places for the example. > >> >> >> apache-solr-cell-nightly.jar bcmail-jdk14-132.jar >> commons-lang-2.1.jar icu4j-3.8.jar log4j-1.2.14.jar >> poi-3.5-beta5.jar slf4j-api-1.5.5.jar >> xml-apis-1.0.b2.jar >> apache-solr-core-nightly.jar bcprov-jdk14-132.jar >> commons-logging-1.0.4.jar jetty-6.1.3.jar nekohtml-1.9.9.jar >> poi-ooxml-3.5-beta5.jar slf4j-jdk14-1.5.5.jar >> xmlbeans-2.3.0.jar >> apache-solr-solrj-nightly.jar commons-codec-1.3.jar dom4j-1.6.1.jar >> jetty-util-6.1.3.jar ooxml-schemas-1.0.jar >> poi-scratchpad-3.5-beta5.jar tika-0.3.jar >> asm-3.1.jar commons-io-1.4.jar >> fontbox-0.1.0-dev.jar jsp-2.1 pdfbox-0.7.3.jar >> servlet-api-2.5-6.1.3.jar xercesImpl-2.8.1.jar >> >> Actually I wasn't very accurate. Following the wiki didn't suffice. I >> had to add other jars, in order to avoid ClassNotFoundExceptions at >> startup. These are >> >> apache-solr-core-nightly.jar >> apache-solr-solrj-nightly.jar >> slf4j-api-1.5.5.jar >> slf4j-jdk14-1.5.5.jar >> >> even while using solr nightly war (in example/webapps). >> >> Perhaps something wrong with jar versions? 
>> >> Francisco >> >> >> 2009/4/20 francisco treacy : >>> >>> Hi Grant, >>> >>> Here is the full stacktrace: >>> >>> 20-Apr-2009 12:36:39 org.apache.solr.common.SolrException log >>> SEVERE: java.lang.ClassCastException: >>> org.apache.solr.handler.extraction.ExtractingRequestHandler cannot be >>> cast to org.apache.solr.request.SolrRequestHandler >>> at >>> org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:154) >>> at >>> org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:163) >>> at >>> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141) >>> at >>> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:171) >>> at org.apache.solr.core.SolrCore.(SolrCore.java:535) >>> at >>> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) >>> at >>> org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) >>> at >>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) >>> at >>> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594) >>> at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) >>> at >>> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218) >>> at >>> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500) >>> at >>> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) >>> at >>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) >>> at >>> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) >>> at >>> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161) >>> at >>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) >>> at >>> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) >>> at >>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) >>> at >>> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117) >>> at org.mortbay.jetty.Server.doStart(Server.java:210) >>> at >>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) >>> at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:616) >>> at org.mortbay.start.Main.invokeMain(Main.java:183) >>> at org.mortbay.start.Main.start(Main.java:497) >>> at org.mortbay.start.Main.main(Main.java:115) >>> >>> Thanks >>> >>> Francisco >>> >
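For reference, a minimal multi-core solr.xml with the shared lib set (the core names and instanceDir values are just placeholders):

<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>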
bug? No highlighting results with dismax and q.alt=*:*
For the Drupal Apache Solr Integration module, we are exploring the possibility of doing facet browsing - since we are using dismax as the default handler, this would mean issuing a query with an empty q and falling back to q.alt='*:*' or some other q.alt that matches all docs. However, I notice when I do this that we do not get any highlights back in the results despite defining a highlight alternate field. In contrast, if I force the standard request handler then I do get text back from the highlight alternate field: select/?q=*:*&qt=standard&hl=true&hl.fl=body&hl.alternateField=body&hl.maxAlternateFieldLength=256 However, I then lose the nice dismax features of weighting the results using bq and bf parameters. So, is this a bug or the intended behavior? The relevant fragment of the solrconfig.xml is sketched below. Full solrconfig.xml and other files: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/?pathrev=DRUPAL-6--1 -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
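A reconstruction of that defaults block, with parameter names inferred from the surviving values (hl.snippets and hl.mergeContiguous in particular are guesses; the highlighting parameters otherwise match the query above), sitting inside our default request handler:

<lst name="defaults">
  <str name="defType">dismax</str>
  <str name="q.alt">*:*</str>
  <str name="hl">true</str>
  <str name="hl.fl">body</str>
  <int name="hl.snippets">3</int>
  <str name="hl.mergeContiguous">true</str>
  <str name="hl.alternateField">body</str>
  <int name="hl.maxAlternateFieldLength">256</int>
</lst>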
Re: bug? No highlighting results with dismax and q.alt=*:*
Possibly this issue is related: https://issues.apache.org/jira/browse/SOLR-825 Though it seems that might affect the standard handler, while what I'm seeing is more specific to the dismax handler. -Peter On Thu, May 7, 2009 at 8:27 PM, Peter Wolanin wrote: > For the Drupal Apache Solr Integration module, we are exploring the > possibility of doing facet browsing - since we are using dismax as > the default handler, this would mean issuing a query with an empty q > and falling back to to q.alt='*:*' or some other q.alt that matches > all docs. > > However, I notice when I do this that we do not get any highlights > back in the results despite defining a highlight alternate field. > > In contrast, if I force the standard request handler then I do get > text back from the highlight alternate field: > > select/?q=*:*&qt=standard&hl=true&hl.fl=body&hl.alternateField=body&hl.maxAlternateFieldLength=256 > > However, I then loose the nice dismax features of weighting the > results using bq and bf parameters. So, is this a bug or the intended > behavior? > > The relevant fragment of the solrconfig.xml is this: > > > > dismax > > *:* > > > true > body > 3 > true > > body > 256 > > > Full solrconfig.xml and other files: > http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/?pathrev=DRUPAL-6--1 > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com >
Re: bug? No highlighting results with dismax and q.alt=*:*
Well, in this case we want to match all documents, so I'm not sure that can be accomplished with dismax other than using a q.alt ? -Peter On Fri, May 8, 2009 at 1:32 PM, Marc Sturlese wrote: > > I have experienced it before... maybe you can manage something similar to > your q.alt using the params q and qf. Highlight will work in that case (I > sorted it out doing that) > > Peter Wolanin-2 wrote: >> >> Possibly this issue is related: >> https://issues.apache.org/jira/browse/SOLR-825 >> >> Though it seems that might affect the standard handler, while what I'm >> seeing is more sepcific to the dismax handler. >> >> -Peter >> >> On Thu, May 7, 2009 at 8:27 PM, Peter Wolanin >> wrote: >>> For the Drupal Apache Solr Integration module, we are exploring the >>> possibility of doing facet browsing - since we are using dismax as >>> the default handler, this would mean issuing a query with an empty q >>> and falling back to to q.alt='*:*' or some other q.alt that matches >>> all docs. >>> >>> However, I notice when I do this that we do not get any highlights >>> back in the results despite defining a highlight alternate field. >>> >>> In contrast, if I force the standard request handler then I do get >>> text back from the highlight alternate field: >>> >>> select/?q=*:*&qt=standard&hl=true&hl.fl=body&hl.alternateField=body&hl.maxAlternateFieldLength=256 >>> >>> However, I then loose the nice dismax features of weighting the >>> results using bq and bf parameters. So, is this a bug or the intended >>> behavior? >>> >>> The relevant fragment of the solrconfig.xml is this: >>> >>> >> default="true"> >>> >>> dismax >>> >>> *:* >>> >>> >>> true >>> body >>> 3 >>> true >>> >>> body >>> 256 >>> >>> >>> Full solrconfig.xml and other files: >>> http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/?pathrev=DRUPAL-6--1 >>> >>> -- >>> Peter M. Wolanin, Ph.D. >>> Momentum Specialist, Acquia. Inc. >>> peter.wola...@acquia.com >>> >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> >> > > -- > View this message in context: > http://www.nabble.com/bug--No-highlighting-results-with-dismax-and-q.alt%3D*%3A*-tp23438048p23450189.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Replication master+slave
Indeed - that looks nice - having some kind of conditional includes would make many things easier. -Peter On Wed, May 13, 2009 at 4:22 PM, Otis Gospodnetic wrote: > > This looks nice and simple. I don't know enough about this stuff to see any > issues. If there are no issues.? > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Bryan Talbot >> To: solr-user@lucene.apache.org >> Sent: Wednesday, May 13, 2009 11:26:41 AM >> Subject: Re: Replication master+slave >> >> I see that Nobel's final comment in SOLR-1154 is that config files need to be >> able to include snippets from external files. In my limited testing, a >> simple >> patch to enable XInclude support seems to work. >> >> >> >> --- src/java/org/apache/solr/core/Config.java (revision 774137) >> +++ src/java/org/apache/solr/core/Config.java (working copy) >> @@ -100,8 +100,10 @@ >> if (lis == null) { >> lis = loader.openConfig(name); >> } >> - javax.xml.parsers.DocumentBuilder builder = >> DocumentBuilderFactory.newInstance().newDocumentBuilder(); >> - doc = builder.parse(lis); >> + javax.xml.parsers.DocumentBuilderFactory dbf = >> DocumentBuilderFactory.newInstance(); >> + dbf.setNamespaceAware(true); >> + dbf.setXIncludeAware(true); >> + doc = dbf.newDocumentBuilder().parse(lis); >> >> DOMUtil.substituteProperties(doc, loader.getCoreProperties()); >> } catch (ParserConfigurationException e) { >> >> >> >> This allows a clause like this to include the contents of replication.xml if >> it >> exists. If it's not found an exception will be thrown. >> >> >> >> href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"; >> xmlns:xi="http://www.w3.org/2001/XInclude";> >> >> >> >> If the file is optional and no exception should be thrown if the file is >> missing, simply include a fallback action: in this case the fallback is empty >> and does nothing. >> >> >> >> href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"; >> xmlns:xi="http://www.w3.org/2001/XInclude";> >> >> >> >> >> -Bryan >> >> >> >> >> On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote: >> >> > I was looking at the same problem, and had a discussion with Noble. You can >> > use a hack to achieve what you want, see >> > >> > https://issues.apache.org/jira/browse/SOLR-1154 >> > >> > Thanks, >> > >> > Jianhan >> > >> > >> > On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote: >> > >> >> So how are people managing solrconfig.xml files which are largely the same >> >> other than differences for replication? >> >> >> >> I don't think it's a "good thing" to maintain two copies of the same file >> >> and I'd like to avoid that. Maybe enabling the XInclude feature in >> >> DocumentBuilders would make it possible to modularize configuration files >> >> to >> >> make this possible? >> >> >> >> >> >> >> http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean) >> >> >> >> >> >> -Bryan >> >> >> >> >> >> >> >> >> >> >> >> On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote: >> >> >> >> On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot >> wrote: >> >>> >> >>> For replication in 1.4, the wiki at >> http://wiki.apache.org/solr/SolrReplication says that a node can be both >> the master and a slave: >> >> A node can act as both master and slave. In that case both the master >> and >> slave configuration lists need to be present inside the >> ReplicationHandler >> requestHandler in the solrconfig.xml. >> >> What does this mean? 
Does the core then poll itself for updates? >> >> >>> >> >>> >> >>> No. This type of configuration is meant for "repeaters". Suppose there >> >>> are >> >>> slaves in multiple data-centers (say data center A and B). There is >> >>> always >> >>> a >> >>> single master (say in A). One of the slaves in B is used as a master for >> >>> the >> >>> other slaves in B. Therefore, this one slave in B is both a master as >> >>> well >> >>> as the slave. >> >>> >> >>> >> >>> >> I'd like to have a single set of configuration files that are shared by >> masters and slaves and avoid duplicating configuration details in >> multiple >> files (one for master and one for slave) to ease management and >> failover. >> Is this possible? >> >> >> >>> You wouldn't want the master to be a slave. So I guess you'd need to have >> >>> a >> >>> separate file. Also, it needs to be a separate file so that the slave >> >>> does >> >>> not become a master when the solrconfig.xml is replicated. >> >>> >> >>> >> >>> >> When I attempt to setup a multi server master-slave configuration and >> include both master and slave replication configuration options, I into >> some >> problems. I'm running a nightly build from May 7. >> >> >>
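Written out, Bryan's include-with-empty-fallback looks roughly like this in solrconfig.xml (the href pointing at an optional replication.xml is taken from his examples above; the empty xi:fallback makes a missing file a no-op instead of an error):

<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
            xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>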
Re: Solr memory requirements?
I think that if you have in your index any documents with norms, you will still use norms for those fields even if the schema is changed later. Did you wipe and re-index after all your schema changes? -Peter On Fri, May 15, 2009 at 9:14 PM, vivek sar wrote: > Some more info, > > Profiling the heap dump shows > "org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object > - taking up almost 80% of total memory (6G) - see the attached screen > shot for a smaller dump. There is some norms object - not sure where > are they coming from as I've omitnorms=true for all indexed records. > > I also noticed that if I run a query - let's say generic query that > hits 100million records and then follow up with a specific query - > which hits only 1 record, the second query causes the increase in > heap. > > Looks like there are few bytes being loaded into memory for each > document - I've checked the schema all indexes have omitNorms=true, > all caches are commented out - still looking to see what else might > put things in memory which don't get collected by GC. > > I also saw, https://issues.apache.org/jira/browse/SOLR- for Solr > 1.4 (which I'm using). Not sure if that can cause any problem. I do > use range queries for dates - would that have any effect? > > Any other ideas? > > Thanks, > -vivek > > On Thu, May 14, 2009 at 8:38 PM, vivek sar wrote: >> Thanks Mark. >> >> I checked all the items you mentioned, >> >> 1) I've omitnorms=true for all my indexed fields (stored only fields I >> guess doesn't matter) >> 2) I've tried commenting out all caches in the solrconfig.xml, but >> that doesn't help much >> 3) I've tried commenting out the first and new searcher listeners >> settings in the solrconfig.xml - the only way that helps is that at >> startup time the memory usage doesn't spike up - that's only because >> there is no auto-warmer query to run. But, I noticed commenting out >> searchers slows down any other queries to Solr. >> 4) I don't have any sort or facet in my queries >> 5) I'm not sure how to change the "Lucene term interval" from Solr - >> is there a way to do that? >> >> I've been playing around with this memory thing the whole day and have >> found that it's the search that's hogging the memory. Any time there >> is a search on all the records (800 million) the heap consumption >> jumps by 5G. This makes me think there has to be some configuration in >> Solr that's causing some terms per document to be loaded in memory. >> >> I've posted my settings several times on this forum, but no one has >> been able to pin point what configuration might be causing this. If >> someone is interested I can attach the solrconfig and schema files as >> well. 
Here are the settings again under Query tag, >> >> >> 1024 >> true >> 50 >> 200 >> >> false >> 2 >> >> >> and schema, >> >> > required="true" omitNorms="true" compressed="false"/> >> >> > compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > default="NOW/HOUR" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > compressed="false"/> >> > compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > compressed="false"/> >> > compressed="false"/> >> > compressed="false"/> >> > omitNorms="true" compressed="false"/> >> > compressed="false"/> >> > default="NOW/HOUR" omitNorms="true"/> >> >> >> > omitNorms="true" multiValued="true"/> >> >> Any help is greatly appreciated. >> >> Thanks, >> -vivek >> >> On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote: >>> 800 million docs is on the high side for modern hardware. >>> >>> If even one field has norms on, your talking almost 800 MB right there. And >>> then if another Searcher is brought up well the old one is serving (which >>> happens when you update)? Doubled. >>> >>> Your best bet is to distribute across a couple machines. >>> >>> To minimize you would want to turn off or down caching, don't facet, don't >>> sort, turn off all norms, possibly get at the Lucene term interval and raise >>> it. Drop on deck searchers setting. Even then, 800 million...time to >>> distribute I'd think. >>> >>> vivek sar wrote: Some update on this issue, 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increases the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching becomes slow). Th
exceptions when using existing index with latest build
Building Solr last night from updated svn, I'm now getting the exception below when I use any fq parameter searching a pre-existing index. So far, I could not fix it by tweaking config files, so I had to delete and re-index. I note that Solr was recently updated to the latest lucene build, so maybe something broke in the index format? Here's the relevant part of the trace: org.apache.lucene.index.ReadOnlySegmentReader cannot be cast to org.apache.solr.search.SolrIndexReader java.lang.ClassCastException: org.apache.lucene.index.ReadOnlySegmentReader cannot be cast to org.apache.solr.search.SolrIndexReader at org.apache.solr.search.SortedIntDocSet$2.getDocIdSet(SortedIntDocSet.java:530) at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:237) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:221) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212) at org.apache.lucene.search.Searcher.search(Searcher.java:150) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1032) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:894) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:337) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1328) -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Recover crashed solr index
you can use the lucene jar with solr to invoke the CheckIndex method - this will possibly allow you to recover if you pass the -fix param. You may lose some docs, however, so this is only viable if you can, for example, query to check what's missing. The command looks like (from the root of the solr svn checkout): java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex [path to index directory] For example, to check the example index: java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex example/solr/data/index/ -Peter On Mon, May 25, 2009 at 4:42 AM, Wang Guangchen wrote: > Hi everyone, > > I have 8m docs to index, and each doc is around 50kb. The solr crashed in > the middle of indexing. error message said that one of the file in the data > directory is missing. I don't know why this is happened. > > So right now I have to find a way to recover the index to avoid re-index. Is > there anyone know any tools or method to recover the crashed index? Please > help. > > Thanks a lot. > > Regards > GC > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
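To actually attempt a repair, the same invocation takes -fix as a trailing argument - back up the index directory first, since segments with unrecoverable docs get dropped:

java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex example/solr/data/index/ -fix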
NPE when unloading an absent core
Is this a known bug? When I try to unload a core that does not exist, Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.acquia.search.HmacFilter.doFilter(HmacFilter.java:62) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: NPE when unloading an absent core
I did not find any relevant issue, so here's a new issue with a patch: https://issues.apache.org/jira/browse/SOLR-1200 -Peter On Wed, Jun 3, 2009 at 4:56 PM, Peter Wolanin wrote: > Is this a known bug? When I try to unload a core that does not exist, > Solr throws a NullPointerException > > java.lang.NullPointerException > at > org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) > at > org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at > org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Dismax request handler and highlighting
I had the same problem - I think the answer is that highlighting is not currently supported with q.alt and dismax. http://www.nabble.com/bug--No-highlighting-results-with-dismax-and-q.alt%3D*%3A*-td23438048.html#a23438048 -Peter On Sun, Jun 7, 2009 at 7:51 AM, Fouad Mardini wrote: > Hello, > > I am having problems with the dismax request handler and highlighting. > The following query works as intended > > http://localhost:8983/solr/select?indent=on&q=myquery&start=0&rows=10&fl=id%2Cscore&qt=standard&wt=standard&hl=true&hl.fl=myfield > > whereas > > http://localhost:8983/solr/select?indent=on&q.alt=myquery&start=0&rows=10&fl=id%2Cscore&qt=dismax&wt=standard&hl=true&hl.fl=myfield > > I am using dismax since i need boost functions. > Furthermore, using the q parameter with dismax doesn't seem to work with me, > debug gives the following output > > myquery > +() () > > is there a setting somewhere that i need to set? > > I am building SOLR right out of svn. > > Thanks, > Fouad > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
can Trie fields be stored?
Looking at the new examples of solr.TrieField http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/schema.xml I see that all have indexed="true" stored="false" in the field type definition. Does this mean that you cannot ever store a value for one of these fields? I.e. if I want to do a range query and also return the values, I need to store the values in a separate field? Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
multi-core, autocommit and resource use
A question for anyone familiar with the details of the time-based autocommit mechanism in Solr: if I am running several cores on the same server and send updates to each core at the same time, what happens? If all the cores have their autocommit time run out at the same time, will every core try to conduct operations (e.g. opening new searchers, merges, other things?) at the same time and thus cause resource issues? I think I understand that all the pending changes are on disk already, so the "commit" that happens when the time is up is really just opening new searchers that include the added documents. Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: multi-core, autocommit and resource use
So for now would it make sense to spread out the autocommit times for the different cores? Thanks. -Peter On Thu, Jun 18, 2009 at 7:07 PM, Yonik Seeley wrote: > On Thu, Jun 18, 2009 at 4:27 PM, Peter Wolanin > wrote: >> I think I understand >> that all the pending changes are on disk already, so the "commit" that >> happens when the time is up is really just opening new searchers that >> include the added documents. > > Only some of the pending changes may be on disk - a solr level commit > involves closing the IndexWriter which flushes everything to disk, and > then a new IndexReader is opened to read those changes. > > This will be improved in future versions such that an IndexReader can > be opened *before* all of the changes have been flushed to disk (work > on near-real-time indexing/searching in Lucene is progressing). > > -Yonik > http://www.lucidimagination.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
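Concretely, that would just mean giving each core's solrconfig.xml a different commit window in the update handler - the values below are arbitrary examples:

<!-- core A -->
<autoCommit>
  <maxTime>60000</maxTime>
</autoCommit>

<!-- core B: offset so the commits don't all coincide -->
<autoCommit>
  <maxTime>90000</maxTime>
</autoCommit>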
Re: facets: case and accent insensitive sort
Seems like this might be approached using a Lucene payload? For example where the original string is stored as the payload and available in the returned facets for display purposes? Payloads are byte arrays stored with Terms on Fields. See https://issues.apache.org/jira/browse/LUCENE-755 Solr seems to have support for a few example payloads already like NumericPayloadTokenFilter Almost any way you approach this it seems like there are potentially problems since you might have multiple combinations of case and accent mapping to the same case-less accent-less value that you want to use for sorting (and I assume for counting) your facets? -Peter On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy wrote: > Shalin Shekhar Mangar a écrit : >> >> On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy wrote: >> >> >>> >>> If I use a copyField to store into a string type, and facet on that, my >>> problem remains: >>> The facets are sorted case and accent sensitive. And I want an >>> *insensitive* sort. >>> If I use a copyField to store into a type with no accents and case (e.g >>> alphaOnlySort), then solr return me facet values with no accents and no >>> case. And I want the facet values returned by solr to *have accents and >>> case*. >>> >> >> Ah, of course you are right. There is no way to do this right now except >> at >> the client side. >> > > Thank you for your response. > Would it be easy to modify Solr to behave like I want. Where should I start > to investigate? > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
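One concrete, if untested, way to try the payload idea in 1.4: Solr ships solr.DelimitedPayloadTokenFilterFactory, so ending an analyzer chain with <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity" delimiter="|"/> would let you index tokens like cafe|Café and keep the original accented form as a byte-array payload on the folded term - though you would still need custom query-side code to read the payloads back out for display, and this is only a sketch of the approach, not something I've verified for faceting.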
Select tika output for extract-only?
I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode as if from the CLI with the tika jar: -x or --xml Output XHTML content (default) -h or --html Output HTML content -t or --text Output plain text content -m or --metadata Output only metadata However, looking at the docs and source, it seems that only the xml option is available (hard-coded) in ExtractingDocumentLoader: serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true)); In addition, it seems that the metadata is always appended to the response. Are there any open issues relating to this, or opinions on whether adding additional flexibility to the response format would be of interest for 1.4? Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Select tika output for extract-only?
Ok, thanks. I played with it enough to get plain text out at least, but I'll wait for the resolution of SOLR-284 -Peter On Sun, Jul 12, 2009 at 9:20 AM, Yonik Seeley wrote: > Peter, I'm hacking up solr cell right now, trying to simplify the > parameters and fix some bugs (see SOLR-284) > A quick patch to specify the output format should make it into 1.4 - > but you may want to wait until I finish. > > -Yonik > http://www.lucidimagination.com > > On Sat, Jul 11, 2009 at 5:39 PM, Peter Wolanin > wrote: >> I had been assuming that I could choose among possible tika output >> formats when using the extracting request handler in extract-only mode >> as if from the CLI with the tika jar: >> >> -x or --xml Output XHTML content (default) >> -h or --html Output HTML content >> -t or --text Output plain text content >> -m or --metadata Output only metadata >> >> However, looking at the docs and source, it seems that only the xml >> option is available (hard-coded) in ExtractingDocumentLoader: >> >> serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", >> true)); >> >> In addition, it seems that the metadata is always appended to the response. >> >> Are there any open issues relating to this, or opinions on whether >> adding additional flexibility to the response format would be of >> interest for 1.4? >> >> Thanks, >> >> Peter >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
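In the meantime, extracting plain text directly with Tika outside Solr is straightforward. A self-contained sketch, assuming the Tika 0.4-era parse(InputStream, ContentHandler, Metadata) signature:

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

// Extract plain text (rather than the XHTML that ExtractingDocumentLoader
// hard-codes) by calling Tika directly.
public class PlainTextExtract {
  public static void main(String[] args) throws Exception {
    InputStream in = new FileInputStream(args[0]);
    BodyContentHandler text = new BodyContentHandler();
    Metadata metadata = new Metadata();
    new AutoDetectParser().parse(in, text, metadata);
    System.out.println(text.toString());
  }
}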
lucene or Solr bug with dismax?
I have been getting exceptions thrown when users try to send boolean queries into the dismax handler. In particular, with a leading 'OR'. I'm really not sure why this happens - I thought the dismax parser ignored AND/OR? I'm using rev 779609 in case there were recent changes to this. Is this a known issue? Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR vti OR aut OR author OR dll': Encountered " "OR "" at line 1, column 0. Was expecting one of: ... "+" ... "-" ... "(" ... "*" ... ... ... ... ... "[" ... "{" ... ... ... "*" ... at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: lucene or Solr bug with dismax?
Indeed - I assumed that only the "+" and "-" characters had any special meaning when parsing dismax queries and that all other content would be treated just as keywords. That seems to be how it's described in the dismax documentation? Looks like this is a relevant issue (is there another)? https://issues.apache.org/jira/browse/SOLR-874 -Peter On Mon, Jul 13, 2009 at 4:12 PM, Mark Miller wrote: > It doesn't ignore OR and AND, though it probably should. I think there is a > JIRA issue for it somewhere. > > On Mon, Jul 13, 2009 at 4:10 PM, Peter Wolanin > wrote: > >> I can still generate this error with Solr built from svn trunk just now. >> >> http://localhost:8983/solr/select/?qt=dismax&q=OR+vti+OR+foo >> >> I'm doubly perplexed by this since 'or' is in the stopwords file. >> >> -Peter >> >> On Mon, Jul 13, 2009 at 3:15 PM, Peter Wolanin >> wrote: >> > I have been getting exceptions thrown when users try to send boolean >> > queries into the dismax handler. In particular, with a leading 'OR'. >> > I'm really not sure why this happens - I thought the dsimax parser >> > ignored AND/OR? >> > >> > I'm using rev 779609 in case there were recent changes to this. Is >> > this a known issue? >> > >> > >> > Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log >> > SEVERE: org.apache.solr.common.SolrException: >> > org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR >> > bin OR vti OR aut OR author OR dll': Encountered " "OR "" at line >> > 1, column 0. >> > Was expecting one of: >> > ... >> > "+" ... >> > "-" ... >> > "(" ... >> > "*" ... >> > ... >> > ... >> > ... >> > ... >> > "[" ... >> > "{" ... >> > ... >> > ... >> > "*" ... >> > >> > at >> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110) >> > at >> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) >> > at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> > >> > >> > >> > -- >> > Peter M. Wolanin, Ph.D. >> > Momentum Specialist, Acquia. Inc. >> > peter.wola...@acquia.com >> > >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > > > > -- > -- > - Mark > > http://www.lucidimagination.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
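Until the parser handles this, a blunt client-side workaround is to lowercase bare boolean operators before sending the query - Lucene's grammar only treats the uppercase forms as operators, so the lowercase forms pass through as ordinary terms (and "or" is then dropped as a stopword anyway). A sketch:

// Neutralize bare AND/OR/NOT in user input before passing it to dismax;
// lowercase forms are parsed as plain terms, not operators.
public class QuerySanitizer {
  public static String sanitize(String q) {
    return q.replaceAll("\\bAND\\b", "and")
            .replaceAll("\\bOR\\b", "or")
            .replaceAll("\\bNOT\\b", "not");
  }

  public static void main(String[] args) {
    // Prints: or vti or bin or vti or aut
    System.out.println(sanitize("OR vti OR bin OR vti OR aut"));
  }
}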
Re: lucene or Solr bug with dismax?
I can still generate this error with Solr built from svn trunk just now. http://localhost:8983/solr/select/?qt=dismax&q=OR+vti+OR+foo I'm doubly perplexed by this since 'or' is in the stopwords file. -Peter On Mon, Jul 13, 2009 at 3:15 PM, Peter Wolanin wrote: > I have been getting exceptions thrown when users try to send boolean > queries into the dismax handler. In particular, with a leading 'OR'. > I'm really not sure why this happens - I thought the dsimax parser > ignored AND/OR? > > I'm using rev 779609 in case there were recent changes to this. Is > this a known issue? > > > Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException: > org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR > bin OR vti OR aut OR author OR dll': Encountered " "OR "" at line > 1, column 0. > Was expecting one of: > ... > "+" ... > "-" ... > "(" ... > "*" ... > ... > ... > ... > ... > "[" ... > "{" ... > ... > ... > "*" ... > > at > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > > > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Multivalued fields and scoring/sorting
Assuming that you know the unique ID when constructing the query (which it sounds like you do) why not try a boost query with a high boost for 2 and a lower boost for 1 - then the default sort by score should match your desired ordering, and this order can be further tweaked with other bf or bq arguments. -Peter On Thu, Jul 16, 2009 at 9:15 AM, Matt Schraeder wrote: > The first number is a unique ID that points to a particular customer, > the second is a value. It basically tells us whether or not a customer > already has that product or not. The main use of it is to be able to > search our product listing for products the customer does not already > have. > > The alternative would be to put that in a second index, but that would > mean that I would be doing two searches for every single search I want > to complete, which I am not sure would be a very good option. > avl...@gmail.com 7/16/2009 12:04:53 AM >>> > > The harsh reality of life is that you cannot sort on multivalued > fields. > If you can explain your domain problem (the significance of numbers > "818", > "2" etc), maybe people can come up with an alternate index design which > fits > into your use cases. > > Cheers > Avlesh > > On Thu, Jul 16, 2009 at 1:18 AM, Matt Schraeder > wrote: > >> I am trying to come up with a way to sort (or score, and sort based > on >> the score) of a multivalued field. I was looking at FunctionQueries > and >> saw fieldvalue, but as that only works on single valued fields that >> doesn't help me. >> >> The field is as follows: >> >> > sortMissingLast="true" omitNorms="true"> >> >> >> >> >> >> >> >> >> >> >> > multiValued="true" /> >> >> The actual data that gets put in this field is a string consisting of > a >> number, a space, and a 1 or a 2. For example: >> >> "818 2" >> "818 1" >> "950 1" >> "1022 2" >> >> I want to be able to give my search results given a boost if a >> particular document contains "818 2" and a smaller boost if the > document >> contains "818 1" but not "818 2". >> >> The end result would be documents sorted as follows: >> >> 1) Documents with "818 2" >> 2) Documents with "818 1" but not "818 2" >> 3) Documents that contain neither "818 2" nor "818 1" >> >> Is this possible with solr? How would I go about doing this? >> > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
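Concretely, that would look something like &qt=dismax&q=...&bq=ownership:"818 2"^10 ownership:"818 1"^2 on the request ("ownership" is a stand-in for whatever the field is actually called, and the boost values need tuning). Documents matching neither clause still match the main query; they just score below the boosted ones, which gives the three-tier ordering described.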
Re: Wikipedia or reuters like index for testing facets?
AWS provides some standard data sets, including an extract of all wikipedia content: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2345&categoryID=249 Looks like it's not being updated often, so this or another AWS data set could be a consistent basis for benchmarking? -Peter On Wed, Jul 15, 2009 at 2:21 PM, Jason Rutherglen wrote: > Yeah that's what I was thinking of as an alternative, use enwiki > and randomly generate facet data along with it. However for > consistent benchmarking the random data would need to stay the > same so that people could execute the same benchmark > consistently in their own environment. > > On Tue, Jul 14, 2009 at 6:28 PM, Mark Miller wrote: >> Why don't you just randomly generate the facet data? Thats prob the best way >> right? You can control the uniques and ranges. >> >> On Wed, Jul 15, 2009 at 1:21 AM, Grant Ingersoll wrote: >> >>> Probably not as generated by the EnwikiDocMaker, but the WikipediaTokenizer >>> in Lucene can pull out richer syntax which could then be Teed/Sinked to >>> other fields. Things like categories, related links, etc. Mostly, though, >>> I was just commenting on the fact that it isn't hard to at least use it for >>> getting docs into Solr. >>> >>> -Grant >>> >>> On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote: >>> >>> You think enwiki has enough data for faceting? On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll wrote: > At a min, it is trivial to use the EnWikiDocMaker and then send the doc > over > SolrJ... > > On Jul 14, 2009, at 4:07 PM, Mark Miller wrote: > > On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen < >> jason.rutherg...@gmail.com> wrote: >> >> Is there a standard index like what Lucene uses for contrib/benchmark >>> for >>> executing faceted queries over? Or maybe we can randomly generate one >>> that >>> works in conjunction with wikipedia? That way we can execute real world >>> queries against faceted data. Or we could use the Lucene/Solr mailing >>> lists >>> and other data (ala Lucid's faceted site) as a standard index? >>> >>> >> I don't think there is any standard set of docs for solr testing - there >> is >> not a real benchmark contrib - though I know more than a few of us have >> hacked up pieces of Lucene benchmark to work with Solr - I think I've >> done >> it twice now ;) >> >> Would be nice to get things going. I was thinking the other day: I >> wonder >> how hard it would be to make Lucene Benchmark generic enough to accept >> Solr >> impls and Solr algs? >> >> It does a lot that would suck to duplicate. >> >> -- >> -- >> - Mark >> >> http://www.lucidimagination.com >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > Solr/Lucene: > http://www.lucidimagination.com/search > > > >>> -- >>> Grant Ingersoll >>> http://www.lucidimagination.com/ >>> >>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using >>> Solr/Lucene: >>> http://www.lucidimagination.com/search >>> >>> >> >> >> -- >> -- >> - Mark >> >> http://www.lucidimagination.com >> > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: spellcheck with misspelled words in index
I think you can just tell the spellchecker to only supply "more popular" suggestions, which would naturally omit these rare misspellings: <str name="spellcheck.onlyMorePopular">true</str> -Peter On Wed, Jul 15, 2009 at 7:30 PM, Jay Hill wrote: > We had the same thing to deal with recently, and a great solution was posted > to the list. Create a stopwords filter on the field you're using for your > spell checking, and then populate a custom stopwords file with known > misspelled words: > > <fieldType ... positionIncrementGap="100"> > ... > <filter class="solr.StopFilterFactory" > words="misspelled_words.txt"/> > ... > </fieldType> > > Your spell field would look like this: > <field ... multiValued="true"/> > > Then add words like "cusine" to misspelled_words.txt > > -Jay > > > On Tue, Jul 14, 2009 at 11:40 PM, Chris Williams wrote: >> Hi, >> I'm having some trouble getting the correct results from the >> spellcheck component. I'd like to use it to suggest correct product >> titles on our site, however some of our products have misspellings in >> them outside of our control. For example, there's 2 products with the >> misspelled word "cusine" (and 25k with the correct spelling >> "cuisine"). So if someone searches for the word "cusine" on our site, >> I would like to show the 2 misspelled products, and a suggestion with >> "Did you mean cuisine?". >> >> However, I can't seem to ever get any spelling suggestions when I >> search by the word "cusine", and correctlySpelled is always true. >> Misspelled words that don't appear in the index work fine. >> >> I noticed that setting onlyMorePopular to true will return suggestions >> for the misspelled word, but I've found that it doesn't work great for >> other words and produces suggestions too often for correctly spelled >> words. >> >> I incorrectly had thought that by setting thresholdTokenFrequency >> higher on my spelling dictionary that these misspellings would not >> appear in my spelling index and thus I would get suggestions for them, >> but as I see now, the spellcheck doesn't quite work like that. >> >> Is there any way to somehow get spelling suggestions to work for these >> misspellings in my index if they have a low frequency? >> >> Thanks in advance, >> Chris >> > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Obtaining SOLR index size on disk
Actually, if you have a server enabled as a replication master, the stats.jsp page reports the index size, so that information is available in some cases. -Peter On Sat, Jul 18, 2009 at 8:14 AM, Erik Hatcher wrote: > > On Jul 17, 2009, at 8:45 PM, J G wrote: >> >> Is it possible to obtain the SOLR index size on disk through the SOLR API? >> I've read through the docs and mailing list questions but can't seem to find >> the answer. > > No, but it'd be a great addition to the /admin/system handler which returns > lots of other useful trivia like the free memory, ulimit, uptime, and such. > > Erik > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: SOLR: Replication
Related to the difference between rsync and native Solr replication - we are seeing issues with Solr 1.4 where search queries that come in during a replication request hang for an excessive amount of time (up to hundreds of seconds for a result that normally takes ~50 ms). We are replicating pretty often (every 90 sec for multiple cores to one slave server), but still did not think that replication would make the master server unable to handle search requests. Is there some configuration option we are missing which would handle this situation better? Thanks, Peter On Sun, Jan 3, 2010 at 11:27 AM, Fuad Efendi wrote: > Thank you Yonik, excellent WIKI! I'll try without APR, I believe it's > environmental issue; 100Mbps switched should do 10 times faster (current > replica speed is 1Mbytes/sec) > > >> -Original Message- >> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik >> Seeley >> Sent: January-03-10 10:03 AM >> To: solr-user@lucene.apache.org >> Subject: Re: SOLR: Replication >> >> On Sat, Jan 2, 2010 at 11:35 PM, Fuad Efendi wrote: >> > I tried... I set APR to improve performance... server is slow while >> replica; >> > but "top" shows only 1% of I/O wait... it is probably environment >> specific; >> >> So you're saying that stock tomcat (non-native APR) was also 10 times >> slower? >> >> > but the same happened in my home-based network, rsync was 10 times >> faster... >> > I don't know details of HTTP-replica, it could be base64 or something >> like >> > that; RAM-buffer, flush to disk, etc. >> >> The HTTP replication is using binary. >> If you look here, it was benchmarked to be nearly as fast as rsync: >> http://wiki.apache.org/solr/SolrReplication >> >> It does do a fsync to make sure that the files are on disk after >> downloading, but that shouldn't make too much difference. >> >> -Yonik >> http://www.lucidimagination.com > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: SOLR Performance Tuning: Pagination
At the NOVA Apache Lucene/Solr Meetup last May, one of the speakers from Near Infinity (Aaron McCurry I think) mentioned that he had a patch for lucene that enabled unlimited depth memory-efficient paging. Is anyone in contact with him? -Peter On Thu, Dec 24, 2009 at 11:27 AM, Grant Ingersoll wrote: > > On Dec 24, 2009, at 11:09 AM, Fuad Efendi wrote: > >> I used pagination for a while till found this... >> >> >> I have filtered query ID:[* TO *] returning 20 millions results (no >> faceting), and pagination always seemed to be fast. However, fast only with >> low values for start=12345. Queries like start=28838540 take 40-60 seconds, >> and even cause OutOfMemoryException. > > Yeah, deep pagination in Lucene/Solr can be problematic due to the Priority > Queue management. See http://issues.apache.org/jira/browse/LUCENE-2127 and > the linked discussion on java-dev. > >> >> I use highlight, faceting on nontokenized "Country" field, standard handler. >> >> >> It even seems to be a bug... >> >> >> Fuad Efendi >> +1 416-993-2060 >> http://www.linkedin.com/in/liferay >> >> Tokenizer Inc. >> http://www.tokenizer.ca/ >> Data Mining, Vertical Search >> >> >> >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem using Solr/Lucene: > http://www.lucidimagination.com/search > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Indexing the latests MS Office documents
You must have been searching old documentation - I think Tika 0.3+ has support for the new MS formats, but don't take my word for it - why not build Tika and try it? -Peter On Sun, Jan 3, 2010 at 7:00 PM, Roland Villemoes wrote: > Hi All, > > Anyone who knows how to index the latest MS office documents like .docx and > .xlsx ? > > From searching it seems like Tika only supports the earlier formats .doc and > .xls > > > > med venlig hilsen/best regards > > Roland Villemoes > Tel: (+45) 22 69 59 62 > E-Mail: mailto:r...@alpha-solutions.dk > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
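The quickest check is the runnable Tika jar itself, e.g. java -jar tika-app-0.4.jar -t report.docx (the exact jar name depends on the version you build, and the file name here is invented) - if plain text comes back, the format is supported.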
dramatic load from stats.jsp page
The attached screenshot shows the transition on a master search server when we updated from a Solr 1.4 dev build (revision 779609 from 2009-05-28) to the Solr 1.4.0 released code. Every 3 hours we have a cron task to log some of the data from the stats.jsp page from each core (about 100 cores, most of which are small indexes). You can see there is a dramatic spiking of the load after the update - I think due to added reporting on that page, such as from the Lucene FieldCache. Is this amount of increased load expected from the stats.jsp page, or would you consider this a bug? Other than creating a custom jsp page with just a subset of this data, I don't see any way in Solr to query and display specific stats of interest for one core via the REST interface - am I missing something? Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: internal XML parser used in Solr
Config.java (which parses e.g. solrconfig.xml) in the solr core code has: import org.w3c.dom.Document; import org.w3c.dom.Node; import org.xml.sax.SAXException; import org.apache.solr.common.SolrException; import org.apache.solr.common.util.DOMUtil; import javax.xml.parsers.*; import javax.xml.xpath.XPath; import javax.xml.xpath.XPathFactory; import javax.xml.xpath.XPathConstants; import javax.xml.xpath.XPathExpressionException; import javax.xml.namespace.QName; On Tue, Jan 5, 2010 at 10:05 AM, Smith G wrote: > Hello , > There are some project specific schema xml files which should > be parsed. I have used Jdom API for the same. But it seems more clean > to shift to xml parser used by Solr itself. I have gone through source > codes.Its a bit confusing. I have found javax.xml package and also > org.xml.sax package . May I know which API should I use so that there > is no need to add some external jar file to the solr-lib . I am also > looking for the jar file ( in solr ) in which xml parser api is > included. > Thanks > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
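In other words, everything Config.java needs ships with the JDK (JAXP plus the bundled DOM/SAX interfaces), so no extra jar is required. A minimal self-contained sketch in the same style - the XPath expression is only an illustration against a solrconfig-like file:

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Parse an XML config file and pull out one node using only the JDK's
// built-in JAXP classes - the same packages Config.java imports.
public class ParseConfig {
  public static void main(String[] args) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(args[0]);
    XPath xpath = XPathFactory.newInstance().newXPath();
    Node node = (Node) xpath.evaluate("/config/abortOnConfigurationError",
        doc, XPathConstants.NODE);
    System.out.println(node == null ? "not found" : node.getTextContent());
  }
}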
Re: SOLR Performance Tuning: Pagination
Great - this issue? https://issues.apache.org/jira/browse/LUCENE-2127 Sounds like it would be a real win for lucene. -Peter On Thu, Jan 7, 2010 at 4:12 PM, Otis Gospodnetic wrote: > Peter - Aaron just commented on a recent Solr issue (reading large result > sets) and mentioned his patch. > So far he has 2 x +1 from Grant and me to stick his patch in JIRA. > > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > - Original Message >> From: Peter Wolanin >> To: solr-user@lucene.apache.org >> Sent: Sun, January 3, 2010 3:37:01 PM >> Subject: Re: SOLR Performance Tuning: Pagination >> >> At the NOVA Apache Lucene/Solr Meetup last May, one of the speakers >> from Near Infinity (Aaron McCurry I think) mentioned that he had a >> patch for lucene that enabled unlimited depth memory-efficient paging. >> Is anyone in contact with him? >> >> -Peter >> >> On Thu, Dec 24, 2009 at 11:27 AM, Grant Ingersoll wrote: >> > >> > On Dec 24, 2009, at 11:09 AM, Fuad Efendi wrote: >> > >> >> I used pagination for a while till found this... >> >> >> >> >> >> I have filtered query ID:[* TO *] returning 20 millions results (no >> >> faceting), and pagination always seemed to be fast. However, fast only >> >> with >> >> low values for start=12345. Queries like start=28838540 take 40-60 >> >> seconds, >> >> and even cause OutOfMemoryException. >> > >> > Yeah, deep pagination in Lucene/Solr can be problematic due to the Priority >> Queue management. See http://issues.apache.org/jira/browse/LUCENE-2127 and >> the >> linked discussion on java-dev. >> > >> >> >> >> I use highlight, faceting on nontokenized "Country" field, standard >> >> handler. >> >> >> >> >> >> It even seems to be a bug... >> >> >> >> >> >> Fuad Efendi >> >> +1 416-993-2060 >> >> http://www.linkedin.com/in/liferay >> >> >> >> Tokenizer Inc. >> >> http://www.tokenizer.ca/ >> >> Data Mining, Vertical Search >> >> >> >> >> >> >> >> >> > >> > -- >> > Grant Ingersoll >> > http://www.lucidimagination.com/ >> > >> > Search the Lucene ecosystem using Solr/Lucene: >> http://www.lucidimagination.com/search >> > >> > >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 - stats page slow
I recently noticed the same sort of thing. The attached screenshot shows the transition on a search server when we updated from a Solr 1.4 dev build (revision 779609 from 2009-05-28) to the Solr 1.4.0 released code. Every 3 hours we have a cron task to log some of the data from the stats.jsp page from each core (about 100 cores, most of which are small indexes). You can see there is a dramatic spiking of the load after the update - I think due to added reporting on that page such as from the lucene field cache. Is this amount of load expected? -Peter On Thu, Dec 24, 2009 at 12:23 PM, Jay Hill wrote: > Also, what is your heap size and the amount of RAM on the machine? > > I've also noticed that, when watching memory usage through JConsole or > YourKit while loading the stats page, the memory usage spikes dramatically - > are you seeing this as well? > > -Jay > > On Thu, Dec 24, 2009 at 9:12 AM, Jay Hill wrote: > >> I've noticed this as well, usually when working with a large field cache. I >> haven't done in-depth analysis of this yet, but it seems like when the stats >> page is trying to pull data from a large field cache it takes quite a long >> time. >> >> Are you doing a lot of sorting? If so, what are the field types of the >> fields you're sorting on? How large is the index both in document count and >> file size? >> >> Another approach to get data from the Solr instance would be to use JMX. >> And I've been working on a request handler (started by Erik Hatcher) that >> will provide the same information as the stats page, but a little more >> efficiently. I may try to put up a patch with this soon. >> >> -Jay >> >> >> >> On Wed, Dec 23, 2009 at 6:43 AM, Stephen Weiss wrote: >> >>> We've been using Solr 1.4 for a few days now and one slight downside we've >>> noticed is the stats page comes up very slowly for some reason - sometimes >>> more than 10 seconds. We call this programmatically to retrieve the last >>> commit date so that we can keep users from committing too frequently. This >>> means some of our administration pages are now taking a long time to load. >>> Is there anything we should be doing to ensure that this page comes up >>> quickly? I see some notes on this back in October but it looks like that >>> update should already be applied by now. Or, better yet, is there now a >>> better way to just retrieve the last commit date from Solr without pulling >>> all of the statistics? >>> >>> Thanks in advance. >>> >>> -- >>> Steve >>> >> >> > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 - stats page slow
Ah sorry - didn't realize attachments were stripped. Here's a web version: http://img.skitch.com/20100108-t99a1emmar32w9gkcfcius8afm.png -Peter On Thu, Jan 7, 2010 at 9:53 PM, Otis Gospodnetic wrote: > I'd love to see the screenshot, but it didn't come through - got stripped by > ML manager. Maybe upload it somewhere? > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > ----- Original Message >> From: Peter Wolanin >> To: solr-user@lucene.apache.org >> Sent: Thu, January 7, 2010 9:32:26 PM >> Subject: Re: Solr 1.4 - stats page slow >> >> I recently noticed the same sort of thing. >> >> The attached screenshot shows the transition on a search server >> when we updated from a Solr 1.4 dev build (revision 779609 from >> 2009-05-28) to the Solr 1.4.0 released code. Every 3 hours we have a >> cron task to log some of the data from the stats.jsp page from each >> core (about 100 cores, most of which are small indexes). >> >> You can see there is a dramatic spiking of the load after the update - >> I think due to added reporting on that page such as from the lucene >> field cache. Is this amount of load expected? >> >> -Peter >> >> On Thu, Dec 24, 2009 at 12:23 PM, Jay Hill wrote: >> > Also, what is your heap size and the amount of RAM on the machine? >> > >> > I've also noticed that, when watching memory usage through JConsole or >> > YourKit while loading the stats page, the memory usage spikes dramatically >> > - >> > are you seeing this as well? >> > >> > -Jay >> > >> > On Thu, Dec 24, 2009 at 9:12 AM, Jay Hill wrote: >> > >> >> I've noticed this as well, usually when working with a large field cache. >> >> I >> >> haven't done in-depth analysis of this yet, but it seems like when the >> >> stats >> >> page is trying to pull data from a large field cache it takes quite a long >> >> time. >> >> >> >> Are you doing a lot of sorting? If so, what are the field types of the >> >> fields you're sorting on? How large is the index both in document count >> >> and >> >> file size? >> >> >> >> Another approach to get data from the Solr instance would be to use JMX. >> >> And I've been working on a request handler (started by Erik Hatcher) that >> >> will provide the same information as the stats page, but a little more >> >> efficiently. I may try to put up a patch with this soon. >> >> >> >> -Jay >> >> >> >> >> >> >> >> On Wed, Dec 23, 2009 at 6:43 AM, Stephen Weiss wrote: >> >> >> >>> We've been using Solr 1.4 for a few days now and one slight downside >> >>> we've >> >>> noticed is the stats page comes up very slowly for some reason - >> >>> sometimes >> >>> more than 10 seconds. We call this programmatically to retrieve the last >> >>> commit date so that we can keep users from committing too frequently. >> >>> This >> >>> means some of our administration pages are now taking a long time to >> >>> load. >> >>> Is there anything we should be doing to ensure that this page comes up >> >>> quickly? I see some notes on this back in October but it looks like that >> >>> update should already be applied by now. Or, better yet, is there now a >> >>> better way to just retrieve the last commit date from Solr without >> >>> pulling >> >>> all of the statistics? >> >>> >> >>> Thanks in advance. >> >>> >> >>> -- >> >>> Steve >> >>> >> >> >> >> >> > >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Basic questions about Solr cost in programming time
Having worked quite a bit on the Drupal integration - here's my quick take: If you have someone help you the first time, you can have a basic implementation running in Jetty in about 15 minutes. On your own, a couple of hours maybe. For a non-public site (intranet) with modest traffic and no requirements for high availability, that is likely going to hold you for a while. If you are not already using tomcat6 and want a more robust deployment, getting that right will take you a couple days' work, I'd guess. There are already some options for indexing/searching documents via the Drupal integration, but that's still a little rough. Of course, we'd also be happy to have you get Drupal support and a hosted Solr index from us at Acquia. http://acquia.com/products-services/acquia-search-features However, I don't think you'll readily be able to use our service with Jive at the moment - you don't really describe why you'd be using both Jive and Drupal. If you are not doing any customization and compiling the Java isn't something you enjoy, I'd think the certified distribution is a fine place to start, and with it you can get Lucid's free PDF book, which is, I think, by far the best and most comprehensive Solr 1.4 reference work that exists at the moment. -Peter On Tue, Jan 26, 2010 at 3:00 PM, Jeff Crump wrote: > Hi, > I hope this message is OK for this list. > > I'm looking into search solutions for an intranet site built with Drupal. > Eventually we'd like to scale to enterprise search, which would include the > Drupal site, a document repository, and Jive SBS (collaboration software). > I'm interested in Lucene/Solr because of its scalability, faceted search and > optimization features, and because it is free. Our problem is that we are a > non-profit organization with only three very busy programmers/sys admins > supporting our employees around the world. > > To help me argue for Solr in terms of total cost, I'm hoping that members of > this list can share their insights about the following: > > * About how many hours of programming did it take you to set up your > instance of Lucene/Solr (not counting time spent on optimization)? > > * Are there any disadvantages of going with a certified distribution rather > than the standard distribution? > > > Thanks and best regards, > Jeff > > Jeff Crump > jcr...@hq.mercycorps.org > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 - stats page slow
Sorry for not following up sooner - it's been a busy last couple of weeks. We do see a significant insanity count - could this be due to updating indexes from the dev Solr build? E.g. on one server I see 61, and entries like: SUBREADER: Found caches for decendents of org.apache.lucene.index.readonlydirectoryrea...@2b8d6cbf+created 'org.apache.lucene.index.readonlydirectoryrea...@2b8d6cbf'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#2002656056 (size =~ 74.4 KB) 'org.apache.lucene.store.niofsdirectory$niofsindexin...@47adeb94'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#1099177573 (size =~ 74.4 KB) SUBREADER: Found caches for decendents of org.apache.lucene.index.readonlydirectoryrea...@d0340a9+created 'org.apache.lucene.index.readonlydirectoryrea...@d0340a9'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#868132357 (size =~ 831.2 KB) 'org.apache.lucene.store.niofsdirectory$niofsindexin...@78802615'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#1542727931 (size =~ 831.2 KB) And I think it's higher on the one associated with the screenshot. Using the Lucene CheckIndex tool does not show any errors. Most of what we want is returned by the Luke handler, except for the pending adds and deletes and the index size. I can hack around this by creating a greatly reduced stats.jsp, but I'd also like to understand what we are experiencing. -Peter On Fri, Jan 8, 2010 at 1:38 PM, Mark Miller wrote: > Yonik Seeley wrote: >> On Fri, Jan 8, 2010 at 1:03 PM, Mark Miller wrote: >> >>> It should be fixed in trunk, but that was after 1.4. Currently, it >>> should only do it if it sees insanity - which there shouldn't be any >>> with stock Solr. >>> >> >> http://svn.apache.org/viewvc/lucene/solr/tags/release-1.4.0/src/java/org/apache/solr/search/SolrFieldCacheMBean.java >> http://svn.apache.org/viewvc?view=revision&revision=826788 >> Seems like it's there? Or was it a different commit? >> >> Perhaps there is just real insanity... which may be unavoidable at >> this point since not everything in solr is done per-segment yet. >> >> -Yonik >> http://www.lucidimagination.com >> > > Your right - when looking at the Solr release date, I quickly took the > 10 as October - but it was 11/10, so it is in 1.4. > > So people seeing this should also being seeing an insanity count over one. > > I'd think that would be rarer than one this sounds like though ... whats > left that could cause insanity? > > We should prob switch to never calculating the size unless an explicit > param is pass to the stats page. > > > -- > - Mark > > http://www.lucidimagination.com > > > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
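As an aside, most of the per-core numbers can be had from the Luke request handler instead, e.g. http://localhost:8983/solr/core0/admin/luke?numTerms=0&wt=json (the core name is illustrative). With numTerms=0 the request is cheap and does not walk the FieldCache the way stats.jsp does, though as noted it won't report pending adds/deletes or the on-disk index size.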
Re: schema.xml and Xinclude
It doesn't really work with the schema.xml - I beat my head on it for a few hours not long ago - maybe I sent an e-mail to this list about it? Yes, here: http://www.lucidimagination.com/search/document/ba68aa6f2f7702c3/is_it_possible_to_use_xinclude_in_schema_xml -Peter On Wed, Jan 6, 2010 at 8:36 AM, Patrick Sauts wrote: > As the ... sections in schema.xml are the same between all our indexes, I'd like to > make them an XInclude, so I tried: > > <schema ... xmlns:xi="http://www.w3.org/2001/XInclude"> > > <xi:include ... /> > ... > > My syntax might not be correct? > Or is it not possible yet? > > Thank you again for your time. > > Patrick. -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
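For contrast, XInclude is supported in solrconfig.xml in 1.4, with syntax along the lines of <xi:include href="shared-handlers.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/> (the included file name here is invented) - the trouble discussed above is specific to schema.xml.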
Re: Solr 1.4 - stats page slow
Yes, we do have some fields (like the creation date) that we use for both sorting and faceting. -Peter On Tue, Jan 26, 2010 at 8:55 PM, Yonik Seeley wrote: > On Tue, Jan 26, 2010 at 8:49 PM, Peter Wolanin > wrote: >> Sorry for not following up sooner- been a busy last couple weeks. >> >> We do see a significant instanity count - could this be due to >> updating indexes from the dev Solr build? E.g. on one server I see > > Do you both sort (or use a function query) and facet on the "created" field? > Faceting on single-valued fields is still currently done at the > top-level reader, while sorting and function queries are at a segment > level. > > -Yonik > http://www.lucidimagination.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr/Drupal Integration - Query Question
Can you tell me more about the rord() performance issues? I'm one of the maintainers of the Drupal module, so I'd like to switch if there is a better option. Thanks, Peter On Wed, Feb 10, 2010 at 12:00 AM, Lance Norskog wrote: > The admin/form.jsp is supposed to prepopulate fl= with '*,score' which > means bring back all fields and the calculated relevance score. > > This is the Drupal search, decoded. I changed the %2B to + signs for > readability. Have a look at the filter query fq= and the facet date > range. > > Also, in Solr 1.4 the 'rord' function has become very slow. So the > Drupal integration needs some updating anyway. > > INFO: [] webapp=/solr path=/select > params={spellcheck=true& > spellcheck.q=video& > fl=id,nid,title,comment_count,type,created,changed,score,path,url,uid,name,ss_image_relative& > > bf=recip(rord(created),4,19,19)^200.0& > > &hl.simple.post=& > hl.simple.pre=&hl=&version=1.2& > hl.fragsize=& > hl.fl=& > hl.snippets=& > > facet=true&facet.limit=20& > facet.field=uid&facet.field=type&facet.field=language& > facet.mincount=1& > > fq=(nodeaccess_all:0+OR+hash:c13a544eb3ac)& > qf=name^3.0&facet.date=changed& > json.nl=map&wt=json& > > f.changed.facet.date.start=2010-02-09T07:01:14Z/HOUR& > f.changed.facet.date.end=2010-02-09T17:44:16Z+1HOUR/HOUR& > f.changed.facet.date.gap=+1HOUR > > rows=10&start=0&facet.sort=true& > q=video} > hits=0 status=0 QTime=0 > > On Tue, Feb 9, 2010 at 1:28 PM, jaybytez wrote: >> >> I know this is not Drupal, but thought this question maybe more around the >> Solr query. >> >> For instance, I pulled down LucidImaginations Solr install, just like the >> apache solr install and ran the example solr and loaded the documents from >> the exampledocs. >> >> I can go to: >> >> http://localhost:8983/solr/admin/ >> >> And search for video and get responses >> >> But on my solr if I go to the full interface and use the defaults, I get no >> results back because of search fields, etc. >> >> http://localhost:8983/solr/admin/form.jsp >> >> So my admin Solr search query looks like this when searching "video": >> >> Feb 9, 2010 1:25:49 PM org.apache.solr.core.SolrCore execute >> INFO: [] webapp=/solr path=/select >> params={explainOther=&fl=&indent=on&start=0&q=video&hl.fl=&qt=&wt=&fq=&version=2.2&rows=10} >> hits=2 status=0 QTime=0 >> >> But if I go into Drupal and search "video", this is the query and no results >> come back: >> >> Feb 9, 2010 1:27:33 PM org.apache.solr.core.SolrCore execute >> INFO: [] webapp=/solr path=/select >> params={spellcheck=true&f.changed.facet.date.start=2010-02-09T07:01:14Z/HOUR&facet=true&facet.limit=20&spellcheck.q=video&hl.simple.pre=&hl=&version=1.2&fl=id,nid,title,comment_count,type,created,changed,score,path,url,uid,name,ss_image_relative&bf=recip(rord(created),4,19,19)^200.0&f.changed.facet.date.gap=%2B1HOUR&hl.simple.post=&facet.field=uid&facet.field=type&facet.field=language&fq=(nodeaccess_all:0+OR+hash:c13a544eb3ac)&hl.fragsize=&facet.mincount=1&qf=name^3.0&facet.date=changed&hl.fl=&json.nl=map&wt=json&f.changed.facet.date.end=2010-02-09T17:44:16Z%2B1HOUR/HOUR&rows=10&hl.snippets=&start=0&facet.sort=true&q=video} >> hits=0 status=0 QTime=0 >> >> Any thoughts on the search query that gets generated by the Drupal/Solr >> module? >> >> Thanks...jay >> -- >> View this message in context: >> http://old.nabble.com/Solr-Drupal-Integration---Query-Question-tp27522362p27522362.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > Lance Norskog > goks...@gmail.com > -- Peter M. 
Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
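For the archives: the usual 1.4-era replacement for a recip(rord(created),...) boost is the millisecond-based form recip(ms(NOW,created),3.16e-11,1,1) - ms() avoids the ord/rord FieldCache cost, and the 3.16e-11 constant scales milliseconds to roughly one unit per year. This assumes "created" is a trie-based date field, which ms() requires.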
Re: Solr/Drupal Integration - Query Question
The Drupal schema and solrconfig and the example schema and solrconfig have different fields and defaults, and likely Drupal won't find the fields its looking for and might not be even using the right query perser. -Peter On Thu, Feb 11, 2010 at 3:19 PM, jaybytez wrote: > > So I got it to work by running the drupal cron.php. > > I was originally trying to use the exampledocs, indexing that content, and > making that index available to the Drupal solr. > > But it might just be that they are different indexes? And that's why I > wasn't get responses. > > One quick question, the Drupal/Solr Facets are awesome, the only thing is > the URLs are escaped and seem to cause problems when I click the link. Is > this most likely an encoding issue or something in Solr that is causing > these links to be created poorly? > > For instance: > > http://localhost:8080/search/apachesolr_search/drupal?filters=tid%3A1%20tid%3A3%20%28nodeaccess_all%3A0%20OR%20hash%3Ac13a544eb3ac%29 > > This returns no results and produces the following error in Solr (is this > error related to http://issues.apache.org/jira/browse/SOLR-1231): > > Feb 11, 2010 12:18:58 PM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException: > org.apache.lucene.queryParser.ParseException: Cannot parse > 'hash:c13a544eb3ac)': Encountered " ")" ") "" at line 1, column 17. > Was expecting one of: > > ... > ... > ... > "+" ... > "-" ... > "(" ... > "*" ... > "^" ... > ... > ... > ... > ... > ... > "[" ... > "{" ... > ... > > at > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > at org.mortbay.jetty.Server.handle(Server.java:285) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse > 'hash:c13a544eb3ac)': Encountered " ")" ") "" at line 1, column 17. 
> Was expecting one of: > > ... > ... > ... > "+" ... > "-" ... > "(" ... > "*" ... > "^" ... > ... > ... > ... > ... > ... > "[" ... > "{" ... > ... > > at > org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:205) > at > org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) > at org.apache.solr.search.QParser.getQuery(QParser.java:131) > at > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:103) > ... 22 more > Caused by: org.apache.lucene.queryParser.ParseException: Encountered " ")" > ") "" at line 1, column 17. > Was expecting one of: > > ... > ... > ... > "+" ... > "-" ... > "(" ... > "*" ... > "^" ... > ... > ... > ... > ... > ... > "[" ... > "{" ... > ... > > at > org.apache.lucene.queryParser.QueryParser.generateParseException(QueryParser.java:1846) > at > org.apache.lucene.queryParser.QueryParser.jj_consume_token(QueryParser.java:1728) > at > org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1255) > at > org.apache.lucene.queryParser.
Solr 1.4 bug? search fails but analyzer indicates a match
Ran into an odd situation today searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca I would expect all the queries that fail to match. Looking at the schema browser, the index contains the expected terms: identica, identi, ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 bug? search fails but analyzer indicates a match
Hi Mitch, I am also seeing this locally with the exact same solr.war, solrconfig.xml, and schema.xml running under Jetty, as well as on 2 different production servers with the same content indexed. So this is really weird - this seems to be influenced by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match "Identi.ca", but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. More testing suggests the word "for" is the problem. I don't see an exception or error. Could be a problem with how stopwords are removed? -Peter On Sat, Mar 27, 2010 at 1:19 PM, MitchK wrote: > > Hi Peter, > > have you tried to reindex your data and did you do a commit? > If you changed anything, have you restarted your Solr-server? > > I can't understand why this problem occurs, since the example seem to work > at analysis.jsp. > > Kind regards > - Mitch > -- > View this message in context: > http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680313.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 bug? search fails but analyzer indicates a match
If I empty the stopword file and re-index, all expected matches happen. So maybe that provides a further suggestion of where the problem is. This certainly feels like a Solr bug (or lucene bug?). -Peter On Sat, Mar 27, 2010 at 3:05 PM, Peter Wolanin wrote: > Hi Mitch, > > I am also seeing this locally with the exact same solr.war, > solrconfig.xml, and schema.xml running under Jetty, as well as on 2 > different production servers with the same content indexed. > > So this is really weird - this seems to be influenced by the surrounding text: > > "would be great to have support for Identi.ca on the follow block" > > fails to match "Identi.ca", but putting the content on its own or in > another sentence: > > "Support Identi.ca" > > the search matches. More testing suggests the word "for" is the > problem. I don't see an exception or error. Could be a problem with > how stopwords are removed? > > -Peter > > > On Sat, Mar 27, 2010 at 1:19 PM, MitchK wrote: >> >> Hi Peter, >> >> have you tried to reindex your data and did you do a commit? >> If you changed anything, have you restarted your Solr-server? >> >> I can't understand why this problem occurs, since the example seem to work >> at analysis.jsp. >> >> Kind regards >> - Mitch >> -- >> View this message in context: >> http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680313.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 bug? search fails but analyzer indicates a match
The output on the analysis screen does look correct. Here are 2 screen shots: empty stopwords: http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png standard stopwords: http://img.skitch.com/20100327-1w5ct1wr25jkir4sji8kumefn1.png -Peter On Sat, Mar 27, 2010 at 4:13 PM, MitchK wrote: > > Peter, > > if you are right, please outcomment the stopword filter to make clear, that > the problem is really a problem of how the stopword filter deletes > stopwords. > > Is the output correct, if you enter "would be great to have support for > Identi.ca on the follow block" in the query-label at the analysis.jsp? Can > you make a screenshot for this sentence? > > - Mitch > -- > View this message in context: > http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680530.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 bug? search fails but analyzer indicates a match
The stopwords stanza looks like: <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/> which is the same as the example schema http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/example/solr/conf/schema.xml Changing this to enablePositionIncrements="false" seems to make the searching work as expected. Is it incorrect to have that directive here, or is this a bug? -Peter On Sat, Mar 27, 2010 at 4:25 PM, Peter Wolanin wrote: > The output on the analysis screen does look correct. Here are 2 screen shots: > > empty stopwords: http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png > > standard stopwords: > http://img.skitch.com/20100327-1w5ct1wr25jkir4sji8kumefn1.png > > -Peter > > On Sat, Mar 27, 2010 at 4:13 PM, MitchK wrote: >> >> Peter, >> >> if you are right, please outcomment the stopword filter to make clear, that >> the problem is really a problem of how the stopword filter deletes >> stopwords. >> >> Is the output correct, if you enter "would be great to have support for >> Identi.ca on the follow block" in the query-label at the analysis.jsp? Can >> you make a screenshot for this sentence? >> >> - Mitch >> -- >> View this message in context: >> http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680530.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 bug? search fails but analyzer indicates a match
Discussing this with Mark Miller in IRC - we are homing in on the problem. Looks as though Identi.ca is treated as a phrase query, as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as Identi ca (i.e. 2 separate tokens, not a phrase). -Peter On Sat, Mar 27, 2010 at 4:32 PM, Peter Wolanin wrote: > The stopwords stanza looks like: > > <filter class="solr.StopFilterFactory" > ignoreCase="true" > words="stopwords.txt" > enablePositionIncrements="true" > /> > > Which is the same as the example schema > http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/example/solr/conf/schema.xml > > changing this to enablePositionIncrements="false" seems to make the > searching work as expected. Is it incorrect to have that directive > here, or is this a bug? > > -Peter > > > On Sat, Mar 27, 2010 at 4:25 PM, Peter Wolanin > wrote: >> The output on the analysis screen does look correct. Here are 2 screen shots: >> >> empty stopwords: >> http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png >> >> standard stopwords: >> http://img.skitch.com/20100327-1w5ct1wr25jkir4sji8kumefn1.png >> >> -Peter >> >> On Sat, Mar 27, 2010 at 4:13 PM, MitchK wrote: >>> >>> Peter, >>> >>> if you are right, please outcomment the stopword filter to make clear, that >>> the problem is really a problem of how the stopword filter deletes >>> stopwords. >>> >>> Is the output correct, if you enter "would be great to have support for >>> Identi.ca on the follow block" in the query-label at the analysis.jsp? Can >>> you make a screenshot for this sentence? >>> >>> - Mitch >>> -- >>> View this message in context: >>> http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680530.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > > > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
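Roughly what seems to be happening (illustrative, not a verified trace): with enablePositionIncrements="true", removing "for" leaves a position hole, so "Identi.ca" reaches the WordDelimiterFilter carrying a position increment of 2, and the 1.4 filter mishandles that increment when assigning positions to the split subwords - identi and ca can end up non-adjacent in the index, so the phrase query "identi ca", which requires adjacency, never matches.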
Re: Solr 1.4 bug? search fails but analyzer indicates a match
Created a new issue: https://issues.apache.org/jira/browse/SOLR-1852

Further discussion there.

-Peter

On Sat, Mar 27, 2010 at 5:51 PM, Peter Wolanin wrote:
> Discussing this with Mark Miller in IRC - we are homing in on the problem.
>
> Looks as though Identi.ca is treated as a phrase query, as if I had
> quoted it like "Identi ca". That phrase search also fails. I had
> expected that Identi.ca would be treated the same as Identi ca (i.e. 2
> separate tokens, not a phrase).
>
> -Peter
>
> On Sat, Mar 27, 2010 at 4:32 PM, Peter Wolanin wrote:
>> The stopwords stanza looks like:
>>
>> <filter class="solr.StopFilterFactory"
>>         ignoreCase="true"
>>         words="stopwords.txt"
>>         enablePositionIncrements="true"
>>         />
>>
>> Which is the same as the example schema:
>> http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/example/solr/conf/schema.xml
>>
>> Changing this to enablePositionIncrements="false" seems to make the
>> searching work as expected. Is it incorrect to have that directive
>> here, or is this a bug?
>>
>> -Peter
>>
>> On Sat, Mar 27, 2010 at 4:25 PM, Peter Wolanin wrote:
>>> The output on the analysis screen does look correct. Here are 2 screen
>>> shots:
>>>
>>> empty stopwords:
>>> http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png
>>>
>>> standard stopwords:
>>> http://img.skitch.com/20100327-1w5ct1wr25jkir4sji8kumefn1.png
>>>
>>> -Peter
>>>
>>> On Sat, Mar 27, 2010 at 4:13 PM, MitchK wrote:
>>>>
>>>> Peter,
>>>>
>>>> if you are right, please comment out the stopword filter to make clear
>>>> that the problem really is a problem of how the stopword filter deletes
>>>> stopwords.
>>>>
>>>> Is the output correct if you enter "would be great to have support for
>>>> Identi.ca on the follow block" in the query field on analysis.jsp? Can
>>>> you make a screenshot for this sentence?
>>>>
>>>> - Mitch
>>>> --
>>>> View this message in context:
>>>> http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680530.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>
>>> --
>>> Peter M. Wolanin, Ph.D.
>>> Momentum Specialist, Acquia. Inc.
>>> peter.wola...@acquia.com
>>>
>>
>> --
>> Peter M. Wolanin, Ph.D.
>> Momentum Specialist, Acquia. Inc.
>> peter.wola...@acquia.com
>>
>
> --
> Peter M. Wolanin, Ph.D.
> Momentum Specialist, Acquia. Inc.
> peter.wola...@acquia.com
>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: Solr 1.4 bug? search fails but analyzer indicates a match
I think it is clearly a bug - see the comments on the issue by Robert Muir:
https://issues.apache.org/jira/browse/SOLR-1852

The patch is Mark Miller's backport of Robert's fixes for other
WordDelimiterFilter problems in Solr trunk. Those fixes also fix this
bug as a side effect.

-Peter

On Sun, Mar 28, 2010 at 4:09 AM, MitchK wrote:
>
> Peter,
>
> following your discussion, I was a bit confused: is this still a bug, or is
> the behaviour correct (since enablePositionIncrements is set to true), and
> what changes does the patch make?
>
> Does the patch fit all your needs (matches on "identi ca", "identica",
> "identi-ca", "identi.ca")?
>
> - Mitch
> --
> View this message in context:
> http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p681185.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
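On Mitch's question about which variants match: the WordDelimiterFilterFactory options below are the usual knobs for this. This is an illustrative sketch of standard options, not the exact configuration touched by the patch:

<!-- Illustrative settings, not the patched config itself:
       generateWordParts="1"  splits on the dot: identi.ca -> identi, ca
       catenateWords="1"      also emits the joined token: identica
       preserveOriginal="1"   also keeps identi.ca exactly as typed -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        catenateWords="1"
        preserveOriginal="1"/>

With all three enabled at index time, queries for identi.ca, identi-ca, identi ca, and identica should all be able to match, provided the position handling fixed by the patch is in place.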
Re: Evangelism
A very abbreviated list of sites using Apache Solr + Drupal is here:
http://drupal.org/node/447564

-Peter

On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote:
> Hi, I'm new to the list here.
>
> I'd like to steer someone in the direction of Solr, and I see the list of
> companies using Solr, but none have a "powered by Solr" logo or anything.
>
> Does anyone have any great links with evidence of majorly successful Solr
> projects?
>
> Thanks in advance,
>
> Dan B.
>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com