Re: background merge hit exception
Do you mean that earlier it was doing indexing well then all of sudden you started getting this exception? - Kumar Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/background-merge-hit-exception-tp2680625p2680979.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query on facet field's count
My patch is for 4.0 trunk. On Fri, Mar 11, 2011 at 10:05 PM, rajini maski wrote: > Thanks Bill Bell. This query works after applying the patch you referred to, is it? Please can you let me know how I need to update the current war (Apache Solr 1.4.1) file with this new patch? Thanks a lot. > Thanks, Rajani > On Sat, Mar 12, 2011 at 8:56 AM, Bill Bell wrote: >> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1 >> Would do what you want I believe... >> On 3/11/11 8:51 AM, "Bill Bell" wrote: >> >There is my patch to do that. SOLR-2242 >> >Bill Bell >> >Sent from mobile >> >On Mar 11, 2011, at 1:34 AM, rajini maski wrote: >> >> Query on facet field results... When I run a facet query on some field, say facet=on&facet.field=StudyID, I get a list of distinct StudyIDs with a count that tells how many times each study occurred in the search results. But I also need the count of this distinct StudyID list. Is there any Solr query to get it? >> >> Example facet counts (the surrounding XML tags were lost in the archive): 135164, 79820, 70815, 37076, 35276. >> >> I want a count attribute that returns the number of different StudyIDs that occurred. In the example above it would be: Count = 5 (105, 179, 107, 120, 134)
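Without the SOLR-2242 patch, the distinct-value count the poster wants can also be computed client-side, since it is just the number of facet entries returned (with facet.limit=-1 and facet.mincount=1). A minimal stdlib-only sketch; the StudyID values and counts are taken from the example above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DistinctFacetCount {
    // Given the facet counts for StudyID (value -> occurrence count),
    // the number of distinct StudyIDs is the number of entries whose
    // count meets facet.mincount.
    static int distinctCount(Map<String, Integer> facetCounts, int mincount) {
        int n = 0;
        for (int count : facetCounts.values()) {
            if (count >= mincount) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Integer> studyIds = new LinkedHashMap<>();
        studyIds.put("105", 135164);
        studyIds.put("179", 79820);
        studyIds.put("107", 70815);
        studyIds.put("120", 37076);
        studyIds.put("134", 35276);
        System.out.println(distinctCount(studyIds, 1)); // Count = 5
    }
}
```

Note this requires fetching all facet entries, which can be expensive for high-cardinality fields; the patch computes the count server-side.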
Solr query POST and not in GET
Hi, is it possible to change the Solr query method from GET to POST? Because my query has a lot of OR..OR..OR and the log says Request URI too large. Where can I change it? thanx -- Gastone Penzo www.solr-italia.it The first Italian blog about SOLR
Re: Solr query POST and not in GET
Yes, it's possible. Assuming you're using SolrJ as a client library, set: QueryRequest req = new QueryRequest(); req.setMethod(METHOD.POST); Any other client library should have a similar method. hth, Geert-Jan 2011/3/15 Gastone Penzo > Hi, > is possible to change Solr sending query method from get to post? > because my query has a lot of OR..OR..OR and the log says to me Request URI > too large > Where can i change it?? > thanx > > > > > -- > Gastone Penzo > > www.solr-italia.it > The first italian blog about SOLR >
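For clients that don't use SolrJ, the same effect is a matter of putting the parameters in a form-encoded POST body instead of the URL; a long boolean query that blows past the URI length limit in GET fits fine in a POST body. A stdlib-only sketch of building that body (the parameter values are illustrative; the body would be sent with Content-Type application/x-www-form-urlencoded to /solr/select):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PostBodyBuilder {
    // Form-encode the query parameters exactly as they would appear
    // after the '?' in a GET request, but destined for the POST body.
    static String formEncode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) body.append('&');
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "id:1 OR id:2 OR id:3");
        params.put("rows", "10");
        System.out.println(formEncode(params));
    }
}
```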
Re: Solr query POST and not in GET
Please do not cross-post between lists - yours seems like a user query to me, so I'm answering it here. As to your question - Solr does not select the request method - you do. I've just tested it and Solr happily accepts a query via a POST request. However, you'd probably do well to look at other ways to structure your query if you're hitting the URL length limit. Upayavira On Tue, 15 Mar 2011 12:23 +0100, "Gastone Penzo" wrote: Hi, is possible to change Solr sending query method from get to post? because my query has a lot of OR..OR..OR and the log says to me Request URI too large Where can i change it?? thanx -- Gastone Penzo [1]www.solr-italia.it The first italian blog about SOLR References 1. http://www.solr-italia.it/ --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: keeping data consistent between Database and Solr
Solandra is great for adding better scalability and NRT to Solr, but it pretty much just stores the index in Cassandra and insulates that from the user. It doesn't solve the problem of allowing quick and direct retrieval of data that need not be searched. I could certainly just use a Solr search query to "directly" access a single document, but that has overhead and would not be as efficient as directly accessing a database. With potentially tens of thousands of simultaneous direct data accesses, I'd rather not put this burden on Solr and would prefer to use it only for search as it was intended, while simple data retrieval could come from a better equipped database. But my question of consistency applies to all databases and Solr. I would imagine most people maintain separate MySQL and Solr databases. On Tuesday, March 15, 2011, Bill Bell wrote: > Look at Solandra. Solr + Cassandra. > > On 3/14/11 9:38 PM, "onlinespend...@gmail.com" > wrote: > >>Like many people, Solr is not my primary data store. Not all of my data >>need >>be searchable and for simple and fast retrieval I store it in a database >>(Cassandra in my case). Actually I don't have this all built up yet, but >>my >>intention is that whenever new data is entered that it be added to my >>Cassandra database and simultaneously added to the Solr index (either by >>queuing up recent data before a commit or some other means; any >>suggestions >>on this front?). >> >>But my main question is, how do I guarantee that data between my Cassandra >>database and Solr index are consistent and up-to-date? What if I write >>the >>data to Cassandra and then a failure occurs during the commit to the Solr >>index? I would need to be aware what data failed to commit and make sure >>that a re-attempt is made. Obviously inconsistency for a short duration >>is >>inevitable when using two different databases (Cassandra and Solr), but I >>certainly don't want a failure to create perpetual inconsistency.
I'm >>curious what sort of mechanisms people are using to ensure consistency >>between their database (MySQL, Cassandra, etc.) and Solr. >> >>Thank you, >>Ben > > >
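One common answer to the question above is a retry queue: write to the primary store first, then attempt the Solr add, and remember failures for a later re-attempt, so inconsistency is temporary rather than perpetual. A minimal sketch; the class, the success-flag interface, and the elided Cassandra write are all hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

public class DualWriteWithRetry {
    // Ids whose Solr add failed; re-attempted later so consistency is
    // eventual rather than lost.
    private final Deque<String> pendingSolr = new ArrayDeque<>();
    private final Predicate<String> solrAdd; // returns true on success

    DualWriteWithRetry(Predicate<String> solrAdd) { this.solrAdd = solrAdd; }

    void write(String docId) {
        // 1. write to the primary store (Cassandra) -- elided here
        // 2. attempt the Solr add; remember the id if it fails
        if (!solrAdd.test(docId)) pendingSolr.add(docId);
    }

    // Periodically drain the failure queue; anything that fails again
    // stays queued for the next pass.
    int retryPending() {
        int retried = pendingSolr.size();
        for (int i = 0; i < retried; i++) {
            String id = pendingSolr.poll();
            if (!solrAdd.test(id)) pendingSolr.add(id);
        }
        return pendingSolr.size(); // still-failing count
    }
}
```

In a real deployment the pending queue would itself need to be durable (e.g. stored in Cassandra), otherwise a crash loses the re-attempt list.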
Re: background merge hit exception
On Tuesday 15 March 2011 01:31 PM, Anurag wrote: Do you mean that earlier it was doing indexing well then all of sudden you started getting this exception? - Kumar Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/background-merge-hit-exception-tp2680625p2680979.html Sent from the Solr - User mailing list archive at Nabble.com. Yes, earlier it was fine, but I think as the index size grows the problem starts occurring. Also I have a doubt related to index size: isn't it too large compared to the number of documents?
Solrj performance bottleneck
Hi, I am using Solrj as a Solr client in my project. While searching for a few words, it seems Solrj takes more time to send the response, e.g. 8 - 12 sec. While searching for most other words, Solrj takes less time. For example, if I post a search url in the browser, it shows the QTime in milliseconds only. http://serverName/solr/mydata/select?q=computing&qt=myhandler&fq=category:1 But if I query the same using Solrj from my project like below, it takes a long time (8 - 12 sec) to produce the same results. Hence, I suspect whether Solrj takes such a long time to produce results. SolrServer server = new CommonsHttpSolrServer(url); SolrQuery query = new SolrQuery("computing"); query.setParam("qt", "myhandler"); query.setFilterQueries("category:1"); query.setHighlight(false); QueryResponse rsp = server.query( query ); I have tried both the POST and GET methods, but both take much time. Any idea why Solrj takes such a long time for particular words? It returns around 40 docs as a search result. I have even commented out highlighting for that. Is there any way to speed it up? Note: I am using Tomcat and have set the heap size to around 1024 mb. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solrj-performance-bottleneck-tp2681294p2681294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrj performance bottleneck
On Tue, Mar 15, 2011 at 8:12 AM, rahul wrote: > I am using Solrj as a Solr client in my project. > > While searching, for a few words, it seems Solrj takes more time to send > response, for eg (8 - 12 sec). While searching most of the other words it > seems Solrj take less amount of time only. > > For eg, if I post a search url in browser, it shows the QTime in > milliseconds only. The QTime does not measure the time to stream back the response (which includes loading stored fields). Since the response is streamed, it's not possible to include this time. The difference normally isn't that large unless you have a network issue, a client that is taking a long time to read the response, or a very large index and not enough free RAM for the OS to cache all the files. Check the solr logs and make sure that equivalent queries are being received. The QTime is also logged. -Yonik http://lucidimagination.com
Re: keeping data consistent between Database and Solr
On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote: But my main question is, how do I guarantee that data between my Cassandra database and Solr index are consistent and up-to-date? Our MySQL database has two unique indexes. One is a document ID, implemented in MySQL as an autoincrement integer and in Solr as a long. The other is what we call a tag id, implemented in MySQL as a varchar and Solr as a single lowercased token and serving as Solr's uniqueKey. We have an update trigger on the database that updates the document ID whenever the database document is updated. We have a homegrown build system for Solr. In a nutshell, it keeps track of the newest document ID in the Solr Index. If the DIH delta-import fails, it doesn't update the stored ID, which means that on the next run, it will try and index those documents again. Changes to the entries in the database are automatically picked up because the document ID is newer, but the tag id doesn't change, so the document in Solr is overwritten. Things are actually more complex than I've written, because our index is distributed. Hopefully it can give you some ideas for yours. Shawn
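The watermark scheme Shawn describes — advance the stored document ID only after a successful delta-import, so a failed run retries the same documents — can be sketched in a few lines. All names are illustrative, and document IDs are assumed to arrive in ascending order:

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaTracker {
    private long lastIndexedId = 0;   // newest doc ID known to be in Solr

    // Select the delta: everything with a document ID newer than the stored
    // watermark. Because the update trigger bumps the ID on modification,
    // edited rows reappear here and overwrite their Solr doc via uniqueKey.
    List<Long> selectDelta(List<Long> allIdsInDb) {
        List<Long> delta = new ArrayList<>();
        for (long id : allIdsInDb) if (id > lastIndexedId) delta.add(id);
        return delta;
    }

    // Advance the watermark only after a successful import; a failure
    // leaves it alone, so the same delta is retried on the next run.
    void commitRun(List<Long> delta, boolean importSucceeded) {
        if (importSucceeded && !delta.isEmpty()) {
            lastIndexedId = delta.get(delta.size() - 1);
        }
    }

    long watermark() { return lastIndexedId; }
}
```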
Solrj Performance check.
Hi, I am using Solrj as a Solr client in my project. While searching for a few words, it seems Solrj takes more time to send the response, e.g. 8 - 12 sec. While searching for most other words, Solrj takes less time. For example, if I post a search url in the browser, it shows the QTime in milliseconds only. http://serverName/solr/mydata/select?q=computing&qt=myhandler&fq=category:1 But if I query the same using Solrj from my project like below, it takes a long time (8 - 12 sec) to produce the same results. Hence, I suspect whether Solrj takes such a long time to produce results. SolrServer server = new CommonsHttpSolrServer(url); SolrQuery query = new SolrQuery("computing"); query.setParam("qt", "myhandler"); query.setFilterQueries("category:1"); query.setHighlight(false); QueryResponse rsp = server.query( query ); I have tried both the POST and GET methods, but both take much time. Any idea why Solrj takes such a long time for particular words? It returns around 40 docs as a search result. I have even commented out highlighting for that. Is there any way to speed it up? Note: I am using Tomcat and have set the heap size to around 1024 mb. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solrj-Performance-check-tp2681444p2681444.html Sent from the Solr - User mailing list archive at Nabble.com.
Dismax: field not returned unless in sort clause?
We have a "D" field (string, indexed, stored, not required) that is returned * when we search with the standard request handler * when we search with dismax request handler _and the field is specified in the sort parameter_ but is not returned when using the dismax handler and the field is not specified in the sort param. IOW, if I do the following query (no sort param), I get all the expected results, but the D field never comes back... &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D ...but if I add "D" to the sort param, the D field comes back on every single record &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20asc &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20desc If I omit the fl param, I see that all of our other fields appear to be returned on every result without any need to specify them in the sort param. Obviously, I cannot hard-code the sort order around the D field. :) Any ideas? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-field-not-returned-unless-in-sort-clause-tp2681447p2681447.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: keeping data consistent between Database and Solr
I use Solr + MySql with data coming from several DIH-type "loaders" that I have written to move data from many different databases into my "BI" solution. I don't use DIH because I am not simply replicating the data, but am moving/merging/processing the incoming data during the loading. For me, I have an Aspect (aspectj) which wraps my Data Access Object, and every time a "persist" is called (I am using hibernate), I update Solr with the same data an instant later using @Around advice. This handles nearly every event during the day. I have a simple "retry" procedure on my Solrj add/commit on network error in hopes that it will eventually work. In case of error I rebuild the solr index from scratch each night by recreating it based on the data in MySQL. That takes about 10 minutes and I run it at night. This allows me to have "eventual consistency" for any issues that cropped up during the day. Obviously the size of my database (< 2 million records) makes this approach manageable. YMMV. Tim -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Tuesday, March 15, 2011 9:13 AM To: solr-user@lucene.apache.org Subject: Re: keeping data consistent between Database and Solr On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote: > But my main question is, how do I guarantee that data between my Cassandra > database and Solr index are consistent and up-to-date? Our MySQL database has two unique indexes. One is a document ID, implemented in MySQL as an autoincrement integer and in Solr as a long. The other is what we call a tag id, implemented in MySQL as a varchar and Solr as a single lowercased token and serving as Solr's uniqueKey. We have an update trigger on the database that updates the document ID whenever the database document is updated. We have a homegrown build system for Solr. In a nutshell, it keeps track of the newest document ID in the Solr Index.
If the DIH delta-import fails, it doesn't update the stored ID, which means that on the next run, it will try and index those documents again. Changes to the entries in the database are automatically picked up because the document ID is newer, but the tag id doesn't change, so the document in Solr is overwritten. Things are actually more complex than I've written, because our index is distributed. Hopefully it can give you some ideas for yours. Shawn
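Tim's persist-then-mirror pattern above (aspect around the DAO, bounded retry, nightly rebuild as the safety net) can be sketched in plain Java without AspectJ. All names are illustrative, and the Solr/DB calls are stubbed as functional interfaces:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class MirroringDao {
    // Plain-Java stand-in for the @Around advice: every persist() is
    // followed by a Solr add of the same record, with a bounded retry.
    private final Consumer<String> dbPersist;
    private final Predicate<String> solrAdd;   // returns true on success
    final List<String> failedAdds = new ArrayList<>(); // swept by nightly rebuild

    MirroringDao(Consumer<String> dbPersist, Predicate<String> solrAdd) {
        this.dbPersist = dbPersist;
        this.solrAdd = solrAdd;
    }

    void persist(String record) {
        dbPersist.accept(record);              // the wrapped DAO call
        for (int attempt = 0; attempt < 3; attempt++) {
            if (solrAdd.test(record)) return;  // mirrored to Solr
        }
        failedAdds.add(record);  // eventual consistency via the nightly rebuild
    }
}
```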
Solrj (1.4.1) Performance related query
Hi, I am using Solrj as a Solr client in my project. While searching for a few words, it seems Solrj takes more time to send the response, e.g. 8 - 12 sec. While searching for most other words, Solrj takes less time. For example, if I post a search url in the browser, it shows the QTime in milliseconds only. http://serverName/solr/mydata/select?q=computing&qt=myhandler&fq=category:1 But if I query the same using Solrj from my project like below, it takes a long time (8 - 12 sec) to produce the same results. Hence, I suspect whether Solrj takes such a long time to produce results. SolrServer server = new CommonsHttpSolrServer(url); SolrQuery query = new SolrQuery("computing"); query.setParam("qt", "myhandler"); query.setFilterQueries("category:1"); query.setHighlight(false); QueryResponse rsp = server.query( query ); I have tried both the POST and GET methods, but both take much time. Any idea why Solrj takes such a long time for particular words? It returns around 40 docs as a search result. I have even commented out highlighting for that. Is there any way to speed it up? Note: I am using Tomcat and have set the heap size to around 1024 mb. And I am using Solr 1.4.1 version. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solrj-1-4-1-Performance-related-query-tp2681488p2681488.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dismax: field not returned unless in sort clause?
--- On Tue, 3/15/11, mrw wrote: > From: mrw > Subject: Dismax: field not returned unless in sort clause? > To: solr-user@lucene.apache.org > Date: Tuesday, March 15, 2011, 3:21 PM > We have a "D" field (string, indexed, > stored, not required) that is returned > * when we search with the standard request handler > * when we search with dismax request handler _and the field > is specified in > the sort parameter_ > > but is not returned when using the dismax handler and the > field is not > specified in the sort param. > > IOW, if I do the following query (no sort param), I get all > the expected > results, but the D field never comes back... > > &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D > > ...but if I add "D" to the sort param, the D field comes > back on every > single record > > &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20asc > &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20desc > > If I omit the fl param, I see that all of our other fields > appear to be > returned on every result without any need to specify them > in the sort param. > > Obviously, I cannot hard-code the sort order around the D > field. :) Can you use one space in qf parameter while separating field names? q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A B C&start=0&rows=300&fl=D
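For reference, the corrected query with the qf fields separated by spaces (percent-encoded in a URL) rather than commas:

```
q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A%20B%20C&start=0&rows=300&fl=D
```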
Re: Solr performance issue
My solr+jetty+java6 install seems to work well with these GC options. It's a dual processor environment: -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode I've never had a real problem with memory, so I've not done any kind of auditing. I probably should, but time is a limited resource. Shawn On 3/14/2011 2:29 PM, Markus Jelsma wrote: That depends on your GC settings and generation sizes. And, instead of UseParallelGC you'd better use UseParNewGC in combination with CMS. See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html It's actually, as I understand it, expected JVM behavior to see the heap rise to close to its limit before it gets GC'd, that's how Java GC works. Whether that should happen every 20 seconds or what, I don't know. Another option is setting better JVM garbage collection arguments, so GC doesn't "stop the world" so often. I have had good luck with my Solr using this: -XX:+UseParallelGC
Re: Solr performance issue
CMS is very good for multicore CPU's. Use incremental mode only when you have a single CPU with only one or two cores. On Tuesday 15 March 2011 16:03:38 Shawn Heisey wrote: > My solr+jetty+java6 install seems to work well with these GC options. > It's a dual processor environment: > > -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode > > I've never had a real problem with memory, so I've not done any kind of > auditing. I probably should, but time is a limited resource. > > Shawn > > On 3/14/2011 2:29 PM, Markus Jelsma wrote: > > That depends on your GC settings and generation sizes. And, instead of > > UseParallelGC you'd better use UseParNewGC in combination with CMS. > > > > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html > > > >> It's actually, as I understand it, expected JVM behavior to see the heap > >> rise to close to its limit before it gets GC'd, that's how Java GC > >> works. Whether that should happen every 20 seconds or what, I don't > >> know. > >> > >> Another option is setting better JVM garbage collection arguments, so GC > >> doesn't "stop the world" so often. I have had good luck with my Solr > >> using this: -XX:+UseParallelGC -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
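Putting the advice in this thread together, a multi-core machine might start the servlet container with something like the following. Heap sizes are illustrative, and -XX:+CMSIncrementalMode would be added only on a single-core box:

```
JAVA_OPTS="-Xms1024m -Xmx1024m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
```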
Sorting 0 values last
Hi @everyone. I want to sort ASC on a price field, but some of the docs got a 0 (not NULL) value. Now I want these docs to be at the end when I sort the price field ascending. Is it possible? Thanks in advance. MOuli -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-0-values-last-tp2681612p2681612.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting 0 values last
On Tue, Mar 15, 2011 at 10:35 AM, MOuli wrote: > I want so sort ASC on a price field, but some of the docs got a 0 (not NULL) > value. Now I want that this docs are at the end when i sort the price field > ascending. Is it possible? In 3.1 and trunk (4.0-dev), you could sort by a function query that maps values of 0 to something very large. sort=map(price,0,0,99) asc This should map anything with a price between 0 and 0 to 99 for the purposes of sorting. http://wiki.apache.org/solr/FunctionQuery -Yonik http://lucidimagination.com
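The effect of that map function on sort order is easy to check client-side: replace 0 with a large sentinel before comparing, and zero-priced docs fall to the end of an ascending sort. A small sketch; the sentinel value is arbitrary, just larger than any real price:

```java
import java.util.Arrays;

public class ZeroLastSort {
    // Client-side equivalent of sort=map(price,0,0,9999999) asc:
    // a price of exactly 0 is mapped to a huge sentinel before comparing,
    // so zero-priced docs land at the end of an ascending sort.
    static double mapped(double price) {
        return price == 0.0 ? 9_999_999.0 : price;
    }

    public static void main(String[] args) {
        double[] prices = {4.99, 0.0, 12.50, 0.0, 1.25};
        Double[] sorted = Arrays.stream(prices).boxed()
                .sorted((a, b) -> Double.compare(mapped(a), mapped(b)))
                .toArray(Double[]::new);
        System.out.println(Arrays.toString(sorted)); // zeros come last
    }
}
```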
Faceting help
Hello list, I'm trying to use facets via widgets within Ajax-Solr. I have tried the wiki for general help on configuring facets and constraints and also attended the recent Lucidworks webinar on faceted search. Can anyone please direct me to some reading on how to formally configure facets for searching? Currently my facets are configured as follows 'facet.field': [ 'topics', 'organisations', 'exchanges', 'countryCodes' ], 'facet.limit': 20, 'facet.mincount': 1, 'f.topics.facet.limit': 50, 'f.countryCodes.facet.limit': -1, 'facet.date': 'date', 'facet.date.start': '1987-02-26T00:00:00.000Z/DAY', 'facet.date.end': '1987-10-20T00:00:00.000Z/DAY+1DAY', 'facet.date.gap': '+1DAY', 'json.nl': 'map' However I wish to change the fields to contain some constraints, such as Topics < field Legislation < constraint Guidance/Policies < constraint Customer Service information/complaints procedure < constraint Financial information < constraint etc. Source < field html < constraint pdf < constraint email < constraint etc. Date < field < constraint Basically I need resources to understand how to implement the above instead of the example I currently have. Some guidance would be great. Thank you kindly Lewis Glasgow Caledonian University is a registered Scottish charity, number SC021474 Winner: Times Higher Education’s Widening Participation Initiative of the Year 2009 and Herald Society’s Education Initiative of the Year 2009. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html Winner: Times Higher Education’s Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
Re: Solr performance issue
The host is dual quad-core, each Xen VM has been given two CPUs. Not counting dom0, two of the hosts have 10/8 CPUs allocated, two of them have 8/8. The dom0 VM is also allocated two CPUs. I'm not really sure how that works out when it comes to Java running on the VM, but if at all possible, it is likely that Xen would try and keep both VM cpus on the same physical CPU and the VM's memory allocation on the same NUMA node. If that's the case, it would meet what you've stated as the recommendation for incremental mode. Shawn On 3/15/2011 9:10 AM, Markus Jelsma wrote: CMS is very good for multicore CPU's. Use incremental mode only when you have a single CPU with only one or two cores.
Sorting on multiValued fields via function query
Hello, I believe the most recent builds of Solr have started explicitly throwing an error around sorting on multiValued fields. I'd actually been sorting on multiValued fields for some time without any problems before this, not sure how Solr was able to handle this in the past... In any case, I'd like to be able to sort on multiValued fields via a function query, but keep getting the following error: can not use FieldCache on multivalued field I've tried using the function 'sum', 'max', and 'min' with the same result. Is there any way to sort on the maximum value, for instance, of a multiValued field? Thanks, -Harish -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-on-multiValued-fields-via-function-query-tp2681833p2681833.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr Query
I am a bit new to Solr. I am running the below query in the query browser admin interface +RetailPriceCodeID:1 +MSRP:[16001.00 TO 32000.00] I think it should return only results with RetailPriceCodeID = 1 and MSRP between 16001 and 32000. But it returns all results with MSRP = 1 and doesn't consider the 2nd query at all. Am I doing something wrong here? Please help
Re: Sorting 0 values last
On Tue, Mar 15, 2011 at 9:04 PM, Yonik Seeley wrote: > On Tue, Mar 15, 2011 at 10:35 AM, MOuli wrote: >> I want to sort ASC on a price field, but some of the docs got a 0 (not >> NULL) >> value. Now I want that this docs are at the end when I sort the price field >> ascending. Is it possible? > > In 3.1 and trunk (4.0-dev), you could sort by a function query that > maps values of 0 to something very large. [...] Not sure how you are indexing, but in addition to the above suggestion by Yonik, one could ignore 0's at indexing time, i.e., ensure that 0 values for that field are not indexed, and use sortMissingLast. Regards, Gora
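If you go the indexing-time route, sortMissingLast is an attribute on the field type in schema.xml. A sketch; the type name is illustrative, and attribute support varies by field class and Solr version:

```xml
<!-- docs with no value in a field of this type sort after all docs
     that do have a value, for both asc and desc sorts -->
<fieldType name="sdouble" class="solr.SortableDoubleField"
           sortMissingLast="true" omitNorms="true"/>
```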
Re: Solr Query
> But it returns all results with MSRP = 1 and doesn't consider the 2nd query at all. I believe you mean: 'it returns all results with RetailPriceCodeID = 1 while ignoring the 2nd query?' If so, please check that your default operator is set to AND in your schema config. Other than that, your syntax seems correct. Hth, Geert-Jan 2011/3/15 Vishal Patel > I am a bit new to Solr. > > I am running below query in query browser admin interface > > +RetailPriceCodeID:1 +MSRP:[16001.00 TO 32000.00] > > I think it should return only results with RetailPriceCodeID = 1 and MSRP > between 16001 and 32000. > > But it returns all results with MSRP = 1 and doesn't consider the 2nd query > at all. > > Am I doing something wrong here? Please help >
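The default operator the reply refers to lives in schema.xml. A fragment showing the suggested change (note that explicit + prefixes already mark both clauses as required regardless of the default operator, so this setting mainly affects queries written without them):

```xml
<!-- in schema.xml: make the default boolean operator AND instead of OR -->
<solrQueryParser defaultOperator="AND"/>
```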
Re: Dynamically boost search scores
Thank you for the advice. I looked at the page you recommended and came up with: http://localhost:8983/solr/search/?q=dog&fl=boost_score,genus,species,score&rows=15&bf=%22ord%28sum%28boost_score,1%29%29 ^10%22 But it appeared to have no effect. The results were in the same order as they were when I left off the bf parameter. So what am I doing incorrectly? Thanks, Brian Lamb On Mon, Mar 14, 2011 at 11:45 AM, Markus Jelsma wrote: > See boosting documents by function query. This way you can use document's > boost_score field to affect the final score. > > http://wiki.apache.org/solr/FunctionQuery > > On Monday 14 March 2011 16:40:42 Brian Lamb wrote: > > Hi all, > > > > I have a field in my schema called boost_score. I would like to set it up > > so that if I pass in a certain flag, each document score is boosted by > the > > number in boost_score. > > > > For example if I use: > > > > http://localhost/solr/search/?q=dog > > > > I would get search results like normal. But if I use: > > > > http://localhost/solr/search?q=dog&boost=true > > > > The score of each document would be boosted by the number in the field > > boost_score. > > > > Unfortunately, I have no idea how to implement this actually but I'm > hoping > > that's where you all can come in. > > > > Thanks, > > > > Brian Lamb > > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536620 / 06-50258350 >
Re: keeping data consistent between Database and Solr
That's pretty interesting to use the autoincrementing document ID as a way to keep track of what has not been indexed in Solr. And you overwrite this document ID even when you modify an existing document. Very cool. I suppose the number can even rotate back to 0, as long as you handle that. I am thinking of using a timestamp to achieve a similar thing. All documents that have been accessed after the last Solr index need to be added to the Solr index. In fact, each name-value pair in Cassandra has a timestamp associated with it, so I'm curious if I could simply use this. I'm curious how you handle the delta-imports. Do you have some routine that periodically checks for updates to your MySQL database via the document ID? Which language do you use for that? Thanks, Ben On Tue, Mar 15, 2011 at 9:12 AM, Shawn Heisey wrote: > On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote: > >> But my main question is, how do I guarantee that data between my Cassandra >> database and Solr index are consistent and up-to-date? >> > > Our MySQL database has two unique indexes. One is a document ID, > implemented in MySQL as an autoincrement integer and in Solr as a long. The > other is what we call a tag id, implemented in MySQL as a varchar and Solr > as a single lowercased token and serving as Solr's uniqueKey. We have an > update trigger on the database that updates the document ID whenever the > database document is updated. > > We have a homegrown build system for Solr. In a nutshell, it keeps track > of the newest document ID in the Solr Index. If the DIH delta-import fails, > it doesn't update the stored ID, which means that on the next run, it will > try and index those documents again. Changes to the entries in the database > are automatically picked up because the document ID is newer, but the tag id > doesn't change, so the document in Solr is overwritten. > > Things are actually more complex than I've written, because our index is > distributed. 
Hopefully it can give you some ideas for yours. > > Shawn > >
Re: problem using dataimporthandler
Regarding the DIH problem: I'm encountering "content not allowed in prolog" only when I'm deploying solr on tomcat. I'm using the same data-config.xml in the solr example through jetty and it works fine and I can index the data. Please let me know what should be changed while using tomcat. Thanks, Ram. -- View this message in context: http://lucene.472066.n3.nabble.com/problem-using-dataimporthandler-tp495415p2683596.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dynamically boost search scores
> the page you recommended and came up > with: > > http://localhost:8983/solr/search/?q=dog&fl=boost_score,genus,species,score&rows=15&bf=%22ord%28sum%28boost_score,1%29%29 > ^10%22 > > But it appeared to have no effect. The results were in the > same order as they > were when I left off the bf parameter. So what am I doing > incorrectly? bf belongs to DisMaxParams. Instead try this q={!boost b=boost_score}dog You can use any valid FunctionQuery as the b parameter. http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html You may want to use parameter referencing also. Where q is hardcoded in solrconfig.xml, something like: q={!boost b=$bb v=$qq}&qq=dog&bb=%22ord%28sum%28boost_score,1%29%29 http://wiki.apache.org/solr/LocalParams
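Spelled out without URL encoding, the two forms suggested above look like this (qq and bb are arbitrary parameter names, as on the LocalParams wiki page):

```
q={!boost b=sum(boost_score,1)}dog
q={!boost b=$bb v=$qq}&qq=dog&bb=sum(boost_score,1)
```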
Re: Solr Query
> I am running below query in query browser admin interface > > +RetailPriceCodeID:1 +MSRP:[16001.00 TO 32000.00] > > I think it should return only results with RetailPriceCodeID > = 1 and MSRP > between 16001 and 32000. > > But it returns all results with MSRP = 1 and doesn't consider > the 2nd query at > all. > > Am I doing something wrong here? Please help Your query is perfectly fine. However, range queries require trie-based or sortable-based types: tfloat, tdouble, etc.
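A schema.xml sketch of a trie-based field that supports such range queries; the attribute values shown are the common defaults, not taken from the poster's actual schema:

```xml
<!-- a trie type enables fast numeric range queries
     like MSRP:[16001.00 TO 32000.00] -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<field name="MSRP" type="tdouble" indexed="true" stored="true"/>
```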
partial optimize does not reduce the segment number to maxNumSegments
I have a core with 120+ segment files and I tried a partial optimize specifying maxNumSegments=10. After the optimize the segment files were reduced to 64; I did the same optimize again and it reduced to 30-something; this keeps going and eventually it drops to a teen number. I was expecting the optimize to result in exactly 10 segment files, or somewhere near that. Why do I have to manually repeat the optimize to reach that number? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2682195.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem using dataimporthandler
I got rid of the problem by just copying the other schema and config files (which sounds like it has nothing to do with the error in the dataconfig file, but I gave it a try) and it worked. I don't know if I'm missing something here, but it's working now.

Thanks,
Ram.
Re: problem using dataimporthandler
It could possibly be that your original XML file was in Unicode (with a BOM header, FFFE or FEFF); XML will see it as content if the underlying file system doesn't handle it.

On Tue, Mar 15, 2011 at 10:00 PM, sivaram wrote:
> I got rid of the problem by just copying the other schema and config files
> (which sounds like it has nothing to do with the error in the dataconfig
> file, but I gave it a try) and it worked. I don't know if I'm missing
> something here, but it's working now.
>
> Thanks,
> Ram.
Re: Faceting help
I'm not sure I get what you are trying to achieve. What do you mean by "constraint"? Are you saying that you effectively want to filter the facets that are returned? E.g. for the source field, you want to show html/pdf/email, but not, say, xls or doc?

Upayavira

On Tue, 15 Mar 2011 15:38 +0000, "McGibbney, Lewis John" wrote:
> Hello list,
>
> I'm trying to use facets via widgets within Ajax-Solr. I have tried the
> wiki for general help on configuring facets and constraints and also
> attended the recent Lucidworks webinar on faceted search. Can anyone
> please direct me to some reading on how to formally configure facets for
> searching.
>
> Currently my facets are configured as follows:
>
> 'facet.field': [ 'topics', 'organisations', 'exchanges', 'countryCodes' ],
> 'facet.limit': 20,
> 'facet.mincount': 1,
> 'f.topics.facet.limit': 50,
> 'f.countryCodes.facet.limit': -1,
> 'facet.date': 'date',
> 'facet.date.start': '1987-02-26T00:00:00.000Z/DAY',
> 'facet.date.end': '1987-10-20T00:00:00.000Z/DAY+1DAY',
> 'facet.date.gap': '+1DAY',
> 'json.nl': 'map'
>
> However I wish to change the fields to contain some constraints, such as:
>
> Topics < field >
>   Legislation < constraint >
>   Guidance/Policies < constraint >
>   Customer Service information/complaints procedure < constraint >
>   Financial information < constraint >
>   etc. etc.
>
> Source < field >
>   html < constraint >
>   pdf < constraint >
>   email < constraint >
>   etc. etc.
>
> Date < field > < constraint >
>
> Basically I need resources to understand how to implement the above
> instead of the example I currently have. Some guidance would be great.
> Thank you kindly,
>
> Lewis
>
> Glasgow Caledonian University is a registered Scottish charity, number
> SC021474
>
> Winner: Times Higher Education’s Widening Participation Initiative of the
> Year 2009 and Herald Society’s Education Initiative of the Year 2009.
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
>
> Winner: Times Higher Education’s Outstanding Support for Early Career
> Researchers of the Year 2010, GCU as a lead with Universities Scotland
> partners.
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html

---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
Re: accessing the analyzers in a component?
Thanks Ahmet, I indicated that in the wiki at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

My solution was a little bit different, since I wanted to get the analyzer per field name:

rb.getSchema().getField("name").getFieldType().getAnalyzer()

thanks again!
paul

On 15 Mar 2011, at 02:44, Ahmet Arslan wrote:

>> Within my custom query-component, I wish to obtain an
>> instance of the analyzer for a given named field.
>> Is there a schema object I can access?
>
> public void process(ResponseBuilder rb) throws IOException {
>     Map<String, FieldType> map = rb.req.getSchema().getFieldTypes();
>     Analyzer analyzer = map.get("myFieldName").getAnalyzer();
>     Analyzer queryAnalyzer = map.get("myFieldName").getQueryAnalyzer();
> }
stopFilterFactor and SnowballPorterFilterFactory not work for Spanish
I am using Solr 1.4.1. I am trying to index a Spanish field using the following tokenizer/filters. Using field analysis in the Solr admin, I can tell that StopFilterFactory and SnowballPorterFilterFactory with Spanish are not working right:

1. After the StopFilter, "la" should be gone, but it is not.
2. After SnowballPorterFilterFactory (language=Spanish), "cöcktäils" should become "cöcktäil", but I still see the token "cöcktäils" coming out.

I configured a Spanish stopword list for the StopFilterFactory.

Field name: title_name
Field value: la Cöcktäils

Index Analyzer:

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position:  1     2
  term text:      la    Cöcktäils
  term type:      word  word
  start,end:      0,2   3,12

org.apache.solr.analysis.StopFilterFactory {words=stopwords_es.txt, ignoreCase=true}
  term position:  1     2
  term text:      la    Cöcktäils
  term type:      word  word
  start,end:      0,2   3,12

org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position:  1     2
  term text:      la    cöcktäils
  term type:      word  word
  start,end:      0,2   3,12

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Spanish}
  term position:  1     2
  term text:      la    cöcktäils
  term type:      word  word
  start,end:      0,2   3,12

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  term position:  1     2
  term text:      la    cöcktäils
  term type:      word  word
  start,end:      0,2   3,12

I just copied the text from this URL to form my stopwords_es.txt:
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt

Look forward to your help...
Re: Dismax: field not returned unless in sort clause?
: We have a "D" field (string, indexed, stored, not required) that is returned
: * when we search with the standard request handler
: * when we search with dismax request handler _and the field is specified in
: the sort parameter_
:
: but is not returned when using the dismax handler and the field is not
: specified in the sort param.

are you using one of the "sortMissing" options on D or its fieldType?

I'm guessing you have sortMissingLast="true" for D, so anytime you sort on it, the docs that do have a value appear first. but when you don't sort on it, other factors probably lead docs that don't have a value for the D field to appear first -- solr doesn't include fields in docs that don't have any value for that field.

if my guess is correct, adding "fq=D:[* TO *]" to any of your queries will cause the total number of results to shrink, but the first page of results for your requests that don't sort on D will look exactly the same.

the LukeRequestHandler will help you see how many docs in your index don't have any values indexed in the "D" field.

-Hoss
Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish
On Tue, Mar 15, 2011 at 7:07 PM, cyang2010 wrote:
> I just copied the text from this URL to form my stopwords_es.txt:
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt

You cannot use the files from analysis/snowball/* directly with Solr. These files are not in a format that Solr understands, so you will have to modify them (e.g. the comment character for Solr is # not |, and there can only be one word per line).
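A rough sketch of that conversion, assuming only what is stated above (in the snowball files `|` starts a comment, and Solr wants one bare word per line). The sample input lines are illustrative, not copied from the real spanish_stop.txt:

```python
def snowball_to_solr(lines):
    """Convert snowball-format stopword lines to Solr's format:
    strip '|' comments and emit one word per output line."""
    out = []
    for line in lines:
        # Everything after '|' is a comment in the snowball format
        text = line.split("|", 1)[0].strip()
        # A line may hold several words; Solr wants exactly one per line
        for word in text.split():
            out.append(word)
    return out

sample = [
    "de  |  of, from",
    "la  |  the",
    "",
    "| this whole line is a comment",
    "y e",   # hypothetical line carrying two words
]
print(snowball_to_solr(sample))  # → ['de', 'la', 'y', 'e']
```

Writing the returned list one word per line gives a file Solr's StopFilterFactory can load directly.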
Re: Getting Category ID (primary key)
: If it works, it's performant and not too messy, it's a good way :-) . You can
: also consider just faceting on Id, and use the id to fetch the category name
: through sql / nosql.
: That way your logic is separated from your presentation, which makes
: extending (think internationalizing, etc.) easier. Not sure if that's
: appropriate for your 'category' field but anyway.

That's the advice i generally give *if* you already have access to the system of record for your categorization in your application (if not, the tradeoff of the extra data lookup may not be worth the cost in flexibility).

From the notes in my "Many Facets" talk (slide #24: "Pretty" facet.field Terms)...

>> If you can use unique identifiers (instead of pretty, long, string
>> labels) it can reduce the memory and simplify the request parsing (see
>> next Tip) but it adds work to your front end application -- keeping
>> track of the mapping between id => pretty label. If your application
>> already needs to know about these mappings for other purposes, then
>> it's much simpler to take advantage of that.

http://people.apache.org/~hossman/apachecon2010/facets/

-Hoss
Re: background merge hit exception
The stack trace indicates there's an issue with optimizing. How are you kicking off optimizing? Is it automatic or manual?

34G for an index with that many documents does seem excessive. Are you re-indexing the same content with the same unique key? In other words, do you have a large number of deleted documents in your index? You can tell by looking at the "schema browser" link on your admin page and seeing whether MaxDoc and NumDocs are wildly different.

Is there any chance that more than one process is writing to your index?

Best
Erick

On Tue, Mar 15, 2011 at 8:02 AM, Isha Garg wrote:
> On Tuesday 15 March 2011 01:31 PM, Anurag wrote:
>> Do you mean that earlier it was doing indexing well, then all of a sudden
>> you started getting this exception?
>>
>> - Kumar Anurag
>
> Yes, earlier it was fine, but I think as the index size grows the problem
> starts occurring. Also, I have a doubt related to index size: isn't it too
> large compared to the number of documents?
Re: Sorting 0 values last
: Not sure how you are indexing, but in addition to the above
: suggestion by Yonik, one could ignore 0's at indexing time,
: i.e., ensure that 0 values for that field are not indexed, and
: use sortMissingLast.

Once upon a time i had a usecase where i was indexing product data, and in that product data a null price meant "not currently for sale", but "0" was a legal price for products that were being given away free.

i had a similar requirement that the default sort should be based on price, but "free" products should come last ... except if the user explicitly said "sort by price".

what i did was to index a "hasPrice" field that was true if the price field existed and was non-zero. my default sort was "hasPrice desc, price asc" but if the user clicked "sort by price" it was "price asc".

that might give you ideas about your own usecase.

-Hoss
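A minimal sketch of that index-time derivation. The field names follow the example above; the document dicts and the in-memory sort are purely illustrative (in practice Solr does the sorting, this just demonstrates the intended ordering):

```python
def add_has_price(doc):
    """Derive a hasPrice field: true only when price exists and is non-zero,
    so free and not-for-sale items sort after priced ones under
    sort=hasPrice desc, price asc."""
    price = doc.get("price")
    doc["hasPrice"] = price is not None and price != 0
    return doc

docs = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": 0},      # free giveaway
    {"id": 3},                  # not currently for sale
]
docs = [add_has_price(d) for d in docs]

# Emulate the default sort "hasPrice desc, price asc" (missing prices last)
ordered = sorted(
    docs,
    key=lambda d: (not d["hasPrice"],
                   d["price"] if d.get("price") is not None else float("inf")),
)
print([d["id"] for d in ordered])  # → [1, 2, 3]
```

Priced items lead, the free item follows, and the not-for-sale item comes last, which is exactly the behavior described above.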
Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish
Robert,

Thanks for your advice. I modified my stopword text file, and now the StopFilter starts to work.

But the stemming-related filter (SnowballPorterFilterFactory, Spanish) is still not working. Does anyone have any idea on that?

Thanks,
cyang
Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish
On Tue, Mar 15, 2011 at 8:50 PM, cyang2010 wrote: > Robert, > > Thanks for your advice. I modified my stopword text file. Now the > stopwordFilter start to work. > > But the stemming related filter (SnowballPorterFilterFactory-- spanish) > still not working. Anyone have any idea on that? > how is it not working? how is "cöcktäils" a spanish word!
Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish
Sorry Robert. I just used some text translated by someone; maybe that translation is not right. Could you please give me a Spanish term with which I can show that the Spanish stemming factory is working?

Thanks,
cyang
Tokenizing Chinese & multi-language search
Hi, I remember reading in this list a while ago that Solr will only tokenize on whitespace even when using CJKAnalyzer. That would make Solr unusable on Chinese or any other languages that don't use whitespace as separator. 1) I remember reading about a workaround. Unfortunately I can't find the post that mentioned it. Could someone give me pointers on how to address this issue? 2) Let's say I have fixed this issue and have properly analyzed and indexed the Chinese documents. My documents are in multiple languages. I plan to use separate fields for documents in different languages: text_en, text_zh, text_ja, text_fr, etc. Each field will be associated with the appropriate analyzer. My problem now is how to deal with the query string. I don't know what language the query is in, so I won't be able to select the appropriate analyzer for the query string. If I just use the standard analyzer on the query string, any query that's in Chinese won't be tokenized correctly. So would the whole system still work in this case? This must be a pretty common use case, handling multi-language search. What is the recommended way of dealing with this problem? Thanks. Andy
Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish
I just tried it with some real Spanish text, "Alquileres":

org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position:  1
  term text:      alquileres
  term type:      word
  start,end:      4,14

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Spanish}
  term position:  1
  term text:      alquiler
  term type:      word
  start,end:      4,14

Looks like the Spanish stemmer is working.

Thanks,
cyang
Trunk Compile failure/ hang
Hello,

I am trying to build the source from trunk (to apply a patch) and ran into an issue where the build process hangs (output below) while building Lucene, at sanity-load-lib. Just when sanity-load-lib starts, I see a dialog box asking for applet access permission: "The applet is attempting to invoke the java/lang/System.loadLibrary() operation on db_java-4.7". I click on "Allow" and the build process never resumes (I waited for more than an hour). Any thoughts would be very helpful.

-Viswa

...build-lucene:
contrib-build.init:
get-db-jar:
check-and-get-db-jar:
init:
clover.setup:
clover.info:
  [echo]
  [echo] Clover not found. Code coverage reports disabled.
  [echo]
clover:
compile-core:
compile-test-framework:
common.compile-test:
sanity-load-lib:
noobie question: sorting
Hi guys,

I came across this sorting query:

query({!v="category: 445"}) desc

I understand it is sorting on an exact match of category = 445, but I don't quite understand the syntax. Could someone please elaborate a bit for me, so I can reuse this syntax in the future?

Regards,
James
Re: Tokenizing Chinese & multi-language search
Hi Andy, Is the "I don't know what language the query is in" something you could change by... - asking the user - deriving from HTTP request headers - identifying the query language (if queries are long enough and "texty") - ... Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Andy > To: solr-user@lucene.apache.org > Sent: Tue, March 15, 2011 9:07:36 PM > Subject: Tokenizing Chinese & multi-language search > > Hi, > > I remember reading in this list a while ago that Solr will only tokenize on >whitespace even when using CJKAnalyzer. That would make Solr unusable on >Chinese or any other languages that don't use whitespace as separator. > > 1) I remember reading about a workaround. Unfortunately I can't find the > post >that mentioned it. Could someone give me pointers on how to address this >issue? > > 2) Let's say I have fixed this issue and have properly analyzed and indexed >the Chinese documents. My documents are in multiple languages. I plan to use >separate fields for documents in different languages: text_en, text_zh, >text_ja, text_fr, etc. Each field will be associated with the appropriate >analyzer. > > My problem now is how to deal with the query string. I don't know what >language the query is in, so I won't be able to select the appropriate >analyzer >for the query string. If I just use the standard analyzer on the query >string, >any query that's in Chinese won't be tokenized correctly. So would the whole >system still work in this case? > > This must be a pretty common use case, handling multi-language search. What > is >the recommended way of dealing with this problem? > > Thanks. > Andy > > > >
Re: Sorting on multiValued fields via function query
Hi Harish. Did sorting on multiValued fields actually work correctly for you before? I'd be surprised if so. I could be wrong, but I think you previously always got the sorting effects of whatever was the last indexed value. It is indeed the case that the FieldCache only supports up to one indexed value per field. Recently Hoss added sanity checks, whose results you are seeing:
https://issues.apache.org/jira/browse/SOLR-2339

You might want to comment on that issue with proof (e.g. a simple test) that it worked before but not now.

~ David
Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
SOLR DIH importing MySQL "text" column as a BLOB
I have a column for posts in MySQL of type `text`. I've tried corresponding field types for it in the Solr `schema.xml`, e.g. `string`, `text`, `text-ws`, but whenever I import it using the DIH it gets imported as a BLOB object. I checked: this happens only for columns of type `text`, not for `varchar` (those get indexed as strings). Hence, the posts field is not becoming searchable.

I found out about this issue, after repeated search failures, when I did a `*:*` query on Solr. A sample response (note the `[B@...` byte-array values where the `text` columns should be):

1.0
[B@10a33ce2
2011-02-21T07:02:55Z
test.acco...@gmail.com
Test Account
[B@2c93c4f1
1

The `data-config.xml`:

The `schema.xml`:

solr_post_status_message_id
solr_post_message

Thanks,
Kaushik
Re: Different options for autocomplete/autosuggestion
Hi,

I actually don't follow how field collapsing helps with autocompletion...? Over at http://search-lucene.com we eat our own autocomplete dog food: http://sematext.com/products/autocomplete/index.html . Tasty stuff.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Kai Schlamp
> To: solr-user@lucene.apache.org
> Sent: Mon, March 14, 2011 11:52:48 PM
> Subject: Re: Different options for autocomplete/autosuggestion
>
> @Robert: That sounds interesting and very flexible, but also like a
> lot of work. This approach also doesn't seem to allow querying Solr
> directly by using Ajax ... one of the big benefits, in my opinion, of
> using Solr.
> @Bill: There are some things I don't like about the Suggester
> component. It doesn't seem to allow infix searches (at least it is not
> mentioned in the wiki or elsewhere). It also uses a separate index
> that has to be rebuilt independently of the main index. And it doesn't
> support any filter queries.
>
> The Lucid Imagination blog also describes a further autosuggest approach
> (http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/).
> The disadvantage here is that the source documents must have distinct
> fields (resp. the DIH selects must provide distinct data). Otherwise
> duplications would come up in the Solr query result, because of the
> document nature of Solr.
>
> In my opinion, field collapsing seems most promising for a full-featured
> autosuggestion solution. Unfortunately it is not available
> for Solr 1.4.x or 3.x (I tried patching those branches several times
> without success).
> > 2011/3/15 Bill Bell :
> > http://lucidworks.lucidimagination.com/display/LWEUG/Spell+Checking+and+Automatic+Completion+of+User+Queries
> >
> > For Auto-Complete, find the following section in the solrconfig.xml file
> > for the collection:
> >
> >   autocomplete
> >   <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >   <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
> >   autocomplete
> >   true
> >
> > On 3/14/11 8:16 PM, "Andy" wrote:
> >
> >> Can you provide more details? Or a link?
> >>
> >> --- On Mon, 3/14/11, Bill Bell wrote:
> >>
> >>> See how Lucid Enterprise does it... A bit differently.
> >>>
> >>> On 3/14/11 12:14 AM, "Kai Schlamp" wrote:
> >>>
> >>> > Hi.
> >>> >
> >>> > There seems to be several options for implementing an
> >>> > autocomplete/autosuggestions feature with Solr. I am trying to
> >>> > summarize those possibilities together with their advantages and
> >>> > disadvantages. It would be really nice to read some of your opinions.
> >>> >
> >>> > * Using N-Gram filter + text field query
> >>> >   + available in stable 1.4.x
> >>> >   + results can be boosted
> >>> >   + sorted by best matches
> >>> >   - may return duplicate results
> >>> >
> >>> > * Facets
> >>> >   + available in stable 1.4.x
> >>> >   + no duplicate entries
> >>> >   - sorted by count
> >>> >   - may need an extra N-Gram field for infix queries
> >>> >
> >>> > * Terms
> >>> >   + available in stable 1.4.x
> >>> >   + infix query by using regex in 3.x
> >>> >   - only prefix query in 1.4.x
> >>> >   - regexp may be slow (just a guess)
> >>> >
> >>> > * Suggestions
> >>> >   ? Did not try that yet. Does it allow infix queries?
> >>> >
> >>> > * Field Collapsing
> >>> >   + no duplications
> >>> >   - only available in 4.x branch
> >>> >   ? Does it work together with highlighting? That would be a big plus.
> >>> >
> >>> > What are your experiences regarding autocomplete/autosuggestion with
> >>> > Solr? Any additions, suggestions or corrections? What do you prefer?
> >>> >
> >>> > Kai
>
> --
> Dr. med. Kai Schlamp
> Am Fort Elisabeth 17
> 55131 Mainz
> Germany
> Phone +49-177-7402778
> Email: schl...@gmx.de
Re: Problem with field collapsing of patched Solr 1.4
Kai, try SOLR-1086 with Solr trunk instead, if trunk is OK for you.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Kai Schlamp
> To: solr-user@lucene.apache.org
> Sent: Sun, March 13, 2011 11:58:56 PM
> Subject: Problem with field collapsing of patched Solr 1.4
>
> Hello.
>
> I just tried to patch Solr 1.4 with the field collapsing patch of
> https://issues.apache.org/jira/browse/SOLR-236. The patching and build
> process seemed to be ok (below are the steps I did), but the field
> collapsing feature doesn't seem to work.
> When I go to `http://localhost:8982/solr/select/?q=*:*` I correctly
> get 10 documents as a result.
> When going to
> `http://localhost:8982/solr/select/?q=*:*&collapse=true&collapse.field=tag_name_ss&collapse.max=1`
> (tag_name_ss is surely a field with content) I get the same 10 docs back.
> No further information regarding the field collapsing.
> What do I miss? Do I have to activate it somehow?
>
> * Downloaded [Solr](http://apache.lauf-forum.at//lucene/solr/1.4.1/apache-solr-1.4.1.tgz)
> * Downloaded [SOLR-236-1_4_1-paging-totals-working.patch](https://issues.apache.org/jira/secure/attachment/12459716/SOLR-236-1_4_1-paging-totals-working.patch)
> * Changed line 2837 of that patch to `@@ -0,0 +1,511 @@` (regarding this [comment](https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12932905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12932905))
> * Downloaded [SOLR-236-1_4_1-NPEfix.patch](https://issues.apache.org/jira/secure/attachment/12470202/SOLR-236-1_4_1-NPEfix.patch)
> * Extracted the Solr archive
> * Applied both patches:
> ** `cd apache-solr-1.4.1`
> ** `patch -p0 < ../SOLR-236-1_4_1-paging-totals-working.patch`
> ** `patch -p0 < ../SOLR-236-1_4_1-NPEfix.patch`
> * Built Solr:
> ** `ant clean`
> ** `ant example` ... tells me "BUILD SUCCESSFUL"
> * Reindexed everything (using Sunspot Solr)
> * Solr info correctly tells me "Solr Specification Version:
> 1.4.1.2011.03.14.04.29.20"
>
> Kai
Re: Tokenizing Chinese & multi-language search
Hi Otis,

It doesn't look like the last two options would work for me. So I guess my best bet is to ask the user to specify the language when they type in the query. Once I get that information from the user, how do I dynamically pick an analyzer for the query string?

Thanks
Andy

--- On Tue, 3/15/11, Otis Gospodnetic wrote:
> Is the "I don't know what language the query is in" something you could
> change by...
> - asking the user
> - deriving from HTTP request headers
> - identifying the query language (if queries are long enough and "texty")
> - ...
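One common pattern for the "user tells us the language" case is to route the query to the matching per-language field, so that field's own query analyzer is applied automatically. A rough sketch (the field names follow the text_en/text_zh convention from the original question; the fallback field name is a hypothetical placeholder, and nothing here is a Solr API, just query-string building):

```python
# Map a user-declared language code to the per-language field
LANG_FIELDS = {
    "en": "text_en",
    "zh": "text_zh",
    "ja": "text_ja",
    "fr": "text_fr",
}

def route_query(user_query, lang):
    # Fall back to a general field (hypothetical name) for unknown languages
    field = LANG_FIELDS.get(lang, "text")
    return "%s:(%s)" % (field, user_query)

print(route_query("dogs", "en"))   # → text_en:(dogs)
print(route_query("咖啡", "zh"))   # → text_zh:(咖啡)
```

Because the analyzer is attached to the field in schema.xml, querying against text_zh means Solr applies the Chinese analyzer to the query string without any further work on the client side.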
Stemming question
When I use the Porter stemmer in Solr, it appears to take words that are stemmed and replace them with the root word in the index. I verified this by looking at analysis.jsp. Is there an option to expand the stemmer to include all combinations of the word, like including 's, ly, etc.? Any other options besides protection?

Bill
Re: noobie question: sorting
Hi. Where did you find such an obtuse example?

Recently, Solr supports sorting by function query. One such function is named "query", which takes a query and uses the score of the result of that query as the function's result. Due to constraints on where this query is placed within a function query, it is necessary to use the local-params syntax (e.g. {!v=...}), since you can't simply state "category:445". Alternatively, there could have been a parameter dereference like $sortQ, where sortQ is another parameter holding category:445.

Either way, the net effect is that documents are score-sorted based on the query category:445 instead of the user query (the "q" param). I'd expect category:445 docs to come up top and all others to appear randomly afterwards. It would be nice if the sort query could simply be "category:445 desc", but that's not supported.

Complicated? You bet! But fear not; this is about as complicated as it gets.

References:
http://wiki.apache.org/solr/SolrQuerySyntax
http://wiki.apache.org/solr/CommonQueryParameters#sort
http://wiki.apache.org/solr/FunctionQuery#query

~ David Smiley
Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
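To make the two variants above concrete, here is a sketch of the request parameters each one produces. This is pure string building; q/sort/sortQ follow the explanation above, while the values themselves (*:*, category:445) are just illustrative:

```python
from urllib.parse import urlencode

# Variant 1: inline local-params syntax inside the sort function
inline = {
    "q": "*:*",
    "sort": 'query({!v="category:445"}) desc',
}

# Variant 2: parameter dereferencing; the sort query lives in $sortQ
deref = {
    "q": "*:*",
    "sort": "query($sortQ) desc",
    "sortQ": "category:445",
}

print(urlencode(inline))
print(urlencode(deref))
```

Both encode to equivalent requests; the dereferenced form keeps the sort expression fixed (e.g. in solrconfig.xml defaults) while letting the actual query vary per request.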