Re: ranged query on multivalued field doesnt seem to work
Hi, I am still struggling with this... but could it be because, for some data, there are maximum integer values in the "start_year" and "end_year" fields, like "2.14748365E9", which Solr does not recognise as "sfloat" because of the "E"?

As for ranged queries on multivalued fields in general, I think they should be OK, because I have another two fields that are sfloat and multivalued, and range queries on them work fine.

Any hints are appreciated! thanks!

zqzuk wrote:
>
> Hi all,
>
> in my schema I have two multivalued fields:
>
> <field name="start_year" type="sfloat" ... multiValued="true"/>
> <field name="end_year" type="sfloat" ... multiValued="true"/>
>
> and I issued a query as: start_year:[400 TO *]. The result seems to be
> incorrect, because I got some records with start year = -3000... and also
> start year = -2147483647 (Integer.MIN_VALUE). Also, when I combine start_year
> with end_year, it produces wrong results...
>
> What could be wrong? Is it because I used the wrong field type "sfloat",
> which should be integer?
>
> Any hints would be very much appreciated!
>
> many thanks!

--
View this message in context: http://www.nabble.com/ranged-query-on-multivalued-field-doesnt-seem-to-work-tp21731778p21743688.html
Sent from the Solr - User mailing list archive at Nabble.com.
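The "2.14748365E9" values are consistent with Integer.MAX_VALUE having been widened to a float somewhere before indexing: a float has only 24 bits of mantissa, so 2147483647 rounds and gets printed in scientific notation. A minimal demonstration in plain Java (independent of Solr):

```java
public class FloatSentinelDemo {
    public static void main(String[] args) {
        // Widening Integer.MAX_VALUE / MIN_VALUE to float loses precision
        // and Java renders the result in scientific notation -- the "E"
        // form seen in the index.
        System.out.println((float) Integer.MAX_VALUE);  // 2.14748365E9
        System.out.println((float) Integer.MIN_VALUE);  // -2.14748365E9
    }
}
```

If those extreme values are sentinels meaning "year unknown", it may be safer to omit the field for such records entirely: a range query like start_year:[400 TO *] has no way to tell a real year from a sentinel.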
Re: WebLogic 10 Compatibility Issue - StackOverflowError
I created a wiki page shortly after posting to the list: http://wiki.apache.org/solr/SolrWeblogic

From what we could tell, Solr itself was fully functional; it was only the admin tools that were failing.

Regards,
Ilan Rabinovitch

---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org

On 1/29/09 4:34 AM, Mark Miller wrote:
We should get this on the wiki.
- Mark

Ilan Rabinovitch wrote:
We were able to deploy Solr 1.3 on Weblogic 10.0 earlier today. Doing so required two changes:

1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The weblogic.xml file is required to disable Solr's filter on FORWARD. The contents of weblogic.xml should be:

<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app xmlns="http://www.bea.com/ns/weblogic/90"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.bea.com/ns/weblogic/90 http://www.bea.com/ns/weblogic/90/weblogic-web-app.xsd">
  <container-descriptor>
    <filter-dispatched-requests-enabled>false</filter-dispatched-requests-enabled>
  </container-descriptor>
</weblogic-web-app>

2) Remove the pageEncoding attribute from line 1 of solr/admin/header.jsp

On 1/17/09 2:02 PM, KSY wrote:
I hit a major roadblock while trying to get Solr 1.3 running on WebLogic 10.0. A similar message was posted before ( http://www.nabble.com/Solr-1.3-stack-overflow-when-accessing-solr-admin-page-td20157873.html ) but it seems like it hasn't been resolved yet, so I'm re-posting here. I am sure I configured everything correctly because it's working fine on Resin. Has anyone successfully run Solr 1.3 on WebLogic 10.0 or higher? Thanks.
SUMMARY: When accessing /solr/admin page, StackOverflowError occurs due to an infinite recursion in SolrDispatchFilter

ENVIRONMENT SETTING:
Solr 1.3.0
WebLogic 10.0
JRockit JVM 1.5

ERROR MESSAGE:
SEVERE: javax.servlet.ServletException: java.lang.StackOverflowError
  at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:276)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
  at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
  at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
  at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
  at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
  at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
  at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
  at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
  at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
  at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
Re: How to handle database replication delay when using DataImportHandler?
On Fri, Jan 30, 2009 at 12:27 AM, Gregg Donovan wrote: > Noble, > > Thanks for the suggestion. The unfortunate thing is that we really don't > know ahead of time what sort of replication delay we're going to encounter > -- it could be one millisecond or it could be one hour. So, we end up > needing to do something like: > > For delta-import run N: > 1. query DB slave for "seconds_behind_master", use this to calculate > Date(N). > 2. query DB slave for records updated since Date(N - 1) > > I see there are plugin points for EventListener classes (onImportStart, > onImportEnd). Would those be the right spot to calculate these dates so > that > I could expose them to my custom function at query time? > Unfortunately, the Context object (which carries the context information and way to pass messages to other components) is not exposed to Evaluator. We should expose this information to be consistent with other DIH components. I've opened an issue to track this at https://issues.apache.org/jira/browse/SOLR-996 -- Regards, Shalin Shekhar Mangar.
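The windowing arithmetic itself is simple once the lag is known. Below is a hedged sketch in plain Java (class and method names are illustrative, not part of DataImportHandler) of computing Date(N) as "now minus the slave's reported lag", which an onImportStart listener could record for the next delta query. On MySQL the lag value would come from SHOW SLAVE STATUS (Seconds_Behind_Master).

```java
import java.util.Date;

// Hedged sketch: compute the timestamp to use as Date(N) for delta-import
// run N, given the slave's replication lag in seconds.
public class DeltaWindow {
    static Date safeDeltaDate(long nowMillis, long lagSeconds) {
        // Back the window up by the lag, so rows committed on the master
        // but not yet replicated to the slave are picked up next run.
        return new Date(nowMillis - lagSeconds * 1000L);
    }

    public static void main(String[] args) {
        long now = 1233300000000L;  // a fixed "now" for the demo
        // With a one-hour lag, the delta window starts one hour earlier.
        System.out.println(safeDeltaDate(now, 3600).getTime());
    }
}
```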
Re: got background_merge_hit_exception during optimization
What system and JVM was this using? Also, could you get the stack trace directly from the Solr logs and post it? -Yonik On Thu, Jan 29, 2009 at 4:06 PM, Qingdi wrote: > > We got the following background_merge_hit_exception during optimization: > exception: > )background_merge_hit_exception__4zsgC136887658__50nfC995992__51i9C995977__52d5C995968__537yC995999__54xmC1892345__54xlC99593_into__54xn_optimize__javaioIOException_background_merge_hit_exception__4zsgC136887658__50nfC995992__51i9C995977__52d5C995968__537yC995999__54xmC1892345__54xlC99593_into__54xn_optimize__at_orgapacheluceneindexIndexWriteroptimizeIndexWriterjava2346__at_orgapacheluceneindexIndexWriteroptimizeIndexWriterjava2280__at_orgapachesolrupdateDirectUpdateHandler2commitDirectUpdateHandler2java355__at_orgapachesolrupdateprocessorRunUpdateProcessorprocessCommitRunUpdateProcessorFactoryjava77__at_orgapachesolrhandlerRequestHandlerUtilshandleCommitRequestHandlerUtilsjava104__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava113__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1204__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava303__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava232__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjett
yServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__a > > Does anyone know what could be the cause of the exception? what should we do > to prevent this type of exception? > > Some posts in the Lucene forum say the exception is usually related with > disk space issue. But there should be enough disk space in our system. Our > index size was about 56G. And before optimization, the disk had about 360G > free space. > > After the above background_merge_hit_exception raised, solr kept generating > new segment files, which ate up all the CPU time and the disk space, so we > had to kill the solr server. > > Thanks for your help. > > Qingdi > > > -- > View this message in context: > http://www.nabble.com/got-background_merge_hit_exception-during-optimization-tp21735847p21735847.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
MultiValue DynamicFields?
Hi, is it possible to create a dynamic field that is multivalued?

Cheers,

Bruno
Re: MultiValue DynamicFields?
Yes. It's totally acceptable. 2009/1/30 Bruno Aranda > Hi, it is possible to create a dynamic field that is multi valued? > > Cheers, > > Bruno > -- Alexander Ramos Jardim
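For example, a schema.xml entry along these lines combines both (the field name pattern and type here are placeholders):

```xml
<!-- any field name ending in _tags is indexed as a multivalued text field -->
<dynamicField name="*_tags" type="text" indexed="true" stored="true" multiValued="true"/>
```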
Re: Optimizing & Improving results based on user feedback
It may not be as fine-grained as you want, but also check the QueryElevationComponent. This takes a preconfigured list of what the top results should be for a given query and makes those documents the top results. Presumably, you could use click logs to determine what the top result should be.

On Jan 29, 2009, at 7:45 PM, Walter Underwood wrote:

"A Decision Theoretic Framework for Ranking using Implicit Feedback" uses clicks, but the best part of that paper is all the side comments about difficulties in evaluation. For example, if someone clicks on three results, is that three times as good or two failures and a success? We have to know the information need to decide. That paper is in the LR4IR 2008 proceedings.

Both Radlinski and Joachims seem to be focusing on click data. I'm thinking of something much simpler, like taking the first N hits and reordering those before returning. Brute force, but would get most of the benefit. Usually, you only have reliable click data for a small number of documents on each query, so it is a waste of time to rerank the whole list. Besides, if you need to move something up 100 places on the list, you should probably be tuning your regular scoring rather than patching it with click data.

wunder

On 1/29/09 3:43 PM, "Matthew Runo" wrote:

Agreed, it seems that a lot of the algorithms in these papers would almost be a whole new RequestHandler ala Dismax. Luckily a lot of them seem to be built on Lucene (at least the ones that I looked at that had code samples).

Which papers did you see that actually talked about using clicks? I don't see those, beyond "Addressing Malicious Noise in Clickthrough Data" by Filip Radlinski and also his "Query Chains: Learning to Rank from Implicit Feedback" - but neither is really on topic.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jan 29, 2009, at 11:36 AM, Walter Underwood wrote:

Thanks, I didn't know there was so much research in this area.
Most of the papers at those workshops are about tuning the entire ranking algorithm with machine learning techniques. I am interested in adding one more feature, click data, to an existing ranking algorithm. In my case, I have enough data to use query-specific boosts instead of global document boosts. We get about 2M search clicks per day from logged in users (little or no click spam). I'm checking out some papers from Thorsten Joachims and from Microsoft Research that are specifically about clickthrough feedback. wunder On 1/27/09 11:15 PM, "Neal Richter" wrote: OK I've implemented this before, written academic papers and patents related to this task. Here are some hints: - you're on the right track with the editorial boosting elevators - http://wiki.apache.org/solr/UserTagDesign - be darn careful about assuming that one click is enough evidence to boost a long 'distance' - first page effects in search will skew the learning badly if you don't compensate. 95% of users never go past the first page of results, 1% go past the second page. So perfectly good results on the second page get permanently locked out - consider forgetting what you learn under some condition In fact this whole area is called 'learning to rank' and is a hot research topic in IR. http://web.mit.edu/shivani/www/Ranking-NIPS-05/ http://research.microsoft.com/en-us/um/people/lr4ir-2007/ https://research.microsoft.com/en-us/um/people/lr4ir-2008/ - Neal Richter On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo wrote: Hello folks! We've been thinking about ways to improve organic search results for a while (really, who hasn't?) and I'd like to get some ideas on ways to implement a feedback system that uses user behavior as input. Basically, it'd work on the premise that what the user actually clicked on is probably a really good match for their search, and should be boosted up in the results for that search. 
For example, if I search for "rain boots", and really love the 10th result down (and show it by clicking on it), then we'd like to capture this and use the data to boost up that result //for that search//. We've thought about using index time boosts for the documents, but that'd boost it regardless of the search terms, which isn't what we want. We've thought about using the Elevator handler, but we don't really want to force a product to the top - we'd prefer it slowly rises over time as more and more people click it from the same search terms. Another way might be to stuff the keyword into the document, the more times it's in the document the higher it'd score - but there's gotta be a better way than that. Obviously this can't be done 100% in solr - but if anyone had some clever ideas about how this might be possible it'd be interesting to hear them. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833
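One low-tech way to express the "boost for that search only" idea, assuming click data is mined offline into per-query boosts, is a dismax bq (boost query) parameter built at query time; the document id and boost value below are purely illustrative:

```
q=rain+boots&defType=dismax&bq=id:SKU-12345^2.0
```

This leaves the index untouched and scopes the boost to the one query it was learned from, at the cost of a lookup table keyed by search terms.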
Re: query with stemming, prefix and fuzzy?
Thanks, Mark, for your answer.

Mark Miller wrote:
> Truncation queries and stemming are difficult partners. You likely have
> to accept compromise. You can try using multiple fields like you are,

I already have multiple fields, one per language, to be able to use different stemmers. Wouldn't this become too much?

> you can try indexing the full term at the same position as the stemmed
> term,

What does "at the same position" mean, and how could I do this?

> or you can accept the weirdness that comes from matching on a
> stemmed form (potentially very confusing for a user).

Currently I am thinking about dropping the stemming and only using prefix search. But as highlighting does not work with a prefix like "house*", this is a problem for me. The hint to use "house?*" instead does not work here.

> In any case though, a queryparser that support fuzzyquery should not be
> analyzing it. What parser are you using? If it is analyzing the fuzzy
> syntax, it doesn't likely support it.

I am using the following definitions (testing with and without stemming):

> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_de_de.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     ...
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_de_de.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" words="stopwords_de_de.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     ...
>   </analyzer>
> </fieldType>

And, well, the parser? Where is the parser specified? Do you mean the request handler "qt" (that would be "standard", as I do not set it yet)?

> The prefix length determines how many terms are enumerated - with the

Can the prefix length be set in Solr? I could not find such an option.

> The latest trunk build on Lucene will let us switch fuzzy query to use a
> constant score mode - this will eliminate the booleanquery and should
> perform much better on a large index. Solr already uses a constant score
> mode for Prefix and Wildcard queries.

Much better performance is always good. When will this feature be available in Solr?

> How big is your index? If its not that big, it may be odd that you're
> seeing things that slow (number of unique terms in the index will play a
> large role).

Well, the index currently contains about 5000 documents. These are HTML pages, some of them concatenated with PDF/DOC downloads (linked from the HTML page) converted to text. The index data is about 11MB (optimized). So I think this is just a smaller index.

Greetings, Gert
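A coarser compromise than Mark's same-position indexing, but one that needs no custom analysis code, is to keep a second, unstemmed copy of each field via copyField and query both: stemmed matches come from one field, prefix/fuzzy matches from the other. A hedged schema.xml sketch (field and type names here are placeholders, not from the thread):

```xml
<!-- hypothetical: "text_de" applies a German stemmer, "text_plain" does not -->
<field name="body_de"    type="text_de"    indexed="true" stored="true"/>
<field name="body_plain" type="text_plain" indexed="true" stored="true"/>
<copyField source="body_de" dest="body_plain"/>
```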
Re: Optimizing & Improving results based on user feedback
Matthew Runo wrote:
Which papers did you see that actually talked about using clicks? I don't see those, beyond "Addressing Malicious Noise in Clickthrough Data" by Filip Radlinski and also his "Query Chains: Learning to Rank from Implicit Feedback" - but neither is really on topic.

Here are three that I've found useful:

P. Young, C. Clarke, et al. Improving Retrieval Accuracy by Weighting Document Types with Clickthrough Data. SIGIR 2007.
E. Agichtein, E. Brill, and S. Dumais. Improving Web Search Ranking by Incorporating User Behavior Information. SIGIR 2006.
T. Joachims, L. Granka, and B. Pan. Accurately Interpreting Clickthrough Data as Implicit Feedback. SIGIR 2005.

-Sean
Re: Optimizing & Improving results based on user feedback
I've thought about patching the QueryElevationComponent to apply boosts rather than a specific sort. Then the file might look like .. And I could write a script that looks at click data once a day to fill out this file.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jan 30, 2009, at 6:37 AM, Ryan McKinley wrote:

It may not be as fine-grained as you want, but also check the QueryElevationComponent. This takes a preconfigured list of what the top results should be for a given query and makes those documents the top results. Presumably, you could use click logs to determine what the top result should be.

On Jan 29, 2009, at 7:45 PM, Walter Underwood wrote:

"A Decision Theoretic Framework for Ranking using Implicit Feedback" uses clicks, but the best part of that paper is all the side comments about difficulties in evaluation. For example, if someone clicks on three results, is that three times as good or two failures and a success? We have to know the information need to decide. That paper is in the LR4IR 2008 proceedings.

Both Radlinski and Joachims seem to be focusing on click data. I'm thinking of something much simpler, like taking the first N hits and reordering those before returning. Brute force, but would get most of the benefit. Usually, you only have reliable click data for a small number of documents on each query, so it is a waste of time to rerank the whole list. Besides, if you need to move something up 100 places on the list, you should probably be tuning your regular scoring rather than patching it with click data.

wunder

On 1/29/09 3:43 PM, "Matthew Runo" wrote:

Agreed, it seems that a lot of the algorithms in these papers would almost be a whole new RequestHandler ala Dismax. Luckily a lot of them seem to be built on Lucene (at least the ones that I looked at that had code samples). Which papers did you see that actually talked about using clicks?
I don't see those, beyond "Addressing Malicious Noise in Clickthrough Data" by Filip Radlinski and also his "Query Chains: Learning to Rank from Implicit Feedback" - but neither is really on topic. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Jan 29, 2009, at 11:36 AM, Walter Underwood wrote: Thanks, I didn't know there was so much research in this area. Most of the papers at those workshops are about tuning the entire ranking algorithm with machine learning techniques. I am interested in adding one more feature, click data, to an existing ranking algorithm. In my case, I have enough data to use query-specific boosts instead of global document boosts. We get about 2M search clicks per day from logged in users (little or no click spam). I'm checking out some papers from Thorsten Joachims and from Microsoft Research that are specifically about clickthrough feedback. wunder On 1/27/09 11:15 PM, "Neal Richter" wrote: OK I've implemented this before, written academic papers and patents related to this task. Here are some hints: - you're on the right track with the editorial boosting elevators - http://wiki.apache.org/solr/UserTagDesign - be darn careful about assuming that one click is enough evidence to boost a long 'distance' - first page effects in search will skew the learning badly if you don't compensate. 95% of users never go past the first page of results, 1% go past the second page. So perfectly good results on the second page get permanently locked out - consider forgetting what you learn under some condition In fact this whole area is called 'learning to rank' and is a hot research topic in IR. http://web.mit.edu/shivani/www/Ranking-NIPS-05/ http://research.microsoft.com/en-us/um/people/lr4ir-2007/ https://research.microsoft.com/en-us/um/people/lr4ir-2008/ - Neal Richter On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo wrote: Hello folks! 
We've been thinking about ways to improve organic search results for a while (really, who hasn't?) and I'd like to get some ideas on ways to implement a feedback system that uses user behavior as input. Basically, it'd work on the premise that what the user actually clicked on is probably a really good match for their search, and should be boosted up in the results for that search.

For example, if I search for "rain boots", and really love the 10th result down (and show it by clicking on it), then we'd like to capture this and use the data to boost up that result //for that search//. We've thought about using index time boosts for the documents, but that'd boost it regardless of the search terms, which isn't what we want. We've thought about using the Elevator handler, but we don't really want to force a product to the top - we'd prefer it slowly rises over time as more and more people click it from the same search terms. Another way might be to stuff the keyword into the document, the more times it's in the document the higher it'd score - but there's gotta be a better way than that.
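The boost-file idea at the top of this message might, hypothetically, look like the sketch below. Note the boost attribute does not exist in the stock QueryElevationComponent's elevate.xml (which only pins and excludes documents); the attribute and the document ids are invented for illustration:

```xml
<!-- hypothetical extension of elevate.xml: per-query boosts instead of pinning -->
<elevate>
  <query text="rain boots">
    <doc id="SKU-12345" boost="2.0"/>
    <doc id="SKU-67890" boost="1.4"/>
  </query>
</elevate>
```

A nightly script could then regenerate this file from click logs, raising boosts gradually rather than forcing results to the top.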
Re: Rsyncd start and stop for multiple instances
Hi,

How can I hack the existing script to support multiple rsync modules?

My rsyncd.conf file:

uid = root
gid = root
use chroot = no
list = no
pid file = /data/solr/book/logs/rsyncd.pid
log file = /data/solr/book/logs/rsyncd.log

[solr]
path = /data/solr/book/data
comment = Solr

How do I do it for /data/solr/user?

thanks a lot

Bill Au wrote:
>
> You can either use a dedicated rsync port for each instance or hack the
> existing scripts to support multiple rsync modules. Both ways should work.
>
> Bill
>
> On Tue, Jul 1, 2008 at 3:49 AM, Jacob Singh wrote:
>
>> Hi Bill and Others:
>>
>> Bill Au wrote:
>> > The rsyncd-start scripts gets the data_dir path from the command line and
>> > create a rsyncd.conf on the fly exporting the path as the rsync module named
>> > "solr". The slaves need the data_dir path on the master to look for the
>> > latest snapshot. But the rsync command used by the slaves relies on the
>> > rsync module name "solr" to do the file transfer using rsyncd.
>>
>> So is the answer that replication simply won't work for multiple
>> instances unless I have a dedicated port for each one?
>>
>> Or is the answer that I have to hack the existing scripts?
>>
>> I'm a little confused when you say that slave needs to know the master's
>> data dir, but, no matter what it sends, it needs to match the one known
>> by the master when it starts rsyncd...
>>
>> Sorry if my questions are newbie, I've not actually used rsyncd, but
>> I've read up quite a bit now.
>>
>> Thanks,
>> Jacob
>>
>> > Bill
>> >
>> > On Tue, Jun 10, 2008 at 4:24 AM, Jacob Singh wrote:
>> >
>> >> Hey folks,
>> >>
>> >> I'm messing around with running multiple indexes on the same server
>> >> using Jetty contexts. I've got the running groovy thanks to the
>> >> tutorial on the wiki, however I'm a little confused how the collection
>> >> distribution stuff will work for replication.
>> >> >> >> The rsyncd-enable command is simple enough, but the rsyncd-start >> command >> >> takes a -d (data dir) as an argument... Since I'm hosting 4 different >> >> instances, all with their own data dirs, how do I do this? >> >> >> >> Also, you have to specify the master data dir when you are connecting >> >> from the slave anyway, so why does it need to be specified when I >> start >> >> the daemon? If I just start it with any old data dir will it work for >> >> anything the user running it has perms on? >> >> >> >> Thanks, >> >> Jacob >> >> >> > >> >> > > -- View this message in context: http://www.nabble.com/Rsyncd-start-and-stop-for-multiple-instances-tp17750242p21750131.html Sent from the Solr - User mailing list archive at Nabble.com.
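One way to serve several instances from a single rsync daemon on one port is to define one module per data directory in rsyncd.conf. A hedged sketch (module names here are illustrative; the snappuller on each slave would then have to be changed to reference the matching module name instead of the hard-coded "solr"):

```
uid = root
gid = root
use chroot = no
list = no
pid file = /data/solr/rsyncd.pid
log file = /data/solr/rsyncd.log

[solr-book]
    path = /data/solr/book/data
    comment = Solr book index

[solr-user]
    path = /data/solr/user/data
    comment = Solr user index
```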
User tag design for read-only index
I am building a system that indexes a bunch of data and then will let users manually put the data in lists. I have seen http://wiki.apache.org/solr/UserTagDesign

The behavior I would like is identical to 'tagging' each document with the list-id/user/order and then using standard faceting to show what lists documents are in and what users have put the docs into a list. But - I would like the main index to be read only. The index needs to be shared across many installations that should not have access to other users' data.

Any thoughts on how this might be possible? Off hand, it seems like manually filling an un-inverted field cache might be a place to start looking. Perhaps using a multi-searcher and keeping two indexes -- but that seems like a lot of work.

thanks
ryan
RE: Re: WebLogic 10 Compatibility Issue - StackOverflowError
Are the issues you ran into due to non-standard code in Solr, or is there some WebLogic inconsistency?

-Todd Feak

-Original Message-
From: news [mailto:n...@ger.gmane.org] On Behalf Of Ilan Rabinovitch
Sent: Friday, January 30, 2009 1:11 AM
To: solr-user@lucene.apache.org
Subject: Re: WebLogic 10 Compatibility Issue - StackOverflowError

I created a wiki page shortly after posting to the list:
http://wiki.apache.org/solr/SolrWeblogic

From what we could tell Solr itself was fully functional, it was only the admin tools that were failing.

Regards,
Ilan Rabinovitch

---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org

On 1/29/09 4:34 AM, Mark Miller wrote:
> We should get this on the wiki.
>
> - Mark
>
> Ilan Rabinovitch wrote:
>>
>> We were able to deploy Solr 1.3 on Weblogic 10.0 earlier today. Doing
>> so required two changes:
>>
>> 1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The
>> weblogic.xml file is required to disable Solr's filter on FORWARD.
>>
>> The contents of weblogic.xml should be:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <weblogic-web-app xmlns="http://www.bea.com/ns/weblogic/90"
>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>     xsi:schemaLocation="http://www.bea.com/ns/weblogic/90
>>     http://www.bea.com/ns/weblogic/90/weblogic-web-app.xsd">
>>   <container-descriptor>
>>     <filter-dispatched-requests-enabled>false</filter-dispatched-requests-enabled>
>>   </container-descriptor>
>> </weblogic-web-app>
>>
>> 2) Remove the pageEncoding attribute from line 1 of solr/admin/header.jsp
>>
>> On 1/17/09 2:02 PM, KSY wrote:
>>> I hit a major roadblock while trying to get Solr 1.3 running on WebLogic
>>> 10.0.
>>>
>>> A similar message was posted before - (
>>> http://www.nabble.com/Solr-1.3-stack-overflow-when-accessing-solr-admin-page-td20157873.html
>>> ) - but it seems like it hasn't been resolved yet, so I'm re-posting
>>> here.
>>>
>>> I am sure I configured everything correctly because it's working fine on
>>> Resin.
>>> Has anyone successfully run Solr 1.3 on WebLogic 10.0 or higher? Thanks.
>>>
>>> SUMMARY:
>>>
>>> When accessing /solr/admin page, StackOverflowError occurs due to an
>>> infinite recursion in SolrDispatchFilter
>>>
>>> ENVIRONMENT SETTING:
>>>
>>> Solr 1.3.0
>>> WebLogic 10.0
>>> JRockit JVM 1.5
>>>
>>> ERROR MESSAGE:
>>>
>>> SEVERE: javax.servlet.ServletException: java.lang.StackOverflowError
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:276)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>>> at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>>> at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>>> at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
Re: Optimizing & Improving results based on user feedback
yes, applying a boost would be a good addition. patches are always welcome ;)

On Jan 30, 2009, at 10:56 AM, Matthew Runo wrote:

I've thought about patching the QueryElevationComponent to apply boosts rather than a specific sort. Then the file might look like.. And I could write a script that looks at click data once a day to fill out this file.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Jan 30, 2009, at 6:37 AM, Ryan McKinley wrote:

It may not be as fine-grained as you want, but also check the QueryElevationComponent. This takes a preconfigured list of what the top results should be for a given query and makes those documents the top results. Presumably, you could use click logs to determine what the top result should be.

On Jan 29, 2009, at 7:45 PM, Walter Underwood wrote:

"A Decision Theoretic Framework for Ranking using Implicit Feedback" uses clicks, but the best part of that paper is all the side comments about difficulties in evaluation. For example, if someone clicks on three results, is that three times as good or two failures and a success? We have to know the information need to decide. That paper is in the LR4IR 2008 proceedings.

Both Radlinski and Joachims seem to be focusing on click data. I'm thinking of something much simpler, like taking the first N hits and reordering those before returning. Brute force, but would get most of the benefit. Usually, you only have reliable click data for a small number of documents on each query, so it is a waste of time to rerank the whole list. Besides, if you need to move something up 100 places on the list, you should probably be tuning your regular scoring rather than patching it with click data.

wunder

On 1/29/09 3:43 PM, "Matthew Runo" wrote:

Agreed, it seems that a lot of the algorithms in these papers would almost be a whole new RequestHandler ala Dismax.
Luckily a lot of them seem to be built on Lucene (at least the ones that I looked at that had code samples). Which papers did you see that actually talked about using clicks? I don't see those, beyond "Addressing Malicious Noise in Clickthrough Data" by Filip Radlinski and also his "Query Chains: Learning to Rank from Implicit Feedback" - but neither is really on topic. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Jan 29, 2009, at 11:36 AM, Walter Underwood wrote: Thanks, I didn't know there was so much research in this area. Most of the papers at those workshops are about tuning the entire ranking algorithm with machine learning techniques. I am interested in adding one more feature, click data, to an existing ranking algorithm. In my case, I have enough data to use query-specific boosts instead of global document boosts. We get about 2M search clicks per day from logged in users (little or no click spam). I'm checking out some papers from Thorsten Joachims and from Microsoft Research that are specifically about clickthrough feedback. wunder On 1/27/09 11:15 PM, "Neal Richter" wrote: OK I've implemented this before, written academic papers and patents related to this task. Here are some hints: - you're on the right track with the editorial boosting elevators - http://wiki.apache.org/solr/UserTagDesign - be darn careful about assuming that one click is enough evidence to boost a long 'distance' - first page effects in search will skew the learning badly if you don't compensate. 95% of users never go past the first page of results, 1% go past the second page. So perfectly good results on the second page get permanently locked out - consider forgetting what you learn under some condition In fact this whole area is called 'learning to rank' and is a hot research topic in IR. 
http://web.mit.edu/shivani/www/Ranking-NIPS-05/
http://research.microsoft.com/en-us/um/people/lr4ir-2007/
https://research.microsoft.com/en-us/um/people/lr4ir-2008/

- Neal Richter

On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo wrote: Hello folks! We've been thinking about ways to improve organic search results for a while (really, who hasn't?) and I'd like to get some ideas on ways to implement a feedback system that uses user behavior as input. Basically, it'd work on the premise that what the user actually clicked on is probably a really good match for their search, and should be boosted up in the results for that search. For example, if I search for "rain boots", and really love the 10th result down (and show it by clicking on it), then we'd like to capture this and use the data to boost up that result //for that search//. We've thought about using index-time boosts for the documents, but that'd boost it regardless of the search terms, which isn't what we want. We've thought about using the Elevator handler, but we don't really want to force a product to the top - we'd prefer it slowly rises over time as more and more people
1.3 <-> 1.4 patch for onError handling
Hi, I've just had a bump in the night where some feeds have disappeared. I'm wondering, since I'm running the base 1.3 copy, would patching it with https://issues.apache.org/jira/browse/SOLR-842 break anything? Has anyone done this yet? Thanks. - Jon
Re: got background_merge_hit_exception during optimization
We are on solr 1.3, and we use the default jetty server, which is included in the solr 1.3 download package. The java version is:

java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_12-b04, mixed mode)

I checked the log files under logs and solr/logs, but don't see any error. Would you please let me know how to get the stack trace from the solr logs? Appreciate your help. Qingdi

Yonik Seeley-2 wrote:
> What system and JVM was this using?
> Also, could you get the stack trace directly from the Solr logs and post it?
>
> -Yonik
>
> On Thu, Jan 29, 2009 at 4:06 PM, Qingdi wrote:
>>
>> We got the following background_merge_hit_exception during optimization:
>>
>> java.io.IOException: background merge hit exception: _4zsg:C136887658 _50nf:C995992 _51i9:C995977 _52d5:C995968 _537y:C995999 _54xm:C1892345 _54xl:C99593 into _54xn [optimize]
>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
>>   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
>>   at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>>   at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:104)
>>   at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:113)
>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>>   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>>   at org.mortbay.jetty.Server.handle(Server.java:285)
>>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>>
>> Does anyone know what could be the cause of the exception? What should we do to prevent this type of exception?
>>
>> Some posts in the Lucene forum say the exception is usually related with a disk space issue. But there should be enough disk space in our system. Our index size was about 56G. And before optimization, the disk had about 360G free space.
>>
>> After the above background_merge_hit_exception was raised, solr kept generating new segment files, which ate up all the CPU time and the disk space, so we had to kill the solr server.
>>
>> Thanks for your help.
>>
>> Qingdi
>>
>> --
>> View this message in context: http://www.nabble.com/got-background_merge_hit_exception-during-optimization-tp21735847p21735847.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://www.nabble.com/got-background_merge_hit_exception-during-optimization-tp21735847p21751938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: query with stemming, prefix and fuzzy?
Gert Brinkmann wrote: Thanks, Mark, for your answer.

Mark Miller wrote: Truncation queries and stemming are difficult partners. You likely have to accept compromise. You can try using multiple fields like you are,

I already have multiple fields, one per language, to be able to use different stemmers. Wouldn't this become too much?

Possibly. Especially if you are using norms with all of those fields. Depends on your index though.

you can try indexing the full term at the same position as the stemmed term,

What does this mean, "at the same position", and how could I do this?

Write a custom filter. Normally, for every term, its position is incremented by 1 as the terms are broken out in tokenization. You can change this and index terms at the same position using your own filter. There are ramifications, because you are adding more terms to your index, but it allows you to index multiple forms of a term at the same position (so that phrase queries still work as expected).

or you can accept the weirdness that comes from matching on a stemmed form (potentially very confusing for a user).

Currently I am thinking about dropping the stemming and only using prefix search. But as highlighting does not work with a prefix "house*" this is a problem for me. The hint to use "house?*" instead does not work here.

That's because wildcard queries are also not highlightable now. I actually have somewhat of a solution to this that I'll work on soon (I've gotten the groundwork for it in, or ready to be in, Lucene). No guarantee on when or if it will be accepted in Solr though. In any case, a query parser that supports FuzzyQuery should not be analyzing it. What parser are you using? If it is analyzing the fuzzy syntax, it likely doesn't support it.

I am using the following definitions (testing it with and without stemming): and, well, the parser? Where is the parser specified? Do you mean the request handler "qt" (that will be "standard", as I do not set it yet)?

That's odd.
I'll have to look at this closer to be of help.

The prefix length determines how many terms are enumerated - with the

Can the prefix length be set in Solr? I could not find such an option.

I don't think there is an option in Solr. Patches welcome of course. It would be a nice one - using the default of 0 is *very* not scalable. The latest trunk build of Lucene will let us switch fuzzy query to use a constant-score mode - this will eliminate the BooleanQuery and should perform much better on a large index. Solr already uses a constant-score mode for Prefix and Wildcard queries.

Much better performance is always good. When will this feature be available in Solr?

Soon, I hope. Since wildcard and prefix are already constant score, it only makes sense to make fuzzy query that way as well. How big is your index? If it's not that big, it may be odd that you're seeing things that slow (the number of unique terms in the index will play a large role).

Well, the index currently contains about 5000 documents. These are HTML pages, some of them concatenated with PDF/DOC downloads linked from the HTML page and converted to text. The index data is about 11MB (optimized). So I think this is just a smaller index.

Yeah, sounds small. It's odd you would see such slow performance. It depends though. You may still have a *lot* of unique terms in there.
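To illustrate Mark's "index terms at the same position" suggestion, here is a plain-Java sketch. The whitespace tokenizer and the trailing-"s" stemmer are toy stand-ins (a real implementation would be a Lucene TokenFilter setting the token's position increment), but the bookkeeping is the point: the stemmed form is emitted with a position increment of 0 so it stacks on the original term.

```java
import java.util.ArrayList;
import java.util.List;

public class SamePositionSketch {

    // Stand-in stemmer for illustration only: strips a trailing "s".
    static String stem(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }

    // Emits (term, positionIncrement) pairs. An increment of 0 stacks a term
    // on the previous position, so phrase queries still line up.
    static List<String[]> tokenize(String text) {
        List<String[]> out = new ArrayList<>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            out.add(new String[] { tok, "1" });          // original term, new position
            String stemmed = stem(tok);
            if (!stemmed.equals(tok)) {
                out.add(new String[] { stemmed, "0" });  // stemmed term, same position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] t : tokenize("red houses")) {
            System.out.println(t[0] + " (posIncr " + t[1] + ")");
        }
    }
}
```

With "houses" and "house" occupying the same position, a phrase query for "red house" or "red houses" matches the same indexed positions.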
Re: query with stemming, prefix and fuzzy?
Mark Miller wrote: > Yeah, sounds small. Its odd you would see such slow performance. It > depends though. You may still have a *lot* of unique terms in there. Is there a way to retrieve the list of terms in the index? Gert
Re: query with stemming, prefix and fuzzy?
Gert Brinkmann wrote: Mark Miller wrote: Yeah, sounds small. Its odd you would see such slow performance. It depends though. You may still have a *lot* of unique terms in there. Is there a way to retrieve the list of terms in the index? Gert Try hitting /solr/admin/luke and see what it says. - Mark
solr boosting
Hey there, I am trying to tune the boost of the results obtained using DisMaxQueryParser. As I understood Lucene's boost, if you search for "John Le Carre" it will give a better score to results that contain just the searched string than to results that have, for example, 50 words with the search contained among them. In Solr, my goal is to give a higher score to the docs that contain both words but that have more words in the field. I have tried 2 options: 1. At index time, I check the length of the fields, and if they are bigger than 'x' chars I give more boost to that doc (I am adding 3.0 extra boost using addBoost). 2. On the other hand, I have been playing with tie and pf, but I think they are not helping in my issue. Before using Solr (my own Lucene searcher and indexer) the first option used to work quite well; in Solr my extra boost seems to affect things much less. Is this normal because I am using DisMaxQueryParser, or should it be the same? Any advice is more than welcome! Thanks in advance -- View this message in context: http://www.nabble.com/solr-booosting-tp21753617p21753617.html Sent from the Solr - User mailing list archive at Nabble.com.
exceeded limit of maxWarmingSearchers
I am getting hit by a storm of these once a day or so: SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=16, try again later. I keep bumping up maxWarmingSearchers. It's at 32 now. Is there any way to figure out what the "right" value is besides trial and error? Our site gets extremely minimal traffic so I'm really puzzled why the out-of-the-box settings are insufficient. The index has about 61000 documents, very small, and we do less than one query per second. -jsd-
Re: exceeded limit of maxWarmingSearchers
I'd advise setting it to a very low limit (like 2) and committing less often. Once you get too many overlapping searchers, things will slow to a crawl and that will just cause more to pile up. The root cause is simply too many commits in conjunction with warming too long. If you are using a dev version of Solr 1.4, you might try commitWithin instead of explicit commits. (see SOLR-793) Depending how long warming takes, you may want to lower autowarm counts. -Yonik On Fri, Jan 30, 2009 at 2:14 PM, Jon Drukman wrote: > I am getting hit by a storm of these once a day or so: > > SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. > exceeded limit of maxWarmingSearchers=16, try again later. > > I keep bumping up maxWarmingSearchers. It's at 32 now. Is there any way to > figure out what the "right" value is besides trial and error? Our site gets > extremely minimal traffic so I'm really puzzled why the out-of-the-box > settings are insufficient. > > The index has about 61000 documents, very small, and we do less than one > query per second. > > -jsd- > >
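Concretely, the limit Yonik suggests is a one-line change in solrconfig.xml (2 is his suggested starting point):

```xml
<!-- solrconfig.xml: cap concurrent warming searchers; commits that would
     open more than this many will fail fast instead of piling up -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```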
Re: query with stemming, prefix and fuzzy?
Mark Miller wrote: > Try hitting /solr/admin/luke and see what it says.

Oh, interesting. I think I have to check the stopword list. Is there a way to filter single characters like the "h"?

[Luke output excerpt for field text_de_de: 57971 distinct terms; top term counts 2340, 1454, 1016, 1008, 980, 927, 924, 895, 843, 730, 730]

Thank you for the information. Gert
Re: solr as the data store
The other option was actually couchdb. It was very nice but the benefits were not compelling compared to the pure simplicity of just having solr. With the replication just so simple to setup now - it really does seem to solve all the problems we are looking for in a redundant distributed storage solution. On Thu, Jan 29, 2009 at 12:50 AM, Neal Richter wrote: > You might examine what the Apache CouchDB people have done. > > It's a document oriented DB that is able to use JSON structured > documents combined with Lucene indexing of the documents with a > RESTful HTTP interface. > > It's a stretch, and written in Erlang.. but perhaps there is some > inspiration to be had for 'solr as the data store'. > > - Neal Richter > -- Regards, Ian Connor
Re: query with stemming, prefix and fuzzy?
Gert Brinkmann wrote: 57971

That's a lot for a small index. The fuzzy query will enumerate all of those terms and calculate an edit distance. It's not an insane amount of work, but it jibes with the slowness you see. Doing that 60,000 times for a query is not that fast. Unfortunately, without the prefix setting, FuzzyQueries are slow - slow with that many uniques. Solr should definitely allow the prefix to be set. There was talk a couple of years back about changing the default prefix value in Lucene because it's so slow, but it didn't happen. The developers decided that you could tweak it yourself if you needed to be able to scale (if you add a prefix length, the first characters up to that length won't be fuzzy). Unfortunately, Solr hasn't yet given this option to my knowledge.

- Mark
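To make the cost concrete, here is a self-contained sketch of what a fuzzy enumeration does. The flat term list and linear scan are illustrative stand-ins for Lucene's term enumerator, but the effect of a non-zero prefix length is the same: terms that don't share the prefix are skipped before any edit-distance work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FuzzyScanSketch {

    // Classic two-row Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // With prefixLength > 0, terms that don't share the prefix are skipped
    // before any edit-distance computation -- that is the scalability win.
    static List<String> fuzzyMatches(List<String> uniqueTerms, String query,
                                     int prefixLength, int maxDistance) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> out = new ArrayList<>();
        for (String term : uniqueTerms) {
            if (!term.startsWith(prefix)) continue;
            if (levenshtein(term, query) <= maxDistance) out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("hose", "horse", "house", "mouse");
        // prefixLength 0: every term pays for a distance computation
        System.out.println(fuzzyMatches(terms, "house", 0, 1));
        // prefixLength 1: "mouse" is pruned without any distance work
        System.out.println(fuzzyMatches(terms, "house", 1, 1));
    }
}
```

With a prefix length of 0, every unique term (all 57971 of them here) pays for a Levenshtein computation; with a prefix length of 1, only the terms starting with "h" do.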
Re: query with stemming, prefix and fuzzy?
On Fri, Jan 30, 2009 at 11:37 PM, Mark Miller wrote: > >> >>> you can try indexing the full term at the same position as the stemmed >>> term, >>> >>> >> >> what does this mean "at the same position" and how could I do this? >> >> > Write a custom filter. Normally, for every term, its position is > incremented by 1 as the terms are broken out in tokenization. You can change > this and index terms at the same position using your own filter. There are > ramifications, because you are adding more terms to your index, but it > allows you to index multiple forms of a term at the same position (so that > phrase queries still work as expected). Can SOLR-763 help here? It is in trunk now. https://issues.apache.org/jira/browse/SOLR-763 -- Regards, Shalin Shekhar Mangar.
problems on solr search patterns and sorting rules
Hi buddy, I work on an audio search based on the Solr engine. I want to implement lyric search and sort by relevance. Here is my confusion. My schema.xml is like this: text ...

http://localhost:8983/solr/select/?q=lyric:(tear the house down)&fl=*,score&version=2.2&start=0&rows=10&indent=on has results

http://localhost:8983/solr/select/?q=tear the house down&fl=*,score&version=2.2&start=0&rows=10&indent=on has no results

http://localhost:8983/solr/select/?q=tear the house down&fl=*,score&qf=lyric&version=2.2&start=0&rows=10&indent=on has no results

Q1: Why do the latter links not work when I have added lyric to copyField?

Q2: I want to set the priority of song name higher, then artist name and album, so I try like this: http://localhost:8983/solr/select/?q=sweet&fl=*,score&qf=mp3^5 artist album^0.4&version=2.2&start=0&rows=10&indent=on I find the scores are totally the same as without the qf argument: http://localhost:8983/solr/select/?q=sweet&fl=*,score&version=2.2&start=0&rows=10&indent=on How could I modify the sorting?

Q3: I would like to achieve an effect like http://mp3.baidu.com/m?f=ms&rn=&tn=baidump3&ct=134217728&word=tear+the+house+down&lm=-1 - highlighting the fragment. Can Solr give me the minimal range containing all keywords in an article-like field such as lyric?

Thank you for your attention!
Separate error logs
Hi all, What's the best way for me to split Solr/Lucene error messages off to a separate log? Thanks James
Re: exceeded limit of maxWarmingSearchers
Yonik Seeley wrote: I'd advise setting it to a very low limit (like 2) and committing less often. Once you get too many overlapping searchers, things will slow to a crawl and that will just cause more to pile up. The root cause is simply too many commits in conjunction with warming too long. If you are using a dev version of Solr 1.4, you might try commitWithin instead of explicit commits. (see SOLR-793) Depending how long warming takes, you may want to lower autowarm counts.

Right now we commit on every update, but that's probably not more than once every few minutes. Should I back it off? -jsd-
Re: solr as the data store
We've been using a Lucene index as the main data store for ActiveMath; its indexing process takes the XML fragments apart and stores them in an organized way, including storage of the relationships both ways.

The difference between SQL and Lucene in this case? Pure Java was the major reason back then. The performance of Lucene stayed on top as well (compared to XML databases).

As of now, because of 2.0, we had to split out the storage of the fragments themselves, keeping the rest in Lucene, because the functionality to reliably read and write fields and never have them be loaded as single strings has been missing for us. Maybe it's back in 2.3... Our fragments' sizes vary from 20 bytes to 2 MBytes... about 25k of them is normal.

I'm looking forward to, one day, recycling it all to Solr, which would finally take care of it all in terms of index update and read management, adding Luke-like web access. Scalability of Lucene has always been top. Joins are not there... I could get along without them. Summaries are also not really there... but again, we could get along without them.

paul

Le 28-janv.-09 à 21:37, Ian Connor a écrit : Hi All, Is anyone using Solr (and thus the Lucene index) as their database store? Up to now, we have been using a database to build Solr from. However, given that Lucene already keeps the stored data intact, and that rebuilding from Solr to Solr can be very fast, the separate database does not seem so necessary. It seems totally possible to maintain just the Solr shards and treat them as the database (backups, redundancy, etc. are already built right in). The idea that we would need to rebuild from scratch seems unlikely, and the speed boost from using Solr shards for data massaging and reindexing seems very appealing. Has anyone else thought about this, or done this and run into problems that caused them to go back to a separate database model? Is there a critical need you can think of that is missing?
-- Regards, Ian Connor
Re: Separate error logs
check: http://wiki.apache.org/solr/SolrLogging

You can configure whatever flavor of logger you like to write errors to a separate log.

On Jan 30, 2009, at 4:36 PM, James Brady wrote: Hi all, What's the best way for me to split Solr/Lucene error messages off to a separate log? Thanks James
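As one hedged sketch of what that could look like with the JDK logging backend (Solr 1.3 logs through SLF4J, and the JDK binding is the default in the distributed war): a java.util.logging properties file that attaches a file handler, limited to WARNING and above, to the org.apache.solr loggers. The file path and levels below are assumptions to adapt.

```properties
# Illustrative logging.properties -- pass to the JVM with
#   -Djava.util.logging.config.file=/path/to/logging.properties

# Everything still goes to the console as usual...
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = INFO

# ...but the org.apache.solr loggers also get their own file,
# restricted to WARNING and above (errors and warnings only).
org.apache.solr.handlers = java.util.logging.FileHandler
java.util.logging.FileHandler.pattern = /var/log/solr/solr-errors.log
java.util.logging.FileHandler.level = WARNING
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
```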
Re: exceeded limit of maxWarmingSearchers
That should be fine (but apparently isn't), as long as you don't have a very slow machine or caches that are large and configured to copy a lot of data on commit. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Jon Drukman > To: solr-user@lucene.apache.org > Sent: Friday, January 30, 2009 4:54:06 PM > Subject: Re: exceeded limit of maxWarmingSearchers > > Yonik Seeley wrote: > > I'd advise setting it to a very low limit (like 2) and committing less > > often. Once you get too many overlapping searchers, things will slow > > to a crawl and that will just cause more to pile up. > > > > The root cause is simply too many commits in conjunction with warming > > too long. If you are using a dev version of Solr 1.4, you might try > > commitWithin instead of explicit commits. (see SOLR-793) Depending > > how long warming takes, you may want to lower autowarm counts. > > right now we commit on every update, but that's probably not more than once > every few minutes. should i back it off? > > -jsd-
Re: Separate error logs
Oh... I should really have found that myself :/ Thank you! 2009/1/30 Ryan McKinley > check: > http://wiki.apache.org/solr/SolrLogging > > You configure whatever flavor logger to write error to a separate log > > > > On Jan 30, 2009, at 4:36 PM, James Brady wrote: > > Hi all,What's the best way for me to split Solr/Lucene error message off >> to >> a separate log? >> >> Thanks >> James >> > >
Re: problems on solr search patterns and sorting rules
fei dong wrote: Hi buddy, I work on an audio search based on solr engine. I want to realize lyric search and sort by relevance. Here is my confusion. My schema.xml is like this: text ... http://localhost:8983/solr/select/?q=lyric:(tear the house down)&fl=*,score&version=2.2&start=0&rows=10&indent=on has results http://localhost:8983/solr/select/?q=tear the house down&fl=*,score&version=2.2&start=0&rows=10&indent=on have no result http://localhost:8983/solr/select/?q=tear the house down&fl=*,score&qf=lyric&version=2.2&start=0&rows=10&indent=on have no result Q1: why the latter links not work while I have added lyric to copyField?

Did you re-index after adding lyric to copyField?

Q2: I want to set the priority of song name higher, then artist name and album so I try like this: http://localhost:8983/solr/select/?q=sweet&fl=*,score&qf=mp3^5 artist album^0.4&version=2.2&start=0&rows=10&indent=on I find the score are totally same as without the argument of qf: http://localhost:8983/solr/select/?q=sweet&fl=*,score&version=2.2&start=0&rows=10&indent=on How could I modify the sorting?

Try indexing-time boost: http://wiki.apache.org/solr/UpdateXmlMessages#head-8315b8028923d028950ff750a57ee22cbf7977c6

Q3: I would like to realize the effect like: http://mp3.baidu.com/m?f=ms&rn=&tn=baidump3&ct=134217728&word=tear+the+house+down&lm=-1 highlight the fragment. Can solr give me the range of minimum of all keywords in an article like lyric? Thank you for attention!

I couldn't understand your requirement, but can you try &hl=on&hl.fl=lyric and see what you get?
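For the indexing-time boost Koji points to, the update XML accepts a boost attribute on a whole document or on an individual field; a sketch using the field names from the question (the values and boost factors are made up):

```xml
<add>
  <doc boost="2.5">
    <field name="mp3" boost="2.0">Sweet Child O' Mine</field>
    <field name="artist">Guns N' Roses</field>
    <field name="album">Appetite for Destruction</field>
  </doc>
</add>
```

Document and field boosts are baked in at index time, so a re-index is needed for them to take effect.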
RE: Performance "dead-zone" due to garbage collection
I profiled our application, and GC is definitely the problem. The IBM JVM didn't change much. I'm currently looking into ways of reducing my memory footprint. -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21758001.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr on Sun Java Real-Time System
Has anyone tried Solr on the Sun Java Real-Time JVM (http://java.sun.com/javase/technologies/realtime/index.jsp)? I've read that it includes better control over the garbage collector. Thanks. Wojtek -- View this message in context: http://www.nabble.com/Solr-on-Sun-Java-Real-Time-System-tp21758035p21758035.html Sent from the Solr - User mailing list archive at Nabble.com.
Range search question
I have a string field in my schema that actually contains numeric data. If I try a range search: fieldInQuestion:[ 100 TO 150 ] I fetch back a lot of data that is NOT in this range, such as 11, etc. Any idea why this happens? Is it because this is a string? Thanks.
Re: Range search question
Jim Adams wrote: I have a string field in my schema that actually numeric data. If I try a range search: fieldInQuestion:[ 100 TO 150 ] I fetch back a lot of data that is NOT in this range, such as 11, etc. Any idea why this happens? Is it because this is a string? Thanks. Yep, try sint field type instead. Koji
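The reason is that a string field compares lexicographically, character by character, so "11" really does sort between "100" and "150". A quick sketch; the pad() helper shows the usual zero-padding workaround if a string field must be kept (the sint type achieves the numeric ordering internally via a sortable encoding):

```java
public class StringRangeSketch {

    // Lexicographic range test, as a string field would apply it.
    static boolean inRange(String value, String low, String high) {
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    // Zero-padding to a fixed width makes lexicographic order match
    // numeric order for non-negative integers.
    static String pad(String num, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = num.length(); i < width; i++) sb.append('0');
        return sb.append(num).toString();
    }

    public static void main(String[] args) {
        // "11" > "100" and "11" < "150" character by character
        System.out.println(inRange("11", "100", "150"));   // true
        // padded, the comparison agrees with the numbers
        System.out.println(inRange(pad("11", 6), pad("100", 6), pad("150", 6)));  // false
        System.out.println(inRange(pad("125", 6), pad("100", 6), pad("150", 6))); // true
    }
}
```

The same character-by-character comparison explains surprises on date fields when the stored format doesn't sort chronologically.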
Re: Range search question
True, which is what I'll probably do, but is there any way to do this using 'string'? Actually I have even seen this with date fields, which seems very odd (more data being returned than I expected). On Fri, Jan 30, 2009 at 7:04 PM, Koji Sekiguchi wrote: > Jim Adams wrote: > >> I have a string field in my schema that actually numeric data. If I try a >> range search: >> >> fieldInQuestion:[ 100 TO 150 ] >> >> I fetch back a lot of data that is NOT in this range, such as 11, etc. >> >> Any idea why this happens? Is it because this is a string? >> >> Thanks. >> >> >> > > Yep, try sint field type instead. > > Koji > >