Re: FunctionQuery score=0
After playing some more with this I managed to get what I want, almost. My query now looks like:

q={!frange l=0 incl=false}query({!type=edismax qf="abstract^0.02 title^0.08 categorysearch^0.05" boost='eqsim(alltokens,"xyz")' v='+tokens5:"xyz" '})

With the above query I am getting only the results that I want, the ones whose score after my FunctionQuery is above 0, but the problem now is that the final score for all results is changed to 1, which affects the sorting. How can I keep the original score that is calculated by the edismax query?

Cheers,
John

On Fri, Nov 18, 2011 at 10:50 AM, Andre Bois-Crettez wrote:
> Definitely worked for me, with a classic full text search on "ipod" and such.
> Changing the lower bound changed the number of results.
>
> Follow Chris's advice, and give more details.
>
> John wrote:
>> Doesn't seem to work.
>> I thought that filter queries work before the search is performed and not
>> after... no?
>>
>> Debug doesn't include the filter query, only the below (changed a bit):
>>
>> BoostedQuery(boost(+fieldName:"",boostedFunction(ord(fieldName),query)))
>>
>> On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez wrote:
>>
>>> John wrote:
>>>> Some of the results are receiving score=0 in my function and I would
>>>> like them not to appear in the search results.
>>> you can use frange, and filter by score:
>>>
>>> q=ipod&fq={!frange l=0 incl=false}query($q)
>>>
>>> --
>>> André Bois-Crettez
>>> Search technology, Kelkoo
>>> http://www.kelkoo.com/
>
> --
> André Bois-Crettez
> Search technology, Kelkoo
> http://www.kelkoo.com/
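The usual fix, sketched here reusing the field names from John's query, is to keep the scoring edismax query in q and move the frange into fq: a filter query restricts the result set but contributes nothing to the score, so the edismax score survives for sorting:

    q={!type=edismax qf="abstract^0.02 title^0.08 categorysearch^0.05" boost='eqsim(alltokens,"xyz")'}+tokens5:"xyz"
    &fq={!frange l=0 incl=false}query($q)

Here query($q) re-evaluates the main query inside the filter, dropping documents whose score is not above 0, exactly as in Andre's earlier q=ipod example.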
Re: Performance issues
http://www.lucidimagination.com/content/scaling-lucene-and-solr has good guidance.

Wrt 1: what is the issue, mem, cpu, query perf, or the indexing process?

On Nov 20, 2011, at 11:39 AM, Lalit Kumar 4 wrote:
> Hello:
> We recently have seen performance issues with SOLR (running on jetty).
>
> We are looking for help in:
>
> 1) How can I benchmark our current implementation?
> 2) We are trying multiple cores vs separate instances. What are the pros and cons?
> 3) Any pointers to validate that the current configuration is correct?
Re: Performance issues
A search with a couple of parameters brings back 650 counts (out of approx 2500) and takes around 30 seconds. The schema.xml has more than 100 fields.

-----Original Message-----
From: "Govind @ Gmail"
Date: Sun, 20 Nov 2011 15:01:04
To: solr-user@lucene.apache.org
Reply-To: "solr-user@lucene.apache.org"
Cc: solr-user@lucene.apache.org
Subject: Re: Performance issues

http://www.lucidimagination.com/content/scaling-lucene-and-solr has good guidance.

Wrt 1: what is the issue, mem, cpu, query perf, or the indexing process?

On Nov 20, 2011, at 11:39 AM, Lalit Kumar 4 wrote:
> Hello:
> We recently have seen performance issues with SOLR (running on jetty).
>
> We are looking for help in:
>
> 1) How can I benchmark our current implementation?
> 2) We are trying multiple cores vs separate instances. What are the pros and cons?
> 3) Any pointers to validate that the current configuration is correct?
Re: Performance issues
On Sun, Nov 20, 2011 at 11:27 AM, Lalit Kumar 4 wrote:
>
> A search with a couple of parameters brings back 650 counts (out of approx
> 2500) and takes around 30 seconds.
> The schema.xml has more than 100 fields.

You have of course started with the basics, like making sure that the index is smaller than the available RAM on your server? (Or the index size per shard is smaller than the available RAM on each server, if you are running a multi-server cluster.)

As long as your index is bigger than what can be placed in cache, you will have a hard time keeping your queries fast no matter what, unless the search queries are few enough that they are always within Solr's own query cache.

--
Regards
Tor Henning Ueland
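A quick way to check that, assuming a Linux host (the index path is illustrative; it depends on your solr home):

    du -sh /path/to/solr/data/index   # on-disk index size
    free -g                           # RAM available for the OS page cache

If du reports more than free shows as cacheable, queries will regularly hit disk.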
how to transform a URL (newbie question)
I am a beginner to Solr and need to ask the following: using the apache-solr example, how can I display a URL in the XML document as an active link in HTML? Do I need to add some special transform in the example.xslt file?

thanks
Ben
Re: how to transform a URL (newbie question)
Ben,

Not quite sure how to interpret what you're asking here. Are you speaking of the /browse view? If so, you can tweak the templates under conf/velocity to make links out of things.

But generally, it's the end application that would take the results from Solr and render links as appropriate.

	Erik

On Nov 20, 2011, at 11:53 , Bent Jensen wrote:
> I am a beginner to Solr and need to ask the following: using the apache-solr
> example, how can I display a URL in the XML document as an active link in
> HTML? Do I need to add some special transform in the example.xslt file?
>
> thanks
> Ben
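For the /browse route, a minimal sketch of the kind of template tweak Erik means, assuming the stock example templates (where each hit is rendered with the current document bound as $doc) and a hypothetical stored field named "url", e.g. in conf/velocity/doc.vm:

    <a href="$doc.getFirstValue('url')">$doc.getFirstValue('url')</a>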
RE: how to transform a URL (newbie question)
Erik,
OK, I will look at that. Basically, what I am trying to do is to index a document with lots of URLs. I also index the url and give it a field type. I don't know much about Solr yet, but thought maybe I can transform the url to an active link, i.e. '<a href="...">'. I tried putting the href into the xml document, but it just prints out as text in html. I also could not find any xslt transform or schema.

thanks
Ben

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Sunday, November 20, 2011 9:05 AM
To: solr-user@lucene.apache.org
Subject: Re: how to transform a URL (newbie question)

Ben,

Not quite sure how to interpret what you're asking here. Are you speaking of the /browse view? If so, you can tweak the templates under conf/velocity to make links out of things.

But generally, it's the end application that would take the results from Solr and render links as appropriate.

	Erik

On Nov 20, 2011, at 11:53 , Bent Jensen wrote:
> I am a beginner to Solr and need to ask the following: using the apache-solr
> example, how can I display a URL in the XML document as an active link in
> HTML? Do I need to add some special transform in the example.xslt file?
>
> thanks
> Ben
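For the XSLT route Ben is asking about (wt=xslt&tr=example.xsl), a sketch assuming the URL lives in a stored field named "url": Solr's XML response carries each stored string field as <str name="..."> inside a <doc>, so a template along these lines added to example.xsl would render the field as a link instead of plain text:

    <!-- render the (assumed) url field as an anchor -->
    <xsl:template match="str[@name='url']">
      <a href="{.}"><xsl:value-of select="."/></a>
    </xsl:template>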
Re: Only a subset of edismax pf fields are used for the phrase part DisjunctionMaxQuery
Could we see the schema definitions for the fields in question? And the solrconfig for the handler, and the query you actually send?

Best
Erick

On Fri, Nov 18, 2011 at 6:33 AM, Jean-Claude Dauphin wrote:
> Hello,
>
> The parsed query is displayed as follows:
>
> parsedquery=+(DisjunctionMaxQuery((title:responsable^4.0 |
> keywords:responsable^3.0 | organizationName:responsable |
> location:responsable | formattedDescription:responsable^2.0 |
> nafCodeText:responsable^2.0 | jobCodeText:responsable^3.0 |
> categoryPayloads:responsable | labelLocation:responsable)~0.1)
> DisjunctionMaxQuery((title:boutique^4.0 | keywords:boutique^3.0 |
> organizationName:boutique | location:boutique |
> formattedDescription:boutique^2.0 | nafCodeText:boutique^2.0 |
> jobCodeText:boutique^3.0 | categoryPayloads:boutique |
> labelLocation:boutique)~0.1) DisjunctionMaxQuery((title:lingerie^4.0 |
> keywords:lingerie^3.0 | organizationName:lingerie | location:lingerie |
> formattedDescription:lingerie^2.0 | nafCodeText:lingerie^2.0 |
> jobCodeText:lingerie^3.0 | categoryPayloads:lingerie |
> labelLocation:lingerie)~0.1))
>
> *DisjunctionMaxQuery*((title:"responsable boutique lingerie"~10^4.0 |
> formattedDescription:"responsable boutique lingerie"~10^2.0 |
> categoryPayloads:"responsable boutique lingerie"~10)~0.1)
>
> The search query is 'responsable boutique lingerie'.
> The qf and pf fields are the same:
>
> qf=title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
> organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
> categoryPayloads^1.0
>
> pf=title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
> organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
> categoryPayloads^1.0
>
> I would have expected the whole set of pf fields to appear in the phrase
> part of the parsed query!
>
> Is it coming from the field definitions in the schema.xml?
>
> Best,
>
> Jean-Claude Dauphin
>
> --
> Jean-Claude Dauphin
>
> jc.daup...@gmail.com
> jc.daup...@afus.unesco.org
>
> http://kenai.com/projects/j-isis/
> http://www.unesco.org/isis/
> http://www.unesco.org/idams/
> http://www.greenstone.org
Re: wild card search and lower-casing
As it happens I'm working on SOLR-2438, which should address this. The patch will provide two things:

The ability to define a new analysis chain in your schema.xml, currently called "multiterm", that will be applied to queries of various sorts, including wildcard, prefix, and range. This will be somewhat of an "expert" thing to make yourself...

In the absence of an explicit definition, it'll synthesize a multiterm analyzer out of the query analyzer, taking any char filters, the LowerCaseFilter (if present), and the ASCIIFoldingFilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer.

As of 3.6 and 4.0 this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior.

The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release.

The patch is up for review now; I'd like another set of eyeballs or two on it before committing.

The patch that's up there now is against trunk, but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut.

Best
Erick

On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan wrote:
>
>> You're right:
>>
>> public SolrQueryParser(IndexSchema schema, String defaultField) {
>>   ...
>>   setLowercaseExpandedTerms(false);
>>   ...
>> }
>
> Please note that lowercaseExpandedTerms uses String.toLowerCase() (with the
> default Locale), which is a Locale-sensitive operation.
>
> In Lucene, AnalyzingQueryParser exists for this purpose, but I am not sure
> if it is ported to Solr.
>
> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
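To make the first option concrete, a sketch of what an explicit definition could look like under SOLR-2438 (the "multiterm" analyzer type comes from the patch, so the exact syntax may still change before commit):

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <!-- applied to wildcard/prefix/range terms, which normally get no analysis -->
      <analyzer type="multiterm">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

With such a type, a query like Foo* would be lowercased to foo* before term expansion, matching terms that went through the lowercasing index analyzer.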
Re: Solr filterCache size settings...
Each fq will create a bitmap that is bounded by (maxDocs / 8) bytes. You can think of the entries in the filterCache as a map where the key is the filter query you specify and the value is the aforementioned bitmap. The number of entries specified in the config file is the number of entries in that map.

So the cache can take up roughly (assuming the size is 512) 512 * maxDocs / 8 bytes.

Best
Erick

On Fri, Nov 18, 2011 at 6:49 PM, Andrew Lundgren wrote:
> I am new to Solr in general and trying to get a handle on the memory
> requirements for caching. Specifically I am looking at the filterCache
> right now. The documentation on the size setting seems to indicate that it
> is the number of values to be cached. Did I read that correctly, or is it
> really the amount of memory that will be set aside for the cache?
>
> How do you determine how much cache each fq will consume?
>
> Thank you!
>
> --
> Andrew Lundgren
> lundg...@familysearch.org
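A worked example, using a hypothetical index of 10 million documents and the default cache size of 512:

    10,000,000 docs / 8 = 1,250,000 bytes, about 1.2 MB per cached filter
    512 entries * 1.2 MB = about 610 MB for a fully populated filterCache

So the size setting is a count of entries, and the worst-case memory use scales with both that count and maxDocs.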
Re: Can files be faceted based on their size ?
Well, I wouldn't store it as a string in the first place. Otherwise, you're right, you have to store it as an entity that compares lexicographically, usually by left-padding with zeroes. But don't do that if at all possible; it's much more expensive than storing ints or longs. So can you re-index these as one of the Trie* types?

Best
Erick

On Sat, Nov 19, 2011 at 3:35 AM, neuron005 wrote:
> But sir,
> fileSize is of type string, how will it compare?
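A sketch of that reindex, using the tlong type as defined in the stock example schema (the field name fileSize comes from this thread):

    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <field name="fileSize" type="tlong" indexed="true" stored="true"/>

Once the size is a numeric field, range faceting compares numerically rather than lexicographically, e.g. 100 MB buckets up to 1 GB:

    facet=true&facet.range=fileSize&facet.range.start=0&facet.range.end=1073741824&facet.range.gap=104857600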
Re: Performance issues
Please review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Sun, Nov 20, 2011 at 5:32 AM, Tor Henning Ueland wrote:
> On Sun, Nov 20, 2011 at 11:27 AM, Lalit Kumar 4 wrote:
>>
>> A search with a couple of parameters brings back 650 counts (out of approx
>> 2500) and takes around 30 seconds.
>> The schema.xml has more than 100 fields.
>
> You have of course started with the basics, like making sure that the
> index is smaller than the available RAM on your server? (Or the index size
> per shard is smaller than the available RAM on each server, if you are
> running a multi-server cluster.)
>
> As long as your index is bigger than what can be placed in cache, you
> will have a hard time keeping your queries fast no matter what, unless
> the search queries are few enough that they are always within Solr's
> own query cache.
>
> --
> Regards
> Tor Henning Ueland
Re: how to transform a URL (newbie question)
I think you're confusing Solr with a web app. Solr itself has nothing whatsoever to do with presenting things to the user. It just returns, as you have seen, XML (or JSON, or ...) formatted replies. It's up to the application layer to do something intelligent with those.

That said, the /browse request handler that ships with the example code uses something called the VelocityResponseWriter to render pages, where the VelocityResponseWriter interacts with the templates Erik Hatcher mentioned to show you pages. So think of all the Velocity stuff as your app engine for demo purposes. Erik is directing you at that code if you want to hack the Solr example to display stuff.

Hope that helps
Erick (not Hatcher)

On Sun, Nov 20, 2011 at 2:15 PM, Bent Jensen wrote:
> Erik,
> OK, I will look at that. Basically, what I am trying to do is to index a
> document with lots of URLs. I also index the url and give it a field type.
> I don't know much about Solr yet, but thought maybe I can transform the url
> to an active link, i.e. '<a href="...">'. I tried putting the href into the
> xml document, but it just prints out as text in html. I also could not find
> any xslt transform or schema.
>
> thanks
> Ben
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Sunday, November 20, 2011 9:05 AM
> To: solr-user@lucene.apache.org
> Subject: Re: how to transform a URL (newbie question)
>
> Ben,
>
> Not quite sure how to interpret what you're asking here. Are you speaking
> of the /browse view? If so, you can tweak the templates under conf/velocity
> to make links out of things.
>
> But generally, it's the end application that would take the results from
> Solr and render links as appropriate.
>
> 	Erik
>
> On Nov 20, 2011, at 11:53 , Bent Jensen wrote:
>
>> I am a beginner to Solr and need to ask the following: using the
>> apache-solr example, how can I display a URL in the XML document as an
>> active link in HTML? Do I need to add some special transform in the
>> example.xslt file?
>>
>> thanks
>> Ben
Pagination problem with group.limit=2
Hi,

As per our business logic we have to show two products per company in our results. The second product should also be displayed as a normal search result, instead of in a "more..." (or "+", "expand") kind of nested result.

*Short description*: With the *group.limit=2* option I am not able to find the exact number of results (not groups) that will be returned in the output when increasing the start/rows parameters. Is there any solution?

*Detailed description*: We are using Solr 3.4 with the following group options:

group=true
group.field=company
group.limit=2
group.format=simple
group.ngroups=true

I have the following company-wise counts in the index against the search criteria:

company1 = 1 product
company2 = 2 products
company3 = 3 products
company4 = 4 products

Now I get a result with the following figures:

matches: 10
ngroups: 4
doclist.numFound: 10

But the actual number of results returned is different (7 results). At max there can be 4x2 (= 8, ngroups x limit) products in the result, but company1 returns just 1 result. The others return 2 results each.

Now my question is: how can I find this actual number, so that I can display correct page numbers? I searched for it, but there were very few threads regarding this exact issue, and those too were old.

Any help or pointer is appreciated.

--
Regards,
Samar
Re: wild card search and lower-casing
Thanks Erick. Do you think the patch you are working on will be applicable to 3.4 as well?

Best,
Dmitry

On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson wrote:
> As it happens I'm working on SOLR-2438, which should address this. The
> patch will provide two things:
>
> The ability to define a new analysis chain in your schema.xml, currently
> called "multiterm", that will be applied to queries of various sorts,
> including wildcard, prefix, and range. This will be somewhat of an
> "expert" thing to make yourself...
>
> In the absence of an explicit definition, it'll synthesize a multiterm
> analyzer out of the query analyzer, taking any char filters, the
> LowerCaseFilter (if present), and the ASCIIFoldingFilter (if present) and
> putting them in the multiterm analyzer along with a (hardcoded)
> WhitespaceTokenizer.
>
> As of 3.6 and 4.0 this will be the default behavior, although you can
> explicitly define a field type parameter to specify the current behavior.
>
> The reason it is on 3.6 is that I want it to bake for a while before
> getting into the wild, so I have no intention of trying to get it into
> the 3.5 release.
>
> The patch is up for review now; I'd like another set of eyeballs or two
> on it before committing.
>
> The patch that's up there now is against trunk, but I hope to have a 3x
> patch that I'll apply to the 3x code line after 3.5 RC1 is cut.
>
> Best
> Erick
>
> On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan wrote:
>>
>>> You're right:
>>>
>>> public SolrQueryParser(IndexSchema schema, String defaultField) {
>>>   ...
>>>   setLowercaseExpandedTerms(false);
>>>   ...
>>> }
>>
>> Please note that lowercaseExpandedTerms uses String.toLowerCase() (with
>> the default Locale), which is a Locale-sensitive operation.
>>
>> In Lucene, AnalyzingQueryParser exists for this purpose, but I am not
>> sure if it is ported to Solr.
>>
>> http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
Re: delta-import of rich documents like word and pdf files!
I am using Solr 3.4 and configured my DataImportHandler to get some data from MySQL as well as index some rich documents from disk. This is the part of the db-data-config file where I am indexing rich text documents:

<entity processor="TikaEntityProcessor" url="http://localhost/resumes-new/resumes${resume.dir}/${js_logins.id}/${resume.name}" dataSource="ds-file" format="text">

But after some time I get the following error in my error log. It looks like a missing-class error. Can anyone tell me which POI jar version would work with Tika 0.6? Currently I have poi-3.7.jar. The error which I am getting is this:

SEVERE: Exception while processing: js_logins document : SolrInputDocument[{id=id(1.0)={100984}, complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575}, emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat Ryali}}]: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:163)
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:161)
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
    at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
    ... 7 more
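A NoSuchMethodError on XWPFParagraph.<init> is the classic symptom of a Tika/POI version mismatch: Tika's OOXML parsers are compiled against one specific POI release, and dropping in a poi jar from a different release (here poi-3.7.jar) changes the constructor signatures Tika expects. Rather than hunting for a compatible pairing by hand, a safer check, assuming the standard Solr 3.x distribution layout, is to use exactly the POI jars that ship next to the bundled Tika:

    ls $SOLR_HOME/contrib/extraction/lib
    # the poi, poi-ooxml, poi-ooxml-schemas, poi-scratchpad and xmlbeans jars
    # found here are the versions the bundled Tika was built against; mixing
    # in others (or leaving stale copies on the classpath) triggers errors
    # like the one above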
Solr Performance/Architecture
Number of rows in the SQL table (indexed till now using Solr): 1 million
Total size of data in the table: 4 GB
Total index size: 3.5 GB
Total number of rows that I have to index: 20 million (approximately 100 GB of data) and growing

What are the best practices with respect to distributing the index? What I mean to say here is: when should I distribute, and what is the magic number that I can have for index size per instance?

For 1 million rows itself, a Solr instance running on a VM is taking roughly 2.5 hrs to index for me. So for 20 million it would take roughly 60-70 hrs. That would be too much. What would be the best distributed architecture for my case?

It will be great if people share their best practices and experience. Thanks!!
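One common pattern, sketched here with hypothetical host names (the shard count is illustrative, not a sizing recommendation): split the 20M rows across several ordinary Solr cores, index each slice in parallel so indexing time scales down with the number of shards, and fan queries out with the shards parameter:

    http://solr1:8983/solr/select?q=*:*&shards=solr1:8983/solr,solr2:8983/solr,solr3:8983/solr

Each shard then only has to hold its own slice of the index in RAM, which also keeps the per-instance index size within what the OS can cache.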