Re: Facet count problem
Hi Ranveer, The error in the facet counts is caused by the tokenized field that you are using. If you want to facet on the whole string, use a fieldType that doesn't split the field into tokens, such as the string field. Regards, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/4/19 Ranveer Kumar > Hi Erick, > > My schema configuration is following. > > > > > > > > > >ignoreCase="true" >words="stopwords.txt" >enablePositionIncrements="true" >/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > > > > > ignoreCase="true" expand="true"/> >ignoreCase="true" >words="stopwords.txt" >enablePositionIncrements="true" >/> > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > > > > > > > > > > On Mon, Apr 19, 2010 at 6:22 AM, Erick Erickson >wrote: > > > Can we see the actual field definitions from your schema file. > > Ahmet's question is vital and is best answered if you'll > > copy/paste the relevant configuration entries But based > > on what you *have* posted, I'd guess you're trying to > > facet on tokenized fields, which is not recommended. > > > > You might take a look at: > > http://wiki.apache.org/solr/UsingMailingLists, it'll help you > > frame your questions in a manner that gets you your > > answers as fast as possible. > > > > Best > > Erick > > > > On Sun, Apr 18, 2010 at 12:59 PM, Ranveer Kumar > >wrote: > > > > > I am using text for type, which is static. For example: type is a field > > and > > > I am using type for categorization. For news type I am using news and > for > > > blog using blog.. type is a text field.
> > > > > > On Apr 17, 2010 8:38 PM, "Ahmet Arslan" wrote: > > > > > > > I am facing problem to get facet result count. I must be > wrong > > > somewhere. > I am getting proper ... > > > Are you faceting on a tokenized field? What is the fieldType of your > > field? > > > > > >
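To illustrate the point about faceting on a tokenized field, here is a toy Python sketch. It does not talk to Solr. The documents, the crude lowercase-and-split "analyzer" and the helper name facet_counts are all made up for the illustration, and the real analyzer chain in the schema above (stop words, word delimiter, stemming) would change the terms further.

```python
from collections import Counter

# Hypothetical documents; only a "type" field like the one in the thread is modelled.
docs = [
    {"type": "news"},
    {"type": "blog"},
    {"type": "Tech News"},
    {"type": "Tech News"},
]


def facet_counts(docs, field, tokenized):
    """Count facet values as a tokenized or an untokenized field would expose them."""
    counts = Counter()
    for doc in docs:
        value = doc[field]
        if tokenized:
            # Crude stand-in for an analyzer: lowercase and split on whitespace.
            # Each distinct term is counted once per document.
            terms = set(value.lower().split())
        else:
            # A string-like field keeps the whole value as a single term.
            terms = {value}
        counts.update(terms)
    return counts


tokenized_counts = facet_counts(docs, "type", tokenized=True)
whole_value_counts = facet_counts(docs, "type", tokenized=False)
print("tokenized:", dict(tokenized_counts))
print("whole value:", dict(whole_value_counts))
```

With the tokenized variant the facet buckets are individual terms, so a value such as "Tech News" is counted under both "tech" and "news", which is why the counts look wrong.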
Help using boolean operators
Hello, I am confused about the proper usage of the Boolean operators AND, OR and NOT. Could somebody please provide an easy-to-understand explanation? Thanks, Sandhya
Re: LucidWorks Solr
Andy, I think it is important to know what a stemmer really is. It reduces words to their infinitives. Those infinitives are not always the real infinitive, but for the system they act as one, since all derived forms can be reduced to the same form. That's a stemmer. Because of this, there can't be one stemmer for every language: every language has its own rules for reducing a word to its infinitive. If you apply an English stemmer to a German document, the results might be unexpected. However, sometimes it still works well enough. Keep in mind that this is an algorithm. It is not important whether the created infinitive is the real infinitive. It is only important that most of the derived forms can be reduced to the same basic form. Please ask if something is not clear. KStem: The wiki[1] says that KStem is less aggressive than the standard stemmer. I guess this means that there are more rules for how to reduce a word to its infinitive, and so the results might be better. [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem Kind regards - Mitch -- View this message in context: http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help using boolean operators
Hello Sandhya, title: star AND wars NOT sdi This query will match every document where "star" *and* "wars" occur but *not* the term "sdi" (SDI => Strategic Defense Initiative; the media often used the term "star wars" to describe the project). title: star OR wars This query will match every document where "star" *or* "wars" occurs. If your default operator (defined in your schema.xml) is OR, you don't need to add the "OR" operator to your query. Default operator OR: title: star wars is the same as title: star OR wars. Default operator AND: title: star wars is the same as title: star AND wars. Default operator AND: title: star wars NOT sdi is the same as title: star AND wars NOT sdi. Hope this helps. Kind regards - Mitch -- View this message in context: http://n3.nabble.com/Help-using-boolean-operators-tp729102p729135.html
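The three operators can be pictured as set operations on the documents that contain each term. The following is only a toy Python sketch of that idea. The titles and the helper name having are invented for the illustration, and it ignores scoring and analysis entirely.

```python
# Hypothetical titles, already lowercased and split into terms.
titles = {
    1: {"star", "wars"},
    2: {"star", "wars", "sdi"},
    3: {"star", "trek"},
    4: {"clone", "wars"},
}


def having(term):
    """Ids of all documents whose title contains the term."""
    return {doc_id for doc_id, terms in titles.items() if term in terms}


# star AND wars NOT sdi: intersection, then subtract the excluded documents.
star_and_wars_not_sdi = (having("star") & having("wars")) - having("sdi")

# star OR wars: union of both document sets.
star_or_wars = having("star") | having("wars")

print(sorted(star_and_wars_not_sdi))
print(sorted(star_or_wars))
```

AND narrows the result (intersection), OR widens it (union), and NOT removes documents from whatever has been matched so far.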
Re: Query regarding "copyField"
Hello Sandhya, please, show us your schema.xml, so that we can have a look whether something might be wrong there. However, if the source of a copyField is "description" and the destination is "description_stemmed", you can query both: description and description_stemmed. There will be no error. - Mitch -- View this message in context: http://n3.nabble.com/Query-regarding-copyField-tp728961p729140.html
Stemming - disable at query time - reg.
Hi, I have the following filter for a field named "myText" This enables stemming, I guess. My questions are: 1) Can I disable stemming for the same field at the query time? 2) Do I need to copyField the "myText" to "nonStemText", wherein "nonStemText" is not configured with the PorterFilterFactory. regards, Naga
Re: Stemming - disable at query time - reg.
Hello! If you want both a non-stemmed and a stemmed field, you should use copyField. Even if it were possible to disable the snowball filter at query time, you would still have stemmed tokens written in the index. > Hi, > I have the following filter for a field named "myText" > protected="protwords.txt"/> > This enables stemming, I guess. > My questions are: > 1) Can I disable stemming for the same field at the query time? > 2) Do I need to copyField the "myText" to "nonStemText", wherein > "nonStemText" is not configured with the PorterFilterFactory. > regards, > Naga -- Regards, Rafał Kuć
Re: Stemming - disable at query time - reg.
Naga, 1) Yes, it is possible. ... define those filters which you want to apply at query-time 2) I am not sure whether I understand your question correctly: You do not need to copyField your myText field if it is okay for you that the indexed data of the myText field is stemmed and the query is not. For example: if the original data consists of the sentence "I am working", then it (maybe) looks like "I am work" after it is stemmed. If you query against this with the term "working", there will be no match unless you stem your query string, too. Hope this helps. - Mitch -- View this message in context: http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p729171.html
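The "I am working" example can be replayed with a toy Python sketch. The function toy_stem is a deliberately crude, made-up stand-in for a real stemmer such as Porter or Snowball, and analyze and matches are hypothetical helpers, so treat this purely as an illustration of why index-time and query-time analysis have to agree.

```python
def toy_stem(word):
    """Very rough stand-in for a real stemmer: strips "ing" or "ed" from longer words."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def analyze(text, stem):
    """Lowercase, split on whitespace and optionally stem every token."""
    tokens = text.lower().split()
    return [toy_stem(token) if stem else token for token in tokens]


# Index time: stemming is applied, so the index holds "work", not "working".
indexed_terms = set(analyze("I am working", stem=True))


def matches(query, stem_query):
    """True if every analyzed query term exists in the toy index."""
    return all(term in indexed_terms for term in analyze(query, stem=stem_query))


print(sorted(indexed_terms))
print("unstemmed query 'working':", matches("working", stem_query=False))
print("stemmed query 'working':", matches("working", stem_query=True))
print("unstemmed query 'work':", matches("work", stem_query=False))
```

The unstemmed query term "working" is looked up literally and is not in the index, while "work" is, which is the behaviour described above.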
Re: Stemming - disable at query time - reg.
Hello! MitchK posted the right solution, my post can be confusing ;( Sorry, for that. -- Regards, Rafał Kuć
RE: Stemming - disable at query time - reg.
Thank you Mitch! I will try that. regards, Naga
RE: Help using boolean operators
Thank You Mitch. I have a query mentioned below : (my defaultOperator is set to "AND") (field1 : This is a good string AND field2 : This is a good string AND field3 : This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR field4 : HTMLDocument) AND field5 : doc) This is not giving me the desired results. I want all documents with field1 = ' This is a good string' and field2 = 'This is a good string' and field3 = ' This is a good string' and (field4 = 'ASCIIDocument' or ' BinaryDocument' or ' HTMLDocument') and field5 = 'doc' to be returned. I am not sure why this is not giving me the desired results. Thanks, Sandhya
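One possible reason for the unexpected results, not confirmed anywhere in this thread: in the Lucene query syntax a field prefix binds only to the term or quoted phrase directly after it, so in field1:This is a good string only "This" is searched in field1 and the remaining words go to the default field. A minimal Python sketch that builds the query with quoted phrases follows. The helper names phrase_clause and any_of are made up, and whether a phrase matches still depends on how each field is analyzed.

```python
def phrase_clause(field, text):
    """Build field:"text" so that the whole phrase is bound to the field."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def any_of(field, values):
    """Build (field:a OR field:b ...) for single-term values."""
    return "(" + " OR ".join(f"{field}:{value}" for value in values) + ")"


PHRASE = "This is a good string"

clauses = [phrase_clause(name, PHRASE) for name in ("field1", "field2", "field3")]
clauses.append(any_of("field4", ["ASCIIDocument", "BinaryDocument", "HTMLDocument"]))
clauses.append("field5:doc")

# Explicit AND keeps the query independent of the configured default operator.
query = " AND ".join(clauses)
print(query)
```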
RE: Stemming - disable at query time - reg.
Hi Mitch, I have defined my field like: I have indexed two documents with "working" and "worked" values and when I search for "working" it is not giving me any results, whereas when I search for "work" it is giving me two results. What should I be doing to get the query results for "working". regards, Naga
RE: Help using boolean operators
Also, one of the fields here, *field3* is a dynamic field. All the other fields except this field, are copied into "text" with copyField. Thanks, Sandhya
Wildcard search in phrase query using spanquery
I need to perform a wildcard search in a phrase query. I have 2 documents containing the text "how do impair" and "how to improve". I want to be able to find both documents by searching for (how to im*). Lucene allows this by using SpanWildcardQuery and keeping the span length at 0. http://mail-archives.apache.org/mod_mbox//lucene-java-user/200707.mbox/%3c469df09f.9030...@gmail.com%3e I tried proximity search in Solr, but it didn't work with wildcards. Is there any other way to perform a wildcard search in a phrase query? Any suggestions? Maddy. -- View this message in context: http://n3.nabble.com/Wildcard-search-in-phrase-query-using-spanquery-tp729275p729275.html
Query 2 Cores
Hey All, I have 2 cores which have been used with Tika to index files. I would like to run one query against both at once, as I will be searching the attr_content field. If I test each core separately I get 1 and 17 results, but with shards I get just 17 results. Here is my example query: http://localhost:8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test Is this the correct way to query 2 cores at once? Hope you can help. Lee
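A possible explanation for the 17 results, not confirmed in this thread: a distributed request searches only the cores listed in the shards parameter, so with shards=localhost:8983/solr/core2 the core that receives the request (core1) is not searched itself. Under that assumption, a small Python sketch that builds a request URL listing both cores could look like this. The host, core names and query are taken from the message above, and nothing is sent over the network.

```python
from urllib.parse import urlencode

host = "localhost:8983"
cores = ["core1", "core2"]

# List every core that should be searched, including the one receiving the request.
shards = ",".join(f"{host}/solr/{core}" for core in cores)

# safe=":/," keeps the shard list readable instead of percent-encoding it.
params = urlencode({"shards": shards, "q": "attr_content:test"}, safe=":/,")
url = f"http://{host}/solr/core1/select?{params}"
print(url)
```

Distributed search also expects document ids to be unique across the listed cores, so that is worth checking if the combined count still looks off.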
Re: Stemming - disable at query time - reg.
Hi Naga, I think you should add the same filter to the query configuration: ** That way stemming is applied to the query, so it would search for "work" instead of "working" and therefore you should be able to retrieve both "worked" and "working". You can see the different transformations that the analyzers apply at query and index time in the "analysis" link inside the Solr admin page, so you can check why a given query doesn't match some text. In this case I think you should get: Index: Working -> Work (Applies stemming) Query: Working -> Working (Doesn't apply stemming) So "working" won't match "work" Regards 2010/4/19 Naga Darbha > Hi Mitch, > > I have defined my field like: > > positionIncrementGap="100"> > > >ignoreCase="true" >words="stopwords.txt" >enablePositionIncrements="true" >/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > ignoreCase="true" expand="true"/> >ignoreCase="true" >words="stopwords.txt" >enablePositionIncrements="true" >/> > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > > > > I have indexed two documents with "working" and "worked" values and when I > search for "working" it is not giving me any results, whereas when I search > for "work" it is giving me two results. > > What should I be doing to get the query results for "working". > > regards, > Naga > > -Original Message- > From: Naga Darbha [mailto:ndar...@opentext.com] > Sent: Monday, April 19, 2010 2:45 PM > To: solr-user@lucene.apache.org > Subject: RE: Stemming - disable at query time - reg. > > Thank you Mitch! I will try that. > > regards, > Naga > > > > -Original Message- > From: MitchK [mailto:mitc...@web.de] > Sent: Monday, April 19, 2010 2:35 PM > To: solr-user@lucene.apache.org > Subject: Re: Stemming - disable at query time - reg. > > > Naga, > > 1) Yes, it is possible.
> > > > language="English" protected="protwords.txt"/> > > > > ... define those filters which you want to apply at query-time > > > > 2) I am not sure whether I understand your question right: > You do not need to copyField your myText-field, if it is okay for you that > the indexed data of the myText-field is stemmed and the query is not. > For example: if the original data consists of the sentence "I am working" > than it (maybe) looks like this after it is stemmed "I am work". If you > query against this with the term "working" there will be no match, if you > don't stem your querystring, too. > > Hope this helps. > > - Mitch > -- > View this message in context: > http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p729171.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Alejandro Marqués Rodríguez Paradigma Tecnológico http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42
Re: Solr throws TikaException while parsing sample PDF
Hi Grant, I tried the command-line tool of Tika 0.7 (the newest version), and it parsed the file. I believe Solr 1.4 contains version 0.4 of Tika. Do you suggest upgrading to the new Tika? Can I upgrade only Tika in Solr 1.4, or do I need to wait until Solr ships with the new Tika? Thanks. On Sun, Apr 18, 2010 at 11:24 PM, Grant Ingersoll wrote: > Can you extract content from this using Tika's standalone command line > tool? PDF's are notorious for problems in extracting. To me, it looks like > a bug in PDFBox. I would try to isolate it down to there and then send, if > possible, the sample document to PDFBox and see if they can come up w/ a > fix. > > -Grant > > On Apr 18, 2010, at 1:12 PM, pk wrote: > > > > > Hi, > > while posting a sample pdf (that comes with Solr dist'n) to solr, i'm > > getting a TikaException. > > Using Solr-1.4, SolrJ (StreamingUpdateSolrServer) for posting pdf to > solr. > > Other sample pdfs can be parsed and indexed successfully.. I;m getting > same > > error with some other pdfs also (but adobe reader can open them fine, so > i > > dont think they have an issue in formatting or are corrupt etc)... Here > is > > the trace... 
> > > > > > found uploaded file : C:\solr_1.4.0\docs\Installing Solr in Tomcat.pdf :: > > size=286242 > > Apr 18, 2010 10:31:34 PM > org.apache.solr.update.processor.LogUpdateProcessor > > finish > > INFO: {} 0 640 > > Apr 18, 2010 10:31:34 PM org.apache.solr.common.SolrException log > > SEVERE: org.apache.solr.common.SolrException: > > org.apache.tika.exception.TikaException: Una > > ble to extract PDF content > >at > > > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu > > mentLoader.java:211) > >at > > > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStrea > > mHandlerBase.java:54) > >at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.jav > > a:131) > >at > > > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(Re > > questHandlers.java:233) > >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > >at > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > > > >at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241 > > ) > >at > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFil > > terChain.java:215) > >at > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain > > .java:188) > >at > > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java: > > 213) > >at > > > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java: > > 172) > >at > > > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) > >at > > > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) > >at > > > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:10 > > 8) > >at > > > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) > >at > > > 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873) > >at > > > org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConn > > ection(Http11BaseProtocol.java:665) > >at > > > org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:5 > > 28) > >at > > > org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorke > > rThread.java:81) > >at > > > org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:6 > > 89) > >at java.lang.Thread.run(Thread.java:595) > > Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF > > content > >at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:58) > >at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51) > >at > > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119) > >at > > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105) > >at > > > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu > > mentLoader.java:190) > >... 20 more > > Caused by: java.util.zip.ZipException: incorrect header check > >at > > java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140) > >at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97) > >at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290) > >at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235) > >at > org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170) > >at org.pdfbox.pdfparser.PDFStreamParser.(PDFStreamParser.java:101) > >at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132) > >a
Re: Solr throws TikaException while parsing sample PDF
Praveen Agrawal wrote: Hi Grant, I tried command line of Tika v-0.7(newest), and it parsed the file.. I believe Solr1.4 contains 0.4 version of Tika. Do you suggest to upgrade to new Tika? Can i upgrade only tika in Solr-1.4? or i need to wait till Solr ships with new Tika? Thanks. Solr trunk uses Tika 0.7. I'm not a SolrCell user, so this is just an FYI. Koji -- http://www.rondhuit.com/en/
Howto build a function query using the 'query' function
I want to build a function expression for the 'bf' field of a dismax request handler, to boost a document if it is referenced by other documents, i.e. the more often a document is referenced, the higher the boost. Something like linear(query(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 1), 0.01, 1) Intended to mean: if count is 0, then the boost is 0*0.01+1 = 1; if count is 1, then the boost is 1*0.01+1 = 1.01; if count is 100, then the boost is 100*0.01+1 = 2. However, the query function (http://wiki.apache.org/solr/FunctionQuery#query) seems to only be able to return the score of the query results, not the count of results. How can I do this? Thanks, Gert.
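The intended boost arithmetic can be checked with a few lines of Python. This only replays the m*x + c calculation from the message. It does not show how to obtain the reference count inside Solr, which is the open question here.

```python
import math


def linear(x, m, c):
    """Same arithmetic as the linear(x, m, c) expression in the message: m * x + c."""
    return m * x + c


# Slope 0.01 and constant 1, as in the message; counts are the three examples given.
boosts = {count: linear(count, 0.01, 1) for count in (0, 1, 100)}

for count, boost in boosts.items():
    print(f"count={count} boost={boost:.2f}")

# Floating-point results are compared with a tolerance rather than with ==.
assert math.isclose(boosts[100], 2.0)
```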
OutOfMemoryError when using query with sort
Hi, I am using Solr running on Windows Server 2008 32-bit. I added about 100 million articles to Solr without setting the stored attribute (only the document id is stored); the index size is about 164 GB. When I run a query without sort, it returns doc ids in a few ms, but when I add a sort parameter, I get the error below: HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:560) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208) at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:525) at org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:391) at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245) at org.apache.lucene.search.Searcher.search(Searcher.java:171) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988) at Note: I set the max heap size to 1600 MB (the Tomcat service does not start with a larger heap size), but the problem is not solved. I checked the heap dump file with MAT and see this info: org.apache.lucene.index.ReadOnlySegmentReader @ 0x253508e8 Shallow Size: 80 B Retained Size: 449,4 MB Problem Suspect 1 One instance of "org.apache.lucene.index.ReadOnlySegmentReader" loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x25350c80" occupies 471.244.848 (97,44%) bytes. The memory is accumulated in one instance of "org.apache.lucene.index.TermInfosReader" loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x25350c80". Keywords org.apache.lucene.index.ReadOnlySegmentReader org.apache.catalina.loader.WebappClassLoader @ 0x25350c80 org.apache.lucene.index.TermInfosReader How can I decrease the segment file size to solve this problem? Thanks in advance, Hamid
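A rough back-of-the-envelope estimate may help to read the stack trace: FieldCacheImpl$LongCache.createValue is where Lucene builds the per-document array of long values it needs for sorting on a long field. The Python sketch below assumes one 8-byte value per document and uses the document count, heap size and retained size quoted in the message. Actual usage depends on the segment layout and on everything else living in the heap.

```python
BYTES_PER_LONG = 8  # assumption: one 8-byte long per document in the sort cache


def sort_cache_bytes(num_docs, bytes_per_value=BYTES_PER_LONG):
    """Approximate size of a per-document array of sort values."""
    return num_docs * bytes_per_value


num_docs = 100_000_000             # "about 100 million" articles, from the message
heap_bytes = 1600 * 1024 * 1024    # max heap of 1600 MB, from the message
term_index_bytes = 471_244_848     # retained by the segment reader, from the heap dump

needed = sort_cache_bytes(num_docs)
fraction_of_heap = needed / heap_bytes
combined_fraction = (needed + term_index_bytes) / heap_bytes

print(f"sort cache: {needed / 1024 ** 2:.0f} MiB")
print(f"share of heap for the sort cache alone: {fraction_of_heap:.0%}")
print(f"share of heap including the term index: {combined_fraction:.0%}")
```

Under these assumptions the sort values alone need roughly half of the 1600 MB heap, before the term index and any other caches are counted, which is consistent with the query working without sort and failing with it.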
Re: LucidWorks Solr
Regarding stemmers, I ditched them altogether a long time ago in favor of a dictionary of morphologies of all known words (for any given language). A simple lookup of any word morphology thus produces the set, including the correct stem. Works great. 100% of the time. Just a tip from me.
Ampersand in searchstring. how to replace ?
Hello, I didn't find anything about my problem. How can I replace an ampersand at index time? My autosuggest words contain ampersands. How can I replace this sign (&)? With PatternReplaceCharFilterFactory? How is this factory used? Or with RegexTransformer? Thanks for your help ;) -- View this message in context: http://n3.nabble.com/Ampersand-in-searchstring-how-to-replace-tp729475p729475.html
Re: LucidWorks Solr
Thanks for the explanation Mitch. You're right. There can't be universal stemmers. What about multi-language stemmers? I'm mostly interested in English, Spanish, German, French, Italian. Are there any stemmers that would handle those languages? If not, what's the recommended way to deal with documents in multiple languages?
Re: LucidWorks Solr
Thanks for the tip. Is there a publicly available dictionary of morphologies that I could use? Or did you build your own?
Fwd: [Dbworld] Survey on Web Geo-Spatial Open-Source Technologies
Maybe of interest to those doing geo-search in Solr? paul Begin forwarded message: From: "Gavin McArdle" Date: 19 April 2010 14:46:05 GMT+02:00 To: dbwo...@cs.wisc.edu Subject: [Dbworld] Survey on Web Geo-Spatial Open-Source Technologies Reply-To: dbworld_ow...@yahoo.com [Apologies for cross-posting] Hi everybody, I am part of the Spatial Information Systems Group in University College Dublin. We are conducting a survey on Open-Source technologies with particular focus on Geo-Spatial projects. Our goal is to collect first-hand knowledge about a number of Open-Source projects active on the Internet. With this work we hope to identify strong and weak points of each project in order to give some guidelines for future directions to the Open-Source community and potential developers in relation to Geo-Spatial research. Therefore we would like to ask you to take an anonymous questionnaire on these technologies. The questionnaire consists of a few simple questions about your experience with the software in terms of usability, stability, interoperability and so on. Estimated completion time: about 1 minute Link to the questionnaire: http://bit.ly/geospatial-opensource-survey Projects included in this survey: GeoServer, MapServer, PostGIS, MySQL, Hibernate Spatial, Ruby on Rails, Grails, Proj.4, GeoTools, Java Topology Suite, OpenLayers, JsExt, Prototype, MooTools Feel free to contact us at andrea.ballatore [at] ucd.ie if you have any questions, comments or recommendations about this survey. Thank you for your attention, Spatial Information Systems Group, School of Computer Science and Informatics, University College Dublin ___ Please do not post msgs that are not relevant to the database community at large. Go to www.cs.wisc.edu/dbworld for guidelines and posting forms. To unsubscribe, go to https://lists.cs.wisc.edu/mailman/listinfo/dbworld
[ANN] Carrot2 3.3.0 released
Dear All, We're pleased to announce the 3.3.0 release of Carrot2, which significantly improves the scalability of the clustering algorithms (up to 7 times faster clustering in the case of the STC algorithm) and fixes a number of minor issues. Release notes: http://project.carrot2.org/release-3.3.0-notes.html Download: http://download.carrot2.org JIRA issues: http://issues.carrot2.org/secure/IssueNavigator.jspa?jqlQuery=project+%3D+CARROT+AND+fixVersion+%3D+%223.3.0%22+ORDER+BY+priority+DESC%2C+key+DESC Similar improvements are available in Lingo3G, the real-time document clustering engine from Carrot Search. Thanks! Dawid Weiss, Stanislaw Osinski Carrot Search, i...@carrot-search.com
is solr ignored my filters ?
Hey, sorry for this ... stupid question ;) When I perform an import of my data I use some filters. How can I really be sure that Solr used my configured filters and analyzers? When I search in Solr, the results look 100% like they did before the import. Thanks =) -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729646.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is solr ignored my filters ?
Analyzers/Tokenizers/TokenFilters operate on the text that gets indexed. Stored text remains exactly as you sent it in. Erik On Apr 19, 2010, at 9:53 AM, stockii wrote: hey. sry for this ... stupid question ;) when i perform an import from my data is use some filters. how can i really be sure that solr used my configured filters and analyzer ? when i search in solr the result looks 100% like bevor an import. th =) -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729646.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is solr ignored my filters ?
Hi, could you provide at least some information? Usually you can be 100% sure that Solr uses the configuration it is provided with. Cheers, Sven --On Monday, 19 April 2010 05:53 -0800 stockii wrote: hey. sry for this ... stupid question ;) when i perform an import from my data is use some filters. how can i really be sure that solr used my configured filters and analyzer ? when i search in solr the result looks 100% like bevor an import. th =) -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729646.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: LucidWorks Solr
There have been some open source ones. I don't have the links handy at this moment[1]. But I parsed through the electronic dictionary and generated a database of each word and its morphologies. I got tired of lame stemmers that were wrong half the time. Computers are fast enough to do lookups on 150,000 words nowadays, so there's no need for fuzzy algorithms here, IMO. Good luck! [1] Google will turn up some, I think. > Thanks for the tip. > Are there any publicly available dictionary of morphologies that I could use? Or did you build your own one? > --- On Mon, 4/19/10, Darren Govoni wrote: >> From: Darren Govoni >> Subject: Re: LucidWorks Solr >> To: solr-user@lucene.apache.org >> Date: Monday, April 19, 2010, 7:39 AM >> Regarding stemmers, I ditched them altogether a long time ago in favor of a dictionary of morphologies of all known words (for any given language). A simple lookup of any word morphology thus produces the set, including the correct stem. >> Works great. 100% of the time. >> Just a tip from me. >> On Mon, 2010-04-19 at 00:36 -0800, MitchK wrote: >> > Andy, I think it is important to know what a stemmer really is. >> > It reduces words to their infinitves. Those infinitives do not refer to the real infinitive everytime, but however: for the system, it is an infinitive, since all its derivates could be reduced to the same form. Thats a stemmer. >> > According to this, there can't exist a stemmer for every language, because every language has got its own rules of how to reduce a word to its infinitive. >> > If you apply a stemmer for english language on a german document, the results might be unexpected. However, sometimes it still works good enough. >> > Keep in mind that this is an algorithm. It is not important whether the created infinitive is the real infinitive. It is only important that most of the derivate forms can be reduced to the same basic form. Please ask, if something is not clear. >> > KStem: The wiki[1] says that KStem is less aggressive as the standard stemmer. I guess that this means that there are more rules for how to reduce a word to its infinitive and according to this the results might be better. >> > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem >> > Kind regards >> > - Mitch
Re: is solr ignored my filters ?
Okay, as an example: I want to check whether WordDelimiterFilterFactory works correctly, and I want to experiment with substring search using EdgeNGram... I have a problem with this string: "Kamera-Wasserwaage". I think Solr should filter it like this: Kamera-Wasserwaage -> Kamera -> Wasserwaage. But I want Solr to split Wasserwaage into -> Wasser -> Waage and wasserwaage. That only works with WasserWaage. Grml... So I want to see how it is indexed. -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729699.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is solr ignored my filters ?
On 19.04.2010 16:09, stockii wrote: > so i want to see how it is indexed. Go to the admin panel, open the schema browser, and set the number of shown tokens to 1 or something. -Michael
Re: Help using boolean operators
If you're submitting this: field1 : This is a good string then you're searching in "field1" ONLY for "This". the tokens "is", "a" "good" and "string" are being searched against your default search field as defined in your schema. Have you tried parenthesizing? Try the SOLR admin page for looking at how a query is parsed and/or attach &debugQuery=on to your http request to see how the query actually works HTH Erick On Mon, Apr 19, 2010 at 5:47 AM, Sandhya Agarwal wrote: > Also, one of the fields here, *field3* is a dynamic field. All the other > fields except this field, are copied into "text" with copyField. > > Thanks, > Sandhya > > -Original Message- > From: Sandhya Agarwal [mailto:sagar...@opentext.com] > Sent: Monday, April 19, 2010 2:55 PM > To: solr-user@lucene.apache.org > Subject: RE: Help using boolean operators > > Thank You Mitch. > > I have a query mentioned below : (my defaultOperator is set to "AND") > > (field1 : This is a good string AND field2 : This is a good string AND > field3 : This is a good string AND (field4 : ASCIIDocument OR field4 : > BinaryDocument OR field4 : HTMLDocument) AND field5 : doc) > > This is not giving me the desired results. > > I want all documents with field1 = ' This is a good string' and field2 = > 'This is a good string' and field3 = ' This is a good string' and (field4 = > 'ASCIIDocument' or ' BinaryDocument' or ' HTMLDocument') and field5 = 'doc' > to be returned. > > I am not sure why this is not giving me the desired results. > > Thanks, > Sandhya > > -Original Message- > From: MitchK [mailto:mitc...@web.de] > Sent: Monday, April 19, 2010 2:19 PM > To: solr-user@lucene.apache.org > Subject: Re: Help using boolean operators > > > Hello Sandhya, > > title: star AND wars NOT sdi > This query will match every document where "star" *and* "wars" occur but > *not* the term "sdi" (SDI => Strategic Defense Initiative => in the media > there was often the term star wars used to describe the project). 
> > title: star OR wars > This query will match every document where "star" *or* "wars" occur. > > If your standard operator (defined in your schema.xml) is the OR, you don't > need to add the "OR" operator to your query. > > Standard-operator: OR > title: star wars > This is the same as title: star OR wars > > standard-operator: AND > title: star wars > - > the same as title: star AND wars > > standard-operator: AND > title: star wars NOT sdi > is the same as: title: star AND wars NOT sdi > > Hope this helps. > > Kind regards > - Mitch > -- > View this message in context: > http://n3.nabble.com/Help-using-boolean-operators-tp729102p729135.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: is solr ignored my filters ?
Oh, yes, thanks, but we have 800,000 items ... how do I find the right one this way? XD -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729749.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is solr ignored my filters ?
On 19.04.2010 16:29, stockii wrote: > oha, yes thx but > we have 800 000 items ... to find the right in this way ? XD Then use the TermsComponent: http://wiki.apache.org/solr/TermsComponent -Michael
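A TermsComponent request for checking what actually got indexed could be built roughly like this. It is only a sketch: the /terms handler path, the host and port, and the field name "text" are assumptions about the local setup, not something stated in the thread; terms.fl, terms.prefix and terms.limit are the component's request parameters.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix, limit=20):
    """Build a TermsComponent request listing indexed terms with a prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.prefix": prefix,   # only terms starting with this prefix
        "terms.limit": limit,     # cap on the number of terms returned
    }
    return base_url + "?" + urlencode(params)


# Assumed handler URL and field name; adjust both to the actual schema.
url = terms_url("http://localhost:8983/solr/terms", "text", "wasser")
print(url)
```

With 800 000 items, restricting by prefix like this avoids paging through the whole term list in the schema browser.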
Re: Ampersand in searchstring. how to replace ?
> I didnt find any about my problem... > > how can i replace an ampersand in indextime ? > > my autosuggest words are haveing ampersands. how can i > replace this sign (&) > ??? > Easiest way is to use MappingCharFilterFactory before your tokenizer. mapping.txt will be placed under solrhome/conf directory and contain this line : "&" => " "
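What a char filter with that mapping rule does can be pictured with a small Python model. This is not Solr code and only approximates MappingCharFilterFactory; the function names and the sample string are made up for illustration. The point is that the replacement happens on the raw text, before the tokenizer sees it.

```python
def apply_char_mapping(text, mapping):
    """Replace each mapped source string before the text is tokenized."""
    # Longest sources first, so longer rules win over shorter overlapping ones.
    for source in sorted(mapping, key=len, reverse=True):
        text = text.replace(source, mapping[source])
    return text


def whitespace_tokenize(text):
    """Stand-in for a whitespace tokenizer."""
    return text.split()


mapping = {"&": " "}  # same rule as the mapping.txt line above
tokens = whitespace_tokenize(apply_char_mapping("Black&Decker R&D", mapping))
print(tokens)
```

Because the ampersand becomes a space, a whitespace-based tokenizer then splits at that position instead of carrying the sign into the suggestions.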
Re: Wildcard search in phrase query using spanquery
> I need to perform wildcard search in phrase query. I have 2 > documents > containing text "how do impair" and "how to improve". I > want to be able to > search both documents by searching (how to im*). There is a > provision in > lucene which allows me to perform this operation using > SpanWildcardQuery and > keeping span length to 0. > > http://mail-archives.apache.org/mod_mbox//lucene-java-user/200707.mbox/%3c469df09f.9030...@gmail.com%3e > > > I tried proximity search in solr but it didn't work with > wildcard. Is there > any other provision to perform wildcard search in phrase > query? With https://issues.apache.org/jira/browse/SOLR-1604 you can use * operator inside phrases, e.g. "how to im*"
best practice handling html content
Hello, we want to index and search our intranet documents. The field "body" contains HTML tags. In our schema.xml we have a fieldType text_de (see the end of this mail) which uses the charFilter solr.HTMLStripCharFilterFactory at index time, so this is no problem: the text is put into the index without any HTML. I can search over this field; HTML entities like &auml; for a German umlaut (ä) work and are filtered out correctly, German language support works, etc. Now comes the problem. The field body is defined with stored="true", so we index it and also store the content. On the result page, when we print body or the highlighting on body, all the HTML tags are back. That sounds correct, as the HTML filter only works on indexing... So my question is: what is the best way to handle this case? Strip out all HTML before adding the document to the index? Or let Solr do the HTML filtering and then do some additional filtering in the GUI frontend when printing the search result? Or have I misunderstood something? Thank you, markus schema.xml
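One of the options raised here, stripping the HTML on the client before the document is sent to Solr, could look roughly like the following. It is a minimal sketch using only the Python standard library; the class and function names and the sample markup are illustrative, and it makes no attempt to skip script or style content or to insert spaces between adjacent block elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, dropping tags and decoding entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


body = "<p>Gr&uuml;&szlig;e aus <b>K&ouml;ln</b></p>"
print(strip_html(body))
```

The stored value and the highlighter output would then both be plain text, since stored content is returned exactly as it was submitted.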
Caching of search results, caching proxy
I'm setting up my Solr index to be updated every x minutes. Does Solr cache the result of a search, and then when next time the same search is requested, it'd recognize that the Index has not changed and therefore just return the previous result from cache without processing the search again? If Solr doesn't do that, can Tomcat or Jetty be configured to cache a dynamically generated result for x minutes and serve that from cache until it expires? Or I'd need to use a caching reverse proxy like Squid or Varnish to do that? Please share your experience - do you actually set up some caching system like this?
Re: Caching of search results, caching proxy
> I'm setting up my Solr index to be > updated every x minutes. > > Does Solr cache the result of a search, and then when next > time the same search is requested, it'd recognize that the > Index has not changed and therefore just return the previous > result from cache without processing the search again? Yes. http://wiki.apache.org/solr/SolrCaching Also http://wiki.apache.org/solr/SolrAndHTTPCaches
Re: Stemming - disable at query time - reg.
In addition to Alejandro's posting, I would say that you don't need to specify separate analyzers for index time and query time, since it *seems* (maybe I am wrong) that you want the same functionality at index and query time. Hope this helps - Mitch -- View this message in context: http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p730019.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: best practice handling html content
> we want to index and search in our intranet documents. > the field "body" contains html-tags. > > in our schema.xml we have a fieldType text_de (see at the > end of this mail) which uses charFilter > solr.HTMLStripCharFilterFactory with index. > so this is no problem. the text is put into the index > without any html. i can do search over this field, also html > entities like ä for a german umlaut (ä) do work, > are filtered out correct, support for german > language etc. > > so now comes the problem. the field body is defined like > > stored="true" /> > > so we do index it and also store the content. on the result > page when we are printing body or the highlighing on body we > have all the html tags back. sounds correct, as the > HTML-Filter only works on the indexing... > > so my question is, how is the best way to handle this case? > strip out all html before adding the document to the index. I think this is the best way to do it if you want to display html-stripped content. By doing so you will save disk space too. Similar discussion: http://search-lucene.com/m/hyKqg1MJEDL
Big problem with solr in an official server.
Hi everybody: I have a big problem with the amount of memory Solr is using on a server. I start Solr with the "java -jar start.jar" command on an Ubuntu server; the start.jar process is using 7 GB of memory and it is considerably affecting the performance of the server. I would like to know how to configure it to use a limited amount of memory while keeping good performance. Do I need to migrate Solr to an Apache Tomcat servlet container to improve the memory usage? Could you help me please? Thanks in advance. Regards
Re: Help using boolean operators
Erick, I am a little bit confused, because I wasn't aware of this fact (and have never noticed any wrong behaviour... maybe because I used the dismax-handler). How should I search for field1: This is a good string without doing something like field1:this field1:is ... ? If I quote the whole thing, Solr would search for the whole phrase (and only the whole phrase), or am I wrong? I would test it, if I can, but unfortunately it's not possible at the moment. Thank you! Mitch -- View this message in context: http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html Sent from the Solr - User mailing list archive at Nabble.com.
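The field-binding issue discussed in this thread can be made concrete with a small query builder. This is a sketch under assumptions: the field names are the placeholders from Sandhya's mail, the select URL is the default local one, and the helper functions are hypothetical. Quoting turns the words into a phrase query on that field; the grouped form field1:(This is a good string) would instead bind each word to field1 without requiring a phrase.

```python
from urllib.parse import urlencode


def field_phrase(field, text):
    """Bind every word of `text` to `field` by quoting it as a phrase."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def field_any(field, values):
    """Group alternative values for one field: field:(a OR b OR c)."""
    return f"{field}:({' OR '.join(values)})"


clauses = [
    field_phrase("field1", "This is a good string"),
    field_phrase("field2", "This is a good string"),
    field_phrase("field3", "This is a good string"),
    field_any("field4", ["ASCIIDocument", "BinaryDocument", "HTMLDocument"]),
    "field5:doc",
]
query = " AND ".join(clauses)

# debugQuery=on asks Solr to include the parsed query in the response.
url = "http://localhost:8983/solr/select?" + urlencode({"q": query, "debugQuery": "on"})
print(query)
print(url)
```

Comparing the parsed query in the debug output against the string printed here shows quickly whether a term ended up on the default search field.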
Re: Big problem with solr in an official server.
> Hi everybody: > > I have a big problem with solr in a server with the memory > size it is using, > I am setting up Solr with "java -jar start.jar" command in > an ubuntu server, > the process start.jar is using 7Gb of memory in the > server and it is > affecting considerably the performance of the server. > I would want to know how to configure it to use a limited > memory size with > high performance results, Do I need to migrate the solr to > an apache tomcat > servlet container to improve the memory performance ??? Recent post about the "java -jar start.jar" : http://search-lucene.com/m/atxZc2MSKig2/run+in+background
Re: LucidWorks Solr
I am curious: The idea behind a stemmer is not that it produces the correct infinitive for a given word. The idea is that it always produces the same infinitive for any derivative of the word. What happens with an unknown word, for example slang? How does your solution work here? Does it scale? Thank you for sharing experiences. :) - Mitch -- View this message in context: http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is solr ignored my filters ?
How should Solr know that Wasserwaage consists of "Wasser" and "Waage"? You are looking for an extra filter like DictionaryCompoundWordTokenFilter. Kind regards - Mitch stockii wrote: > okay. > as example. i want to check if WordDelimiterFactory works correct. And i want to experimant with search in substrings with edgengram... > i have the problem with that string: "Kamera-Wasserwaage" ... > so i think solr should filter this like this. > Kamera-Wasserwaage > -> Kamera > -> Wasserwaage > but i want that solr split Wasserwaage into -> Wasser ->Waage and wasserwaage. But this only works with WasserWaage. grml... > so i want to see how it is indexed. -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p730071.html Sent from the Solr - User mailing list archive at Nabble.com.
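The idea behind a dictionary-based compound filter can be sketched in a few lines of Python. This is a toy model, not the Lucene implementation: the function name, the default size limits and the three-word dictionary are assumptions chosen for the example. It keeps the original token and adds every dictionary word found inside it.

```python
def decompound(token, dictionary, min_subword=2, max_subword=15):
    """Return the token plus every dictionary word found inside it."""
    lowered = token.lower()
    found = [token]
    for start in range(len(lowered)):
        for length in range(min_subword, max_subword + 1):
            end = start + length
            if end > len(lowered):
                break
            part = lowered[start:end]
            # Skip the whole word itself; it is already in the result.
            if part in dictionary and part != lowered:
                found.append(part)
    return found


dictionary = {"wasser", "waage", "kamera"}
print(decompound("Wasserwaage", dictionary))
```

Without such a word list there is nothing in the string "Wasserwaage" that marks the boundary, which is why a case change (WasserWaage) was the only thing WordDelimiter could split on.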
Re: Big problem with solr in an official server.
I have just read the post, but it doesn't say whether the memory problems are related to starting Solr that way. The Jetty web server is used when I start Solr that way, so I assumed memory problems should not happen, because Jetty should manage how the memory is used. So are you really sure I must migrate to a non-Jetty web server? Is that what you recommend? Thanks in advance again. Regards Ariel On Mon, Apr 19, 2010 at 12:27 PM, Ahmet Arslan wrote: > > Hi everybody: > > I have a big problem with solr in a server with the memory size it is using, I am setting up Solr with "java -jar start.jar" command in an ubuntu server, the process start.jar is using 7Gb of memory in the server and it is affecting considerably the performance of the server. > > I would want to know how to configure it to use a limited memory size with high performance results, Do I need to migrate the solr to an apache tomcat servlet container to improve the memory performance ??? > Recent post about the "java -jar start.jar" : > http://search-lucene.com/m/atxZc2MSKig2/run+in+background
Re: LucidWorks Solr
This is a little bit of hijacking going on here, but It's algorithmic. That is, there isn't a list of variants that stem to the same infinitive, and your statement "always the same infintive for any derivate of the word" isn't quite what happens. Stemmers will always produce the same infinitive for any given word, just the opposite of what you said. But it is NOT guaranteed that a stemmer will always produce the same infinitive for all derivatives. Rather it just does a pretty darn good job with some anomalies because the rules don't cover all the edge cases. Their *goal* is to do it perfectly, but we all know about unachievable goals... HTH Erick On Mon, Apr 19, 2010 at 12:28 PM, MitchK wrote: > > I am curious: > The idea behind a stemmer is not that he produces the correct infinitive > for > a given word. The idea is that he produces always the same infintive for > any > derivate of the word. > > What would be, if there is an unknown word? For example something like > slang? How does your solution works here? Does it scale? > > Thank you for sharing experiences. :) > > - Mitch > -- > View this message in context: > http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html > Sent from the Solr - User mailing list archive at Nabble.com. >
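Erick's distinction can be seen with a deliberately crude suffix stripper. This is a toy written for this illustration, not Porter or KStem; the suffix list and the minimum stem length are arbitrary assumptions. The same input always gives the same output, but irregular forms of one word do not all meet at one stem.

```python
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix; a crude, rule-based stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


regular = {toy_stem(w) for w in ("walk", "walks", "walked", "walking")}
irregular = {toy_stem(w) for w in ("run", "runs", "ran", "running")}
print(regular)    # all four forms collapse to a single stem
print(irregular)  # the rules miss some forms, so several stems remain
```

The second set is the kind of anomaly meant above: the rules are deterministic, they just do not cover every edge case.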
Re: is solr ignored my filters ?
Yes, that's what I'm saying to my boss... but I just found another solution ;) -> I use EdgeNGram only for my product names and search with an OR operator in my default "text" field and in the productname field. That way I find all the substrings :D -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p730102.html Sent from the Solr - User mailing list archive at Nabble.com.
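For reference, this is roughly what a front edge n-gram filter emits for one token. It is a plain Python model with illustrative parameter defaults, not the Solr filter itself.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the prefixes of `token` from min_gram to max_gram characters."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("wasserwaage", min_gram=3, max_gram=6))
```

Note that these are prefixes only: a query for "wasser" matches, while "waage" in the middle of the token would need full n-grams (NGramFilterFactory) or the decompounding approach mentioned earlier in the thread.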
Re: Big problem with solr in an official server.
If you want to limit the memory used by the Java process, you can start it with java -Xmx<N>g -jar start.jar, where N is the number of gigabytes to allow as the maximum heap of the JVM running the Jetty container (for example, -Xmx2g). On Mon, Apr 19, 2010 at 10:05 PM, Ariel wrote: > I have just read the post, but it doesn't said if the problems with memory are associated with that way, the jetty web server it is used when I start solr that way, then I supposed that problems with memory should not happen because jetty must administrate the way the memory is used. > Then are you really sure I must migrate to a non jetty web server ??? It is that what you recommend ? > Thanks in advance again. > Regards > Ariel > On Mon, Apr 19, 2010 at 12:27 PM, Ahmet Arslan wrote: > > > Hi everybody: > > > I have a big problem with solr in a server with the memory size it is using, I am setting up Solr with "java -jar start.jar" command in an ubuntu server, the process start.jar is using 7Gb of memory in the server and it is affecting considerably the performance of the server. > > > I would want to know how to configure it to use a limited memory size with high performance results, Do I need to migrate the solr to an apache tomcat servlet container to improve the memory performance ??? > > Recent post about the "java -jar start.jar" : > > http://search-lucene.com/m/atxZc2MSKig2/run+in+background
Re: Big problem with solr in an official server.
And what is the recommended max size memory I should use ??? Is there anyone recommended ??? Regards. On Mon, Apr 19, 2010 at 12:44 PM, Geek Gamer wrote: > if you want to limit the use of memory by the java process you could use > java -XmxNGB > where N is the amount of memory you want to limit to jetty container. > > On Mon, Apr 19, 2010 at 10:05 PM, Ariel wrote: > > > I have just read the post, but it doesn't said if the problems with > memory > > are associated with that way, the jetty web server it is used when I > start > > solr that way, then I supposed that problems with memory should not > happen > > because jetty must administrate the way the memory is used. > > > > Then are you really sure I must migrate to a non jetty web server ??? It > is > > that what you recommend ? > > Thanks in advance again. > > Regards > > Ariel > > > > On Mon, Apr 19, 2010 at 12:27 PM, Ahmet Arslan > wrote: > > > > > > > > > Hi everybody: > > > > > > > > I have a big problem with solr in a server with the memory > > > > size it is using, > > > > I am setting up Solr with "java -jar start.jar" command in > > > > an ubuntu server, > > > > the process start.jar is using 7Gb of memory in the > > > > server and it is > > > > affecting considerably the performance of the server. > > > > I would want to know how to configure it to use a limited > > > > memory size with > > > > high performance results, Do I need to migrate the solr to > > > > an apache tomcat > > > > servlet container to improve the memory performance ??? > > > > > > Recent post about the "java -jar start.jar" : > > > http://search-lucene.com/m/atxZc2MSKig2/run+in+background > > > > > > > > > > > > > > > > > >
Re: Big problem with solr in an official server.
> And what is the recommended max size > memory I should use ??? Is there anyone > recommended ??? What is your index size?
Re: LucidWorks Solr
Yes, you are right, thank you Erick. I missed this point and thought only of the common cases, not of the special ones. However, one can combine the mentioned solutions and different stem filters in different fields, so that one can be quite (not absolutely) sure that in most cases the application works as expected. - Mitch -- View this message in context: http://n3.nabble.com/LucidWorks-Solr-tp727341p730160.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Big problem with solr in an official server.
Wasn't there a good posting on lucidworks.com? The title was something like "deadly sins" or so. There are some good suggestions on things like that :). Kind regards - Mitch -- View this message in context: http://n3.nabble.com/Big-problem-with-solr-in-an-official-server-tp730049p730168.html Sent from the Solr - User mailing list archive at Nabble.com.
Fwd: Query 2 Cores
Any ideas about my below Q ? Lee Begin forwarded message: > From: Lee Smith > Date: 19 April 2010 11:19:45 GMT+01:00 > To: solr-user@lucene.apache.org > Subject: Query 2 Cores > Reply-To: solr-user@lucene.apache.org > > Hey All > > I have 2 cores which have been used with tika to do index files. > > I would like to do one query on both at once as I will be searching > attr_content field. > > If I do a test on each core I get 1 & 17 results but trying with shards I > just get 17 results. > > Here is my example query > > http://localhost8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test > > Is this the correct way to query 2 cores at once ? > > Hope you can help > > Lee
Re: Fwd: Query 2 Cores
On 4/19/2010 11:09 AM, Lee Smith wrote: http://localhost8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test Is this the correct way to query 2 cores at once ? This should do what you want: http://localhost:8983/solr/core1/select?shards=localhost:8983/solr/core1,localhost:8983/solr/core2&q=attr_content:test
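The corrected URL lists every core in the shards parameter, including the one that receives the request. A small helper makes that harder to forget; this is a sketch, with the function name invented here and the host, core names and query taken from the messages above.

```python
from urllib.parse import urlencode


def sharded_select_url(host, cores, query):
    """Build a select URL that fans the query out to every listed core."""
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    params = urlencode({"shards": shards, "q": query}, safe=":/,")
    # The request is sent to the first core, which merges the shard results.
    return f"http://{host}/solr/{cores[0]}/select?{params}"


url = sharded_select_url("localhost:8983", ["core1", "core2"], "attr_content:test")
print(url)
```

With only core2 in the shards list, as in the original query, core1 acts purely as the coordinator and its own documents are not searched, which is consistent with seeing 17 results instead of 18.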
Re: LucidWorks Solr
My use requires a more correct processing of language than what you define as a stemmer. My experience with stemmers is that even for some words without a stem, they make a new word. I consider those false positives. My approach is based on the need to recognize that walk, walked, walking all refer to the same lemma "walk", as is correct in grammar (not some stemmer algorithm's choice). It scales fine. In fact, I use Lucene with an Instantiated in-memory index to perform the lookups, but one could easily use MySQL or something else. Darren > I am curious: > The idea behind a stemmer is not that he produces the correct infinitive for a given word. The idea is that he produces always the same infintive for any derivate of the word. > What would be, if there is an unknown word? For example something like slang? How does your solution works here? Does it scale? > Thank you for sharing experiences. :) > - Mitch > -- > View this message in context: > http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: LucidWorks Solr
> This is a little bit of hijacking going on here, but You are right. Accept my regrets. > It's algorithmic. That is, there isn't a list of variants that > stem to the same infinitive, and your statement > "always the same infintive for any derivate of the word" > isn't quite what happens. > > Stemmers will always produce the same infinitive for any given > word, just the opposite of what you said. But it is NOT guaranteed > that a stemmer will always produce the same infinitive for all > derivatives. Rather it just does a pretty darn good job with some > anomalies because the rules don't cover all the edge cases. > > Their *goal* is to do it perfectly, but we all know about unachievable > goals... > > HTH > Erick > > On Mon, Apr 19, 2010 at 12:28 PM, MitchK wrote: > >> >> I am curious: >> The idea behind a stemmer is not that he produces the correct infinitive >> for >> a given word. The idea is that he produces always the same infintive for >> any >> derivate of the word. >> >> What would be, if there is an unknown word? For example something like >> slang? How does your solution works here? Does it scale? >> >> Thank you for sharing experiences. :) >> >> - Mitch >> -- >> View this message in context: >> http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >
Re: LucidWorks Solr
no big deal, just wanted to mention. On Mon, Apr 19, 2010 at 1:24 PM, wrote: > > This is a little bit of hijacking going on here, but > You are right. Accept my regrets. > > > > It's algorithmic. That is, there isn't a list of variants that > > stem to the same infinitive, and your statement > > "always the same infintive for any derivate of the word" > > isn't quite what happens. > > > > Stemmers will always produce the same infinitive for any given > > word, just the opposite of what you said. But it is NOT guaranteed > > that a stemmer will always produce the same infinitive for all > > derivatives. Rather it just does a pretty darn good job with some > > anomalies because the rules don't cover all the edge cases. > > > > Their *goal* is to do it perfectly, but we all know about unachievable > > goals... > > > > HTH > > Erick > > > > On Mon, Apr 19, 2010 at 12:28 PM, MitchK wrote: > > > >> > >> I am curious: > >> The idea behind a stemmer is not that he produces the correct infinitive > >> for > >> a given word. The idea is that he produces always the same infintive for > >> any > >> derivate of the word. > >> > >> What would be, if there is an unknown word? For example something like > >> slang? How does your solution works here? Does it scale? > >> > >> Thank you for sharing experiences. :) > >> > >> - Mitch > >> -- > >> View this message in context: > >> http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html > >> Sent from the Solr - User mailing list archive at Nabble.com. > >> > > > >
synonym filter and offsets
hello *, I'm having issues with the synonym filter altering token offsets. My input text is "saturday night live". It is tokenized by the whitespace tokenizer, yielding 3 tokens: [saturday, 0,8], [night, 9, 14], [live, 15,19]. On indexing these are passed through a synonym filter that has this line: saturday night live => snl, saturday night live. I now end up with four tokens: [saturday, 0, 19], [snl, 0, 19], [night, 0, 19], [live, 0,19]. What I want is: [saturday, 0,8], [snl, 0,19], [night, 9, 14], [live, 15,19]. When using the highlighter I want only the relevant part of the text to be highlighted. How can I fix my filter chain? thx much --joe
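Why the offsets matter for highlighting can be shown without Solr at all. The snippet below is only a model of offset-based highlighting; the highlight helper is hypothetical and the token tuples are copied from the message above.

```python
text = "saturday night live"


def highlight(text, start, end):
    """Wrap the characters covered by a token's offsets in <em> tags."""
    return text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]


# (term, start offset, end offset) as reported in the message above.
wanted = [("saturday", 0, 8), ("snl", 0, 19), ("night", 9, 14), ("live", 15, 19)]
actual = [("saturday", 0, 19), ("snl", 0, 19), ("night", 0, 19), ("live", 0, 19)]

for label, tokens in (("wanted", wanted), ("actual", actual)):
    term, start, end = next(t for t in tokens if t[0] == "night")
    print(label, highlight(text, start, end))
```

With the offsets the synonym filter currently produces, a hit on any single word marks the whole three-word span, which matches the behaviour described.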
Re: LucidWorks Solr
Andy, This will help with smooth injection of your multilingual documents into Solr (multilingual either in the sense of 1 doc containing fields in multiple languages or 1 index containing documents in different languages): http://sematext.com/products/multilingual-indexer/index.html Re your other question about open-source morpho dictionaries - I don't know of any. Last time I looked for dictionaries I learned that they cost money. That said, the market for datasets is starting to grow, so you may be able to find more and cheaper dictionaries now. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Andy > To: solr-user@lucene.apache.org > Sent: Mon, April 19, 2010 8:45:40 AM > Subject: Re: LucidWorks Solr > > Thanks for the explanation Mitch. You're right. There can't be universal stemmers. What about multi-language stemmers? I'm mostly interested in English, Spanish, German, French, Italian. Are there any stemmers that would handle those languages? If not, what's the recommended way to deal with documents in multiple languages? > > --- On Mon, 4/19/10, MitchK <mitc...@web.de> wrote: > > From: MitchK <mitc...@web.de> > > Subject: Re: LucidWorks Solr > > To: solr-user@lucene.apache.org > > Date: Monday, April 19, 2010, 4:36 AM > > Andy, I think it is important to know what a stemmer really is. > > It reduces words to their infinitives. Those infinitives are not always the real infinitive, but for the system it is an infinitive, since all its derivatives can be reduced to the same form. That's a stemmer. > > According to this, there can't exist a stemmer for every language, because every language has its own rules for how to reduce a word to its infinitive. > > If you apply a stemmer for the English language to a German document, the results might be unexpected. However, sometimes it still works well enough. > > Keep in mind that this is an algorithm. It is not important whether the created infinitive is the real infinitive. It is only important that most of the derivative forms can be reduced to the same basic form. Please ask if something is not clear. > > KStem: The wiki[1] says that KStem is less aggressive than the standard stemmer. I guess that this means that there are more rules for how to reduce a word to its infinitive, and according to this the results might be better. > > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem > > Kind regards > > - Mitch > > -- > > View this message in context: http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html > > Sent from the Solr - User mailing list archive at Nabble.com.
Re: LucidWorks Solr
> Andy, > > This will help with smooth injection of your multilingual > documents into Solr (multilingual either in the sense of 1 > doc containing fields in multiple languages or 1 index > containing documents in different languages): > > http://sematext.com/products/multilingual-indexer/index.html Otis, Thanks for the info. Is multilingual indexer an open source project or a commercial product? That web page doesn't mention anything about either open source or a price, so it's hard to tell.
Re: Help using boolean operators
Did you try parenthesizing: field1:(This is a good string) You can try lots of things easily by going to http://localhost:8983/solr/admin/form.jsp and clicking the "debug enable" checkbox... HTH Erick On Mon, Apr 19, 2010 at 12:23 PM, MitchK wrote: > > Erick, > > I am a little bit confused, because I wasn't aware of this fact (and have never noticed any wrong behaviour... maybe because I used the dismax-handler). > How should I search for > field1: This is a good string > without doing something like > field1:this field1:is ... ? > If I quote the whole thing, Solr would search for the whole phrase (and only the whole phrase), or am I wrong? > > I would test it, if I can, but unfortunately it's not possible at the moment. > > Thank you! > > Mitch > -- > View this message in context: > http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Help using boolean operators
Careful though... the Solr admin page is for *analysis* testing, not query parsing. I saw that mentioned earlier too. To test query parsing, submit your query to http://localhost:8983/solr/select?q=your_query&debugQuery=true and look at the parsed query output. Erik On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote: Did you try parenthesizing: field1:(This is a good string) You can try lots of things easily by going to http://localhost:8983/solr/admin/form.jsp and clicking the "debug enable" checkbox... HTH Erick On Mon, Apr 19, 2010 at 12:23 PM, MitchK wrote: Erick, I am a little bit confused, because I wasn't aware of this fact (and have never noticed any wrong behaviour... maybe because I used the dismax-handler). How should I search for field1: This is a good string without doing something like field1:this field1:is ... ? If I quote the whole thing, Solr would search for the whole phrase (and only the whole phrase), or am I wrong? I would test it, if I can, but unfortunately it's not possible at the moment. Thank you! Mitch -- View this message in context: http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html Sent from the Solr - User mailing list archive at Nabble.com.
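For anyone who wants to script this check, the URL above can be built as in the rough sketch below. It assumes the default localhost:8983 address quoted in this thread; the helper name debug_query_url is hypothetical. The point is only that the query must be URL-encoded before it is sent, since it contains a colon, parentheses and spaces.

```python
from urllib.parse import urlencode

# Assumed default from this thread; adjust host, port and path to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_query_url(query, base=SOLR_SELECT):
    """Build a select URL that asks Solr to include debug output."""
    return base + "?" + urlencode({"q": query, "debugQuery": "true"})


url = debug_query_url("field1:(This is a good string)")
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return the normal response plus the debug section, where the parsed query can be inspected.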
Re: Help using boolean operators
Hmmm, I *thought* I saw the XML response with the parsed query in it, did I miss the details *again*? Erick On Mon, Apr 19, 2010 at 7:15 PM, Erik Hatcher wrote: > Careful though... the Solr admin page is for *analysis* testing, not query parsing. I saw that mentioned earlier too. To test query parsing, submit your query to > http://localhost:8983/solr/select?q=your_query&debugQuery=true and look at the parsed query output. > > Erik > > On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote: > >> Did you try parenthesizing: >> field1:(This is a good string) >> >> You can try lots of things easily by going to >> http://localhost:8983/solr/admin/form.jsp >> and clicking the "debug enable" checkbox... >> >> HTH >> Erick >> >> On Mon, Apr 19, 2010 at 12:23 PM, MitchK wrote: >> >>> Erick, >>> >>> I am a little bit confused, because I wasn't aware of this fact (and have never noticed any wrong behaviour... maybe because I used the dismax-handler). >>> How should I search for >>> field1: This is a good string >>> without doing something like >>> field1:this field1:is ... ? >>> If I quote the whole thing, Solr would search for the whole phrase (and only the whole phrase), or am I wrong? >>> >>> I would test it, if I can, but unfortunately it's not possible at the moment. >>> >>> Thank you! >>> >>> Mitch >>> -- >>> View this message in context: >>> http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >
Re: Help using boolean operators
Ah sorry... my bad. You're right. I thought you were referring to the admin analysis.jsp page, but I misread and replied too quickly. You're spot on, Erick. Erik On Apr 19, 2010, at 7:21 PM, Erick Erickson wrote: Hmmm, I *thought* I saw the XML response with the parsed query in it, did I miss the details *again*? Erick On Mon, Apr 19, 2010 at 7:15 PM, Erik Hatcher wrote: Careful though... the Solr admin page is for *analysis* testing, not query parsing. I saw that mentioned earlier too. To test query parsing, submit your query to http://localhost:8983/solr/select?q=your_query&debugQuery=true and look at the parsed query output. Erik On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote: Did you try parenthesizing: field1:(This is a good string) You can try lots of things easily by going to http://localhost:8983/solr/admin/form.jsp and clicking the "debug enable" checkbox... HTH Erick On Mon, Apr 19, 2010 at 12:23 PM, MitchK wrote: Erick, I am a little bit confused, because I wasn't aware of this fact (and have never noticed any wrong behaviour... maybe because I used the dismax-handler). How should I search for field1: This is a good string without doing something like field1:this field1:is ... ? If I quote the whole thing, Solr would search for the whole phrase (and only the whole phrase), or am I wrong? I would test it, if I can, but unfortunately it's not possible at the moment. Thank you! Mitch -- View this message in context: http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html Sent from the Solr - User mailing list archive at Nabble.com.
Highlighting apostrophe
I have the following text field: ... When I search for women's, womens or women I correctly get back all the results I want. However when I use the highlighting feature it only highlights women in the women's cases. How can I highlight the whole word women's including the apostrophe? Thanks -- View this message in context: http://n3.nabble.com/Highlighting-apostrophe-tp731155p731155.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlighting apostrophe
Same general question about highlighting the full word "sunglasses" when I search for glasses. Is this possible? Thanks -- View this message in context: http://n3.nabble.com/Highlighting-apostrophe-tp731155p731305.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Stemming - disable at query time - reg.
Yes, both have same filters, so we can avoid specifying analyzer type. - Naga -Original Message- From: MitchK [mailto:mitc...@web.de] Sent: Monday, April 19, 2010 9:44 PM To: solr-user@lucene.apache.org Subject: Re: Stemming - disable at query time - reg. Additionally to Alejandro's posting, I would say that you don't need to specify an analyzer for index-time and query-time, since it *seems* (maybe I am wrong) like you want to use the same functionality on index- and query-time. Hope this helps - Mitch -- View this message in context: http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p730019.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Help using boolean operators
Thanks Erick. Using parentheses works. With parentheses, the query q=field1:(this is a good string) is parsed as follows: +field1:this +field1:good +field1:string Is that OK to do? Thanks, Sandhya -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 20, 2010 4:16 AM To: solr-user@lucene.apache.org Subject: Re: Help using boolean operators Did you try parenthesizing: field1:(This is a good string) You can try lots of things easily by going to http://localhost:8983/solr/admin/form.jsp and clicking the "debug enable" checkbox... HTH Erick On Mon, Apr 19, 2010 at 12:23 PM, MitchK wrote: > > Erick, > > I am a little bit confused, because I wasn't aware of this fact (and have never noticed any wrong behaviour... maybe because I used the dismax-handler). > How should I search for > field1: This is a good string > without doing something like > field1:this field1:is ... ? > If I quote the whole thing, Solr would search for the whole phrase (and only the whole phrase), or am I wrong? > > I would test it, if I can, but unfortunately it's not possible at the moment. > > Thank you! > > Mitch > -- > View this message in context: > http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html > Sent from the Solr - User mailing list archive at Nabble.com. >
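The parsed form reported above can be mimicked with a small sketch. This is not Solr's query parser; it only models what the output suggests, under two assumptions that are not confirmed anywhere in this thread: that the field's query analyzer drops "is" and "a" as stopwords, and that the default operator is AND, so every surviving term becomes a required clause. The function name expand_grouped_query is hypothetical.

```python
# Assumed stopword set, chosen only to match the parsed query reported above.
ASSUMED_STOPWORDS = {"is", "a"}


def expand_grouped_query(field, text, stopwords=ASSUMED_STOPWORDS):
    """Model field:(t1 t2 ...) as one required clause per surviving term."""
    terms = [t.lower() for t in text.split()]
    kept = [t for t in terms if t not in stopwords]
    return " ".join("+{}:{}".format(field, t) for t in kept)


parsed = expand_grouped_query("field1", "this is a good string")
print(parsed)
```

Under those assumptions the sketch yields the same three required clauses. Each term must appear somewhere in field1, in any order, which is different from a quoted phrase query that requires the terms to be adjacent.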
Re: Solr throws TikaException while parsing sample PDF
I'm using the Solr 1.4 distribution, with Solr Cell. Can I update only Tika to a newer version in the Solr 1.4 distribution? If yes, is there any guide etc.? Thanks. On Mon, Apr 19, 2010 at 4:36 PM, Koji Sekiguchi wrote: > Praveen Agrawal wrote: > >> Hi Grant, >> I tried the command line of Tika v0.7 (newest), and it parsed the file. I believe Solr 1.4 contains version 0.4 of Tika. >> Do you suggest upgrading to the new Tika? Can I upgrade only Tika in Solr 1.4, or do I need to wait till Solr ships with the new Tika? >> Thanks. >> > Solr trunk uses Tika 0.7. I'm not a SolrCell user, so this is just an FYI. > > Koji > > -- > http://www.rondhuit.com/en/ > >