Re: Pass Double Quotes using SolrJ
On Mon, Apr 6, 2009 at 10:56 AM, dabboo wrote: > > I want to pass double quotes to my solr from the front end, so that it can > return the specific results of that particular phrase which is in > double quotes. > > If I use httpClient, it doesn't allow me to send the query in this format, as > it throws an invalid query exception. > > I want to know if I can do this with the SolrJ client. If yes, can somebody > please let me know how SolrJ does this and parses this type of > query. > > Amit, look at http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters for the list of characters that need to be escaped. Look at the ClientUtils.escapeQueryChars() method in Solrj. I'm curious to know why you are trying to roll your own Solr client when Solrj exists? -- Regards, Shalin Shekhar Mangar.
How could I limit a specific field size ?
Hello, I'm trying to tune my Solr installation, specifically the search results. At present, my search queries return some standard fields like filename, filepath and text of the matching file. However the text field contains the full contents of the file, which is not very efficient in my case. I'd like to copy the "text" field to a field called "preview" and then limit the "preview" field to just a few lines of text (or number of terms). Then I could configure retrieving the "preview" field instead of "text" upon search. Is there a way to specify such size limits per field or something similar? Thank you much. Regards, Veselin K
Boost fields at indexing or query time
Hey there, I don't know if I should ask this here or in the Lucene users forum... I have a doubt about field boosting (I am using dismax). I use document boosting at index time to give more importance to some documents. At this point, I don't care about the matching, I just want to tell Solr/Lucene that those documents are more important than the others. On the other hand, I give field boosts at query time to some fields because I want to give more importance to a match in that field than in the others. Up to here everything is clear... but I am missing the concept of field boost at index time... what is it used for? -- View this message in context: http://www.nabble.com/Boost-fileds-at-indexing-ot-query-time-tp22904463p22904463.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH API for specifying either a specific or all configurations to be imported
There is a debug mode http://wiki.apache.org/solr/DataImportHandler#head-0b0ff832aa29f5ba39c22b99603996e8a2f2d801 On Mon, Apr 6, 2009 at 2:35 PM, Wesley Small wrote: > Good Morning, > > Is there any way to specify or debug a specific DIH configuration via the > API/http request? > > I have the following: > > > dih_pc_default_feed.xml > > > dih_pc_cms_article_feed.xml > > > dih_pc_local_event_feed.xml > > > For example, is there any way to specify that only the "pc_local_event" be processed > (imported)? > > Another question: if command=full-import, this should effectively mean that > all DIH configurations are executed in sequential order. Is that correct? I > am not seeing that behaviour at present. > > Thanks, > Wesley > > -- --Noble Paul
Re: How could I avoid reindexing same files?
Hello Paul, I'm indexing with "curl http://localhost... -F myfi...@file.pdf" Regards, Veselin K On Mon, Apr 06, 2009 at 02:56:20PM +0530, Noble Paul wrote: > how are you indexing? > > On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev > wrote: > > Hello, > > apologies for the basic question. > > > > How can I avoid double indexing files? > > > > In case all my files are in one folder which is scanned frequently, is > > there a Solr feature of checking and skipping a file if it has already been > > indexed > > and not changed since? > > > > > > Thank you. > > > > Regards, > > Veselin K > > > > > > > > -- > --Noble Paul
Re: DIH API for specifying either a specific or all configurations to be imported
>Good Morning, > >Is there any way to specify or debug a specific DIH configuration via the >API/http request? > >I have the following: > > >dih_pc_default_feed.xml > > >dih_pc_cms_article_feed.xml > > >dih_pc_local_event_feed.xml > > >For example, is there any way to specify that only the "pc_local_event" be processed >(imported)? > >Another question: if command=full-import, this should effectively mean that >all DIH configurations are executed in sequential order. Is that correct? I >am not seeing that behaviour at present. > Wesley, I do not think the above is valid syntactically. I am still coming up to speed on DIH; however, I have taken to storing all my DIH import configurations in a single file. Each of your different configurations would be within its own top-level entity tag, each of which MUST be named. It is also a good idea to explicitly name each of your datasource descriptions, and then have the entities reference their datasource by name. I can then invoke only that entity from the URL as follows:- http://localhost:8080/apache-solr-1.4-dev/dataimport?command=full-import&entity=jc See the docs at:- http://wiki.apache.org/solr/DataImportHandler#head-1582242c1bfc1f3e89f4025bf2055791848acefb Fergus. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
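To make this concrete, a minimal sketch of such a single data-config.xml, with a named dataSource and named root entities (the URLs, XPath expressions, and column names here are illustrative, not Wesley's actual feeds):

<dataConfig>
  <dataSource name="feedSource" type="HttpDataSource"/>
  <document>
    <entity name="pc_cms_article" dataSource="feedSource" processor="XPathEntityProcessor"
            url="http://example.com/cms_article_feed.xml" forEach="/feed/item">
      <field column="id" xpath="/feed/item/id"/>
    </entity>
    <entity name="pc_local_event" dataSource="feedSource" processor="XPathEntityProcessor"
            url="http://example.com/local_event_feed.xml" forEach="/feed/item">
      <field column="id" xpath="/feed/item/id"/>
    </entity>
  </document>
</dataConfig>

With that layout, dataimport?command=full-import&entity=pc_local_event runs only that entity, while omitting the entity parameter runs all root entities in sequence.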
maxBufferedDocs
I see two entries for the maxBufferedDocs property in solrconfig.xml: one in the indexDefaults tag and the other in the mainIndex tag, commented as deprecated. So is this property required, and does it get used? What if I remove the indexDefaults tag altogether? Thanks, Siddharth
What is QTime a measure of?
Hi Just started using Solr/Lucene and am getting to grips with it. Great product! What is QTime a measure of? Is it milliseconds, seconds? I tried a Google search but couldn't find anything definitive. Thanks In Advance Andrew McCombe
Re: How could I limit a specific field size ?
On Mon, Apr 6, 2009 at 1:52 PM, Veselin K wrote: > > I'd like to copy the "text" field to a field called "preview" and > then limit the "preview" field to just a few lines of text (or number of > terms). > > Then I could configure retrieving the "preview" field instead of "text" > upon search. > > Is there a way to specify such size limits per field or something similar? > > Yes, there is a maxLength attribute for a copyField which you can use: -- Regards, Shalin Shekhar Mangar.
How could I avoid reindexing same files?
Hello, apologies for the basic question. How can I avoid double indexing files? In case all my files are in one folder which is scanned frequently, is there a Solr feature of checking and skipping a file if it has already been indexed and not changed since? Thank you. Regards, Veselin K
Re: How could I avoid reindexing same files?
how are you indexing? On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev wrote: > Hello, > apologies for the basic question. > > How can I avoid double indexing files? > > In case all my files are in one folder which is scanned frequently, is > there a Solr feature of checking and skipping a file if it has already been > indexed > and not changed since? > > > Thank you. > > Regards, > Veselin K > > -- --Noble Paul
DIH API for specifying either a specific or all configurations to be imported
Good Morning, Is there any way to specify or debug a specific DIH configuration via the API/http request? I have the following: dih_pc_default_feed.xml dih_pc_cms_article_feed.xml dih_pc_local_event_feed.xml For example, is there any way to specify that only the "pc_local_event" be processed (imported)? Another question: if command=full-import, this should effectively mean that all DIH configurations are executed in sequential order. Is that correct? I am not seeing that behaviour at present. Thanks, Wesley
Re: maxBufferedDocs
maxBufferedDocs is deprecated; it's better to use ramBufferSizeMB. In case you have both specified, the more restrictive one will be used. You can remove the indexDefaults config if you have your index configuration in mainIndex. Gargate, Siddharth wrote: > > I see two entries for the maxBufferedDocs property in solrconfig.xml: one in > the indexDefaults tag and the other in the mainIndex tag, commented as deprecated. So > is this property required, and does it get used? What if I remove the > indexDefaults tag altogether? > > Thanks, > Siddharth > > -- View this message in context: http://www.nabble.com/maxBufferedDocs-tp22905364p22905494.html Sent from the Solr - User mailing list archive at Nabble.com.
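For reference, a minimal sketch of the mainIndex section after dropping maxBufferedDocs in favour of ramBufferSizeMB (the values shown are illustrative, not recommendations):

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <!-- buffer flushing now governed by RAM consumption, not document count -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</mainIndex>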
Re: DIH API for specifying either a specific or all configurations to be imported
On Mon, Apr 6, 2009 at 2:35 PM, Wesley Small wrote: > > Is there any way to specify or debug a specific DIH configuration via the > API/http request? > > I have the following: > > > dih_pc_default_feed.xml > > > dih_pc_cms_article_feed.xml > > > dih_pc_local_event_feed.xml > > That is not a valid configuration. There can be only one single config (the one specified under "defaults") per core. > > For example, is there any way to specify that only the "pc_local_event" be processed > (imported)? Perhaps what you intend to do can be achieved through multiple root entities in the same data-config.xml? > Another question: if command=full-import, this should effectively mean > that > all DIH configurations are executed in sequential order. Is that correct? > I > am not seeing that behaviour at present. All root entities are executed sequentially. What behavior are you seeing? -- Regards, Shalin Shekhar Mangar.
Re: What is QTime a measure of?
On Mon, Apr 6, 2009 at 4:38 PM, Andrew McCombe wrote: > > Just started using Solr/Lucene and am getting to grips with it. Great > product! Welcome to Solr! > What is QTime a measure of? Is it milliseconds, seconds? I tried a > Google search but couldn't find anything definitive. > QTime is the elapsed time (in milliseconds) between the arrival of the request (when the SolrQueryRequest object is created) and the completion of the request handler. In other words, it will tell you how long it took to execute your query, including things like query parsing, the actual search, faceting, etc. -- Regards, Shalin Shekhar Mangar.
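For example, in the standard XML response the value appears in the response header; a QTime of 57 here would mean the request took 57 ms inside Solr (the status and QTime values are illustrative):

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">57</int>
  </lst>
  ...
</response>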
Re: How could I limit a specific field size ?
Thank you very much Shalin. Regards, Veselin K On Mon, Apr 06, 2009 at 02:19:05PM +0530, Shalin Shekhar Mangar wrote: > On Mon, Apr 6, 2009 at 1:52 PM, Veselin K wrote: > > > > > I'd like to copy the "text" field to a field called "preview" and > > then limit the "preview" field to just a few lines of text (or number of > > terms). > > > > Then I could configure retrieving the "preview" field instead of "text" > > upon search. > > > > Is there a way to specify such size limits per field or something similar? > > > > > Yes, there is a maxLength attribute for a copyField which you can use: > > > > -- > Regards, > Shalin Shekhar Mangar.
Re: Multiple Core schemas with single solr.solr.home
the only issue you may have will be related to software that writes files in solr-home, but the only one I can think of is dataimport.properties of DIH. So if you use DIH, you may want to make the dataimport.properties location configurable dynamically, like an entry in data-config.xml; otherwise each import on a core will change the file for all cores. Another (easier? safer?) option would be to use symbolic links, i.e. make a dir per core and add in each one a symbolic link for the xml files, so that they all read the same. On Sat, Apr 4, 2009 at 6:28 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Sat, Apr 4, 2009 at 9:51 PM, Rakesh Sinha >wrote: > > > I am planning to configure a solr server with multiple cores with > > different schemas for themselves with a single solr.solr.home. Are > > there any examples in the wiki (the ones that I see have > > a single schema.xml for a given solr.solr.home under the schema directory). > > > > Thanks for helping pointing to the same. > > > > It should be possible though I don't think there are any examples. You can > specify > the same instanceDir for different cores but different dataDir (specifying > dataDir in solr.xml is a trunk feature) > > -- > Regards, > Shalin Shekhar Mangar. >
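A sketch of the solr.xml Shalin describes, with two cores sharing one instanceDir but pointing at separate dataDir locations (the names and paths are illustrative, and per-core dataDir needs a trunk/1.4 build):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="shared/" dataDir="/var/data/solr/core0"/>
    <core name="core1" instanceDir="shared/" dataDir="/var/data/solr/core1"/>
  </cores>
</solr>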
How to copy the Dynamic fields into one field
Hi, Can I have a dynamic field in a copyField as follows? Can anyone please tell me how to make the dynamic fields available in one field "all"?
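A minimal schema.xml sketch of one way to do this, assuming the dynamic fields share a naming pattern (the *_t pattern, the types, and the catch-all field definition are illustrative):

<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*_t" dest="all"/>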
Re: Boost fields at indexing or query time
From Hossman: "index time field boosts are a way to express things like 'this document's title is worth twice as much as the title of most documents'. Query time boosts are a way to express 'I care about matches on this clause of my query twice as much as I do about matches to other clauses of my query'." HTH Erick On Mon, Apr 6, 2009 at 4:38 AM, Marc Sturlese wrote: > > Hey there, > I don't know if I should ask this here or in the Lucene users forum... > I have a doubt about field boosting (I am using dismax). I use document > boosting at index time to give more importance to some documents. At this > point, I don't care about the matching, I just want to tell Solr/Lucene that > those documents are more important than the others. > On the other hand, I give field boosts at query time to some fields because I > want to give more importance to a match in that field than in the others. > Up to here everything is clear... but I am missing the concept of field boost > at index time... what is it used for? > -- > View this message in context: > http://www.nabble.com/Boost-fileds-at-indexing-ot-query-time-tp22904463p22904463.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
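To illustrate the two sides: an index-time boost travels with the document in the update message, while a query-time boost travels with the query. A sketch (the boost values are illustrative):

<add>
  <doc boost="2.0">  <!-- index time: this whole document matters more -->
    <field name="title" boost="2.0">...</field>  <!-- index time: this document's title matters more -->
    <field name="text">...</field>
  </doc>
</add>

versus, at query time with dismax, qf=title^2.0 text^0.5 to weight title matches over body matches.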
Custom sort based on arbitrary order
Hi, Apologies if this question has been answered already; I'm so new to Solr (literally a few hours using it) that I still find some of the answers a bit obscure. I got Apache Solr working for a Drupal install, and I must implement ASAP a custom order that is fairly simple: there is a list of venues and some of them are more relevant than others (there is no logic, it's arbitrary, it's not an alphabetic order). It'd be something like this: Orange venue = 1 Red venue = 2 Blue venue = 3 So results where the venue is "orange" should go first, then "red" and finally "blue". Could you advise on the easiest way to have this example working? Thanks a lot, Paula -- View this message in context: http://www.nabble.com/Custom-sort-based-on-arbitrary-order-tp22908037p22908037.html Sent from the Solr - User mailing list archive at Nabble.com.
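A minimal sketch of one easy approach, assuming you can stamp each document with its venue's rank at index time (the venue_rank field name is made up, and sint is the sortable-int type from the example schema):

<field name="venue_rank" type="sint" indexed="true" stored="false"/>

Index orange-venue documents with venue_rank=1, red with 2, blue with 3, then sort on the field:

http://localhost:8983/solr/select?q=...&sort=venue_rank+asc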
Too many open files and background merge exceptions
I'm indexing a set of 50 small documents. I'm adding documents in batches of 1000. At the beginning I had a setup that optimized the index each 1 documents, but quickly I had to optimize after adding each batch of documents. Unfortunately, I'm still getting the "Too many open files" IO error on optimize. I went from mergeFactor of 25 down to 10, but I'm still unable to optimize the index. I have configuration: false 256 2 2147483647 1 The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is 1.6.0_11 64-Bit, Solr is nightly build (2009-04-02). And no, I can not change the limit of file descriptors (currently: 1024). What more can I do? -- We read Knuth so you don't have to. - Tim Peters Jarek Zgoda, R&D, Redefine jarek.zg...@redefine.pl
Re: Too many open files and background merge exceptions
try ulimit -n5 or something On Mon, Apr 6, 2009 at 6:28 PM, Jarek Zgoda wrote: > I'm indexing a set of 50 small documents. I'm adding documents in > batches of 1000. At the beginning I had a setup that optimized the index > each 1 documents, but quickly I had to optimize after adding each batch > of documents. Unfortunately, I'm still getting the "Too many open files" IO > error on optimize. I went from mergeFactor of 25 down to 10, but I'm still > unable to optimize the index. > > I have configuration: > false > 256 > 2 > 2147483647 > 1 > > The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is > 1.6.0_11 64-Bit, Solr is nightly build (2009-04-02). And no, I can not > change the limit of file descriptors (currently: 1024). What more can I do? > > -- > We read Knuth so you don't have to. - Tim Peters > > Jarek Zgoda, R&D, Redefine > jarek.zg...@redefine.pl > > -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
Re: Too many open files and background merge exceptions
you may try to put true in that useCompoundFile entry; this way indexing should use far fewer file descriptors, but it will slow down indexing, see http://issues.apache.org/jira/browse/LUCENE-888. Try to see whether the lack of descriptors is related only to Solr. How are you indexing: using solrj, or by posting xmls? Are the files being opened/parsed on the same machine as Solr? On Mon, Apr 6, 2009 at 2:58 PM, Jarek Zgoda wrote: > I'm indexing a set of 50 small documents. I'm adding documents in > batches of 1000. At the beginning I had a setup that optimized the index > each 1 documents, but quickly I had to optimize after adding each batch > of documents. Unfortunately, I'm still getting the "Too many open files" IO > error on optimize. I went from mergeFactor of 25 down to 10, but I'm still > unable to optimize the index. > > I have configuration: >false >256 >2 >2147483647 >1 > > The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is > 1.6.0_11 64-Bit, Solr is nightly build (2009-04-02). And no, I can not > change the limit of file descriptors (currently: 1024). What more can I do? > > -- > We read Knuth so you don't have to. - Tim Peters > > Jarek Zgoda, R&D, Redefine > jarek.zg...@redefine.pl > >
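For reference, the switch Marc suggests is a single solrconfig.xml entry; with the compound format each segment is packed into one .cfs file, so far fewer descriptors stay open:

<indexDefaults>
  <useCompoundFile>true</useCompoundFile>
  ...
</indexDefaults>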
Re: Using ExtractingRequestHandler to index a large PDF ~solved
Hmmm, not sure how this all hangs together, but editing my solrconfig.xml sorted the problem (a sketch of the likely change follows at the end of this message). Also, my initial report of the issue was misled by the log messages: the mention of "oceania.pdf" refers to a previous successful Tika extract. There is no mention in the logs of the filename that was rejected, nor any information that would help me identify it! Regards Fergus. >Sorry if this is a FAQ; I suspect it could be. But how do I work around the following:- > >INFO: [] webapp=/apache-solr-1.4-dev path=/update/extract >params={ext.def.fl=text&ext.literal.id=factbook/reference_maps/pdf/oceania.pdf} > status=0 QTime=318 >Apr 2, 2009 11:17:46 AM org.apache.solr.common.SolrException log >SEVERE: >org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the >request was rejected because its size (4585774) exceeds the configured maximum >(2097152) > at > org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.(FileUploadBase.java:914) > at > org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331) > at > org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349) > at > org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126) > at > org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343) > at > org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396) > at > org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) > >Although the PDF is big, it contains very little text; it is a map. > > "java -jar solr/lib/tika-0.3.jar -g" appears to have no bother with it. > >Fergus... >-- > >=== >Fergus McMenemie Email:fer...@twig.me.uk >Techmore Ltd Phone:(UK) 07721 376021 > >Unix/Mac/Intranets Analyst Programmer >=== -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
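Since the rejected size (4585774 bytes) is just over the stock 2048 KB multipart limit (2048 * 1024 = 2097152 bytes), the edit was presumably raising multipartUploadLimitInKB on the requestParsers element, something like the following (the 10240 value is illustrative):

<requestDispatcher handleSelect="true">
  <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="10240"/>
</requestDispatcher>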
Re: How could I limit a specific field size ?
Shalin Shekhar Mangar wrote: On Mon, Apr 6, 2009 at 1:52 PM, Veselin K wrote: I'd like to copy the "text" field to a field called "preview" and then limit the "preview" field to just a few lines of text (or number of terms). Then I could configure retrieving the "preview" field instead of "text" upon search. Is there a way to specify such size limits per field or something similar? Yes, there is a maxLength attribute for a copyField which you can use: Correction. Use maxChars, not maxLength. Koji
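So, per the correction, a sketch of the copyField entry (the 300-character cap is illustrative):

<copyField source="text" dest="preview" maxChars="300"/>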
Stemming and ISO Latin Accent filters together
Hi, we're trying to apply the French Stemmer filter with the ISO Latin Accent filter for our index, but unfortunately, we're seeing some bad behavior for some searches. After many tries, I've found out that the French Stemmer (or Snowball with language = "french") seems to be too sensitive to accents: for example, we have a couple of documents with the word "publiée". Normally, if I search for "publiée" or "publié" or "publiee" or "publie", all of these should be equivalent and return the same results. But in that case, "publie" and "publiee" do not work at all. I've tried the same words after deactivating the stemming and then re-indexing, and indeed, the results were good. I've also tried to change the order of the filters in the schema, but unfortunately, it brings other kinds of problems. I know that this should be more a question for the Lucene community, but I'm just curious if someone using Solr and working with such a language has encountered the same behavior and has somehow found a trick to fix the problem by, for example, using another filter or using the protword list feature of Snowball. Thanks. -- View this message in context: http://www.nabble.com/Stemming-and-ISO-Latin-Accent-filters-together-tp22910690p22910690.html Sent from the Solr - User mailing list archive at Nabble.com.
Pass Quoted Query to Solr
Hi, I am sending a query to the Solr search engine from my application using httpClient. I want to search for a specific title from the available titles. For example, if a user wants to search for the book titled "Complete Java Reference", I am sending this query to Solr with the search string in double quotes. I have encoded the search criteria but it is still giving me an "Invalid Query Exception". Please suggest if there is any way to pass this query to Solr. I am not sure though, but can we do this using the SolrJ client instead of httpClient? If yes, then how is Solrj handling this kind of request? Thanks, Amit Garg -- View this message in context: http://www.nabble.com/Pass-Quoted-Query-to-Solr-tp22911184p22911184.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExtractingRequestHandler Question
Hi Jacob, Thanks for the reply. I am still trying to nail down this problem with the best possible solution. Yeah, I had thought about these 2 approaches but both of them are going to make my indexing slower. Plus the fact that I will have at least 5 rich text files associated with each document is not helping much either. Anyway, I will explore and see if I can come up with anything better (maybe a separate index for rich text docs). Thanks, Venu From: Jacob Singh To: solr-user@lucene.apache.org Sent: Saturday, April 4, 2009 9:59:13 PM Subject: Re: ExtractingRequestHandler Question Hi TIA, I have the same desired requirement. If you look in the archives, you might find a similar thread between myself and the always super helpful Erik Hatcher. Basically, it can't be done (right now). You can however use the "ExtractOnly" request handler, and just get the extracted text back from Solr, then use xpath to get out the attributes and add them to the XML you are sending. Not ideal because the file has to be transferred twice. The only other option is to send the file as per the instructions via POST with its attributes as POST fields. Keep in mind that Solr documents are immutable, which means they cannot change. When you update a document with the same primary key, it will simply delete the existing one and add the new one. hth, Jacob On Sat, Apr 4, 2009 at 5:59 AM, Venu Mittal wrote: > Hi, > > I am using ExtractingRequestHandler to index rich text documents. > The way I am doing it is I get some data related to the document from > the database and then post an xml (containing only this data) to solr. Then I > make another call to solr, which sends the actual document to be indexed. > But while doing so I am losing all the other data that is related to the > document. > > Is this the right way to handle it or am I missing out on something? > > TIA > > > > -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
Re: Pass Double Quotes using SolrJ
My application is using httpclient. I will have to replace this with the solrj client. But does the Solrj client support passing a query with double quotes in it, like ?q="Glorious Revolution"&qt=dismaxrequest Thanks, Amit Garg Shalin Shekhar Mangar wrote: > > On Mon, Apr 6, 2009 at 10:56 AM, dabboo wrote: > >> >> I want to pass double quotes to my solr from the front end, so that it >> can >> return the specific results of that particular phrase which is in >> double quotes. >> >> If I use httpClient, it doesn't allow me to send the query in this format, >> as >> it throws an invalid query exception. >> >> I want to know if I can do this with the SolrJ client. If yes, can somebody >> please let me know how SolrJ does this and parses this type of >> query. >> >> > Amit, look at > http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters for > the list of characters that need to be escaped. Look at the > ClientUtils.escapeQueryChars() method in Solrj. I'm curious to know why > you > are trying to roll your own Solr client when Solrj exists? > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Pass-Double-Quotes-using-SolrJ-tp22902404p22912443.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Wildcard searches
So I've started making a QParserPlugin to handle phrase wildcard searches but I think I need a little bit of guidance. In my plugin I've subclassed the SolrQueryParser and overridden the getFieldQuery(...) method so that I can handle queries that contain spaces and wildcards. I naively tried to construct a WildcardQuery object from the query text but that didn't seem to work. What sort of Query object(s) should I be using here? (Note: the field I'm working with is an untokenized field.) Thanks, Laurent -Original Message- From: solr-user-return-20352-laurent.vauthrin=disney@lucene.apache.org [mailto:solr-user-return-20352-laurent.vauthrin=disney@lucene.apache.org] On Behalf Of Otis Gospodnetic Sent: Wednesday, April 01, 2009 9:11 AM To: solr-user@lucene.apache.org Subject: Re: Wildcard searches Hi, Another option for 1) is to use n-grams with token begin/end symbols. Then you won't need to use wildcards at all, but you'll have a larger index. 2) may be added to Lucene in the near future, actually, I saw a related JIRA issue. But in the meantime, yes, you could implement it via a custom QParserPlugin. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: "Vauthrin, Laurent" > To: solr-user@lucene.apache.org > Sent: Monday, March 30, 2009 5:45:30 PM > Subject: Wildcard searches > > Hello again, > > I'm in the process of converting one of our services that was previously > using Lucene to use Solr instead. The main focus here is to preserve > backwards compatibility (even if some searches are not as efficient). > There are currently two scenarios that are giving me problems right now. > > 1. Leading wildcard searches/suffix searches (e.g. *ickey) > I've looked at https://issues.apache.org/jira/browse/SOLR-218. Is the > best approach to create a QParserPlugin and change the parser to allow > leading wildcards - setAllowLeadingWildcard(true)? At the moment we're > trying to avoid indexing terms in reverse order. > > 2. Phrase searches with wildcards (e.g. "Mickey Mou*") > From what I understand, Solr/Lucene doesn't support this but we used to > get results with the following code: > > new WildcardQuery(new Term("U_name", " Mickey Mou*")) > > Is it possible for me to allow this capability in a QParserPlugin? Is > there another way for me to do it? > > Thanks, > Laurent Vauthrin
solr 1.4 facet boost field according to another field
Hi, I have title, description and tag fields... Depending on where the searched word is found, I would like to boost other fields like nb_views or rating differently: if the word is found in title, then nb_views^10 and rating^10; if the word is found in description, then nb_views^2 and rating^2. Thanks a lot for your help, -- View this message in context: http://www.nabble.com/solr-1.4-facet-boost-field-according-to-another-field-tp22913642p22913642.html Sent from the Solr - User mailing list archive at Nabble.com.
solr 1.4 indexation or request > memory
Hi, I would like to know whether faceting on a field, or weighting it, uses less memory when done at index time than when I make a dismax request. Thanks, -- View this message in context: http://www.nabble.com/solr-1.4-indexation-or-request-%3E-memory-tp22913679p22913679.html Sent from the Solr - User mailing list archive at Nabble.com.
solr 1.4 memory jvm
Hi, Sorry, I can't find an existing issue; during my replication my query response time gets very slow. I'm using the replication handler; is there a way to throttle the transfer rate, or ??? 11G index size 8G ram 20 requests/sec Java HotSpot(TM) 64-Bit Server VM 10.0-b22 Java HotSpot(TM) 64-Bit Server VM 4 -Xms4G -Xmx5G -XX:+ScavengeBeforeFullGC -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/data/solr/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps Is it a problem ?? 0.21 (error executing: uname -a) (error executing: ulimit -n) (error executing: uptime) Thanks -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22913742.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Remote Access To Schema Data
The LukeRequest class gets me what I wanted. Thanks! -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, April 03, 2009 10:15 AM To: solr-user@lucene.apache.org Subject: Re: Remote Access To Schema Data On 4/3/09, Erik Hatcher wrote: > > On Apr 3, 2009, at 9:26 AM, Shalin Shekhar Mangar wrote: >> Note that the luke handler gives out a lot of information like term >> frequency and therefore takes a longer time to execute. > > It's fast if you say &numTerms=0 though, which is good enough to get > field/type info. Nice. I didn't know that. Thanks Erik. -- Regards, Shalin Shekhar Mangar.
Term Counts/Term Frequency Vector Info
I want the functionality that Lucene IndexReader.termDocs gives me, or document-level access to the term vector. This (http://wiki.apache.org/solr/TermVectorComponent?highlight=(term)|(vector)) seems to suggest that this will be available in 1.4. Is there any way to do this in 1.3? Thanks, Clay
Re: Searching on mulit-core Solr
Hi, Any help on this? I've looked at DistributedSearch on the Wiki, but that doesn't seem to be working for me with multi-core and multiple Solr instances on the same box. Scenario, 1) Two boxes (localhost, 10.4.x.x) 2) Two Solr instances on each box (8080 and 8085 ports) 3) Two cores on each instance (core0, core1) I'm not sure how to construct my search on the above setup if I need to search across all the cores on all the boxes. Here is what I'm trying, http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan I get a 404 error. Is this the right URL construction for my setup? How else can I do this? Thanks, -vivek On Fri, Apr 3, 2009 at 1:02 PM, vivek sar wrote: > Hi, > > I've a multi-core system (one core per day), so there would be around > 30 cores in a month on a box running one Solr instance. We have two > boxes running the Solr instance and input data is fed to them in > round-robin fashion. Each box can have up to 30 cores in a month. Here > are my questions, > > 1) How would I search for a term in multiple cores on the same box? > > Single core I'm able to search like, > http://localhost:8080/solr/20090402/select?q=*:* > > 2) How would I search for a term in multiple cores on both boxes at > the same time? > > 3) Is it possible to have two Solr instances on one box with one doing > the indexing and the other performing only searches on that index? The idea > is to have two JVMs with each doing its own task - I'm not sure whether > the indexer process needs to know about the searcher process - like do > they need to have the same solr.xml (for multicore etc). We don't want > to replicate the indexes either (we get very light search traffic, but > very high indexing traffic) so they need to use the same index. > > > Thanks, > -vivek >
Coming up with a model of memory usage
To combat our frequent OutOfMemory exceptions, I'm attempting to come up with a model so that we can determine how much memory to give Solr based on how much data we have (this becomes more important as we expand the set of data types eligible to be supported). Are there any published guidelines on how much memory a particular document takes up in memory, based on the data types, etc.? I have several stored fields, numerous other non-stored fields, a largish copyTo field, and I am doing some sorting on indexed, non-stored fields. Any pointers would be appreciated! Thanks, -Joe
Re: Searching on mulit-core Solr
vivek, 404 from the URL you provided in the message! Similar URLs work OK for me. hmm, try http://localhost:8080/solr/admin/cores?action=status and see if that gives a 404. Also, are you running a nightly build or an svn checkout? Using tomcat? Perhaps it should be http://localhost:8080/apache-solr-1.4-dev/admin/cores?action=status Fergus. >Hi, > > Any help on this? I've looked at DistributedSearch on the Wiki, but that >doesn't seem to be working for me with multi-core and multiple Solr >instances on the same box. > >Scenario, > >1) Two boxes (localhost, 10.4.x.x) >2) Two Solr instances on each box (8080 and 8085 ports) >3) Two cores on each instance (core0, core1) > >I'm not sure how to construct my search on the above setup if I need >to search across all the cores on all the boxes. Here is what I'm >trying, > >http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan > >I get a 404 error. Is this the right URL construction for my setup? How >else can I do this? > >Thanks, >-vivek > >On Fri, Apr 3, 2009 at 1:02 PM, vivek sar wrote: >> Hi, >> >> I've a multi-core system (one core per day), so there would be around >> 30 cores in a month on a box running one Solr instance. We have two >> boxes running the Solr instance and input data is fed to them in >> round-robin fashion. Each box can have up to 30 cores in a month. Here >> are my questions, >> >> 1) How would I search for a term in multiple cores on the same box? >> >> Single core I'm able to search like, >> http://localhost:8080/solr/20090402/select?q=*:* >> >> 2) How would I search for a term in multiple cores on both boxes at >> the same time? >> >> 3) Is it possible to have two Solr instances on one box with one doing >> the indexing and the other performing only searches on that index? The idea >> is to have two JVMs with each doing its own task - I'm not sure whether >> the indexer process needs to know about the searcher process - like do >> they need to have the same solr.xml (for multicore etc). We don't want >> to replicate the indexes either (we get very light search traffic, but >> very high indexing traffic) so they need to use the same index. >> >> >> Thanks, >> -vivek >> -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Re: Term Counts/Term Frequency Vector Info
See also http://wiki.apache.org/solr/TermsComponent You might be able to apply these patches to 1.3 and have them work, but there is no guarantee. You can also get some termDocs-like capabilities through Solr's faceting, but I am not aware of any way to get at the term vector capabilities. HTH, Grant On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote: I want the functionality that Lucene IndexReader.termDocs gives me, or document-level access to the term vector. This (http://wiki.apache.org/solr/TermVectorComponent?highlight=(term)|(vector)) seems to suggest that this will be available in 1.4. Is there any way to do this in 1.3? Thanks, Clay -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: solr 1.4 memory jvm
hi sunnyfr, I wish to clarify something. You say that the performance is poor "during" the replication; I suspect that the performance is poor soon after the replication. The reason being, replication is a low-CPU activity. If you think otherwise, let me know how you found it out. If the perf is low soon after the replication is completed (I mean, when the index files have been downloaded and the searcher is getting opened), it is understandable. That is the time when warming is done. Have you set up autowarming? On Mon, Apr 6, 2009 at 11:12 PM, sunnyfr wrote: > > Hi, > > Sorry, I can't find an existing issue; during my replication my query response time > gets very slow. > I'm using the replication handler; is there a way to throttle the transfer rate, or ??? > > 11G index size > 8G ram > 20 requests/sec > Java HotSpot(TM) 64-Bit Server VM > > > 10.0-b22 > Java HotSpot(TM) 64-Bit Server VM > 4 > > -Xms4G > -Xmx5G > -XX:+ScavengeBeforeFullGC > -XX:+UseConcMarkSweepGC > -XX:+HeapDumpOnOutOfMemoryError > -Xloggc:/data/solr/logs/gc.log > -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > > > Is it a problem ?? > 0.21 > (error executing: uname -a) > (error executing: ulimit -n) > (error executing: uptime) > > Thanks > > -- > View this message in context: > http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22913742.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
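If autowarming is not set up, the relevant solrconfig.xml pieces look roughly like this (the cache sizes, autowarm counts, and the warming query are illustrative):

<filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">a frequent query</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>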
response time
Hi, I have around 10 Solr servers running indexes of around 80-85 GB each, with 16,000,000 docs each. When I use distrib for querying, I am not getting a satisfactory response time; my response time is around 4-5 seconds. Any suggestions to improve the response time for queries (to bring it below 1 second)? Is the response slow due to the size of the index? I have already gone through the pointers provided at: http://wiki.apache.org/solr/SolrPerformanceFactors Regards, CI