Re: phpnative response writer in SOLR 3.1 ?
On 14.04.2011 09:53, Ralf Kraus wrote: Hello, I just updated to Solr 3.1 and I am wondering whether the phpnative response writer plugin is part of it ( https://issues.apache.org/jira/browse/SOLR-1967 ). When I try to compile the source files I get some errors:

PHPNativeResponseWriter.java:57: org.apache.solr.request.PHPNativeResponseWriter is not abstract and does not override abstract method getContentType(org.apache.solr.request.SolrQueryRequest,org.apache.solr.response.SolrQueryResponse) in org.apache.solr.response.QueryResponseWriter
public class PHPNativeResponseWriter implements QueryResponseWriter {
^
PHPNativeResponseWriter.java:70: method does not override a method from its superclass
@Override
^

Is there a new JAR file or something I could use with Solr 3.1? Because the Solr PECL package ( http://pecl.php.net/package/solr ) only offers XML or PHPNATIVE as response writer. No hints at all? -- Greetings, Ralf Kraus
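For context, the compile errors above point at the package move of SolrQueryResponse/QueryResponseWriter from org.apache.solr.request to org.apache.solr.response: the methods in the SOLR-1967 patch no longer match the 3.1 interface signatures. A minimal sketch of the class skeleton compiled against the 3.1 packages could look roughly like this (the actual PHP-serialization body from the patch is omitted, and the returned content type is an assumption):

import java.io.IOException;
import java.io.Writer;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.QueryResponseWriter;
import org.apache.solr.response.SolrQueryResponse;

public class PHPNativeResponseWriter implements QueryResponseWriter {

  public void init(NamedList args) {
    // no configuration needed in this sketch
  }

  // Required by the 3.1 interface; without it the class fails to compile.
  public String getContentType(SolrQueryRequest request, SolrQueryResponse response) {
    return CONTENT_TYPE_TEXT_UTF8; // "text/plain; charset=UTF-8" (assumed content type)
  }

  public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response)
      throws IOException {
    // ... the serialized-PHP output from the SOLR-1967 patch would be produced here ...
  }
}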
Dismax Minimum Match/Stopwords Bug
A thread with this same subject from 2008/2009 is here: http://search-lucene.com/m/jkBgXnSsla We're seeing customers being bitten by this "bug" now and then, and normally my workaround is to simply not use stopwords at all. However, is there an actual fix in the 3.1 eDisMax parser which solves the problem for real? Cannot find a JIRA issue for it. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: SOLR support for unicode?
Hi, Thanks for your response. I am currently working on this issue. When I run the test_utf8.sh script, I get the following result:

Solr server is up.
HTTP GET is accepting UTF-8
HTTP POST is accepting UTF-8
HTTP POST defaults to UTF-8
ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic multilingual plane

I also placed a "TM" symbol and a "–" symbol in one of the example XML docs and indexed it with post.jar, then queried with the "wt=python" param.

Input: Good unicode support: héllo (hello with an™ accent OLB – Account over the e)
Output: Good unicode support: héllo (hello with an� accent OLB � Account over the e)

-- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-support-for-unicode-tp2790512p2824041.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search and index Result
You're possibly getting hit by server caching. Are you by chance submitting the exact same query after your commit? What happens if you change your query to one you haven't used before? Turning off http caching might help. Solr should be searching the new contents after a commit (and any attendant warmup time). Best Erick On Fri, Apr 15, 2011 at 1:43 AM, satya swaroop wrote: > Hi all, > i just made a duplication of solrdispatchfilter as > solrdispatchfilter1 and solrdispatchfilter2 such that all the /update or > /update/extract things are passed through the solrdispatchfilter1 > and all search (/select) things are passed through the > solrdispatchfilter2. It is because i need to establish a privacy concern > for > the search result. > I need to check whether the required user has access to the particular > files > or not.. it was a success in implementing the privacy of results. > one major problem i am getting is after indexing some documents and > committing them, i am not getting the committed data in the search result, i am > getting the old data that was before the commit... > But i get the result only after restarting the server.. can anyone tell me > where to modify such that the search will give the results from the recent > commit... > > > Thanks and Regards, > satya >
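Regarding the "turning off http caching" suggestion above: the switch lives in the <requestDispatcher> section of solrconfig.xml. A minimal sketch, assuming the stock example config layout, would be:

<requestDispatcher handleSelect="true">
  <!-- never emit cache-validation headers or 304 responses, so clients always re-fetch fresh results -->
  <httpCaching never304="true" />
</requestDispatcher>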
newbie - filter to only show queried field when query is free text
Hi, how can I filter a search result so that it does not return all fields (as per the default) when I don't know which field my hits will be in? This is basically for unstructured document type data, for example large HTML or DOCBOOK documents. thanks, Bryan Rasmussen
Re: newbie - filter to only show queried field when query is free text
Hi, there may be better ways, but as far as my knowledge goes I'd try to use the highlighting component. With hl.requireFieldMatch the highlighting response only includes fields where highlights were applied (a match was found), which is probably what you want. Best Marek Tichy > Hi, > > If I want to filter a search result to not return all fields as per > the default but I don't know what field my hits will be in. > > This is basically for unstructured document type data, for example > large HTML or DOCBOOK documents. > > thanks, > Bryan Rasmussen > >
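To make that concrete, a request along the lines Marek describes might look something like this (host, port and query terms are placeholders; hl.fl=* asks for highlighting on every field where highlighting is possible, which requires those fields to be stored):

http://localhost:8983/solr/select?q=your+terms&hl=true&hl.fl=*&hl.requireFieldMatch=true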
DataImportHandler - importing XML documents, undeclared general entity - DTD right there
Hi, I am importing a number of XML documents from the filesystem. The dataimporthandler finds them, but returns an undeclared general entity error - even though my DTD is present and findable by other parsers. DTD Declaration In XML file in the same folder as the DTD allartikel.dtd Thanks, Bryan Rasmussen
Using autocomplete with the new Suggest component
Hi everybody, Recently I implemented an autocomplete mechanism for my website using a custom TermsComponent. I was quite happy with that because it also enables me to do a Google-like feature where complete sentences were suggested to the user as he typed in the search field. I used shingles to search against pieces of sentences. (I have resources for French people if somebody asks) Then came Solr 3.1 and its new Suggest component. I have looked at the documentation but it's still unclear how it works exactly. So please let me ask some questions:

- Are there performance improvements over TermsComponent?
- Is it able to autosuggest sentences and not only words? If yes, how? Should I keep my shingles?
- What is this "threshold" value that I see? Is it a mandatory field to complete? I want to have suggestions no matter what the frequency is in the documents!

Thank you all. If I succeed in doing this I will try to provide a tutorial on doing it with jQuery UI autocomplete + the Suggest component, if anyone's interested. Best regards. Victor
Strange DisMax results
Hi. I've got a strange result from a DisMax search function. I might have understood the functionality wrong, but after I read the manual I understood it is used to do ranked results with simple search terms.

Solr Version 1.4.0

I've got the setup

Schema fields
--
DisMax config
--
explicit 0.01 name^1.2 shortDescription^1.0 longDescription^1.0 prodShortDescription^0.5 prodLongDescription^0.5 name^1.2 shortDescription^1.0 longDescription^1.0 prodShortDescription^0.5 prodLongDescription ^0.5 *:* 100 spellcheck

Standard config
-
explicit spellcheck

When I search for a term "q=term" I get 68 hits. But when I search for "q=term&qt=dismax" I get 0 hits.

Of course I've got more fields and search parameters. But the only difference I could see is that in one case I use dismax and in the other I don't.

What have I missed? Any suggestions?

Best regards

Daniel
Re: Strange DisMax results
If you haven't modified your schema.xml, you'll find that the defaultSearchField is set to the text field. So when you issue q=term you're going against your default search field. Assuming you've changed the default search field to "defaultSearch", then the problem is probably that your analysis chain for the default search field is different from that applied to your individual fields. Which I absolutely guarantee, since you have two different fieldTypes in your 5 fields. I'm extremely suspicious of your fieldTypes that involve the word "keyword", because if this indicates the KeywordTokenizer is being used, then everything in the input is a single token; the input stream isn't being split up... But the best way to understand this is the admin/analysis page. If you check the "verbose" box and put in some text you'll see the effects of each part of the chain. Try this for the field you expect dismax to find your term in, and also for your defaultSearch field, and I suspect you'll see what's going on. Best Erick On Fri, Apr 15, 2011 at 10:35 AM, Daniel Persson wrote: > Hi. > > I've got a strange result of a DisMax search function. I might have > understood the functionallity wrong. But after I read the manual I > understood it is used to do ranked results with simple search terms. > > Solr Version 1.4.0 > > I've got the setup > > Schema fields > -- > multiValued="false"/> > multiValued="false"/> > stored="false" multiValued="false"/> > indexed="true" stored="false" multiValued="false"/> > indexed="true" stored="false" multiValued="false"/> > > > > > > > > > DisMax config > -- > > > explicit > 0.01 > >name^1.2 shortDescription^1.0 longDescription^1.0 > prodShortDescription^0.5 prodLongDescription^0.5 > > >name^1.2 shortDescription^1.0 longDescription^1.0 > prodShortDescription^0.5 prodLongDescription ^0.5 > > *:* > 100 > > > spellcheck > > > > Standard config > - > default="true"> > > explicit > > > spellcheck > > > > > When I search for a term "q=term" I get 68 hits. But when I search for > "q=term&qt=dismax" I get 0 hits. > > Of course I got more fields and search parameters. But the only difference > I > could see is that in one case I use dismax and the other I don't. > > What have I missed? Any suggestions? > > Best regards > > Daniel >
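To make Erick's KeywordTokenizer point concrete: a field type along these lines (the type name is invented for illustration) keeps the entire input as one token, so q=term can only match documents whose whole field value is exactly "term":

<fieldType name="keywordish" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A type meant for free-text search would instead split the input into words (e.g. WhitespaceTokenizerFactory or StandardTokenizerFactory), which is why the same query can hit in one field and miss in another.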
Sort by function - 400 error
Using Solr 3.1. When I do:

sort=score desc
it works.

sort=product(typeId,2) desc (typeId is a valid attribute in the document)
it works.

sort=product(score,typeId) desc
fails with a 400 error. "sort=product(score,2) desc" fails too.

Must be something basic I'm missing? I tried adding &fl=*,score too. Thanks Mike
RE: Understanding the DisMax tie parameter
Thanks everyone. I updated the wiki. If you have a chance please take a look and check to make sure I got it right on the wiki. http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.29 Tom -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, April 14, 2011 5:41 PM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Cc: Burton-West, Tom Subject: Re: Understanding the DisMax tie parameter : Perhaps the parameter could have had a better name. It's essentially : max(score of matching clauses) + tie * (score of matching clauses that : are not the max) : : So it can be used and thought of as a tiebreak only in the sense that : if two docs match a clause (with essentially the same score), then a : small tie value will act as a tiebreaker *if* one of those docs also : matches some other fields. correct. w/o a tiebreaker value, a dismax query will only look at the maximum scoring clause for each doc -- the "tie" param is named for its ability to help break ties when multiple documents have the same score from the max scoring clause -- by adding in a small portion of the scores (based on the 0->1 ratio of the "tie" param) from the other clauses. -Hoss
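A made-up numeric illustration of the formula above (the numbers are not from this thread): suppose a document's best-matching clause scores 2.0 and two other matching clauses score 0.5 and 0.3. With tie=0.0 the dismax score is just 2.0 (pure max); with tie=0.1 it is 2.0 + 0.1 * (0.5 + 0.3) = 2.08; and with tie=1.0 it degenerates into a plain sum, 2.0 + 0.5 + 0.3 = 2.8.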
Re: Sort by function - 400 error
On Fri, Apr 15, 2011 at 11:50 AM, Michael Owen wrote: > > Using solr 3.1. > When I do: > sort=score desc > it works. > sort=product(typeId,2) desc (typeId is a valid attribute in document) > it works. > sort=product(score,typeId) desc > fails on 400 error? Also "sort=product(score,2) desc" fails too. You can't currently use "score" in function queries. You can embed another query in a function query though. Example:

sort=product($qq,typeId) desc&qq=my_query_here

In your case, when you just want to multiply the score by a field, then you can either use the edismax query parser and the "boost" parameter:

defType=edismax&q=my_query_here&boost=typeId

Or you could directly use the "boost" query parser http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

q={!boost b=typeId}my_query_here
OR
q={!boost b=typeId v=$qq}&qq=my_query_here

-Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Field compression
I know I'm late to the party, but I recently learned that field compression was removed as of Solr 1.4.1. I think a lot of sites were relying on that feature, so I'm curious what people are doing now that it's gone. Specifically, what are people doing to efficiently store *and highlight* large fulltext fields? I can think of ways to store the text efficiently (compress it myself), or highlight it (leave it uncompressed), but not both at the same time. Also, is anyone working on anything to restore compression to Solr? I understand it was removed because Lucene removed support for it, but I was hoping to upgrade my site to 3.1 soon and we rely on that feature. - Charlie
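For reference, the "compress it myself" option mentioned above is straightforward on the client side. A minimal sketch (the class name is invented, and commons-codec is assumed to be on the classpath) that gzips a fulltext value into a Base64 string for an ordinary stored field, and reverses it at display time:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.codec.binary.Base64;

// Hypothetical helper: gzip a large fulltext value into a Base64 string so it can
// be stored in a plain Solr string/text field, and decompress it again for display.
public class FieldGzip {

  public static String compress(String text) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(bos);
    gz.write(text.getBytes("UTF-8"));
    gz.close();
    return new String(Base64.encodeBase64(bos.toByteArray()), "UTF-8");
  }

  public static String decompress(String stored) throws IOException {
    GZIPInputStream gz = new GZIPInputStream(
        new ByteArrayInputStream(Base64.decodeBase64(stored.getBytes("UTF-8"))));
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n; (n = gz.read(buf)) != -1; ) {
      bos.write(buf, 0, n);
    }
    return bos.toString("UTF-8");
  }
}

The obvious catch, as noted above, is that Solr's highlighter cannot see into the compressed copy, so highlighting would still need the text stored uncompressed somewhere.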
Solr 3.1: Old Index Files Not Removed on Optimize?
I was just hoping someone might be able to point me in the right direction here. We just upgraded from Solr 1.4 to Solr 3.1 this past week and we're having issues running out of disk space on our Master servers. Our Master has dozens of cores. We have a script that kicks off once per day to do a rolling optimize. The script optimizes a single core, waits 5 minutes to give the server some breathing room to catch up on indexing in a non-i/o intensive state, and then moves onto the next core (repeating until done). The problem we are facing is that under Solr 1.4, the old index files were deleted very quickly after each optimize, but under Solr 3.1, the old index files hang around for hours... in many cases they don't disappear until we restart Solr completely. This is leading to us running out of disk space, as each core's index doubles in size during the optimize process and stays that way until the next solr restart. I was just wondering if anyone could point me to some specific changes or settings which may be leading to the difference between solr versions (or any other environmental issues you may know about). I see several tickets in Jira about similar issues, but they mostly appear to have been resolved in the past. Has anyone else seen this behavior under Solr 3.1, or do you think we may be missing some kind of new configuration setting? For reference, we are running on 64bit RedHat Linux. This is what I have right now: [From SolrConfig.xml]: true commit optimize startup 10 30 false 1 Thanks in advance, -Trey
Split token
Hello, I want to split my string when it contains "(". Example:

spurs (London)
Internationale (milan)

to

spurs
(london)
Internationale
(milan)

What tokenizer can i use to fix this problem? -- View this message in context: http://lucene.472066.n3.nabble.com/Split-token-tp2810772p2810772.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: partial optimize does not reduce the segment number to maxNumSegments
thanks! It seems the file count in the index directory is the segment# * 8 in my dev environment... I see there are .fnm .frq .fdt .fdx .nrm .prx .tii .tis (8) file extensions, and each has as many files as there are segments. Is it always safe to calculate the file count as the segment number multiplied by 8? Of course this excludes the segments_N, segments.gen and xxx_del files. I found most of the cores have a file count that can be calculated using the above formula, but a few cores do not match... thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2813419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: partial optimize does not reduce the segment number to maxNumSegments
yeah, I can figure out the segment number by going to the stats page of Solr... but my question was how to figure out the exact total number of files in the 'index' folder for each core. Like I mentioned in my previous message, I currently have 8 files per segment (.prx, .tii, etc.), but it seems this might change if I use term vectors, for example. So I need suggestions on how to accurately figure out the total file number. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2817912.html Sent from the Solr - User mailing list archive at Nabble.com.
most stable way to get facet pivoting
Hi, I want to evaluate (and probably use in production) facet pivoting - what is the best approach to get an "as-stable-as-can-be" version of Solr which is able to do facet pivoting? I was hoping to see this in Solr 3.1, but apparently it is only in the dev versions/nightlies... Is it possible to patch this feature into Solr 3.1 stable? best regards, Nik -- Nikolas Tautenhahn nikolas.tautenh...@livinglogic.de http://www.livinglogic.de LivingLogic AG Markgrafenallee 44 95448 Bayreuth Amtsgericht Bayreuth ++ HRB 3274 Aufsichtsratsvorsitzender: Achim Lindner Vorstand: Philipp Ambrosch, Alois Kastner-Maresch (Vors.)
How to combine Deduplication and Elevation
Hi, I have a question: how do I combine the Deduplication and Elevation implementations in Solr? Currently I have only managed to implement one or the other. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-combine-Deduplication-and-Elevation-tp2819621p2819621.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 3.1.0 core not reloading with RamDirectoryFactory
Hello, We just tried core reloading on a freshly installed Solr 3.1.0 with RAMDirectoryFactory. It doesn't seem to happen. With the FSDirectoryFactory everything works fine. It looks like the RAMDirectoryFactory implementation caches the directory and, if one is already available, doesn't really reopen it, so the updated index is never loaded into memory. Can anyone comment on this? Should we implement our own RAMDirectoryFactory? Here is the code snippet from Solr 3.1.0. It looks a bit confusing.

public Directory open(String path) throws IOException {
  synchronized (RAMDirectoryFactory.class) {
    RefCntRamDirectory directory = directories.get(path);
    if (directory == null || !directory.isOpen()) {
      directory = (RefCntRamDirectory) openNew(path);
      directories.put(path, directory);
    } else {
      // an already-open directory for this path is reused as-is,
      // so a core reload never re-reads the index from disk
      directory.incRef();
    }
    return directory;
  }
}

Regards, Dmitry -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-1-0-core-not-reloading-with-RamDirectoryFactory-tp2820603p2820603.html Sent from the Solr - User mailing list archive at Nabble.com.
Avoiding corrupted index
Hi everyone, We are using Solr 1.4.1 in my company and we need to take backups of the indexes. After some googling, I'm quite confused about the different ways of backing up the index. First, I tried the scripts provided in the Solr distribution, without success: I untarred apache-solr-1.4.1.tar.gz into /opt and then launched the backup script, but I get this error:

$ /opt/apache-solr-1.4.1/src/scripts/backup
/opt/apache-solr-1.4.1/src/scripts/backup: line 26: /opt/apache-solr-1.4.1/src/bin/scripts-util: No such file or directory

And that's true: there is no /opt/apache-solr-1.4.1/src/bin/scripts-util, but there is a /opt/apache-solr-1.4.1/src/scripts/scripts-util. Is it normal to distribute the scripts with a bad path? Then I discovered that these utility scripts are no longer distributed with version 3.1.0: were they not reliable? Can we get corrupted backups with these scripts? Finally, we found the page about SolrReplication on the Solr wiki and also this post http://stackoverflow.com/questions/3083314/solr-incremental-backup-on-real-time-system-with-heavy-index and in particular the answer advising to use the replication. So we tried to use this replication mechanism (and call the URL on the slave with the query parameters command="backup" and location="/backup"), but this method requires lots of i/o for a big index. Is it the best way to get an uncorrupted backup of the index? Is there another way to do the backup with Solr 3.1? Thanks in advance for your time. Regards, Laurent
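For reference, the slave-side backup call described above looks roughly like this (host, port and target directory are placeholders):

http://slave-host:8983/solr/replication?command=backup&location=/backup

The backup command snapshots the index as of the latest commit point, which is why it avoids the corruption risk of copying files while indexing; the price, as noted, is the extra i/o of copying the whole index.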
Indexing relations for sorting
Hi everybody, I have the following problem/question: In our system we have some categories and products in those categories. Our structure looks a bit like this:

product X belongs to category: cat1_subcat1 (10)
product X belongs to category: cat2_subcat1 (20)
product Y belongs to category: cat1_subcat2 (30)
product Z belongs to category: cat2_subcat1 (15)

Every product-to-category relation has its own sorting order, which we would like to index in Solr. To make the problem more complex, we have two ways of searching for a product:

We want all products of subcat1 (no matter what the parent category is) ordered by their sorting order
We want all products of cat2_subcat1 ordered by their sorting order

This probably is not what Solr is designed for, but everything else in our system is indexed and searched by Solr. So it would be very helpful if someone has an idea or suggestion to make this work. Our Solr version is 1.3.0 Many thanks! Derk -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-relations-for-sorting-tp2824223p2824223.html Sent from the Solr - User mailing list archive at Nabble.com.
how to import data from database combine with file content in solr
Hello, I am new to Solr. My requirements are:

1. At regular intervals Solr needs to fetch data from a SQL Server database and index it.
2. Fetch only those records which have not yet been indexed.
3. For each record there is one associated file, so along with the database table fields I also want to index the content of that file. E.g. there is a table "Customer" in the database and customerid is its primary key; for each customerid there is an associated customer-profile file named after the customerid.
4. As mentioned above, when Solr fetches data from the SQL Server table it should fetch only data which has not yet been indexed. (We have some older Lucene code in which there is a field in the table, isindexed, so the select clause has the condition isindexed=false, and when indexing is done the particular record is updated with isindexed=true.) Is there any mechanism in Solr for that?

How do I achieve this? Do I need to write custom code for it, or can it be done with the configuration provided by Solr?

Thanks, Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-import-data-from-database-combine-with-file-content-in-solr-tp2824749p2824749.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using autocomplete with the new Suggest component
Hi Victor, I have the same questions about the new Suggest component. I can't really help you as I didn't really manage to understand how it worked. Sometimes, I had more results, sometimes less. Even so, I would really be interested in your resources using Terms and shingles to implement auto-complete. I am myself a French student and it could help me improve the solution of one of my project. Best regards, Quentin 2011/4/15 openvictor Open > Hi everybody, > > > Recently I implemented an autocomplete mechanism for my website using a > custom TermsComponent. I was quite happy with that because it also enables > me to do a Google-like feature where complete sentences where suggested to > the user when he typed in the search field. I used Shingles to search > against pieces of sentences. > (I have resources for French people if somebody asks) > > Then came solr 3.1 and its new suggest component. I have looked at the > documentation but it's still unclear how it works exactly. So please let me > ask some questions : > > > - Is there performance improvements over TermsComponent ? > - Is it able to autosuggest sentences and not only words ? If yes, how ? > Should I keep my shingles ? > - What is this "threshold" value that I see ? Is it a mandatory field to > complete ? I want to have suggestion no matter what the frequency is in > the > document ! > > > Thank you all, if I succeed to do that I will try to provide a tutorial to > do what with Jquery UI autocomplete + Suggest component if anyone's > interested. > Best regards. > > Victor > -- Quentin Proust Email : q.pro...@gmail.com Tel : 06.78.81.15.94 http://www.linkedin.com/in/quentinproust
Re: Split token
What you've shown would be handled with WhitespaceTokenizer, but you'd have to prevent filters from stripping the parens. If you have to handle things like blah ( stuff ) WhitespaceTokenizer wouldn't work. PatternTokenizerFactory might work for you, see: http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html Best Erick On Tue, Apr 12, 2011 at 6:02 AM, roySolr wrote: > Hello, > > I want to split my string when it contains "(". Example: > > spurs (London) > Internationale (milan) > > to > > spurs > (london) > Internationale > (milan) > > What tokenizer can i use to fix this problem? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Split-token-tp2810772p2810772.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Using autocomplete with the new Suggest component
Hi Quentin, well stick in this thread, I will try to see how it works and get inputs from other people. Here is the link to my blog who shows how to do it : http://www.victorkabdebon.net/archives/16 Note that I used Tomcat + SolR, but it can easily done with PHP. Also solrj in 1.4.1 didn't have terms component so I had to find a way around that problem but it's provided. 2011/4/15 Quentin Proust > Hi Victor, > > I have the same questions about the new Suggest component. > I can't really help you as I didn't really manage to understand how it > worked. > Sometimes, I had more results, sometimes less. > > Even so, I would really be interested in your resources using Terms and > shingles to implement auto-complete. > I am myself a French student and it could help me improve the solution of > one of my project. > > Best regards, > Quentin > > 2011/4/15 openvictor Open > > > Hi everybody, > > > > > > Recently I implemented an autocomplete mechanism for my website using a > > custom TermsComponent. I was quite happy with that because it also > enables > > me to do a Google-like feature where complete sentences where suggested > to > > the user when he typed in the search field. I used Shingles to search > > against pieces of sentences. > > (I have resources for French people if somebody asks) > > > > Then came solr 3.1 and its new suggest component. I have looked at the > > documentation but it's still unclear how it works exactly. So please let > me > > ask some questions : > > > > > > - Is there performance improvements over TermsComponent ? > > - Is it able to autosuggest sentences and not only words ? If yes, how > ? > > Should I keep my shingles ? > > - What is this "threshold" value that I see ? Is it a mandatory field > to > > complete ? I want to have suggestion no matter what the frequency is in > > the > > document ! > > > > > > Thank you all, if I succeed to do that I will try to provide a tutorial > to > > do what with Jquery UI autocomplete + Suggest component if anyone's > > interested. > > Best regards. > > > > Victor > > > > > > -- > > Quentin Proust > Email : q.pro...@gmail.com > Tel : 06.78.81.15.94 > http://www.linkedin.com/in/quentinproust > >
Re: partial optimize does not reduce the segment number to maxNumSegments
Why do you care? You haven't outlined why having the precise numbers here is necessary. Perhaps with a higher-level statement of the problem you're trying to solve we could make some better suggestions Best Erick On Wed, Apr 13, 2011 at 5:23 PM, Renee Sun wrote: > yeah, I can figure out the segment number by going to stat page of solr... > but my question was how to figure out exact total number of files in > 'index' > folder for each core. > > Like I mentioned in previous message, I currently have 8 files per segment > (.prx .tii etc), but it seems this might change if I use term vector for > example. So I need suggestions on how to accurately figure out the total > file number. > > thanks > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2817912.html > Sent from the Solr - User mailing list archive at Nabble.com. >
RE: Split token
This pattern splits tokens *only* in the presence of parentheses with adjoining whitespace, and includes the parentheses with the tokens:

(?<=\))\s+|\s+(?=\()

So you'll get this kind of behavior:

Tottenham Hotspur (London) F.C.
Internationale (milan)
FC Midtjylland (Herning) (Ikast)

to

Tottenham Hotspur
(London)
F.C.
Internationale
(milan)
FC Midtjylland
(Herning)
(Ikast)

Steve > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Friday, April 15, 2011 1:51 PM > To: solr-user@lucene.apache.org > Subject: Re: Split token > > What you've shown would be handled with WhitespaceTokenizer, but you'd > have > to > prevent filters from stripping the parens. If you have to handle things > like > blah ( stuff ) > WhitespaceTokenizer wouldn't work. > > PatternTokenizerFactory might work for you, see: > http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokeniz > erFactory.html > > Best > Erick > > On Tue, Apr 12, 2011 at 6:02 AM, roySolr wrote: > > > Hello, > > > > I want to split my string when it contains "(". Example: > > > > spurs (London) > > Internationale (milan) > > > > to > > > > spurs > > (london) > > Internationale > > (milan) > > > > What tokenizer can i use to fix this problem? > > > > -- > > View this message in context: > > http://lucene.472066.n3.nabble.com/Split-token-tp2810772p2810772.html > > Sent from the Solr - User mailing list archive at Nabble.com. > >
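For completeness, a schema.xml field type wired up with this pattern might look roughly like the following (the type name is made up; note that "<" inside the XML attribute value has to be escaped as &lt;):

<fieldType name="parenSplit" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="(?&lt;=\))\s+|\s+(?=\()"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With no group attribute the tokenizer splits on the pattern (like String.split), which is the behavior shown above.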
Re: how to import data from database combine with file content in solr
Sorry if this comes through twice, but my first got rejected (this one is plain text, should come through better). Part of this is solved by the Data Import Handler (DIH) see: http://wiki.apache.org/solr/DataImportHandler And think about a "database" data source. This can be combined with the "TikaEntityParser", and maybe some transformers to assemble the file name and send it through parsing. Don't overlook the possibility of parameters (the ${ reference pattern). If you need some custom code, you can also implement a custom Transformer that gets into the transformation chain in DIH, but you should only approach that after you exhaust the above approach. Hope this helps Erick On Fri, Apr 15, 2011 at 10:24 AM, vrpar...@gmail.com wrote: > > Hello, > > I am new to solr, > > my requirements are, > > 1. at regular interval need solr to fetch data from sql server database and > do indexing on it. > 2. fetch only those records which is not yet indexed > 3. for each record there is one file associated, so with database table > fields also want to index content of that particular file > > e.g. there is one table "Customer" in database and customerid is primary key > for each customerid there is associated file of that customerprofile > named with customerid, > > 4. as i metioned above that when solr fetch data from sql server database > table , should fetch only data which is not yet indexed, (we have one older > lucene code, in which there is one field in table that isindexed so when > fetching data in select clause there is one condition that isindexed=false, > and when indexing is done update particular record of database with > isindexed=true) is there any mechanism in solr for that? > > how to achieve same ? > do i need to write custom code for that or it can be done with configuration > provided by solr? > > Thanks, > > Vishal Parekh > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/how-to-import-data-from-database-combine-with-file-content-in-solr-tp2824749p2824749.html > Sent from the Solr - User mailing list archive at Nabble.com.
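To flesh out Erick's suggestion a little: the Tika entity processor he refers to is TikaEntityProcessor (in the dataimporthandler-extras contrib), and a data-config.xml combining a database entity with a per-row file entity might look roughly like this. Everything here is a sketch: the driver, connection string, table, column names and file path layout are assumptions about the original poster's setup, not something Solr prescribes.

<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=mydb"
              user="user" password="pass"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- only rows not yet flagged as indexed, mirroring the old Lucene approach -->
    <entity name="customer" dataSource="db"
            query="select customerid, name from Customer where isindexed = 0">
      <field column="customerid" name="id"/>
      <field column="name" name="name"/>
      <!-- one profile file per row, located via the ${...} reference pattern -->
      <entity name="profile" dataSource="bin" processor="TikaEntityProcessor"
              url="/data/profiles/${customer.customerid}.doc" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Flipping the isindexed flag back in the database after the import is not something DIH does for you; that part would still need custom code or a post-import step.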
Re: Solr 3.1: Old Index Files Not Removed on Optimize?
I can reproduce this with the example server w/ your deletionPolicy and replicationHandler configs. I'll dig further to see what's behind this behavior. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco On Fri, Apr 15, 2011 at 1:14 PM, Trey Grainger wrote: > I was just hoping someone might be able to point me in the right direction > here. We just upgraded from Solr 1.4 to Solr 3.1 this past week and we're > having issues running out of disk space on our Master servers. Our Master > has dozens of cores. We have a script that kicks off once per day to do a > rolling optimize. The script optimizes a single core, waits 5 minutes to > give the server some breathing room to catch up on indexing in a non-i/o > intensive state, and then moves onto the next core (repeating until done). > > The problem we are facing is that under Solr 1.4, the old index files were > deleted very quickly after each optimize, but under Solr 3.1, the old index > files hang around for hours... in many cases they don't disappear until we > restart Solr completely. This is leading to us running out of disk space, > as each core's index doubles in size during the optimize process and stays > that way until the next solr restart. > > I was just wondering if anyone could point me to some specific changes or > settings which may be leading to the difference between solr versions (or > any other environmental issues you may know about). I see several tickets > in Jira about similar issues, but they mostly appear to have been resolved > in the past. > > Has anyone else seen this behavior under Solr 3.1, or do you think we may be > missing some kind of new configuration setting? > > For reference, we are running on 64bit RedHat Linux. This is what I have > right now: [From SolrConfig.xml]: > true > > > > commit > optimize > startup > > > > > > 10 > 30 > > > > > false > 1 > > > > Thanks in advance, > > -Trey >
Re: partial optimize does not reduce the segment number to maxNumSegments
sorry, I should have elaborated on that earlier... In our production environment we have multiple cores and they ingest continuously all day long; we only optimize periodically, once a day at midnight. So sometimes we see 'too many open files' errors. To prevent that from happening, in production we maintain a script that monitors the total number of segment files across all cores and sends out warnings if that number exceeds a threshold... it is a kind of preventive measure. Currently we are using a Linux command to count the files. We are wondering if we could simply use a formula to figure out this number instead; that would be better. It seems we could use the stats URL to get the segment number and multiply it by 8 (that is what we get, given our schema). Any better way to approach this? thanks a lot! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2825736.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: QUESTION: SOLR INDEX BIG FILE SIZES
Hi John, ¿How can split the file of the solr index into multiple files? > Actually, the index is organized in a set of files called segments. It's not just a single file, unless you tell Solr to do so. That's because some "file systems are about to support a maximun > of space in a single file" for example some UNIX file systems only support > a maximun of 2GB per file. > As far as I know, Solr will never arrive to a segment file greater than 2GB, so this shouldn't be a problem. ¿What is the recommended storage strategy for a big solr index files? > I guess that it depends in the indexing/querying performance that you're having, the performance that you want, and what "big" exactly means for you. If your index is so big that individual queries take too long, sharding may be what you're looking for. To better understand the index format, you can see http://lucene.apache.org/java/3_1_0/fileformats.html Also, you can take a look at my blog (http://juanggrande.wordpress.com), in my last post I speak about segments merging. Regards, *Juan* 2011/4/15 JOHN JAIRO GÓMEZ LAVERDE > > SOLR > USER SUPPORT TEAM > > I have a quiestion about the "maximun file size of solr index", > when i have a "lot of data in the solr index", > > -¿How can split the file of the solr index into multiple files? > > That's because some "file systems are about to support a maximun > of space in a single file" for example some UNIX file systems only support > a maximun of 2GB per file. > > -¿What is the recommended storage strategy for a big solr index files? > > Thanks for the reply. > > JOHN JAIRO GÓMEZ LAVERDE > Bogotá - Colombia - South America
Re: Solr 3.1: Old Index Files Not Removed on Optimize?
Thank you, Yonik! I see the Jira issue you created and am guessing it's due to this issue. We're going to remove replicateAfter="startup" in the mean-time to see if that helps (assuming this is the issue the jira ticket described). I appreciate you taking a look at this. Thanks -Trey On Fri, Apr 15, 2011 at 2:58 PM, Yonik Seeley wrote: > I can reproduce this with the example server w/ your deletionPolicy > and replicationHandler configs. > I'll dig further to see what's behind this behavior. > > -Yonik > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May > 25-26, San Francisco > > On Fri, Apr 15, 2011 at 1:14 PM, Trey Grainger wrote: > > I was just hoping someone might be able to point me in the right > direction > > here. We just upgraded from Solr 1.4 to Solr 3.1 this past week and > we're > > having issues running out of disk space on our Master servers. Our > Master > > has dozens of cores. We have a script that kicks off once per day to do > a > > rolling optimize. The script optimizes a single core, waits 5 minutes to > > give the server some breathing room to catch up on indexing in a non-i/o > > intensive state, and then moves onto the next core (repeating until > done). > > > > The problem we are facing is that under Solr 1.4, the old index files > were > > deleted very quickly after each optimize, but under Solr 3.1, the old > index > > files hang around for hours... in many cases they don't disappear until > we > > restart Solr completely. This is leading to us running out of disk > space, > > as each core's index doubles in size during the optimize process and > stays > > that way until the next solr restart. > > > > I was just wondering if anyone could point me to some specific changes or > > settings which may be leading to the difference between solr versions (or > > any other environmental issues you may know about). I see several > tickets > > in Jira about similar issues, but they mostly appear to have been > resolved > > in the past. > > > > Has anyone else seen this behavior under Solr 3.1, or do you think we may > be > > missing some kind of new configuration setting? > > > > For reference, we are running on 64bit RedHat Linux. This is what I have > > right now: [From SolrConfig.xml]: > > true > > > > > > > >commit > >optimize > >startup > > > > > > > > > > > > 10 > > 30 > > > > > > > > > > false > > 1 > > > > > > > > Thanks in advance, > > > > -Trey > > >
Re: QUESTION: SOLR INDEX BIG FILE SIZES
Specifically to the file size support, all the file systems on current releases of linux (and unixes too) support large files with 64 bit offsets, and I am pretty sure that java VM supports 64 bit offsets in files, so there is no 2GB file size limit anymore. François On Apr 15, 2011, at 4:31 PM, JOHN JAIRO GÓMEZ LAVERDE wrote: > > SOLR > USER SUPPORT TEAM > > I have a quiestion about the "maximun file size of solr index", > when i have a "lot of data in the solr index", > > -¿How can split the file of the solr index into multiple files? > > That's because some "file systems are about to support a maximun > of space in a single file" for example some UNIX file systems only support > a maximun of 2GB per file. > > -¿What is the recommended storage strategy for a big solr index files? > > Thanks for the reply. > > JOHN JAIRO GÓMEZ LAVERDE > Bogotá - Colombia - South America
Re: Understanding the DisMax tie parameter
Looks good, thanks Tom. -Jay On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom wrote: > Thanks everyone. > > I updated the wiki. If you have a chance please take a look and check to > make sure I got it right on the wiki. > > http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.29 > > Tom > > > > -Original Message- > From: Chris Hostetter [mailto:hossman_luc...@fucit.org] > Sent: Thursday, April 14, 2011 5:41 PM > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Cc: Burton-West, Tom > Subject: Re: Understanding the DisMax tie parameter > > > : Perhaps the parameter could have had a better name. It's essentially > : max(score of matching clauses) + tie * (score of matching clauses that > : are not the max) > : > : So it can be used and thought of as a tiebreak only in the sense that > : if two docs match a clause (with essentially the same score), then a > : small tie value will act as a tiebreaker *if* one of those docs also > : matches some other fields. > > correct. w/o a tiebreaker value, a dismax query will only look at the > maximum scoring clause for each doc -- the "tie" param is named for it's > ability to help break ties when multiple documents have the same score > from the max scoring clause -- by adding in a small portion of the scores > (based on the 0->1 ratio of the "tie" param) from the other clauses. > > > -Hoss >
Re: Solr 3.1: Old Index Files Not Removed on Optimize?
On Fri, Apr 15, 2011 at 5:28 PM, Trey Grainger wrote: > Thank you, Yonik! > I see the Jira issue you created and am guessing it's due to this issue. > We're going to remove replicateAfter="startup" in the mean-time to see if > that helps (assuming this is the issue the jira ticket described). Yes, removing replicateAfter="startup" will avoid this bug. https://issues.apache.org/jira/browse/SOLR-2469 fixes the bug, if you need to replicate after startup. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
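Until that fix is in a release, a master-side replication config along these lines avoids the problem (a sketch based on the stock example solrconfig.xml, trimmed down):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <!-- omit <str name="replicateAfter">startup</str> until SOLR-2469 is applied -->
  </lst>
</requestHandler>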