Re: SolrCloud different score for same document on different replicas.
On Fri, 6 Jan 2017 10:45:02 -0600 Webster Homer wrote: > I was seeing something like this, and it turned out to be a problem with > our autoCommit and autoSoftCommit settings. We had overly aggressive > settings that eventually started failing with errors around too many > warming searchers etc... > > You can test this by doing a commit and seeing if the replicas start > returning consistent results > Commit changes nothing, since the number of deleted documents doesn't change much. Optimize makes ranking consistent over replicas for the time being, until too many updates have hit the shard and the number of deleted documents grows in the largest segment (it takes some time to prune due to a merge). Optimizing hourly is not really an option. > On Thu, Jan 5, 2017 at 10:31 AM, Charlie Hull wrote: > > > On 05/01/2017 13:30, Morten Bøgeskov wrote: > > > >> > >> > >> Hi. > >> > >> We've got a SolrCloud which is sharded and has a replication factor of > >> 2. > >> > >> The 2 replicas of a shard may look like this: > >> > >> Num Docs:5401023 > >> Max Doc:6388614 > >> Deleted Docs:987591 > >> > >> > >> Num Docs:5401023 > >> Max Doc:5948122 > >> Deleted Docs:547099 > >> > >> We've seen >10% difference in Max Doc at times with same Num Docs. > >> Our use case is few documents that are search and many small that > >> are filtered against (often updated multiple times a day), so the > >> difference in deleted docs aren't surprising. > >> > >> This results in a different score for a document depending on which > >> replica it comes from. As I see it: it has to do with the different > >> maxDoc value when calculating idf. > >> > >> This in turn alters a specific document's position in the search > >> result over reloads. This is quite confusing (duplicates in pagination). > >> > >> What is the trick to get homogeneous score from different replicas. > >> We've tried using ExactStatsCache & ExactSharedStatsCache, but that > >> didn't seem to make any difference. > >> > >> Any hints to this will be greatly appreciated. > >> > >> > > This was one of things we looked at during our recent Lucene London > > Hackday (see item 3) https://github.com/flaxsearch/london-hackday-2016 > > > > I'm not sure there is a way to get a homogenous score - this patch tries > > to keep you connected to the same replica during a session so you don't see > > results jumping over pagination. > > > > Cheers > > > > Charlie > > > > > > -- > > Charlie Hull > > Flax - Open Source Enterprise Search > > > > tel/fax: +44 (0)8700 118334 > > mobile: +44 (0)7767 825828 > > web: www.flax.co.uk > > > -- Morten Bøgeskov
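For readers of this thread: the stats cache the original poster mentions is enabled with a one-line entry in solrconfig.xml, sketched below. It aligns term statistics across shards, so it is not surprising it did not help here, where the drift comes from the two replicas of the same shard carrying different maxDoc/deleted-document counts.

    <!-- in solrconfig.xml: use exact global stats for distributed IDF -->
    <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>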
Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL
For NOT NULL, I had some success using: WHERE field_name <> '' (greater or less than empty quotes) Best regards, Gethin. From: Joel Bernstein Sent: 05 January 2017 20:12:19 To: solr-user@lucene.apache.org Subject: Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL IS NULL and IS NOT NULL predicate are not currently supported. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jan 5, 2017 at 2:05 PM, radha krishnan wrote: > Hi, > > solr version : 6.3 > > will WHERE <> IS NULL / IS NOT NULL work with the /sql handler > ? > > " select name from gettingstarted where name is not null " > > the above query is not returning any documents in the response even if > there are documents with "name"defined > > > Thanks, > Radhakrishnan D >
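A minimal sketch of Gethin's workaround over HTTP, assuming the gettingstarted collection from the thread and the /sql handler enabled (the LIMIT is only there to keep the example small):

    curl --data-urlencode "stmt=SELECT name FROM gettingstarted WHERE name <> '' LIMIT 10" \
         http://localhost:8983/solr/gettingstarted/sql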
OnError CSV upload
Hi All, Background: I have a mainframe file that I want to upload and the data is pipe delimited. Some of the records, however, have a few fields fewer than others within the same file, and when I try to import the file, Solr has an issue with the number of columns vs. the number of values, which is correct. Is there not a way, using the standard CSV upload, to continue on error and perhaps get a log of the failed records? === GPAA e-mail Disclaimers and confidential note This e-mail is intended for the exclusive use of the addressee only. If you are not the intended recipient, you should not use the contents or disclose them to any other person. Please notify the sender immediately and delete the e-mail. This e-mail is not intended nor shall it be taken to create any legal relations, contractual or otherwise. Legally binding obligations can only arise for the GPAA by means of a written instrument signed by an authorised signatory. ===
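For reference, a sketch of posting a pipe-delimited file to the CSV handler (the collection and file names are assumptions; %7C is the URL-encoded pipe). As far as I know the CSV loader has no built-in continue-on-error switch, so short rows generally have to be filtered or padded before the upload:

    curl "http://localhost:8983/solr/mycollection/update?commit=true&separator=%7C&header=true" \
         -H "Content-Type: application/csv" --data-binary @mainframe_export.txt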
Help needed in breaking large index file into smaller ones
Hi All, My Solr server has a few large index files (say ~10G). I am looking for some help on breaking them into smaller ones (each < 4G) to satisfy my application requirements. Are there any such tools available? Appreciate your help. Thanks NRC
Help needed in breaking large solr index file into smaller ones
Hi All, My Solr server has a few large index files (say ~10G). I am looking for some help on breaking them into smaller ones (each < 4G) to satisfy my application requirements. Basically, I am not looking for any optimization of the index here (ex: optimize, expungeDeletes etc.). Are there any such tools available? Appreciate your help. Thanks NRC
Question about Lucene FieldCache
Hi, After some reading of the documentation, supposedly the Lucene FieldCache is the only one that is not possible to disable. Fetching the config for a collection through the REST API I found an entry like this: "query": { "useFilterForSortedQuery": true, "queryResultWindowSize": 1, "queryResultMaxDocsCached": 0, "enableLazyFieldLoading": true, "maxBooleanClauses": 8192, "": { "size": "1", "showItems": "-1", "initialSize": "10", "name": "fieldValueCache" } }, My questions: - Is that size (1) for all fields of the collection schema, or is it 1 for each field defined? - If I reload the collection, are the caches wiped? Regards, /Yago - Best regards /Yago -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about Lucene FieldCache
Hello, Yago. "size": "1", "showItems": "-1", "initialSize": "10", "name": "fieldValueCache" These are Solr's UnInvertedFields, not Lucene's FieldCache. That 1 is for all fields of the collection schema. Collection reload or commit drop all entries from this cache. On Mon, Jan 9, 2017 at 1:30 PM, Yago Riveiro wrote: > Hi, > > After some reading into the documentation, supposedly the Lucene FieldCache > is the only one that it's not possible to disable. > > Fetching the config for a collection through the REST API I found an entry > like this: > > "query": { > "useFilterForSortedQuery": true, > "queryResultWindowSize": 1, > "queryResultMaxDocsCached": 0, > "enableLazyFieldLoading": true, > "maxBooleanClauses": 8192, > "": { > "size": "1", > "showItems": "-1", > "initialSize": "10", > "name": "fieldValueCache" > } > }, > > My questions: > > - That size, 1 is for all files of the collection schema or is 1 > for > each field defined? > - If I reload the collection the caches are wiped? > > Regards, > > /Yago > > > > - > Best regards > > /Yago > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Question-about-Lucene-FieldCache-tp4313062.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Sincerely yours Mikhail Khludnev
Re: Question about Lucene FieldCache
Thanks for the reply, Mikhail. Do you know if the 1 value is configurable? My insert rate is so high (5000 docs/s) that the cache is quite useless. In the case of the Lucene field cache, is it possible to "clean" it in some way? Some cache is eating my memory heap. - Best regards /Yago -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help needed in breaking large solr index file into smaller ones
Hi All, I have a problem simillar to this one, where the indexes in multiple solr shards has created large index files (~10 GB each) and wanted to split this large file on each shard into smaller files. Please provide some guidelines. Thanks, Manan Sheth From: Narsimha Reddy CHALLA Sent: Monday, January 9, 2017 3:51 PM To: solr-user@lucene.apache.org Subject: Help needed in breaking large solr index file into smaller ones Hi All, My solr server has a few large index files (say ~10G). I am looking for some help on breaking them it into smaller ones (each < 4G) to satisfy my application requirements. Basically, I am not looking for any optimization of index here (ex: optimize, expungeDeletes etc.). Are there any such tools available? Appreciate your help. Thanks NRC NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
Re: Question about Lucene FieldCache
On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro wrote: > Thanks for re reply Mikhail, > > Do you know if the 1 value is configurable? yes. in solrconfig.xml https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches iirc you cant' fully disable it setting size to 0. > My insert rate is so high > (5000 docs/s) that the cache it's quite useless. > > In the case of the Lucene field cache, it's possible "clean" it in some > way? > > Even it would be possible, the first sorting query or so loads it back. > Some cache is eating my memory heap. > Probably you need to dedicate master which won't load FieldCache. > > > > - > Best regards > > /Yago > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Sincerely yours Mikhail Khludnev
term frequency solrj
Hi, Can anybody help me, I need to get term frequency for a specific filed, I use the techproduct example and I use this code: // import java.util.List; import org.apache.solr.client.solrj.SolrClient; import org.apache.solr.client.solrj.SolrQuery; import org.apache.solr.client.solrj.impl.HttpSolrClient; import org.apache.solr.client.solrj.response.QueryResponse; import org.apache.solr.client.solrj.response.TermsResponse; public class App3 { public static void main(String[] args) throws Exception { String urlString = "http://localhost:8983/solr/techproducts";; SolrClient solr = new HttpSolrClient.Builder(urlString).build(); SolrQuery query = new SolrQuery(); query.setQuery("*:*"); query.setRequestHandler("terms"); QueryResponse response = solr.query(query); System.out.println("numFound: " + response.getResults().getNumFound()); TermsResponse termResp =response.getTermsResponse(); List terms = termResp.getTerms("name"); System.out.print("size="+ terms.size()); } } I get the following error : Exception in thread "main" numFound: 32 java.lang.NullPointerException at testPkg.App3.main(App3.java:29) Thank you in advance,,, Huda
How to integrate SOLR in ibm filenet 5.2.1?
How can we integrate Solr with IBM FileNet 5.2? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-integrate-SOLR-in-ibm-filenet-5-2-1-tp4313090.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about Lucene FieldCache
The documentation says that the only caches configurable are: - filterCache - queryResultCache - documentCache - user defined caches There is no entry for fieldValueCache and in my case all of list in the documentation are disable ... -- /Yago Riveiro On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote: > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro wrote: > > > Thanks for re reply Mikhail, > > > > Do you know if the 1 value is configurable? > > yes. in solrconfig.xml > https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches > iirc you cant' fully disable it setting size to 0. > > > > My insert rate is so high > > (5000 docs/s) that the cache it's quite useless. > > > > In the case of the Lucene field cache, it's possible "clean" it in some > > way? > > > > Even it would be possible, the first sorting query or so loads it back. > > > Some cache is eating my memory heap. > > > Probably you need to dedicate master which won't load FieldCache. > > > > > > > > > > - > > Best regards > > > > /Yago > > -- > > View this message in context: http://lucene.472066.n3. > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > -- > Sincerely yours > Mikhail Khludnev
RE: How to integrate SOLR in ibm filenet 5.2.1?
Apache ManifoldCF is probably your friend here: http://manifoldcf.apache.org/en_US/index.html -Original message- > From:puneetmishra2555 > Sent: Monday 9th January 2017 14:37 > To: solr-user@lucene.apache.org > Subject: How to integrate SOLR in ibm filenet 5.2.1? > > How we can integrate SOLR in IBM filenet 5.2? > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-integrate-SOLR-in-ibm-filenet-5-2-1-tp4313090.html > Sent from the Solr - User mailing list archive at Nabble.com. >
RE: Help needed in breaking large index file into smaller ones
Hi, Try split on linux or unix split -l 100 originalfile.csv this will split a file into 100 lines each see other options for how to split like size -Original Message- From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] Sent: 09 January 2017 12:12 PM To: solr-user@lucene.apache.org Subject: Help needed in breaking large index file into smaller ones Hi All, My solr server has a few large index files (say ~10G). I am looking for some help on breaking them it into smaller ones (each < 4G) to satisfy my application requirements. Are there any such tools available? Appreciate your help. Thanks NRC === GPAA e-mail Disclaimers and confidential note This e-mail is intended for the exclusive use of the addressee only. If you are not the intended recipient, you should not use the contents or disclose them to any other person. Please notify the sender immediately and delete the e-mail. This e-mail is not intended nor shall it be taken to create any legal relations, contractual or otherwise. Legally binding obligations can only arise for the GPAA by means of a written instrument signed by an authorised signatory. ===
IndexWriter.forceMerge not working as desired
Hi All, While doing index merging through the IndexWriter.forceMerge method in Solr 6.2.1, I am passing the argument as 30, but it is still merging all the data (the collection used to have 10 segments) into a single segment. Please provide some information on understanding the behaviour. Thanks, Manan Sheth NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
Re: Help needed in breaking large index file into smaller ones
Is this really works for lucene index files? Thanks, Manan Sheth From: Moenieb Davids Sent: Monday, January 9, 2017 7:36 PM To: solr-user@lucene.apache.org Subject: RE: Help needed in breaking large index file into smaller ones Hi, Try split on linux or unix split -l 100 originalfile.csv this will split a file into 100 lines each see other options for how to split like size -Original Message- From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] Sent: 09 January 2017 12:12 PM To: solr-user@lucene.apache.org Subject: Help needed in breaking large index file into smaller ones Hi All, My solr server has a few large index files (say ~10G). I am looking for some help on breaking them it into smaller ones (each < 4G) to satisfy my application requirements. Are there any such tools available? Appreciate your help. Thanks NRC === GPAA e-mail Disclaimers and confidential note This e-mail is intended for the exclusive use of the addressee only. If you are not the intended recipient, you should not use the contents or disclose them to any other person. Please notify the sender immediately and delete the e-mail. This e-mail is not intended nor shall it be taken to create any legal relations, contractual or otherwise. Legally binding obligations can only arise for the GPAA by means of a written instrument signed by an authorised signatory. === NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
Re: Help needed in breaking large index file into smaller ones
No, it does not work by splitting. First of all lucene index files are not text files. There is a segment_NN file which will refer index files in a commit. So, when we split a large index file into smaller ones, the corresponding segment_NN file also needs to be updated with new index files OR a new segment_NN file should be created, probably. Can someone who is familiar with lucene index files please help us in this regard? Thanks NRC On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth wrote: > Is this really works for lucene index files? > > Thanks, > Manan Sheth > > From: Moenieb Davids > Sent: Monday, January 9, 2017 7:36 PM > To: solr-user@lucene.apache.org > Subject: RE: Help needed in breaking large index file into smaller ones > > Hi, > > Try split on linux or unix > > split -l 100 originalfile.csv > this will split a file into 100 lines each > > see other options for how to split like size > > > -Original Message- > From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] > Sent: 09 January 2017 12:12 PM > To: solr-user@lucene.apache.org > Subject: Help needed in breaking large index file into smaller ones > > Hi All, > > My solr server has a few large index files (say ~10G). I am looking > for some help on breaking them it into smaller ones (each < 4G) to satisfy > my application requirements. Are there any such tools available? > > Appreciate your help. > > Thanks > NRC > > > > > > > > > > > > === > GPAA e-mail Disclaimers and confidential note > > This e-mail is intended for the exclusive use of the addressee only. > If you are not the intended recipient, you should not use the contents > or disclose them to any other person. Please notify the sender immediately > and delete the e-mail. This e-mail is not intended nor > shall it be taken to create any legal relations, contractual or otherwise. > Legally binding obligations can only arise for the GPAA by means of > a written instrument signed by an authorised signatory. > > === > > > > > > > > > NOTE: This message may contain information that is confidential, > proprietary, privileged or otherwise protected by law. The message is > intended solely for the named addressee. If received in error, please > destroy and notify the sender. Any use of this email is prohibited when > received in error. Impetus does not represent, warrant and/or guarantee, > that the integrity of this communication has been maintained nor that the > communication is free of errors, virus, interception or interference. >
Re: Help needed in breaking large index file into smaller ones
You can try to reindex your data to another collection with more shards -- /Yago Riveiro On 9 Jan 2017 14:15 +, Narsimha Reddy CHALLA , wrote: > No, it does not work by splitting. First of all lucene index files are not > text files. There is a segment_NN file which will refer index files in a > commit. So, when we split a large index file into smaller ones, the > corresponding segment_NN file also needs to be updated with new index files > OR a new segment_NN file should be created, probably. > > Can someone who is familiar with lucene index files please help us in this > regard? > > Thanks > NRC > > On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth wrote: > > > Is this really works for lucene index files? > > > > Thanks, > > Manan Sheth > > > > From: Moenieb Davids > Sent: Monday, January 9, 2017 7:36 PM > > To: solr-user@lucene.apache.org > > Subject: RE: Help needed in breaking large index file into smaller ones > > > > Hi, > > > > Try split on linux or unix > > > > split -l 100 originalfile.csv > > this will split a file into 100 lines each > > > > see other options for how to split like size > > > > > > -Original Message- > > From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] > > Sent: 09 January 2017 12:12 PM > > To: solr-user@lucene.apache.org > > Subject: Help needed in breaking large index file into smaller ones > > > > Hi All, > > > > My solr server has a few large index files (say ~10G). I am looking > > for some help on breaking them it into smaller ones (each < 4G) to satisfy > > my application requirements. Are there any such tools available? > > > > Appreciate your help. > > > > Thanks > > NRC > > > > > > > > > > > > > > > > > > > > > > > > === > > GPAA e-mail Disclaimers and confidential note > > > > This e-mail is intended for the exclusive use of the addressee only. > > If you are not the intended recipient, you should not use the contents > > or disclose them to any other person. Please notify the sender immediately > > and delete the e-mail. This e-mail is not intended nor > > shall it be taken to create any legal relations, contractual or otherwise. > > Legally binding obligations can only arise for the GPAA by means of > > a written instrument signed by an authorised signatory. > > > > === > > > > > > > > > > > > > > > > > > NOTE: This message may contain information that is confidential, > > proprietary, privileged or otherwise protected by law. The message is > > intended solely for the named addressee. If received in error, please > > destroy and notify the sender. Any use of this email is prohibited when > > received in error. Impetus does not represent, warrant and/or guarantee, > > that the integrity of this communication has been maintained nor that the > > communication is free of errors, virus, interception or interference. > >
Re: SolrCloud and LVM
That's good to hear. I didn't think there would be any reason that using lvm would impact solr's performance but wanted to see if there was anything I've missed. As far as other performance goes, we use pcie and sata solid state drives since the indexes are mostly too large to cache entirely in memory, and we haven't had any performance problems so far. So I'm not expecting that to change too much when moving the cloud architecture. Thanks again. On Thu, Jan 5, 2017 at 7:55 PM Shawn Heisey wrote: > On 1/5/2017 3:12 PM, Chris Ulicny wrote: > > Is there any known significant performance impact of running solrcloud > with > > lvm on linux? > > > > While migrating to solrcloud we don't have the storage capacity for our > > expected final size, so we are planning on setting up the solrcloud > > instances on a logical volume that we can grow when hardware becomes > > available. > > Nothing specific. Whatever the general performance impacts for LVM are > is what Solr would encounter when it reads and writes data to/from the > disk. > > If your system has enough memory for good performance, then disk reads > will be rare, so the performance of the storage volume wouldn't matter > much. If you don't have enough memory, then the disk performance would > matter ...although Solr's performance at that point would probably be > bad enough that you'd be looking for ways to improve it. > > Here's some information: > > https://wiki.apache.org/solr/SolrPerformanceProblems > > Exactly how much memory is enough depends on enough factors that there's > no good general advice. The only thing we can say in general is to > recommend the ideal setup -- where you have enough spare memory that > your OS can cache the ENTIRE index. The ideal setup is usually not > required for good performance. > > Thanks, > Shawn > >
CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties
Hi, Using the CloudSolrStream, is it possible to define the setZkConnectTimeout and setZkClientTimeout of the internal CloudSolrClient? The default negotiation timeout is set to 10 seconds. Regards, /Yago - Best regards /Yago -- View this message in context: http://lucene.472066.n3.nabble.com/CloudSolrStream-can-t-set-the-setZkClientTimeout-and-setZkConnectTimeout-properties-tp4313127.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CDCR How to recover from Corrupted transaction log
The root cause was the aggressive logging filling up the file system. Our admins have the logs on the same file system with the data, so when the filesystem got full it couldn't write to the transaction logs which corrupted them Thank you for the tips on recovery, I will forward them to our admins Web On Fri, Jan 6, 2017 at 12:43 PM, Shawn Heisey wrote: > On 1/6/2017 10:09 AM, Webster Homer wrote: > > This happened while testing and was not in a production system. So we > > just deleted both collections and recreated them after fixing the root > > cause. If this had been a production system that would not have been > > acceptable. What is the best way to recover from a problem like this? > > Stop cdcr and delete the tlog files? > > What was the root cause? Need to know that before anyone can tell you > whether or not you've run into a bug. > > If it was the problem you've separately described where CDCR logging > filled up your disk ... handling that gracefully in a program is very > difficult. It's possible, but there's very little incentive for anyone > to attempt it. Lucene and Solr have a general requirement of plenty of > free disk space (enough for the index to triple in size temporarily) > just for normal operation, so coding for disk space exhaustion isn't > likely to happen. Server monitoring should send an alarm when disk > space gets low so you can fix it before it causes real problems. > > Thanks, > Shawn > > -- This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
Re: Help needed in breaking large index file into smaller ones
Can you set Solr config segments to a higher number, don't optimize and you will get smaller files after a new index is created. Can you reindex ? Bill Bell Sent from mobile > On Jan 9, 2017, at 7:15 AM, Narsimha Reddy CHALLA > wrote: > > No, it does not work by splitting. First of all lucene index files are not > text files. There is a segment_NN file which will refer index files in a > commit. So, when we split a large index file into smaller ones, the > corresponding segment_NN file also needs to be updated with new index files > OR a new segment_NN file should be created, probably. > > Can someone who is familiar with lucene index files please help us in this > regard? > > Thanks > NRC > > On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth > wrote: > >> Is this really works for lucene index files? >> >> Thanks, >> Manan Sheth >> >> From: Moenieb Davids >> Sent: Monday, January 9, 2017 7:36 PM >> To: solr-user@lucene.apache.org >> Subject: RE: Help needed in breaking large index file into smaller ones >> >> Hi, >> >> Try split on linux or unix >> >> split -l 100 originalfile.csv >> this will split a file into 100 lines each >> >> see other options for how to split like size >> >> >> -Original Message- >> From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] >> Sent: 09 January 2017 12:12 PM >> To: solr-user@lucene.apache.org >> Subject: Help needed in breaking large index file into smaller ones >> >> Hi All, >> >> My solr server has a few large index files (say ~10G). I am looking >> for some help on breaking them it into smaller ones (each < 4G) to satisfy >> my application requirements. Are there any such tools available? >> >> Appreciate your help. >> >> Thanks >> NRC >> >> >> >> >> >> >> >> >> >> >> >> === >> GPAA e-mail Disclaimers and confidential note >> >> This e-mail is intended for the exclusive use of the addressee only. >> If you are not the intended recipient, you should not use the contents >> or disclose them to any other person. Please notify the sender immediately >> and delete the e-mail. This e-mail is not intended nor >> shall it be taken to create any legal relations, contractual or otherwise. >> Legally binding obligations can only arise for the GPAA by means of >> a written instrument signed by an authorised signatory. >> >> === >> >> >> >> >> >> >> >> >> NOTE: This message may contain information that is confidential, >> proprietary, privileged or otherwise protected by law. The message is >> intended solely for the named addressee. If received in error, please >> destroy and notify the sender. Any use of this email is prohibited when >> received in error. Impetus does not represent, warrant and/or guarantee, >> that the integrity of this communication has been maintained nor that the >> communication is free of errors, virus, interception or interference. >>
Re: Question about Lucene FieldCache
Try disabling and perf may get better Bill Bell Sent from mobile > On Jan 9, 2017, at 6:41 AM, Yago Riveiro wrote: > > The documentation says that the only caches configurable are: > > - filterCache > - queryResultCache > - documentCache > - user defined caches > > There is no entry for fieldValueCache and in my case all of list in the > documentation are disable ... > > -- > > /Yago Riveiro > >> On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote: >>> On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro wrote: >>> >>> Thanks for re reply Mikhail, >>> >>> Do you know if the 1 value is configurable? >> >> yes. in solrconfig.xml >> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches >> iirc you cant' fully disable it setting size to 0. >> >> >>> My insert rate is so high >>> (5000 docs/s) that the cache it's quite useless. >>> >>> In the case of the Lucene field cache, it's possible "clean" it in some >>> way? >>> >>> Even it would be possible, the first sorting query or so loads it back. >> >>> Some cache is eating my memory heap. >>> >> Probably you need to dedicate master which won't load FieldCache. >> >> >>> >>> >>> >>> - >>> Best regards >>> >>> /Yago >>> -- >>> View this message in context: http://lucene.472066.n3. >>> nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >> >> >> -- >> Sincerely yours >> Mikhail Khludnev
Available
I am available for consulting projects if your project needs help. Been doing Solr work for 6 years... Bill Bell Sent from mobile
Facet date Range without start and and date
Hi All, Is it possible to have a facet date range without specifying the start and end of the range? Otherwise, is it possible, in the same request, to set the start to the min value and the end to the max value? Thank you. Regards, NKI.
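Since facet.range.start and facet.range.end are required parameters, a common two-step pattern is to ask the stats component for the bounds first and then plug them into the range facet. A sketch (field and collection names are assumptions):

    # 1. get min/max of the date field
    curl "http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&stats=true&stats.field=created_dt"

    # 2. use the returned min/max as the range bounds (+1MONTH is URL-encoded as %2B1MONTH)
    curl "http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.range=created_dt&facet.range.start=2016-01-01T00:00:00Z&facet.range.end=2017-01-01T00:00:00Z&facet.range.gap=%2B1MONTH"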
Re: SolrCloud and LVM
Yeah we normally take the number of GB on a machine for the index size on disk and then double it for memory... For example we have 28gb on disk and we see great perf at 64gb ram. If you can do that you will probably get good results. Remember to not give Java much memory. We set it at 12gb. We call it starving Java and it reduces the time to garbage collect to small increments. Bill Bell Sent from mobile > On Jan 9, 2017, at 7:56 AM, Chris Ulicny wrote: > > That's good to hear. I didn't think there would be any reason that using > lvm would impact solr's performance but wanted to see if there was anything > I've missed. > > As far as other performance goes, we use pcie and sata solid state drives > since the indexes are mostly too large to cache entirely in memory, and we > haven't had any performance problems so far. So I'm not expecting that to > change too much when moving the cloud architecture. > > Thanks again. > > >> On Thu, Jan 5, 2017 at 7:55 PM Shawn Heisey wrote: >> >>> On 1/5/2017 3:12 PM, Chris Ulicny wrote: >>> Is there any known significant performance impact of running solrcloud >> with >>> lvm on linux? >>> >>> While migrating to solrcloud we don't have the storage capacity for our >>> expected final size, so we are planning on setting up the solrcloud >>> instances on a logical volume that we can grow when hardware becomes >>> available. >> >> Nothing specific. Whatever the general performance impacts for LVM are >> is what Solr would encounter when it reads and writes data to/from the >> disk. >> >> If your system has enough memory for good performance, then disk reads >> will be rare, so the performance of the storage volume wouldn't matter >> much. If you don't have enough memory, then the disk performance would >> matter ...although Solr's performance at that point would probably be >> bad enough that you'd be looking for ways to improve it. >> >> Here's some information: >> >> https://wiki.apache.org/solr/SolrPerformanceProblems >> >> Exactly how much memory is enough depends on enough factors that there's >> no good general advice. The only thing we can say in general is to >> recommend the ideal setup -- where you have enough spare memory that >> your OS can cache the ENTIRE index. The ideal setup is usually not >> required for good performance. >> >> Thanks, >> Shawn >> >>
can we customize SOLR search for IBM Filenet 5.2?
can we customize SOLR search for IBM Filenet 5.2? -- View this message in context: http://lucene.472066.n3.nabble.com/can-we-customize-SOLR-search-for-IBM-Filenet-5-2-tp4313091.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Help needed in breaking large index file into smaller ones
Hi, Aplogies for my response, did not read the question properly. I was speaking about splitting files for import -Original Message- From: billnb...@gmail.com [mailto:billnb...@gmail.com] Sent: 09 January 2017 05:45 PM To: solr-user@lucene.apache.org Subject: Re: Help needed in breaking large index file into smaller ones Can you set Solr config segments to a higher number, don't optimize and you will get smaller files after a new index is created. Can you reindex ? Bill Bell Sent from mobile > On Jan 9, 2017, at 7:15 AM, Narsimha Reddy CHALLA > wrote: > > No, it does not work by splitting. First of all lucene index files are > not text files. There is a segment_NN file which will refer index > files in a commit. So, when we split a large index file into smaller > ones, the corresponding segment_NN file also needs to be updated with > new index files OR a new segment_NN file should be created, probably. > > Can someone who is familiar with lucene index files please help us in > this regard? > > Thanks > NRC > > On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth > > wrote: > >> Is this really works for lucene index files? >> >> Thanks, >> Manan Sheth >> >> From: Moenieb Davids >> Sent: Monday, January 9, 2017 7:36 PM >> To: solr-user@lucene.apache.org >> Subject: RE: Help needed in breaking large index file into smaller >> ones >> >> Hi, >> >> Try split on linux or unix >> >> split -l 100 originalfile.csv >> this will split a file into 100 lines each >> >> see other options for how to split like size >> >> >> -Original Message- >> From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] >> Sent: 09 January 2017 12:12 PM >> To: solr-user@lucene.apache.org >> Subject: Help needed in breaking large index file into smaller ones >> >> Hi All, >> >> My solr server has a few large index files (say ~10G). I am >> looking for some help on breaking them it into smaller ones (each < >> 4G) to satisfy my application requirements. Are there any such tools >> available? >> >> Appreciate your help. >> >> Thanks >> NRC >> >> >> >> >> >> >> >> >> >> >> >> === >> GPAA e-mail Disclaimers and confidential note >> >> This e-mail is intended for the exclusive use of the addressee only. >> If you are not the intended recipient, you should not use the >> contents or disclose them to any other person. Please notify the >> sender immediately and delete the e-mail. This e-mail is not intended >> nor shall it be taken to create any legal relations, contractual or >> otherwise. >> Legally binding obligations can only arise for the GPAA by means of a >> written instrument signed by an authorised signatory. >> >> === >> >> >> >> >> >> >> >> >> NOTE: This message may contain information that is confidential, >> proprietary, privileged or otherwise protected by law. The message is >> intended solely for the named addressee. If received in error, please >> destroy and notify the sender. Any use of this email is prohibited >> when received in error. Impetus does not represent, warrant and/or >> guarantee, that the integrity of this communication has been >> maintained nor that the communication is free of errors, virus, interception >> or interference. >> === GPAA e-mail Disclaimers and confidential note This e-mail is intended for the exclusive use of the addressee only. If you are not the intended recipient, you should not use the contents or disclose them to any other person. Please notify the sender immediately and delete the e-mail. 
This e-mail is not intended nor shall it be taken to create any legal relations, contractual or otherwise. Legally binding obligations can only arise for the GPAA by means of a written instrument signed by an authorised signatory. ===
Re: Question about Lucene FieldCache
This probably says why https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258 On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro wrote: > The documentation says that the only caches configurable are: > > - filterCache > - queryResultCache > - documentCache > - user defined caches > > There is no entry for fieldValueCache and in my case all of list in the > documentation are disable ... > > -- > > /Yago Riveiro > > On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote: > > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro > wrote: > > > > > Thanks for re reply Mikhail, > > > > > > Do you know if the 1 value is configurable? > > > > yes. in solrconfig.xml > > https://cwiki.apache.org/confluence/display/solr/Query+ > Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches > > iirc you cant' fully disable it setting size to 0. > > > > > > > My insert rate is so high > > > (5000 docs/s) that the cache it's quite useless. > > > > > > In the case of the Lucene field cache, it's possible "clean" it in some > > > way? > > > > > > Even it would be possible, the first sorting query or so loads it back. > > > > > Some cache is eating my memory heap. > > > > > Probably you need to dedicate master which won't load FieldCache. > > > > > > > > > > > > > > > > - > > > Best regards > > > > > > /Yago > > > -- > > > View this message in context: http://lucene.472066.n3. > > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html > > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > > > > > -- > > Sincerely yours > > Mikhail Khludnev > -- Sincerely yours Mikhail Khludnev
Re: Question about Lucene FieldCache
Ok, then I need to configure to reduce the size of the cache. Thanks for the help Mikhail. -- /Yago Riveiro On 9 Jan 2017 17:01 +, Mikhail Khludnev , wrote: > This probably says why > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258 > > On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro wrote: > > > The documentation says that the only caches configurable are: > > > > - filterCache > > - queryResultCache > > - documentCache > > - user defined caches > > > > There is no entry for fieldValueCache and in my case all of list in the > > documentation are disable ... > > > > -- > > > > /Yago Riveiro > > > > On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote: > > > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro > wrote: > > > > > > > Thanks for re reply Mikhail, > > > > > > > > Do you know if the 1 value is configurable? > > > > > > yes. in solrconfig.xml > > > https://cwiki.apache.org/confluence/display/solr/Query+ > > Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches > > > iirc you cant' fully disable it setting size to 0. > > > > > > > > > > My insert rate is so high > > > > (5000 docs/s) that the cache it's quite useless. > > > > > > > > In the case of the Lucene field cache, it's possible "clean" it in some > > > > way? > > > > > > > > Even it would be possible, the first sorting query or so loads it back. > > > > > > > Some cache is eating my memory heap. > > > > > > > Probably you need to dedicate master which won't load FieldCache. > > > > > > > > > > > > > > > > > > > > > > - > > > > Best regards > > > > > > > > /Yago > > > > -- > > > > View this message in context: http://lucene.472066.n3. > > > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html > > > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > > > > > > > > > > -- > > > Sincerely yours > > > Mikhail Khludnev > > > > > > -- > Sincerely yours > Mikhail Khludnev
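For anyone following along, an explicit fieldValueCache entry in the <query> section of solrconfig.xml overrides the implicit one Solr otherwise creates; the values below are illustrative, not recommendations:

    <!-- inside the <query> section of solrconfig.xml -->
    <fieldValueCache class="solr.FastLRUCache"
                     size="256"
                     autowarmCount="0"
                     showItems="32"/>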
Solr URL entity
Hi, I made a Solr project with multiple entities. I want to launch the indexing of one entity with a URL. How can I choose the entity that I want in my URL? Thanks for your help -- View this message in context: http://lucene.472066.n3.nabble.com/Soir-Ulr-entity-tp4313172.html Sent from the Solr - User mailing list archive at Nabble.com.
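Assuming the entities come from the DataImportHandler (an assumption; the core, handler path and entity name below are made up), a single entity can be selected with the entity request parameter, which may also be repeated to run several entities in one call:

    http://localhost:8983/solr/mycore/dataimport?command=full-import&entity=products&clean=false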
Loading Third party libraries along with Solr
Hi, I’m Shashank. I’m new to Solr and was trying to use amazon-aws sdk along with Solr. I added amazon-aws.jar and its third party dependencies under /solr-6.3.0/server/solr/lib folder. Even after I add all required dependencies, I keep getting NoClassDefinitionError and NoSuchMethod Errors. I see that some of the third party jars such as jackson-core, jackson-mapper-asl libraries are part of /solr-6.3.0/server/solr/solr-webapp/WEB-INF/lib, but of different versions. The classes in these jars are the ones causing the issue. Could someone help me with loading these dependencies (amazon-aws third party libs) appropriately to not cause issue with the rest of the jars. Thanks, Shashank Pedamallu
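If it helps other readers: per-core plugin jars can also be pulled in with <lib> directives in solrconfig.xml, as sketched below (the directory is an assumption). Note this would not resolve the clash described here, where older jackson versions already ship inside WEB-INF/lib — see the reply further down.

    <!-- in solrconfig.xml; paths are resolved relative to the core's instance dir -->
    <lib dir="${solr.install.dir:../../../..}/contrib/aws-libs" regex=".*\.jar" />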
Re: Help needed in breaking large index file into smaller ones
Perhaps you can copy this index into a separate location. Delete the odd docs from the former and the even docs from the latter index, respectively, and then force merge to a single segment in both locations separately. Perhaps shard splitting in SolrCloud does something like that. On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA wrote: > Hi All, > > My solr server has a few large index files (say ~10G). I am looking > for some help on breaking them it into smaller ones (each < 4G) to satisfy > my application requirements. Are there any such tools available? > > Appreciate your help. > > Thanks > NRC > -- Sincerely yours Mikhail Khludnev
Re: term frequency solrj
Hello Huda, Try to check this https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/test/org/apache/solr/client/solrj/response/TermsResponseTest.java On Mon, Jan 9, 2017 at 4:31 PM, huda barakat wrote: > Hi, > Can anybody help me, I need to get term frequency for a specific filed, I > use the techproduct example and I use this code: > > > // > import java.util.List; > import org.apache.solr.client.solrj.SolrClient; > import org.apache.solr.client.solrj.SolrQuery; > import org.apache.solr.client.solrj.impl.HttpSolrClient; > import org.apache.solr.client.solrj.response.QueryResponse; > import org.apache.solr.client.solrj.response.TermsResponse; > > public class App3 { > public static void main(String[] args) throws Exception { > > String urlString = "http://localhost:8983/solr/techproducts";; > SolrClient solr = new HttpSolrClient.Builder(urlString).build(); > > SolrQuery query = new SolrQuery(); > > query.setQuery("*:*"); > > query.setRequestHandler("terms"); > > > QueryResponse response = solr.query(query); > System.out.println("numFound: " + > response.getResults().getNumFound()); > > TermsResponse termResp =response.getTermsResponse(); > > List terms = termResp.getTerms("name"); > System.out.print("size="+ terms.size()); > > > } > } > > > I get the following error : > > Exception in thread "main" numFound: 32 > java.lang.NullPointerException > at testPkg.App3.main(App3.java:29) > > > Thank you in advance,,, > Huda > -- Sincerely yours Mikhail Khludnev
Re: Help needed in breaking large index file into smaller ones
Can you provide more information about: - Are you using Solr in standalone or SolrCloud mode? What version of Solr? - Why do you want this? Lack of disk space? Uneven distribution of data on shards? - Do you want this data together i.e. as part of a single collection? You can check out the following APIs: SPLITSHARD: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 MIGRATE: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12 Among other things, make sure you have enough spare disk-space before trying out the SPLITSHARD API in particular. -Anshum On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev wrote: > Perhaps you can copy this index into a separate location. Remove odd and > even docs into former and later indexes consequently, and then force merge > to single segment in both locations separately. > Perhaps shard splitting in SolrCloud does something like that. > > On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA < > chnredd...@gmail.com> > wrote: > > > Hi All, > > > > My solr server has a few large index files (say ~10G). I am looking > > for some help on breaking them it into smaller ones (each < 4G) to > satisfy > > my application requirements. Are there any such tools available? > > > > Appreciate your help. > > > > Thanks > > NRC > > > > > > -- > Sincerely yours > Mikhail Khludnev >
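A sketch of what the SPLITSHARD call looks like (collection and shard names are assumptions; async is optional but useful for large shards):

    curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split-shard1"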
Re: Help needed in breaking large index file into smaller ones
Why do you have a requirement that the indexes be < 4G? If it's arbitrarily imposed why bother? Or is it a non-negotiable requirement imposed by the platform you're on? Because just splitting the files into a smaller set won't help you if you then start to index into it, the merge process will just recreate them. You might be able to do something with the settings in TieredMergePolicy in the first place to stop generating files > 4g.. Best, Erick On Mon, Jan 9, 2017 at 3:27 PM, Anshum Gupta wrote: > Can you provide more information about: > - Are you using Solr in standalone or SolrCloud mode? What version of Solr? > - Why do you want this? Lack of disk space? Uneven distribution of data on > shards? > - Do you want this data together i.e. as part of a single collection? > > You can check out the following APIs: > SPLITSHARD: > https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 > MIGRATE: > https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12 > > Among other things, make sure you have enough spare disk-space before > trying out the SPLITSHARD API in particular. > > -Anshum > > > > On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev wrote: > >> Perhaps you can copy this index into a separate location. Remove odd and >> even docs into former and later indexes consequently, and then force merge >> to single segment in both locations separately. >> Perhaps shard splitting in SolrCloud does something like that. >> >> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA < >> chnredd...@gmail.com> >> wrote: >> >> > Hi All, >> > >> > My solr server has a few large index files (say ~10G). I am looking >> > for some help on breaking them it into smaller ones (each < 4G) to >> satisfy >> > my application requirements. Are there any such tools available? >> > >> > Appreciate your help. >> > >> > Thanks >> > NRC >> > >> >> >> >> -- >> Sincerely yours >> Mikhail Khludnev >>
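To illustrate Erick's last point, a sketch of capping merged segment size in the <indexConfig> section of solrconfig.xml on Solr 6.x (4096 MB mirrors the 4G limit discussed; this bounds segments produced by normal background merges, not an explicit optimize/forceMerge):

    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <double name="maxMergedSegmentMB">4096</double>
    </mergePolicyFactory>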
Re: Help needed in breaking large solr index file into smaller ones
Why? What do you think this will accomplish? I'm wondering if this is an XY problem. Best, Erick On Mon, Jan 9, 2017 at 7:48 AM, Manan Sheth wrote: > Hi All, > > I have a problem simillar to this one, where the indexes in multiple solr > shards has created large index files (~10 GB each) and wanted to split this > large file on each shard into smaller files. > > Please provide some guidelines. > > Thanks, > Manan Sheth > > From: Narsimha Reddy CHALLA > Sent: Monday, January 9, 2017 3:51 PM > To: solr-user@lucene.apache.org > Subject: Help needed in breaking large solr index file into smaller ones > > Hi All, > > My solr server has a few large index files (say ~10G). I am looking > for some help on breaking them it into smaller ones (each < 4G) to satisfy > my application requirements. Basically, I am not looking for any > optimization of index here (ex: optimize, expungeDeletes etc.). > > Are there any such tools available? > > Appreciate your help. > > Thanks > NRC > > > > > > > > > NOTE: This message may contain information that is confidential, proprietary, > privileged or otherwise protected by law. The message is intended solely for > the named addressee. If received in error, please destroy and notify the > sender. Any use of this email is prohibited when received in error. Impetus > does not represent, warrant and/or guarantee, that the integrity of this > communication has been maintained nor that the communication is free of > errors, virus, interception or interference.
Re: CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties
Currently these are not settable. It's easy enough to add a setter for these values. What types of behaviors have you run into when CloudSolrClient is having timeout issues? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Jan 9, 2017 at 10:06 AM, Yago Riveiro wrote: > Hi, > > Using the CloudSolrStream, is it possible define the setZkConnectTimeout > and > setZkClientTimeout of internal CloudSolrClient? > > The default negotiation timeout is set to 10 seconds. > > Regards, > > /Yago > > > > - > Best regards > > /Yago > -- > View this message in context: http://lucene.472066.n3. > nabble.com/CloudSolrStream-can-t-set-the-setZkClientTimeout-and- > setZkConnectTimeout-properties-tp4313127.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: CDCR logging is Needlessly verbose, fills up the file system fast
On 12/22/2016 8:10 AM, Webster Homer wrote: > While testing CDCR I found that it is writing tons of log messages per > second. Example: > 2016-12-21 23:24:41.652 INFO (qtp110456297-13) [c:sial-catalog-material > s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1] > o.a.s.c.S.Request [sial-catalog-material_shard1_replica1] webapp=/solr > path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2} > status=0 QTime=0 > 2016-12-21 23:24:41.653 INFO (qtp110456297-18) [c:sial-catalog-material > s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1] > o.a.s.c.S.Request [sial-catalog-material_shard1_replica1] webapp=/solr > path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2} > status=0 QTime=0 I hadn't looked closely at the messages you were seeing in your logs until now. These messages are *request* logging. This is the same code path that logs every query -- it's not specific to CDCR. It's just logging all the requests that Solr is receiving. If this log message were changed to DEBUG, then Solr would not log queries by default. A large number of Solr users want that logging. I think that you could probably avoid seeing these logs by configuring log4j to not log things tagged as org.apache.solr.core.SolrCore.Request (even though it's not a real class, I think log4j can still configure it) ... but then you wouldn't get your queries logged either. In order to not log these particular messages, but still log queries and other requests, the request logging code will need to have a way to specify that certain messages should not be logged. This might be something that could be configurable at the request handler definition level -- put something in the requestHandler configuration (for /cdcr in this case) that tells it to skip logging. That seems like a good feature to have. After looking at the CDCR configuration page in the reference guide, I might have a little more insight. You're getting one of these logs every 1-2 milliseconds ... so it sounds like you have configured the CDCR with a schedule of one millisecond. The default value for the replicator schedule is ten milliseconds, and the update log synchronizer defaults to a full minute. I'm guessing that CDCR is not designed to have such a low schedule value. I would personally configure the replicator schedule even higher than the default -- network latency between Internet locations is often longer than ten milliseconds. Thanks, Shawn
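For the log4j idea Shawn mentions, the entry would look roughly like the line below in server/resources/log4j.properties (Solr 6.x still ships log4j 1.2). As he notes, this would silence query logging on that logger too, and whether log4j honours the pseudo-class name is something to verify:

    log4j.logger.org.apache.solr.core.SolrCore.Request=WARN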
Re: Solr Index upgradation Merging issue observed
On 1/8/2017 11:21 PM, Manan Sheth wrote: > Currently, We are in process of upgrading existing Solr indexes from Solr 4.x > to Solr 6.2.1. In order to upgrade existing indexes we are planning to use > IndexUpgrader class in sequential manner from Solr 4.x to Solr 5.x and Solr > 5.x to Solr 6.2.1. > > While preforming the upgrdation a strange behaviour is noticed where all the > previous segments are getting merged to one single large segment. We need to > preserve the original segments as single large segment is getting bulkier (~ > 2-4 TBs). > > Please let me know how to tune the process or write custom logic to overcome > it. I've taken a very quick look at the IndexUpgrader code. It does the upgrade by calling forceMerge ... which is the Lucene term for what Solr still calls "optimize." What you are seeing is completely normal. This is how it is designed to work. Changing it *might* be possible, but it would involve development work in Lucene. It likely would not be quick and easy. Single-segment indexes are slightly faster than multi-segment indexes containing the same data, so I am failing to see why this is a problem. After the upgrade, the total amount of disk space for your index would either go down or stay the same, although it will temporarily double in size during the upgrade. FYI -- this kind of segment merging can happen as a consequence of normal indexing, so this is something your system must be prepared to handle even when you are not using the upgrader tool, on ANY version of Solr. Thanks, Shawn
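For anyone reaching for the tool discussed above, IndexUpgrader is typically run once per index directory roughly as follows (jar versions and the index path are assumptions); the forced merge Shawn describes happens regardless of how it is invoked:

    java -cp lucene-core-6.2.1.jar:lucene-backward-codecs-6.2.1.jar \
         org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/index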
Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?
On 11/28/2016 11:06 AM, Walter Underwood wrote: > Worst case: > 1. Disable merging. > 2. Delete all the documents. > 3. Add all the documents. > 4. Enable merging. > > After step 3, you have two copies of everything, one deleted copy and one new > copy. > The merge makes a third copy. Just getting around to replying to this REALLY old thread. What you've described doesn't really triple the size of the index at the optimize/forceMerge step. While it's true that the index is temporarily three times it's *final* size, it is not three times the pre-optimize size -- in fact, it would only get 50 percent larger. Does this mean that the recommendation saying "need 3x the space for normal operation" is not in fact true? The Lucene folks seem to be pretty adamant in their assertion that a merge to one segment can triple the index size, although I've never seen it actually happen. Disabling and re-enabling merging is not what I would call "normal." Thanks, Shawn
Re: Loading Third party libraries along with Solr
On 1/9/2017 11:35 AM, Shashank Pedamallu wrote: > I’m Shashank. I’m new to Solr and was trying to use the amazon-aws sdk > along with Solr. I added amazon-aws.jar and its third party > dependencies under the /solr-6.3.0/server/solr/lib folder. Even after I > add all required dependencies, I keep getting NoClassDefinitionError > and NoSuchMethod errors. I see that some of the third party jars such > as the jackson-core and jackson-mapper-asl libraries are part of > /solr-6.3.0/server/solr/solr-webapp/WEB-INF/lib, but of different > versions. The classes in these jars are the ones causing the issue. > Could someone help me with loading these dependencies (amazon-aws > third party libs) appropriately so as to not cause issues with the rest of > the jars. The first thing to try would be to simply use the dependencies already present in Solr. If the component you are using can't work with the older version of a third-party library that already exists in Solr, then you will have to upgrade the third-party libraries in Solr. This means replacing those jars in the WEB-INF/lib directory, not adding them to the user lib directory. Having multiple versions of any library causes problems. Note that if you do upgrade jars in WEB-INF/lib, Solr itself may stop working correctly. It's *usually* pretty safe to upgrade an internal Solr dependency as long as you're not upgrading to a new major version, but it doesn't always work. Sometimes it is simply not possible to combine Java projects in the way you want, because each of them uses a dependency in ways that are not compatible with each other. Here's an example of something that just won't work because of problems with a dependency: https://issues.apache.org/jira/browse/SOLR-5582 Thanks, Shawn
Re: term frequency solrj
On 1/9/2017 6:31 AM, huda barakat wrote: > Can anybody help me, I need to get the term frequency for a specific > field, I use the techproducts example and I use this code: The variable "terms" is null on line 29, which is why you are getting a NullPointerException. > query.setRequestHandler("terms"); One possible problem is setting the request handler to "terms" ... chances are that this should be "/terms" instead. Handler names in your config will most likely start with a forward slash, because if they don't, a typical example config in version 3.6 and later provides no way for them to be used. Since 3.6, "handleSelect" is set to false in all examples, and it should be left at false. Thanks, Shawn
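For reference, the /terms handler in the techproducts example solrconfig.xml is declared roughly as follows (reproduced from memory, so treat the exact defaults as approximate); the leading slash in the name is why the SolrJ call needs "/terms" rather than "terms":

  <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <bool name="terms">true</bool>
      <bool name="distrib">false</bool>
    </lst>
    <arr name="components">
      <str>terms</str>
    </arr>
  </requestHandler>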
Re: Help needed in breaking large index file into smaller ones
Hi Erick, It's due to some past issues observed with joins on Solr 4, which hit OOM errors when joining against large indexes after optimization/compaction; if the indexes are stored as smaller files, they fit into memory and the operations perform acceptably. Also, slow writes/commits/updates are observed with large files. Thus, to minimize this risk while upgrading to Solr 6, we wanted to store the indexes in smaller files. Thanks, Manan Sheth From: Erick Erickson Sent: Tuesday, January 10, 2017 5:24 AM To: solr-user Subject: Re: Help needed in breaking large index file into smaller ones Why do you have a requirement that the indexes be < 4G? If it's arbitrarily imposed why bother? Or is it a non-negotiable requirement imposed by the platform you're on? Because just splitting the files into a smaller set won't help you if you then start to index into it, the merge process will just recreate them. You might be able to do something with the settings in TieredMergePolicy in the first place to stop generating files > 4g.. Best, Erick On Mon, Jan 9, 2017 at 3:27 PM, Anshum Gupta wrote: > Can you provide more information about: > - Are you using Solr in standalone or SolrCloud mode? What version of Solr? > - Why do you want this? Lack of disk space? Uneven distribution of data on > shards? > - Do you want this data together i.e. as part of a single collection? > > You can check out the following APIs: > SPLITSHARD: > https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 > MIGRATE: > https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12 > > Among other things, make sure you have enough spare disk-space before > trying out the SPLITSHARD API in particular. > > -Anshum > > > > On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev wrote: > >> Perhaps you can copy this index into a separate location. Remove odd and >> even docs into former and later indexes consequently, and then force merge >> to single segment in both locations separately. >> Perhaps shard splitting in SolrCloud does something like that. >> >> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA < >> chnredd...@gmail.com> >> wrote: >> >> > Hi All, >> > >> > My solr server has a few large index files (say ~10G). I am looking >> > for some help on breaking them it into smaller ones (each < 4G) to >> satisfy >> > my application requirements. Are there any such tools available? >> > >> > Appreciate your help. >> > >> > Thanks >> > NRC >> > >> >> >> >> -- >> Sincerely yours >> Mikhail Khludnev
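For completeness, the SPLITSHARD call Anshum refers to is a Collections API request along these lines (the host, collection and shard names are placeholders, not taken from this setup); it splits one shard into two roughly equal sub-shards and needs enough free disk to hold both halves alongside the original:

  http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1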
Re: Help needed in breaking large index file into smaller ones
Additionally, to answer Anshum's queries: we are currently using Solr 4.10, planning to upgrade to Solr 6.2.1, and the upgrade process is creating the current problem. We are using it in SolrCloud mode with 8-10 shards split across different nodes, with segment sizes of ~30 GB for some collections and in the 10-12 GB range across the board. This is due to performance considerations and the lack of large RAM (currently ~32 GB/node). Yes, we want the data together in a single collection. Thanks, Manan Sheth From: Manan Sheth Sent: Tuesday, January 10, 2017 10:51 AM To: solr-user Subject: Re: Help needed in breaking large index file into smaller ones Hi Erick, It's due to some past issues observed with joins on Solr 4, which hit OOM errors when joining against large indexes after optimization/compaction; if the indexes are stored as smaller files, they fit into memory and the operations perform acceptably. Also, slow writes/commits/updates are observed with large files. Thus, to minimize this risk while upgrading to Solr 6, we wanted to store the indexes in smaller files. Thanks, Manan Sheth From: Erick Erickson Sent: Tuesday, January 10, 2017 5:24 AM To: solr-user Subject: Re: Help needed in breaking large index file into smaller ones Why do you have a requirement that the indexes be < 4G? If it's arbitrarily imposed why bother? Or is it a non-negotiable requirement imposed by the platform you're on? Because just splitting the files into a smaller set won't help you if you then start to index into it, the merge process will just recreate them. You might be able to do something with the settings in TieredMergePolicy in the first place to stop generating files > 4g.. Best, Erick On Mon, Jan 9, 2017 at 3:27 PM, Anshum Gupta wrote: > Can you provide more information about: > - Are you using Solr in standalone or SolrCloud mode? What version of Solr? > - Why do you want this? Lack of disk space? Uneven distribution of data on > shards? > - Do you want this data together i.e. as part of a single collection? > > You can check out the following APIs: > SPLITSHARD: > https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 > MIGRATE: > https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12 > > Among other things, make sure you have enough spare disk-space before > trying out the SPLITSHARD API in particular. > > -Anshum > > > > On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev wrote: > >> Perhaps you can copy this index into a separate location. Remove odd and >> even docs into former and later indexes consequently, and then force merge >> to single segment in both locations separately. >> Perhaps shard splitting in SolrCloud does something like that. >> >> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA < >> chnredd...@gmail.com> >> wrote: >> >> > Hi All, >> > >> > My solr server has a few large index files (say ~10G). I am looking >> > for some help on breaking them it into smaller ones (each < 4G) to >> satisfy >> > my application requirements. Are there any such tools available? >> > >> > Appreciate your help. >> > >> > Thanks >> > NRC >> > >> >> >> >> -- >> Sincerely yours >> Mikhail Khludnev
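To make Erick's TieredMergePolicy suggestion concrete, here is a sketch of what the indexConfig section of solrconfig.xml could look like on Solr 6.x; the numbers are illustrative assumptions, and note that an explicit optimize/forceMerge ignores the maxMergedSegmentMB cap, so this only limits segments produced by normal background merging:

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- cap segments produced by background merges at roughly 4 GB -->
      <double name="maxMergedSegmentMB">4096.0</double>
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>
  </indexConfig>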
ICUFoldingFilter with swedish characters, and tokens with the keyword attribute?
Hi, I wasn't happy with how our current Solr configuration handled diacritics (like 'é') in the text and in search queries, since it simply considered the letter with a diacritic as a distinct letter. I.e. 'é' didn't match 'e', and vice versa. Except for a handful of rare words where the diacritical sign in 'é' actually changes the word's meaning, it is usually used in names of people and places, and the expected behavior when searching is to not have to type the diacritics and still get the expected results (like searching for 'Penelope Cruz' and getting hits for 'Penélope Cruz'). When reading online about how to handle diacritics in Solr, it seems that the general recommendation, when no language-specific solution exists that handles this, is to use the ICUFoldingFilter. However, this filter doesn't really come with a lot of documentation, and doesn't seem to have any configuration options at all (at least not documented). So what I ended up doing was simply adding the ICUFoldingFilterFactory in the middle of the existing analyzer chain, like this: But that didn't really give me the results I want. For example, using the analysis debug tool I see that the text 'café åäö' becomes 'cafe caf aao'. And there are two problems with that result: 1. It doesn't respect the keyword attribute 2. It folds the Swedish characters 'åäö' into 'aao' The disregard of the keyword attribute is bad enough, but the mangling of the Swedish language is really a show stopper for us. The Swedish language doesn't consider 'ö', for example, to be the letter 'o' with two diacritical dots above it, just as 'Q' isn't considered to be the letter 'O' with a diacritical "squiggly line" at the bottom. So when handling Swedish text, these characters ('åäöÅÄÖ') shouldn't be folded, because then there will be too many "collisions". For example, when searching for 'påstå' ('claim'), one doesn't want hits about 'pasta' (you guessed it, it means 'pasta'), just as one doesn't want to get hits about 'aga' ('corporal punishment, usually against children') when searching for 'äga' ('to own'). Or even worse, when searching for 'höra' ('to hear'), one most likely doesn't want hits about 'hora' ('prostitute'). And I can go on... :) So, is there a way for us to make the ICUFoldingFilter work in a better way? I.e. configure it to respect the keyword attribute and ignore the 'åäö' characters when folding, but otherwise fold all diacritical characters into their non-diacritical form. Or how would you recommend we configure our analyzer chain to accomplish this? Regards /Jimi
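One approach worth considering, offered as a sketch rather than anything from this thread: instead of ICUFoldingFilterFactory (which takes no options in this Solr version), fold with a MappingCharFilterFactory driven by a trimmed copy of the mapping-FoldToASCII.txt file that ships with the sample configsets, with the å/ä/ö (and Å/Ä/Ö) lines removed. The field type name and mapping file name below are hypothetical. One caveat: a char filter runs before tokenization, so it cannot see the keyword attribute either; it only sidesteps that problem for keywords that contain no folded characters.

  <fieldType name="text_sv_folded" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- mapping-FoldToASCII-sv.txt: a copy of mapping-FoldToASCII.txt with the
           entries for å, ä, ö and their upper-case forms deleted -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII-sv.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Since the same char filter is applied at both index and query time, 'é' and 'e' end up matching each other while 'påstå' and 'pasta' stay distinct.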