Re: Long STW GCs with Solr Cloud
16.6.2016, 1.41, Shawn Heisey wrote:
> If you want to continue avoiding G1, you should definitely be using
> CMS. My recommendation right now would be to try the G1 settings on my
> wiki page under the heading "Current experiments" or the CMS settings
> just below that.

For what it's worth, we're currently running Shawn's G1 settings, slightly modified for our workload, on Java 1.8.0_91 (25.91-b14):

GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=16m \
-XX:MaxGCPauseMillis=200 \
-XX:+UnlockExperimentalVMOptions \
-XX:G1NewSizePercent=3 \
-XX:ParallelGCThreads=12 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

It seems that our highly varying loads during day vs. night caused some issues leading to long pauses until I added G1NewSizePercent (which needs +UnlockExperimentalVMOptions). Things are running smoothly, and there are reports that the warnings regarding G1 with Lucene tests no longer occur with newer Java versions, but it's of course up to you whether you're willing to take the chance.

Regards,
Ere
Error when searching with special characters
Hi,

I encountered this error when I tried to search with special characters, like "&" and "#":

{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "msg":"org.apache.solr.search.SyntaxError: Cannot parse '\"Research ': Lexical error at line 1, column 11. Encountered: after : \"\\\"Research \"",
    "code":400}}

I have done the search by putting the phrase in inverted commas, like:

q="Research & Development"

What could be the issue here? I'm facing this problem in both Solr 5.4.0 and Solr 6.0.1.

Regards,
Edwin
Re: Long STW GCs with Solr Cloud
On 17.06.2016 at 09:06, Ere Maijala wrote:
> 16.6.2016, 1.41, Shawn Heisey wrote:
>> If you want to continue avoiding G1, you should definitely be using
>> CMS. My recommendation right now would be to try the G1 settings on my
>> wiki page under the heading "Current experiments" or the CMS settings
>> just below that.
>
> For what it's worth, we're currently running Shawn's G1 settings slightly
> modified for our workload on Java 1.8.0_91 25.91-b14:
>
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=16m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UnlockExperimentalVMOptions \
> -XX:G1NewSizePercent=3 \
> -XX:ParallelGCThreads=12 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "

-XX:G1NewSizePercent
... Sets the percentage of the heap to use as the minimum for the young generation size. The default value is 5 percent of your Java heap. ...

So you are reducing the young generation size to get a smoother running system. This is strange, like reducing the bottle below the bottleneck.

Just my 2 cents.

Regards
Bernd

> It seems that our highly varying loads during day vs. night caused some
> issues leading to long pauses until I added the G1NewSizePercent (which
> needs +UnlockExperimentalVMOptions). Things are running smoothly and there
> are reports that the warnings regarding G1 with Lucene tests don't
> happen anymore with the newer Java versions, but it's of course up to you if
> you're willing to take the chance.
>
> Regards,
> Ere
Re: Long STW GCs with Solr Cloud
17.6.2016, 11.05, Bernd Fehling wrote:
> On 17.06.2016 at 09:06, Ere Maijala wrote:
>> 16.6.2016, 1.41, Shawn Heisey wrote:
>>> If you want to continue avoiding G1, you should definitely be using
>>> CMS. My recommendation right now would be to try the G1 settings on my
>>> wiki page under the heading "Current experiments" or the CMS settings
>>> just below that.
>>
>> For what it's worth, we're currently running Shawn's G1 settings slightly
>> modified for our workload on Java 1.8.0_91 25.91-b14:
>>
>> GC_TUNE=" \
>> -XX:+UseG1GC \
>> -XX:+ParallelRefProcEnabled \
>> -XX:G1HeapRegionSize=16m \
>> -XX:MaxGCPauseMillis=200 \
>> -XX:+UnlockExperimentalVMOptions \
>> -XX:G1NewSizePercent=3 \
>> -XX:ParallelGCThreads=12 \
>> -XX:+UseLargePages \
>> -XX:+AggressiveOpts \
>> "
>
> -XX:G1NewSizePercent
> ... Sets the percentage of the heap to use as the minimum for the young
> generation size. The default value is 5 percent of your Java heap. ...
>
> So you are reducing the young generation size to get a smoother running
> system. This is strange, like reducing the bottle below the bottleneck.

True, but it works. Perhaps that's due to the default being too much with our heap size (> 10 GB). In any case, these settings allow us to run with an average pause of <150 ms and a max pause of <2 s, while we previously struggled with pauses exceeding 20 s at worst. All this was inspired by https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase.

Regards,
Ere
[ANNOUNCE] Apache Solr 6.1.0 released
17 June 2016, Apache Solr 6.1.0 available

Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search and analytics, rich document parsing, geospatial search, extensive REST APIs as well as parallel SQL. Solr is enterprise grade, secure and highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.

Solr 6.1.0 is available for immediate download at:
* http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:
* https://lucene.apache.org/solr/6_1_0/changes/Changes.html

Solr 6.1 Release Highlights:

* Added graph traversal support, and new "sort" and "random" streaming expressions. It's also now possible to create streaming expressions with the Solr Admin UI.
* Fixed the ENUM faceting method to not be unnecessarily rewritten to FCS, which was causing slowdowns.
* Reduced garbage creation when creating cache entries.
* New [subquery] document transformer to obtain related documents per result doc.
* EmbeddedSolrServer now allocates heap more sensibly, even when given a plain document list without callbacks.
* New GeoJSON response writer for encoding geographic data in query responses.

Further details of changes are available in the change log available at:
http://lucene.apache.org/solr/6_1_0/changes/Changes.html

Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access.

--
Adrien
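For readers curious about the new [subquery] transformer mentioned in the highlights, usage is roughly along the lines of the sketch below; the core, field and parameter names are invented for illustration, so check the 6.1 Reference Guide for the exact syntax:

    q=author_s:john
    &fl=id,title_s,books:[subquery]
    &books.q={!terms f=author_id_s v=$row.id}
    &books.fl=title_s
    &books.rows=5

Each document in the main result then carries a nested "books" result set produced by running the per-row subquery.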
re: tlogs not deleting as usual in Solr 5.5.1?
After some more searching, I found a thread online where Erick Erickson explains that old tlogs are kept around in case a peer needs to sync, even if SolrCloud is not enabled. That makes sense, but we'll probably want to enable autoCommit and then trigger replication on the slaves when we know everything is committed after a full import. (We disable polling.)

From: "Chris Morley"
Sent: Thursday, June 16, 2016 3:20 PM
To: "Solr Newsgroup"
Subject: tlogs not deleting as usual in Solr 5.5.1?

The repetition below is on purpose to show the contrast between Solr versions.

In Solr 4.10.3, we have autocommits disabled. We do a dataimport of a few hundred thousand records and have a tlog that grows to ~1.2G.

In Solr 5.5.1, we have autocommits disabled. We do a dataimport of a few hundred thousand records and have a tlog that grows to ~1.6G. (Same exact data, slightly larger tlog, but who knows, that's fine.)

In Solr 4.10.3 tlogs ARE deleted after issuing update?commit=true. (And deleted immediately.)

In Solr 5.5.1 tlogs ARE NOT deleted after issuing update?commit=true.

We want the tlog to delete like it did in Solr 4.10.3. Perhaps there is a configuration setting or feature of Solr 5.5.1 that causes this?

Would appreciate any tips on configuration or code we could change to ensure the tlog will delete after a hard commit.
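For anyone wanting to sketch out the autoCommit approach Chris describes, the relevant solrconfig.xml block looks roughly like this; the 60-second interval is only an example, not a recommendation:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- hard commit at most once a minute; tlog segments older than the
             last hard commit become eligible for cleanup -->
        <maxTime>60000</maxTime>
        <!-- keep in-progress imports invisible to searchers -->
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>

With polling disabled, replication can then be triggered on a slave with the fetchindex command, e.g. http://slave-host:8983/solr/corename/replication?command=fetchindex (host and core name are placeholders).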
Accessing response docs in process method
Hi,

I would like to check the response for the *authors* data that comes in my multiValued *authors* field and do some activity related to it before the output is sent back. I know how to access the facets and investigate them. Could someone please advise (the APIs/methods etc.) on how I can get started on this (accessing results in the *process* method)?

Thanks!
Mark.
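One common way to do this is a custom SearchComponent registered in the handler's last-components so it runs after the results are collected. Below is a minimal, untested sketch; the class and field names are placeholders, the field is assumed to be stored, and depending on the Solr version a couple of other abstract methods may also need implementing:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocList;
    import org.apache.solr.search.SolrIndexSearcher;

    public class AuthorsInspectorComponent extends SearchComponent {

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to do before the query runs
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        DocList docs = rb.getResults().docList;          // docs about to be returned
        SolrIndexSearcher searcher = rb.req.getSearcher();
        DocIterator it = docs.iterator();
        while (it.hasNext()) {
          int docId = it.nextDoc();
          Document doc = searcher.doc(docId);            // loads stored fields
          String[] authors = doc.getValues("authors");   // multiValued stored field
          // ... inspect the authors values and adjust the response here ...
        }
      }

      @Override
      public String getDescription() {
        return "Example component that inspects the authors field";
      }
    }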
Re: ConcurrentMergeScheduler options not exposed
Really we need the infoStream output to see what IW is doing that takes so long when merging.

Likely only one merge thread is running (CMS tries to detect if your IO system "spins" and if so, uses 1 merge thread) ... maybe try configuring this to something higher since your RAID array can probably handle it?

It's good that disabling auto IO throttling didn't change things ... that's what I expected (since forced merges are not throttled by default).

Maybe capture all thread stacks and post back here?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 16, 2016 at 4:04 PM, Shawn Heisey wrote:
> On 6/16/2016 2:35 AM, Michael McCandless wrote:
> >
> > Hmm, merging can't read at 800 MB/sec and only write at 20 MB/sec for
> > very long ... unless there is a huge percentage of deletes. Also, by
> > default CMS doesn't throttle forced merges (see
> > CMS.get/setForceMergeMBPerSec). Maybe capture
> > IndexWriter.setInfoStream output?
>
> I can see the problem myself. I have a RAID10 array with six SATA
> disks. When I click the Optimize button for a core that's several
> gigabytes, iotop shows me reads happening at about 100MB/s for several
> seconds, then writes clocking no more than 25 MB/s, and usually a lot
> less. The last several gigabytes that were written were happening at
> less than 5 MB/s. This is VERY slow, and does affect my nightly
> indexing processes.
>
> Asking the shell to copy a 5GB file revealed sustained write rates of
> over 500MB/s, so the hardware can definitely go faster.
>
> I patched in an option for solrconfig.xml where I could force it to call
> disableAutoIOThrottle(). I included logging in my patch to make
> absolutely sure that the new code was used. This option made no
> difference in the write speed. I also enabled infoStream, but either I
> configured it wrong or I do not know where to look for the messages. I
> was modifying and compiling branch_5_5.
>
> This is the patch that I applied:
>
> http://apaste.info/wKG
>
> I did see the expected log entries in solr.log when I restarted with the
> patch and the new option in solrconfig.xml.
>
> What else can I look at?
>
> Thanks,
> Shawn
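On the infoStream question in Shawn's quoted mail: my understanding is that in recent Solr versions it is enabled with a single flag under indexConfig in solrconfig.xml, and the output then goes to the normal solr.log via the org.apache.solr.update.LoggingInfoStream logger (so that logger must not be filtered out in the logging config):

    <indexConfig>
      <!-- emit low-level IndexWriter/merge activity into solr.log -->
      <infoStream>true</infoStream>
    </indexConfig>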
Morphlines.cell and attachments in complex docs?
I was just looking at SolrCellBuilder, and it looks like there's an assumption that documents will not have attachments/embedded objects. Unless I misunderstand the code, users will not be able to search documents inside zips, or attachments in msg/doc/pdf/etc. (cf. SOLR-7189).

Are embedded documents extracted in a step before hitting SolrCellBuilder? Bug or feature?

Thank you!

Cheers,
Tim
Re: Long STW GCs with Solr Cloud
For what it’s worth, I looked into reducing the allocation footprint of CollapsingQParserPlugin a bit, but without success. See https://issues.apache.org/jira/browse/SOLR-9125 As it happened, I was collapsing on a field with such high cardinality that the chances of a query even doing much collapsing of interest was pretty low. That allowed me to use a vastly stripped-down version of CollapsingQParserPlugin with a *much* lower memory footprint, in exchange for collapsed document heads essentially being picked at random. (That is, when collapsing two documents, the one that gets returned is random.) If that’s of interest, I could probably throw the code someplace public. On 6/16/16, 3:39 PM, "Cas Rusnov" wrote: >Hey thanks for your reply. > >Looks like running the suggested CMS config from Shawn, we're getting some >nodes with 30+sec pauses, I gather due to large heap, interestingly enough >while the scenario Jeff talked about is remarkably similar (we use field >collapsing), including the performance aspects of it, we are getting >concurrent mode failures both due to new space allocation failures and due >to promotion failures. I suspect there's a lot of garbage building up. >We're going to run tests with field collapsing disabled and see if that >makes a difference. > >Cas > > >On Thu, Jun 16, 2016 at 1:08 PM, Jeff Wartes wrote: > >> Check your gc log for CMS “concurrent mode failure” messages. >> >> If a concurrent CMS collection fails, it does a stop-the-world pause while >> it cleans up using a *single thread*. This means the stop-the-world CMS >> collection in the failure case is typically several times slower than a >> concurrent CMS collection. The single-thread business means it will also be >> several times slower than the Parallel collector, which is probably what >> you’re seeing. I understand that it needs to stop the world in this case, >> but I really wish the CMS failure would fall back to a Parallel collector >> run instead. >> The Parallel collector is always going to be the fastest at getting rid of >> garbage, but only because it stops all the application threads while it >> runs, so it’s got less complexity to deal with. That said, it’s probably >> not going to be orders of magnitude faster than a (successfully) concurrent >> CMS collection. >> >> Regardless, the bigger the heap, the bigger the pause. >> >> If your application is generating a lot of garbage, or can generate a lot >> of garbage very suddenly, CMS concurrent mode failures are more likely. You >> can turn down the -XX:CMSInitiatingOccupancyFraction value in order to >> give the CMS collection more of a head start at the cost of more frequent >> collections. If that doesn’t work, you can try using a bigger heap, but you >> may eventually find yourself trying to figure out what about your query >> load generates so much garbage (or causes garbage spikes) and trying to >> address that. Even G1 won’t protect you from highly unpredictable garbage >> generation rates. >> >> In my case, for example, I found that a very small subset of my queries >> were using the CollapseQParserPlugin, which requires quite a lot of memory >> allocations, especially on a large index. Although generally this was fine, >> if I got several of these rare queries in a very short window, it would >> always spike enough garbage to cause CMS concurrent mode failures. The >> single-threaded concurrent-mode failure would then take long enough that >> the ZK heartbeat would fail, and things would just go downhill from there. 
>> >> >> >> On 6/15/16, 3:57 PM, "Cas Rusnov" wrote: >> >> >Hey Shawn! Thanks for replying. >> > >> >Yes I meant HugePages not HugeTable, brain fart. I will give the >> >transparent off option a go. >> > >> >I have attempted to use your CMS configs as is and also the default >> >settings and the cluster dies under our load (basically a node will get a >> >35-60s GC STW and then the others in the shard will take the load, and >> they >> >will in turn get long STWs until the shard dies), which is why basically >> in >> >a fit of desperation I tried out ParallelGC and found it to be half-way >> >acceptable. I will run a test using your configs (and the defaults) again >> >just to be sure (since I'm certain the machine config has changed since we >> >used your unaltered settings). >> > >> >Thanks! >> >Cas >> > >> > >> >On Wed, Jun 15, 2016 at 3:41 PM, Shawn Heisey >> wrote: >> > >> >> On 6/15/2016 3:05 PM, Cas Rusnov wrote: >> >> > After trying many of the off the shelf configurations (including CMS >> >> > configurations but excluding G1GC, which we're still taking the >> >> > warnings about seriously), numerous tweaks, rumors, various instance >> >> > sizes, and all the rest, most of which regardless of heap size and >> >> > newspace size resulted in frequent 30+ second STW GCs, we settled on >> >> > the following configuration which leads to occasional high GCs but >> >> > mostly stays between 10-20 second STWs every few
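Jeff's point above about turning down -XX:CMSInitiatingOccupancyFraction usually translates into JVM options roughly like the following sketch; 70 is only a starting value, not a recommendation for this particular workload:

    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:+UseCMSInitiatingOccupancyOnly

+UseCMSInitiatingOccupancyOnly keeps the JVM from drifting back to its own heuristic once an explicit fraction is set.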
Re: Error when searching with special characters
Hi,

Maybe a URL encoding issue? By the way, I would use a backslash to escape special characters.

Ahmet

On Friday, June 17, 2016 10:08 AM, Zheng Lin Edwin Yeo wrote:

Hi,

I encountered this error when I tried to search with special characters, like "&" and "#":

{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "msg":"org.apache.solr.search.SyntaxError: Cannot parse '\"Research ': Lexical error at line 1, column 11. Encountered: after : \"\\\"Research \"",
    "code":400}}

I have done the search by putting the phrase in inverted commas, like:

q="Research & Development"

What could be the issue here? I'm facing this problem in both Solr 5.4.0 and Solr 6.0.1.

Regards,
Edwin

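The parse error in Edwin's mail is consistent with the "&" being treated as a URL parameter separator, so everything after it never reaches the query parser. A rough SolrJ-flavoured sketch of the two fixes Ahmet mentions (the class name and the output comments are illustrative, not verified):

    import java.net.URLEncoder;
    import org.apache.solr.client.solrj.util.ClientUtils;

    public class EscapeDemo {
      public static void main(String[] args) throws Exception {
        String userInput = "Research & Development";

        // Fix 1: URL-encode the q parameter so '&' and '#' survive the HTTP layer
        String q = URLEncoder.encode("\"" + userInput + "\"", "UTF-8");
        System.out.println("q=" + q);   // e.g. q=%22Research+%26+Development%22

        // Fix 2: backslash-escape query-parser special characters in the term itself
        System.out.println(ClientUtils.escapeQueryChars(userInput));
        // e.g. Research\ \&\ Development
      }
    }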
Re: Long STW GCs with Solr Cloud
I try to adjust the new generation size so that it can handle all the allocations needed for HTTP requests. Those short-lived objects should never come from tenured space. Even without facets, I run a pretty big new generation, 2 GB in an 8 GB heap. The tenured space will always grow in Solr, because objects ejected from cache have been around a while. Caches create garbage in tenured space. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 17, 2016, at 10:01 AM, Jeff Wartes wrote: > > For what it’s worth, I looked into reducing the allocation footprint of > CollapsingQParserPlugin a bit, but without success. See > https://issues.apache.org/jira/browse/SOLR-9125 > > As it happened, I was collapsing on a field with such high cardinality that > the chances of a query even doing much collapsing of interest was pretty low. > That allowed me to use a vastly stripped-down version of > CollapsingQParserPlugin with a *much* lower memory footprint, in exchange for > collapsed document heads essentially being picked at random. (That is, when > collapsing two documents, the one that gets returned is random.) > > If that’s of interest, I could probably throw the code someplace public. > > > On 6/16/16, 3:39 PM, "Cas Rusnov" wrote: > >> Hey thanks for your reply. >> >> Looks like running the suggested CMS config from Shawn, we're getting some >> nodes with 30+sec pauses, I gather due to large heap, interestingly enough >> while the scenario Jeff talked about is remarkably similar (we use field >> collapsing), including the performance aspects of it, we are getting >> concurrent mode failures both due to new space allocation failures and due >> to promotion failures. I suspect there's a lot of garbage building up. >> We're going to run tests with field collapsing disabled and see if that >> makes a difference. >> >> Cas >> >> >> On Thu, Jun 16, 2016 at 1:08 PM, Jeff Wartes wrote: >> >>> Check your gc log for CMS “concurrent mode failure” messages. >>> >>> If a concurrent CMS collection fails, it does a stop-the-world pause while >>> it cleans up using a *single thread*. This means the stop-the-world CMS >>> collection in the failure case is typically several times slower than a >>> concurrent CMS collection. The single-thread business means it will also be >>> several times slower than the Parallel collector, which is probably what >>> you’re seeing. I understand that it needs to stop the world in this case, >>> but I really wish the CMS failure would fall back to a Parallel collector >>> run instead. >>> The Parallel collector is always going to be the fastest at getting rid of >>> garbage, but only because it stops all the application threads while it >>> runs, so it’s got less complexity to deal with. That said, it’s probably >>> not going to be orders of magnitude faster than a (successfully) concurrent >>> CMS collection. >>> >>> Regardless, the bigger the heap, the bigger the pause. >>> >>> If your application is generating a lot of garbage, or can generate a lot >>> of garbage very suddenly, CMS concurrent mode failures are more likely. You >>> can turn down the -XX:CMSInitiatingOccupancyFraction value in order to >>> give the CMS collection more of a head start at the cost of more frequent >>> collections. If that doesn’t work, you can try using a bigger heap, but you >>> may eventually find yourself trying to figure out what about your query >>> load generates so much garbage (or causes garbage spikes) and trying to >>> address that. 
Even G1 won’t protect you from highly unpredictable garbage >>> generation rates. >>> >>> In my case, for example, I found that a very small subset of my queries >>> were using the CollapseQParserPlugin, which requires quite a lot of memory >>> allocations, especially on a large index. Although generally this was fine, >>> if I got several of these rare queries in a very short window, it would >>> always spike enough garbage to cause CMS concurrent mode failures. The >>> single-threaded concurrent-mode failure would then take long enough that >>> the ZK heartbeat would fail, and things would just go downhill from there. >>> >>> >>> >>> On 6/15/16, 3:57 PM, "Cas Rusnov" wrote: >>> Hey Shawn! Thanks for replying. Yes I meant HugePages not HugeTable, brain fart. I will give the transparent off option a go. I have attempted to use your CMS configs as is and also the default settings and the cluster dies under our load (basically a node will get a 35-60s GC STW and then the others in the shard will take the load, and >>> they will in turn get long STWs until the shard dies), which is why basically >>> in a fit of desperation I tried out ParallelGC and found it to be half-way acceptable. I will run a test using your configs (and the defaults) again just to be sure (since I'm certain the machine config has changed since we used your unal
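Walter's setup of a fixed 2 GB new generation inside an 8 GB heap would, assuming CMS, correspond to startup flags roughly like the sketch below; the sizes are his example, not a general recommendation:

    -Xms8g -Xmx8g -Xmn2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

An explicit -Xmn pins the young generation so the short-lived per-request allocations die there instead of being promoted into tenured space.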
Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
Hi all - I've successfully run the hon-lucene-synonyms plugin from the Admin console by adding the following to the Raw Query Parameters field... &qf=text&defType=synonym_edismax&synonyms=true&synonyms.originalBoost=1.2&synonyms.synonymBoost=1.1 I got those from the Read Me on the github account. Now I'm trying to make this work via a requestHandler in solrconfig.xml. I think the following should work, but it just hangs if I add the last line referencing synonyms.originalBoost explicit 10 synonym_edismax text true 1.2 --> If I add this line, the admin console just hangs when I hit /test1 If I do NOT add the last line and only have the line that sets synonyms=true, it appears to work fine. I see the dot notation all over the sample entries in solrconfig.xml... Am I missing something here? Essentially, how do I get these variables set correctly from inside a requestHandler configured in the solrconfig.xml file? On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson < jlaw...@opensourceconnections.com> wrote: > MaryJo you might want to start a new thread, I think we kinda hijacked this > one. Also if you are interested in tuning queries check out > http://splainer.io/ and https://www.quepid.com which are interactive tools > (both of which my company makes) to tune for search relevancy. > > On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey > wrote: > > > I'm really thinking this just might not be the right tool for us, what we > > really need is a solution that works like the normal synonym filter does, > > just with proper multi-term support, so I can apply the synonyms only on > > certain fields (copied fields) that have their own, lower boost settings. > > The way this plugin works across the entire query just seems too > > problematic when you need to do complex queries with lots of different > > boost settings to get good relevancy. Anyone used a different method of > > handling multi-term synonyms that isn't as global? > > > > Mary Jo > > > > > > > > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey > > wrote: > > > > > Here's the issue I am still having with getting the right search > > relevancy > > > with the synonym plugin in place. We typically have users searching on > > > multiple terms, and we want matches across multiple terms, particularly > > > those that appears as phrases, to appear higher than matches for the > same > > > term multiple times. The synonym filter makes this complicated since we > > may > > > have cases where the term the user enters, like "sbc", maps to a > > multi-term > > > synonym like "small block", and we always want the matches for the > > original > > > term to pop up first, so I'm trying to make sure the original boost is > > high > > > enough to override a phrase boost that the multi-term synonym would > give. > > > Unfortunately this then means matches on the same term multiple times > get > > > pushed up over my phrase matches...those aren't going to be the most > > > relevant matches. Not sure there's a way to solve this successfully, > > > without a completely different approach to the synonyms... or not > > counting > > > the number of matches on terms (I assume you can drop that ability, > > > although that's not ideal either...just better than what I have now). 
> > > > > > MJ > > > > > > > > > > > > Sent with MailTrack > > > < > > > https://mailtrack.io/install?source=signature&lang=en&referral=mjsmin...@gmail.com&idSignature=22 > > > > > > > > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey > > > wrote: > > > > > >> > > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson < > > >> jlaw...@opensourceconnections.com> wrote: > > >> > > >>> > > >>> We were thinking, as you experimented with, that the 0.5 and 2.0 > boosts > > >>> were no match for the product name and keyword field boosts so that > > would > > >>> influence your search as well. > > >> > > >> > > >> > > >> Yeah I definitely will have to play with the values a bit as we want > the > > >> product name matches to always appear highest, whether original or > > >> synonyms, but I'll have to figure out how to get that result without > one > > >> word terms that have multi word synonyms getting overly boosted for a > > >> phrase match while still sufficiently boosting the normal phrase > > match > > >> stuff too. With the normal synonym filter I was able to just copy > fields > > >> that could have synonyms to a new field (which would be the only one > > with > > >> the synonym filter), and use a different, lower boost on those fields, > > but > > >> that won't work with this plugin which applies across everything in > the > > >> query. Makes it a bit more complicated to get everything just right. > > >> > > >> MJ > > >> > > >> > > >> Sent with MailTrack > > >> < > > > https://mailtrack.io/install?source=signature&lang=en&referral=mjsmin...@gmail.com&idSignature=22 > > > > > >> > > > > > > > > >
Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
On Fri, Jun 17, 2016 at 2:15 PM, John Bickerstaff wrote:

> If I do NOT add the last line and only have the line that sets
> synonyms=true, it appears to work fine.
>
> I see the dot notation all over the sample entries in solrconfig.xml... Am
> I missing something here?
>
> Essentially, how do I get these variables set correctly from inside a
> requestHandler configured in the solrconfig.xml file?

I know I didn't have any issues using those boosts but I was sending them on the query string (or otherwise as part of my query request), rather than setting them in the config. You might try that to see if it makes a difference.

Mary Jo
Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
OK - Slapping forehead now... D'oh! 1.2 wrote: > Hi all - > > I've successfully run the hon-lucene-synonyms plugin from the Admin > console by adding the following to the Raw Query Parameters field... > > > &qf=text&defType=synonym_edismax&synonyms=true&synonyms.originalBoost=1.2&synonyms.synonymBoost=1.1 > > I got those from the Read Me on the github account. > > Now I'm trying to make this work via a requestHandler in solrconfig.xml. > > I think the following should work, but it just hangs if I add the last > line referencing synonyms.originalBoost > > > > >explicit >10 >synonym_edismax >text >true >1.2 --> If I add this > line, the admin console just hangs when I hit /test1 > > > > If I do NOT add the last line and only have the line that sets > synonyms=true, it appears to work fine. > > I see the dot notation all over the sample entries in solrconfig.xml... > Am I missing something here? > > Essentially, how do I get these variables set correctly from inside a > requestHandler configured in the solrconfig.xml file? > > On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson < > jlaw...@opensourceconnections.com> wrote: > >> MaryJo you might want to start a new thread, I think we kinda hijacked >> this >> one. Also if you are interested in tuning queries check out >> http://splainer.io/ and https://www.quepid.com which are interactive >> tools >> (both of which my company makes) to tune for search relevancy. >> >> On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey >> wrote: >> >> > I'm really thinking this just might not be the right tool for us, what >> we >> > really need is a solution that works like the normal synonym filter >> does, >> > just with proper multi-term support, so I can apply the synonyms only on >> > certain fields (copied fields) that have their own, lower boost >> settings. >> > The way this plugin works across the entire query just seems too >> > problematic when you need to do complex queries with lots of different >> > boost settings to get good relevancy. Anyone used a different method of >> > handling multi-term synonyms that isn't as global? >> > >> > Mary Jo >> > >> > >> > >> > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey >> > wrote: >> > >> > > Here's the issue I am still having with getting the right search >> > relevancy >> > > with the synonym plugin in place. We typically have users searching on >> > > multiple terms, and we want matches across multiple terms, >> particularly >> > > those that appears as phrases, to appear higher than matches for the >> same >> > > term multiple times. The synonym filter makes this complicated since >> we >> > may >> > > have cases where the term the user enters, like "sbc", maps to a >> > multi-term >> > > synonym like "small block", and we always want the matches for the >> > original >> > > term to pop up first, so I'm trying to make sure the original boost is >> > high >> > > enough to override a phrase boost that the multi-term synonym would >> give. >> > > Unfortunately this then means matches on the same term multiple times >> get >> > > pushed up over my phrase matches...those aren't going to be the most >> > > relevant matches. Not sure there's a way to solve this successfully, >> > > without a completely different approach to the synonyms... or not >> > counting >> > > the number of matches on terms (I assume you can drop that ability, >> > > although that's not ideal either...just better than what I have now). 
>> > > >> > > MJ >> > > >> > > >> > > >> > > Sent with MailTrack >> > > < >> > >> https://mailtrack.io/install?source=signature&lang=en&referral=mjsmin...@gmail.com&idSignature=22 >> > > >> > > >> > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey >> > > wrote: >> > > >> > >> >> > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson < >> > >> jlaw...@opensourceconnections.com> wrote: >> > >> >> > >>> >> > >>> We were thinking, as you experimented with, that the 0.5 and 2.0 >> boosts >> > >>> were no match for the product name and keyword field boosts so that >> > would >> > >>> influence your search as well. >> > >> >> > >> >> > >> >> > >> Yeah I definitely will have to play with the values a bit as we want >> the >> > >> product name matches to always appear highest, whether original or >> > >> synonyms, but I'll have to figure out how to get that result without >> one >> > >> word terms that have multi word synonyms getting overly boosted for a >> > >> phrase match while still sufficiently boosting the normal phrase >> > match >> > >> stuff too. With the normal synonym filter I was able to just copy >> fields >> > >> that could have synonyms to a new field (which would be the only one >> > with >> > >> the synonym filter), and use a different, lower boost on those >> fields, >> > but >> > >> that won't work with this plugin which applies across everything in >> the >> > >> query. Makes it a bit more complicated to get everything just right. >> > >> >> > >> MJ >>
Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
> OK - Slapping forehead now... D'oh!
>
>    1.2
>
> Float, not int!

LOL, we've all been there. I'm surprised I didn't notice that myself.

MJ
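Putting the thread's conclusion together, a defaults block for the /test1 handler would presumably look like the sketch below; it is assembled from the values John posted and the hon-lucene-synonyms README, and the <float> elements are the actual point:

    <requestHandler name="/test1" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
        <str name="defType">synonym_edismax</str>
        <str name="qf">text</str>
        <bool name="synonyms">true</bool>
        <!-- 1.2 and 1.1 are not ints, so declare them as floats -->
        <float name="synonyms.originalBoost">1.2</float>
        <float name="synonyms.synonymBoost">1.1</float>
      </lst>
    </requestHandler>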
Re: tlogs not deleting as usual in Solr 5.5.1?
If you are NOT using SolrCloud and don't care about Real Time Get, you can just disable the tlogs entirely. They're not doing you all that much good in that case... The tlogs are irrelevant when it comes to master/slave replication. FWIW, Erick On Fri, Jun 17, 2016 at 9:14 AM, Chris Morley wrote: > After some more searching, I found a thread online where Erick Erickson is > telling someone about how there are old tlogs left around in case there is > a need for a peer to sync even if SolrCloud is not enabled. That makes > sense, but we'll probably want to enable autoCommit and then trigger > replication on the slaves when we know everything is committed after a full > import. (We disable polling.) > > > > > > From: "Chris Morley" > Sent: Thursday, June 16, 2016 3:20 PM > To: "Solr Newsgroup" > Subject: tlogs not deleting as usual in Solr 5.5.1? > The repetition below is on purpose to show the contrast between solr > versions. > > In Solr 4.10.3, we have autocommits disabled. We do a dataimport of a few > hundred thousand records and have a tlog that grows to ~1.2G. > > In Solr 5.5.1, we have autocommits disabled. We do a dataimport of a few > hundred thousand records and have a tlog that grows to ~1.6G. (same exact > data, slightly larger tlog but who knows, that's fine) > > In Solr 4.10.3 tlogs ARE deleted after issuing update?commit=true. > (And deleted immediately.) > > In Solr 5.5.1 tlogs ARE NOT deleted after issuing update?commit=true. > > We want the tlog to delete like it did in Solr 4.10.3. Perhaps there is a > configuration setting or feature of Solr 5.5.1 that causes this? > > Would appreciate any tips on configuration or code we could change to > ensure the tlog will delete after a hard commit. > > >
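For reference, disabling the transaction log as Erick suggests is normally just a matter of removing or commenting out the updateLog element in solrconfig.xml (sketch below; only appropriate on a non-SolrCloud setup that doesn't need Real Time Get):

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- comment out (or delete) this block to stop writing tlogs -->
      <!--
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      -->
    </updateHandler>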
Thank You Guys
Hi Guys, Thank you all - I got synonyms, highlighting, stemming all working the way I wanted to. I am sure I will have more questions later on =) Thanks! Sas
Re: tlogs not deleting as usual in Solr 5.5.1?
Thanks Erick - that's what we have settled on doing until we are using SolrCloud, which will be later this year with any luck. We want to get up onto Solr 5.5.1 first (ASAP) and we tried disabling tlogs today and that seems to fit the bill. From: "Erick Erickson" Sent: Friday, June 17, 2016 2:36 PM To: "solr-user" , ch...@depahelix.com Subject: Re: tlogs not deleting as usual in Solr 5.5.1? If you are NOT using SolrCloud and don't care about Real Time Get, you can just disable the tlogs entirely. They're not doing you all that much good in that case... The tlogs are irrelevant when it comes to master/slave replication. FWIW, Erick On Fri, Jun 17, 2016 at 9:14 AM, Chris Morley wrote: > After some more searching, I found a thread online where Erick Erickson is > telling someone about how there are old tlogs left around in case there is > a need for a peer to sync even if SolrCloud is not enabled. That makes > sense, but we'll probably want to enable autoCommit and then trigger > replication on the slaves when we know everything is committed after a full > import. (We disable polling.) > > > > > > From: "Chris Morley" > Sent: Thursday, June 16, 2016 3:20 PM > To: "Solr Newsgroup" > Subject: tlogs not deleting as usual in Solr 5.5.1? > The repetition below is on purpose to show the contrast between solr > versions. > > In Solr 4.10.3, we have autocommits disabled. We do a dataimport of a few > hundred thousand records and have a tlog that grows to ~1.2G. > > In Solr 5.5.1, we have autocommits disabled. We do a dataimport of a few > hundred thousand records and have a tlog that grows to ~1.6G. (same exact > data, slightly larger tlog but who knows, that's fine) > > In Solr 4.10.3 tlogs ARE deleted after issuing update?commit=true. > (And deleted immediately.) > > In Solr 5.5.1 tlogs ARE NOT deleted after issuing update?commit=true. > > We want the tlog to delete like it did in Solr 4.10.3. Perhaps there is a > configuration setting or feature of Solr 5.5.1 that causes this? > > Would appreciate any tips on configuration or code we could change to > ensure the tlog will delete after a hard commit. > > >
Re: SOLR war for SOLR 6
On 6/16/2016 1:20 AM, Bharath Kumar wrote:
> I was trying to generate a solr war out of the solr 6 source, but even
> after i create the war, i was not able to get it deployed correctly on
> jboss. Wanted to know if anyone was able to successfully generate solr
> war and deploy it on tomcat or jboss? Really appreciate your help on
> this.

FYI: If you do this, you're running an unsupported configuration. You're on your own for both getting it working AND any problems that are related to the deployment rather than Solr itself.

You actually don't need to create a war. Just run "ant clean server" in the solr directory of the source code and then install the exploded webapp (found in server/solr-webapp/webapp) into the container. There should be instructions available for how to install an exploded webapp into tomcat or jboss.

As already stated, you are on your own for finding and following those instructions, and if Solr doesn't deploy, you will need to talk to somebody who knows the container for help. Once they are sure you have the config for the container right, they may refer you back here ... but because it's an unsupported config, the amount of support we can offer is minimal.

https://wiki.apache.org/solr/WhyNoWar

If you want the admin UI to work when you install into a user-supplied container, then you must set the context path for the app to "/solr". The admin UI in 6.x will not work if you use another path, and that is not considered a bug, because the only supported container has the path hardcoded to /solr.

Thanks,
Shawn
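A rough sketch of the steps Shawn describes, for a Tomcat install (paths are examples only; JBoss users will need the equivalent deployment descriptor):

    # from a Solr 6.x source checkout
    cd solr
    ant clean server
    # the exploded webapp is now in server/solr-webapp/webapp;
    # point the container at that directory

With Tomcat, one way to get the required /solr context path is a context descriptor named solr.xml under conf/Catalina/localhost whose docBase points at that webapp directory; the exact mechanism is container-specific and, as Shawn notes, unsupported either way.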