Unable to search with a number contained in the product part number or a string contained in the product name
Hi,

I am using Solr search in WebSphere Commerce 7. When I search with the full product part number, or with the first part of the product name string, I can find the product along with its details. But when I search with part of a part number, for example '1487' instead of the full part number 'A0001487', the search result comes back empty. Similarly, if I search with 'Kevlar' for the product 'Kevlar Brake Pads' the result comes back, but if I search with 'Kev' the result from Solr itself is empty.

I looked at the logs in detail and found that after connecting to Solr from the WCS end, it is fetching the search terms as:

[6/22/12 17:13:26:405 IST] 00a0 queryservice > com.ibm.commerce.foundation.internal.server.services.dataaccess.queryservice.SubstitutionHelper getControlValues ENTRY LANGUAGES {SearchTerms.1=SearchTerms=[Kev],, _wcf.dataLanguageIds.0=_wcf.dataLanguageIds=[-1],, AssociationType.1=AssociationType=[2,3,1],

So my question is: where do I need to make changes so that these partial searches return the actual results? Please suggest. Any help is highly appreciated.

Thanks,
Prasenjit
Re: Unable to search with a number contained in the product part number or a string contained in the product name
Hi,

Thanks for the suggestion. But my requirement is to search with a partial part number, say 'ALT1', instead of the whole 'ALT15SBW'. If I do so, the search result is empty. I have changed the schema.xml file for the field type "wc_txt" ("solr.TextField") and passed an additional filter there, but the search result is still empty for the above condition.

Can you please suggest what changes I need to make?

Thanks,
Prasenjit
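For later readers: partial matches such as 'ALT1' against 'ALT15SBW' generally need n-gram analysis at index time, because by default Solr only matches whole analyzed tokens. A minimal sketch of such a field type is below; the field type name, tokenizer choice and gram sizes are illustrative assumptions, not the actual WebSphere Commerce "wc_txt" definition.

  <!-- Sketch only: index-time n-grams so partial part numbers match whole tokens -->
  <fieldType name="wc_text_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- e.g. "alt15sbw" yields grams such as "alt", "alt1", "lt15", ... -->
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

After switching a field to a type like this, the index has to be rebuilt for the grams to exist; the query-side analyzer deliberately omits the n-gram filter so the search term is matched as typed.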
SolrReplication configuration with frequent deletes and updates
I have the following requirements:

1. Adds: 20 docs/sec
2. Searches: 100 searches/sec
3. Deletes: 20*3600*24*7 ~ 12 million docs/week (basically a cron job which deletes all documents more than 7 days old)

I am thinking of having 6 shards (each with 2 million docs), with 1 master and 2 slaves using SolrReplication. I have the following questions:

1. With 50 searches/sec per shard on 2 million docs, what would be the tentative response time? I am hoping to keep it under 100 ms.
2. What would be a reasonable latency (pollInterval) on the slaves for SolrReplication (all slaves connected over a single backplane)? Is a 1-minute pollInterval reasonable?
3. Is NRT a better/viable option compared to SolrReplication?

-Thanks,
Prasenjit
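For reference, the pollInterval being discussed lives in each slave's ReplicationHandler configuration. A minimal sketch of the slave-side solrconfig.xml entry with a 1-minute poll is below; the master URL and core name are placeholders.

  <!-- Sketch: slave polls its shard's master once a minute -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- placeholder URL: point at the matching shard master's /replication handler -->
      <str name="masterUrl">http://master-host:8983/solr/shard1/replication</str>
      <!-- hh:mm:ss between polls -->
      <str name="pollInterval">00:01:00</str>
    </lst>
  </requestHandler>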
Re: SolrReplication configuration with frequent deletes and updates
Appreciate your reply. I have some more follow-up questions inline.

On Thu, Feb 2, 2012 at 12:35 AM, Emmanuel Espina wrote:
>> 1. Adds: 20 docs/sec
>> 2. Searches: 100 searches/sec
>> 3. Deletes: 20*3600*24*7 ~ 12 million docs/week (basically a cron job
>> which deletes all documents more than 7 days old)
>>
>> I am thinking of having 6 shards (each with 2 million docs), with
>> 1 master and 2 slaves using SolrReplication. I have the following
>> questions:
>>
>> 1. With 50 searches/sec per shard on 2 million docs, what would be
>> the tentative response time? I am hoping to keep it under 100 ms.
>
> That is quite a lot of searches per second considering that you will
> have to search in 6 shards (the coordination and network latency
> affect the results). Also the components you use and the complexity
> of the query (as well as the number of segments in each shard) affect
> the time. 100 ms is probably a low threshold for your requirements;
> you will probably need to add more replicas.

Adding slaves (using SolrReplication) is fine as long as it scales linearly. I understand that shards may not scale linearly, mostly because of merging/network overhead, but I think they will help in reducing response time (please correct me if I am wrong). I am more worried about response time (even on a lightly loaded slave); the main intention of sharding was to reduce it. Would it be better to have a 2-shards-x-6-slaves configuration rather than 6-shards-x-2-slaves? Considering my total number of docs is 12 million, will Solr be OK with 6 million docs per shard?

>> 2. What would be a reasonable latency (pollInterval) on the slaves for
>> SolrReplication (all slaves connected over a single backplane)? Is a
>> 1-minute pollInterval reasonable?
>
> Yes, but it is not reasonable that each time you poll you get updates.
> That is, you shouldn't perform commits more than once every 10
> minutes. Otherwise we would be talking about near real time indexing,
> something that is in development in trunk:
> http://wiki.apache.org/solr/NearRealtimeSearch

Hmm, a 10-minute latency is definitely too high for me (especially as this is a streaming use case, i.e. show the latest stuff first). In that case I can probably get rid of master-slave and update all the replicated shards directly, but then I will have to do a lot of legwork (what if one of the slaves is down, etc.), which I was trying to avoid. Just curious: how stable is NRT?

>> 3. Is NRT a better/viable option compared to SolrReplication?
>
> That is something in development. AFAIK it works with shards (because
> NRT refers to indexing, and with shards there isn't anything particular
> about the indexing), but with replication something different will be
> needed: SolrCloud, I think, covers these NRT aspects due to its
> different architecture (not master-slave replicas, but all peers
> replicating).

So it seems SolrReplication is out (if my pollInterval < 5 minutes), right? Let me look into SolrCloud. Any suggestions on which is more stable, SolrCloud or NRT?

-Thanks,
Prasenjit
effect of continuous deletes on index's read performance
I have a use case where documents are continuously added at 20 docs/sec (each doc add also issues a commit) and documents are continuously deleted at the same rate, so the searchable index size stays roughly constant at ~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when the deletes trigger compaction, or with every commit (during adds)? How badly will they affect search response time?

-Thanks,
Prasenjit
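For reference, the rolling-window delete described above is usually issued as a delete-by-query against a timestamp field. A sketch of the update message is below; the field name "timestamp_dt" is a placeholder, not taken from the original schema.

  <!-- Sketch: drop everything indexed more than 6 hours ago -->
  <delete>
    <query>timestamp_dt:[* TO NOW-6HOURS]</query>
  </delete>

Such deletes only mark documents as deleted inside their segments; the space is reclaimed when segments are later merged, which is the "compaction" asked about above.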
Re: effect of continuous deletes on index's read performance
Thanks Otis. commitWithin will definitely work for me (as I am currently using version 3.4, which doesn't have NRT yet).

Assuming I use commitWithin=10secs, are you saying that the continuous deletes (without commits) won't have any effect on performance? I was under the impression that deletes just mark the doc ids (which essentially means the index size stays the same) but don't actually do the compaction until someone calls optimize/commit; is my assumption not true?

-Thanks,
Prasenjit

On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic wrote:
> Hi Prasenjit,
>
> It sounds like at this point your main enemy might be those per-doc-add
> commits. Don't commit until you need to see your new docs in results. And
> if you need NRT then use the softCommit option with Solr trunk
> (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWithin
> to limit commit's "performance damage".
>
> Otis
>
> Performance Monitoring SaaS for Solr -
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
>> From: prasenjit mukherjee
>> To: solr-user
>> Sent: Monday, February 6, 2012 1:17 AM
>> Subject: effect of continuous deletes on index's read performance
>>
>> I have a use case where documents are continuously added at 20 docs/sec
>> (each doc add also issues a commit) and documents are continuously
>> deleted at the same rate, so the searchable index size stays roughly
>> constant at ~400K docs (docs for the last 6 hours ~ 20*3600*6).
>>
>> Will there be pauses when the deletes trigger compaction, or with every
>> commit (during adds)? How badly will they affect search response time?
>>
>> -Thanks,
>> Prasenjit
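For completeness, commitWithin is passed on each add request rather than configured once. A sketch of an XML update is below (the document fields are placeholders); it asks Solr to make the document visible within 10 seconds instead of issuing an explicit commit per add.

  <!-- Sketch: commitWithin is in milliseconds -->
  <add commitWithin="10000">
    <doc>
      <field name="id">doc-123</field>
      <field name="text">example body</field>
    </doc>
  </add>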
Re: effect of continuous deletes on index's read performance
Pardon my ignorance: why can't the IndexWriter and IndexSearcher share the same underlying in-memory data structure, so that the IndexSearcher need not be reopened with every commit?

On 2/6/12, Erick Erickson wrote:
> Your continuous deletes won't affect performance
> noticeably, that's true.
>
> But you're really doing bad things with the commit after every
> add or delete. You haven't said whether you have a master/
> slave setup or not, but assuming you're searching on
> the same machine you're indexing to, each time you commit
> you're forcing the underlying searcher to close and re-open, and
> any attendant autowarming to occur. All to get a single
> document searchable. 20 times a second. If you have a master/
> slave setup, you're forcing the slave to fetch the changed
> parts of the index every time it polls, which is better than
> what's happening on the master, but still rather often.
>
> 400K documents isn't very big by Solr standards, so unless
> you can show performance problems, I wouldn't be concerned
> about index size. As Otis says, your per-document commit
> is probably hurting you far more than any index size
> savings.
>
> I'd actually think carefully about whether you need even
> 10-second commits. If you can stretch that out to minutes,
> so much the better. But it all depends upon your problem
> space.
>
> Best,
> Erick
>
> On Mon, Feb 6, 2012 at 2:59 AM, prasenjit mukherjee wrote:
>> Thanks Otis. commitWithin will definitely work for me (as I am
>> currently using version 3.4, which doesn't have NRT yet).
>>
>> Assuming I use commitWithin=10secs, are you saying that the
>> continuous deletes (without commits) won't have any effect on
>> performance? I was under the impression that deletes just mark the
>> doc ids (which essentially means the index size stays the same) but
>> don't actually do the compaction until someone calls optimize/commit;
>> is my assumption not true?
>>
>> -Thanks,
>> Prasenjit
>>
>> On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic wrote:
>>> Hi Prasenjit,
>>>
>>> It sounds like at this point your main enemy might be those per-doc-add
>>> commits. Don't commit until you need to see your new docs in results.
>>> And if you need NRT then use the softCommit option with Solr trunk
>>> (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use
>>> commitWithin to limit commit's "performance damage".
>>>
>>> Otis
>>>
>>> Performance Monitoring SaaS for Solr -
>>> http://sematext.com/spm/solr-performance-monitoring/index.html
>>>
>>>> From: prasenjit mukherjee
>>>> To: solr-user
>>>> Sent: Monday, February 6, 2012 1:17 AM
>>>> Subject: effect of continuous deletes on index's read performance
>>>>
>>>> I have a use case where documents are continuously added at 20 docs/sec
>>>> (each doc add also issues a commit) and documents are continuously
>>>> deleted at the same rate, so the searchable index size stays roughly
>>>> constant at ~400K docs (docs for the last 6 hours ~ 20*3600*6).
>>>>
>>>> Will there be pauses when the deletes trigger compaction, or with every
>>>> commit (during adds)? How badly will they affect search response time?
>>>>
>>>> -Thanks,
>>>> Prasenjit

--
Sent from my mobile device
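As a concrete illustration of stretching the commit interval out to minutes: the server can commit on a timer via solrconfig.xml, so clients never need to commit at all. A minimal sketch is below; the one-minute value is an assumption to be tuned for the use case.

  <!-- Sketch: server-side automatic commit at most once a minute -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- milliseconds -->
      <maxTime>60000</maxTime>
    </autoCommit>
  </updateHandler>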
Solr on netty
Is anybody aware of any effort to port Solr to Netty (or any other async-IO-based framework)?

Even at medium load (10 parallel clients) with 16 shards, performance seems to deteriorate quite sharply as the load increases, compared to an alternative (async-IO-based) solution.

-Prasenjit

--
Sent from my mobile device
Re: Solr on netty
Thanks for the response. Yes, we have 16 shards/partitions, each on one of 16 different nodes, plus a separate "master" Solr receiving continuous parallel requests from 10 client threads running on a single separate machine. Our observation was that performance degraded non-linearly as the load (number of concurrent clients) increased.

Some follow-up questions:

1. What is the default maximum number of threads configured when a Solr instance makes calls to the other 16 partitions?
2. How do I increase the maximum number of connections for the Solr-to-Solr interactions you mentioned in your mail?

On 2/22/12, Yonik Seeley wrote:
> On Wed, Feb 22, 2012 at 9:27 AM, prasenjit mukherjee wrote:
>> Is anybody aware of any effort to port Solr to Netty (or any other
>> async-IO-based framework)?
>>
>> Even at medium load (10 parallel clients) with 16 shards, performance
>> seems to deteriorate quite sharply as the load increases, compared to
>> an alternative (async-IO-based) solution.
>
> By "16 shards" do you mean you have 16 nodes and each single client
> request causes a distributed search across all of them? How many
> concurrent requests are your 10 clients making to each node?
>
> NIO works well when there are many clients, but when servicing those
> client requests only needs intermittent CPU. That's not the pattern
> we see for search.
> You *can* easily configure Solr's Jetty to use NIO when accepting
> client connections, but it won't do you any good, just as switching to
> Netty wouldn't do anything here.
>
> Where NIO could help a little is with the requests that Solr makes to
> other Solr instances. Solr is already architected for async
> request-response to other nodes, but the current underlying
> implementation uses HttpClient 3 (which doesn't have NIO).
>
> Anyway, it's unlikely that NIO vs BIO will make much of a difference
> with the numbers you're talking about (16 shards).
>
> Someone else reported that we have the number of connections per host
> set too low, and they saw big gains by increasing it. There's an
> issue open to make this configurable in 3x:
> https://issues.apache.org/jira/browse/SOLR-3079
> We should probably up the max connections per host by default.
>
> -Yonik
> lucidimagination.com

--
Sent from my mobile device
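Regarding question 2: on versions where the change tracked in SOLR-3079 is available, the connection limit for Solr-to-Solr shard requests is set on the shard handler factory in solrconfig.xml. The sketch below shows the general shape of that configuration; treat the parameter names and the value of 100 as assumptions to verify against the documentation for your Solr version.

  <!-- Sketch: raise the per-host connection limit used for distributed shard requests -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <shardHandlerFactory class="HttpShardHandlerFactory">
      <int name="maxConnectionsPerHost">100</int>
    </shardHandlerFactory>
  </requestHandler>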
Need help with Solr Streaming query
Hi,

I am facing an issue while working with Solr streaming expressions. I am using /export for emitting tuples from a streaming query. However, when I try to use the NOT operator in the Solr query it does not work; the same query works with /select.

Please find the query below:

  top(n=105,search(,qt="/export",q="-(: *c*) ",fl="",sort=" asc"),sort=" asc")

In the above query, the NOT-only clause q="-(: *c*)" does not work with /export. However, the same query works when I combine any positive search criterion with the NOT expression, e.g. q="-(: *c*) AND (: **)".

Can you please help here? Running only a NOT query with /export should be a valid use case. I have also checked the Solr logs and found no errors when running the NOT query; it simply returns no results, and it returns very quickly.

Regards,
Prasenjit Sarkar
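A commonly suggested workaround for purely negative queries is to pair the exclusion with an explicit match-all clause, along the lines of the positive criterion already mentioned above. The sketch below shows that shape; the collection name "products" and the fields "id" and "name" are placeholders, not the names from the original expression.

  top(n=105,
      search(products,
             qt="/export",
             q="(*:* -name:*c*)",
             fl="id,name",
             sort="id asc"),
      sort="id asc")

It may also be worth confirming that every field referenced in fl and sort has docValues enabled, since /export requires docValues on those fields.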