Duplicate content
Hi All,

I want to change the duplicate-content behavior in Solr. What I want to do is:

1) I don't want duplicate content.
2) I don't want to overwrite old content with new content. That is, if I add a document to Solr and it already exists in the index, the old document should not be overwritten.

Can anyone suggest how to achieve this?

Thanks,
Sunil
RE: Duplicate content
Thanks guys.

-----Original Message-----
From: Norberto Meijome [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 15, 2008 2:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate content

On Tue, 15 Jul 2008 10:48:14 +0200 Jarek Zgoda <[EMAIL PROTECTED]> wrote:

> >> 2) I don't want to overwrite old content with new one.
> >>
> >> Means, if I add duplicate content in solr and the content already
> >> exists, the old content should not be overwritten.
> >
> > before inserting a new document, query the index - if you get a result
> > back, then don't insert. I don't know of any other way.
>
> This operation is not atomic, so you get a race condition here. Other
> than that, it seems fine. ;)

of course - but I am not sure you can control atomicity at the Solr level (yet? ;) ) for the /update handler - so it'd have to either be a custom handler, or your app being the only one accessing and controlling write access to it. It definitely gets more interesting if you start adding shards ;)

_
{Beto|Norberto|Numard} Meijome

"All parts should go together without forcing. You must remember that the parts you are reassembling were disassembled by you. Therefore, if you can't get them together again, there must be a reason. By all means, do not use hammer." IBM maintenance manual, 1975

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
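A minimal sketch of the query-before-insert approach described above, assuming SolrJ and a uniqueKey field named "id" (both assumptions for illustration, not from the thread); as noted, this is not atomic unless your application is the only writer:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.util.ClientUtils;
    import org.apache.solr.common.SolrInputDocument;

    // Probe the index for the document's key; only add when absent, so an
    // existing document is never overwritten. Race-prone, per the thread.
    SolrQuery probe = new SolrQuery("id:" + ClientUtils.escapeQueryChars(docId));
    probe.setRows(0); // we only need the count, not the documents
    QueryResponse rsp = server.query(probe);
    if (rsp.getResults().getNumFound() == 0) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", docId);
        doc.addField("content", content);
        server.add(doc);
        server.commit();
    } // else: document already exists; skip the add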
Solr/Lucene search term stats
Hi All,

I am working on a module using Solr, where I want to get stats on each keyword found in each field. If my search term is:

(title:("web2.0" OR "ajax") OR description:("web2.0" OR "ajax"))

then I want to know how many times web2.0/ajax were found in title or description. Any suggestions on how to get this information (apart from the &hl=true highlighting parameter)?

Thanks,
Sunil
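A later-era option worth noting (an assumption on my part, not from the thread; it requires a Solr version with function queries in the field list, Solr 4+): the termfreq() function can return per-document counts for a term in a given field, e.g.

    fl=id,termfreq(title,'ajax'),termfreq(description,'ajax')

These are raw per-document term frequencies, so totals across the whole result set would still need to be summed client-side.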
Exact match
Hi,

I am sending a request to Solr for an exact match.

Example: (title:("Web 2.0" OR "Social Networking") OR description:("Web 2.0" OR "Social Networking"))

But in the results I am getting stories matching just "Social", "Web", etc. Please let me know what's going wrong.

Thanks,
Sunil
RE: Exact match
Both the fields are of type "text". How will "&debugQuery=true" help? I am not familiar with the output.

Thanks,
Sunil

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 28, 2008 2:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Exact match

Look at what Solr returns when adding &debugQuery=true for the parsed query, and also consider how your fields are analyzed (their associated type, etc).

Erik

On Jul 28, 2008, at 4:56 AM, Sunil wrote:

> Hi,
>
> I am sending a request to solr for exact match.
>
> Example: (title:("Web 2.0" OR "Social Networking") OR description:("Web
> 2.0" OR "Social Networking"))
>
> But in the results I am getting stories matching "Social", "Web" etc.
>
> Please let me know what's going wrong.
>
> Thanks,
> Sunil
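With a stock "text" type, "Web 2.0" is tokenized (and typically stemmed and word-delimited), and &debugQuery=true will show under parsedquery exactly which terms each quoted phrase was broken into. If the goal is that quoted phrases only match the literal words in order, one hedged option is a companion field with a minimal analysis chain; the type and field names below are made up for illustration:

    <!-- Hypothetical schema.xml sketch: minimal analysis so quoted
         phrases match the literal words in sequence. -->
    <fieldType name="text_literal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="title_literal" type="text_literal" indexed="true" stored="false"/>
    <copyField source="title" dest="title_literal"/>

A phrase query such as title_literal:"Web 2.0" would then only match documents where "web" is immediately followed by "2.0".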
password protect solr URLs
Hi,

I want to password-protect the Solr select/update/delete URLs. Any link where I can get some help?

Thanks,
Sunil
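One common approach for Solr deployments of this era (an assumption here, not an answer from the thread) is to let the servlet container handle authentication, e.g. a BASIC-auth constraint in the webapp's web.xml; the role name and URL patterns below are placeholders:

    <!-- Hypothetical web.xml sketch: container-managed BASIC auth for
         the select and update handlers. Role and patterns are illustrative. -->
    <security-constraint>
      <web-resource-collection>
        <web-resource-name>Solr endpoints</web-resource-name>
        <url-pattern>/select/*</url-pattern>
        <url-pattern>/update/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>solr-user</role-name>
      </auth-constraint>
    </security-constraint>
    <login-config>
      <auth-method>BASIC</auth-method>
      <realm-name>Solr</realm-name>
    </login-config>

Users and roles are then defined in the container itself (e.g. Jetty's realm.properties or Tomcat's tomcat-users.xml).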
Can I change "/select" to POST and not GET
Hi,

My query is exceeding the 1024-character URL length limit. Can I configure Solr to accept POST requests when searching content?

Thanks in advance,
Sunil.
RE: Can I change "/select" to POST and not GET
Hi Ian,

Thanks for the reply. I am using curl, and the library was sending a GET request to Solr. I have changed it to POST, and now it's working properly.

Thanks,
Sunil

-----Original Message-----
From: Ian Connor [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 19, 2008 7:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Can I change "/select" to POST and not GET

The query limit is a software-imposed limit. What client are you using, and can that be configured to allow more?

On Tue, Aug 19, 2008 at 9:43 AM, Sunil <[EMAIL PROTECTED]> wrote:
> Hi,
>
> My query limit is exceeding the 1024 URL length. Can I configure solr to
> accept POST requests while searching content in solr?
>
> Thanks in advance,
> Sunil.

--
Regards,
Ian Connor
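For reference, with curl the switch is just a matter of sending the parameters in the request body, which Solr's /select handler accepts exactly like GET parameters (the URL and query below are placeholders):

    curl http://localhost:8983/solr/select --data-urlencode "q=title:foo OR title:bar" --data-urlencode "rows=10"

Any -d/--data style option makes curl issue a POST instead of a GET, which sidesteps the URL length limit entirely.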
OOM on commit after few days
I have been facing this issue for a long time in a production environment and wanted to know if anybody who has come across it can share their thoughts. Appreciate your help.

Environment:
- 2 GB index file
- 3.5 million documents
- Commit happens once every 15 mins., covering 100 to 400 document updates
- 3.5 GB of RAM available for the JVM
- Solr version 1.3 (nightly build of Oct 18, 2008)
- MDB - Message Driven Bean

I am not using Solr's replication mechanism. I also don't use XML post updates, since the amount of data is too large. I have bundled an MDB that receives messages for data updates and uses Solr's update handler to update and commit the index. Optimize happens once a day.

Everything runs fine for 2-3 days; after that I keep getting the following exceptions:

Exception
org.apache.solr.common.SolrException log java.lang.OutOfMemoryError:
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:350)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
    at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:907)
    at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:338)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
    at org.apache.lucene.search.Searcher.search(Searcher.java:126)
    at org.apache.lucene.search.Searcher.search(Searcher.java:105)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1170)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:856)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:283)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:170)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1302)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
    at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1128)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
    at java.lang.Thread.run(Thread.java:810)
Re: OOM on commit after few days
Thanks Yonik. The main search still happens through SolrDispatchFilter, so SolrQueryRequest is getting closed implicitly. But I do use the direct API in the following cases, so please suggest any more possible resource issues:

1. Update and commit: core.getUpdateHandler(). Here I close the updateHandler once updates/commits are done.

2. Searching in other cores from the current core's writer. I have a requirement to aggregate the data from multiple indexes and send a single XML response. I call otherCore.getSearcher() and call the search method to get a reference to Hits. I do call decref() on the RefCounted once done with processing the result.

3. I also call reload core after commit. This brings down the RAM usage but does not solve the main issue; with the reload I don't see any leaks, but the OOM error still occurs after 2-3 days.

Do you think any other resource is not getting closed?

Sunil

--- On Tue, 12/2/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> From: Yonik Seeley <[EMAIL PROTECTED]>
> Subject: Re: OOM on commit after few days
> To: solr-user@lucene.apache.org
> Date: Tuesday, December 2, 2008, 1:13 PM
>
> Using embedded is always more error prone... you're probably forgetting
> to close some resource.
> Make sure to close all SolrQueryRequest objects.
> Start with a memory profiler or heap dump to try and figure out what's
> taking up all the memory.
>
> -Yonik
>
> On Tue, Dec 2, 2008 at 1:05 PM, Sunil <[EMAIL PROTECTED]> wrote:
> > I have been facing this issue since long in production environment and
> > wanted to know if anybody came across can share their thoughts.
> > Appreciate your help.
> >
> > Environment
> > 2 GB index file
> > 3.5 million documents
> > 15 mins. time interval for committing 100 to 400 document updates
> > Commit happens once in 15 mins.
> > 3.5 GB of RAM available for JVM
> > Solr Version 1.3 (nightly build of Oct 18, 2008)
> >
> > MDB - Message Driven Bean
> > I am not using solr's replication mechanism. Also don't use xml post
> > update since the amount of data is too much.
> > I have bundled a MDB that receives messages for data updates and uses
> > solr's update handler to update and commit the index.
> > Optimize happens once a day.
> >
> > Everything runs fine for 2-3 days; after that I keep getting following
> > exceptions.
> > [quoted stack trace omitted; identical to the trace in the original message above]
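A minimal sketch of the close/decref discipline being discussed, assuming the embedded API of that era (SolrCore.getSearcher() returning a RefCounted<SolrIndexSearcher>, and SolrQueryRequest.close()); method and variable names are illustrative:

    import org.apache.solr.core.SolrCore;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.RefCounted;

    // Every searcher reference taken directly from a core must be
    // decref'd, or the index segments it pins can never be released.
    void searchOtherCore(SolrCore otherCore) {
        RefCounted<SolrIndexSearcher> ref = otherCore.getSearcher();
        try {
            SolrIndexSearcher searcher = ref.get();
            // ... run the cross-core search and collect results ...
        } finally {
            ref.decref();
        }
    }

    // Likewise, any SolrQueryRequest built by hand must be closed in a
    // finally block, since it holds its own searcher reference:
    //   try { core.execute(handler, req, rsp); } finally { req.close(); }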
Solr/ZK issues
[start of the first stack trace truncated in the original post]
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
    at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:287)
    at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:363)
    at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
    at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
    at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)

ERROR - 2015-06-17 08:07:51.190; org.apache.solr.common.SolrException; There was a problem finding the leader in zk: java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
    at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:307)
    at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:304)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:304)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:928)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:914)
    at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1514)
    at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:386)
    at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
    at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
    at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)

INFO - 2015-06-17 08:07:51.220; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.240; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.258; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.274; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO - 2015-06-17 08:07:51.284; org.apache.solr.cloud.ElectionContext; canceling election /overseer_elect/election/93424944611198761-<<<>>>>:8080_solr-n_000286

Any pointers here?

Thanks,
Sunil
Substitution variable and Collection api
Hi,

I am trying to create a collection via the Collections API and set a core property to use a system substitution variable, as shown below:

http://localhost:8090/solr/admin/collections?action=CREATE&name=ds&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=ds&property.dataDir=${solr.data.dir}\ds

This doesn't work: the index files are getting created at the root folder (c:/ds/). How do I force it to accept the value as a literal string so that it is set as "dataDir=${solr.data.dir}/ds"?

Note: if I explicitly modify the core.properties "dataDir" to ${solr.data.dir}\ds, it works as expected and the index files get created at this location.

This is using Solr 6.3.

Thanks,
Sunil
Re: [EXTERNAL] Grouping and group.facet performance disaster
Use group.cache.percent – for your index size, it might work well.

Thanks,

On 5/31/17, 4:16 AM, "Marek Tichy" wrote:

Hi,

I'm getting very slow response times on grouping, especially on facet grouping. Without grouping, the query takes 14 ms; faceting, 57 ms. With grouping, the query time goes up to 1131 ms; with facet grouping, the faceting goes up to an unbearable 12103 ms.

Single Solr instance, 927,086 docs, 518.23 MB index size, Solr 6.4.1.

Is this really the price of grouping? Are there any magic tricks/tips/techniques to improve the speed? The query params are below.

Many thanks for any help, much appreciated.

Best
Marek Tichy

fq=((type:knihy) OR (type:defekty))
fl=*
start=0
f.ebook_formats.facet.mincount=1
f.authorid.facet.mincount=1
f.thematicgroupid.facet.mincount=1
f.articleparts.facet.mincount=1
f.type.facet.mincount=1
f.languageid.facet.mincount=1
f.showwindow.facet.mincount=1
f.articletypeid_grouped.facet.mincount=1
f.languageid.facet.limit=10
f.ebook_formats.facet.limit=10
f.authorid.facet.limit=10
f.type.facet.limit=10
f.articleparts.facet.limit=10
f.thematicgroupid.facet.limit=10
f.articletypeid_grouped.facet.limit=10
f.showwindow.facet.limit=100
version=2.2
group.limit=30
rows=30
echoParams=all
sort=date desc,planneddate asc
group.field=edition
facet.method=enum
group.truncate=false
group.format=grouped
group=true
group.ngroups=true
stats=true
facet=true
group.facet=true
stats.field={!distinctValues=true}categoryid
facet.field={!ex=at}articletypeid_grouped
facet.field={!ex=at}type
facet.field={!ex=author}authorid
facet.field={!ex=format}articleparts
facet.field={!ex=format}ebook_formats
facet.field={!ex=lang}languageid
facet.field={!ex=sw}showwindow
facet.field={!ex=tema}thematicgroupid
stats.field={!min=true max=true}price
stats.field={!min=true max=true}yearout
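As a pointer for readers: group.cache.percent takes a value from 0 to 100 and caps the grouping cache at that percentage of the index's maxDoc, so applying the suggestion would mean adding something like the following to the parameter list above (the value 30 is illustrative, not from the thread):

    group.cache.percent=30

Caching is documented to help smaller indexes most, which is why it may work well at this index size.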
Solr Spell check component - SolrJ
Hi,

I am trying to use the SolrJ Java client to do search, and the basic search works fine with the code given below:

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery().
        setQuery(searchKey).
        addHighlightField("TITLE_EN_US").
        setRows(new Integer(10)).
        setStart(new Integer(startPosition)).
        setHighlight(true);
    QueryResponse rsp = server.query(query);

However, I would like to do a spell check for the query string and get alternate suggestions. I looked at the documentation for the spellcheck component and saw examples for queries using an HTTP URL request, e.g.:

http://localhost:8983/solr/spellCheckCompRH?q=pizzza&spellcheck.q=pizzza&spellcheck=true&spellcheck.build=true

I would like to achieve the same using the Java client API. Could anyone provide sample code for the same?

Thanks & Rgds,
Sunil
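No sample appears later in this thread, so here is a hedged sketch; it assumes a SolrJ version whose QueryResponse exposes getSpellCheckResponse(), and that the spellcheck component is wired into the "spellCheckCompRH" handler exactly as in the URL example above:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.SpellCheckResponse;

    // Request spellcheck suggestions through the same handler used in the
    // URL example, then walk the suggestions in the response.
    SolrQuery query = new SolrQuery("pizzza");
    query.set("qt", "spellCheckCompRH");   // route to the spellcheck handler
    query.set("spellcheck", true);
    query.set("spellcheck.q", "pizzza");
    query.set("spellcheck.build", true);   // build the dictionary (first call only)

    QueryResponse rsp = server.query(query);
    SpellCheckResponse spell = rsp.getSpellCheckResponse();
    if (spell != null) {
        for (SpellCheckResponse.Suggestion suggestion : spell.getSuggestions()) {
            System.out.println(suggestion.getToken() + " -> " + suggestion.getAlternatives());
        }
    }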
query parsing issue + behavior as OR (solr 1.4-dev)
I am working with the nightly build of Oct 17, 2008 and found an issue: something is wrong with LuceneQParserPlugin; it treats + as OR.

E.g. q=first_name:joe+last_name:smith behaves as OR instead of AND. The default operator is set to AND in schema.xml.

Is there any new configuration I need to put in place in order to get this working?

Thanks,
Sunil
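One thing worth ruling out (a note added here, not from the thread): in a URL query string, '+' is decoded to a space before the query parser ever sees it, so q=first_name:joe+last_name:smith reaches Solr as "first_name:joe last_name:smith". A literal '+' operator would have to be percent-encoded as %2B, and adding &debugQuery=true will show which operator actually took effect between the two clauses.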
solr 374- error on commit
Getting the following exception when I try to commit the index a second time onwards. FYI, I am sending the commit command via HTTP POST just to reload the index. Truncated spots in the trace are marked with [...].

        at java.lang.Thread.run(Thread.java:595)
    Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
        at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
        at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo[...]33)
        at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.[...])
        at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndex[...]ava:188)
        at org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexRea[...]:124)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1015)
        ... 26 more
searching like RDBMS way
This is a very general requirement and I am sure somebody might have thought about a solution.

Sample scenario to explain my question:
There is a many-to-many relationship between 2 entities - Sales Person and Client. One sales person can work for many clients. One client may be served by many sales persons.

I will have 3 separate index storages:
1. Only for sales persons
2. ID combinations for IDs of sales persons and clients (the many-to-many list)
3. Only for clients

Query requirement -> Get all the clients for a given sales person. For this I need to hit indexes 2 and 3 to get the full result.

One immediate solution would be: make a first query to get client IDs from the 2nd index, and then make another query using those client IDs to pull client detail information from the 3rd index. But I cannot do this in two plain search calls, since there could be thousands of clients for a sales person; ORing that many IDs results in a maxClauseCount error. I know how to increase the limit, but that is not a good solution. (A batching sketch follows below.)

Thanks,
Sunil
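A minimal sketch of client-side batching for the second query, assuming SolrJ; the field names, batch size, helper method, and the default maxBooleanClauses limit of 1024 are illustrative assumptions, not from the thread:

    // Fetch client ids from index 2, then pull client details from index 3
    // in OR-batches small enough to stay under maxBooleanClauses.
    int batchSize = 500; // keep well under the default 1024 clause limit

    List<String> clientIds = fetchClientIds(salesPersonId); // from index 2 (helper assumed)

    for (int i = 0; i < clientIds.size(); i += batchSize) {
        List<String> chunk = clientIds.subList(i, Math.min(i + batchSize, clientIds.size()));

        StringBuilder q = new StringBuilder("client_id:(");
        for (int j = 0; j < chunk.size(); j++) {
            if (j > 0) q.append(" OR ");
            q.append(chunk.get(j));
        }
        q.append(")");

        SolrQuery query = new SolrQuery(q.toString());
        query.setRows(batchSize);
        QueryResponse rsp = clientServer.query(query); // SolrServer for index 3
        // ... accumulate rsp.getResults() into the combined result ...
    }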
Out of memory errors with Spatial indexing
We are seeing OOM errors when trying to index some spatial data. I believe the data itself might not be valid, but it shouldn't cause the server to crash. We see this on both Solr 7.6 and Solr 8. Below is the input that is causing the error:

    {
      "id": "bad_data_1",
      "spatialwkt_srpt": "LINESTRING (-126.86037681029909 -90.0 1.000150474662E30, 73.58164711175415 -90.0 1.000150474662E30, 74.52836551959528 -90.0 1.000150474662E30, 74.97006811540834 -90.0 1.000150474662E30)"
    }

The dynamic field above is mapped to the field type "location_rpt" (solr.SpatialRecursivePrefixTreeFieldType).

Any pointers to get around this issue would be highly appreciated.

Thanks!
Re: Out of memory errors with Spatial indexing
Hi David,

Thanks for your response. Yes, I noticed that all the data causing the issue were at the poles. I tried the "RptWithGeometrySpatialField" field type definition but get a "Spatial context does not support S2 spatial index" error. Setting spatialContextFactory="Geo3D", I still see the original OOM error.

On Sat, 4 Jul 2020 at 05:49, David Smiley wrote:

> Hi Sunil,
>
> Your shape is at a pole, and I'm aware of a bug causing an exponential
> explosion of needed grid squares when you have polygons super-close to the
> pole. Might you try S2PrefixTree instead? I forget if this would fix it
> or not by itself. For indexing non-point data, I recommend
> class="solr.RptWithGeometrySpatialField", which internally is based off a
> combination of a coarse grid and storing the original vector geometry for
> accurate verification:
>
>   <fieldType ... class="solr.RptWithGeometrySpatialField" prefixTree="s2" />
>
> The internally coarser grid will lessen the impact of that pole bug.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Fri, Jul 3, 2020 at 7:48 AM Sunil Varma wrote:
> > We are seeing OOM errors when trying to index some spatial data. I believe
> > the data itself might not be valid but it shouldn't cause the Server to
> > crash. We see this on both Solr 7.6 and Solr 8. Below is the input that is
> > causing the error.
> >
> > { "id": "bad_data_1", "spatialwkt_srpt": "LINESTRING (-126.86037681029909
> > -90.0 1.000150474662E30, 73.58164711175415 -90.0 1.000150474662E30,
> > 74.52836551959528 -90.0 1.000150474662E30, 74.97006811540834 -90.0
> > 1.000150474662E30)" }
> >
> > Above dynamic field is mapped to field type "location_rpt" (
> > solr.SpatialRecursivePrefixTreeFieldType).
> >
> > Any pointers to get around this issue would be highly appreciated.
> >
> > Thanks!
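Putting the two suggestions in this thread together, a complete field type might look like the sketch below; the field type name is an illustrative assumption, and the "Spatial context does not support S2 spatial index" error above suggests the S2 prefix tree needs the Geo3D spatial context:

    <!-- Hypothetical sketch combining the suggestions in this thread:
         RptWithGeometrySpatialField with the S2 prefix tree, using the
         Geo3D spatial context. Name is illustrative only. -->
    <fieldType name="location_rpt_geom"
               class="solr.RptWithGeometrySpatialField"
               spatialContextFactory="Geo3D"
               prefixTree="s2" />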
Solr client in JavaScript
This is my JavaScript code, from where I am calling Solr, which has a loaded Nutch core (index). My JavaScript client (runs on a Tomcat server) and the Solr server are on the same machine (10.21.6.100). Maybe it is due to cross-domain reference issues, or something is missing, I don't know. I expected the response from the Solr server (search result) as a raw JSON object. Kindly help me fix it. Thanks in advance.

Rgds,
Sunil Kumar

    function search() {
        var xmlHttpClient = this;
        var hostURL = 'http://10.21.6.100:8983/solr/nutch/select';
        var querystring = document.getElementById("querystring").value;
        // wt=json asks Solr for a JSON response; fl=content limits the fields
        var qstr = 'q=' + escape(querystring) + '&fl=content&wt=json';

        if (window.XMLHttpRequest) {
            xmlHttpClient.xmlHttpReq = new XMLHttpRequest();
        }
        xmlHttpClient.xmlHttpReq.open('POST', hostURL, true);
        xmlHttpClient.xmlHttpReq.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
        // register the handler before send(); names must match exactly
        xmlHttpClient.xmlHttpReq.onreadystatechange = function() {
            if (xmlHttpClient.xmlHttpReq.readyState == 4) {
                showResponse(xmlHttpClient.xmlHttpReq.responseText);
            }
        };
        xmlHttpClient.xmlHttpReq.send(qstr);
    }

    function showResponse(str) {
        document.getElementById("responsestring").innerHTML = str;
    }
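A note on the cross-domain suspicion (an editorial assumption, not a reply from the list): since the page is served by Tomcat on one port and Solr listens on 8983, the browser treats them as different origins, and an XMLHttpRequest of this era is blocked by the same-origin policy. Two common workarounds are proxying the /select URL through the same Tomcat instance that serves the page, or requesting wt=json together with Solr's json.wrf callback parameter and loading the response through a <script> tag (JSONP).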
Solr edismax parser with multi-word synonyms
I have enabled the SynonymGraphFilter in my field configuration in order to support multi-word synonyms (I am using Solr 7.6). Here is my field configuration:

[field type XML stripped from the archived message]

And this is my synonyms.txt file:

frozen dinner,microwave food

Scenario 1: blue shirt (query with no synonyms)

Here is my first Solr query:
http://localhost:8983/solr/base/search?q=blue+shirt&qf=title&defType=edismax&debugQuery=on

And this is the parsed query I see in the debug output:
+((title:blue) (title:shirt))

Scenario 2: frozen dinner (query with synonyms)

Now, here is my second Solr query:
http://localhost:8983/solr/base/search?q=frozen+dinner&qf=title&defType=edismax&debugQuery=on

And this is the parsed query I see in the debug output:
+(((+title:microwave +title:food) (+title:frozen +title:dinner)))

I am wondering why the first query looks for documents containing at least one of the two query tokens, whereas the second query looks for documents with both of the query tokens. I would understand if it looked for both tokens of the synonym (i.e. both microwave and food) to avoid the sausagization problem. But I would like to get partial matches on the original query at least (i.e. it should also match documents containing just the token 'dinner').

Would anyone know why the behavior is different across queries with and without synonyms? And how could I work around this if I wanted partial matches on queries that also have synonyms?

Ideally, I would like the parsed query in the second case to be:
+(((+title:microwave +title:food) (title:frozen title:dinner)))

I'd appreciate any help with this. Thanks!
Re: Re: Solr edismax parser with multi-word synonyms
Hi Erick,

Is there any way I can get it to match documents containing at least one of the words of the original query, i.e. 'frozen' or 'dinner' or both (but not partial matches of the synonyms)?

Thanks,
Sunil

-----Original Message-----
From: Erick Erickson
To: solr-user
Sent: Thu, Jul 18, 2019 04:42 AM
Subject: Re: Solr edismax parser with multi-word synonyms

This is not a phrase query; rather, it's requiring either pair of words to appear in the title. You've told it that "frozen dinner" and "microwave foods" are synonyms. So it's looking for both the words "microwave" and "foods" in the title field, or "frozen" and "dinner" in the title field. You'd see the same thing with single-word synonyms, albeit a little less confusingly.

Best,
Erick

> On Jul 18, 2019, at 1:01 AM, kshitij tyagi wrote:
>
> Hi sunil,
>
> 1. as you have added "microwave food" in synonym as a multiword synonym to
> "frozen dinner", edismax parser finds your synonym in the file and is
> considering your query as a phrase query.
>
> This is the reason you are seeing parsed query as +(((+title:microwave
> +title:food) (+title:frozen +title:dinner))), frozen dinner is considered
> as a phrase here.
>
> If you want partial match on your query then you can add frozen dinner,
> microwave food, microwave, food to your synonym file and you will see the
> parsed query as:
> "+(((+title:microwave +title:food) title:microwave title:food
> (+title:frozen +title:dinner)))"
> Another option is to write your own custom query parser and use it as a
> plugin.
>
> Hope this helps!!
>
> kshitij
>
> > [original question quoted in full; see the message above]
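To make the suggestion above concrete, the expanded synonyms.txt line kshitij describes would look something like this (a sketch of that suggestion, not a tested configuration):

    frozen dinner,microwave food,microwave,food

Listing the single words as additional synonyms gives the parsed query optional single-term clauses, at the cost of also matching documents that contain only 'microwave' or only 'food'.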