Re: Search for FirstName with first Char uppercase followed by * not giving result; getting result with all lowercase and *
Hi Ahmet,

Thanks for the reply. I had attached the analysis report of the query George*. It is split into the terms "George*" and "George" by the WordDelimiterFilterFactory, and the LowerCaseFilterFactory converts them to "george*" and "george". When I indexed "George" it was likewise analyzed and stored as "george". Then why is it that I don't get a match, as per the analysis report I attached in my previous mail? Or am I missing something basic here?

Many Thanks.
M

On Sun, Jan 30, 2011 at 4:34 AM, Ahmet Arslan wrote:
> > :When i try george* I get results. Whereas George* fetches no results.
>
> Wildcard queries are not analyzed by QueryParser.
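A quick way to see the asymmetry Ahmet describes (host, core and field names here are hypothetical):

    # Wildcard terms bypass the analyzer chain, so the indexed lowercase
    # "george" is only matched by a prefix that is already lowercase:
    curl 'http://localhost:8983/solr/select?q=FirstName:george*'   # matches
    curl 'http://localhost:8983/solr/select?q=FirstName:George*'   # no results: "George*" is never lowercased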
SOLR clustering ant code not compiling
Hi,

I downloaded the latest version of SOLR. From the contrib/clustering directory I ran "ant get-libraries". It is not building! Finally I manually downloaded the colt, nni, pcj, simple-xml and solr-common-1.3 jars, put them in the lib directory and restarted SOLR. It is giving me the following error:-

Error loading class org.apache.solr.handler.clustering.ClusteringComponent at org.apache.solr.core.SolrResourceLoader.findClass

Could some one pls help on how I can proceed.

BR
Mark.
Re: SOLR clustering ant code not compiling
Hi Koji,

Thank you so much for the reply. I am not much familiar with the open source "trunk", so I downloaded solr1.4 from the following location: http://www.apache.org/dyn/closer.cgi/lucene/solr/

On the browser I can see this error:-

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
at org.apache.solr.core.SolrCore.(SolrCore.java:551)
at ...

Please find my ANT screen attached herewith. I used ANT 1.7.1. I separately downloaded the colt, nni, pcj, simple-xml jars into contrib/clustering/lib/downloads and again ran the ANT command, but the error is still there. Any early suggestions would be a great help.

In a separate try I deployed the solr.war in tomcat webapps and put all the jars in webapps/solr/WEB-INF/lib/ {all jars including the downloaded ones} ... still I get the error on the browser that the clustering class could not be found.

Without running the ant command, is it fine if we just download those 4 extra jars and put them in lib in addition to the clustering jars and the usual solr jars?

Thanks!
Mark.

On Tue, Feb 23, 2010 at 9:55 AM, Koji Sekiguchi wrote:
> Mark Fletcher wrote:
>> Hi,
>>
>> I downloaded the latest version of SOLR. From the contrib/clustering
>> directory ran "ant get-libraries". It is not building!
>>
> I've just tried ant get-libraries under contrib/clustering without
> any problems. I used trunk. What was your error message?
>
>> Finally I manually downloaded colt, nni, pcj, simple xml and solr-common 1.3
>> jars and put them in the lib and restarted SOLR. It is giving me the
>> following err:-
>>
> What's solr-common 1.3?
> Did you put colt, nni, ... jars under contrib/clustering/lib/downloads/
> directory?
>
> Koji
>
> --
> http://www.rondhuit.com/en/
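In case it helps others hitting the same error, a sketch of the manual fallback under Tomcat (jar and path names are assumptions; the key point is that ClusteringComponent itself lives in the clustering contrib jar built into dist/, not in the downloaded dependency jars):

    # dependencies fetched by "ant get-libraries" (or downloaded by hand)
    cp contrib/clustering/lib/downloads/*.jar $TOMCAT_HOME/webapps/solr/WEB-INF/lib/
    cp contrib/clustering/lib/*.jar           $TOMCAT_HOME/webapps/solr/WEB-INF/lib/
    # the contrib jar that actually contains ClusteringComponent
    cp dist/apache-solr-clustering-1.4.0.jar  $TOMCAT_HOME/webapps/solr/WEB-INF/lib/
    # ...then restart Tomcat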
merge indexes command
Hi,

Can someone pls suggest how to use this command as a part of a linux script:

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index

Will just adding curl at the beginning help? I tried this but it gives the error:-

Missing required parameter: core

Any help is deeply appreciated.

Thanks and Rgds,
Mark.
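One likely cause, assuming the URL was passed to curl unquoted: the shell treats each bare '&' as a command separator, so everything from core=... onward never reaches curl, which matches the "Missing required parameter: core" error. Quoting the whole URL delivers every parameter:

    # single quotes keep the '&'-separated parameters together
    curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index'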
test mail... my mails to solr-user@lucene.apache.org are bouncing ... sorry for any inconvenience
Hi,

Users pls ignore this mail. I am just sending a test mail to check whether my user id is okay. The mails I am sending to this group have been bouncing since yesterday. Pls excuse me for any inconvenience.

Thanks and Rgds,
Mark
index merge
Hi,

I have a doubt regarding index merging:-

I have set up 2 cores, COREX and COREY.
COREX - always serves user requests
COREY - gets updated with the latest values (dataDir is in a different location from COREX)

I tried merging coreX and coreY at the end of COREY getting updated with the latest data values, so that COREX and COREY both have the latest data and the user who always queries COREX gets the latest data. Pls find the various approaches I followed and the commands used.

I tried these merges:-

COREX = COREX and COREY merged
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreX/data/index&indexDir=/opt1/solr/coreY/data/index'

COREX = COREY and COREY merged
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreY/data/index&indexDir=/opt1/solr/coreY/data/index'

COREX = COREY and COREA merged (COREA just contains the initial 2 seed segments.. a dummy core)
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreY/data/index&indexDir=/opt1/solr/coreA/data/index'

When I check the record count in COREX and COREY, COREX always contains about double of what COREY has. Is everything fine here and just the record count is different, or is there something wrong?

Note:- I have only 2 cores here and I tried the X=X+Y approach, X=Y+Y and X=Y+A, where A is a dummy index. Never have the record counts matched after the merging is done.

Can someone please help me understand why this record count difference occurs and whether there is anything fundamentally wrong in my approach.

Thanks and Rgds,
Mark.
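A quick way to compare the counts after each merge (core names as above; numFound in the response header is the document count):

    curl 'http://localhost:8983/solr/coreX/select?q=*:*&rows=0'
    curl 'http://localhost:8983/solr/coreY/select?q=*:*&rows=0'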
Fwd: index merge
Hi,

I have created 2 identical cores, coreX and coreY (both have different dataDir values, but their index is the same).

coreX - always serves the request when a user performs a search.
coreY - the updates happen to this core, and then I need to synchronize it with coreX after the update process so that coreX also has the latest data in it. After coreX and coreY are synchronized, both should be identical again.

For this purpose I tried core merging of coreX and coreY once coreY is updated with the latest set of data, but I find coreX to contain double the record count of coreY (coreX = coreX+coreY).

Is there a problem in using the MERGE concept here? If it is wrong, can someone pls suggest the best approach. I tried the various merges explained in my previous mail.

Any help is deeply appreciated.

Thanks and Rgds,
Mark.

-- Forwarded message --
From: Mark Fletcher
Date: Sat, Mar 6, 2010 at 9:17 AM
Subject: index merge
To: solr-user@lucene.apache.org
Cc: goks...@gmail.com
Re: index merge
Hi Shalin,

Thank you for the reply. I got your point. So I understand merge will just duplicate things.

I ran the SWAP command. Now:-
COREX has the dataDir pointing to the updated dataDir of COREY, so COREX has the latest.
Again, COREY (on which the update regularly runs) is pointing to the old index of COREX, so it now doesn't have the most updated index.

Now shouldn't I update the index of COREY (now pointing to the old COREX) so that it has the same footprint as COREX (which has the latest COREY index), so that when the update again happens to COREY it has the latest and I again do the SWAP?

Is physically copying the index named COREY (the latest, and now the dataDir of COREX after the SWAP) over the index COREX (now the dataDir of COREY.. the original non-updated index of COREX) the best way to do this, or is there a better option?

Once again, later when COREY is again updated with the latest, I will run the SWAP again and it will be fine, with COREX again pointing to its original dataDir (now the updated one). So every even SWAP command run will point COREX back to its original dataDir (same case with COREY).

My only concern is, after the SWAP is done, updating the old index (which was serving previously and is now replaced by the new index). What is the best way to do that? Physically copy the latest index over the old one and bring it in sync with the latest one, so that by the time it is to get the latest updates it has the latest in it, the new ones can be added to it, and it becomes the latest and is again swapped?

Please share your opinion. Once again your help is appreciated. I have been going in circles with multiple indexes for some days!

Thanks and Rgds,
Mark.

On Mon, Mar 8, 2010 at 7:45 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote: > Hi Mark, > > On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher > wrote: > > > > > I have created 2 identical cores coreX and coreY (both have different > > dataDir values, but their index is same). > > coreX - always serves the request when a user performs a search. > > coreY - the updates will happen to this core and then I need to > synchronize > > it with coreX after the update process, so that coreX also has the > > latest data in it. After coreX and coreY are synchronized, > both > > should again be identical again. > > > > For this purpose I tried core merging of coreX and coreY once coreY is > > updated with the latest set of data. But I find coreX to be containing > > double the record count as in coreY. > > (coreX = coreX+coreY) > > > > Is there a problem in using MERGE concept here. If it is wrong can some > one > > pls suggest the best approach. I tried the various merges explained in my > > previous mail. > > > Index merge happens at the Lucene level which has no idea about uniqueKeys. > Therefore when you merge two indexes containing exactly the same documents > (by uniqueKey), you get double the document count. > > Looking at your scenario, it seems to me that what you want to do is a swap > operation. coreX is serving the requests, coreY is updated and now you can > swap coreX with coreY so that new requests hit the updated index. I suggest > you look at the swap operation instead of index merge. > > -- > Regards, > Shalin Shekhar Mangar.
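For reference, a sketch of the swap call itself (core names from this thread; CoreAdmin SWAP exchanges the two registered cores atomically, so both names keep pointing at live indexes):

    curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=coreX&other=coreY'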
Re: index merge
Hi Shalin,

Thank you for the mail. My main purpose of having 2 identical cores
COREX - always serves user requests
COREY - once every day, takes the updates/latest data and passes it on to COREX
is:-

Say I have only one core, COREY, and a request comes to COREY while the update of the latest data is happening on it. Wouldn't that degrade the performance of the requests at that point in time?

So I was planning to keep COREX and COREY always identical. Once COREY has the latest, it should somehow sync with COREX so that COREX also now has the latest. COREY keeps getting the updates at a particular time of day and will again pass them on to COREX. This process continues every day.

What is the best possible way to implement this?

Thanks,
Mark.

On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote: > Hi Mark, > > On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher <mark.fletcher2...@gmail.com> wrote: > >> >> I ran the SWAP command. Now:- >> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX >> has the latest. >> Again, COREY (on which the update regularly runs) is pointing to the old >> index of COREX. So this now doesnt have the most updated index. >> >> Now shouldn't I update the index of COREY (now pointing to the old COREX) >> so that it has the latest footprint as in COREX (having the latest COREY >> index) so that when the update again happens to COREY, it has the latest and >> I again do the SWAP. >> >> Is a physical copying of the index named COREY (the latest and now dataDir >> of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the >> original non-updated index of COREX) the best way for this or is there any >> other better option. >> >> Once again, later when COREY is again updated with the latest, I will run >> the SWAP again and it will be fine with COREX again pointing to its original >> dataDir (now the updated one). So every even SWAP command run will point >> COREX back to its original dataDir. (same case with COREY). >> >> My only concern is after the SWAP is done, updating the old index (which >> was serving previously and now replaced by the new index). What is the best >> way to do that? Physically copy the latest index to the old one and make it >> in sync with the latest one so that by the time it is to get the latest >> updates it has the latest in it so that the new ones can be added to this >> and it becomes the latest and is again swapped? >> > > Perhaps it is best if we take a step back and understand why you need two > identical cores? > > -- > Regards, > Shalin Shekhar Mangar.
Re: index merge
Hi All,

Thank you for the very valuable suggestions. I am planning to try using the Master - Slave configuration.

Best Rgds,
Mark.

On Mon, Mar 8, 2010 at 11:17 AM, Mark Miller wrote: > On 03/08/2010 10:53 AM, Mark Fletcher wrote: > >> Hi Shalin, >> >> Thank you for the mail. >> My main purpose of having 2 identical cores >> COREX - always serves user request >> COREY - every day once, takes the updates/latest data and passes it on to >> COREX. >> is:- >> >> Suppose say I have only one COREY and suppose a request comes to COREY >> while >> the update of the latest data is happening on to it. Wouldn't it degrade >> performance of the requests at that point of time? >> >> > Yes - but you're not going to help anything by using two indexes - the best you > can do is use two boxes. 2 indexes on the same box will actually > be worse than one if they are identical and you are swapping between them. > Writes on an index will not affect reads in the way you are thinking - only > in that it uses IO and CPU that the read process can't. That's going to > happen with 2 indexes on the same box too - except now you have way more > data to cache and flip between, and you can't take any advantage of things > just being written possibly being in the cache for reads. > > Lucene indexes use a write-once strategy - when writing new segments, you > are not touching the segments being read from. Lucene is already doing the > index juggling for you at the segment level. > > > So I was planning to keep COREX and COREY always identical. Once COREY has >> the latest it should somehow sync with COREX so that COREX also now has >> the >> latest. COREY keeps on getting the updates at a particular time of day and >> it will again pass it on to COREX. This process continues everyday. >> >> What is the best possible way to implement this? >> >> Thanks, >> >> Mark. >> >> >> On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar< >> shalinman...@gmail.com> wrote: >> >> >> >>> Hi Mark, >>> >>> On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher< >>> mark.fletcher2...@gmail.com> wrote: >>> >>> >>> >>>> I ran the SWAP command. Now:- >>>> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX >>>> has the latest. >>>> Again, COREY (on which the update regularly runs) is pointing to the old >>>> index of COREX. So this now doesnt have the most updated index. >>>> >>>> Now shouldn't I update the index of COREY (now pointing to the old >>>> COREX) >>>> so that it has the latest footprint as in COREX (having the latest COREY >>>> index)so that when the update again happens to COREY, it has the latest >>>> and >>>> I again do the SWAP. >>>> >>>> Is a physical copying of the index named COREY (the latest and now >>>> datDir >>>> of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the >>>> orginal non-updated index of COREX) the best way for this or is there >>>> any >>>> other better option. >>>> >>>> Once again, later when COREY is again updated with the latest, I will >>>> run >>>> the SWAP again and it will be fine with COREX again pointing to its >>>> original >>>> dataDir (now the updated one).So every even SWAP command run will point >>>> COREX back to its original dataDir. (same case with COREY). >>>> >>>> My only concern is after the SWAP is done, updating the old index (which >>>> was serving previously and now replaced by the new index). What is the >>>> best way to do that?
Physically copy the latest index to the old one and make >>>> it >>>> in sync with the latest one so that by the time it is to get the latest >>>> updates it has the latest in it so that the new ones can be added to >>>> this >>>> and it becomes the latest and is again swapped? >>>> >>>> >>>> >>> Perhaps it is best if we take a step back and understand why you need two >>> identical cores? >>> >>> -- >>> Regards, >>> Shalin Shekhar Mangar. >>> >>> >>> >> >> > > > -- > - Mark > > http://www.lucidimagination.com > > > >
create core with separate solrconfig.xml
Hi,

I wanted to configure one core as master and one core as slave. This is my existing configuration:-

In my SOLR_HOME I have conf/schema.xml, conf/solrconfig.xml and the other config files, from when no core was present. Also in my SOLR_HOME are solr.xml and coreA, created using the CREATE command for cores. I have my other core coreB's index in a different dataDir.

I believe in this configuration both cores share the same schema.xml and solrconfig.xml. I added the master/slave replication code in my {SOLR_HOME}/conf/solrconfig.xml: the master set to replicate after optimize (with 00:00:10 as the interval), and just below that I specified the slave, with the masterUrl set to {specified the instanceDir}/coreA/replication, internal compression, timeouts of 5000 and 1, and a username and password.

When I optimize coreA, replication to coreB doesn't happen. CoreA (my supposed-to-be master here) gets the new values but not coreB. When I tried the startup option in the first block of the replication config, it gave a lucene write error in the index, so I went for optimize.

Is there something wrong here, or do I need to have separate solrconfig.xml files for coreA and coreB to clearly indicate who is master and who is slave, by including only one of the replication sections in the corresponding solrconfig.xml, rather than have a common solrconfig.xml and specify both in that?

If I need to specify separate solrconfig.xml for both cores, how do I do that?

Any help is appreciated.

Thanks and Rgds,
Mark
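For context, a minimal sketch of what a ReplicationHandler section with both roles in one shared solrconfig.xml typically looks like; the parameter names are from the Solr replication wiki, the values are the ones mentioned above, and the URL is a placeholder (the masterUrl must be an HTTP URL pointing at the master core, not an instanceDir, which is the point Shalin raises in the reply):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">optimize</str>
      </lst>
      <lst name="slave">
        <!-- an HTTP URL to the master core, not a filesystem path -->
        <str name="masterUrl">http://localhost:8983/solr/coreA/replication</str>
        <str name="pollInterval">00:00:10</str>
        <str name="compression">internal</str>
        <str name="httpConnTimeout">5000</str>
        <str name="httpBasicAuthUser">username</str>
        <str name="httpBasicAuthPassword">password</str>
      </lst>
    </requestHandler>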
Re: create core with separate solrconfig.xml
Hi Shalin,

Thank you for your reply. I think I mixed 2 matters in my prev mail (replication and core creation). So let me first get help for my CORES set up.

My current set up is:- In my SOLR_HOME I have the conf/ config files (like schema.xml, solrconfig.xml etc...).

I created my new core, say coreA, using the command:-

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreA&instanceDir={SOLR_HOME}&config=solrconfig.xml&schema=schema.xml&dataDir={SOLR_HOME}/core3/solr/data/

Note:- the place-holder {SOLR_HOME} stands for my SOLR_HOME, which I have fully substituted there.

As a result of running this query, I had a new directory coreA created in my SOLR_HOME. But inside coreA I didn't find a new conf directory with the configuration files specific to my new coreA (otherwise, where should I be looking to see the schema.xml and solrconfig.xml specific to coreA?). It seems it is still referring to the common schema.xml and solrconfig.xml inside my {SOLR_HOME}/conf. Where do I see the solrconfig.xml and schema.xml specific to coreA? For now only the dataDir has been created for coreA, in {SOLR_HOME}/core3/solr/data/.

Thanks and Rgds,
Mark

On Mon, Mar 15, 2010 at 4:15 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Mon, Mar 15, 2010 at 6:12 AM, Mark Fletcher wrote:
>>
>> I wanted to configure one core as Master and one core as slave.
>> This is my existing configuration:-
>>
>> In my SOLR_HOME I have conf/schema.xml, conf/solrconfig.xml and the others
>> when no core was present. Also in my SOLR_HOME are solr.xml and coreA
>> created using the CREATE command for cores.
>>
>> I have my other coreB's index in a different dataDir.
>>
>> I believe in this configuration both the cores share the same schema.xml
>> and solrconfig.xml. I added the master slave replication code in my
>> {SOLR_HOME}/conf/solrconfig.xml: the master set to replicate after
>> optimize, with 00:00:10 as the interval.
>>
>> Just below that I specified the slave, with the masterUrl
>> {specified the instanceDir}/coreA/replication
>
> The masterUrl is a HTTP URL. I'm not sure if you have specified a HTTP URL here.
>
>> internal compression, timeouts of 5000 and 1, and a username and password.
>>
>> When I optimize coreA, replication to coreB doesn't happen. CoreA (my
>> supposed to be master here) gets the new values but not coreB. When I tried
>> the startup option in the first block of replication it gave lucene write
>> error in the index so I went for optimize.
>
> What was the lucene write error that you saw? Can you paste the stack trace?
>
>> Is there something wrong here or do I need to have separate solrconfig.xml
>> for coreA and coreB to clearly indicate who is master and who is slave by
>> including only one of the replication codes in the corresponding
>> solrconfig.xml rather than have a common solrconfig.xml and specify both in
>> that.
>>
>> If I need to specify separate solrconfig.xml for both cores, how do I do that?
>
> When you create the core using the CoreAdmin, you can specify an alternate
> solrconfig.xml through the "config" parameter.
>
> --
> Regards,
> Shalin Shekhar Mangar.
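Building on Shalin's pointer, a sketch of creating each core with its own instanceDir and config files (paths and file names here are hypothetical; each instanceDir needs its own conf/ directory containing the files named by the config and schema parameters):

    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=coreA&instanceDir=/opt/solr/coreA&config=solrconfig-master.xml&schema=schema.xml'
    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=coreB&instanceDir=/opt/solr/coreB&config=solrconfig-slave.xml&schema=schema.xml'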
some synonym clarifications
Hi,

Just needed some help to understand the following synonym mappings:-

1. aaa => aaaa
does it mean:- if the user queries for aaa it is replaced with aaaa and documents matching aaaa are searched for, or does it mean if the user queries for aaa, documents with aaa as well as aaaa are looked for?

2. bbb => bbbb1 bbbb2
does it mean that if the user queries for bbb, SOLR will look for documents that contain bbbb1 bbbb2?

3. ccc => cccc1,cccc2
does it mean that if the user queries for ccc, SOLR will look for documents that contain cccc1 or cccc2?

4. a\=>a => b\=>b
First of all my doubt is what does the "\" do there. Does it have any special significance? Can someone help me interpret the above.

5. a\,a => b\,b
Can someone help me with this also.

6. fooaaa,baraaa,bazaaa
does this mean that if any of fooaaa or baraaa or bazaaa comes as the search keyword, SOLR will look for documents that contain fooaaa?

7. abc def rose\, atlas method, NY.GNP.PCAP.PP.CD
does this mean a query for any of the above 3 will always be replaced by a query for abc def rose\?

Can someone pls extend some help at your earliest convenience.

Thank you.
Mark.
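For reference, a short sketch of the two rule forms in a synonyms.txt, following the SynonymFilterFactory semantics from the wiki (the terms are the example ones above):

    # Expansion: no "=>". Each term is expanded to all of them at analysis
    # time, so a query for any one also matches documents containing the others.
    fooaaa,baraaa,bazaaa

    # Replacement: with "=>", the term on the left is rewritten to the term(s)
    # on the right, and the original left-hand term does not survive analysis.
    aaa => aaaa

    # "\" escapes characters that are otherwise meaningful to this file's
    # syntax: a literal "=>" or a literal "," inside a term.
    a\=>a => b\=>b
    a\,a => b\,b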
Re: some synonym clarifications
Hi,

Thanks for the mail. I had tried the WIKI. My remaining doubts were mainly:-

1. If we have synonyms specified and they replace your search keyword with the ones specified, wouldn't we face the risk of our original keyword being missed out? What I meant is: if I have a search keyword, say "agriculture", and I replace it with some synonyms, will I never again be able to search directly for "agriculture"? i.e. suppose I have a document which has the term agriculture and none of the synonyms in it. Will that document be retrieved when I search for agriculture, as I have now mapped it to other terms?

2. I am still a bit confused about the interpretation of:-
a\=>a => b\=>b
a\,a => b\,b
abc def rose\, my cap , rose flower
Can you pls give a one-liner explanation for the above. There are some sample entries in the synonyms.txt.

3. If I get some help with the above it will also help me understand the backslash "\" better.

Thanks,
Mark.

On Thu, Mar 18, 2010 at 12:19 PM, Markus Jelsma wrote: > Hi, > > Check out the wiki page on the SynonymFilterFactory. It gives a decent > explanation on the subject. The backslash is just for escaping otherwise > meaningful characters. > > [1]: > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory > > Cheers, > > On Thursday 18 March 2010 17:10:56 Mark Fletcher wrote: > > Hi, > > > > Just needed some help to understand the following synonym mappings:- > > > > 1. aaa => aaaa > > does it mean:- > > if the user queries for aaa it is replaced with aaaa and documents > > matching aaaa are searched for > > or does it mean > > if the user queries for aaa, documents with aaa as well as aaaa > > are looked for > > > > 2. bbb => bbbb1 bbbb2 > > does it mean that if the user queries for bbb, SOLR will look for > > documents that contain bbbb1 bbbb2 > > > > 3. ccc => cccc1,cccc2 > > does it mean that if the user queries for ccc, SOLR will look for > > documents that contain cccc1 or cccc2 > > > > 4. a\=>a => b\=>b > > First of all my doubt is what does the "\" do there. Does it have > > any special significance. > > Can someone help me interpret the above > > > > 5. a\,a => b\,b > > Can some one help me with this also > > > > 6. fooaaa,baraaa,bazaaa > > does this mean that if any of fooaaa or baraaa or bazaaa comes > > as the search keyword, SOLR will look for documents that contain > > fooaaa > > > > 7. abc def rose\, my cap , rose flower > > does this mean a query for any of the above 3 will always be > > replaced by a query for abc def rose\ > > > > Can some one pls extend some help at your earliest convenience. > > > > Thank you. > > Mark. > > Markus Jelsma - Technisch Architect - Buyways BV > http://www.linkedin.com/in/markus17 > 050-8536620 / 06-50258350
Re: some synonym clarifications
Thanks Markus! I got it.

BR,
Mark.

On Fri, Mar 19, 2010 at 5:50 AM, Markus Jelsma wrote: > > On Thursday 18 March 2010 17:47:45 Mark Fletcher wrote: > > Hi, > > > > Thanks for the mail. I had tried the WIKI. > > > > My remaining doubts were mainly:- > > > > 1. > > If we have synonyms specified and they replace your search keyword with the > > ones specified wouldn't we face a risk of our original keyword missed out. > > What I meant is if I have a keyword for search say "agriculture" and I > > replace it with some synonyms, will I never again be able to search > > directly for "agriculture". ie suppose I have a document which has the > > term agriculture and none of the synonyms in it. Will that document be > > retrieved when I search for agriculture as I have now mapped it to other > > terms. > > It depends whether you let them be replaced. If you omit the => sign, the > terms simply will be expanded to whatever synonyms you specified. I could not > explain it any better than the wiki. > > > 2. > > I am still a bit confused about the interpretation of:- > > a\=>a => b\=>b > > > > a\,a => b\,b > > > > abc def rose\, my cap , rose flower > > > > Can you pls give a one-liner explanation for the above. There are some > > sample entries in the synonyms.txt > > This is escaping otherwise meaningful characters. The , and => are meaningful > to the SynonymFilterFactory and therefore need to be escaped, as you also would > escape certain characters in any language or whatever. You need to escape > quotes in many languages and you must escape the : sign a.o. in your Lucene > queries. > > > 3. If I get some help with the above 3 it will help me understand the > backslash "\" also better. > > Thanks, > > Mark. > > > > On Thu, Mar 18, 2010 at 12:19 PM, Markus Jelsma > wrote: > > Hi, > > > > Check out the wiki page on the SynonymFilterFactory. It gives a decent > > explanation on the subject. The backslash is just for escaping otherwise > > meaningful characters. > > > > [1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory > > > > Cheers, > > > > On Thursday 18 March 2010 17:10:56 Mark Fletcher wrote: > > > Hi, > > > > > > Just needed some help to understand the following synonym mappings:- > > > > > > 1. aaa => aaaa > > > does it mean:- > > > if the user queries for aaa it is replaced with aaaa and > > > documents matching aaaa are searched for > > > or does it mean > > > if the user queries for aaa, documents with aaa as well as aaaa > > > are looked for > > > > > > 2. bbb => bbbb1 bbbb2 > > > does it mean that if the user queries for bbb, SOLR will look for > > > documents that contain bbbb1 bbbb2 > > > > > > 3. ccc => cccc1,cccc2 > > > does it mean that if the user queries for ccc, SOLR will look for > > > documents that contain cccc1 or cccc2 > > > > > > 4. a\=>a => b\=>b > > > First of all my doubt is what does the "\" do there. Does it > > > have any special significance. > > > Can someone help me interpret the above > > > > > > 5. a\,a => b\,b > > > Can some one help me with this also > > > > > > 6. fooaaa,baraaa,bazaaa > > > does this mean that if any of fooaaa or baraaa or bazaaa > > > comes as the search keyword, SOLR will look for documents that contain > > > fooaaa > > > > > > 7. abc def rose\, my cap , rose flower > > > does this mean a query for any of the above 3 will always be > > > replaced by a query for abc def rose\ > > > > > > Can some one pls extend some help at your earliest convenience. > > > > > > Thank you. > > > Mark.
> > > > Markus Jelsma - Technisch Architect - Buyways BV > > http://www.linkedin.com/in/markus17 > > 050-8536620 / 06-50258350 > > > > Markus Jelsma - Technisch Architect - Buyways BV > http://www.linkedin.com/in/markus17 > 050-8536620 / 06-50258350 > >
dismax and q.op
Hi,

I am using the dismax handler. I have it set up in my solrconfig.xml. I have *not* used default="true" while setting it up (the standard RH still has default="true"). I haven't specified a value for mm. In my schema.xml I have set the default operator to AND.

When I query, I use the following in my query URL, where my query is, say for example, international monetary fund:-

.../select?q.alt=international+monetary+fund&qt=dismax

My result:- No results; but each of the terms individually gave me results!

I appreciate any help on the following queries:-

1. Will the query look for documents that have international AND monetary AND fund, or is it some other behavior, based on the settings I have mentioned above?

2. Does the default operator specified in schema.xml take effect when we use dismax also, or is it only for the *standard* request handler? If it has an effect, and we specify a value for mm like say 90%, will it override the schema.xml default operator set up?

3. How do q.alt and q differ in behavior in the above case? I found q.alt to be giving me the results which I got when I used the standard RH also, hence used it.

4. When I make a change to the dismax set up I have in solrconfig.xml, I believe I just have to bounce the SOLR server. Do I need to re-index again for the change to take effect?

5. If I use dismax, how do I see the ANALYSIS feature on the admin console otherwise used for the *standard* RH?

Thanks for your patience.

Best Rgds,
Mark.
Re: dismax and q.op
Hi Hoss,

Thank you so much for your time. Regarding the last one, I myself got confused when I posed the question; I got it after your reply. I think I was actually looking for something like the debugQuery=on option, which I found later.

Best Regards,
Mark.

On Tue, Mar 23, 2010 at 6:56 PM, Chris Hostetter wrote:
>
> : *I haven't mentioned value for mm*
> ...
> : My result:- No results; but each of the terms individually gave me results!
>
> http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29
>
> "The default value is 100% (all clauses must match)"
>
> : 2. Does the default operator specified in schema.xml take effect when we use
> : dismax also or is it only for the *standard* request handler. If it has an
>
> dismax doesn't look at the default operator, or q.op.
>
> : 3. How does q.alt and q differ in behavior in the above case. I found q.alt
> : to be giving me the results which I got when I used the standard RH also.
> : Hence used it.
>
> q.alt is used if and only if there is no q param (or the q param is blank)
> ... the number of matches "q" gets, or the value of "mm", make no
> difference.
>
> : 4. When I make a change to the dismax set up I have in solrconfig.xml I
> : believe i just have to bounce the SOLR server. Do i need to re-index again
> : for the change to take effect
>
> no ... changes to "query" time options like your SearchHandler configs
> don't require reindexing .. changes to your schema.xml *may* require
> reindexing.
>
> : 5. If I use the dismax how do I see the ANALYSIS feature on the admin
> : console otherwise used for *standard* RH.
>
> I'm afraid i don't understand this question ... analysis.jsp just shows
> you the index and query time analysis that is performed when certain
> fields are used -- it doesn't know/care about your choice of parser ... it
> knows nothing about query parser syntax.
>
> -Hoss
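For reference, a sketch of relaxing mm per request, following the wiki page Hoss cites (host and handler name are placeholders; dismax accepts its parameters in the URL as well as in solrconfig.xml):

    # mm=1: at least one of the three terms must match, instead of the 100% default
    curl 'http://localhost:8983/solr/select?qt=dismax&q=international+monetary+fund&mm=1&debugQuery=on'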
Re: Read Time Out Exception while trying to upload a huge SOLR input xml
Hi Erick, Shawn,

Thank you for your reply. Luckily, just on the second attempt my 13GB SOLR XML (more than a million docs) went into SOLR fine without any problem, and I uploaded another 2 sets of 1.2 million+ docs without any hassle. I will try more, smaller-sized XMLs next time, as well as the auto commit suggestion.

Best Rgds,
Mark.

On Thu, Apr 1, 2010 at 6:18 PM, Shawn Smith wrote:
> The error might be that your http client doesn't handle really large
> files (32-bit overflow in the Content-Length header?) or something in
> your network is killing your long-lived socket? Solr can definitely
> accept a 13GB xml document.
>
> I've uploaded large files into Solr successfully, including recently a
> 12GB XML input file with ~4 million documents. My Solr instance had
> 2GB of memory and it took about 2 hours. Solr streamed the XML in
> nicely. I had to jump through a couple of hoops, but in my case it
> was easier than writing a tool to split up my 12GB XML file...
>
> 1. I tried to use curl to do the upload, but it didn't handle files
> that large. For my quick and dirty testing, netcat (nc) did the
> trick--it doesn't buffer the file in memory and it doesn't overflow
> the Content-Length header. Plus I could pipe the data through pv to
> get a progress bar and estimated time of completion. Not recommended
> for production!
>
> FILE=documents.xml
> SIZE=$(stat --format %s $FILE)
> (echo "POST /solr/update HTTP/1.1
> Host: localhost:8983
> Content-Type: text/xml
> Content-Length: $SIZE
> " ; cat $FILE ) | pv -s $SIZE | nc localhost 8983
>
> 2. Indexing seemed to use less memory if I configured Solr to auto
> commit periodically in solrconfig.xml. This is what I used:
>
> <autoCommit>
>   <maxDocs>25000</maxDocs>
>   <maxTime>30</maxTime>
> </autoCommit>
>
> Shawn
>
> On Thu, Apr 1, 2010 at 10:10 AM, Erick Erickson wrote:
> > Don't do that. For many reasons. By trying to batch so many docs
> > together, you're just *asking* for trouble. Quite apart from whether it'll
> > work once, having *any* HTTP-based protocol work reliably with 13G is
> > fragile...
> >
> > For instance, I don't want to have my app know whether the XML parsing in
> > SOLR parses the entire document into memory before processing or
> > not. But I sure don't want my application to change behavior if SOLR
> > changes its mind and wants to process the other way. My perfectly
> > working application (assuming an event-driven parser) could
> > suddenly start requiring over 13G of memory... Oh my aching head!
> >
> > Your specific error might even be dependent upon GCing, which will
> > cause it to break differently, sometimes, maybe..
> >
> > So do break things up and transmit multiple documents. It'll save you
> > a world of hurt.
> >
> > HTH
> > Erick
> >
> > On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher wrote:
> >
> >> Hi,
> >>
> >> For the first time I tried uploading a huge input SOLR xml having about 1.2
> >> million *docs* (13GB in size).
> >> After some time I get the following exception:-
> >>
> >> The server encountered an internal error ([was class java.net.SocketTimeoutException] Read timed out
> >> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read timed out
> >> at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >> at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
> >> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
> >> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> >> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> >> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >> at org.apache.solr.servlet.SolrDispatchFilter.doFilter
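Following Erick's advice to batch documents rather than post one 13GB file, a sketch of posting pre-split files (file names are hypothetical; Solr's XML is not line-oriented, so the smaller <add> files should be produced at export time rather than with split(1)):

    for f in docs-part-*.xml; do
      # --data-binary preserves the XML exactly; -H sets the content type Solr expects
      curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary @"$f"
    done
    # commit once at the end (or rely on autoCommit as Shawn suggests above)
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<commit/>'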
one particular doc in results should always come first for a particular query
Hi,

Suppose I search for the word international. A particular record (say recordX) that I am looking for currently comes as the Nth result. I have a requirement that when a user queries for international, recordX must always be the first result. How can I achieve this?

Note:- When a user searches with a *different* keyword, recordX need not be the expected first result record; it may be a different record that has to come first in the results for that keyword.

Is there a way to achieve this requirement? I am using dismax.

Thanks in advance.
BR,
Mark
exact match coming as second record
Hi,

I am using the dismax handler. I have a field named myfield which has a value, say, XXX.YYY.ZZZ. I have boosted myfield^20.0. Even with such a high boost (in fact, among the qf fields specified this field has the max boost given), when I search for XXX.YYY.ZZZ I see my record as the second one in the results, and a record of the form XXX.YYY.ZZZ.AAA.BBB appears as the first one.

Can anyone help me understand why this is so, as I thought an exact match on a heavily boosted field would give the exact-match record first in dismax.

Thanks and Rgds,
Mark
Re: exact match coming as second record
Hi Erick,

Many thanks for your mail! Please find attached the debugQuery results.

Thanks!
Mark

On Mon, Apr 5, 2010 at 7:38 PM, Erick Erickson wrote: > What do you get back when you specify &debugQuery=on? > > Best > Erick > > On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher > wrote: > > > Hi, > > > > I am using the dismax handler. > > I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have > > boosted myfield^20.0. > > Even with such a high boost (in fact among the qf fields specified this > > field has the max boost given), when I search for XXX.YYY.ZZZ I see my > > record as the second one in the results and a record of the form > > XXX.YYY.ZZZ.AAA.BBB is appearing as the first one. > > > > Can anyone help me understand why this is so, as I thought an exact match > > on a heavily boosted field would give the exact match record first in > > dismax. > > > > Thanks and Rgds, > > Mark

A personal note:- I have boosted the id field to the highest among the qf values specified in my dismax. Even then, when I search for an id, say XX.YYY.ZZZ, instead of pushing the record with id=XX.YYY.ZZZ to the first place, it displays another record, XX.YYY.ZZZ.ME.PK, as the first one... There are 4 results in total, but I have included details of only the first and second. I am surprised why XX.YYY.ZZZ doesn't come as the first record even after an exact match is found in it.

My qf fields in dismax:-

name^10.0 id^20.0 subtopic1^1.0 indicator_value^1.0 country_name^1.0 country_code^1.0 source^0.8 database^1.4 definition^1.2 dr_report_name^1.0 dr_header^1.0 dr_footer^1.0 dr_mdx_query^1.0 dr_reportmetadata^1.0 content^1.0 aag_indicators^1.0 type^1.0 text^.3

id^6.0 type:Timeseries^1000.0

Debug Report:-

xx.yyy. xx.yyy.

+DisjunctionMaxQuery((text:"(xx.yyy.zzz xx) yyy "^0.3 | definition:"(xx.yyy.zzz xx) yyy "^0.2 | indicator_value:"(xx.yyy.zzz xx) yyy " | subtopic1:"(xx.yyy.zzz xx) yyy " | dr_report_name:"(xx.yyy.zzz xx) yyy " | dr_reportmetadata:"(xx.yyy.zzz xx) yyy " | dr_footer:"(xx.yyy.zzz xx) yyy " | type:"(xx.yyy.zzz xx) yyy " | country_code:"(xx.yyy.zzz xx) yyy "^2.0 | country_name:"(xx.yyy.zzz xx) yyy "^2.0 | database:"(xx.yyy.zzz xx) yyy "^1.4 | aag_indicators:"(xx.yyy.zzz xx) yyy " | content:"(xx.yyy.zzz xx) yyy " | id:xx.yyy.^1000.0 | dr_mdx_query:"(xx.yyy.zzz xx) yyy " | source:"(xx.yyy.zzz xx) yyy "^0.2 | name:"(xx.yyy.zzz xx) yyy "^10.0 | dr_header:"(xx.yyy.zzz xx) yyy ")~0.01) DisjunctionMaxQuery((id:xx.yyy.^6.0)~0.01) type:timeseries^1000.0

+(text:"(xx.yyy.zzz xx) yyy "^0.3 | definition:"(xx.yyy.zzz xx) yyy "^0.2 | indicator_value:"(xx.yyy.zzz xx) yyy " | subtopic1:"(xx.yyy.zzz xx) yyy " | dr_report_name:"(xx.yyy.zzz xx) yyy " | dr_reportmetadata:"(xx.yyy.zzz xx) yyy " | dr_footer:"(xx.yyy.zzz xx) yyy " | type:"(xx.yyy.zzz xx) yyy " | country_code:"(xx.yyy.zzz xx) yyy "^2.0 | country_name:"(xx.yyy.zzz xx) yyy "^2.0 | database:"(xx.yyy.zzz xx) yyy "^1.4 | aag_indicators:"(xx.yyy.zzz xx) yyy " | content:"(xx.yyy.zzz xx) yyy " | id:xx.yyy.^1000.0 | dr_mdx_query:"(xx.yyy.zzz xx) yyy " | source:"(xx.yyy.zzz xx) yyy "^0.2 | name:"(xx.yyy.zzz xx) yyy "^10.0 | dr_header:"(xx.yyy.zzz xx) yyy ")~0.01 (id:xx.yyy.^6.0)~0.01 type:timeseries^1000.0

0.15786289 = (MATCH) sum of:
  6.086512E-4 = (MATCH) max plus 0.01 times others of:
    6.086512E-4 = (MATCH) weight(text:"(xx.yyy. sp) yyy "^0.3 in 1004), product of:
      7.562088E-4 = queryWeight(text:"(xx.yyy. xx) yyy "^0.3), product of:
        0.3 = boost
        20.604721 = idf(text:"(xx.yyy. xx) yyy "^0.3)
        1.2233584E-4 = queryNorm
      0.8048719 = (MATCH) fieldWeight(text:"(xx.yyy. xx) yyy "^0.3 in 1004), product of:
        1.0 = tf(phraseFreq=1.0)
        20.604721 = idf(text:"(xx.yyy. xx) yyy "^0.3)
        0.0390625 = fieldNorm(field=text, doc=1004)
  0.15725423 = (MATCH) weight(type:timeseries^1000.0 in 1004), product of:
    0.1387005 = queryWeight(type:timeseries^1000.0), product of:
      1000.0 = boost
      1.1337683 = idf(docFreq=1054, maxDocs=1206)
      1.2233584E-4 = queryNorm
    1.1337683 = (MATCH) fieldWeight(type:timeseries in 1004), product of:
      1.0 = tf(termFreq(type:timeseries)=1)
      1.1337683 = idf(
Elevate query and standard RH
Hi,

I found the elevate query working fine with the dismax handler when I added the searchComponent to my dismax RH. I couldn't get the desired results when trying it with the standard RequestHandler. I hope it works just like that with the standard RH also.

Thanks and Rgds,
Mark.
Re: one particular doc in results should always come first for a particular query
Thanks Erick, Chris! I tried the Query Elevation and it seems to be working fine for me.

Best Rgds,
Mark.

On Mon, Apr 5, 2010 at 7:40 PM, Chris Hostetter wrote:
>
> : If that's the case, you could copy the magic keyword to a different field
> : (say magic_keyword) and boost it right into orbit as an OR clause
> : (magic_keyword:bonkers ^1). This kind of assumes that a magic keyword
> : corresponds to one and only one document
> :
> : If this is way off base, perhaps you could characterize how keywords map to
> : specific documents you want at the top.
>
> This smells like...
>
> http://wiki.apache.org/solr/QueryElevationComponent
>
> -Hoss
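For the archive, a sketch of the elevation setup this corresponds to, based on the QueryElevationComponent wiki page (the component name, field type and doc id here are hypothetical placeholders, not the exact config used). In solrconfig.xml:

    <searchComponent name="elevator" class="solr.QueryElevationComponent">
      <str name="queryFieldType">string</str>
      <str name="config-file">elevate.xml</str>
    </searchComponent>

    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
      </lst>
      <arr name="last-components">
        <str>elevator</str>
      </arr>
    </requestHandler>

And in elevate.xml, pinning recordX to the top for the query "international" (as in the original question):

    <elevate>
      <query text="international">
        <doc id="recordX" />
      </query>
    </elevate>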
dismax and qf
Hi,

I use dismax and have specified my fields to be boosted in the qf parameter in solrconfig.xml. What I understand is that in the search URL I can also specify these qf values, by adding &qf=field1^100 field2^200, which should override the boost specified for each field in solrconfig.xml.

But when I change the boosts like this in my URL for the various fields, I don't find any difference at all in the order in which the results come. I have many fields specified in qf. I changed their boost, say from field1^2.0 to field1^1000, in the query URL, but I find no change in the order in which the results come.

Is there any problem in using the qf parameter like this as part of the query URL, varying the boosts (relevancy) to check how the order of results varies? My aim is to see how the results would change if I boosted some fields more than others, or vice versa decreased the boost of some fields compared to others.

Could someone pls help!

Thanks.
Mark.
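Two things worth checking, sketched below: the qf value must be URL-encoded (an unencoded space ends the parameter), and if qf is set under "invariants" rather than "defaults" in the handler's solrconfig.xml section it cannot be overridden per request. Host and field names are hypothetical:

    # space -> + (or %20) and ^ -> %5E inside the qf parameter
    curl 'http://localhost:8983/solr/select?qt=dismax&q=international&qf=field1%5E1000+field2%5E200&debugQuery=on'
    # debugQuery=on shows the parsed boosts actually applied to each field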