Re: SOLR-788 and merged trunk
On 5/17/2010 3:34 PM, Shawn Heisey wrote:
> I am looking at SOLR-788, trying to apply it to latest trunk. It looks
> like that's going to require some rework, because the included constant
> PURPOSE_GET_MLT_RESULTS conflicts with something added later,
> PURPOSE_GET_TERMS. How hard would it be to rework this to apply
> correctly to trunk? Is it simply a matter of advancing the constant to
> the next bit in the mask? There's been no discussion on the issue as to
> whether the original patch or the alternate one is better. Does anyone
> know?

I could not make the original patch work. I did get it to apply, but it would not compile. With some massaging, the alternate patch applied, compiled, and seems to have passed all JUnit tests as well. Considering that it's nearly 2 AM here, I will play further tomorrow.

I did have one question that I hope someone can answer. It looks like the DIH has been moved outside the war file into separate jars that I will have to ensure are in the lib directory. Is that an accurate statement?

Shawn
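In case it helps anyone attempting the same rework: the PURPOSE_* constants are bit flags, so "advancing the constant to the next bit in the mask" means moving the patch's constant to the next unused power of two. A hypothetical sketch; the actual values in org.apache.solr.handler.component.ShardRequest may differ:

  // hypothetical values for illustration only
  public static final int PURPOSE_GET_TERMS       = 0x400; // added later on trunk
  public static final int PURPOSE_GET_MLT_RESULTS = 0x400; // SOLR-788 patch: same bit, conflict!
  // the rework: move the patch's constant to the next unused bit
  // public static final int PURPOSE_GET_MLT_RESULTS = 0x800;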
Solr Architecture discussion
Hi,

I'd like to get some architectural advice concerning the setup of a Solr (v1.4) platform in a production environment. I'll first describe my targeted architecture and then ask the questions related to that environment.

Here's briefly what I've achieved so far: I've already set up an environment which serves as a proof of concept. This environment is composed of a master instance on one host, and a slave instance on a second host. The slave handles 2 Solr cores. In the final version of the architecture I would add one or more slave nodes depending on the request load.

                                      request
                                         |
                                         v
  [ MASTER [core] ] --- [ SLAVE [core1] <--swap--> [core2] ]
          |
          v
    [index backup]

The goal of this architecture is:
* Isolate indexing from querying
* Enable index replication from master to slave
* Control the swap between newly replicated indexes (use of dual cores per slave)

Here's how the whole platform works when we need to renew the index (on the slaves):
1. Back up the index files on the master, using Solr's backup capability (a backup is always welcome).
2. Launch index creation (I'm using the delta indexing capabilities in order to limit the index generation time).
3. Trigger replication from the master core to slave core2, based on Solr's capabilities too.
4. Trigger the swap between core1 and core2.
5. At this point the slave index has been renewed; we can revert back to the previous index if there were any issues with the new one.

As this is aimed to be a production environment, redundancy is one of the key elements, meaning that we will double (or more) the front Solr instances. If slave instances are not in the same network as the master instance, our strategy will probably be to set up one of the slaves as a relay.

That said, here are my questions:

1/ I'd like insight into issues that may happen with this kind of architecture.

2/ My first concern is the size of the index that needs to be replicated. We need to perform indexing all day long (every 5 min) and replicate as soon as the index is built. As far as I know, replication copies over all the index files; I think there cannot be delta replication (only replicating what changed). That's my assumption. But is there any way to do delta replication, if that makes any sense?

3/ How can I improve this architecture based on your own experience? E.g., shall I use different network interfaces for Solr commands and queries?

Thank you for sharing.
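For steps 1, 3 and 4, a sketch of the HTTP commands involved in Solr 1.4 (host names and core names here are placeholders for this thread's setup):

  # 1. back up the index on the master
  curl 'http://master:8983/solr/replication?command=backup'

  # 3. pull the new index into the offline slave core
  curl 'http://slave:8983/solr/core2/replication?command=fetchindex&masterUrl=http://master:8983/solr/replication'

  # 4. swap the cores so core2 goes live
  curl 'http://slave:8983/solr/admin/cores?action=SWAP&core=core1&other=core2'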
Re: DIH behavior after an import. Log, delete table!?
How can I say that Solr should start the jar after every delta-import, NOT after every full-import?
Re: DIH behavior after an import. Log, delete table!?
> How can I say that Solr should start the jar after every
> delta-import, NOT after every full-import?

You cannot distinguish between delta and full directly, so you need to do it in your jar program. In your Java program you need to send a GET request to the URL http://localhost:8080/solr/dataimport. If the result string/XML contains 'idle' and 'Delta Dump started', then you can truncate your table. If the result string contains 'idle' and 'Full Dump Started', then do nothing.
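A minimal sketch of that check in plain Java, assuming Solr at localhost:8080 and a hypothetical truncateTable() helper:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.URL;

  public class DihStatusCheck {
      public static void main(String[] args) throws Exception {
          // fetch the DataImportHandler status page
          URL url = new URL("http://localhost:8080/solr/dataimport");
          BufferedReader in = new BufferedReader(
                  new InputStreamReader(url.openStream(), "UTF-8"));
          StringBuilder sb = new StringBuilder();
          String line;
          while ((line = in.readLine()) != null) sb.append(line);
          in.close();
          String status = sb.toString();
          // the status page reports the type of the last run
          if (status.contains("idle") && status.contains("Delta Dump started")) {
              truncateTable(); // last run was a delta-import
          }
          // 'Full Dump Started' => do nothing
      }

      static void truncateTable() {
          // hypothetical: issue a JDBC TRUNCATE here
      }
  }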
Multifaceting on multivalued field
Hi all,

I read about multifaceting [1] and tried it for myself. With multifaceting I would like to conserve the number of documents for the 'un-facetted case'. This works nicely with normal fields, but I get an exception [2] if I apply this on a multivalued field. Is this a bug, or logical :-) ? If the latter is the case, would anybody help me to understand this?

Regards,
Peter.

[1] http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html

[2] org.apache.solr.common.SolrException: undefined field !{ex=cars}cars
    at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1077)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:226)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
Solr Cell and encrypted pdf files
Hi,

I can't seem to get Solr Cell to index password-protected PDF files. I can't figure out how to pass the password to Tika, and looking at ExtractingDocumentLoader, it doesn't seem to pass any PDF-password-related metadata to the Tika parser. Whatever I do, PDFBox complains that:

"The supplied password does not match either the owner or user password in the document."

If I strip the password manually before trying to index the document, it works. What am I missing?

thanks!
yiannis
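For reference, the manual stripping workaround might look roughly like this with PDFBox 1.x (a sketch: file names and the password are placeholders, and setAllSecurityToBeRemoved() may not exist in older PDFBox releases):

  import java.io.File;
  import org.apache.pdfbox.pdmodel.PDDocument;

  public class StripPdfPassword {
      public static void main(String[] args) throws Exception {
          PDDocument doc = PDDocument.load(new File("protected.pdf"));
          if (doc.isEncrypted()) {
              doc.decrypt("secret");               // user or owner password
              doc.setAllSecurityToBeRemoved(true); // save without encryption
          }
          doc.save("unprotected.pdf");
          doc.close();
      }
  }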
Re: Multifaceting on multivalued field
Hi,

This exception is fired when you don't have this field in your index, but here it comes from an error in your query syntax: !{ex=cars}cars should be {!ex=cars}cars, with the exclamation mark inside the brackets.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42

2010/5/18 Peter Karich

> Hi all,
>
> I read about multifaceting [1] and tried it for myself. With
> multifaceting I would like to conserve the number of documents for the
> 'un-facetted case'. This works nicely with normal fields, but I get an
> exception [2] if I apply this on a multivalued field.
> Is this a bug, or logical :-) ? If the latter is the case, would
> anybody help me to understand this?
>
> [...]
Long startup phase
Hi there,

trying to deploy Solr 1.4/JDK 1.6/CentOS Linux 64bit on a new production server. Starting Solr takes very long on this machine. In particular it seems to hang for a minute or two, showing only this on the console:

  [...@db01 backend_buildout]$ bin/solr-instance fg
  2010-05-18 16:22:51.507::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
  2010-05-18 16:22:51.585::INFO: jetty-6.1.3

Using strace shows that the process seems to be waiting, aka hanging, in the wait4() call below. Any idea?

Andreas

  open("/usr/local/lib/python2.6/plat-linux2/cStringIO.py", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/plat-linux2/cStringIO.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
  stat("/usr/local/lib/python2.6/lib-tk/cStringIO", 0x7fff292873f0) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-tk/cStringIO.so", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-tk/cStringIOmodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-tk/cStringIO.py", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-tk/cStringIO.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
  stat("/usr/local/lib/python2.6/lib-old/cStringIO", 0x7fff292873f0) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-old/cStringIO.so", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-old/cStringIOmodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-old/cStringIO.py", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-old/cStringIO.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
  stat("/usr/local/lib/python2.6/lib-dynload/cStringIO", 0x7fff292873f0) = -1 ENOENT (No such file or directory)
  open("/usr/local/lib/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 5
  fstat(5, {st_mode=S_IFREG|0755, st_size=50484, ...}) = 0
  open("/usr/local/lib/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 6
  read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\31\0\0\0\0\0\0"..., 832) = 832
  fstat(6, {st_mode=S_IFREG|0755, st_size=50484, ...}) = 0
  mmap(NULL, 2114584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x2acde2995000
  mprotect(0x2acde2999000, 2093056, PROT_NONE) = 0
  mmap(0x2acde2b98000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x2acde2b98000
  close(6) = 0
  close(5) = 0
  close(4) = 0
  getrlimit(RLIMIT_NOFILE, {rlim_cur=1, rlim_max=1}) = 0
  close(3) = 0
  pipe([3, 4]) = 0
  fcntl(4, F_GETFD) = 0
  fcntl(4, F_SETFD, FD_CLOEXEC) = 0
  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2acdde8a9a50) = 21603
  close(4) = 0
  mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2acde2b9a000
  read(3, "", 1048576) = 0
  mremap(0x2acde2b9a000, 1052672, 4096, MREMAP_MAYMOVE) = 0x2acde2b9a000
  close(3) = 0
  munmap(0x2acde2b9a000, 4096) = 0
  wait4(21603, 0x7fff2928f6f4, WNOHANG, NULL) = 0
  wait4(21603, 2010-05-18 16:21:04.731::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
  2010-05-18 16:21:04.811::INFO: jetty-6.1.3
how to achieve filters
Hi All I am using "dismax" query to fetch docs from solr where I have set some boost to the each fields, If I search for query "Rock" I get following docs with some boost value which I have specified, 19.494072 120 mp3 Rock 1 st name 1 19.494052 248 aac+ Rock 2 st name 2 19.494042 127 aac+ Rock 3 st name 3 19.494032 256 mp3 Rock 4 st name 5 I am looking for something below What is the best way to achieve them ? 1. Query=rock where content= mp3 where it should return only first and last docs where content=mp3 2. Query=rock where bitrate<128 where it should return only first and third docs where bitrate<128 Thanks in advance Prakash
Re: Multifaceting on multivalued field
Hi Marco,

oh, awkward. Thanks a lot!!

Regards,
Peter.

> Hi,
>
> This exception is fired when you don't have this field in your index,
> but here it comes from an error in your query syntax: !{ex=cars}cars
> should be {!ex=cars}cars, with the exclamation mark inside the brackets.
>
> [...]

--
Free your timetabling!
http://timefinder.sourceforge.net/
Re: Which Solr to use?
On Mon, May 17, 2010 at 8:22 PM, Sixten Otto wrote:
> - Plunge ahead with the trunk, and hope that things stabilize by a few
> months from now, when we'd be hoping to go live on one of our biggest
> client sites.
> - Go with the last 1.5 code, knowing that the features we want are in
> there, and hope we don't run into anything majorly broken.
> - Stick with 1.4, and just accept the necessity of needing to push
> content to the HTTP interface.

Of course this is really up to you, but personally I would not recommend using the trunk (slated to become 4.0) and hoping that it stabilizes. Some discussions/voting happened and the trunk is intended to be ... more like a normal trunk.

If you need features not in an official release, and are looking for a codebase with updated features, I would recommend instead considering:

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/

I am sure someone will disagree, but it's my opinion that this 3.x release branch is actually more stable than you might think: it gets all the bugfixes and "safe features" from the trunk, but nothing really risky or scary. So for example, it gets a lot of bugfixes and cleanups, and gets things like improvements to spatial and new analyzers, but doesn't get the really risky stuff like the flexible indexing changes from Lucene.

https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/CHANGES.txt
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/contrib/CHANGES.txt

--
Robert Muir
rcm...@gmail.com
Re: how to achieve filters
> I am using "dismax" query to fetch docs from solr where I > have set some > boost to the each fields, > > > > If I search for query "Rock" I get following docs with some > boost value > which I have specified, > > > > > 19.494072 > 120 > mp3 > Rock > 1 > st name 1 > > > 19.494052 > 248 > aac+ > Rock > 2 > st name 2 > > > 19.494042 > 127 > aac+ > Rock > 3 > st name 3 > > > 19.494032 > 256 > mp3 > Rock > 4 > st name 5 > > > > I am looking for something below What is the best way to > achieve them ? With filter queries. fq= > 1. Query=rock where content= mp3 where it should return > only first and > last docs where content=mp3 Assuming that content is string typed. q=rock&fq={!field f=content}mp3 > 2. Query=rock where bitrate<128 where it should return > only first and third docs where bitrate<128 &q=rock&fq:bitrate:[* TO 128] for this bitrate field must be tint type.
Re: how to achieve filters
On 18.05.2010 16:54, Ahmet Arslan wrote:
>> 2. Query=rock where bitrate<128: it should return only the first
>> and third docs, where bitrate<128.
>
> q=rock&fq=bitrate:[* TO 128]
>
> For this to work, the bitrate field must be of tint type.

q=rock&fq=bitrate:[* TO 127] would be better, because the bitrate should be lower than 128. BTW, a bitrate of 127 is interesting...

@Prakash: See http://wiki.apache.org/solr/SolrFacetingOverview
RE: how to achieve filters
Thanks much Ahmet,

Yep, content is string, and bitrate is int. I am digging more now. Can we combine both the scenarios?

q=rock&fq={!field f=content}mp3
q=rock&fq=bitrate:[* TO 128]

Say if I want only mp3 from 0 to 128.

Regards,
Prakash

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tuesday, May 18, 2010 8:24 PM
To: solr-user@lucene.apache.org
Subject: Re: how to achieve filters

[...]
Re: Recommended MySQL JDBC driver
On 5/14/2010 12:40 PM, Shawn Heisey wrote:
> I downgraded to 5.0.8 for testing. Initially, I thought it was going to
> be faster, but it slows down as it gets further into the index. It now
> looks like it's probably going to take the same amount of time.
>
> On the server timeout thing - that's a setting you'd have to put in
> my.ini or my.cnf; there may also be a way to change it on the fly
> without restarting the server. I suspect that when you are running a
> multiple-query setup like yours, it opens multiple connections, and
> when one of them is busy doing some work, the others are idle. That may
> be related to the timeout with the older connector version. On my
> setup, I only have one query that retrieves records, so I'm probably
> not going to run into that. I could be wrong about how it works - you
> can confirm or refute this idea by looking at SHOW PROCESSLIST on your
> MySQL server while it's working.

I was having no trouble with the 5.0.8 connector on 1.5-dev build 922440M, but then I upgraded the test machine to the latest 4.0 from trunk, and ran into the timeout issue you described, so I am going back to the 5.1.12 connector. I just saw the message on the list about branch_3x in SVN, which looks like a better option than trunk.

Shawn
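The server-side setting in question is presumably MySQL's wait_timeout (an assumption; net_write_timeout can also matter for long-running result streams). A sketch of both ways to set it:

  # in my.cnf / my.ini, under [mysqld]
  wait_timeout = 28800

  -- or on the fly, without restarting the server:
  SET GLOBAL wait_timeout = 28800;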
RE: how to achieve filters
> Yep, content is string, and bitrate is int.

bitrate should be trie-based tint, not int, for range queries to work correctly.

> I am digging more now. Can we combine both the scenarios?
>
> q=rock&fq={!field f=content}mp3
> q=rock&fq=bitrate:[* TO 128]
>
> Say if I want only mp3 from 0 to 128.

You can append as many filter queries (fq) as you want:

q=rock&fq={!field f=content}mp3&fq=bitrate:[* TO 128]
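For reference, a tint declaration along the lines of the stock Solr 1.4 example schema (the field name comes from this thread; treat the exact attributes as a sketch):

  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>
  <field name="bitrate" type="tint" indexed="true" stored="true"/>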
RE: how to achieve filters
Hey,

q=rock&fq=bitrate:[* TO 128]

bitrate is int. This also returns docs with more than 128 bitrate. Is there something I am doing wrong?

Regards,
Prakash

-Original Message-
From: Doddamani, Prakash [mailto:prakash.doddam...@corp.aol.com]
Sent: Tuesday, May 18, 2010 8:44 PM
To: solr-user@lucene.apache.org
Subject: RE: how to achieve filters

[...]
RE: how to achieve filters
Thanks Ahmet,

Let me try these options.

Regards,
Prakash

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tuesday, May 18, 2010 9:06 PM
To: solr-user@lucene.apache.org
Subject: RE: how to achieve filters

[...]
RE: how to achieve filters
> q=rock&fq=bitrate:[* TO 128]
>
> bitrate is int. This also returns docs with more than 128 bitrate.
> Is there something I am doing wrong?

If you are using Solr 1.4.0 you need to use a trie-based field type (tint) for numeric range queries to behave as expected; with the plain int type, range queries compare values as strings, so values above 128 can still match.
'Minimum Should Match' on subquery level
Hi All,

I need to use Lucene's `minimum number should match` option of BooleanQuery in Solr. Actually I need to do the same as the DisMaxRequestHandler's `mm` parameter does, but at subquery level, i.e. I have a complex query which consists of several Boolean subqueries, and I need to specify a different 'minimum number should match' threshold for each of these sub-queries. Can somebody advise me how I can do this with Solr?

Thanks in advance,
Myron
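For reference, this is the per-BooleanQuery knob being asked about on the raw Lucene side; a sketch with made-up field and term names:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.TermQuery;

  public class MinShouldMatchExample {
      public static BooleanQuery build() {
          // a sub-query where at least 2 of the 3 SHOULD clauses must match
          BooleanQuery sub = new BooleanQuery();
          sub.add(new TermQuery(new Term("body", "red")), Occur.SHOULD);
          sub.add(new TermQuery(new Term("body", "green")), Occur.SHOULD);
          sub.add(new TermQuery(new Term("body", "blue")), Occur.SHOULD);
          sub.setMinimumNumberShouldMatch(2);

          // nest it inside a larger query; the parent could carry its own threshold
          BooleanQuery top = new BooleanQuery();
          top.add(sub, Occur.MUST);
          return top;
      }
  }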
Storing RandomSortField
Hi guys,

Is there any way to make a RandomSortField be stored? I'm trying to do it for debugging purposes; my intention is to take a look at the values that are stored there, to determine the sorting that is being applied to the results. I tried to declare it as a stored field, and I also tried to create another text field, copying the result from the random field via copyField. Neither of the approaches worked. Is there any restriction on this kind of field that prevents it from being displayed in the results?

Thanks,
Alexandre
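A sketch of the two attempts described, with hypothetical field names:

  <fieldType name="random" class="solr.RandomSortField" indexed="true"/>

  <!-- attempt 1: a stored random field -->
  <field name="rand" type="random" indexed="true" stored="true"/>

  <!-- attempt 2: copy the random value into a stored text field -->
  <field name="rand_text" type="text" indexed="true" stored="true"/>
  <copyField source="rand" dest="rand_text"/>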
Re: Autosuggest
: So there is no generally accepted preferred way to do auto-suggest?

there are many generally accepted and preferred ways to do auto-suggest -- it all comes down to specific goals and needs.

for example: using the TermsComponent is really simple to set up if you want your suggestions to come from a single field of your index and be in a simple ordering -- but if you want the suggested terms to be limited based on other criteria, or if you want to influence the ordering by other things, you need to use a more complicated solution (like facets).

For people with really tricky requirements (like ordering the results by a custom rule) it can even make sense to set up a special core where each document corresponds to a "term" to suggest, with a text field containing ngrams, and other fields containing numeric values that you use in boost functions.

there are lots of options -- all of them equally accepted -- preference is based on needs.

-Hoss
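As a concrete starting point, the simple TermsComponent approach is just a request like this (assuming a /terms handler wired to the TermsComponent, as in the example solrconfig; the field name is hypothetical):

  http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=ip&terms.limit=10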
Re: disable caches in real time
: I want to know if there is any approach to disable caches in a specific core
: from a multicore server.

only via the config.

: I have a multicore server where core0 will be listening to the queries and
: another core (core1) that will be replicated from a master server. Once the
: replication has been done, i will swap the cores. My point is that i want to
: disable the caches in the core that is in charge of the replication to save
: memory in the machine.

that seems bizarrely complicated -- replication can work against a "live" core, no need to do the swap yourself; the replication handler takes care of this for you transparently (ie: you have one core, replicating from a master -- the old index will be searched by users, and have caches, and when the new version of the index is ready, the replication handler will swap the *index* in that core, but the core itself never changes) ... it can even autowarm the caches on the new index for you before the swap, if you configure it that way.

-Hoss
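A sketch of the single-core slave setup being described (master URL and poll interval are placeholders):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- poll the master every 5 minutes; fetch and swap in the new
           index automatically when it changes -->
      <str name="masterUrl">http://master:8983/solr/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>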
Re: Which Solr to use?
On Tue, May 18, 2010 at 10:40 AM, Robert Muir wrote: > Some discussions/voting happened and the trunk is intended to be ... > more like a normal trunk. > > If you need features not in an official release, and are looking for a > codebase with updated features, I would recommend instead considering: > > http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/ So features are being actively added to / code rearranged in trunk/4.0, with some of the work being back-ported to this branch to form a stable 3.1 release? Is that accurate? Is there any thinking about when that might drop (beyond the quite understandable "when it's done")? Or, perhaps more reasonably, when it might freeze? (I've done some casual searching of the site + list archives without finding this information, but by all means if there's a thread I should go read to bone up on this stuff, a link is all I need.) Sixten
TikaEntityProcessor on Solr 1.4?
Sorry to repeat this question, but I realized that it probably belonged in its own thread: The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, along with some other changes (like the binary DataSources) to support it. Obviously, there hasn't been an official release of Solr since then. Has anyone tried back-porting those changes to Solr 1.4? (I do see that the question was asked last month, without any response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9) The patches for these issues don't seem all that complex or pervasive, but it's hard for me (as a Solr n00b) to tell whether this is really all that's involved: https://issues.apache.org/jira/browse/SOLR-1583 https://issues.apache.org/jira/browse/SOLR-1358 Sixten
Re: Autosuggest
Thanks for the info Hoss. I will probably need to go with one of the more complicated solutions. Is there any online documentation for this task?

Thanks.
Re: Merge Search for Suggestion. Keywords and Products ?!
: i am searching for a way to merge my two different autocompletions in one
: request. thats what i want:

you could copyField your two different fields into one destination, and then use a single strategy on that new field, but ...

: - suggestion for Product Names (EdgeNGram)
: - suggestion for keywords. (TermsComponent with Shingle)

...to get two different approaches like that, you'll need a more complex solution. one example is to configure a special core just for autosuggest, where each "document" corresponds to a specific "term" you want to suggest, and then other fields contain the EdgeNGrams for product names and Shingles for other terms -- then you just search against this core using your input.

-Hoss
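The copyField variant, for reference (field names are hypothetical):

  <field name="suggest" type="text" indexed="true" stored="false"
         multiValued="true"/>
  <copyField source="productName" dest="suggest"/>
  <copyField source="keywords" dest="suggest"/>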
Embedded Server, Caching, Stats page updates
I just switched from using CommonsHttpSolrServer to EmbeddedSolrServer and the performance surprisingly deteriorated. I was expecting an improvement, so in my confusion I went to the stats page and noticed that the caches were no longer getting hit. The embedded server, however, should still use IndexSearcher from Lucene (which is what the caches are supposed to be related to).

Is there some kind of property that needs to be added or adjusted for the embedded server to use the cache? Should I create my own cache and wipe the rest out entirely? Should I remove the httpCaching section from the configuration, since I'll no longer be accessing the service remotely? How accurate is the stats page, and is the error actually coming from it rather than the actual backend?

Thank you beforehand,
Tony
Re: Embedded Server, Caching, Stats page updates
: I just switched from using CommonsHttpSolrServer to EmbeddedSolrServer
: and the performance surprisingly deteriorated. I was expecting an
: improvement, so in my confusion I went to the stats page and noticed
: that the caches were no longer getting hit. The embedded server,
: however, should still use IndexSearcher from Lucene (which is what the
: caches are supposed to be related to).

The way you phrased that paragraph makes me think that one of us doesn't understand what exactly you did when you "switched" ...

When using CommonsHttpSolrServer in some application you write, you are talking to a remote server that is running Solr. When you use EmbeddedSolrServer, you are running Solr directly within the application that you are writing.

Now for starters: if the remote server you were running Solr on is more powerful than the local machine you are running your java application on, that alone could explain some performance differences (likewise for JVM settings).

Most importantly: when running Solr embedded in your application, there is no "stats.jsp" page for you to look at -- because Solr is no longer running in a servlet container. So if you are seeing stats on your Solr server that say your caches aren't being hit, the reason is that the server isn't being hit at all.

: Is there some kind of property that needs to be added or adjusted for
: the embedded server to use the cache? Should I create my own cache and
: wipe the rest out entirely?

When running an embedded Solr server, the filterCache and queryResultCache will still be used; the settings in the solrconfig.xml you specify when initializing the SolrCore will be honored. You can use JMX to monitor those cache hit rates (assuming you have JMX enabled for your application, and the appropriate setting is in your solrconfig.xml).

-Hoss
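The solrconfig.xml setting in question should be the <jmx/> element; a minimal sketch:

  <!-- in solrconfig.xml: expose Solr's MBeans (caches included)
       via the JVM's platform MBean server -->
  <jmx/>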
Re: Long startup phase
There are no .pyc files in Solr. It's an all-Java app, no Python.

Run 'jps' to get a list of Java processes running. Then use 'jhat' or 'jstat' to examine the program. 'netstat -an | fgrep :8983' will give you a list of all sockets in use by Solr, both client and server.

On Tue, May 18, 2010 at 7:29 AM, Andreas Jung wrote:
> Hi there,
>
> trying to deploy Solr 1.4/JDK 1.6/CentOS Linux 64bit
> on a new production server.
>
> Starting Solr takes very long on this machine. In particular
> it seems to hang for a minute or two, showing only this on the
> console:
>
> [...@db01 backend_buildout]$ bin/solr-instance fg
> 2010-05-18 16:22:51.507::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
> 2010-05-18 16:22:51.585::INFO: jetty-6.1.3
>
> Using strace shows that the process seems to be waiting, aka hanging,
> in the wait4() call below. Any idea?
>
> Andreas
>
> [...]

--
Lance Norskog
goks...@gmail.com
Deduplication
Basically, for some use cases I would like to show duplicates; for others I want them ignored. If I have overwriteDupes=false and I just create the dedup hash, how can I query for only unique hash values... i.e. something like a SQL GROUP BY?

Thanks
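For context, the overwriteDupes=false setup being described is roughly this (a sketch along the lines of the Solr deduplication wiki example; the field list and signature field name are illustrative):

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <bool name="overwriteDupes">false</bool>
      <!-- the hash is stored here; duplicates share the same value -->
      <str name="signatureField">signature</str>
      <str name="fields">name,features</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>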
Re: Connection Pool
Do multiple calls with your client program. So:

  curl _file1_ &
  curl _file2_ &
  curl _file3_ &
  curl _file4_ &
  wait; wait; wait; wait

On Sun, May 16, 2010 at 8:20 AM, Monmohan Singh wrote:
> Sorry for hijacking the thread, but I have an additional question.
> Is there a way to achieve similar performance (SUSS-like) when targeting
> the extract request handler (/update/extract)?
> I guess one way can be to extract content on the client side and then use
> SUSS to send the update request, but then extraction needs to be taken
> care of locally in an asynchronous/batch manner.
> Regards
> Monmohan
>
> On Sun, May 16, 2010 at 5:19 AM, Lance Norskog wrote:
>
>> Connection pooling is specified by the underlying apache commons
>> connection manager when you create the Server.
>>
>> The SUSS does socket pooling by default and is the preferred way to do
>> concurrent indexing. There are some quirks in the Server
>> implementation set, and SUSS avoids them. Unless you are willing to
>> root around in the SolrJ Server code and understand exactly how it
>> works, stay with the SUSS.
>>
>> On Fri, May 14, 2010 at 6:44 AM, gabriele renzi wrote:
>>> On Fri, May 14, 2010 at 3:35 PM, Anderson vasconcelos wrote:
>>>> Hi,
>>>> I want to know if there is any connection pool client to manage the
>>>> connections with solr. In my system, we have a lot of concurrent
>>>> index requests. I can't share my connection, I need to create one per
>>>> transaction. But if I create one per transaction, I think the
>>>> performance will go down.
>>>>
>>>> How do you resolve this problem?
>>>
>>> The CommonsHttpSolrServer class does connection pooling, and IIRC so
>>> does the StreamingUpdateSolrServer.
>>>
>>> --
>>> blog en: http://www.riffraff.info
>>> blog it: http://riffraff.blogsome.com
>>
>> --
>> Lance Norskog
>> goks...@gmail.com

--
Lance Norskog
goks...@gmail.com
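For the SUSS route, instantiation looks roughly like this in SolrJ 1.4 (the URL, queue size, and thread count are example values):

  import java.net.MalformedURLException;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

  public class SussExample {
      public static void main(String[] args) throws MalformedURLException {
          // 20 buffered documents per queue, 4 background writer threads
          SolrServer server = new StreamingUpdateSolrServer(
                  "http://localhost:8983/solr", 20, 4);
      }
  }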