Problem instantiating CommonsHttpSolrServer using solrj
Hi all, I'm trying to use solrj for indexing in Solr, but when I try to instantiate the server using:

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");

I get the following runtime error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/solr/client/solrj/SolrServerException
Caused by: java.lang.ClassNotFoundException: org.apache.solr.client.solrj.SolrServerException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

I am following this link: http://wiki.apache.org/solr/Solrj and have included all the jar files specified there in the classpath. Please help me out with this, thanks in advance.

Bijeet
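The missing class, org.apache.solr.client.solrj.SolrServerException, lives in the solrj jar itself, so this usually means the compile-time classpath differs from the runtime one. A minimal smoke test, assuming the 1.4-era SolrJ API and a core at http://localhost:8080/solr:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrjSmokeTest {
        public static void main(String[] args) throws Exception {
            // If this line throws NoClassDefFoundError, the apache-solr-solrj jar
            // (and its dependencies, e.g. commons-httpclient, commons-codec, slf4j)
            // is missing from the *runtime* classpath, even if compilation succeeded.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "test-1");   // assumes "id" is the schema's uniqueKey
            server.add(doc);
            server.commit();
        }
    }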
Delta-import with solrj client
Greetings. I have a solrj client for fetching data from a database, using delta-import. If a column changes in the database, the timestamp-based delta-import indexes the latest row, but I end up with duplicate entries in the index: the same row again, with the older data. It works if I clean the index first, but I want to update the index without cleaning it. Is there a way to just update the index with the updated column without getting duplicate values? I appreciate any feedback.

Hando
Re: solr query result not read the latest xml file
Hi, Yes, this is normal behavior. This is because Solr is *document* based, it does not know about *files*. What happens here is that your source database (or whatever) has had deletions within this category in addition to updates, and you need to relay those to Solr. The best way to integrate with your source system is through some connector which picks up deletes as well as adds (an update is just a special case of an add). If your source data is in a database, have a look at DataImportHandler, which can be set up to do things like this. If your source data is files on a file system only, you'll have to write some scripts which take care of all of this, e.g. by first issuing the delete and then the add (tip: try -Dcommit=no on the delete request and -Dcommit=yes on the following add to avoid temporary loss of data). You need to think about what happens if a whole category is deleted. How would you know by simply looking at the file system?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 11. aug. 2010, at 04.10, e8en wrote:

> thanks for your response Jan,
> I just learned that post.jar is only an example tool,
> so what should I use instead of post.jar for production?
>
> btw, I already tried using this command:
> java -Durl=http://localhost:8983/search/update -jar post.jar cat_817.xml
>
> and IT WORKS !!
> the cat_817.xml is reflected directly in the solr query after I commit it, this is the url:
> http://localhost:8983/search/select/?q=ITEM_CAT:817&version=2.2&start=0&rows=10&indent=on
>
> the problem is it only works if the old xml contains fewer docs than the new xml.
> For example, if the old cat_817.xml contains 2 docs and the new cat_817.xml contains 10 docs,
> then I just have to re-index (java -Durl=http://localhost:8983/search/update -jar post.jar cat_817.xml)
> and the query result will be correct (10 docs), but it doesn't work vice versa.
> If the old cat_817.xml contains 10 docs and the new cat_817.xml contains 2 docs,
> then I have to delete the index first (java -Ddata=args -Dcommit=yes -jar post.jar "ITEM_CAT:817")
> and re-index (java -Durl=http://localhost:8983/search/update -jar post.jar cat_817.xml)
> to make the query result updated (2 docs).
>
> is it a normal process or something wrong with my solr?
>
> once again thanks Jan, your help really makes my day brighter :)
> and I believe your answer will help many solr newbies, especially me
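A sketch of the delete-then-add sequence described above, using the 1.4-era post.jar flags from the quoted mail (the URL, category and file name are this thread's examples; note the delete has to be wrapped in Solr's delete-by-query XML):

    java -Durl=http://localhost:8983/search/update -Ddata=args -Dcommit=no \
         -jar post.jar "<delete><query>ITEM_CAT:817</query></delete>"
    java -Durl=http://localhost:8983/search/update -Dcommit=yes \
         -jar post.jar cat_817.xml

Because the commit happens only after the add, searchers never see the intermediate state where the category is empty.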
timestamp field
Hi, I have in my schema a timestamp field defined with default="NOW". This field is returned as 2010-08-11T10:11:03.354Z for an article added at 2010-08-11T11:11:03.354Z! And the server has the time of 2010-08-11T11:11:03.354Z... This is a w2003 server using solr 1.4. Any guess of what could be wrong here?

Thanks,
Frederico
Re: timestamp field
Hi, Which time zone are you located in? Do you have DST? Solr uses UTC internally for dates, which means that "NOW" will be the time in London right now :) Does that appear to be right 4 u? Also see this thread: http://search-lucene.com/m/hqBed2jhu2e2/

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 11. aug. 2010, at 13.02, Frederico Azeiteiro wrote:

> Hi,
>
> I have in my schema a timestamp field with default="NOW".
>
> This field is returned as 2010-08-11T10:11:03.354Z
> for an article added at 2010-08-11T11:11:03.354Z!
>
> And the server has the time of 2010-08-11T11:11:03.354Z...
>
> This is a w2003 server using solr 1.4.
>
> Any guess of what could be wrong here?
>
> Thanks,
> Frederico
RE: timestamp field
Hi Jan,

Dah, I didn't know that :( I always thought it used the server time. Anyway, just out of curiosity: the hour is UTC but NOT the time in London right now. London is UTC+1 (same as here in Portugal) :). So London solr users should have the same "problem". Well, I must be careful when using this field.

Thanks for your answer,
Frederico

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: quarta-feira, 11 de Agosto de 2010 12:17
Subject: Re: timestamp field

> Which time zone are you located in? Do you have DST? Solr uses UTC internally for dates, which means that "NOW" will be the time in London right now :)
Re: Delta-import with solrj client
Short answer is no, there isn't a way. Solr doesn't have the concept of an 'update' to an indexed document. You need to add the full document (all 'columns') each time any one field changes. If doing that in your DataImportHandler logic is difficult, you may need to write a separate update service that does:

1) Read UniqueID, UpdatedColumn(s) from database
2) Using UniqueID, retrieve document from Solr
3) Add/Update field(s) with updated column(s)
4) Add document back to Solr

Although, if you use DIH to do a full import, using the same query in your delta-import to get the whole document shouldn't be that difficult.
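A sketch of such an update service in 1.4-era SolrJ, assuming every field in the schema is stored (otherwise the read-then-re-add round trip loses data) and that the uniqueKey field is named "id":

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class UpdateService {
        public static void applyUpdate(SolrServer server, String id,
                                       Map<String, Object> changedColumns) throws Exception {
            // 1+2) fetch the current version of the document from Solr
            QueryResponse rsp = server.query(new SolrQuery("id:" + id));
            if (rsp.getResults().isEmpty()) return;
            SolrDocument current = rsp.getResults().get(0);

            // copy all stored fields into a fresh input document
            SolrInputDocument doc = new SolrInputDocument();
            for (String field : current.getFieldNames()) {
                doc.addField(field, current.getFieldValue(field));
            }
            // 3) overwrite the changed columns
            for (Map.Entry<String, Object> e : changedColumns.entrySet()) {
                doc.removeField(e.getKey());
                doc.addField(e.getKey(), e.getValue());
            }
            // 4) re-add; the same uniqueKey makes Solr replace the old document
            server.add(doc);
            server.commit();
        }
    }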
Re: timestamp field
For what it's worth, London and the rest of the UK is currently observing British Summer Time (called Daylight Saving Time in other parts of the world), which is why we appear to be UTC+1 between the last Sunday in March and the last Sunday in October.

Mark

On 11 Aug 2010, at 12:36 pm, Frederico Azeiteiro wrote:

> London is UTC+1 (same as here in Portugal) :). So London solr users should have the same "problem".
Re: Delta-import with solrj client
Hi, Make sure you use a proper "ID" field, which does *not* change even if the content in the database changes. In this way, when your delta-import fetches changed rows to index, they will update the existing rows in your index.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 11. aug. 2010, at 12.49, Hando420 wrote:

> If a column is changed in database using timestamp with delta-import i get the latest column indexed but there are duplicate values in the index similar to the column but the data is older.
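For reference, a minimal sketch of the schema side of that advice (the field name and type are placeholders for whatever your primary key column is):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    ...
    <uniqueKey>id</uniqueKey>

With a stable uniqueKey in place, re-adding a changed row overwrites the existing document instead of creating a duplicate.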
Re: Solr 1.4 - stats page slow
FYI, I opened https://issues.apache.org/jira/browse/SOLR-2036 for this. -Yonik http://www.lucidimagination.com On Tue, Aug 10, 2010 at 8:35 PM, entdeveloper wrote: > > Apologies if this was resolved, but we just deployed Solr 1.4.1 and the stats > page takes over a minute to load for us as well and began causing > OutOfMemory errors so we've had to refrain from hitting the page. From what > I gather, it is the fieldCache part that's causing it. > > Was there ever an official fix or recommendation on how to disable the stats > page from calculating the fieldCache entries? If we could just ignore it, I > think we'd be good to go since I find this page very useful otherwise.
DataImportHandler in Solr 1.4.1: exception handling in FileListEntityProcessor
Hi folks, why does FileListEntityProcessor ignore onError="continue" and abort indexing if a directory or a file does not exist? I'm using both XPathEntityProcessor and FileListEntityProcessor with onError set to continue. In case a directory or file is not present, an exception is thrown and indexing is stopped immediately. Below you can find a stack trace that is generated in case the directory /home/doe/foo does not exist:

SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' value: /home/doe/foo/bar.xml is not a directory Processing Document # 3
        at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:122)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

How should I configure both processors so that missing directories and files are ignored and the indexing process does not stop immediately?

Best, Sascha
Re: DataImportHandler in Solr 1.4.1: exception handling in FileListEntityProcessor
Sorry, there was a mistake in the stack trace. The correct one is:

SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' value: /home/doe/foo is not a directory Processing Document # 3
        at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:122)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

-Sascha

On 11.08.2010 15:18, Sascha Szott wrote:

> why does FileListEntityProcessor ignore onError="continue" and abort indexing if a directory or a file does not exist?
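For reference, a sketch of the kind of configuration being discussed (paths, file patterns and XPaths are placeholders). The abort appears to happen because the 'baseDir' check throws from FileListEntityProcessor.init() before DIH's row-level onError handling gets a chance to apply, which matches the stack trace above:

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/home/doe/foo" fileName=".*\.xml"
                recursive="true" rootEntity="false" onError="continue">
          <entity name="doc" processor="XPathEntityProcessor" onError="continue"
                  url="${files.fileAbsolutePath}" forEach="/doc">
            <field column="id" xpath="/doc/id"/>
          </entity>
        </entity>
      </document>
    </dataConfig>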
Re: Solr Doc Lucene Doc !?
i have a question about the solr index mechanism with DIH ... i am trying to understand how solr indexes a doc, and in which code elements solr uses lucene. this is my understanding so far: DIH uses SolrWriter to add a doc. To create a SolrInputDocument, SolrWriter uses an AddUpdateCommand; this command and doc are put into the UpdateRequestProcessorChain. In this chain solr creates a Lucene Document with DocumentBuilder and puts it back into the chain !?!? is this right? Then the UpdateHandler gets the update command and manages the index changes !? So, i don't understand how the UpdateHandler works. can anyone give me some tips? SolrIndexWriter is used by the UpdateHandler, and SolrIndexWriter uses IndexWriter from Lucene? thx for your help =)=)
RE: PDF file
Thanks so much for your help! I got a "Remote Streaming is disabled" error. Would you please tell me if I missed something?

Thanks,

-Original Message-
From: Jayendra Patil [mailto:jayendra.patil@gmail.com]
Sent: Tuesday, August 10, 2010 8:51 PM
To: solr-user@lucene.apache.org
Subject: Re: PDF file

Try ...

curl "http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=/pub2009001.pdf&literal.id=777045&commit=true"

stream.file - specify full path
literal.<fieldname> - specify any extra params if needed

Regards,
Jayendra

On Tue, Aug 10, 2010 at 4:49 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] <xiao...@mail.nlm.nih.gov> wrote:

> Thanks so much for your help! I tried to index a pdf file and got the following. The command I used is
>
> curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F "fi...@pub2009001.pdf"
>
> Did I do something wrong? Do I need to modify anything in schema.xml or another configuration file?
>
> The response was HTTP ERROR: 404 NOT_FOUND, RequestURI=/solr/lhc/update/extract (Powered by Jetty).
>
> -Original Message-
> From: Sharp, Jonathan [mailto:jsh...@coh.org]
> Sent: Tuesday, August 10, 2010 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: PDF file
>
> Xiaohui,
>
> You need to add the following jars to the lib subdirectory of the solr config directory on your server.
>
> (path inside the solr 1.4.1 download)
>
> /dist/apache-solr-cell-1.4.1.jar
> plus all the jars in
> /contrib/extraction/lib
>
> HTH
>
> -Jon
>
> From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov]
> Sent: Tuesday, August 10, 2010 11:57 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: PDF file
>
> Does anyone have any experience with PDF files? I really appreciate your help! Thanks so much in advance.
>
> -Original Message-
> From: Ma, Xiaohui (NIH/NLM/LHC) [C]
> Sent: Tuesday, August 10, 2010 10:37 AM
> To: 'solr-user@lucene.apache.org'
> Subject: PDF file
>
> I have a lot of pdf files. I am trying to import pdf files to solr and index them. I added ExtractingRequestHandler to solrconfig.xml. Please tell me if I need to download some jar files.
>
> In the Solr 1.4 Enterprise Search Server book, the following command is used to import mccm.pdf:
>
> curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F "fi...@mccm.pdf"
>
> Please tell me if there is a way to import pdf files from a directory.
>
> Thanks so much for your help!
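A hedged pointer for the "Remote Streaming is disabled" error: in the 1.4-era solrconfig.xml, stream.file/stream.url support is switched on in the <requestDispatcher> section, e.g.:

    <requestDispatcher handleSelect="true">
      <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048"/>
      ...
    </requestDispatcher>

Note that enabling remote streaming lets clients make Solr read local files by path, so access to the server should be restricted to trusted hosts.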
Re: Solr Doc Lucene Doc !?
oh, i see that i mixed DIH classes with other Solr classes ^^
RE: PDF file
Thanks, I knew how to enable streaming. But I got another error: ERROR:unknown field 'metadata_trapped'. Does anyone know how to match up the SolrCell metadata fields? I found some related field definitions in schema.xml, but I don't know what changes to make for PDF. I really appreciate your help!

Thanks,

-Original Message-
From: Ma, Xiaohui (NIH/NLM/LHC) [C]
Sent: Wednesday, August 11, 2010 10:36 AM
To: solr-user@lucene.apache.org
Cc: 'jayendra.patil@gmail.com'
Subject: RE: PDF file

> Thanks so much for your help! I got a "Remote Streaming is disabled" error. Would you please tell me if I missed something?
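One hedged way to deal with unexpected SolrCell metadata fields such as metadata_trapped, rather than mapping each one by hand, is to pass uprefix so unknown fields get a prefix, and catch them with a dynamic field (attr_* is the convention used in the 1.4 example schema; adjust names to your setup):

    curl "http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=/pub2009001.pdf&literal.id=777045&uprefix=attr_&commit=true"

and in schema.xml:

    <dynamicField name="attr_*" type="text" indexed="true" stored="true" multiValued="true"/>

Metadata fields that are not in the schema then land in attr_metadata_trapped etc. instead of causing an "unknown field" error.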
SolrException log
Hi, we are using solr 1.4.1 in a master-slave setup with replication; requests are load-balanced to both instances. This is working just fine, but the slave sometimes behaves strangely with a "SolrException log" (trace below). We have been using 1.4.1 for weeks now, this has happened only a few times so far, and it only occurred on the slave. The problem seemed to be gone when we added a cron job to send a periodic optimize (once a day) to the master, but today it happened again. The index contains 55 files right now; after an optimize there are only 10. So it seems to be a problem when the index is spread among a lot of files. The slave won't ever recover once this exception shows up; the only thing that helps is a restart. Is this a known issue? The only workaround would be to track the commit counts and send additional optimize requests after a certain amount of commits, but I'd prefer solving this problem rather than building a workaround. Any hints/thoughts on this issue are very much appreciated, thanks in advance for your help.

cheers Bastian.

Aug 11, 2010 4:51:58 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=media_id,keyword_1004&sort=priority_1000+desc,+score+desc&indent=off&start=0&q=mandant_id:1000+AND+partner_id:1000+AND+active_1000:true+AND+cat_id_path_1000:7231/7258*+AND+language_id:1004&rows=24&version=2.2} status=500 QTime=2
Aug 11, 2010 4:51:58 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:461)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:445)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
        at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.mortbay.jetty.servlet.WebApplicationHandler$CachedChain.doFilter(WebApplicationHandler.java:821)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:471)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
        at org.mortbay.http.HttpServer.service(HttpServer.java:909)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
        at org.mortbay.http.ajp.AJP13Connection.handleNext(AJP13Connection.java:295)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
        at org.mortbay.http.ajp.AJP13Listener.handleConnection(AJP13Listener.java:212)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
RE: Improve Query Time For Large Index
Hi Peter,

Can you give a few more examples of slow queries? Are they phrase queries? Boolean queries? Prefix or wildcard queries? If one-word queries are your slow queries, then CommonGrams won't help. CommonGrams will only help with phrase queries.

How are you using termVectors? That may be slowing things down. I don't have experience with termVectors, so someone else on the list might speak to that.

When you say the query time for common terms stays slow, do you mean if you re-issue the exact query, the second query is not faster? That seems very strange. You might restart Solr and send a first query (the first query always takes a relatively long time). Then pick one of your slow queries and send it 2 times. The second time you send the query it should be much faster due to the Solr caches, and you should be able to see the cache hit in the Solr admin panel. If you send the exact query a second time (without enough intervening queries to evict data from the cache), the Solr queryResultCache should get hit and you should see a response time in the 0.01-5 millisecond range.

What settings are you using for your Solr caches? How much memory is on the machine? If your bottleneck is disk I/O for frequent terms, then you want to make sure you have enough memory for the OS disk cache. I assume that http is not in your stopwords.

CommonGrams was committed and is in Solr 1.4. If you decide to use CommonGrams you definitely need to re-index, and you also need to use both the index-time filter and the query-time filter. Your index will be larger.

Tom

-Original Message-
From: Peter Karich [mailto:peat...@yahoo.de]
Sent: Tuesday, August 10, 2010 3:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Improve Query Time For Large Index

Hi Tom,

my index is around 3GB large and I am using 2GB RAM for the JVM, although some more is available. If I look at the RAM usage while a slow query runs (via jvisualvm) I see that only 750MB of the JVM RAM is used.

> Can you give us some examples of the slow queries?

for example the empty query solr/select?q= takes very long, or solr/select?q=http where 'http' is the most common term

> Are you using stop words?

yes, a lot. I stored them in stopwords.txt

> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2

this looks interesting. I read through https://issues.apache.org/jira/browse/SOLR-908 and it seems to be in 1.4. I only need to enable it in my schema, right? Do I need to reindex?

Regards,
Peter.

> Hi Peter,
>
> A few more details about your setup would help list members to answer your questions.
> How large is your index?
> How much memory is on the machine and how much is allocated to the JVM?
> Besides the Solr caches, Solr and Lucene depend on the operating system's disk caching for caching of postings lists. So you need to leave some memory for the OS. On the other hand if you are optimizing and refreshing every 10-15 minutes, that will invalidate all the caches, since an optimized index is essentially a set of new files.
>
> Can you give us some examples of the slow queries? Are you using stop words?
>
> If your slow queries are phrase queries, then you might try either adding the most frequent terms in your index to the stopwords list, or try CommonGrams and add them to the common words list. (Details on CommonGrams here: http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>
> Tom Burton-West
>
> -Original Message-
> From: Peter Karich [mailto:peat...@yahoo.de]
> Sent: Tuesday, August 10, 2010 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Improve Query Time For Large Index
>
> Hi,
>
> I have 5 million small documents/tweets (=> ~3GB) and the slave index replicates itself from master every 10-15 minutes, so the index is optimized before querying. We are using solr 1.4.1 (patched with SOLR-1624) via SolrJ.
>
> Now the search speed is slow: >2s for common terms which hit more than 2 million docs, and acceptable for others: <0.5s. For those numbers I don't use highlighting or facets. I am using the following schema [1] and from the luke handler I know that numTerms =~ 20 million. The query for common terms stays slow if I retry again and again (no cache improvements).
>
> How can I improve the query time for the common terms without using Distributed Search [2]?
>
> Regards,
> Peter.
>
> [1] required="true" /> termVectors="true" termPositions="true" termOffsets="true"/>
>
> [2] http://wiki.apache.org/solr/DistributedSearch

--
http://karussell.wordpress.com/
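Regarding "I only need to enable it in my schema, right?": the SOLR-908 filters are enabled per field type in schema.xml. A hedged sketch (the type name and tokenizer are placeholders; the words file lists your high-frequency terms):

    <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>

As Tom notes above, both the index-time and query-time filters are needed, and a full re-index is required.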
Re: how to support "implicit trailing wildcards"
Hi Jan,

It seems q=mount OR mount* gives a different sort order than q=mount for documents containing mount. I changed to q=mount^100 OR (mount?* -mount)^1.0, and it tests well.

Thanks very much!

2010/8/10 Jan Høydahl / Cominvent

> Hi,
>
> You don't need to duplicate the content into two fields to achieve this. Try this:
>
> q=mount OR mount*
>
> The exact match will always get a higher score than the wildcard match because wildcard matches use "constant score".
>
> Making this work for multi-term queries is a bit trickier, but something along these lines:
>
> q=(mount OR mount*) AND (everest OR everest*)
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 10. aug. 2010, at 09.38, Geert-Jan Brits wrote:
>
>> you could satisfy this by making 2 fields:
>> 1. exactmatch
>> 2. wildcardmatch
>>
>> use copyfield in your schema to copy 1 --> 2.
>>
>> q=exactmatch:mount+wildcardmatch:mount*&q.op=OR
>> this would score exact matches above (solely) wildcard matches
>>
>> Geert-Jan
>>
>> 2010/8/10 yandong yao
>>
>>> Hi Bastian,
>>>
>>> Sorry for not making it clear: I also want an exact match to have a higher score than a wildcard match. That means if I search 'mount', documents with 'mount' should have a higher score than documents with 'mountain', while 'mount*' seems to treat 'mount' and 'mountain' the same.
>>>
>>> Besides, I also want the query to be processed with the analyzer, while from
>>> http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F ,
>>> Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer. The rationale is that if I search 'mounted', I also want documents with 'mount' to match.
>>>
>>> So it seems built-in wildcard search cannot satisfy my requirements, if I understand correctly.
>>>
>>> Thanks very much!
>>>
>>> 2010/8/9 Bastian Spitzer
>>>
>>>> Wildcard-Search is already built in, just use:
>>>>
>>>> ?q=umoun*
>>>> ?q=mounta*
>>>>
>>>> -Ursprüngliche Nachricht-
>>>> Von: yandong yao [mailto:yydz...@gmail.com]
>>>> Gesendet: Montag, 9. August 2010 15:57
>>>> Betreff: how to support "implicit trailing wildcards"
>>>>
>>>> Hi everyone,
>>>>
>>>> How to support 'implicit trailing wildcard *' using Solr? E.g. using Google to search 'umoun', 'umount' will be matched; search 'mounta', 'mountain' will be matched.
>>>>
>>>> From my point of view, there are several ways, each with disadvantages:
>>>>
>>>> 1) Using EdgeNGramFilterFactory, thus 'umount' will be indexed with 'u', 'um', 'umo', 'umou', 'umoun', 'umount'. The disadvantages are: a) the index size increases dramatically, b) it will match even where there is no relationship, e.g. 'mount' will match 'mountain' also.
>>>>
>>>> 2) Using two-pass searching: the first pass searches the term dictionary through TermsComponent using the given keyword, then the first matched term from the term dictionary is used to search again. E.g. when a user enters 'umoun', TermsComponent will match 'umount', then 'umount' is used to search. The disadvantages are: a) I need to parse the query string so that I can recognize meta keywords such as 'AND', 'OR', '+', '-', '"' (this is more complex as I am using the PHP client), b) the returned hit count is not for the original search string, which will influence other components such as an auto-suggest component based on user search history and hit counts.
>>>>
>>>> 3) Write a custom SearchComponent, while I have no idea where/how to start with that.
>>>>
>>>> Is there any other way in Solr to do this? Any feedback/suggestions are welcome!
>>>>
>>>> Thanks very much in advance!
Re: Improve Query Time For Large Index
On Wed, Aug 11, 2010 at 11:47 AM, Burton-West, Tom wrote:

> Hi Peter,
>
> Can you give a few more examples of slow queries? Are they phrase queries? Boolean queries? Prefix or wildcard queries? If one-word queries are your slow queries, then CommonGrams won't help. CommonGrams will only help with phrase queries.

Since the example given was "http" being slow, it's worth mentioning that if queries are "one word" urls [for example http://lucene.apache.org] these will actually form slow phrase queries by default. Because your content is very tiny documents, it's probably good to disable this, since the phrases won't likely help the results at all, but will make things unbearably slow.

In solr 3_x and trunk, you can disable these automatic phrase queries in schema.xml with autoGeneratePhraseQueries="false" on the field type; then the system won't form phrase queries unless the user explicitly puts double quotes around the terms.

--
Robert Muir
rcm...@gmail.com
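A hedged sketch of where that attribute goes (the surrounding analyzer is a placeholder, and the attribute exists only in 3.x/trunk as noted above):

    <fieldType name="text" class="solr.TextField"
               positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>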
Re: Need help with facets
That's awesome. Thanks Ahmet!

On Wed, Aug 11, 2010 at 1:50 AM, Ahmet Arslan wrote:

> --- On Wed, 8/11/10, Moazzam Khan wrote:
>
>> Thanks Ahmet that worked!
>>
>> Here's another issue I have:
>>
>> Like I said before, I have these fields in Solr documents:
>>
>> FirstName
>> LastName
>> RecruitedDate
>> VolumeDate (just added this in this email)
>> VolumeDone (just added this in this email)
>>
>> Now I have to get the sum of all VolumeDone (integer field) for this month by everyone, then take 25% of that number and get all people whose volume was more than that. Is there a way to do this? :D
>
> You need to execute two queries for that. The Stats Component can give you the sum:
> q=VolumeDate:[NOW-1MONTH TO NOW]&stats=true&stats.field=VolumeDone
>
> http://wiki.apache.org/solr/StatsComponent
>
> Then the second query:
> q=VolumeDate:[NOW-1MONTH TO NOW]&fq=VolumeDone:[sumComesAbove TO *]
>
> But you need to use the tint type instead of int for VolumeDone, for range queries to work correctly.
Analysing SOLR logfiles
Hi there, Just wondering what tools people use to analyse Solr log files. We're looking to do things like extracting common queries, calculating average QTime and hits, and returning particularly slow/expensive queries. Would prefer not to code something (completely) from scratch. Thanks!
Filter Performance in Solr 1.3
Hi there, I have a question about filter (fq) performance in Solr 1.3. After doing some testing it seems as though adding a filter increases search time. From what I've read here
http://www.derivante.com/2009/06/23/solr-filtering-performance-increase/
and here
http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/
it seems as though upgrading to 1.4 would solve this problem. My question is whether there is anything that can be done in 1.3 to help alleviate the problem, before upgrading to 1.4? It becomes an issue because the majority of searches that are done on our site need some content type excluded or filtered for. Does it make sense to use the fq parameter in this way, or is there some better approach since filters are almost always used? Thank you!
Re: Filter Performance in Solr 1.3
fq's are the preferred way to filter when the same filter is often used (since the filter set can be cached separately). As to your direct question:

> My question is whether there is anything that can be done in 1.3 to help alleviate the problem, before upgrading to 1.4?

I don't think so (perhaps some patches that I'm not aware of). When are you seeing increased search time? Is it the first time the filter is used? If that's the case: that's logical, since the filter needs to be built. (fq) filters only show their strength (as said above) when you use them repeatedly. If on the other hand you're seeing slower response times with an fq filter applied all the time than the same queries without the fq filter, there must be something strange going on, since this really shouldn't happen in normal situations.

Geert-Jan

2010/8/11 Bargar, Matthew B

> Hi there, I have a question about filter (fq) performance in Solr 1.3. After doing some testing it seems as though adding a filter increases search time. [...]
Data Import Handler Query
Hi, I have installed solr 1.4 and am trying to use the Data Import Handler to import data from a database. I have 2 tables which share a one-to-many relation (1 Story to Many Images). I want my index to contain attributes regarding "Story" and also all "Images" that it has. Based on the DIH documentation, I have set up the data-config.xml with a story entity using deltaQuery="select story_id from story where time > '${dataimporter.last_index_time}'" and a nested entity for the images. However, when I query the index, I find that it imports only the first record from images that it finds for a story. E.g. if I have a story with 3 images, the index only has information about the first one. Is it possible to get the data for all images for a story in the same index? If so, what am I missing in the data config? Thanks.
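A hedged sketch of a typical parent/child DIH setup for this case (table and column names are guesses; the image field must be declared multiValued="true" in schema.xml for all values to be kept, which is what the reply below suggests checking):

    <document>
      <entity name="story" pk="story_id"
              query="select story_id, title from story"
              deltaQuery="select story_id from story where time > '${dataimporter.last_index_time}'"
              deltaImportQuery="select story_id, title from story where story_id='${dataimporter.delta.story_id}'">
        <entity name="images"
                query="select image_url from images where story_id='${story.story_id}'">
          <field column="image_url" name="image_url"/>
        </entity>
      </entity>
    </document>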
RE: Filter Performance in Solr 1.3
The search with the filter takes longer than a search for the same term with no filter, even after repeated searches, when the cache should have come into play. To be more specific, this happens with filters that exclude very few results from the overall set. For instance, type:video returns few results and, as one would expect, returns much quicker than a search without that filter. -type:video, on the other hand, returns a lot of results and excludes very few, and actually takes longer than a search without any filter at all. Is this what one might expect when using a filter that excludes few results, or does it still seem like something strange might be happening?

Thanks,
Matt

-Original Message-
From: Geert-Jan Brits [mailto:gbr...@gmail.com]
Sent: Wednesday, August 11, 2010 2:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Filter Performance in Solr 1.3

> fq's are the preferred way to filter when the same filter is often used (since the filter set can be cached separately). [...]
Re: how to support "implicit trailing wildcards"
I guess q=mount OR (mount*)^0.01 would work equally as well, i.e. diminishing the effect of wildcard matches.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 11. aug. 2010, at 17.53, yandong yao wrote:

> Hi Jan,
>
> It seems q=mount OR mount* gives a different sort order than q=mount for documents containing mount. I changed to q=mount^100 OR (mount?* -mount)^1.0, and it tests well.
>
> Thanks very much!
Re: Data Import Handler Query
It may not be the data config. Do you have the fields in schema.xml that the image data goes into set to multiValued="true"?

Although, I would think the last image would be stored, not the first, but I haven't really tested this.
bug or feature???
Hi, Can someone tell me why the two following queries do not return the same results? Is that a bug or a feature? http://localhost:8983/jobs/select?fq=title:(NOT janitor)&fq=description:(NOT janitor)&q=*:* http://localhost:8983/jobs/select?q=title:(NOT janitor) AND description:(NOT janitor) The second query returns no result while the first one returns 6097276 documents Thanks
General questions about distributed solr shards
1) Is there any information on preferred maximum sizes for a single solr index? I've read some people say 10 million, some say 80 million, etc... Is there any official recommendation, or has anyone experimented with large datasets into the tens of billions?

2) Is there any downside to running multiple solr shard instances on a single machine rather than one shard instance with a larger index per machine? I would think that having 5 instances with 1/5 of the index would return results approx 5 times faster.

3) Say you have a solr configuration with multiple shards. If you attempt to query while one of the shards is down, you will receive an HTTP 500 on the client due to a connection refused on the server. Is there a way to tell the server to ignore this and return as many results as possible? In other words, if you have 100 shards, it is possible that occasionally a process may die, but I would still like to return results from the active shards.

Thanks
Indexing and ExtractingRequestHandler
I'm trying to use Solr to index the contents of an Excel file, using the ExtractingRequestHandler (the CSV handler won't work for me - I need to treat the whole spreadsheet as one document), and I'm running into some trouble.

Is there any way to see what's going on during the indexing process? I'm concerned that I may be losing some terms, and I'd like to see if I can snoop on the terms that are added to the index as they go along. How might I do this?

Barring that, how can I inspect the index after the fact? I have tried to use Luke to see what's in the index, but I get an error: "Unknown format version -10". Is it possible to get Luke to work?

My solr build is straight out of SVN.

thanks,

harry
Re: Analysing SOLR logfiles
Have a look at www.splunk.com

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 11. aug. 2010, at 19.34, Jay Flattery wrote:

> Hi there,
>
> Just wondering what tools people use to analyse Solr log files.
> We're looking to do things like extracting common queries, calculating average QTime and hits, and returning particularly slow/expensive queries.
> Would prefer not to code something (completely) from scratch.
>
> Thanks!
Re: bug or feature???
Your syntax looks a bit funny. Which version of Solr are you using? Pure negative queries are not supported, try q=(*:* -title:janitor) instead. Also, for debugging what's going on, please add &debugQuery=true and share the parsed query for both cases with us. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 11. aug. 2010, at 22.28, Jean-Sebastien Vachon wrote: > Hi, > > Can someone tell me why the two following queries do not return the same > results? > Is that a bug or a feature? > > http://localhost:8983/jobs/select?fq=title:(NOT janitor)&fq=description:(NOT > janitor)&q=*:* > > http://localhost:8983/jobs/select?q=title:(NOT janitor) AND description:(NOT > janitor) > > > The second query returns no result while the first one returns 6097276 > documents > > Thanks
Re: Data Import Handler Query
I tried setting the schema fields that get the image data to multiValued="true", but it still gets only the first image's data. It doesn't have information about all the images.

On Wed, Aug 11, 2010 at 1:15 PM, kenf_nc wrote:

> It may not be the data config. Do you have the fields in schema.xml that the image data goes into set to multiValued="true"?
Re: Indexing and ExtractingRequestHandler
Hi, You can try Tika command line to parse your Excel file, then you will se the exact textual output from it, which will be indexed into Solr, and thus inspect whether something is missing. Are you sure you use a version of Luke which supports your version of Lucene? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 11. aug. 2010, at 23.33, Harry Hochheiser wrote: > I'm trying to use Solr to index the contents of an Excel file, using > the ExtractingRequestHandler (CSV handler won't work for me - I need > to consider the whole spreadsheet as one document), and I'm running > into some trouble. > > Is there any way to see what's going on during the indexing process? > I'm concerned that I may be losing some terms, and I'd like to see if > i can snoop on the terms that are added to the index as they go along. > How might I do this? > > Barring that, how can I inspect the index post-fact? I have tried to > use luke to see what's in the index, but I get an error: "Unknown > format version -10". Is it possible to get luke to work? > > My solr build is straight out of SVN. > > thanks, > > harry
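To see exactly what text Tika extracts (and whether terms are really missing before they ever reach Solr), you can run the standalone tika-app jar from the command line; a sketch, assuming a tika-app jar roughly matching the Tika version bundled with your Solr build:

    java -jar tika-app-0.7.jar --text spreadsheet.xls > extracted.txt

Comparing extracted.txt with what is searchable in the index narrows the problem down to either the extraction step or the analysis chain.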
Re: DIH transformer script size limitations with Jetty?
To follow up on my own question, it appears this is only an issue when using the DataImport console debugging tools. When submitting the debugging request, the data-config.xml is sent via a GET request, which fails for a large config. However, using the exact same data-config.xml via a full-import operation (i.e. not a dry-run debug), the request is sent via POST and the import works fine.
--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-transformer-script-size-limitations-with-Jetty-tp1091246p1100285.html
Sent from the Solr - User mailing list archive at Nabble.com.
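If the GET size limit itself is the obstacle, one possible workaround (untested here, and assuming the Jetty 6 that ships with the Solr example) is to enlarge the connector's header buffer in etc/jetty.xml, which also bounds the request line:

  <Call name="addConnector">
    <Arg>
      <New class="org.mortbay.jetty.bio.SocketConnector">
        <Set name="port">8983</Set>
        <Set name="headerBufferSize">65536</Set>
      </New>
    </Arg>
  </Call>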
DIH - Insert another record After first load
Hi,

I did a load of the data with DIH, and now that the data is loaded I want to add records dynamically, as and when they are received.

Use cases:
1. I did a load of 7MM records and everything is working fine.
2. A new record is received, and now I want to add this new record to the indexed data. The processing and logic here differ from the initial load:
   * The initial data load was done from an Oracle materialized view.
   * The new record is added to the tables the view is created from, and is not available in the view yet.
   * Now I want to add this new record to the index. I have a Java bean loaded with the data, including the index column.
   * I looked at the indexed files and they are all encoded.
3. How do I load the above Java bean into the index? An example would really help.

Thanks
Girish
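A minimal SolrJ sketch of pushing one record straight into the index, bypassing DIH (field names and getters are hypothetical; the id field must match your schema's uniqueKey):

  SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", bean.getId());       // the schema's uniqueKey
  doc.addField("name", bean.getName());   // hypothetical field
  server.add(doc);
  server.commit();

Because id is the uniqueKey, re-adding the same record later simply overwrites the old document instead of duplicating it.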
How to "OR" facet queries
Hi, I have 3 facet fields (A, B, C); the values of each facet field will be shown as check boxes to users:

Field A
[x] Val1a
[x] Val2a
[ ] Val3a

Field B
[x] Val1b
[ ] Val2b
[ ] Val3b

Within a field, if the user selects two items I want the queries to be an "OR" query. Currently I'm generating something like:

&fq=FieldA%3AVal1a&fq=FieldA%3AVal2a&fq=FieldB%3AVal1b

This is not working, as the first two filter queries are 'and'ing. What is the proper syntax to accomplish what I'm trying to do?

Thanks.
Re: How to "OR" facet queries
On Thu, Aug 12, 2010 at 7:12 AM, Frank A wrote:
> Hi, I have 3 facet fields (A, B, C); the values of each facet field will be
> shown as check boxes to users:
>
> Field A
> [x] Val1a
> [x] Val2a
> [ ] Val3a
>
> Field B
> [x] Val1b
> [ ] Val2b
> [ ] Val3b
>
> Within a field, if the user selects two items I want the queries to be an
> "OR" query. Currently I'm generating something like:
>
> &fq=FieldA%3AVal1a&fq=FieldA%3AVal2a&fq=FieldB%3AVal1b

&fq=FieldA%3AVal1a%20OR%20FieldA%3AVal2a&fq=FieldB%3AVal1b

> This is not working, as the first two filter queries are 'and'ing.
> What is the proper syntax to accomplish what I'm trying to do?
>
> Thanks.
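Decoded, the suggested filter set is:

  fq=FieldA:Val1a OR FieldA:Val2a
  fq=FieldB:Val1b

That is, OR the selected values inside a single fq per field, and let the separate fq parameters AND the fields together.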
Re: DIH transformer script size limitations with Jetty?
Have you tried changing the -Xmx value, e.g. bumping it to -Xmx1300m? I had a problem with DIH loading the data, and when I bumped the memory everything worked fine!

harrysmith wrote:
> To follow up on my own question, it appears this is only an issue when using
> the DataImport console debugging tools. When submitting the debugging
> request, the data-config.xml is sent via a GET request, which fails for a
> large config. However, using the exact same data-config.xml via a
> full-import operation (i.e. not a dry-run debug), the request is sent via
> POST and the import works fine.
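For reference, on the stock Jetty example the flag goes on the startup command (the value is illustrative):

  java -Xmx1300m -jar start.jar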
Re: Indexing and ExtractingRequestHandler
Thanks. I've run the Tika command line to parse the Excel file, and I see contents in it that don't appear to be indexed. I've also tried the path of using Tika to parse the Excel file and then using the ExtractingRequestHandler to index the resulting text, and that doesn't work either.

As far as Luke goes, I've built it from scratch. It still bombs. Is it possible that it's not compatible with Lucene builds based on trunk?

thanks,

-harry

On Wed, Aug 11, 2010 at 6:48 PM, Jan Høydahl / Cominvent wrote:
> Hi,
>
> You can try the Tika command line to parse your Excel file; then you will see
> the exact textual output from it, which is what will be indexed into Solr,
> and you can inspect whether something is missing.
>
> Are you sure you use a version of Luke which supports your version of Lucene?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 11. aug. 2010, at 23.33, Harry Hochheiser wrote:
>
>> I'm trying to use Solr to index the contents of an Excel file, using
>> the ExtractingRequestHandler (the CSV handler won't work for me - I need
>> to treat the whole spreadsheet as one document), and I'm running
>> into some trouble.
>>
>> Is there any way to see what's going on during the indexing process?
>> I'm concerned that I may be losing some terms, and I'd like to see if
>> I can snoop on the terms that are added to the index as they go along.
>> How might I do this?
>>
>> Barring that, how can I inspect the index after the fact? I have tried to
>> use Luke to see what's in the index, but I get an error: "Unknown
>> format version -10". Is it possible to get Luke to work?
>>
>> My Solr build is straight out of SVN.
>>
>> thanks,
>>
>> harry
Re: Schema Definition Question
Can you do a DB join on OurID? That makes the association in the database, before it gets to the DataImportHandler.

On Sun, Aug 8, 2010 at 6:17 PM, Frank A wrote:
> Hi,
>
> I have a db handler with the following definition:
>
> <entity name="place"
>     query="select OurID,Name,City,State,lat,lng,cost from place"
>     deltaQuery="select OurID from destinations where
>         OurID='${dataimporter.request.did}'"
>     deltaImportQuery="select OurID,Name,City,State,lat,lng,cost from place
>         where OurID='${dataimporter.delta.id}'">
>
>   <entity ... query="select label,f.FeatureID from features f,
>       featureplace fp where fp.PlaceID='${place.OurID}' and
>       fp.FeatureID=f.FeatureID">
>   </entity>
> </entity>
>
> In my schema I have:
>
> <field name="..." stored="true" multiValued="true"/>
> <field name="..." multiValued="true"/>
>
> This yields results that have a list of feature labels and a separate
> list of FeatureIDs with no real connection between the two. Is there
> a better way to represent this?
>
> Thanks.

--
Lance Norskog
goks...@gmail.com
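One way to keep label and ID connected is to merge them into a single value in the sub-entity's query, e.g. (a sketch, assuming the database has a concat function; the "id:label" convention and the field name are just illustrative choices):

  <entity name="features"
      query="select concat(f.FeatureID, ':', f.label) as feature
             from features f, featureplace fp
             where fp.PlaceID='${place.OurID}' and fp.FeatureID=f.FeatureID">
    <field column="feature" name="feature"/>
  </entity>

The client then splits each stored value on ':' to recover the ID/label pair.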
In multicore env, can I make it access core0 by default
That is, so that accessing http://localhost/solr/select?q=*:* is equivalent to accessing http://localhost/solr/core0/select?q=*:*.
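Newer builds have a defaultCoreName attribute on the <cores> element in solr.xml which does exactly this (a sketch; check whether your version supports it):

  <solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="core0">
      <core name="core0" instanceDir="core0"/>
      <core name="core1" instanceDir="core1"/>
    </cores>
  </solr>

With that in place, /solr/select hits core0 while /solr/core1/select still addresses core1 explicitly.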
Re: Schema Definition Question
I think I know where you're headed; I was struggling with the same issue. In my case, using results from Solr, I link to a detailed profile using an ID, but I am displaying the String value. I was looking for something like:

  12345
    Feature 1 -> label 1
    Feature 2 -> label 2

...or something similar, some way of linking child items together. Unfortunately, this isn't how Solr works.

This issue is addressed in the Solr 1.4 book by Smiley and Pugh. The related snippet is from Chapter 2, page 36, dealing with an example application with a music artist's name and a related ID:

"...If we only record the name, then it is problematic to do things like have links in the UI from a band member to that member's detail page... This means that we'll need to have an additional multi-valued field for the member's ID. Multi-valued fields maintain ordering so that the two fields would have corresponding values at a given index. Beware, there can be a tricky case when one of the values can be blank, and you need to come up with a placeholder. The client code would have to know about this placeholder."

So it seems that we are assured the multivalued fields will be in the same order, so we can use the same index number. This seems clunky to me, but I have not come across any other solutions.
--
View this message in context: http://lucene.472066.n3.nabble.com/Schema-Definition-Question-tp1049966p1105593.html
Sent from the Solr - User mailing list archive at Nabble.com.
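Concretely, with the parallel-field approach the stored document comes back with two arrays that line up by position, e.g. (a sketch; field names and values are illustrative):

  <doc>
    <arr name="FeatureID"><str>17</str><str>42</str></arr>
    <arr name="features"><str>label 1</str><str>label 2</str></arr>
  </doc>

so the client pairs FeatureID[i] with features[i].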