Duplicate items in distributed search
Hi,

I'm after a bit of clarification about the 'limitations' section of the distributed search page on the wiki. The first two limitations say:

* Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml)
* When duplicate doc IDs are received, Solr chooses the first doc and discards subsequent ones

Does 'doc ID' in the second point refer to the unique key in the first point, or does it refer to the internal Lucene document ID?

Cheers,

Andrew.
Re: Duplicate items in distributed search
Mark Miller-3 wrote:
> The 'doc ID' in the second point refers to the unique key in the first point.

I thought so but thanks for clarifying. Maybe a wording change on the wiki would be good?

Cheers,

Andrew.
Using symlinks to alias cores
Another question... I have a series of cores representing historical data, only the most recent of which gets indexed to. I'd like to alias the most recent one to 'current' so that when they roll over I can just change the alias, and the cron jobs etc. which manage indexing don't have to change.

However, the wiki recommends against using the ALIAS command in CoreAdmin in a couple of places, and SOLR-1637 says it's been removed now anyway.

If I can't use ALIAS safely, is it okay to just symlink the most recent core's instance (or data) directory to 'current', and bring it up in Solr as a separate core? Will this be safe, as long as all index writing happens via the 'current' core? Or will it cause Solr to get confused and do horrible things to the index?

Thanks!

Andrew.
Re: Duplicate items in distributed search
Mark Miller-3 wrote:
> On 7/4/10 12:49 PM, Andrew Clegg wrote:
>> I thought so but thanks for clarifying. Maybe a wording change on the wiki
>
> Sounds like a good idea - go ahead and make the change if you'd like.

That page seems to be marked immutable...
Re: Using symlinks to alias cores
Chris Hostetter-3 wrote:
> a cleaner way to deal with this would be to use something like
> RewriteRule -- either in your appserver (if it supports a feature like
> that) or in a proxy sitting in front of Solr.

I think we'll go with this -- seems like the most bulletproof way.

Cheers,

Andrew.
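For reference, a minimal sketch of the proxy approach, assuming Apache httpd with mod_rewrite and mod_proxy in front of Solr -- the core name and port here are made up for illustration:

  RewriteEngine on
  # forward the stable 'current' alias to whichever core is live right now
  RewriteRule ^/solr/current/(.*)$ http://localhost:8983/solr/core20100701/$1 [P]

When the cores roll over, only the target core name in the RewriteRule changes; the cron jobs keep talking to /solr/current.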
SolrCloud in production?
Is anyone using ZooKeeper-based Solr Cloud in production yet? Any war stories? Any problematic missing features?

Thanks,

Andrew.
maxMergeDocs and performance tuning
Hi,

I'm a little confused about how the tuning params in solrconfig.xml actually work. My index currently has mergeFactor=25 and maxMergeDocs=2147483647. So this means that up to 25 segments can be created before a merge happens, and each segment can have up to 2bn docs in, right?

But this page:

http://www.ibm.com/developerworks/java/library/j-solr2/

says "Smaller values [of maxMergeDocs] (< 10,000) are best for applications with a large number of updates." Our system does indeed have frequent updates. But if we set maxMergeDocs=1, what happens when we reach 25 segments with one doc in each? Is the mergeFactor just ignored, so we start a new segment anyway?

More generally, what would be reasonable params for a large index consisting of many small docs, updated frequently? I think a few different use-case examples like this would be a great addition to the wiki.

Thanks!

Andrew.
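For reference, both of these knobs live in the indexDefaults section of solrconfig.xml; a sketch matching the values described above, with other elements omitted:

  <indexDefaults>
    <mergeFactor>25</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
  </indexDefaults>

mergeFactor limits how many segments of a similar size can accumulate before they are merged into one larger segment; maxMergeDocs caps how many documents a segment may contain before it is excluded from further merging.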
Re: maxMergeDocs and performance tuning
Okay, thanks Marc. I don't really have any complaints about performance (yet!) but I'm still wondering how the mechanics work, e.g. when you have a number of segments equal to mergeFactor, and each contains maxMergeDocs documents. The docs are a bit fuzzy on this...
Duplicate docs when mergin
Duplicate docs when merging indices?
Hi,

First off, sorry about the previous accidental post -- had a sausage-fingered moment. Anyway...

If I merge two indices with CoreAdmin, as detailed here:

http://wiki.apache.org/solr/MergingSolrIndexes

What happens to duplicate documents between the two, i.e. those that have the same unique key? What decides which copy takes precedence? Will documents get indexed multiple times, or will the second one just get skipped?

Also, does the behaviour vary between CoreAdmin and IndexMergeTool? This thread from a couple of years ago:

http://web.archiveorange.com/archive/v/AAfXfQIiBU7vyQBt6qdk

suggests that IndexMergeTool can result in dupes, unless I'm misinterpreting.

Thanks!

Andrew.
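For reference, the CoreAdmin merge that wiki page describes is invoked roughly like this -- core names and index paths here are illustrative:

  curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index'

The merge happens at the Lucene segment level, below where Solr's unique-key handling lives, which is exactly why the duplicate question arises.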
Replication snapshot, tar says "file changed as we read it"
(Many apologies if this appears twice, I tried to send it via Nabble first but it seems to have got stuck, and is fairly urgent/serious.)

Hi,

I'm trying to use the replication handler to take snapshots, then archive them and ship them off-site.

Just now I got a message from tar that worried me:

tar: snapshot.20110115035710/_70b.tis: file changed as we read it
tar: snapshot.20110115035710: file changed as we read it

The relevant bit of script that does it looks like this (error checking removed):

curl 'http://localhost:8983/solr/core1/replication?command=backup'
PREFIX=''
if [[ "$START_TIME" =~ 'Sun' ]]
then
  PREFIX='weekly.'
fi
cd $SOLR_DATA_DIR
for snapshot in `ls -d -1 snapshot.*`
do
  TARGET="${LOCAL_BACKUP_DIR}/${PREFIX}${snapshot}.tar.bz2"
  echo "Archiving ${snapshot} into $TARGET"
  tar jcf $TARGET $snapshot
  echo "Deleting ${snapshot}"
  rm -rf $snapshot
done

I was under the impression that files in the snapshot were guaranteed to never change, right? Otherwise what's the point of the replication backup command?

I tried putting in a 30-second sleep after the snapshot and before the tar, but the error occurred again anyway.

There was a message from Lance N. with a similar error in, years ago:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg06104.html

but that would be pre-replication anyway, right?

This is on Ubuntu 10.10 using java 1.6.0_22 and Solr 1.4.0.

Thanks,

Andrew.

--

:: http://biotext.org.uk/ :: http://twitter.com/andrew_clegg/ ::
Re: Replication snapshot, tar says "file changed as we read it"
PS one other point I didn't mention is that this server has a very fast autocommit limit (2 seconds max time).

But I don't know if this is relevant -- I thought the files in the snapshot wouldn't be committed to again. Please correct me if this is a huge misunderstanding.

On 16 January 2011 12:30, Andrew Clegg wrote:
> [original message quoted in full -- see above]

--

:: http://biotext.org.uk/ :: http://twitter.com/andrew_clegg/ ::
Re: Replication snapshot, tar says "file changed as we read it"
Sorry to re-open an old thread, but this just happened to me again, even with a 30 second sleep between taking the snapshot and starting to tar it up. Then, even more strangely, the snapshot was removed again before tar completed.

Archiving snapshot.20110320113401 into /var/www/mesh/backups/weekly.snapshot.20110320113401.tar.bz2
tar: snapshot.20110320113401/_neqv.fdt: file changed as we read it
tar: snapshot.20110320113401/_neqv.prx: File removed before we read it
tar: snapshot.20110320113401/_neqv.fnm: File removed before we read it
tar: snapshot.20110320113401: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

Has anybody seen this before, or been able to replicate it themselves? (no pun intended) Or, is anyone else using replication snapshots for backup? Have I misunderstood them? I thought the point of a snapshot was that once taken it was immutable.

If it's important, this is on a machine configured as a replication master, but with no slaves attached to it (it's basically a failover and backup machine). The replication config is:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">admin-extra.html,elevate.xml,protwords.txt,schema.xml,scripts.conf,solrconfig_slave.xml:solrconfig.xml,stopwords.txt,synonyms.txt</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>

Thanks,

Andrew.

On 16 January 2011 12:55, Andrew Clegg wrote:
> [previous two messages quoted in full -- see above]

--

:: http://biotext.org.uk/ :: http://twitter.com/andrew_clegg/ ::
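One way to at least detect a snapshot that is still being written -- a rough sketch only, and no explanation for the deletion -- is to wait until the directory's size stops changing before tarring, instead of a fixed sleep:

  PREV=-1
  SIZE=$(du -s "$snapshot" | cut -f1)
  while [ "$SIZE" != "$PREV" ]
  do
    PREV=$SIZE
    sleep 10
    SIZE=$(du -s "$snapshot" | cut -f1)
  done
  tar jcf "$TARGET" "$snapshot"

The loop exits once two successive du readings, ten seconds apart, agree.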
NullPointerException in DataImportHandler
First of all, apologies if you get this twice. I posted it by email an hour ago but it hasn't appeared in any of the archives, so I'm worried it's got junked somewhere.

I'm trying to use a DataImportHandler to merge some data from a database with some other fields from a collection of XML files, rather like the example in the Architecture section here:

http://wiki.apache.org/solr/DataImportHandler

... so a given document is built from some fields from the database and some from the XML. My dataconfig.xml looks like this:

[data config not preserved by the archive]

This works if I comment out the inner entity, but when I uncomment it, I get this error:

30-Jul-2009 14:32:50 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1s32D00}, title=title(1.0)={PDB code 1s32, chain D, domain 00}, keywords=keywords(1.0)={some keywords go here}, pdb_code=pdb_code(1.0)={1s32}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1s32 1s32D}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:64)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:344)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NullPointerException
    at java.io.File.<init>(File.java:222)
    at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:75)
    at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:44)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
    ... 9 more

I have checked that the file /cath/people/cathdata/v3_3_0/pdb-XML-noatom/1s32-noatom.xml is readable, so maybe the full path to the file isn't being constructed properly or something? I also tried with the full path template for the file in the entity url attribute, instead of using a basePath in the dataSource, but I get exactly the same exception.

This is with the 2009-07-30 nightly build. See attached for schema.

http://www.nabble.com/file/p24739580/schema.xml schema.xml

Any ideas? Thanks in advance!

Andrew.

--

:: http://biotext.org.uk/ ::
Re: NullPointerException in DataImportHandler
Chantal Ackermann wrote:
> Hi Andrew,
>
> your inner entity uses an XML type datasource. The default entity
> processor is the SQL one, however.
>
> For your inner entity, you have to specify the correct entity processor
> explicitly. You do that by adding the attribute "processor", and the
> value is the classname of the processor you want to use.
>
> e.g. processor="XPathEntityProcessor"

Thanks -- I was also missing a forEach expression -- in my case, just "/" since each XML file contains the information for no more than one document.

However, I'm now getting a different exception:

30-Jul-2009 16:48:52 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1udaA02}, title=title(1.0)={PDB code 1uda, chain A, domain 02}, pdb_code=pdb_code(1.0)={1uda}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1uda,1udaA}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception while reading xpaths for fields Processing Document # 1
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:135)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:307)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.LinkedList.entry(LinkedList.java:365)
    at java.util.LinkedList.get(LinkedList.java:315)
    at org.apache.solr.handler.dataimport.XPathRecordReader.addField0(XPathRecordReader.java:71)
    at org.apache.solr.handler.dataimport.XPathRecordReader.<init>(XPathRecordReader.java:50)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:121)
    ... 9 more

My data config now looks like this:

[data config not preserved by the archive]

Thanks in advance, again :-)

Andrew.
Re: NullPointerException in DataImportHandler
Erik Hatcher wrote:
> On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
>> <entity ... url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor" forEach="/">
>>   <field ... xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']" />
>
> The XPathEntityProcessor doesn't support that fancy of an xpath - it
> supports only a limited subset. Try /structCategory/struct/title perhaps?

Sadly not... I tried with:

[first xpath stripped by the archive] (full path from root)

and

[second xpath stripped by the archive]

Same ArrayIndex error each time. Doesn't it use javax.xml then? I was using the complex local-name expressions to make it namespace-agnostic -- is it agnostic anyway?

Thanks,

Andrew.
Re: NullPointerException in DataImportHandler
Chantal Ackermann wrote:
> my experience with XPathEntityProcessor is non-existent. ;-)

Don't worry -- your hints put me on the right track :-) I got it working with:

[working config not preserved by the archive -- see the sketch below]

Now, to get it to ignore missing files without an error... Hmm...

Cheers,

Andrew.
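The working config didn't survive the archive; judging from the rest of the thread it looked something along these lines -- the entity and dataSource names, and the exact xpath value, are assumptions:

  <entity name="pdbxml" processor="XPathEntityProcessor"
          dataSource="pdbfiles" forEach="/"
          url="${domain.pdb_code}-noatom.xml">
    <field column="title" xpath="/datablock/structCategory/struct/title" />
  </entity>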
Questions about XPath in data import handler
A couple of questions about the DIH XPath syntax... The docs say it supports:

xpath="/a/b/subject[@qualifier='fullTitle']"
xpath="/a/b/subject/@qualifier"
xpath="/a/b/c"

Does the second one mean "select the value of the attribute called qualifier in the /a/b/subject element"? e.g. for this document:

[example document stripped by the archive -- restated in the follow-up below]

I would get the result "some text"?

Also... Can I select a non-leaf node and get *ALL* the text underneath it? e.g. /a/b in this example?

Thanks!

Andrew.
Re: Questions about XPath in data import handler
Andrew Clegg wrote:
> [original question quoted above]

Sorry, Nabble swallowed my XML example. That was supposed to be:

<a>
  <b>
    <subject qualifier="some text" />
  </b>
</a>

Andrew.
Re: Questions about XPath in data import handler
Noble Paul നോബിള് नोब्ळ्-2 wrote:
> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg wrote:
>> Does the second one mean "select the value of the attribute called
>> qualifier in the /a/b/subject element"?
>
> yes you are right. Isn't that the semantics of standard xpath syntax?

Yes, just checking since the DIH XPath engine is a little different. Do you know what I would get in this case?

>> Also... Can I select a non-leaf node and get *ALL* the text underneath it?
>> e.g. /a/b in this example?

Cheers,

Andrew.
Re: Questions about XPath in data import handler
Noble Paul നോബിള് नोब्ळ्-2 wrote:
> yes. look at the 'flatten' attribute in the field. It should give you
> all the text (not attributes) under a given node.

I missed that one -- many thanks.

Andrew.
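For anyone searching later, a sketch of what that looks like in a DIH entity -- the column name is made up:

  <field column="b_text" xpath="/a/b" flatten="true" />

With flatten="true" the XPathEntityProcessor concatenates all the text under /a/b into a single value.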
'Connection reset' in DataImportHandler Development Console
Hi folks,

I'm trying to use the Debug Now button in the development console to test the effects of some changes in my data import config (see attached). However, each time I click it, the right-hand frame fails to load -- it just gets replaced with the standard 'connection reset' message from Firefox, as if the server's dropped the HTTP connection.

Everything else seems okay -- I can run queries in Solr Admin without any problems, and all the other buttons in the dev console work -- status, document count, reload config etc. There's nothing suspicious in Tomcat's catalina.out either. If I hit Reload Config, then Status, then Debug Now, I get this:

17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImportHandler processConfiguration
INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml}
17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema
INFO: id is a required field in SolrSchema . But not found in DataConfig
17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema
INFO: title is a required field in SolrSchema . But not found in DataConfig
17-Aug-2009 13:12:12 org.apache.solr.handler.dataimport.DataImporter verifyWithSchema
INFO: doc_type is a required field in SolrSchema . But not found in DataConfig
[the three verifyWithSchema messages above repeat four times in total]
17-Aug-2009 13:12:12 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={clean=false&command=reload-config&commit=true&qt=/dataimport} status=0 QTime=5
17-Aug-2009 13:12:21 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={clean=false&command=status&commit=true&qt=/dataimport} status=0 QTime=0

(The warnings are because the doc_type field comes out of the JDBC result set automatically by column name -- this isn't a problem.)
Also, there's no entry in the Tomcat access log for the debug request either, just the first two:

[17/Aug/2009:13:12:12 +0100] HTTP/1.1 cookie:- request:- GET /solr/select 200 ?clean=false&commit=true&qt=%2Fdataimport&command=reload-config GET /solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config HTTP/1.1
[17/Aug/2009:13:12:21 +0100] HTTP/1.1 cookie:- request:- GET /solr/select 200 ?clean=false&commit=true&qt=%2Fdataimport&command=status GET /solr/select?clean=false&commit=true&qt=%2Fdataimport&command=status HTTP/1.1

PS... Nightly build, 30th of July.

Thanks,

Andrew.

http://www.nabble.com/file/p25005850/dataconfig.xml dataconfig.xml
Re: 'Connection reset' in DataImportHandler Development Console
Noble Paul നോബിള് नोब्ळ्-2 wrote:
> apparently I do not see any command full-import, delta-import being
> fired. Is that true?

It seems that way -- they're not appearing in the logs. I've tried Debug Now with both full and delta selected from the dropdown, no difference either way. If I click the Full Import button it starts an import okay.

I don't have to Full Import manually every time I want to debug a config change, do I? That's not what the docs say. (A full import takes about 6 or 7 hours...)

Thanks,

Andrew.
Re: Adding a prefix to fields
ahammad wrote:
> Is it possible to add a prefix to the data in a Solr field? For example,
> right now, I have a field called "id" that gets data from a DB through the
> DataImportHandler. The DB returns a 4-character string like "ag5f". Would
> it be possible to add a prefix to the data that is received?
>
> In this specific case, the data relates to articles. So effectively, if
> the DB has "ag5f" as an ID, I want it to be stored as "Article_ag5f". Is
> there a way to define a prefix of "Article_" for a certain field?

I have exactly this situation and I just handle it by adding the prefixes in the SQL query:

select 'Article_' || id as id from articles

etc. I wrap all these up as views and store them in the DB, so Solr just has to select * from each view.

Andrew.
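A sketch of the view approach, with made-up table and column names:

  create view solr_articles as
  select 'Article_' || id as id, title, body
  from articles;

The DIH entity then just runs "select * from solr_articles", and the prefix logic lives in one place, in the database.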
Re: Solr Range Query Anomalies
Try a sdouble or sfloat field type? The plain float/double types index the raw string form of the number, so range queries compare lexicographically -- which is why "9.0" appears to fall outside [9.0 TO 20.0] territory unless both endpoints are padded to the same length. The sortable types encode values so that string order matches numeric order.

Andrew.

johan.sjoberg wrote:
> Hi,
>
> we're performing range queries of a field which is of type double. Some
> queries which should generate results do not, and I think it's best
> explained by the following examples; data is expected to exist in all
> ranges:
>
> ?q=field:[10.0 TO 20.0]  // OK
> ?q=field:[9.0 TO 20.0]   // NOT OK
> ?q=field:[09.0 TO 20.0]  // OK
>
> Interesting here is that the range query only works if both ends of the
> interval are of equal length (hence 09-to-20 works, but not 9-20).
> Unfortunately, this logic does not work for ranges in the 100s.
>
> ?q=field:[* TO 500]          // OK
> ?q=field:[100.0 TO 500.0]    // OK
> ?q=field:[90.0 TO 500.0]     // NOT OK
> ?q=field:[090.0 TO 500.0]    // NOT OK
>
> Any ideas to this very strange behaviour?
>
> Regards,
> Johan
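A sketch of the suggested schema change, using the Solr 1.x sortable types -- the field name just follows the examples in the question:

  <fieldType name="sdouble" class="solr.SortableDoubleField" omitNorms="true" />
  <field name="field" type="sdouble" indexed="true" stored="true" />

The encoding happens at index time, so the field needs reindexing after the type change.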
Re: Wildcard seaches?
Paul Tomblin wrote:
> Is there such a thing as a wildcard search? If I have a simple
> solr.StrField with no analyzer defined, can I query for "foo*" or
> "foo.*" and get everything that starts with "foo" such as "foobar" and
> "foobaz"?

Yes. foo* is fine even on a simple string field.

Andrew.
Re: can solr accept other tag other than field?
You can use the Data Import Handler to pull data out of any XML or SQL data source:

http://wiki.apache.org/solr/DataImportHandler

Andrew.

Elaine Li wrote:
> Hi,
>
> I am new solr user. I want to use solr search to run query against
> many xml files I have. I have set up the solr server to run query
> against the example files.
>
> One problem is my xml does not have the <field> tag and "name" attribute.
> My format is rather easy:
>
> [example XML stripped by the archive]
>
> I looked at the schema.xml file and realized I can only customize (add)
> attribute names.
>
> Is there a way to let Solr accept my xml w/o me changing my xml into
> the <add><doc><field> format?
>
> Thanks.
>
> Elaine
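A minimal sketch of a DIH config for standalone XML files -- the file path, forEach expression, and element names are invented, since Elaine's example didn't survive the archive:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
      <entity name="rec" processor="XPathEntityProcessor"
              url="/path/to/data.xml" forEach="/records/record">
        <field column="id" xpath="/records/record/@id" />
        <field column="text" xpath="/records/record/text" />
      </entity>
    </document>
  </dataConfig>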
Problem getting Solr home from JNDI in Tomcat
Hi all, I'm having problems getting Solr to start on Tomcat 6.

Tomcat is installed in /opt/apache-tomcat , solr is in /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr . My config file is in /opt/solr/conf/solrconfig.xml .

I have a Solr-specific context file in /opt/apache-tomcat/conf/Catalina/localhost/solr.xml which looks like this:

[context file stripped by the archive -- a Context element containing a solr/home Environment entry pointing at /opt/solr]

But when I start Solr and browse to it, it tells me:

java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162)
    at org.apache.solr.core.Config.<init>(Config.java:100)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:113)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:70)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
    at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244)
    at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604)
    at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Weirdly, the exact same context file works fine on a different machine. I've tried giving Context a docBase element (both absolute and relative paths) but it makes no difference -- Solr still isn't seeing the right home directory.
I also tried setting debug="1" but didn't see any more useful info anywhere.

Any ideas? This is a total show-stopper for me as this is our production server. (Otherwise I'd think about taking it down and hardwiring the Solr home path into the server's context...)

Yours hopefully,

Andrew.
Re: Problem getting Solr home from JNDI in Tomcat
Constantijn Visinescu wrote:
> This might be a bit of a hack but i got this in the web.xml of my
> application and it works great.
>
> <env-entry>
>   <env-entry-name>solr/home</env-entry-name>
>   <env-entry-value>/Solr/WebRoot/WEB-INF/solr</env-entry-value>
>   <env-entry-type>java.lang.String</env-entry-type>
> </env-entry>

That worked, thanks. You're right though, it is a bit of a hack -- I'd prefer to set the path from *outside* the app so it won't get overwritten when I upgrade.

Now I've got a completely different error: "org.apache.lucene.index.CorruptIndexException: Unknown format version: -9". I think it might be time for a fresh install...

Cheers,

Andrew.
Re: Problem getting Solr home from JNDI in Tomcat
hossman wrote:
> : Hi all, I'm having problems getting Solr to start on Tomcat 6.
>
> which version of Solr?

Sorry -- a nightly build from about a month ago. Re. your other message, I was sure the two machines had the same version on, but maybe not -- when I'm back in the office tomorrow I'll upgrade them both to a fresh nightly.

hossman wrote:
> : Tomcat is installed in /opt/apache-tomcat , solr is in
> : /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr .
>
> if "solr is in /opt/apache-tomcat/webapps/solr" means that you put the
> solr.war in /opt/apache-tomcat/webapps/ and tomcat expanded it into
> /opt/apache-tomcat/webapps/solr then that is your problem -- tomcat isn't
> even looking at your context file (it only looks at the context files to
> resolve URLs that it can't resolve looking in the webapps directory)

Yes, it's auto-expanded from a war in webapps. I have to admit to being a bit baffled though -- I can't find this rule anywhere in the Tomcat docs, but I'm a beginner really and they're not the clearest :-)

hossman wrote:
> This is why the examples of using context files on the wiki talk about
> keeping the war *outside* of the webapps directory, and using docBase in
> your Context declaration...
> http://wiki.apache.org/solr/SolrTomcat

Great, I'll try it this way and see if it clears up. Is it okay to keep the war file *inside* the Solr home directory (/opt/solr in my case) so it's all self-contained?

Many thanks,

Andrew.
Re: Problem getting Solr home from JNDI in Tomcat
Andrew Clegg wrote:
> hossman wrote:
>> This is why the examples of using context files on the wiki talk about
>> keeping the war *outside* of the webapps directory, and using docBase in
>> your Context declaration...
>> http://wiki.apache.org/solr/SolrTomcat
>
> Great, I'll try it this way and see if it clears up. Is it okay to keep
> the war file *inside* the Solr home directory (/opt/solr in my case) so
> it's all self-contained?

For the benefit of future searchers -- I tried it this way and it works fine. Thanks again to everyone for helping.

Andrew.
Quotes in query string cause NullPointerException
Hi folks,

I'm using the 2009-09-30 build, and any single or double quotes in the query string cause an NPE. Is this normal behaviour? I never tried it with my previous installation.

Example:

http://myserver:8080/solr/select/?title:%22Creatine+kinase%22

(I've also tried without the URL encoding, no difference)

Response:

HTTP Status 500 - null java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:33)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173)
    at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
    at org.apache.solr.search.QParser.getQuery(QParser.java:131)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:269)
    at org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:81)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Single quotes have the same effect.

Is there another way to specify exact phrases?

Thanks,

Andrew.
Re: Quotes in query string cause NullPointerException
Sorry! I'm officially a complete idiot. Personally I'd try to catch things like that and rethrow a 'QueryParseException' or something -- but don't feel under any obligation to listen to me because, well, I'm an idiot.

Thanks :-)

Andrew.

Erik Hatcher-4 wrote:
> don't forget q=... :)
>
> Erik
>
> On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
>> [original message quoted in full -- see above]
Result missing from query, but match shows in Field Analysis tool
Hi,

I have a field in my index called related_ids, indexed and stored, with the following field type: [definition stripped by the archive; it had positionIncrementGap="100" and a pattern tokenizer with pattern="\W*\s+\W*"]

Several records in my index contain the token 1cuk in the related_ids field, but only *some* of them are returned when I query on this. e.g. if I send a query like this:

http://localhost:8080/solr/select/?q=id:2.40.50+AND+related_ids:1cuk&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids

I get a single hit for the record with id:2.40.50 . But if I try this, on a different record with id:2.40 :

http://localhost:8080/solr/select/?q=id:2.40+AND+related_ids:1cuk&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids

I get no hits. However, if I just query for id:2.40 ...

http://localhost:8080/solr/select/?q=id:2.40&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids

I can clearly see the token "1cuk" in the related_ids field.

Not only that, but if I copy and paste record 2.40's related_ids field into the Field Analysis tool in the admin interface, and search on "1cuk", the term 1cuk is visible in the index analyzer's term list, and highlighted! So Field Analysis thinks that I *should* be getting a hit for this term.

Can anyone suggest how I'd go about diagnosing this? I'm kind of hitting a brick wall here.

If it makes any difference, related_ids for the culprit record 2.40 is large-ish but not enormous (31000 terms). Also I've tried stopping and restarting Solr in case it was some weird caching thing.

Thanks in advance,

Andrew.
Re: Result missing from query, but match shows in Field Analysis tool
That's probably it! It is quite near the end of the field. I'll try upping it and re-indexing.

Thanks :-)

Erick Erickson wrote:
> I'm really reaching here, but lucene only indexes the first 10,000 terms by
> default (you can up the limit). Is there a chance that you're hitting that
> limit? That 1cuk is past the 10,000th term in record 2.40?
>
> For this to be possible, I have to assume that the FieldAnalysis
> tool ignores this limit.
>
> FWIW
> Erick
>
> On Fri, Oct 23, 2009 at 12:01 PM, Andrew Clegg wrote:
>> [original message quoted in full -- see above]
Solr ignoring maxFieldLength?
Morning,

Last week I was having a problem with terms visible in my search results in large documents not causing query hits:

http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-td26029040.html#a26029351

Erick suggested it might be related to maxFieldLength, so I set this to 2147483647 in my solrconfig.xml and reindexed over the weekend.

Unfortunately I'm having the same problem now, even though Erick appears to be right! I've narrowed it down to a single document for testing purposes, and I can get it returned by querying for a term near the beginning, but terms near the end cause no hit. I can even find the point part way through the document after which none of the remaining terms seem to cause a hit.

The document is about 32000 terms long, most of which is in a single field called related_ids of about 31000 terms. My first thought was that the text was being chopped up into so many tokens that it was going over the maxFieldLength anyway, but 2147483647/32000 = 67109, and it seems very unlikely that 67109 tokens would be generated per term!

I've tried undeploying and redeploying the whole web app from Tomcat in case the new maxFieldLength hadn't been read, but no difference. If I go to

http://localhost:8080/solr/admin/file/?file=solrconfig.xml

I can see

<maxFieldLength>2147483647</maxFieldLength>

as expected.

Does anyone have any more ideas? This could potentially be a showstopper for us as we have quite a few long-ish documents to index. (32K words doesn't seem that long to me, but still...)

I've tried it with today's nightly build (2009-10-26) and it makes no difference. If this sounds like a bug, I'll open a JIRA and attach tars of my config and data directories. Any thoughts?

Thanks,

Andrew.
Re: Solr ignoring maxFieldLength?
Yep, I just re-indexed it again to make double sure -- same problem unfortunately. My solrconfig.xml and schema.xml are attached.

In case you want to see it in action on the same data I've got, I've tarred up my data and conf directories here:

http://biotext.org.uk/static/solr-issue-example.tar.gz

That should be enough to reproduce it with.

Thanks!

Andrew.

Yonik Seeley-2 wrote:
> Yes, please show us your solrconfig.xml, and verify that you reindexed
> the document after changing maxFieldLength and restarting solr.
>
> I'll also see if I can reproduce a problem with maxFieldLength being
> ignored.
>
> -Yonik
> http://www.lucidimagination.com
>
> On Mon, Oct 26, 2009 at 7:11 AM, Andrew Clegg wrote:
>> [original message quoted in full -- see above]

http://www.nabble.com/file/p26060882/solrconfig.xml solrconfig.xml
http://www.nabble.com/file/p26060882/schema.xml schema.xml
Re: Solr ignoring maxFieldLength?
Yonik Seeley-2 wrote:
> Sorry Andrew, this is something that's bitten people before.
> search for maxFieldLength and you will see *2* of them in your config
> - one for indexDefaults and one for mainIndex.
> The one in mainIndex is set at 10000 and hence overrides the one in
> indexDefaults.

Sorry -- schoolboy error. Glad I'm not the only one though. Yes, that seems to have fixed it...

Cheers,

Andrew.
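The gotcha in solrconfig.xml form -- a sketch with everything else left out:

  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>

  <mainIndex>
    <!-- this one silently wins over the indexDefaults value -->
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>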
Re: Solr ignoring maxFieldLength?
Yonik Seeley-2 wrote:
> If you could, it would be great if you could test commenting out the
> one in mainIndex and see if it inherits correctly from
> indexDefaults... if so, I can comment it out in the example and remove
> one other little thing that people could get wrong.

Yep, it seems perfectly happy like this. I'm going to try commenting out all of mainIndex to see if it can successfully inherit everything from indexDefaults -- since I have lockType "single" I won't need an unlockOnStartup entry, which doesn't appear in indexDefaults (at least in any of the config files I've seen). So...

<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>2147483647</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<!-- one more element, archived only as "1"; the original tag was lost -->
<lockType>single</lockType>

If the big overnight indexing job fails with these settings, I'll let you know.

Cheers,

Andrew.
Greater-than and less-than in data import SQL queries
Hi,

If I have a DataImportHandler query with a greater-than sign in, like this:

<entity name="..." query="select *, title as keywords from cathnode_text where node_depth > 4">

Everything's fine. However, if it contains a less-than sign:

<entity name="..." query="select *, title as keywords from cathnode_text where node_depth < 4">

I get this exception:

INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml}
[Fatal Error] :240:129: The value of attribute "query" associated with an element type "null" must not contain the '<' character.
27-Oct-2009 15:30:49 org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:184)
    at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:101)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:424)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
    at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244)
    at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604)
    at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.xml.sax.SAXParseException: The value of attribute "query" associated with an element type "null" must not contain the '<' character.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:172)
    ... 30 more
30 more Is this fixable, or an unavoidable feature of Xerces? If the latter, perhaps the docs could benefit from a note to say "use NOT a >= b" or something? Speaking of, I found this in the wiki examples for the DIH: Shouldn't that be one equals sign: deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'" Or is it doing something clever with Java operators? Cheers, Andrew. -- View this message in context: http://www.nabble.com/Greater-than-and-less-than-in-data-import-SQL-queries-tp26080100p26080100.html Sent from the Solr - User mailing list archive at Nabble.com.
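For reference, the escaped form of the failing entity would look something like this (a sketch -- the entity's other attributes aren't visible in the archived message, so only the query is taken from the thread):

<entity query="select *, title as keywords from cathnode_text where node_depth &lt; 4">

The > character happens to be legal inside an XML attribute value, which is why the greater-than version parsed fine; < always has to be escaped as &lt;.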
Re: Greater-than and less-than in data import SQL queries
Heh, eventually I decided "where 4 > node_depth" was the most pleasing (if slightly WTF-ish) way of writing it... Cheers, Andrew.

Erik Hatcher-4 wrote:
>
> Use &lt; instead of < in that attribute. That should fix the issue.
> Remember, it's an XML file, so it has to obey XML encoding rules which
> make it ugly but whatcha gonna do?
>
> Erik
>
> On Oct 27, 2009, at 11:50 AM, Andrew Clegg wrote:
>
>> Hi,
>>
>> If I have a DataImportHandler query with a greater-than sign in, like this:
>>
>> <entity query="select *, title as keywords from cathnode_text where node_depth > 4">
>>
>> Everything's fine. However, if it contains a less-than sign:
>>
>> <entity query="select *, title as keywords from cathnode_text where node_depth < 4">
>>
>> I get this exception:
>>
>> [same stack trace as in the previous message -- snipped]
Faceting within one document
Hi, If I give a query that matches a single document, and facet on a particular field, I get a list of all the terms in that field which appear in that document. (I also get some with a count of zero, I don't really understand where they come from... ?) Is it possible with faceting, or a similar mechanism, to get a count of how many times each term appears within that document? This would be really useful for building a list of top keywords within a long document, for summarization purposes. I can do it on the client side but it'd be nice to know if there's a quicker way. Thanks! Andrew. -- View this message in context: http://www.nabble.com/Faceting-within-one-document-tp26099278p26099278.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting within one document
Isn't the TermVectorComponent more for one document at a time, and the TermsComponent for the whole index? Actually -- having done some digging... What I'm really after is the most informative terms in a given document, which should take into account global document frequency as well as term frequency in the document at hand. I think I can use the MoreLikeThisHandler to do this, with a bit of experimentation... Thanks for the facet mincount tip BTW. Andrew. Avlesh Singh wrote: > > For facets - > http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount > For terms - http://wiki.apache.org/solr/TermsComponent > > Helps? > > Cheers > Avlesh > > On Wed, Oct 28, 2009 at 11:32 PM, Andrew Clegg > wrote: > >> >> Hi, >> >> If I give a query that matches a single document, and facet on a >> particular >> field, I get a list of all the terms in that field which appear in that >> document. >> >> (I also get some with a count of zero, I don't really understand where >> they >> come from... ?) >> >> Is it possible with faceting, or a similar mechanism, to get a count of >> how >> many times each term appears within that document? >> >> This would be really useful for building a list of top keywords within a >> long document, for summarization purposes. I can do it on the client side >> but it'd be nice to know if there's a quicker way. >> >> Thanks! >> >> Andrew. >> >> -- >> View this message in context: >> http://www.nabble.com/Faceting-within-one-document-tp26099278p26099278.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Faceting-within-one-document-tp26099278p26099847.html Sent from the Solr - User mailing list archive at Nabble.com.
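For anyone following along, a request of roughly this shape (host, port, and document ID illustrative) shows what MoreLikeThis considers the top terms for a single document, without fetching any matches:

http://localhost:8983/solr/mlt?q=id:SOME_ID&rows=0&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1&mlt.interestingTerms=details

mlt.interestingTerms=details returns each selected term with its relative boost, which makes it easier to judge the "most informative terms" question than the plain list format does.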
dismax and query analysis
Morning, Can someone clarify how dismax queries work under the hood? I couldn't work this particular point out from the documentation... I get that they pretty much issue the user's query against all of the fields in the schema -- or rather, all of the fields you've specified in the qf parameter in the config or the request. But, does each of these 'sub'-queries get analyzed according to the normal analysis rules for the field it's getting sent to? Or are they passed through verbatim? I'm hoping it's the former, as we have a variety of different field types with radically different tokenization and filtering... Also, is there any plan to implement wildcards in dismax, or is this unfeasible? Thanks once again :-) Andrew. -- View this message in context: http://www.nabble.com/dismax-and-query-analysis-tp26111465p26111465.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dismax and query analysis
Thanks, that demonstrates it really nicely. Now if only dismax did wildcards too... :-) Cheers, Andrew. ANithian wrote: > > The best way to get started with answering this is to pass the > &debugQuery=true and to scroll down the results page. Here, you will see a > breakdown of how the query you entered in the q field is being parsed and > sent to lucene via the pf,qf, and bf. You can also see how the weights > affect the different score and why one document was ranked higher than > another. > > The text of the query will be analyzed depending on the set of analyzers > assigned to that particular field for queries (as opposed to indexing). > For > example, if "test" is matched against a "string" vs "text" field, > different > analyzers may be applied to "string" or "text" > > Hope that helps > Amit > > On Thu, Oct 29, 2009 at 4:39 AM, Andrew Clegg > wrote: > >> >> Morning, >> >> Can someone clarify how dismax queries work under the hood? I couldn't >> work >> this particular point out from the documentation... >> >> I get that they pretty much issue the user's query against all of the >> fields >> in the schema -- or rather, all of the fields you've specified in the qf >> parameter in the config or the request. >> >> But, does each of these 'sub'-queries get analyzed according to the >> normal >> analysis rules for the field it's getting sent to? Or are they passed >> through verbatim? >> >> I'm hoping it's the former, as we have a variety of different field types >> with radically different tokenization and filtering... >> >> Also, is there any plan to implement wildcards in dismax, or is this >> unfeasible? >> >> Thanks once again :-) >> >> Andrew. >> >> -- >> View this message in context: >> http://www.nabble.com/dismax-and-query-analysis-tp26111465p26111465.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/dismax-and-query-analysis-tp26111465p26118506.html Sent from the Solr - User mailing list archive at Nabble.com.
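To make the thread concrete, a dismax handler wired up along these lines (handler and field names are illustrative, not from the original posts) sends the analyzed query to each field in qf with that field's own query-time analyzer:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 keywords^1.5 body</str>
    <str name="pf">title^3.0</str>
  </lst>
</requestHandler>

Appending &debugQuery=true to a request against such a handler shows the resulting per-field DisjunctionMaxQuery and the score breakdown Amit describes.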
Re: Faceting within one document
Are you sure? I've *never* explicitly deleted a document, I only ever rebuild the entire index with the data import handler's "full import with cleaning" operation. Lance Norskog-2 wrote: > > 0-value facets are left behind by docs which you have deleted. If you > optimize, there should be no 0-value facets. > > On Wed, Oct 28, 2009 at 11:36 AM, Andrew Clegg > wrote: >> >> >> Isn't the TermVectorComponent more for one document at a time, and the >> TermsComponent for the whole index? >> >> Actually -- having done some digging... What I'm really after is the most >> informative terms in a given document, which should take into account >> global >> document frequency as well as term frequency in the document at hand. I >> think I can use the MoreLikeThisHandler to do this, with a bit of >> experimentation... >> >> Thanks for the facet mincount tip BTW. >> >> Andrew. >> >> >> Avlesh Singh wrote: >>> >>> For facets - >>> http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount >>> For terms - http://wiki.apache.org/solr/TermsComponent >>> >>> Helps? >>> >>> Cheers >>> Avlesh >>> >>> On Wed, Oct 28, 2009 at 11:32 PM, Andrew Clegg >>> wrote: >>> >>>> >>>> Hi, >>>> >>>> If I give a query that matches a single document, and facet on a >>>> particular >>>> field, I get a list of all the terms in that field which appear in that >>>> document. >>>> >>>> (I also get some with a count of zero, I don't really understand where >>>> they >>>> come from... ?) >>>> >>>> Is it possible with faceting, or a similar mechanism, to get a count of >>>> how >>>> many times each term appears within that document? >>>> >>>> This would be really useful for building a list of top keywords within >>>> a >>>> long document, for summarization purposes. I can do it on the client >>>> side >>>> but it'd be nice to know if there's a quicker way. >>>> >>>> Thanks! >>>> >>>> Andrew. >>>> >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/Faceting-within-one-document-tp26099278p26099278.html >>>> Sent from the Solr - User mailing list archive at Nabble.com. >>>> >>>> >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Faceting-within-one-document-tp26099278p26099847.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > Lance Norskog > goks...@gmail.com > > -- View this message in context: http://www.nabble.com/Faceting-within-one-document-tp26099278p26119536.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting within one document
Actually Avlesh pointed me at that, earlier in the thread. But thanks :-) Yonik Seeley-2 wrote: > > On Wed, Oct 28, 2009 at 2:02 PM, Andrew Clegg > wrote: >> If I give a query that matches a single document, and facet on a >> particular >> field, I get a list of all the terms in that field which appear in that >> document. >> >> (I also get some with a count of zero, I don't really understand where >> they >> come from... ?) > > By default, solr has a facet.mincount of zero, so it includes terms > that don't match your set of documents. > Try facet.mincount=1 > > -Yonik > http://www.lucidimagination.com > > >> Is it possible with faceting, or a similar mechanism, to get a count of >> how >> many times each term appears within that document? >> >> This would be really useful for building a list of top keywords within a >> long document, for summarization purposes. I can do it on the client side >> but it'd be nice to know if there's a quicker way. >> >> Thanks! >> >> Andrew. >> >> -- >> View this message in context: >> http://www.nabble.com/Faceting-within-one-document-tp26099278p26099278.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Faceting-within-one-document-tp26099278p26120291.html Sent from the Solr - User mailing list archive at Nabble.com.
NullPointerException with TermVectorComponent
Hi, I've recently added the TermVectorComponent as a separate handler, following the example in the supplied config file, i.e.:

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>

<requestHandler name="/tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

It works, but with one quirk. When you use tv.all=true, you get the tf*idf scores in the output, just fine (along with tf and df). But if you use tv.tf_idf=true you get an NPE:

http://server:8080/solr/tvrh/?q=1cuk&version=2.2&indent=on&tv.tf_idf=true

HTTP Status 500 - null java.lang.NullPointerException
at org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:253)
at org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:245)
at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:522)
at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:401)
at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:378)
at org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:1253)
at org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:474)
at org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:244)
at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:125)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at (etc.)

Is this a bug, or am I doing it wrong? Cheers, Andrew. -- View this message in context: http://old.nabble.com/NullPointerException-with-TermVectorComponent-tp26156903p26156903.html Sent from the Solr - User mailing list archive at Nabble.com.
Highlighting is very slow
Hi everyone, I'm experimenting with highlighting for the first time, and it seems shockingly slow for some queries. For example, this query: http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on takes 313ms. But when I add highlighting: http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=*&fl=id it takes 305212ms = 5mins! Some of my documents are slightly large -- the 10 hits for that query contain between 362 bytes and 1.4 megabytes of text each. All fields are stored and indexed, and most are termvectored. But this doesn't seem excessively large! Has anyone else seen this sort of behaviour before? This is with a nightly from 2009-10-26. All suggestions would be appreciated. My schema and config files are attached... http://old.nabble.com/file/p26160216/schema.xml schema.xml http://old.nabble.com/file/p26160216/solrconfig.xml solrconfig.xml Thanks (once again), Andrew. -- View this message in context: http://old.nabble.com/Highlighting-is-very-slow-tp26160216p26160216.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlighting is very slow
We're on 1.6 already. Any chance you could share your GC settings? Thanks, Andrew. PS apologies for the duplicate message yesterday, Nabble threw an exception when I posted the first one. And the second one actually. Jaco-4 wrote: > > Hi, > > We had a similar case once (although not with those really long response > times). Fixed by moving to JRE 1.6 and tuning garbage collection. > > Bye, > > Jaco. > > 2009/11/3 Andrew Clegg > >> >> Hi everyone, >> >> I'm experimenting with highlighting for the first time, and it seems >> shockingly slow for some queries. >> >> For example, this query: >> >> >> http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on >> >> takes 313ms. But when I add highlighting: >> >> >> http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=*&fl=id >> >> it takes 305212ms = 5mins! >> >> Some of my documents are slightly large -- the 10 hits for that query >> contain between 362 bytes and 1.4 megabytes of text each. All fields are >> stored and indexed, and most are termvectored. But this doesn't seem >> excessively large! >> >> Has anyone else seen this sort of behaviour before? This is with a >> nightly >> from 2009-10-26. >> >> All suggestions would be appreciated. My schema and config files are >> attached... >> >> http://old.nabble.com/file/p26160217/schema.xml schema.xml >> http://old.nabble.com/file/p26160217/solrconfig.xml solrconfig.xml >> >> Thanks (once again), >> >> Andrew. >> >> -- >> View this message in context: >> http://old.nabble.com/Highlighting-is-very-slow-tp26160217p26160217.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://old.nabble.com/Highlighting-is-very-slow-tp26160217p26194384.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlighting is very slow
Nicolas Dessaigne wrote: > > Alternatively, you could use a copyfield with a maxChars limit as your > highlighting field. Works well in my case. > Thanks for the tip. We did think about doing something similar (only enabling highlighting for certain shorter fields) but we decided that perhaps users would be confused if search terms were sometimes snippeted+highlighted and sometimes not. (A brief run through with a single user suggested this, although that's not statistically significant...) So we decided to avoid highlighting altogether until we can do it across the board. Cheers, Andrew. -- View this message in context: http://old.nabble.com/Highlighting-is-very-slow-tp26160216p26267441.html Sent from the Solr - User mailing list archive at Nabble.com.
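A sketch of the copyField approach Nicolas describes (field and type names are illustrative): store a truncated copy of each large field and highlight only that.

<field name="content" type="text" indexed="true" stored="true"/>
<field name="content_hl" type="text" indexed="true" stored="true"/>
<copyField source="content" dest="content_hl" maxChars="10000"/>

Queries would then use hl=true&hl.fl=content_hl rather than hl.fl=*, so the highlighter never has to re-analyze megabytes of stored text.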
Selection of terms for MoreLikeThis
Hi, If I run a MoreLikeThis query like the following:

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1

one of the hits in the results is "and" (I don't do any stopword removal on this field). However if I look inside that document with the TermVectorComponent:

http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords

I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms with *much* higher tf.idf scores, e.g. one with:

<int name="tf">1</int>
<int name="df">10</int>
<double name="tf-idf">0.1</double>

that *don't* appear in the MoreLikeThis list. (I tried adding &mlt.maxwl=999 to the end of the MLT query but it makes no difference.)

What's going on? Surely something with tf.idf = 0.1 is a far better candidate for a MoreLikeThis query than something with tf.idf = 7.46E-4? Or does MoreLikeThis do some other heuristic magic to select good candidates, and sometimes get it wrong?

BTW the keywords field is indexed, stored, multi-valued and term-vectored. Thanks, Andrew. -- :: http://biotext.org.uk/ :: -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26286005.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote: > > I am looking for good arguments to justify implementation a search for > sites > which are available on the public internet. There are many sites in > "powered > by Solr" section which are indexed by Google and other search engines but > still they decided to invest resources into building and maintenance of > their own search functionality and not to go with [user_query site: > my_site.com] google search. Why? > You're assuming that Solr is just used in these cases to index discrete web pages which Google etc. would be able to access via following navigational links. I would imagine that in a lot of cases, Solr is used to index database entities which are used to build [parts of] pages dynamically, and which might be viewable in different forms in various different pages. Plus, with stored fields, you have the option of actually driving a website off Solr instead of directly off a database, which might make sense from a speed perspective in some cases. And further, going back to page-only indexing -- you have no guarantee when Google will decide to recrawl your site, so there may be a delay before changes show up in their index. With an in-house search engine you can reindex as often as you like. Andrew. -- View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html Sent from the Solr - User mailing list archive at Nabble.com.
Data import problem with child entity from different database
Morning all, I'm having problems with joining a child entity from one database to a parent from another... My entity definitions (names changed for brevity) boil down to a parent entity against db1 whose query selects, among other columns, c and d, with a nested child entity against db2 whose query selects d for the matching parent row -- see the sketch after this message.

c is getting indexed fine (it's stored, I can see field 'c' in the search results) but child.d isn't. I know the child table has data for the corresponding parent rows, and I've even watched the SQL queries against the child table appearing in Oracle's sqldeveloper as the DataImportHandler runs. But no content for child.d gets into the index. My schema contains a definition for a field called d, using the keywords_ids type (keywords_ids is a conservatively-analyzed text type which has worked fine in other contexts).

Two things occur to me. 1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables is just a char(4), nothing fancy. Could something weird with character encodings be happening? 2. d isn't a primary key in either parent or child, but this shouldn't matter should it?

Additional data points -- I also tried using the CachedSqlEntityProcessor to do in-memory table caching of child, but it didn't work then either. I got a lot of error messages like this:

No value available for the cache key : d in the entity : child

If anyone knows whether this is a known limitation (if so I can work round it), or an unexpected case (if so I'll file a bug report), please shout. I'm using 1.4. Yet again, many thanks :-) Andrew. -- View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html Sent from the Solr - User mailing list archive at Nabble.com.
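For illustration, the shape of the config being described is roughly this (a reconstruction with invented table and entity names -- the original wasn't preserved in the archive, and the poster had changed the names for brevity anyway):

<dataConfig>
  <dataSource name="db1" driver="org.postgresql.Driver"
              url="jdbc:postgresql://host1/somedb"/>
  <dataSource name="db2" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@host2:1521:somesid"/>
  <document>
    <entity name="parent" dataSource="db1"
            query="select id, c, d from parent_table">
      <entity name="child" dataSource="db2"
              query="select d from child_table where d = '${parent.d}'"/>
    </entity>
  </document>
</dataConfig>

The puzzle is why field c from the parent makes it into the index while child.d does not.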
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote: > > When you need to search for something Lucene or Solr related, which one do > you use: > - generic Google > - go to a particular mail list web site and search from here (if there is > any search form at all) > Both of these (Nabble in the second case) in case any recent posts have appeared which Google hasn't picked up. Andrew. -- View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Selection of terms for MoreLikeThis
Any ideas on this? Is it worth sending a bug report? Those links are live, by the way, in case anyone wants to verify that MLT is returning suggestions with very low tf.idf. Cheers, Andrew.

Andrew Clegg wrote:
>
> Hi,
>
> If I run a MoreLikeThis query like the following:
>
> http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1
>
> one of the hits in the results is "and" (I don't do any stopword removal
> on this field).
>
> However if I look inside that document with the TermVectorComponent:
>
> http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords
>
> I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms
> with *much* higher tf.idf scores, e.g. one with:
>
> <int name="tf">1</int>
> <int name="df">10</int>
> <double name="tf-idf">0.1</double>
>
> that *don't* appear in the MoreLikeThis list. (I tried adding
> &mlt.maxwl=999 to the end of the MLT query but it makes no difference.)
>
> What's going on? Surely something with tf.idf = 0.1 is a far better
> candidate for a MoreLikeThis query than something with tf.idf = 7.46E-4?
> Or does MoreLikeThis do some other heuristic magic to select good
> candidates, and sometimes get it wrong?
>
> BTW the keywords field is indexed, stored, multi-valued and term-vectored.
>
> Thanks,
>
> Andrew.
>

-- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import problem with child entity from different database
Noble Paul നോബിള് नोब्ळ्-2 wrote: > > no obvious issues. > you may post your entire data-config.xml > Here it is, exactly as last attempt but with usernames etc. removed. Ignore the comments and the unused FileDataSource... http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml Noble Paul നോബിള് नोब्ळ्-2 wrote: > > do w/o CachedSqlEntityProcessor first and then apply that later > Yep, that was just a bit of a wild stab in the dark to see if it made any difference. Thanks, Andrew. -- View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Selection of terms for MoreLikeThis
Chantal Ackermann wrote:
>
> no idea, I'm afraid - but could you send the output of
> interestingTerms=details?
> This at least would show what MoreLikeThis uses, in comparison to the
> TermVectorComponent you've already pasted.
>

I can, but I'm afraid they're not very illuminating!

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=details&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1

Apart from the response header (status 0, QTime 59), all I get back is the list of interesting terms, all twenty-five of them with a boost of exactly 1.0.

Cheers, Andrew. -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Selection of terms for MoreLikeThis
Chantal Ackermann wrote:
>
> your URL does not include the parameter mlt.boost. Setting that to
> "true" made a noticeable difference for my queries.
>

Hmm, I'm really not sure if this is doing the right thing either. When I add it I get the terms back with these boosts:

1.0, 0.60737264, 0.27599618, 0.2476748, 0.24487767, 0.23969446, 0.1990452, 0.18447271, 0.13297324, 0.1233415, 0.11993817, 0.11789705, 0.117194556, 0.11164951, 0.10744005, 0.09943076, 0.097062066, 0.09287166, 0.0877542, 0.0864609, 0.08362857, 0.07988805, 0.079598725, 0.07747293, 0.075560644

"and" scores far more highly than much more discriminative words like "chloroplast" and "glyoxylate", both of which have *much* higher tf.idf scores than "and" according to the TermVectorComponent:

chloroplast: tf=8, df=1887, tf-idf=0.0042395336512983575
glyoxylate: tf=7, df=1111, tf-idf=0.0063006300630063005
and: tf=45, df=60316, tf-idf=7.460706943431262E-4

In fact an order of magnitude higher.

Chantal Ackermann wrote:
>
> If not, there is also the parameter
> mlt.minwl
> "minimum word length below which words will be ignored."
>
> All your other terms seem longer than 3, so it would help in this case?
> But seems a bit like work around.
>

Yeah, I could do that, or add a stopword list to that field. But there are some other common terms in the list like "protein" or "enzyme" that are long and not really stopwords, but have a similarly low tf.idf to "and":

protein: tf=43, df=189541, tf-idf=2.2686384476181933E-4
enzyme: tf=15, df=16712, tf-idf=8.975586404978459E-4

Plus, of course, I'm curious to know exactly how MLT is identifying those terms as important, and if it's a bug or my fault... Thanks for your help though! Do any of the Solr devs have an idea of the mechanism at work here? Andrew. -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26337677.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: 'Connection reset' in DataImportHandler Development Console
aerox7 wrote: > > Hi Andrew, > I download the last build of solr (1.4) and i have the same probleme with > DebugNow in Dataimport dev Console. have you found a solution ? > Sorry about slow reply, I've been on holiday. No, I never found a solution, it worked in some nightlies but not in others, if I remember correctly. I haven't tried it in 1.4 yet, I got around my problem another way. Andrew. -- View this message in context: http://old.nabble.com/%27Connection-reset%27-in-DataImportHandler-Development-Console-tp25005850p26779966.html Sent from the Solr - User mailing list archive at Nabble.com.
Filtering near-duplicates using TextProfileSignature
Hi, I'm interested in near-dupe removal as mentioned (briefly) here: http://wiki.apache.org/solr/Deduplication However the link for TextProfileSignature hasn't been filled in yet. Does anyone have an example of using TextProfileSignature that demonstrates the tunable parameters mentioned in the wiki? Thanks! Andrew. -- View this message in context: http://old.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp27127151p27127151.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering near-duplicates using TextProfileSignature
Thanks Erik, but I'm still a little confused as to exactly where in the Solr config I set these parameters. The example on the wiki page uses Lookup3Signature which (presumably) takes no parameters, so there's no indication in the XML examples of where you would set them. Unless I'm missing something. Thanks again, Andrew.

Erik Hatcher-4 wrote:
>
> On Jan 12, 2010, at 7:56 AM, Andrew Clegg wrote:
>> I'm interested in near-dupe removal as mentioned (briefly) here:
>>
>> http://wiki.apache.org/solr/Deduplication
>>
>> However the link for TextProfileSignature hasn't been filled in yet.
>>
>> Does anyone have an example of using TextProfileSignature that demonstrates
>> the tunable parameters mentioned in the wiki?
>
> There are some comments in the source code*, but they weren't made
> class-level. I'm fixing that and committing it now, but here's the
> comment:
>
> /**
>  * This implementation is copied from Apache Nutch.
>  * An implementation of a page signature. It calculates an MD5 hash
>  * of a plain text "profile" of a page.
>  * The algorithm to calculate a page "profile" takes the plain text version of
>  * a page and performs the following steps:
>  *
>  * remove all characters except letters and digits, and bring all characters
>  * to lower case,
>  * split the text into tokens (all consecutive non-whitespace characters),
>  * discard tokens equal or shorter than MIN_TOKEN_LEN (default 2 characters),
>  * sort the list of tokens by decreasing frequency,
>  * round down the counts of tokens to the nearest multiple of QUANT
>  * (QUANT = QUANT_RATE * maxFreq, where QUANT_RATE is 0.01f
>  * by default, and maxFreq is the maximum token frequency). If
>  * maxFreq is higher than 1, then QUANT is always higher than 2 (which
>  * means that tokens with frequency 1 are always discarded).
>  * tokens, which frequency after quantization falls below QUANT, are discarded.
>  * create a list of tokens and their quantized frequency, separated by spaces,
>  * in the order of decreasing frequency.
>  *
>  * This list is then submitted to an MD5 hash calculation.
>  */
>
> There are two parameters this implementation takes:
>
> quantRate = params.getFloat("quantRate", 0.01f);
> minTokenLen = params.getInt("minTokenLen", 2);
>
> Hope that helps.
>
> Erik
>
> * http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/update/processor/TextProfileSignature.java
>

-- View this message in context: http://old.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp27127151p27128173.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering near-duplicates using TextProfileSignature
Erik Hatcher-4 wrote:
>
> On Jan 12, 2010, at 9:15 AM, Andrew Clegg wrote:
>> Thanks Erik, but I'm still a little confused as to exactly where in
>> the Solr config I set these parameters.
>
> You'd configure them within the <processor> element, something like
> this:
>
> <str name="minTokenLen">5</str>
>

OK, thanks. (Should that really be str though, and not int or something?)

Erik Hatcher-4 wrote:
>
> Perhaps you could update the wiki with an example once you get it
> working?
>
> I'm flying a little blind here, just going off the source code, not
> trying it out for real.
>

Sure -- it won't be til next week at the earliest though. Cheers, Andrew. -- View this message in context: http://old.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp27127151p27128493.html Sent from the Solr - User mailing list archive at Nabble.com.
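Pulling Erik's pieces together, a complete chain would presumably look something like this (untested sketch -- the signature field, source field, and parameter values are illustrative):

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    <!-- TextProfileSignature's tunables, per the source comment Erik quoted -->
    <str name="minTokenLen">3</str>
    <str name="quantRate">0.2</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>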
Skipping duplicates in DataImportHandler based on uniqueKey
Hi, Is there a way to get the DataImportHandler to skip already-seen records rather than reindexing them? The UpdateHandler has a capability which (as I understand it) means that a document whose uniqueKey matches one already in the index will be skipped instead of overwritten. Can the DIH be made to behave this way? If not, would it be an easy patch? This is using the XPathEntityProcessor by the way. Thanks, Andrew. -- :: http://biotext.org.uk/ :: -- View this message in context: http://lucene.472066.n3.nabble.com/Skipping-duplicates-in-DataImportHandler-based-on-uniqueKey-tp771559p771559.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Skipping duplicates in DataImportHandler based on uniqueKey
Marc Sturlese wrote: > > You can use deduplication to do that. Create the signature based on the > unique field or any field you want. > Cool, thanks, I hadn't thought of that. -- View this message in context: http://lucene.472066.n3.nabble.com/Skipping-duplicates-in-DataImportHandler-based-on-uniqueKey-tp771559p773268.html Sent from the Solr - User mailing list archive at Nabble.com.
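A sketch of Marc's suggestion (untested; field names are illustrative): compute the signature from the unique key field, so a re-imported record hashes to the same signature as the copy already in the index.

<processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <str name="signatureField">sig</str>
  <bool name="overwriteDupes">false</bool>
  <str name="fields">id</str>
  <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>

Whether a re-seen record ends up skipped or simply overwritten depends on how signatureField interacts with the schema's uniqueKey, so this is worth verifying on a test core first.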
ClassNotFoundException: org.apache.solr.response.VelocityResponseWriter
Hi, I'm trying to get the Velocity / Solritas feature to work for one core of a two-core Solr instance, but it's not playing nice. I know the right jars are being loaded, because I can see them mentioned in the log, but still I get a class not found exception:

09-May-2010 15:34:02 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/var/www/smesh/current/config/solr/twitter/lib/apache-solr-velocity-1.4.1-dev.jar' to classloader
09-May-2010 15:34:02 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/var/www/smesh/current/config/solr/twitter/lib/velocity-1.6.1.jar' to classloader
09-May-2010 15:34:02 org.apache.solr.core.SolrResourceLoader replaceClassLoader
...
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.response.VelocityResponseWriter'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
...

I've attached the whole log, as it's quite big, and Nabble thinks it's spam because it has "too many 'anal' words" ;-) http://n3.nabble.com/file/n787256/solr.log solr.log

Here is the appropriate part of my solrconfig.xml for the core which is attempting to load Velocity:

<queryResponseWriter name="velocity" class="org.apache.solr.response.VelocityResponseWriter"/>

<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
    <str name="title">Solritas</str>
    <str name="wt">velocity</str>
    <str name="defType">standard</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">author_name</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>

Any ideas? Many thanks, once again! Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/ClassNotFoundException-org-apache-solr-response-VelocityResponseWriter-tp787256p787256.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ClassNotFoundException: org.apache.solr.response.VelocityResponseWriter
Erik Hatcher-4 wrote:
>
> What version of Solr? Try switching to
> class="solr.VelocityResponseWriter", and if that doesn't work use
> class="org.apache.solr.request.VelocityResponseWriter". The first
> form is the recommended way to do it. The actual package changed in
> trunk not too long ago.
>

Hi Erik, This is with vanilla Solr 1.4. I got it working with solr.VelocityResponseWriter -- thanks.

However, I'm having trouble with the defType parameter. I want to use the standard query type so people can use nested booleans etc. in the queries. When I tried this in solrconfig:

<str name="defType">standard</str>

I got this from the Solritas page:

HTTP ERROR: 400
Unknown query type 'standard'
RequestURI=/solr/twitter/itas
Powered by Jetty://

However when I tried this:

<str name="defType">standard</str>

I got this exception:

HTTP ERROR: 500
null
java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
at org.apache.solr.search.QParser.getQuery(QParser.java:131)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

RequestURI=/solr/twitter/itas
Powered by Jetty://

Do you know what the right way to do this is? Thanks, Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/ClassNotFoundException-org-apache-solr-response-VelocityResponseWriter-tp787256p787487.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ClassNotFoundException: org.apache.solr.response.VelocityResponseWriter
Sorry -- in the second of those error messages (the NPE) I meant lucene not standard.

Andrew Clegg wrote:
>
> Hi Erik, This is with vanilla Solr 1.4. I got it working with
> solr.VelocityResponseWriter -- thanks.
>
> However, I'm having trouble with the defType parameter. [rest of
> quoted message and stack trace snipped]

-- View this message in context: http://lucene.472066.n3.nabble.com/ClassNotFoundException-org-apache-solr-response-VelocityResponseWriter-tp787256p787490.html Sent from the Solr - User mailing list archive at Nabble.com.
Fixed: Solritas on multicore Solr, using standard query handler (was Re: ClassNotFoundException: org.apache.solr.response.VelocityResponseWriter)
Don't worry Erik -- I figured this one out. For the benefit of future searchers, you need:

<str name="defType">lucene</str>

And to avoid the NullPointerException from the /solr/CORENAME/itas page, you actually need to supply a ?q=blah initial query. I just assumed it would give you a blank search page if you didn't supply a query.

N.B. In case this catches anyone out -- there's also a few places where you need to put the core name into the templates in the conf/velocity directory for the core. They don't pick this up automatically so you need to find any references to /solr/admin or /solr/itas and insert your core name in the middle. (Does anyone know if there'd be a simple way to make that automatic?)

Andrew Clegg wrote:
>
> Hi Erik, This is with vanilla Solr 1.4. I got it working with
> solr.VelocityResponseWriter -- thanks.
>
> However, I'm having trouble with the defType parameter. [rest of
> quoted message and stack trace snipped]

-- :: http://biotext.org.uk/ :: -- View this message in context: http://lucene.472066.n3.nabble.com/ClassNotFoundException-org-apache-solr-response-VelocityResponseWriter-tp787256p787589.html Sent from the Solr - User mailing list archive at Nabble.com.
How bad is stopping Solr with SIGKILL?
Hi folks, I had a Solr instance (in Jetty on Linux) taken down by a process monitoring tool (God) with a SIGKILL recently. How bad is this? Can it cause index corruption if it's in the middle of indexing something? Or will it just lose uncommitted changes? What if the signal arrives in the middle of the commit process? Unfortunately I can't tell exactly what it was doing at the time as someone's deleted the logfile :-( Thanks, Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/How-bad-is-stopping-Solr-with-SIGKILL-tp858119p858119.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing link targets in HTML fragments
Hi Solr gurus, I'm wondering if there is an easy way to keep the targets of hyperlinks from a field which may contain HTML fragments, while stripping the HTML. e.g. if I had a field that looked like this:

"This is the entire content of my field, but <a href="http://example.com/">some of the words</a> are a hyperlink."

Then I'd like to keep "http://example.com/" as a single token (along with all of the actual words) but not the "a" and "href", giving me:

"This is the entire content of my field but http://example.com/ some of the words are a hyperlink"

I'm thinking that since we're dealing with individual fragments rather than entire HTML pages, Tika/SolrCell may be poorly suited and/or too heavyweight -- but please correct me if I'm wrong. Maybe something using regular expressions? Does anyone have a code snippet they could share? Many thanks, Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p874547.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing link targets in HTML fragments
Lance Norskog-2 wrote:
>
> The PatternReplace and HTMLStrip tokenizers might be the right bet.
> The easiest way to go about this is to make a bunch of text fields
> with different analysis stacks and investigate them in the Schema
> Browser. You can paste an HTML document into the text box and see
> exactly how the words & markup get torn apart.
>

Thanks Lance, I'll experiment. For reference, for anyone else who comes across this thread -- the html in my original post might have got munged on the way into or out of the list server. It was supposed to look like this:

This is the entire content of my field, but [a href="http://example.com/"]some of the words[/a] are a hyperlink.

(but with real html tags instead of the square brackets) and I am just trying to extract the words and the link target but lose the rest of the markup. Cheers, Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p875503.html Sent from the Solr - User mailing list archive at Nabble.com.
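One regex-based sketch of Lance's suggestion (untested, and assuming your Solr version ships PatternReplaceCharFilterFactory): rewrite anchor tags so the href target survives as a bare token, then strip whatever markup is left before tokenizing.

<fieldType name="text_links" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- turn an opening anchor tag into its bare href target, e.g. " http://example.com/ " -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="&lt;a\s[^&gt;]*href=&quot;([^&quot;]*)&quot;[^&gt;]*&gt;"
                replacement=" $1 "/>
    <!-- then strip the remaining markup -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Pasting a sample fragment into the admin analysis page for this field type, as Lance suggests, is the quickest way to check that the URL comes through as a single token.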
Re: Indexing link targets in HTML fragments
findbestopensource wrote:
>
> Could you tell us your schema used for indexing. In my opinion, using
> standard analyzer / Snowball analyzer will do the best. They will not break
> the URLs. Add href, and other related html tags as part of stop words and
> they will be removed while indexing.
>

This project's still in the planning stages -- I haven't designed the pipeline yet. But you're right, maybe starting with everything and just stopping out the tag and attribute names is the most fail-safe approach. Then at least if I get something wrong I won't miss anything. Worst case scenario, I just end up with some extra terms in the index. Thanks, Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p876343.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering near-duplicates using TextProfileSignature
Neeb wrote: > > Just wondering if you ever managed to run TextProfileSignature based > deduplication. I would appreciate it if you could send me the code > fragment for it from solrconfig. > Actually the project that was for got postponed and I got distracted by other things, for now at least. Re. your config, I don't see a minTokenLength in the wiki page for deduplication, is this a recent addition that's not documented yet? It looks okay to me though -- perhaps you could do some empirical tests to see if it's working? i.e. add some near-dupes to a collection manually and see if it finds them? Andrew. -- View this message in context: http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp479039p880379.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering near-duplicates using TextProfileSignature
Andrew Clegg wrote: > > Re. your config, I don't see a minTokenLength in the wiki page for > deduplication, is this a recent addition that's not documented yet? > Sorry about this -- stupid question -- I should have read back through the thread and refreshed my memory. -- View this message in context: http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp479039p880385.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering near-duplicates using TextProfileSignature
Markus Jelsma wrote: > > Well, it got me too! KMail didn't properly order this thread. Can't seem > to > find Hatcher's reply anywhere. ??!!? > Whole thread here: http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tt479039.html -- View this message in context: http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp479039p881797.html Sent from the Solr - User mailing list archive at Nabble.com.