Re: memory size
Hi,

This is a PHP problem. You need to increase the per-request memory limit in your php.ini; the setting is called memory_limit.

Regards,
David

On 11 Nov 2009, at 07:56, Jörg Agatz wrote:

Hello,

I have a problem with the memory size, but I don't know how to fix it. Maybe it is a PHP problem, but I'm not sure.

My error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 16515072 bytes)

I hope you can help me.

KinGArtus
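For reference, a minimal sketch of the change described above (the 64M value is only an example; pick whatever your result handling actually needs):

    ; php.ini -- per-request memory ceiling for PHP scripts
    memory_limit = 64M

If editing php.ini is not an option, the same limit can usually be raised per script with ini_set('memory_limit', '64M'), assuming the host allows it.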
Re: memory size
It also depends on the number of rows being fetched from Solr, the PHP configuration, and the Solr response writer you are using (json, xml, etc.).

Rgds,
Ritesh Gurung

David Stuart wrote:
> This is a PHP problem. You need to increase the per-request memory limit in
> your php.ini; the setting is called memory_limit.
> [...]
How to get WildCard/prefix in SolrSharp
In Solrj, there is a method called setAllowLeadingWildcard(true). I need to call the same method in the SolrSharp API as well, but I don't find the class "SolrQueryParser.cs" in SolrSharp. Can anyone suggest how to call that method, or whether I can use a provided namespace such as "org.apache.solr.SolrSharp.Search.SolrQueryParser" in SolrSharp?

Thank you
Ashik Rajbhandari
Re: memory size
I have changed the php.ini and now it works. It was a problem in PHP: because I group the results in PHP, I need more memory when there are many results.

Thanks for the help.
Re: How to get WildCard/prefix in SolrSharp
AFAIK this needs to be set in the config in your case, which is still an open issue: http://issues.apache.org/jira/browse/SOLR-218

On Wed, Nov 11, 2009 at 9:25 AM, theashik wrote:
> In Solrj, there is a method called setAllowLeadingWildcard(true). I need to
> call the same method in the SolrSharp API as well, but I don't find the class
> "SolrQueryParser.cs" in SolrSharp.
> [...]
Commit error
Hi folks,

I'm getting this error while committing after a dataimport of only 12 docs!

Exception while solr commit.
java.io.IOException: background merge hit exception: _3kta:C2329239 _3ktb:c11->_3ktb into _3ktc [optimize] [mergeDocStores]
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2829)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2750)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401)
  at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
  at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:138)
  at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:66)
  at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:170)
  at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:208)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.io.IOException: No hay espacio libre en el dispositivo [no space left on device]
  at java.io.RandomAccessFile.writeBytes(Native Method)
  at java.io.RandomAccessFile.write(RandomAccessFile.java:499)
  at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:191)
  at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
  at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
  at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:75)
  at org.apache.lucene.store.IndexOutput.writeBytes(IndexOutput.java:45)
  at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:229)
  at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:184)
  at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:217)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5089)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4589)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

Index info: 2,600,000 docs | 11 GB size
System info: 15 GB free disk space

When attempting to commit, the disk usage increases until Solr breaks; it looks like 15 GB is not enough space to do the merge/optimize.

Any advice?

--
Lici
Re: Commit error
2009/11/11 Licinio Fernández Maurelo
> Hi folks,
>
> i'm getting this error while committing after a dataimport of only 12 docs!
>
> [...]
>
> Index info: 2,600,000 docs | 11 GB size
> System info: 15 GB free disk space
>
> When attempting to commit, the disk usage increases until Solr breaks; it
> looks like 15 GB is not enough space to do the merge/optimize.
>
> Any advice?
>
> --
> Lici

Hi Licinio,

During the optimization process the index size can grow to roughly double what it was originally, and the remaining space on disk may not be enough for the task. That is exactly what you are describing.

--
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
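If a full optimize is not strictly needed after every import, one workaround is to skip it and only commit; a DIH full-import can be told not to optimize via a request parameter, for example (assuming the default /dataimport handler name):

    http://localhost:8983/solr/dataimport?command=full-import&optimize=false

The merge that an explicit optimize triggers is what temporarily needs the extra disk space.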
Re: Commit error
Thanks Israel, I've done a successful import using optimize=false.

2009/11/11 Israel Ekpo
> During the optimization process the index size can grow to roughly double
> what it was originally, and the remaining space on disk may not be enough
> for the task.
> [...]

--
Lici
Re: deployment questions
Anyone?

I have done more reading and testing, and it seems like I want to use SolrJ and embed Solr in my webapp, but disable the HTTP access to Solr, meaning force all calls through the SolrJ interface I am building (no admin access, etc.). Is there a simple way to do this? Am I better off running Solr as a server on its own and using network security?

thanks
Joel

On Nov 9, 2009, at 5:04 PM, Joel Nylund wrote:

Hi,

I have a Java app that is deployed in a JBoss/Tomcat container. I would like to add my Solr index to it. I have read about this and it seems fairly straightforward, but I'm curious about the best way to secure it. I require my users to log in to my app to use it, so I want the search functions to behave the same way.

Ideally I would like to do the Solr queries from the client using ajax/json calls. So given this, my thinking was I should wrap the Solr servlet and do a local proxy type interface to ensure security. Is there an easier way to do this, or an example of a good way to do this? Or does the Solr servlet support an "interceptor" type pattern where I can have it call a piece of code before it executes the call? (This application is old and not using standard J2EE security, so I don't think I can use that.)

Another option is to use SolrJ on the server and not do the client-side calls; in this case I think I could lock down the Solr servlet interface to only allow local calls.

thanks
Joel
${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery
Hi,

I have an interesting issue.

I am trying to run delta imports on Solr 1.4 against a PostgreSQL 8.3 database. When I run a delta import with the entity below, I get an exception (see below the entity definition) showing the query it's trying to run, and you can see that it's not populating the where clause of my deltaImportQuery.

I have tried ${dataimporter.delta.twitter_id} and ${dataimporter.delta.id} and get the same exceptions.

Am I missing something obvious?

Any help would be appreciated!

Regards

Mark

INFO: Completed parentDeltaQuery for Entity: Tweeter
Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: Tweeter document : SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select twitter_id,twitter_id as pk,1 as site_id, screen_name from api_tweeter where twitter_id=;Processing Document # 1
  at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
  at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:276)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
  at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at end of input
  Position: 1197
  at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
  at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795)
  at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
  at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
  at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353)
  at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:345)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:246)
  ... 11 more
Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
SEVERE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select twitter_id,twitter_id as pk,1 as site_id, screen_name from api_tweeter where twitter_id=;Processing Document # 1
[same stack trace as above]
Re: synonym payload boosting
Hi,

I have added a PayloadTermQueryPlugin after reading
https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

My class is:

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.SolrException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.*;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.index.Term;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;

public class PayloadTermQueryPlugin extends QParserPlugin {
  private MinPayloadFunction payloadFunc;

  @Override
  public void init(NamedList args) {
    this.payloadFunc = new MinPayloadFunction();
  }

  @Override
  public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws ParseException {
        Term term = new Term(localParams.get(QueryParsing.F), localParams.get(QueryParsing.V));
        return new PayloadTermQuery(term, payloadFunc, false);
      }
    };
  }
}

I tested it using SolrJ:

@Override
protected void setUp() throws Exception {
  super.setUp();
  System.setProperty("solr.solr.home", "C:\\temp\\solr_home1.4");
  CoreContainer.Initializer initializer = new CoreContainer.Initializer();

  try {
    coreContainer = initializer.initialize();
  } catch (IOException ex) {
    Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
  } catch (ParserConfigurationException ex) {
    Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
  } catch (SAXException ex) {
    Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
  }
  server = new EmbeddedSolrServer(coreContainer, "");
}

public void testSeacrhAndBoost() {
  SolrQuery query = new SolrQuery();
  query.setQuery("PFirstName:steve");
  query.setParam("hl.fl", "PFirstName");
  query.setParam("defType", "payload");
  query.setIncludeScore(true);

  query.setRows(10);
  query.setFacet(false);

  try {
    QueryResponse qr = server.query(query);

    List<PersonDoc> l = qr.getBeans(PersonDoc.class);
    for (PersonDoc personDoc : l) {
      System.out.println(personDoc);
    }
  } catch (SolrServerException ex) {
    Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
  }
}

I get an NPE trying to access localParams in the

  public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)

method. The NPE actually occurs in the

  public Query parse() throws ParseException

method. I could not find documentation about the parse method. How can I pass the localParams? What is the difference between localParams and params?

I would be happy to write a case study on the wiki, but I'm not sure exactly what you mean: the resolution I will eventually come to, or the process of finding it? I'm still trying to figure out what exactly to do. I have purchased the Solr 1.4 book, but it doesn't seem to have much information about my needs.

On Tue, Nov 10, 2009 at 10:09, David Ginzburg wrote:
> I would be happy to.
> I'm not sure exactly what you mean: the resolution I will eventually come
> to or the process of finding it?
> I'm still trying to figure out what exactly to do. I have purchased the
> Solr 1.4 book, but it doesn't seem to have much information about my needs.
>
>
> -- Forwarded message --
> From: Lance Norskog
> Date: Tue, Nov 10, 2009 at 04:11
> Subject: Re: synonym payload boosting
> To: solr-user@lucene.apache.org
>
> David, when you get this working would you consider writing a case
> study on the wiki? Nothing complex, just something that describes how
> you did several customizations to create a new feature.
>
> On Mon, Nov 9, 2009 at 4:10 AM, Grant Ingersoll wrote:
> >
> > On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:
> >
> >> I have found this
> >> https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >> patch
> >> But I don't want to use any function, just the normal scoring and the
> >> similarity class I have written.
> >> Can you point me to the modifications I need (if any)?
> >>
> >
> > Ahmet's point is that you need some query that will actually invoke the
> > payload in scoring. PayloadTermQuery a
Re: ${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery
I have 2 entities under the root node, not sure if that makes a difference!

On Wed, Nov 11, 2009 at 4:49 PM, Mark Ellul wrote:
> Hi,
>
> I have an interesting issue.
>
> I am trying to run delta imports on Solr 1.4 against a PostgreSQL 8.3
> database. When I run a delta import with the entity below I get an exception
> (see below the entity definition) showing the query it's trying to run, and
> you can see that it's not populating the where clause of my deltaImportQuery.
>
> I have tried ${dataimporter.delta.twitter_id} and ${dataimporter.delta.id}
> and get the same exceptions.
>
> Am I missing something obvious?
>
> Regards
>
> Mark
>
> query="
> select twitter_id,
> twitter_id as pk,
> 1 as site_id,
> screen_name
> from api_tweeter WHERE
> tweet_mapreduce_on IS NOT NULL;
> " transformer="TemplateTransformer"
>
> deltaImportQuery="
> select twitter_id,
> twitter_id as pk,
> 1 as site_id,
> screen_name
> from api_tweeter
> where twitter_id=${dataimporter.delta.twitter_id };
> "
> deltaQuery ="select twitter_id from api_tweeter where modified_on >
> '${dataimporter.last_index_time}' and tweet_mapreduce_on IS NOT NULL;"
>
> INFO: Completed parentDeltaQuery for Entity: Tweeter
> Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
> SEVERE: Exception while processing: Tweeter document : SolrInputDocument[{}]
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select twitter_id,twitter_id as pk,1 as site_id, screen_name
> from api_tweeter where twitter_id=;Processing Document # 1
> [...]
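For reference, a cleaned-up sketch of what such an entity usually looks like in data-config.xml. The table and column names are taken from the message above; the details worth double-checking are that the column returned by deltaQuery matches the name used in the placeholder exactly (some drivers return it upper-cased) and that there is no stray whitespace inside ${...}:

    <entity name="Tweeter" pk="twitter_id" transformer="TemplateTransformer"
            query="select twitter_id, twitter_id as pk, 1 as site_id, screen_name
                   from api_tweeter where tweet_mapreduce_on is not null"
            deltaQuery="select twitter_id from api_tweeter
                        where modified_on &gt; '${dataimporter.last_index_time}'
                          and tweet_mapreduce_on is not null"
            deltaImportQuery="select twitter_id, twitter_id as pk, 1 as site_id, screen_name
                              from api_tweeter
                              where twitter_id=${dataimporter.delta.twitter_id}"/>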
Re: Is optimized?
Yes. I believe the "is the index already optimized" is in the guts of Lucene. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: William Pierce > To: solr-user@lucene.apache.org > Sent: Fri, October 23, 2009 1:05:52 PM > Subject: Is optimized? > > Folks: > > If I issue two requests with no intervening changes to the index, > will the second optimize request be smart enough to not do anything? > > Thanks, > > Bill
Re: NGram query failing
That's actually easy to explain/understand. If the min n-gram size is 3, a query term with just 2 characters will never match any terms that originally had more than 2 characters, because longer terms never get tokenized into tokens shorter than 3 characters.

Take the term: house

house => hou ous use

If your search term is "ho", it will never match the above, as there is no term "ho" in there.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message
> From: Charlie Jackson
> To: solr-user@lucene.apache.org
> Sent: Fri, October 23, 2009 4:32:33 PM
> Subject: RE: NGram query failing
>
> Well, I fixed my own problem in the end. For the record, this is the
> schema I ended up going with:
>
> minGramSize="2" />
>
> minGramSize="2"/>
>
> I could have left it a trigram but went with a bigram because with this
> setup, I can get queries to properly hit as long as the min/max gram
> size is met. In other words, for any queries two or more characters
> long, this works for me. Less than two characters and it fails.
>
> I don't know exactly why that is, but I'll take it anyway!
>
> - Charlie
>
> -Original Message-
> From: Charlie Jackson [mailto:charlie.jack...@cision.com]
> Sent: Friday, October 23, 2009 10:00 AM
> To: solr-user@lucene.apache.org
> Subject: NGram query failing
>
> I have a requirement to be able to find hits within words in a free-form
> id field. The field can have any type of alphanumeric data - it's as
> likely it will be something like "123456" as it is to be "SUN-123-ABC".
> I thought of using NGrams to accomplish the task, but I'm having a
> problem. I set up a field like this:
>
> minGramSize="1" maxGramSize="3"/>
>
> After indexing a field like this, the analysis page indicates my queries
> should work. If I give it a sample field value of "ABC-123456-SUN" and a
> query value of "45" it shows hits in several places, which is what I
> expected.
>
> However, when I actually query the field with something like "45" I get
> no hits back. Looking at the debugQuery output, it looks like it's
> taking my analyzed query text and putting it into a phrase query. So,
> for a query of "45" it turns into a phrase query of "4 5 45"
> which then doesn't hit on anything in my index.
>
> What am I missing to make this work?
>
> - Charlie
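For anyone hitting the same thing, here is one plausible shape of the schema Charlie describes (the XML was stripped from his message, so the field type name, tokenizer choice and maxGramSize below are assumptions); the key point is n-gramming only at index time, with minGramSize matching the shortest query you need to support:

    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Leaving the n-gram filter off the query analyzer also avoids the "4 5 45" phrase-query behavior mentioned above.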
Re: Delete of non-existent record succeeds
Rather than start a new thread, I'd like to follow up on this. I'm going to oversimplify, but the basic question should be straightforward.

I currently have one very large Solr index and 5 small ones which contain filtered subsets of the big one and are used for faceting in one area of our site. The means by which we determine the documents that go into the smaller ones is somewhat expensive computationally, and involves hitting a database and a machine learning system, among other things.

The problem I'm considering is that when a document goes "inactive" (indicated by a status field) in the big index, I'd like to remove it from any of the small ones it happens to be in. This may be any of the 5 or none at all, as they don't nearly cover the whole space. I don't need to keep inactive documents in the small indexes, and prefer to keep them small for performance purposes.

So rather than running the expensive process to figure out which, if any, of the small indexes to issue the delete against, would it be terribly expensive to issue 5 deletes against the 5 servers (cores) and have them not match? What is the overhead on the Solr side internally to process a (non-)delete in this case? I'm hoping the main overhead is the bandwidth to issue the requests, which is not a concern since the code will be running on the same machine as the Solr instances.

I appreciate any advice on this matter, and congrats on the release of 1.4!

Yonik Seeley wrote:
>
> delete means delete if it exists.
>
> Due to how lucene works, to get good performance deletes are actually
> buffered... when the method returns, the deletes haven't really been
> applied yet.
XmlUpdateRequestHandler with HTMLStripCharFilterFactory
I am trying to post a document with the following content using SolrJ:

<center>content</center>

I need the xml/html tags to be ignored. Even though this works fine in analysis.jsp, it does not work with SolrJ, as the client escapes the < and > as &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated.

There is an escapedTags argument in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work?

Thanks
--
Aseem
Re: deployment questions
Either way works, but running Solr as a server means that you have an admin interface. That can be very useful; you will want it as soon as someone asks why some document is not the first hit for their favorite query.

wunder

On Nov 11, 2009, at 7:26 AM, Joel Nylund wrote:
> Anyone? I have done more reading and testing, and it seems like I want to
> use SolrJ and embed Solr in my webapp, but disable the HTTP access to Solr,
> meaning force all calls through the SolrJ interface I am building (no admin
> access, etc.). Is there a simple way to do this? Am I better off running
> Solr as a server on its own and using network security?
> [...]
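If you do run Solr as its own server, the application-side code stays simple; a minimal SolrJ sketch of the "all searches go through my own interface" approach (the URL and field handling are only examples, and the Solr port would be firewalled off from end users):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SearchProxy {
        private final SolrServer solr;

        public SearchProxy() throws Exception {
            // Only the app server can reach this host/port; browsers never talk to Solr directly.
            this.solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        }

        // Called from the webapp after the user's session has been authenticated.
        public QueryResponse search(String userQuery) throws Exception {
            SolrQuery q = new SolrQuery(userQuery);
            q.setRows(10);
            return solr.query(q);
        }
    }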
indexing on different server
Is it possible to index on one server and copy the files over?

thanks
Joel
Re: indexing on different server
Hello!

> is it possible to index on one server and copy the files over?
> thanks
> Joel

Yes, it is possible, look at the CollectionDistribution wiki page (http://wiki.apache.org/solr/CollectionDistribution).

--
Regards,
Rafał Kuć
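The snapshooter/snappuller scripts described on that page (and, in 1.4, the Java-based replication handler) automate the copy using hard-linked snapshots so the files don't change mid-copy. A rough manual sketch of the same idea, with example paths and host names:

    # on the search server: pull the index files from the indexing box
    # (do this while the indexer is not committing/merging)
    rsync -a --delete indexer:/opt/solr/data/index/ /opt/solr/data/index/

    # then reopen the searcher so the new files are picked up
    curl http://localhost:8983/solr/update --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'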
Re: Delete of non-existent record succeeds
I'd go with just broadcasting the delete. If I remember correctly, that's what we did at one place where we used vanilla Lucene with RMI (pre-Solr), and we didn't see any problems due to that (RMI, on the other hand, was a different story). Whether this will work for you depends on how often you'll need to do it, among other things.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message
> From: Eric Kilby
> To: solr-user@lucene.apache.org
> Sent: Wed, November 11, 2009 11:55:54 AM
> Subject: Re: Delete of non-existent record succeeds
>
> Rather than start a new thread, I'd like to follow up on this.
> [...]
> So rather than running the expensive process to figure out which, if any, of
> the small indexes to issue the delete against, would it be terribly expensive
> to issue 5 deletes against the 5 servers (cores) and have them not match?
> [...]
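A sketch of the broadcast approach in SolrJ (core URLs are examples); cores that don't contain the document simply match nothing, which is cheap:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class BroadcastDelete {
        public static void deleteEverywhere(String docId) throws Exception {
            String[] coreUrls = {
                "http://localhost:8983/solr/core1",
                "http://localhost:8983/solr/core2",
                // ... the other small cores
            };
            for (String url : coreUrls) {
                SolrServer solr = new CommonsHttpSolrServer(url);
                solr.deleteById(docId);   // no-op on cores that don't have the doc
                solr.commit();            // or rely on autoCommit / batch the commits
            }
        }
    }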
Persist in Core Admin
It looks like our core admin wiki doesn't cover the persist action? http://wiki.apache.org/solr/CoreAdmin I'd like to be able to persist the cores to solr.xml, even if . It seems like the persist action does this?
Re: [DIH] blocking import operation
Noble,

Noble Paul wrote:
> DIH imports are really long running. There is a good chance that the
> connection times out or breaks in between.

Yes, you're right, I missed that point (in my case imports take no longer than a minute).

> how about a callback?

Thanks for the hint. There was a discussion on adding a callback URL to DIH a month ago, but it seems that no issue was raised. So, up to now, it's only possible to implement an appropriate Solr EventListener. Should we open an issue for supporting callback URLs?

Best,
Sascha

> On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott wrote:
>> Hi all,
>>
>> currently, DIH's import operation(s) only works asynchronously. Therefore,
>> after submitting an import request, DIH returns immediately, while the
>> import process (in case a large amount of data needs to be indexed)
>> continues asynchronously behind the scenes.
>>
>> So, what is the recommended way to check if the import process has already
>> finished? Or still better, is there any method / workaround that will block
>> the import operation's caller until the operation has finished?
>>
>> In my application, the DIH receives some URL parameters which are used for
>> determining the database name that is used within data-config.xml, e.g.
>>
>> http://localhost:8983/solr/dataimport?command=full-import&dbname=foo
>>
>> Since only one DIH, /dataimport, is defined, but several databases need to
>> be indexed, it is required to issue this command several times, e.g.
>>
>> http://localhost:8983/solr/dataimport?command=full-import&dbname=foo
>>
>> ... wait until /dataimport?command=status says "Indexing completed" (but
>> without using a loop that checks it again and again) ...
>>
>> http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false
>>
>> A suitable solution, at least IMHO, would be to have an additional DIH
>> parameter which determines whether the import call is blocking or
>> non-blocking, the default. As far as I see, this could be accomplished since
>> Solr can execute more than one import operation at a time (it starts a new
>> thread for each). Perhaps my question is somehow related to the discussion
>> [1] on ParallelDataImportHandler.
>>
>> Best,
>> Sascha
>>
>> [1] http://www.lucidimagination.com/search/document/a9b26ade46466ee
>>
weird problem with solr.DateField
Hi,

I'm using Solr 1.4 (from a nightly build about 2 months ago) and have this defined in solrconfig:

and the following code that gets executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());
solrServer.add(documents);
solrServer.commit();

UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
solrServer.commit();

The purpose is to refresh the index with the latest data (in "documents").

This works fine, except that after a few days I start to see a few documents with no "lastUpdate" field (query "-lastUpdate:[* TO *]") -- how can that be possible?

Thanks in advance.
Re: any docs on solr.EdgeNGramFilterFactory?
It looks like the CJK one actually does 2-grams, plus a little separate processing on Latin text.

That's kind of interesting - in general, can I build a custom tokenizer from existing tokenizers that treats different parts of the input differently based on the utf-8 range of the characters? E.g. use a Porter stemmer for stretches of Latin text and n-grams or something else for CJK?

-Peter

On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic wrote:
> Yes, that's the n-gram one. I believe the existing CJK one in Lucene is
> really just an n-gram tokenizer, so no different than the normal n-gram
> tokenizer.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
> - Original Message
>> From: Peter Wolanin
>> To: solr-user@lucene.apache.org
>> Sent: Tue, November 10, 2009 7:34:37 PM
>> Subject: Re: any docs on solr.EdgeNGramFilterFactory?
>>
>> So, this is the normal N-gram one? NGramTokenizerFactory
>>
>> Digging deeper - there are actually CJK and Chinese tokenizers in the
>> Solr codebase:
>>
>> http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
>> http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html
>>
>> The CJK one uses the lucene CJKTokenizer
>> http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html
>>
>> and there seems to be another one even that no one has wrapped into Solr:
>> http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html
>>
>> So it seems like the existing options are a little better than I thought,
>> though it would be nice to have some docs on properly configuring these.
>>
>> -Peter
>>
>> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic wrote:
>> > Peter,
>> >
>> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but just
>> > n-grams. Before you take the n-gram route, you may want to look at the
>> > smart Chinese analyzer in Lucene contrib (I think it works only for
>> > Simplified Chinese) and Sen (on java.net). I also spotted a Korean
>> > analyzer in the wild a few months back.
>> >
>> > Otis
>> > --
>> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>> >
>> > - Original Message
>> >> From: Peter Wolanin
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Tue, November 10, 2009 4:06:52 PM
>> >> Subject: any docs on solr.EdgeNGramFilterFactory?
>> >>
>> >> This fairly recent blog post:
>> >> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>> >> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer
>> >> for the index. I don't see any mention of that tokenizer on the Solr
>> >> wiki - is it just waiting to be added, or is there any other
>> >> documentation in addition to the blog post? In particular, there was
>> >> a thread last year about using an N-gram tokenizer to enable
>> >> reasonable (if not ideal) searching of CJK text, so I'd be curious to
>> >> know how people are configuring their schema (with this tokenizer?)
>> >> for that use case.
>> >>
>> >> Thanks,
>> >>
>> >> Peter
>> >>
>> >> --
>> >> Peter M. Wolanin, Ph.D.
>> >> Momentum Specialist, Acquia. Inc.
>> >> peter.wola...@acquia.com
>> >
>>
>> --
>> Peter M. Wolanin, Ph.D.
>> Momentum Specialist, Acquia. Inc.
>> peter.wola...@acquia.com
>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
[DIH] concurrent requests to DIH
Hi all,

I'm using the DIH in a parameterized way by passing request parameters that are used inside my data-config. All imports end up in the same index.

1. Is it considered good practice to set up several DIH request handlers, one for each possible parameter value?

2. In case the range of parameter values is broad, it's not convenient to define separate request handlers for each value. But this entails a limitation (as far as I see): it is not possible to fire several requests to the same DIH handler (with different parameter values) at the same time. However, if several request handlers were used (as in 1.), concurrent requests (to the different handlers) are possible. So, how to overcome this limitation?

Best,
Sascha
add XML/HTML documents using SolrJ, without bypassing HTML char filter
Hey Guys,

How do I add HTML/XML documents using SolrJ such that they do not bypass the HTML char filter?

SolrJ escapes the HTML/XML value of a field, and that makes it bypass the HTML char filter. For example, <center>content</center>, if added to a field with HTMLStripCharFilter on the field using SolrJ, is not stripped of the center tags. But if I check in analysis.jsp, it does get stripped. When I look at the SolrJ XML feed, it looks like this:

http://haha.comcontent

Any help is highly appreciated. Thanks.

--
Aseem
Configuring Solr to use RAMDirectory
Is it possible to configure Solr to fully load indexes in memory? I wasn't able to find any documentation about this on either their site or in the Solr 1.4 Enterprise Search Server book.
Re: complex queries
Hi Erik,

Is it possible to feed the result of one Solr query into another Solr query?

The issue I am facing right now is: I am getting results from one query and I need just 2 index attribute values from them. These index attribute values are then used to form a new query to Solr. Since Solr gives results only for a GET request, there is a restriction on forming a query with all the values.

Please do send your views on the above problem.

Thanks
~Vikrant

Erik Hatcher wrote:
>
> On May 6, 2008, at 8:57 PM, Kevin Osborn wrote:
>> I don't think this is possible, but I figure that I would ask.
>>
>> So, I want to find documents that match a search term and where a
>> field in those documents are also in the results of a subquery.
>> Basically, I am looking for the Solr equivalent of doing a SQL IN
>> clause.
>
> "search clause" AND field:(value1 OR value2 OR value3)
>
> does that do the trick for you? If not, could you elaborate with
> an example?
>
> Erik
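A sketch of the two-step approach in SolrJ (field names and queries are only examples): collect the values from the first query, build an OR'ed clause for the second query (the pattern Erik shows above), and send it as a POST request, which avoids the URL-length limit of GET:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class TwoStepQuery {
        public static QueryResponse search(SolrServer solr) throws Exception {
            // Step 1: fetch only the attribute we need from the first query
            SolrQuery first = new SolrQuery("category:books");
            first.setFields("authorId");
            first.setRows(100);
            QueryResponse r1 = solr.query(first);
            if (r1.getResults().isEmpty()) {
                return r1;  // nothing to narrow the second query by
            }

            // Step 2: build an IN-style clause from the returned values
            StringBuilder clause = new StringBuilder("authorId:(");
            for (SolrDocument d : r1.getResults()) {
                clause.append(d.getFieldValue("authorId")).append(" OR ");
            }
            clause.setLength(clause.length() - 4);  // drop trailing " OR "
            clause.append(")");

            SolrQuery second = new SolrQuery("text:fiction AND " + clause);
            // POST instead of GET so a long clause doesn't hit URL-length limits
            return new QueryRequest(second, SolrRequest.METHOD.POST).process(solr);
        }
    }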
Re: any docs on solr.EdgeNGramFilterFactory?
Peter, here is a project that does this:
http://issues.apache.org/jira/browse/LUCENE-1488

> That's kind of interesting - in general, can I build a custom tokenizer
> from existing tokenizers that treats different parts of the input
> differently based on the utf-8 range of the characters? E.g. use a
> Porter stemmer for stretches of Latin text and n-grams or something
> else for CJK?
>
> -Peter
> [...]

--
Robert Muir
rcm...@gmail.com
Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter
The HTMLStripCharFilter will strip the html for the *indexed* terms; it does not affect the *stored* field.

If you don't want html in the stored field, can you just strip it out before passing to Solr?

On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:
> Hey Guys,
> How do I add HTML/XML documents using SolrJ such that they do not bypass
> the HTML char filter?
>
> SolrJ escapes the HTML/XML value of a field, and that makes it bypass the
> HTML char filter. For example, <center>content</center>, if added to a field
> with HTMLStripCharFilter on the field using SolrJ, is not stripped of the
> center tags. But if I check in analysis.jsp, it does get stripped.
> [...]
>
> --
> Aseem
Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter
Ohhh... you are a life saver... thank you so much.. it makes sense.
Aseem

On Wed, Nov 11, 2009 at 7:40 PM, Ryan McKinley wrote:
> The HTMLStripCharFilter will strip the html for the *indexed* terms; it does
> not affect the *stored* field.
>
> If you don't want html in the stored field, can you just strip it out before
> passing to Solr?
> [...]

--
Aseem
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
Alright. It turns out that escapedTags is not for what I thought it was for.

The problem that I was having with HTMLStripCharFilterFactory is that it strips the HTML while indexing the field, but not while storing the field. That is why what I see in analysis.jsp, which is index analysis, does not match what gets stored... because, well, HTML is stripped only for indexing. Makes so much sense. Thanks to Ryan McKinley for clarifying this.

Aseem

On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema wrote:
> I am trying to post a document with the following content using SolrJ:
> <center>content</center>
> I need the xml/html tags to be ignored. Even though this works fine in
> analysis.jsp, it does not work with SolrJ, as the client escapes the < and >
> as &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those
> escaped tags. How can I achieve this? Any ideas will be highly appreciated.
>
> There is an escapedTags argument in the HTMLStripCharFilterFactory
> constructor. Is there a way to get that to work?
> Thanks
> --
> Aseem

--
Aseem
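If the stored value needs to be clean as well, one option is to strip the markup on the client before adding the field; a minimal sketch (the regex is a crude example, not a real HTML parser, and the field names are assumptions):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AddStripped {
        public static void add(SolrServer server, String id, String rawHtml) throws Exception {
            // crude tag stripper; swap in a proper HTML parser for anything non-trivial
            String plain = rawHtml.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("text", plain);   // the stored value is now plain text
            server.add(doc);
        }
    }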
Re: [DIH] concurrent requests to DIH
> 1. Is it considered good practice to set up several DIH request
> handlers, one for each possible parameter value?

Nothing wrong with this. My assumption is that you want to do this to speed up indexing. Each DIH instance would block all others once a Lucene commit for the former is performed.

> 2. In case the range of parameter values is broad, it's not convenient to
> define separate request handlers for each value. But this entails a
> limitation (as far as I see): it is not possible to fire several requests
> to the same DIH handler (with different parameter values) at the same time.

Nope. I had done a similar exercise in my quest to write a ParallelDataImportHandler. This thread might be of interest to you:
http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler
Though there is a ticket in JIRA, I haven't been able to contribute this back. If you think this is what you need, lemme know.

Cheers
Avlesh

On Thu, Nov 12, 2009 at 6:35 AM, Sascha Szott wrote:
> Hi all,
>
> I'm using the DIH in a parameterized way by passing request parameters
> that are used inside my data-config. All imports end up in the same index.
> [...]
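For the multiple-handler route, a sketch of what the solrconfig.xml side could look like (handler names and the dbname parameter are examples); each handler shares the same data-config.xml, and the per-handler default is read inside it as ${dataimporter.request.dbname}:

    <requestHandler name="/dataimport-foo"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
        <str name="dbname">foo</str>
      </lst>
    </requestHandler>

    <requestHandler name="/dataimport-bar"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
        <str name="dbname">bar</str>
      </lst>
    </requestHandler>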
Re: Configuring Solr to use RAMDirectory
I think not out of the box, but look at the SOLR-243 issue in JIRA. You could also put your index on a RAM disk (tmpfs), but that would be useless if you need to write to the index. Note that when people ask about loading the whole index in memory explicitly, it's often a premature optimization attempt. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Thomas Nguyen > To: solr-user@lucene.apache.org > Sent: Wed, November 11, 2009 8:46:11 PM > Subject: Configuring Solr to use RAMDirectory > > Is it possible to configure Solr to fully load indexes in memory? I > wasn't able to find any documentation about this on either their site or > in the Solr 1.4 Enterprise Search Server book.
Re: weird problem with solr.DateField
Try changing: to: Then watch the logs for errors during indexing. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: siping liu > To: solr-user@lucene.apache.org > Sent: Wed, November 11, 2009 7:29:18 PM > Subject: weird problem with solr.DateField > > > Hi, > > I'm using Solr 1.4 (from nightly build about 2 months ago) and have this > defined > in solrconfig: > > > omitNorms="true" /> > > > multiValued="false" /> > > > > and the following code that gets executed once every night: > > CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://..."); > solrServer.setRequestWriter(new BinaryRequestWriter()); > > solrServer.add(documents); > solrServer.commit(); > > UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO > NOW-2HOUR]"); > solrServer.commit(); > > > > The purpose is to refresh the index with the latest data (in "documents"). > > This works fine, except that after a few days I start to see a few documents > with no "lastUpdate" field (query "-lastUpdate:[* TO *]") -- how can that be > possible? > > > > thanks in advance.
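One way to rule out the client side (just a sketch, assuming the nightly job builds the document list itself): stamp lastUpdate explicitly on every document before adding, so no document can end up without the field and the delete-by-query only removes genuinely stale documents.

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class NightlyRefresh {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> documents = buildDocuments(); // placeholder for the real loader
        Date now = new Date();
        for (SolrInputDocument doc : documents) {
            // setField overwrites any previous value, so every document carries lastUpdate.
            doc.setField("lastUpdate", now);
        }
        solrServer.add(documents);
        solrServer.commit();
        // Anything not refreshed tonight is older than two hours and gets removed.
        solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
        solrServer.commit();
    }

    static List<SolrInputDocument> buildDocuments() {
        return new ArrayList<SolrInputDocument>(); // stands in for the real document source
    }
}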
Re: [DIH] blocking import operation
Yes, open an issue. This is a trivial change. On Thu, Nov 12, 2009 at 5:08 AM, Sascha Szott wrote: > Noble, > > Noble Paul wrote: >> DIH imports are really long running. There is a good chance that the >> connection times out or breaks in between. > Yes, you're right, I missed that point (in my case imports take no longer > than a minute). > >> how about a callback? > Thanks for the hint. There was a discussion on adding a callback url to > DIH a month ago, but it seems that no issue was raised. So, up to now it's > only possible to implement an appropriate Solr EventListener. Should we > open an issue for supporting callback urls? > > Best, > Sascha > >> >> On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott wrote: >>> Hi all, >>> >>> currently, DIH's import operations only work asynchronously. >>> Therefore, >>> after submitting an import request, DIH returns immediately, while the >>> import process (in case a large amount of data needs to be indexed) >>> continues asynchronously behind the scenes. >>> >>> So, what is the recommended way to check if the import process has >>> already >>> finished? Or better still, is there any method / workaround that will >>> block >>> the import operation's caller until the operation has finished? >>> >>> In my application, the DIH receives some URL parameters which are used >>> for >>> determining the database name that is used within data-config.xml, e.g. >>> >>> http://localhost:8983/solr/dataimport?command=full-import&dbname=foo >>> >>> Since only one DIH, /dataimport, is defined, but several databases need >>> to >>> be indexed, it is required to issue this command several times, e.g. >>> >>> http://localhost:8983/solr/dataimport?command=full-import&dbname=foo >>> >>> ... wait until /dataimport?command=status says "Indexing completed" (but >>> without using a loop that checks it again and again) ... >>> >>> http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false >>> >>> >>> A suitable solution, at least IMHO, would be to have an additional DIH >>> parameter which determines whether the import call is blocking or >>> non-blocking (the default). As far as I see, this could be accomplished >>> since >>> Solr can execute more than one import operation at a time (it starts a >>> new >>> thread for each). Perhaps, my question is somehow related to the >>> discussion >>> [1] on ParallelDataImportHandler. >>> >>> Best, >>> Sascha >>> >>> [1] http://www.lucidimagination.com/search/document/a9b26ade46466ee >>> > -- - Noble Paul | Principal Engineer | AOL | http://aol.com
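Until a blocking mode or callback exists, the usual workaround is to poll the status command, roughly along these lines. This is only a sketch: it assumes the stock /dataimport handler and relies on the status response reporting "busy" while an import runs and "idle" once it has finished.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class WaitForImport {
    public static void main(String[] args) throws Exception {
        String statusUrl = "http://localhost:8983/solr/dataimport?command=status";
        while (true) {
            // Read the whole status response into a string.
            StringBuilder body = new StringBuilder();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new URL(statusUrl).openStream(), "UTF-8"));
            for (String line; (line = reader.readLine()) != null; ) {
                body.append(line);
            }
            reader.close();
            if (body.indexOf("idle") >= 0) {
                break; // no import running any more; safe to fire the next one
            }
            Thread.sleep(5000); // poll every five seconds
        }
    }
}

This is exactly the polling loop Sascha wanted to avoid, which is why a blocking flag or callback URL would be the cleaner solution.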
Re: ${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery
are you sure the data comes back in the same name. Some DBs return the field names in ALL CAPS you may try out a delta_import using a full import too http://wiki.apache.org/solr/DataImportHandlerFaq#My_delta-import_goes_out_of_memory_._Any_workaround_.3F On Wed, Nov 11, 2009 at 9:55 PM, Mark Ellul wrote: > I have 2 entities from the root node, not sure if that makes a difference! > > On Wed, Nov 11, 2009 at 4:49 PM, Mark Ellul wrote: > >> Hi, >> >> I have a interesting issue... >> >> Basically I am trying to delta imports on solr 1.4 on a postgresql 8.3 >> database. >> >> Basically when I am running a delta import with the entity below I get an >> exception (see below the entity definition) showing the query its trying to >> run and you can see that its not populating the where clause of my >> dataImportQuery. >> >> I have tried ${dataimporter.delta.twitter_id} and ${dataimporter.delta.id} >> and get the same exceptions. >> >> Am I missing something obvious? >> >> Any help would be appreciated! >> >> Regards >> >> Mark >> >> >> > query=" >> select twitter_id, >> twitter_id as pk, >> 1 as site_id, >> screen_name >> >> from api_tweeter WHERE >> tweet_mapreduce_on IS NOT NULL; >> " transformer="TemplateTransformer" >> >> deltaImportQuery=" >> select twitter_id, >> twitter_id as pk, >> 1 as site_id, >> screen_name >> >> from api_tweeter >> where twitter_id=${dataimporter.delta.twitter_id }; >> " >> deltaQuery ="select twitter_id from api_tweeter where modified_on > >> '${dataimporter.last_index_time}' and tweet_mapreduce_on IS NOT NULL;" >> >> > >> >> >> >> >> >> >> INFO: Completed parentDeltaQuery for Entity: Tweeter >> Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DocBuilder >> buildDocument >> SEVERE: Exception while processing: Tweeter document : >> SolrInputDocument[{}] >> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to >> execute query: select twitter_id, twitter_id >> as pk, 1 as site_id, screen_name from api_tweeter where >> twitter_id=; Processing Document # 1 >> at >> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) >> at >> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) >> at >> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) >> at >> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) >> at >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) >> at >> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:276) >> at >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172) >> at >> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352) >> at >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391) >> at >> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) >> Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at end of >> input >> Position: 1197 >> at >> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062) >> at >> 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795) >> at >> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) >> at >> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479) >> at >> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353) >> at >> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:345) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:246) >> ... 11 more >> Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DataImporter >> doDeltaImport >> SEVERE: Delta Import Failed >> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to >> execute query: select twitter_id, twitter_id >> as pk, 1 as site_id, screen_name from api_tweeter where >> twitter_id=; Processing Document # 1 >> at >> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) >> at >> org.apache.solr
Re: Persist in Core Admin
On Thu, Nov 12, 2009 at 3:13 AM, Jason Rutherglen wrote: > It looks like our core admin wiki doesn't cover the persist action? > http://wiki.apache.org/solr/CoreAdmin > > I'd like to be able to persist the cores to solr.xml, even if <solr persistent="false"> is set. It seems like the persist action does this? Yes, but you will have to specify a 'file' parameter. > -- - Noble Paul | Principal Engineer | AOL | http://aol.com
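For the record, with the default adminPath the request would presumably look something like http://localhost:8983/solr/admin/cores?action=PERSIST&file=solr.xml — the 'file' parameter is the one Noble mentions, while the exact path depends on the adminPath configured in solr.xml, so treat this URL as an assumption and check the CoreAdminHandler if in doubt.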