Solr custom plugins: is it possible to have them persistent?
I posted a similar question a few days ago, but our needs have grown since then. I need to develop two plugins that must stay persistent throughout the whole indexing and updating process.

The first one needs to open a connection to a MySQL instance and query that connection while processing every document.

The second one uses a Java library (classifier4j), which is a Bayesian categorization system and doesn't talk to a database. It learns while matching, so the object needs to be created at the very beginning of indexing and be available for all documents (I cannot create a new object for every document, because I'd lose the learning feature).

For the first one I could use a DataImportHandler, but I'm not sure about it: I don't need to import the whole database, just the occurrences matching a particular condition for each document. For the second one, I'm in the dark.

Is there a place in Solr where I can create the connection object and the categorizer object before everything else, and have them available to all documents?

Thanks all in advance!

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3292781.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr custom plugins: is it possible to have them persistent?
That's how I'm doing it now, but I'm not sure I'm putting the objects in the right place. The significant part of my code is here: http://pastie.org/2448984 (I've omitted the method implementations since they are pretty long).

Inside the method setLocation I create the connection to the MySQL database; inside the method setFieldPosition I create the categorization object.

Then I started thinking that I was creating and destroying those objects locally every time Solr reads a document to index. So where should I put them? Inside the ToTheGoCustom class constructor, after the super call? I'm asking because I'm not sure whether my custom UpdateRequestProcessor is created once or for every document parsed.

(I'm still learning Solr, but I think I'm getting into it, bit by bit!)

Thanks again!
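On the lifecycle question: in Solr, the class named in the update chain is an UpdateRequestProcessorFactory, which is instantiated once when the core loads; its getInstance method is then called to build a processor for each update request. Long-lived objects therefore fit better in the factory (or in something the factory owns) than in the processor. The sketch below mirrors that split in plain Java so that it runs without Solr on the classpath. Classifier, Processor, and ProcessorFactory are hypothetical stand-ins, not Solr or classifier4j classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a learning categorizer (not the classifier4j API).
class Classifier {
    private final Map<String, Integer> seen = new HashMap<>();

    // "Learns" by counting how often each term has been observed so far.
    synchronized int learn(String term) {
        return seen.merge(term, 1, Integer::sum);
    }
}

// Stand-in for a per-request processor: cheap to create, borrows the shared object.
class Processor {
    private final Classifier classifier;

    Processor(Classifier classifier) {
        this.classifier = classifier;
    }

    int process(String term) {
        return classifier.learn(term);
    }
}

// Stand-in for the factory: created once, so it owns the long-lived object.
class ProcessorFactory {
    private final Classifier shared = new Classifier();

    // Called many times; every processor it returns sees the same Classifier.
    Processor getInstance() {
        return new Processor(shared);
    }
}
```

Two processors obtained from the same factory are distinct objects, yet what one learns is visible to the other. Since requests can run concurrently, whatever the factory shares has to be thread-safe; here that is done by synchronizing learn.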
Re: Solr custom plugins: is it possible to have them persistent?
My problem is that I still don't understand where I have to put that singleton (or how I can load it into Solr). I have my singleton class Connector for MySQL, with all its methods defined. Now what? This is the point I'm missing :(
Re: Solr custom plugins: is it possible to have them persistent?
OK, so my two singleton classes are MysqlConnector and JFJPConnector. Basically:

1 - jar them
2 - cp them to /custom/path/within/solr/
3 - modify solrconfig.xml with a lib directive pointing to /custom/path/within/solr/

My two jars are then loaded automatically? Nice!

In my CustomUpdateProcessor class I can call MysqlConnector.start_query() and JFJPConnector.other_method(), and they will refer to an active instance of those two classes? Is this how it works, without any other trick?
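On whether static calls reach the same instance: a static singleton exists once per class loader, so every processor that sees the class through the same loader shares one object. A minimal sketch using the initialization-on-demand holder idiom follows. The class name MysqlConnector is borrowed from the post, but its body is hypothetical: a counter stands in for the JDBC connection so the example runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical singleton; a real MysqlConnector would wrap a JDBC connection, not a counter.
final class MysqlConnector {
    private final AtomicInteger queries = new AtomicInteger();

    private MysqlConnector() { }

    // The JVM initializes Holder lazily and exactly once, on first access.
    private static final class Holder {
        static final MysqlConnector INSTANCE = new MysqlConnector();
    }

    static MysqlConnector getInstance() {
        return Holder.INSTANCE;
    }

    // Counts calls so that the shared state is visible from every caller.
    int startQuery() {
        return queries.incrementAndGet();
    }
}
```

One caveat, as far as I know: jars loaded through a core's lib directive belong to that core's class loader, so several cores would each get their own copy of the "singleton".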
Re: Solr custom plugins: is it possible to have them persistent?
I think it's better for me to keep it under some Solr installation path; I don't want to lose files :) OK, I'm going to try this out :) I already ran into the "package" issue (my.package.whatever); that one I know how to handle! Thanks for all the help. I'll post again to tell you "It works!" (but I'm not sure about it!)
Re: Solr custom plugins: is it possible to have them persistent?
I think I have to drop the singleton class solution, since my boss wants to add two other Solr installations and I need to reuse the plugins I'm working on. So I'll have to use a connection pool, or I will cause hangs when the three cores update their indexes at the same time :(
Re: Solr custom plugins: is it possible to have them persistent?
Heh, you're right! This is what happens when you try to learn too many things at the same time! By the way, I found this: http://webdevelopersjournal.com/columns/connection_pool.html which is perfect. I can use the provided code as my singleton instance. Now I just have to figure out how to detect the end of the indexing operation so that I can close all the connections in the pool (the example shows a servlet using that singleton, and it defines a destroy method, but never uses it inside the servlet class).
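For reference, the pool idea can be sketched in a few lines. This is not the code from the linked article; it is a minimal fixed-size pool written for illustration, with hypothetical names (SimplePool, FakeConnection). In the real plugin the pooled type would be java.sql.Connection; a fake resource is used here so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal fixed-size pool (illustrative only).
final class SimplePool<T extends AutoCloseable> {
    private final BlockingQueue<T> idle;
    private final List<T> all = new ArrayList<>();

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            T resource = factory.get();
            all.add(resource);
            idle.add(resource);
        }
    }

    // Blocks until a resource is free, so concurrent indexers wait instead of failing.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    void giveBack(T resource) {
        idle.add(resource);
    }

    int idleCount() {
        return idle.size();
    }

    // Closes every resource the pool created; meant to be called once at shutdown.
    void shutdown() {
        for (T resource : all) {
            try {
                resource.close();
            } catch (Exception e) {
                System.err.println("close failed: " + e.getMessage());
            }
        }
        idle.clear();
    }
}

// Hypothetical stand-in for a database connection.
final class FakeConnection implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}
```

Typical use is borrow, query, giveBack in a finally block. Where exactly to call shutdown inside Solr is left open here; the sketch only assumes it happens once, when the plugin or the web application goes away.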
Re: Solr custom plugins: is it possible to have them persistent?
SEVERE: org.apache.solr.common.SolrException: Error Instantiating UpdateRequestProcessorFactory, ToTheGoCustom is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory

I'm getting this error, but I don't know how to fix it.

This is solrconfig.xml: ... ToTheGoCustom

And this is my class implementation:

    import java.io.IOException;

    import org.apache.solr.common.SolrInputDocument;
    /* Solr import */
    import org.apache.solr.request.SolrQueryRequest;
    //import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    class ToTheGoCustom extends UpdateRequestProcessor {

        public ToTheGoCustom(UpdateRequestProcessor next) {
            super(next);
        }

        // modification routine
        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();

            // salary from the document
            Object sal = doc.getFieldValue("salary");
            setSalary(doc, sal);

            // location from the document
            Object loc = doc.getFieldValue("location");
            Object cc = doc.getFieldValue("countrycode");
            setLocation(doc, loc, cc);

            // jobfield, jobposition from the document
            Object title = doc.getFieldValue("job_title");
            Object description = doc.getFieldValue("description");
            //setFieldPosition(doc,title,description);

            // return the modified document to the main handler
            super.processAdd(cmd);
        }

        /* stuff here, not dangerous */
    }

The file is called ToTheGoCustom.java, inside a NetBeans project called ToTheGoCustom, and it is built as the jar ToTheGoCustom.jar. I put it inside the lib folder of the Solr installation. I already did that once and it worked smoothly; I just added some methods and now it gives me that error. The only thing that may have changed is my editor, since I went through a format and reinstalled everything.

So I think I built the plugins in different ways (one working and one not, but I cannot recall the working one). What am I missing? Please be explicit; I'm close to giving up, this is too messy to even understand :(
Re: Solr custom plugins: is it possible to have them persistent?
Yay, I did it! I wasn't that far from the correct implementation; it was just a bit tricky to understand how to do it.

Now I've got a problem with my singleton class: I have DBConnectionManager.jar in a folder (the one named in the lib directive in solrconfig.xml), but at index time I get this error:

    Sep 1, 2011 10:21:28 AM org.apache.solr.common.SolrException log
    SEVERE: java.lang.NoClassDefFoundError: db/connection/DBConnectionManager
        at tothego_custom.ToTheGoCustom.<init>(ToTheGoCustom.java:23)
        at tothego_custom.ToTheGoCustomFactory.getInstance(ToTheGoCustomFactory.java:18)
        at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:74)
    Caused by: java.lang.ClassNotFoundException: db.connection.DBConnectionManager
        ... 20 more

I did exactly what you told me: I created the DBConnectionManager singleton, made the jar, put it inside a folder (the one in the lib directive), and added the lib directive to solrconfig.xml. In ToTheGoCustomFactory I have import db.connection.*; and no compile errors at all, but now Solr doesn't find that class. What am I missing this time? I think it's the last thing I need to understand now, hehe :)
Re: Solr custom plugins: is it possible to have them persistent?
ok solved it by changing () to

Thank you guys for all your help, now off to debug some Java errors, hehe. Thanks again, for real!
Re: Issue with Solr and copyFields
You need to define the "search" field as multiValued, since you're copying multiple sources into it: http://wiki.apache.org/solr/FAQ#How_do_I_use_copyField_with_wildcards.3F
Re: Solr custom plugins: is it possible to have them persistent?
Thanks, but this was not the point of the topic :) I'm much further along than this :) Please avoid random replies :)
Solr indexing plugin: skip single faulty document?
Hi all,

As far as I know, when Solr finds a faulty document (inside an XML file containing, let's say, 1000 docs), it skips the whole file and the indexing process exits with an exception (am I correct?).

I'm using a custom indexing plugin, and I can trap the exception. Instead of falling back to "default" values when that exception is raised, I would like to skip the document that raised the error, maybe log it somewhere, and move on to the next one. (Example: sometimes I try to insert a string into a "string" field, but Solr exits saying it's expecting a multiValued field. I guess it's because of some ASCII chars within the text, something like \n or similar.)

We're indexing millions of documents, and we don't care much if we lose 10-20% of them, so the best solution is to skip the single faulty doc and continue with the rest.

I guess I have to work on the super.processAdd() call, but I don't know where to find info about it. Can anybody help me? Is there a book about advanced Solr plugin development I could read?

Thanks!
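The skip-and-continue idea itself is small. The sketch below shows it in plain Java with hypothetical names (SkippingIndexer, indexAll); in the plugin, the equivalent would presumably be a try/catch around the super.processAdd(cmd) call, but whether catching there is sufficient depends on where Solr raises the error, which this example does not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Runs a per-document action and keeps going when a single document fails.
final class SkippingIndexer {
    private final List<String> skipped = new ArrayList<>();

    // Returns how many documents were handled without an exception.
    int indexAll(List<String> docs, Consumer<String> addOne) {
        int ok = 0;
        for (String doc : docs) {
            try {
                addOne.accept(doc);
                ok++;
            } catch (RuntimeException e) {
                // Record and log the faulty document, then move on to the next one.
                skipped.add(doc);
                System.err.println("skipping document " + doc + ": " + e.getMessage());
            }
        }
        return ok;
    }

    List<String> skipped() {
        return skipped;
    }
}
```

Keeping the list of skipped documents makes it possible to check afterwards whether the loss really stays within the tolerated 10-20%.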
Re: Solr indexing plugin: skip single faulty document?
Thanks Erik! I'll be reading that issue, it's pretty much everything I need!
Re: Solr indexing plugin: skip single faulty document?
OK, I'll surely check out what I can!
Regex replacement not working!
Hi, I have this bunch of lines in my schema.xml that should do a replacement, but it doesn't work! I need it to extract only the numbers from some other string. The strings can be anything: only letters (in which case the result should be an empty string) or letters + numbers. The numbers can be in one of these formats:

17000 --> ok
17,000 --> should be replaced with 17000
17.000 --> should be replaced with 17000
17k --> should be replaced with 17000

How can I accomplish this?
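Outside of schema.xml, the four formats above can be normalized with a short piece of Java. This is only a sketch of one possible approach, not the schema filter from the post; the class and method names are made up, and it assumes that "," and "." followed by three digits are always thousands separators, as in the examples.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SalaryNumbers {
    // Digits, optional groups of [.,] plus three digits, and an optional trailing "k".
    private static final Pattern NUMBER =
        Pattern.compile("(\\d+(?:[.,]\\d{3})*)(k)?", Pattern.CASE_INSENSITIVE);

    // Returns the first number in normalized form, or "" when the text contains none.
    static String firstNumber(String text) {
        Matcher m = NUMBER.matcher(text);
        if (!m.find()) {
            return "";
        }
        // Drop thousands separators, then expand a trailing "k" to three zeros.
        String digits = m.group(1).replaceAll("[.,]", "");
        return m.group(2) != null ? digits + "000" : digits;
    }
}
```

With this, "17000", "17,000", "17.000", and "17k" all come out as 17000, and a string without digits such as "Negotiable" yields an empty string.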
Re: Regex replacement not working!
This is the "final" version of my schema part, but what I get is this:

1.0 Negotiable Negotiable Negotiable
1.0 £7 to £8 per hour £7 to £8 per hour £7 to £8 per hour
1.0 £125 to £150 per day £125 to £150 per day £125 to £150 per day

which is not what I'm expecting. The regular expression works at http://www.fileformat.info/tool/regex.htm without any problem.
Re: Regex replacement not working!
Index Analyzer

org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_31}
position: 1
term text: £22000 - £25000 per annum + benefits
startOffset: 0
endOffset: 36

org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2, pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*, luceneMatchVersion=LUCENE_31}
position: 1
term text: 25000
startOffset: 0
endOffset: 36

This is my output for the field salary_max; it seems to be working from the admin JSP interface.
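The analysis output can be reproduced outside Solr with java.util.regex, using the pattern and replacement exactly as posted. The sketch assumes the filter replaces every match, which I believe is its default; the class name is made up. Keep in mind that an analyzer only changes the indexed terms, while the stored value returned by a query remains the original text.

```java
import java.util.regex.Pattern;

final class PostedPatternDemo {
    // Pattern copied from the PatternReplaceFilterFactory line in the analysis output.
    static final Pattern PATTERN =
        Pattern.compile("[^\\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*");

    // Mirrors replacement=$2: every match is replaced by its second capture group.
    static String apply(String value) {
        return PATTERN.matcher(value).replaceAll("$2");
    }
}
```

Applied to "£22000 - £25000 per annum + benefits" this returns 25000, matching the term text shown above. A value with fewer than two numbers, such as "Negotiable", does not match the pattern and comes back unchanged.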
Re: Regex replacement not working!
I have the string "You may earn 25k dollars per week" stored in the field "salary". I'm using two copyFields, "salary_min" and "salary_max", with "salary" as their source, and these field types:

salary is "text"
salary_min is "salary_min_text"
salary_max is "salary_max_text"

So I was expecting this:

1 - Solr updates its index
2 - Solr copies the value from salary to salary_min and applies the regex to it
3 - Solr copies the value from salary to salary_max and applies the regex to it

But it's not working: it copies the value from one field to the other, but the filter isn't applied, even though the filter itself works, as you could see.
Re: Regex replacement not working!
OK, but I'm not applying the filtering on the copyFields. This is how my schema looks: and the two datatypes defined before. That's why I thought I could first use copyField to copy the value and then index the copies with my two filtering datatypes.
Re: Regex replacement not working!
My goal is/was storing the value in the field, and I get that I have to create my own update handler for that. I tried a query with salary_min:[100 TO 200] and it actually works. Since I just need it for searching, I'll stay with this solution. Is the [100 TO 200] range a performance killer? I remember reading something about it, but cannot find it again.
Re: Regex replacement not working!
OK, last question on the UpdateProcessor: can you please give me the steps to implement my own? I mean, I can push my custom processor into Solr's code, and then what? I don't understand how I have to change solrconfig.xml and how I can bind that to the updater I just wrote, and I also don't understand how I have to change schema.xml. I'm sorry for this question, but I started working on Solr five days ago, and for some things I really need a lot of documentation, and this isn't fully covered anywhere.
Re: Regex replacement not working!
Too bad it is still a TODO; that's why I was asking for some tips on writing, compiling, registering, and calling it.
Re: any detailed tutorials on plugin development?
Actually, I'm rewriting this wiki page, http://wiki.apache.org/solr/UpdateRequestProcessor, with a more detailed how-to. It will be ready and online after I get back from work!
Solr indexing process: keep a persistent Mysql connection throu all the indexing process
I wrote a custom update handler for my Solr installation, using JDBC to query a MySQL database. Everything works fine: the updater queries the database, gets the data I need, and updates it in my documents. Fantastic! The only issue is that I have to open and close a MySQL connection for every document I read. Since we have something like 10kk indexed documents, I was thinking about opening a MySQL connection at the very beginning of the indexing process, keeping it stored somewhere, and using it inside my custom update handler. When the whole indexing process is complete, the connection should be closed. So, is this possible? Thanks all in advance!
Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process
Those documents are unrelated to the database. The database just stores countries, regions, and cities, and it's used to refine a specific Solr field. Example: a Solr field "thetext" with the content "Mary comes from London"; the update handler queries the database for Europe - Great Britain - London and writes those values to the correct fields. Isn't an update handler relative to a single document? At least, that's what I understood.
Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process
Since I'm fairly new to Solr, can you please give me some guidelines or provide an example I can look at for starters? I already thought about a singleton implementation, but I'm not sure where to put it or how to start coding it.