Solr custom plugins: is it possible to have them persistent?

2011-08-29 Thread samuele.mattiuzzo
I posted a similar question a few days ago, but our needs have grown since
then.

I need to develop two plugins which need to be persistent through the whole
indexing and updating process.

The first one needs to open a connection to a mysql instance (and query that
connection while processing every document).

The second one uses a java library (classifier4j), which is a Bayesian
categorization system (and doesn't talk to a db). This one learns while
matching, so the object needs to be created at the very beginning of the
indexing and should be available for all the documents (i cannot create the
object for every document, because i'd lose the learning feature).

For the first one i could use a dataimporthandler, but i'm not sure about
it: i don't need to import the whole db, just the rows matching a
particular condition for each document. About the second one, i have no idea.

Is there a place in solr where i can create the connection object and the
categorizer object before everything else, and have them available to all
documents?

Thanks all in advance!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3292781.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-29 Thread samuele.mattiuzzo
it's how i'm doing it now... but i'm not sure i'm placing the objects in
the right place

the significant part of my code is here: http://pastie.org/2448984

(i've omitted the method implementations since they are pretty long)

inside the method setLocation, i create the connection to the mysql database

inside the method setFieldPosition, i create the categorization object

Then i started thinking i was creating and deleting those objects locally
every time solr reads a document to index. So, where should i put them?
inside the tothegocustom class constructor, after the super call?

I'm asking this because i'm not sure if my custom updaterequestprocessor is
created once or for every document parsed (i'm still learning solr, but i
think i'm getting into it, bit by bit!)
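
On the "created once or for every document" question: in Solr the factory named in solrconfig.xml is instantiated once when the chain is loaded, and its getInstance method then builds a new processor for each update request. Long-lived objects therefore fit better in the factory than in the processor constructor. A minimal sketch of that split follows; Categorizer, SketchProcessor and SketchFactory are hypothetical stand-ins (not the real Solr or classifier4j classes) so the example runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the long-lived object (e.g. the learning categorizer).
class Categorizer {
    static final AtomicInteger CREATED = new AtomicInteger();
    private final AtomicInteger seen = new AtomicInteger();

    Categorizer() { CREATED.incrementAndGet(); }

    // Pretends to learn from a document; returns how many documents were seen so far.
    int learn(String text) { return seen.incrementAndGet(); }
}

// Stand-in for a per-request processor: cheap to create, borrows the shared object.
class SketchProcessor {
    private final Categorizer shared;

    SketchProcessor(Categorizer shared) { this.shared = shared; }

    int processAdd(String docText) { return shared.learn(docText); }
}

// Stand-in for the factory: built once, owns the shared object for its whole life.
class SketchFactory {
    private final Categorizer categorizer = new Categorizer();

    SketchProcessor getInstance() { return new SketchProcessor(categorizer); }
}
```

Every processor handed out by the same SketchFactory sees the same Categorizer, so what it learned from earlier documents is still there for later ones.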

Thanks again!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3292928.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-30 Thread samuele.mattiuzzo
my problem is i still don't understand where i have to put that singleton (or
how i can load it into solr)

i have my singleton class Connector for mysql, with all its methods defined.
Now what? This is the point i'm missing :(

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295320.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-30 Thread samuele.mattiuzzo
ok so my two singleton classes are MysqlConnector and JFJPConnector

basically:

1 - jar them
2 - cp them to /custom/path/within/solr/
3 - modify solrconfig.xml with /custom/path/within/solr/

my two jars are then automatically loaded? nice!

in my CustomUpdateProcessor class i can call MysqlConnector.start_query()
and JFJPConnector.other_method(), and they will refer to an active instance of
those 2 classes? Is this how it works, with no other tricks needed?
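
For static calls like these to reach one live object, the class has to create and hold its own single instance. A minimal sketch of one common way to do that in Java, the initialization-on-demand holder idiom, is below. ConnectorSketch and startQuery are hypothetical names; the real MysqlConnector from this thread is not shown anywhere, so this is only an illustration of the pattern.

```java
// Hypothetical connector; it only counts queries so the sketch needs no database.
final class ConnectorSketch {
    private int queries = 0;

    private ConnectorSketch() { }

    // Initialization-on-demand holder: the JVM creates INSTANCE once, on first use.
    private static final class Holder {
        static final ConnectorSketch INSTANCE = new ConnectorSketch();
    }

    static ConnectorSketch getInstance() { return Holder.INSTANCE; }

    // Synchronized because several indexing threads may call it at the same time.
    synchronized int startQuery() { return ++queries; }
}
```

One caveat: a static singleton is one instance per class loader, so whether several cores share it depends on where the jar is loaded from.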

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295818.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-30 Thread samuele.mattiuzzo
i think it's better for me to keep it under some solr installation path, i
don't want to lose files :)

ok, i'm going to try this out :) i already got into the "package" issue
(my.package.whatever) this one i know how to handle!

thanks for all the help, i'll post again to tell you "It Works!" (but i'm
not sure about it!)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295842.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-30 Thread samuele.mattiuzzo
i think i have to drop the singleton class solution, since my boss wants to
add 2 more solr installations and i need to reuse the plugins i'm
working on... so i'll have to use a connection pool or i will create hangs
when the 3 cores update their indexes at the same time :(

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3296627.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-31 Thread samuele.mattiuzzo
Eh eh you're right! This is what happens when you try to learn too many
things at the same time!

Btw, i found this:
http://webdevelopersjournal.com/columns/connection_pool.html which is
perfect, i can use the provided code as my singleton instance. Now i just
have to figure out how i can detect the end of the indexing operation so i can
close all the connections in the pool (the example shows a servlet using
that singleton, and they define a destroy method, but they never use it
inside the servlet class)
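
A minimal sketch of a pool with an explicit shutdown step is below. It is not the code from the linked article: SimplePool and FakeConnection are hypothetical names, and FakeConnection stands in for java.sql.Connection so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical stand-in for java.sql.Connection.
class FakeConnection implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() { closed = true; }
}

// Minimal fixed-size pool: borrow() blocks while every connection is in use.
class SimplePool<R extends AutoCloseable> {
    private final BlockingQueue<R> idle;
    private final List<R> all = new ArrayList<>();

    SimplePool(int size, Supplier<R> opener) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            R resource = opener.get();
            all.add(resource);
            idle.add(resource);
        }
    }

    R borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    void giveBack(R resource) { idle.offer(resource); }

    // Closes every connection the pool opened; meant to be called once, at shutdown.
    void shutdown() {
        for (R resource : all) {
            try {
                resource.close();
            } catch (Exception e) {
                // Best effort: keep closing the remaining connections.
            }
        }
        idle.clear();
    }
}
```

Since a running Solr server can receive updates at any time, one option is to leave the pool open and call shutdown() only when the plugin or the JVM goes away, for example from a hook registered with Runtime.getRuntime().addShutdownHook.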

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3297716.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-31 Thread samuele.mattiuzzo
SEVERE: org.apache.solr.common.SolrException: Error Instantiating
UpdateRequestProcessorFactory, ToTheGoCustom is not a
org.apache.solr.update.processor.UpdateRequestProcessorFactory

i'm getting this error, but i don't know how to fix it

this is solrconfig.xml:

[XML stripped by the archive; the only surviving value is the class name ToTheGoCustom]

and this is my class implementation

import java.io.IOException;

/* Solr imports */
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
//import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

class ToTheGoCustom extends UpdateRequestProcessor
{
    public ToTheGoCustom( UpdateRequestProcessor next) {
        super( next );
    }

    // modification routine
    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();

        // salary from the document
        Object sal = doc.getFieldValue( "salary" );
        setSalary(doc,sal);

        // location from the document
        Object loc = doc.getFieldValue( "location" );
        Object cc = doc.getFieldValue( "countrycode" );
        setLocation(doc,loc,cc);

        // jobfield, jobposition from the document
        Object title = doc.getFieldValue( "job_title" );
        Object description = doc.getFieldValue( "description" );
        //setFieldPosition(doc,title,description);

        // hand the modified document back to the main handler
        super.processAdd(cmd);
    }
    /* stuff here, not dangerous */
}

the file is called ToTheGoCustom.java, inside a NetBeans project called
ToTheGoCustom, and built as the jar ToTheGoCustom.jar
i put it inside the solr-installation lib folder. I already did that once,
and it worked smoothly; i just added some methods and it gave me that error.

The only thing that may have changed is my editor, since i reformatted my
machine and reinstalled everything... So i think i built the plugins in
different ways (one working and one not, but i cannot recall the working
one...)

what am i missing? please be explicit, i'm close to giving up, this is too
messy to even understand :(
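
The error message says the configured class must be an UpdateRequestProcessorFactory, while the class above extends UpdateRequestProcessor, which is a different type. The sketch below shows that distinction with hypothetical stand-in types (ProcessorBase, FactoryBase and the rest are not the real Solr classes) and a type check similar in spirit to what a plugin loader does. It is an illustration, not Solr's actual loading code.

```java
// Stand-ins that mirror the shape of Solr's two base classes (not the real ones).
abstract class ProcessorBase { }

abstract class FactoryBase {
    abstract ProcessorBase getInstance(ProcessorBase next);
}

// What the post above has: a processor only.
class CustomProcessor extends ProcessorBase {
    CustomProcessor(ProcessorBase next) { }
}

// The missing piece: a factory that hands out the processor.
class CustomProcessorFactory extends FactoryBase {
    @Override
    ProcessorBase getInstance(ProcessorBase next) { return new CustomProcessor(next); }
}

class PluginCheck {
    // Mimics the kind of type check a plugin loader performs on a configured class.
    static boolean isFactory(Class<?> configured) {
        return FactoryBase.class.isAssignableFrom(configured);
    }
}
```

Under that reading, solrconfig.xml would name the factory class, and the factory would return the processor from getInstance.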

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3298850.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-09-01 Thread samuele.mattiuzzo
yay i did it! i wasn't that far away from the correct implementation, it was
just a bit tricky to understand how to...
now i've got a problem with my singleton class:

i have DBConnectionManager.jar inside a folder (the one named in the lib
directive in solrconfig.xml) but at index time i get this error:


Sep 1, 2011 10:21:28 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: db/connection/DBConnectionManager
    at tothego_custom.ToTheGoCustom.<init>(ToTheGoCustom.java:23)
    at tothego_custom.ToTheGoCustomFactory.getInstance(ToTheGoCustomFactory.java:18)
    at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:74)

Caused by: java.lang.ClassNotFoundException: db.connection.DBConnectionManager

... 20 more

I did exactly as you told me: i created the DBConnectionManager singleton,
made the jar, put it inside a folder (the one in the lib directive) and added
the lib directive in solrconfig.xml.

In ToTheGoCustomFactory i have import db.connection.*; and no errors at all,
but now solr doesn't find that class... what am i missing this time? i think
it's the last thing i need to understand now hehe :)
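
An import that compiles only proves the jar was on the build path; at index time the class also has to be visible to the class loader that loaded the plugin. A small probe like the one below, logged from the plugin at startup, can show which of the two is the problem. ClasspathProbe is a hypothetical helper name; Class.forName is the standard Java call.

```java
class ClasspathProbe {
    // True if the named class can be found by the loader that loaded this class.
    static boolean isLoadable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

For example, ClasspathProbe.isLoadable("db.connection.DBConnectionManager") returning false from inside the plugin would suggest the jar is not in a directory Solr actually loads.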

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3300614.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-09-01 Thread samuele.mattiuzzo
ok solved it by changing () to 



ty guys for all your help, now off to debug some java errors hehe

thanks again, for real!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3300629.html


Re: Issue with Solr and copyFields

2011-09-01 Thread samuele.mattiuzzo
you need to define the "search" field as multiValued since you're copying
multiple sources into it

http://wiki.apache.org/solr/FAQ#How_do_I_use_copyField_with_wildcards.3F



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-Solr-and-copyFields-tp3300763p3300794.html


Re: Solr custom plugins: is it possible to have them persistent?

2011-09-01 Thread samuele.mattiuzzo
Thanks, but this was not the point of the topic :) I'm way further along than
this :) Please, avoid random replies :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3301057.html


Solr indexing plugin: skip single faulty document?

2011-10-17 Thread samuele.mattiuzzo
Hi all, as far as i know, when solr finds a faulty document (inside an xml
file containing, let's say, 1000 docs) it skips the whole file and the indexing
process exits with an exception (am i correct?)

I'm using a custom indexing plugin, and i can trap the exception. Instead of
using "default" values when that exception is raised, i would like to skip the
document raising the error, maybe log it somewhere, and move on to the next one.
(Example: sometimes i try to insert a string into a "string" field, but solr
exits saying it's expecting a multiValued field... i guess it's because of some
ascii chars within the text, something like \n or the like.)
We're indexing millions of them, and we don't care much if we lose 10-20%
of them, so the best solution is to skip the single faulty doc and continue
with the rest.

I guess i have to work on the super.processAdd() call, but i don't know
where i can find info about it. Can anybody help me? Is there a book on
advanced solr plugin development i could read?
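
The idea of wrapping the super.processAdd() call can be sketched as below. SkippingIndexer is a hypothetical stand-in, not a Solr class: downstreamAdd plays the role of the rest of the chain, and a newline in the text is used as a made-up trigger for the failure.

```java
import java.util.ArrayList;
import java.util.List;

class SkippingIndexer {
    final List<String> indexed = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    // Stand-in for the rest of the chain (super.processAdd in a real processor).
    private void downstreamAdd(String doc) {
        if (doc.contains("\n")) {
            throw new IllegalArgumentException("bad field value");
        }
        indexed.add(doc);
    }

    // Trap the failure for one document, record it, and carry on with the next.
    void processAdd(String doc) {
        try {
            downstreamAdd(doc);
        } catch (RuntimeException e) {
            skipped.add(doc);
        }
    }
}
```

This only helps for errors raised at or after the processor. A failure that happens earlier, such as a malformed xml file, never reaches processAdd, so it cannot be trapped there.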

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-plugin-skip-single-faulty-document-tp3427646p3427646.html


Re: Solr indexing plugin: skip single faulty document?

2011-10-24 Thread samuele.mattiuzzo
Thanks Erik! I'll be reading that issue, it's pretty much everything i need!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-plugin-skip-single-faulty-document-tp3427646p3447400.html


Re: Solr indexing plugin: skip single faulty document?

2011-10-24 Thread samuele.mattiuzzo
Ok i'll surely check out what i can!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-plugin-skip-single-faulty-document-tp3427646p3447537.html


Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
Hi, i have this bunch of lines in my schema.xml that should do a replacement
but it doesn't work!

[schema.xml snippet stripped by the archive]

I need it to extract only the numbers from some other string. The strings
can be anything: only letters (in which case the result should be an empty
string) or letters + numbers. The numbers can be in one of these formats:

17000 --> ok
17,000 --> should be replaced with 17000
17.000 --> should be replaced with 17000
17k --> should be replaced with 17000

how can i accomplish this? 
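
The four cases above, plus the letters-only case, can be handled in plain Java as in the sketch below. SalaryNormalizer is a hypothetical helper, not a Solr filter, and it assumes "." and "," only ever act as thousands separators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SalaryNormalizer {
    // Digits with optional ,/. thousands groups, optionally followed by a standalone "k".
    private static final Pattern NUMBER =
            Pattern.compile("(\\d+(?:[.,]\\d{3})*)(k\\b)?", Pattern.CASE_INSENSITIVE);

    // Returns the first number found in the text as plain digits, or "" if there is none.
    static String normalize(String raw) {
        Matcher m = NUMBER.matcher(raw);
        if (!m.find()) {
            return "";
        }
        String digits = m.group(1).replaceAll("[.,]", "");
        if (m.group(2) != null) {
            digits = digits + "000"; // "17k" means 17 thousand
        }
        return digits;
    }
}
```

A value with a decimal part such as "17.5" comes out as "17", because a separator counts only when exactly three digits follow it.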

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3120748.html


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
[schema.xml fieldType definitions stripped by the archive]

this is the "final" version of my schema part, but what i get is this:



1.0
Negotiable
Negotiable
Negotiable


1.0
£7 to £8 per hour
£7 to £8 per hour
£7 to £8 per hour


1.0
£125 to £150 per day
£125 to £150 per day
£125 to £150 per day


which is not what i'm expecting... the regular expression works in
http://www.fileformat.info/tool/regex.htm without any problem

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121055.html


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
Index Analyzer

org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_31}
position     1
term text    £22000 - £25000 per annum + benefits
startOffset  0
endOffset    36

org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2, pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*, luceneMatchVersion=LUCENE_31}
position     1
term text    25000
startOffset  0
endOffset    36

this is my output for the field salary_max, it seems to be working from the
admin jsp interface

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121353.html


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
i have the string "You may earn 25k dollars per week" stored in the field
"salary"

i'm using 2 copyfields "salary_min" and "salary_max" with source in "salary"
with those 2 datatypes 

salary is "text"
salary_min is "salary_min_text"
salary_max is "salary_max_text"

so, i was expecting this:

1 - solr updates its index
2 - solr copies the value from salary to salary_min and applies the regex to it
3 - solr copies the value from salary to salary_max and applies the regex to it

but it's not working: it copies the value from one field to another, but the
filter isn't applied, even though the filter itself works, as the analysis
output i posted shows


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121386.html


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
ok, but i'm not applying the filtering on the copyfields.
this is how my schema looks:

[field and copyField definitions stripped by the archive]

and the two datatypes defined before. that's why i thought i could first use
"copyField" to copy the value and then index the copies with my two filtering
datatypes...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
my goal is/was storing the value into the field, and i get that i have to
write my own update handler for that.

i was trying a query with salary_min:[100 TO 200] and it's actually
working... since i just need it for searching, i'll stay with this solution

is the [100 TO 200] range a performance killer? i remember reading something
about it somewhere, but cannot find it again...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121625.html


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
ok, last question on the UpdateProcessor: can you please give me the steps to
implement my own?
i mean, i can push my custom processor into solr's code, and then what?
i don't understand how i have to change solrconfig.xml and how i can bind
that to the updater i just wrote
and i also don't understand how i have to change schema.xml

i'm sorry for this question, but i started working on solr 5 days ago and
for some things i really need a lot of documentation, and this isn't fully
covered anywhere
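
In outline, the usual route is: write a processor class and a matching factory class, package them as a jar that Solr can load, declare an update processor chain in solrconfig.xml that names the factory, and point the update handler at that chain; schema.xml only has to define the fields the processor fills in. The body of such a processor could look like the sketch below. SalaryRangeSketch is a hypothetical stand-in in which a Map plays the role of the Solr document, and the field names are the ones used in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SalaryRangeSketch {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Stand-in for processAdd: reads "salary" and fills "salary_min" / "salary_max".
    static void processAdd(Map<String, Object> doc) {
        Object salary = doc.get("salary");
        if (salary == null) {
            return;
        }
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(salary.toString());
        while (m.find()) {
            found.add(m.group());
        }
        if (found.isEmpty()) {
            return; // e.g. "Negotiable": leave both fields unset
        }
        doc.put("salary_min", found.get(0));
        doc.put("salary_max", found.get(found.size() - 1));
    }
}
```

Because the values are written into the document before it is indexed, they end up in the stored fields, which an analysis filter on a copyField target does not do.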

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
too bad it is still in todo, that's why i was asking for some tips on
writing, compiling, registration, calling...


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121856.html


Re: any detailed tutorials on plugin development?

2011-07-20 Thread samuele.mattiuzzo
actually i'm rewriting the wiki page
http://wiki.apache.org/solr/UpdateRequestProcessor
with a more detailed how-to, it will be ready and online
after i get back from work!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-detailed-tutorials-on-plugin-development-tp3177821p3184990.html


Solr indexing process: keep a persistent Mysql connection throu all the indexing process

2011-08-23 Thread samuele.mattiuzzo
I wrote my custom update handler for my solr installation, using jdbc to
query a mysql database. Everything works fine: the updater queries the db,
gets the data i need and updates my documents with it! Fantastic!

The only issue is that i have to open and close a mysql connection for every
document i read. Since we have something like 10 million indexed documents, i
was thinking about opening a mysql connection at the very beginning of the
indexing process, keeping it stored somewhere and using it inside my custom
update handler. When the whole indexing process is complete, the connection
should be closed.

Is this possible?

Thanks all in advance!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3278608.html


Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process

2011-08-23 Thread samuele.mattiuzzo
those documents are unrelated to the database. the db i have is just storing
countries - region - cities, and it's used to do a refinement on a specific
solr field

example:

solrField "thetext" with content "Mary comes from London"

updateHandler polls the database for europe - great britain - london and
updates those values to the correct fields

isn't an update handler tied to a single document? at least, that's what
i understood...
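
The refinement described in the example can be sketched as below. LocationLookupSketch is a hypothetical name, and the in-memory map with its single made-up row stands in for the mysql table, so the example runs without a database.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class LocationLookupSketch {
    // Stand-in for the mysql table: city -> {continent, country}. The row is sample data.
    private static final Map<String, String[]> CITIES = new LinkedHashMap<>();
    static {
        CITIES.put("london", new String[] {"europe", "great britain"});
    }

    // Returns {continent, country, city} for the first known city in the text, or null.
    static String[] locate(String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            String[] row = CITIES.get(word);
            if (row != null) {
                return new String[] {row[0], row[1], word};
            }
        }
        return null;
    }
}
```

In a real processor the map lookup would be a query against the shared connection, and the three values would be written into the matching document fields.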

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3279765.html


Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process

2011-08-25 Thread samuele.mattiuzzo
since i'm pretty new to solr, can you please give me some guidelines or provide
an example i can look at for starters?

i already thought about a singleton implementation, but i'm not sure where i
have to put it and how i should start coding it

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3283901.html