Greek and English text into the same field

2011-03-17 Thread abiratsis
Hello everyone,

I have an index that contains text (several fields) that can be in English or
in Greek. I have found the corresponding filters

solr.GreekLowerCaseFilterFactory
solr.GreekStemFilterFactory

for the Greek language, along with the special type text_greek included in
the default schema.xml file, but I need to know whether I can use them with
the existing filters for a text field (i.e. embed them in the existing
configuration for English).

So my 1st question is whether I can simply add these two filters to the existing
field types, or is extra configuration needed?

And the 2nd question is about how to handle the Greek synonyms and stopwords:
should I simply add another solr.SynonymFilterFactory filter to the existing
configuration? Should I merge both files (English and Greek) together?

Basically, I don't know what the best approach is for handling a multilingual
case like mine, e.g. should I create a separate index for each language?
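
To make the per-language option concrete, here is a minimal sketch of deciding per value whether a text is mostly Greek or mostly Latin-script, so it could be sent to a language-specific field or index. Everything in it is an assumption for illustration: the class ScriptRouter, the method fieldFor and the field names text_el and text_en are hypothetical and do not come from the Solr distribution. It uses only the JDK's Unicode script data.

```java
import java.lang.Character.UnicodeScript;

public class ScriptRouter {
    // Hypothetical field names; a real schema.xml would define its own.
    static final String GREEK_FIELD = "text_el";
    static final String ENGLISH_FIELD = "text_en";

    /** Picks a field by counting Greek versus Latin code points in the text. */
    static String fieldFor(String text) {
        int greek = 0;
        int latin = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.GREEK) {
                greek++;
            } else if (script == UnicodeScript.LATIN) {
                latin++;
            }
            i += Character.charCount(cp);
        }
        // Ties and texts without letters fall back to the English field.
        return greek > latin ? GREEK_FIELD : ENGLISH_FIELD;
    }

    public static void main(String[] args) {
        // "Kalimera" written in Greek letters, as Unicode escapes.
        String greekSample = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1";
        System.out.println(fieldFor(greekSample));          // text_el
        System.out.println(fieldFor("Good morning world")); // text_en
    }
}
```

A value that mixes both languages heavily would still end up in a single field with this sketch, which is exactly the case where a shared field with both filter sets might be preferable.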

Any suggestions appreciated...

Thanx,
Alex




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Greek-and-English-text-into-the-same-field-tp2696186p2696186.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Greek and English text into the same field

2011-03-18 Thread abiratsis
OK, thanx a lot guys. One last question: is there any need to download and
embed the stopwords and synonyms files, or does solr.war already contain them?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Greek-and-English-text-into-the-same-field-tp2696186p2697795.html


How to get stopwords and synonyms files for several languages

2011-03-18 Thread abiratsis
Hello everyone,

I am developing a multilingual index, so I need support for several
languages. I need answers to the following questions:

1. Which steps should I follow in order to get (download) the
stopwords and synonyms files for several languages?

2. Is there any site containing them?

3. Do I need to download them somehow, or are they already embedded in
solr.war?

Thanx,
Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-stopwords-and-synonyms-files-for-several-lanuages-tp2698494p2698494.html


Re: How to get stopwords and synonyms files for several languages

2011-03-18 Thread abiratsis
OK thanx Markus, it is clear enough now.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-stopwords-and-synonyms-files-for-several-lanuages-tp2698494p2698566.html


Re: How to get stopwords and synonyms files for several languages

2011-03-18 Thread abiratsis
Actually, I have one more question. By saying that "Synonyms largely depend
on what you're indexing", do you mean that I probably need to implement a
mechanism for handling synonyms? If so, do you have any suggestions on how
to implement it?

Thanx,
Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-stopwords-and-synonyms-files-for-several-lanuages-tp2698494p2698593.html


Solr scheduling doesn't fire

2011-05-15 Thread abiratsis
Hello everyone, I am trying to use DIH in Solr 3.1 with scheduling, but it never
fires. Here is my dataimport.properties file:

#Mon May 16 02:43:35 CEST 2011
last_index_time=2011-05-16 02\:43\:35
element.last_index_time=2011-05-16 02\:43\:35
syncCores=
server=localhost
port=8080
webapp=solr
params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true
interval=1
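
For debugging, it can help to see which URL these settings describe and to request it by hand. The sketch below is only an illustration under one assumption: that the scheduler concatenates server, port, webapp and params into a single HTTP URL when syncCores is empty. The class and method names (SchedulerUrl, parse, buildUrl) are hypothetical and are not part of Solr.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SchedulerUrl {
    /** Parses properties text the same way the JDK reads a .properties file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return props;
    }

    /** Assumed composition: http://<server>:<port>/<webapp><params>. */
    static String buildUrl(Properties props) {
        return "http://" + props.getProperty("server")
                + ":" + props.getProperty("port")
                + "/" + props.getProperty("webapp")
                + props.getProperty("params");
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "last_index_time=2011-05-16 02\\:43\\:35",
                "syncCores=",
                "server=localhost",
                "port=8080",
                "webapp=solr",
                "params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true",
                "interval=1");
        Properties props = parse(config);
        // The escaped colons in the file are unescaped on load.
        System.out.println(props.getProperty("last_index_time"));
        System.out.println(buildUrl(props));
    }
}
```

If requesting the printed URL manually starts a delta import, that would suggest the property values are fine and the scheduler itself is not being started.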

Has anybody faced a similar issue?

Thanx,
Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-Scheduling-doesnt-fire-tp2946016p2946016.html


Tika parser doesn't seem to work with Solr DIH Row Transformer

2011-07-07 Thread abiratsis
Hello there, I am using DIH to import data from a MySQL db and a
directory. For this purpose I have written my own Transformer class in order
to modify imported values in several cases. Now we need to add document
support to our indexing server, and that led us to use Tika in order to
import the documents' content. My index server contains data for the following
objects:
 
* Bookmarks

* Courses

* Files (here I need to use Tika)


All of the above elements share some common properties such as Id, Title,
Description and Text. Also, all the needed data is stored in the database, and
that's why we decided to use a single DIH mechanism to import all
these elements into the Solr index. Of course, in the case of the files I need
to read their content.

So I have written something similar to the following code in order to handle
the documents' content:


// Each file is downloaded first using FTP.
FTPClient ftpClient = new FTPClient();
ftpClient.connect("FTPServer");
ftpClient.login("uname", "pass");
File localFile = new File("/tmp/" + fileName);
ftpClient.download("/repos/files/original/" + fileName, localFile);

// Let Tika detect the document type and extract its text and metadata.
InputStream input = new FileInputStream(localFile);
ContentHandler textHandler = new BodyContentHandler(-1); // -1 disables the write limit
Metadata metadata = new Metadata();

AutoDetectParser parser = new AutoDetectParser();
try {
    parser.parse(input, textHandler, metadata);
} catch (IOException ex) {
    Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
} catch (SAXException ex) {
    Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
} catch (TikaException ex) {
    Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
} finally {
    input.close();
}

// Copy the extracted content into the DIH row.
row.put("text", textHandler.toString());
row.put("title", metadata.get("title"));


This code lives in the transformRow method that my class overrides.
The problem is that when I run the same code from a main class it
executes normally, but when I move it into the transformRow
method, textHandler.toString() returns no text and the metadata is empty as well.
Also, no exception is thrown!
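
Since nothing is thrown, one cheap thing to rule out is that the download step leaves a missing or empty file when the code runs inside the servlet container. The following is a minimal sketch using only the JDK; the class DownloadCheck and the method describe are hypothetical names, and the idea is simply to log the result just before handing the file to the parser.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class DownloadCheck {
    /** Returns a one-line summary of the file's state, suitable for a log message. */
    static String describe(File file) {
        if (!file.exists()) {
            return file.getPath() + ": missing";
        }
        if (!file.canRead()) {
            return file.getPath() + ": not readable";
        }
        return file.getPath() + ": " + file.length() + " bytes";
    }

    public static void main(String[] args) throws IOException {
        // A file that was never created.
        System.out.println(describe(new File("no-such-file-dih-check")));

        // A temporary file with five bytes of content.
        File tmp = File.createTempFile("dih-check", ".txt");
        try (FileWriter writer = new FileWriter(tmp)) {
            writer.write("hello");
        }
        System.out.println(describe(tmp));
        tmp.delete();
    }
}
```

A "missing" or "0 bytes" line in the Solr log would point at the FTP step rather than at the parser.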

Has anyone faced something similar in the past?

Thanks a lot

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tika-parser-doesn-t-seem-to-work-with-Solr-DIH-Row-Transformer-tp3148853p3148853.html