Greek and English text into the same field
Hello everyone,

I have an index that contains text (several fields) that can be in English or in Greek. For Greek I have found the corresponding filters, solr.GreekLowerCaseFilterFactory and solr.GreekStemFilterFactory, along with the special type text_greek that is included in the default schema.xml file. What I need to know is whether I can use them together with the existing filters for a text field, that is, embed them in the existing configuration for English.

So my first question is: can I simply add these two filters to the existing field types, or is extra configuration needed?

The second question is about how to handle the Greek synonyms and stopwords. Should I simply add another solr.SynonymFilterFactory filter to the existing configuration? Should I merge both files (English and Greek) together?

Basically, I don't know what the best approach is for handling a multilingual case like mine. For example, should I create a separate index for each language?

Any suggestions appreciated.

Thanks,
Alex

-- View this message in context: http://lucene.472066.n3.nabble.com/Greek-and-English-text-into-the-same-field-tp2696186p2696186.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Greek and English text into the same field
OK, thanks a lot, guys. One last question: do I need to download and add the stopword and synonym files myself, or does solr.war already contain them?
How to get stopwords and synonyms files for several languages
Hello everyone,

I am developing a multilingual index, so I need support for several languages. I need answers to the following questions:

1. Which steps should I follow to get (download) the stopword and synonym files for several languages?
2. Is there a site that hosts them?
3. Do I have to download them at all, or are they already bundled in solr.war?

Thanks,
Alex
Re: How to get stopwords and synonyms files for several languages
OK, thanks Markus, it is clear enough now.
Re: How to get stopwords and synonyms files for several languages
Actually, I have one more question. By saying that "Synonyms largely depend on what you're indexing", do you mean that I probably need to implement my own mechanism for handling synonyms? If so, do you have any suggestions on how to implement it?

Thanks,
Alex
Solr scheduling doesn't fire
Hello everyone,

I am trying to use DIH in Solr 3.1 with scheduling, but it never fires. Here is my dataimport.properties file:

    #Mon May 16 02:43:35 CEST 2011
    last_index_time=2011-05-16 02\:43\:35
    element.last_index_time=2011-05-16 02\:43\:35
    syncCores=
    server=localhost
    port=8080
    webapp=solr
    params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true
    interval=1

Has anybody faced a similar issue?

Thanks,
Alex
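One way to narrow this down is to request by hand the URL that the settings above describe and see whether a delta-import starts. The sketch below only uses the standard library. It assumes the scheduler builds its request URL by concatenating server, port, webapp and params in that order, which is an assumption about the scheduler's internals and is not confirmed in this thread. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Solr.
public class SchedulerUrlCheck {

    // Parses properties-file text such as the content of dataimport.properties.
    public static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // ASSUMPTION: the scheduler concatenates the keys in this order.
    public static String buildUrl(Properties p) {
        return "http://" + p.getProperty("server") + ":" + p.getProperty("port")
                + "/" + p.getProperty("webapp") + p.getProperty("params");
    }

    public static void main(String[] args) {
        // Scheduler-related keys copied from the file shown above.
        String text = String.join("\n",
                "server=localhost",
                "port=8080",
                "webapp=solr",
                "params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true",
                "interval=1");
        System.out.println(buildUrl(parse(text)));
    }
}
```

If opening the printed URL in a browser triggers the import, the settings themselves are probably fine and the problem is more likely in how the scheduler is registered or started.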
Tika parser doesn't seem to work with Solr DIH Row Transformer
Hello there,

I am using DIH to import data from a MySQL database and a directory. For this purpose I have written my own Transformer class, which modifies imported values in several cases. Now we need to add document support to our indexing server, and that led us to use Tika to import the documents' content.

My index server contains data for the following objects:

* Bookmarks
* Courses
* Files (here I need to use Tika)

All of these elements share some common properties, such as Id, Title, Description and Text. All the needed data is stored in the database, which is why we decided to use a single DIH mechanism to import all these elements into the Solr index. Of course, in the case of the files I also need to read their content, so I have written something similar to the following code to handle it:

    // Each file is downloaded first using FTP.
    FTPClient ftpClient = new FTPClient();
    ftpClient.connect("FTPServer");
    ftpClient.login("uname", "pass");
    File localFile = new File("/tmp/" + fileName);
    ftpClient.download("/repos/files/original/" + fileName, localFile);

    // Extract the text and metadata with Tika.
    InputStream input = new FileInputStream(localFile);
    ContentHandler textHandler = new BodyContentHandler(-1); // -1 disables the write limit
    Metadata metadata = new Metadata();
    AutoDetectParser parser = new AutoDetectParser();
    try {
        parser.parse(input, textHandler, metadata);
    } catch (IOException ex) {
        Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
    } catch (SAXException ex) {
        Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
    } catch (TikaException ex) {
        Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
    } finally {
        input.close();
    }

    // Copy the extracted values into the DIH row.
    row.put("text", textHandler.toString());
    row.put("title", metadata.get("title"));

This code lives in the transformRow method that my class overrides.
The problem is that when I run the same code from a main class it executes normally, but when I move it into the transformRow method, textHandler.toString() returns no text and the metadata is empty as well. No exception is thrown either!

Has anyone faced something similar in the past?

Thanks a lot