Re: indexing entire text but only storing first N characters?

2009-02-19 Thread revathy arun
I have also done this, and I used two separate fields like the ones you mentioned.
On 2/19/09, Mike Topper wrote:
>
> Hello,
>
> In one of the fields in my schema I am sending somewhat large texts. I
> want to be able to index all of it since I want to search on the entire
> text, but I only need

Re: utf 8 issue

2009-02-18 Thread revathy arun
Hi Erik,
$post_string is XML data. I don't see any content for those files when I query *:*. What would that mean?
On 2/19/09, Erik Hatcher wrote:
>
> On Feb 18, 2009, at 1:53 PM, revathy arun wrote:
>
>> I am using php curl to post data to solr
>>
>> containe

Re: utf 8 issue

2009-02-18 Thread revathy arun
R, $header );
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
...
$data = curl_exec($ch);
...
However, the document I am sending does not seem to have UTF-8 encoding.
Regards
On 2/18/09, Erik Hatcher wrote:
>
> On Feb 18, 2009, at 7:34 AM, revathy ar

utf 8 issue

2009-02-18 Thread revathy arun
Hi, I am trying to index documents in various languages (foroyo, Chinese, Japanese). These have been converted from PDF to text using xpdf. I am using the standard analyzer for content analysis, but I am not able to search anything from some of the files. My guess is that these documents are not in utf-

solr 1.3 analyzers

2009-02-18 Thread revathy arun
Hi, in Solr 1.3 under src/classes/java/analyzers I see only the following language-specific tokenizers: chinesetokenizer, cjktokenizer, russiantokenizer. I see filter factories for other languages like Dutch, French, Brazilian, etc., but no tokenizers. In this scenario are we supposed to use the s

Re: multicore

2009-02-18 Thread revathy arun
?
Regards
On 2/18/09, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
> there are no limits. It must be Integer.MAX_VALUE
>
> the limits are usually decided by the number of file handles the
> system can open and the amount of RAM and CPU you may have
>
> On Wed, Feb 18, 2009 at 2:15 PM, revathy arun

multicore

2009-02-18 Thread revathy arun
Is there any known limit to the number of cores that can be created in a single webapp? What are the possible limiting factors? Regards

Re: Multilanguage

2009-02-17 Thread revathy arun
ementation is at the URL below my name.
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> From: revathy arun
> To: solr-user@lucene.apache.org
> Sent: Tuesday, February 17, 2009 6:39:40 PM
> Subj

Re: Multilanguage

2009-02-17 Thread revathy arun
Does Apache Tika help find the language of the given document?
On 2/17/09, Till Kinstler wrote:
>
> Paul Libbrecht wrote:
>
>> Clearly, then, something that matches words in a dictionary and decides on
>> the language based on the language of the majority could do a decent job to
>> decide the

Distributed search

2009-02-16 Thread revathy arun
Hi, can we use multicore to have several indexes per webapp and use distributed search to merge the indexes? For example, if we have 3 cores (core0, core1 and core2) for 3 different languages, and to search across all 3 indexes we use the shards parameter as shards=localhost:8080/solr/core0,localhost:

indexing Chinese language

2009-02-16 Thread revathy arun
Hi, when I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Since Chinese has many different language subtypes, such as standard Chinese, simplified Chinese, etc., which of these does the Chinese tokenizer

Multilanguage

2009-02-15 Thread revathy arun
Hi, I have a scenario where I need to convert PDF content to text and then index it at run time. I do not know what language the PDF will be in. In this case, what is the best solution with respect to the content field type in the schema where the text content would be indexed? Th

Re: multilanguage prototype

2009-01-27 Thread revathy arun
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Thread.java:619)
regards
On 1/28/09, revathy arun wrote:
>
> Hi,
>
> This is the only info in th

Re: multilanguage prototype

2009-01-27 Thread revathy arun
r log during indexing?
>
> Erik
>
> On Jan 27, 2009, at 6:56 AM, revathy arun wrote:
>
>> Hi Shalin,
>>
>> The admin page stats are as follows
>> searcherName : searc...@1d4c3d5 main
>> caching : true
>> numDocs : 0
>> maxDoc : 0
>>
>> *na

Re: multilanguage prototype

2009-01-27 Thread revathy arun
rned?
>
> On Tue, Jan 27, 2009 at 5:01 PM, revathy arun wrote:
>
> > this is the stats of my updatehandler
> > but i still dont see any index created
> > *stats: *commits : 7
> > autocommits : 0
> > optimizes : 2
> > docsPending : 0
> > adds :

Re: multilanguage prototype

2009-01-27 Thread revathy arun
regards
On 1/27/09, revathy arun wrote:
>
> Hi
>
> I have committed. The admin page does not show any docs pending or committed
> or any errors.
>
> Regards
> Sujatha
>
> On 1/27/09, Shalin Shekhar Mangar wrote:
>>
>> Did you commit after the upda

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi
I have committed. The admin page does not show any docs pending or committed or any errors.
Regards
Sujatha
On 1/27/09, Shalin Shekhar Mangar wrote:
>
> Did you commit after the updates?
>
> 2009/1/27 revathy arun
>
> > Hi,
> >
> > I have downloaded so

multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I have downloaded Solr 1.3.0. I need to index Chinese content; for this I have defined a new field in the schema as I believe Solr 1.3 already has the cjkanalyzer by default. My schema in the testing stage has only 2 fields However when I index the Chinese text into