Re: Indexing TIKA extracted text. Are there some issues?

2009-07-29 Thread ashokc
> > On Wed, Jul 29, 2009 at 6:02 PM, ashokc wrote: >> >> Sure. >> >> The java command I use with TIKA to extract text from a URL is: >> >> java -jar tika-0.3-standalone.jar -t $url >> >> I have also attached the screenshots of the web page, post d

Re: Indexing TIKA extracted text. Are there some issues?

2009-07-29 Thread ashokc
tp://www.nabble.com/file/p24728917/china.tika.xml china.tika.xml Grant Ingersoll-6 wrote: > > Hmm, looks very much like an encoding problem. Can you post a sample > showing it, along with the commands you invoked? > > Thanks, > Grant > > On Jul 28, 2009, at 6:14 PM, ash

Indexing TIKA extracted text. Are there some issues?

2009-07-28 Thread ashokc
I am finding that the search results based on indexing Tika extracted text are very different from results based on indexing the text extracted via other means. This shows up, for example, with a Chinese web site that I am trying to index. I created the documents (for posting to SOLR) in two ways.

Re: CJKTokenizerFactory seems to work for Korea but not for China and Japan

2009-07-01 Thread ashokc
Yes, I reindexed the entire repository after each of my changes. Here is the output with debug on. == DEBUG OUTPUT BEGIN == 0 83 standard 10 0 content on *,score on 创意或商业创新、 on dismax 2.2 创意或商业创新、 创意或商业创新、 +Disjunct

CJKTokenizerFactory seems to work for Korea but not for China and Japan

2009-06-30 Thread ashokc
Hi I have the following fieldType that processes Korean/Chinese/Japanese text When I supply Korean words/phrases in the query, I do get several expected Korean URLs as search results, and my keywords are correctly highlighted in the excerpt. B

copyfield and 'store' and highlighting

2009-06-10 Thread ashokc
Hi, I copy 'field1' to 'field2' so that I can apply a different set of analyzers & filters. Content wise, they are identical. 'field2' has to be stored because it is used for high-lighting. Do I have to declare 'field1' also to be stored? 'field1' is never returned in the response. Thanks. - ashok

qf boost Versus field boost for Dismax queries

2009-06-09 Thread ashokc
When 'dismax' queries are used, where is the best place to apply boost values/factors? While indexing by supplying the 'boost' attribute to the field, or in solrconfig.xml by specifying the 'qf' parameter with the same boosts? What are the advantages/disadvantages to each? What happens if both boos

How to disable posting updates from a remote server

2009-06-04 Thread ashokc
Hi, I find that I am freely able to post to my production SOLR server, from any other host that can run the post command. So somebody can wipe out the whole index by posting a delete query. Is there a way SOLR can be configured so that it will take updates ONLY from the server on which it is runn

Highlighting and Field options

2009-06-01 Thread ashokc
Hi, The 'content' field that I am indexing is usually large (e.g. a pdf doc of a few MB in size). I need highlighting to be on. This 'seems' to require that I have to set the 'content' field to be STORED. This returns the whole content field in the search result XML for each matching document. T

Re: Boosting by facets with standard query

2009-04-19 Thread ashokc
n Shekhar Mangar wrote: > > On Fri, Apr 17, 2009 at 11:32 AM, ashokc wrote: > >> >> What we need is for the white_papers & pdfs to be boosted, but if and >> only >> if such documents are valid results to the search term in question. How >> would I writ

Re: Boosting by facets with standard query

2009-04-16 Thread ashokc
if and only if such documents are valid results to the search term in question. How would I write my above 'q' to accomplish that? Thanks - ashok Shalin Shekhar Mangar wrote: > > On Fri, Apr 17, 2009 at 1:03 AM, ashokc wrote: > >> >> I have a query that yields

Boosting by facets with standard query

2009-04-16 Thread ashokc
I have a query that yields results binned in several facets. How can I boost the results that fall in certain facets over the rest of them that do not belong to those facets? I use the standard query format. Thank you - ashok

DIH & uniqueKey

2009-04-14 Thread ashokc
Hi, I have separate JDBC datasources (DS1 & DS2) that I want to index with DIH in a single SOLR instance. The unique record for the two sources are different. Do I have to synthesize a uniqueKey that spans both the datasources? Something like this? That is, the uniqueKey values will be like (+ in

Re: More than one language in the same document

2009-04-07 Thread ashokc
What I am doing right now is to capture all the content under "content_korea" for example, use 'copyField' to duplicate that content to "content_english". "content_korea" gets processed with CJK analyzers, and "content_english" gets processed with usual detailed index/query analyzers, filters, syn

Re: Multi-valued fields with DIH

2009-04-04 Thread ashokc
That worked. Thanks again. Noble Paul നോബിള്‍ नोब्ळ् wrote: > > the column names are case sensitive try this > > > > On Sat, Apr 4, 2009 at 3:58 AM, ashokc wrote: >> >> Hi, >> I need to assign multiple values to a field, with each value

Re: Oracle Clob column with DIH does not turn to String

2009-04-04 Thread ashokc
; it may not be always in uppercase it can be in mixed case as well > > On Sat, Apr 4, 2009 at 12:58 AM, ashokc wrote: >> >> Happy to report that it is working. Looks like we have to use UPPER CASE >> for >> all the column names. When I examined the map 'aRow&#x

Multi-valued fields with DIH

2009-04-03 Thread ashokc
Hi, I need to assign multiple values to a field, with each value coming from a different column of the sql query. My data config snippet has lines like where 'project_area' & 'project_version' are output by the sql query to the datasource. The 'verbose-output'

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
B. I am just out of clue, why this may happen. I > even wrote a testcase and it seems to work fine > --Noble > > On Fri, Apr 3, 2009 at 10:23 PM, ashokc wrote: >> >> I downloaded the nightly build yesterday (2nd April), modified the >> ClobTransformer.java file wi

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
tting the same behavior with the 'war' that download came with. Thanks Noble. Noble Paul നോബിള്‍ नोब्ळ् wrote: > > and which version of Solr are u using? > > On Fri, Apr 3, 2009 at 10:09 PM, ashokc wrote: >> >> Sure: >> >> data-config Xml >>

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
for QIN 2009-04-03T11:47:32.635Z Noble Paul നോബിള്‍ नोब्ळ् wrote: > > There is something else wrong with your setup. > > can you just paste the whole data-config.xml > > --Noble > > On Fri, Apr 3, 2009 at 5:39 PM, ashokc wrote: >> >> Noble, &

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
ou can hook up a debugger to a > running Solr that is the easiest > --Noble > > On Fri, Apr 3, 2009 at 9:35 AM, ashokc wrote: >> >> That would require me to recompile (with ant/maven scripts?) the source >> and >> replace the jar for DIH, right? I can try -

Re: Oracle Clob column with DIH does not turn to String

2009-04-02 Thread ashokc
to debug ClobTransformer adding(System.out.println > into ClobTransformer may help) > > On Fri, Apr 3, 2009 at 6:04 AM, ashokc wrote: >> >> Correcting my earlier post. It lost some lines some how. >> >> Hi, >> >> I have set up to import some oracle clob colu

Re: Oracle Clob column with DIH does not turn to String

2009-04-02 Thread ashokc
oracle.sql.c...@aed3a5 4486 Any pointers on why I do not get the 'string' out of the clob for indexing? Is the nightly war NOT the right one to use? Thanks for your help. - ashok ashokc wrote: > > Hi, > > I have set up to import some oracle clob columns with DIH. I am usin

Oracle Clob column with DIH does not turn to String

2009-04-02 Thread ashokc
Hi, I have set up to import some oracle clob columns with DIH. I am using the latest nightly release. My config says, But it does not seem to turn this clob into a String. The search results show: 1.8670129 oracle.sql.c...@aed3a5 4486 Any pointers on why I do not get t

More than one language in the same document

2009-03-26 Thread ashokc
Hi, I have documents where text from two languages, e.g. (English & Korean) or (English & German) are mixed up in a fairly intensive way. 20-30% of the text is in English and the rest in the other. Can somebody indicate how I should set up the 'analyzers' and 'fields' in schema.xml? Should I hav

Re: Highlighting Oddities

2009-02-04 Thread ashokc
This problem went away when I updated to use the latest nightly release (2009-02-04) - ashok ashokc wrote: > > I have seen some of these oddities that Chris is referring to. In my case, > terms that are NOT in the query get highlighted. For example searching for > 'Intel'

Re: Highlighting Oddities

2009-02-04 Thread ashokc
I have seen some of these oddities that Chris is referring to. In my case, terms that are NOT in the query get highlighted. For example searching for 'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms either. Do these filter factories add some extra intelligence to the inde

Re: Single index - multiple SOLR instances

2009-01-12 Thread ashokc
ngs down too much, if > there is network in the picture. > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: ashokc >> To: solr-user@lucene.apache.org >> Sent: Monday, January 1

Single index - multiple SOLR instances

2009-01-12 Thread ashokc
Hello, Is it possible to have the index created by a single SOLR instance, but have several SOLR instances field the search queries? Or do I HAVE to replicate the index for each SOLR instance that I want to answer queries? I need to set up a fail-over instance. Thanks - ashok

Re: Boost a query by field at query time - Standard Request Handler

2008-12-09 Thread ashokc
Thanks for the reply. I figured there is no simple solution here. I am parsing the query in my code, separating out negations, assertions and such, and building the final SOLR query to issue. I simply use the boost as given by the user. If none given, I use a default boost for title & url matches. -
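
A rough sketch of the "build the final query in client code" approach described above. It is not the poster's actual code: the function name, the default boost of 2.0, and the choice to wrap the whole user query per field are assumptions for illustration, and it does not separate out negations or per-term boosts the way the post describes. The field names content, title and url come from the original question.

```python
def boost_across_fields(user_query, boosted_fields=("title", "url"),
                        default_boost=2.0, default_field="content"):
    """Repeat the user's query against each field, boosting some fields.

    Hypothetical helper: the user query is treated as an opaque string and
    wrapped in a field-scoped group, e.g. title:(...)^2.0.
    """
    # Unboosted clause against the default search field.
    clauses = [f"{default_field}:({user_query})"]
    # One boosted clause per extra field, so matches there score higher.
    for field in boosted_fields:
        clauses.append(f"{field}:({user_query})^{default_boost}")
    # OR the clauses so a match in any one field is enough to return the doc.
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(boost_across_fields('term1^2.0 OR term2 OR "term3 term4"'))
```

Because the clauses are OR'ed, a document matching only in content is still returned; title and url matches just add to its score. A query containing negations would need the extra parsing mentioned in the post before being wrapped this way.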

Re: Merging Indices

2008-12-05 Thread ashokc
dd a new " > On Thu, Dec 4, 2008 at 6:39 PM, ashokc <[EMAIL PROTECTED]> wrote: >> >> The SOLR wiki says >> >>>>3. Make sure both indexes you want to merge are closed. >> >> What exactly does 'closed' mean? > > If you

Merging Indices

2008-12-04 Thread ashokc
The SOLR wiki says >>3. Make sure both indexes you want to merge are closed. What exactly does 'closed' mean? 1. Do I need to stop SOLR search on both indexes before running the merge command? So a brief downtime is required? Or do I simply prevent any 'updates/deletes' to these indices during

Boost a query by field at query time - Standard Request Handler

2008-12-04 Thread ashokc
Here is the problem I am trying to solve. I have to use the Standard Request Handler. Query (can be quite complex, as it gets built from an advanced search form): term1^2.0 OR term2 OR "term3 term4" I have 3 fields - content (the default search field), title and url. Any matches in the title or

solrQueryParser does not take effect - nightly build

2008-11-20 Thread ashokc
Hi, I have set but it is not taking effect. It continues to take it as OR. I am working with the latest nightly build 11/20/2008 For a query like term1 term2 Debug shows content:term1 content:term2 Bug? Thanks - ashok