>
> On Wed, Jul 29, 2009 at 6:02 PM, ashokc wrote:
>>
>> Sure.
>>
>> The java command I use with TIKA to extract text from a URL is:
>>
>> java -jar tika-0.3-standalone.jar -t $url
>>
>> I have also attached the screenshots of the web page, post d
tp://www.nabble.com/file/p24728917/china.tika.xml china.tika.xml
Grant Ingersoll-6 wrote:
>
> Hmm, looks very much like an encoding problem. Can you post a sample
> showing it, along with the commands you invoked?
>
> Thanks,
> Grant
>
> On Jul 28, 2009, at 6:14 PM, ash
I am finding that the search results based on indexing Tika extracted text
are very different from results based on indexing the text extracted via
other means. This shows up for example with a Chinese web site that I am
trying to index.
I created the documents (for posting to SOLR) in two ways.
Yes, I reindexed the entire repository after each of my changes. Here is the
output with debug on.
== DEBUG OUTPUT BEGIN ==
0
83
standard
10
0
content
on
*,score
on
创意或商业创新、
on
dismax
2.2
创意或商业创新、
创意或商业创新、
+Disjunct
Hi
I have the following fieldType that processes Korean/Chinese/Japanese text
When I supply Korean words/phrases in the query, I do get several expected
Korean URLs as search results, and my keywords are correctly highlighted
in the excerpt. B
Hi,
I copy 'field1' to 'field2' so that I can apply a different set of analyzers
& filters. Content wise, they are identical. 'field2' has to be stored
because it is used for high-lighting. Do I have to declare 'field1' also to
be stored? 'field1' is never returned in the response. Thanks. - ashok
When 'dismax' queries are used, where is the best place to apply boost
values/factors? While indexing by supplying the 'boost' attribute to the
field, or in solrconfig.xml by specifying the 'qf' parameter with the same
boosts? What are the advantages/disadvantages of each? What happens if both
boos
Hi,
I find that I am freely able to post to my production SOLR server, from any
other host that can run the post command. So somebody can wipe out the whole
index by posting a delete query. Is there a way SOLR can be configured so
that it will take updates ONLY from the server on which it is runn
Hi,
The 'content' field that I am indexing is usually large (e.g. a PDF doc of a
few MB in size). I need highlighting to be on. This 'seems' to require that
I set the 'content' field to be STORED. This returns the whole
content field in the search result XML for each matching document. T
n Shekhar Mangar wrote:
>
> On Fri, Apr 17, 2009 at 11:32 AM, ashokc wrote:
>
>>
>> What we need is for the white_papers & pdfs to be boosted, but if and
>> only
>> if such documents are valid results to the search term in question. How
>> would I writ
if and only
if such documents are valid results to the search term in question. How
would I write my above 'q' to accomplish that?
Thanks
- ashok
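One common way to express "boost these document types, but only when they already match the search term" in the standard Lucene query syntax is to make the user's query a required clause and add the boosted type clauses as optional ones. The sketch below only builds such a query string. The field name `doctype`, its values, and the boost factor are hypothetical placeholders, not names taken from this thread, and the pattern assumes the default operator is OR (with AND, the unprefixed clauses would become required as well).

```python
def boosted_query(user_query, boosted_types, field="doctype", boost=5.0):
    """Return a Lucene-syntax query string.

    The user's query becomes a required (+) clause, so only documents
    matching it are returned. Each boosted type is added as an optional
    clause that raises the score of matching documents without admitting
    documents that fail the required clause.
    """
    required = "+(%s)" % user_query
    optional = " ".join('%s:"%s"^%s' % (field, t, boost) for t in boosted_types)
    return (required + " " + optional).strip()


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    print(boosted_query("solar panels", ["white_paper", "pdf"]))
```

The resulting string can be sent as the `q` parameter. Documents of the listed types rank higher, while documents that do not match the required clause are never returned.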
Shalin Shekhar Mangar wrote:
>
> On Fri, Apr 17, 2009 at 1:03 AM, ashokc wrote:
>
>>
>> I have a query that yields
I have a query that yields results binned in several facets. How can I boost
the results that fall in certain facets over the rest of them that do not
belong to those facets? I use the standard query format. Thank you
- ashok
Hi,
I have separate JDBC datasources (DS1 & DS2) that I want to index with DIH
in a single SOLR instance. The unique record for the two sources are
different. Do I have to synthesize a uniqueKey that spans both the
datasources? Something like this? That is, the uniqueKey values will be like
(+ in
What I am doing right now is to capture all the content under "content_korea"
for example, use 'copyField' to duplicate that content to "content_english".
"content_korea" gets processed with CJK analyzers, and "content_english"
gets processed with usual detailed index/query analyzers, filters, syn
That worked. Thanks again.
Noble Paul നോബിള് नोब्ळ् wrote:
>
> the column names are case sensitive try this
>
>
>
> On Sat, Apr 4, 2009 at 3:58 AM, ashokc wrote:
>>
>> Hi,
>> I need to assign multiple values to a field, with each value
; it may not be always in uppercase it can be in mixed case as well
>
> On Sat, Apr 4, 2009 at 12:58 AM, ashokc wrote:
>>
>> Happy to report that it is working. Looks like we have to use UPPER CASE
>> for
>> all the column names. When I examined the map 'aRow
Hi,
I need to assign multiple values to a field, with each value coming from a
different column of the sql query.
My data config snippet has lines like
where 'project_area' & 'project_version' are output by the sql query to the
datasource. The 'verbose-output'
B. I am just out of clue, why this may happen. I
> even wrote a testcase and it seems to work fine
> --Noble
>
> On Fri, Apr 3, 2009 at 10:23 PM, ashokc wrote:
>>
>> I downloaded the nightly build yesterday (2nd April), modified the
>> ClobTransformer.java file wi
tting the
same behavior with the 'war' that download came with. Thanks Noble.
Noble Paul നോബിള് नोब्ळ् wrote:
>
> and which version of Solr are u using?
>
> On Fri, Apr 3, 2009 at 10:09 PM, ashokc wrote:
>>
>> Sure:
>>
>> data-config Xml
>>
for QIN
2009-04-03T11:47:32.635Z
Noble Paul നോബിള് नोब्ळ् wrote:
>
> There is something else wrong with your setup.
>
> can you just paste the whole data-config.xml
>
> --Noble
>
> On Fri, Apr 3, 2009 at 5:39 PM, ashokc wrote:
>>
>> Noble,
ou can hook up a debugger to a
> running Solr that is the easiest
> --Noble
>
> On Fri, Apr 3, 2009 at 9:35 AM, ashokc wrote:
>>
>> That would require me to recompile (with ant/maven scripts?) the source
>> and
>> replace the jar for DIH, right? I can try -
to debug ClobTransformer adding(System.out.println
> into ClobTransformer may help)
>
> On Fri, Apr 3, 2009 at 6:04 AM, ashokc wrote:
>>
>> Correcting my earlier post. It lost some lines somehow.
>>
>> Hi,
>>
>> I have set up to import some oracle clob colu
oracle.sql.c...@aed3a5
4486
Any pointers on why I do not get the 'string' out of the clob for indexing?
Is the nightly war NOT the right one to use?
Thanks for your help.
- ashok
ashokc wrote:
>
> Hi,
>
> I have set up to import some oracle clob columns with DIH. I am usin
Hi,
I have set up to import some oracle clob columns with DIH. I am using the
latest nightly release. My config says,
But it does not seem to turn this clob into a String. The search results
show:
1.8670129
oracle.sql.c...@aed3a5
4486
Any pointers on why I do not get t
Hi,
I have documents where text from two languages, e.g. (English & Korean) or
(English & German), is mixed up in a fairly intensive way. 20-30% of the
text is in English and the rest in the other. Can somebody indicate how I
should set up the 'analyzers' and 'fields' in schema.xml? Should I hav
This problem went away when I updated to use the latest nightly release
(2009-02-04)
- ashok
ashokc wrote:
>
> I have seen some of these oddities that Chris is referring to. In my case,
> terms that are NOT in the query get highlighted. For example searching for
> 'Intel'
I have seen some of these oddities that Chris is referring to. In my case,
terms that are NOT in the query get highlighted. For example searching for
'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
either. Do these filter factories add some extra intelligence to the inde
ngs down too much, if
> there is network in the picture.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: ashokc
>> To: solr-user@lucene.apache.org
>> Sent: Monday, January 1
Hello,
Is it possible to have the index created by a single SOLR instance, but have
several SOLR instances field the search queries? Or do I HAVE to replicate
the index for each SOLR instance that I want to answer queries? I need to
set up a fail-over instance. Thanks
- ashok
Thanks for the reply. I figured there is no simple solution here. I am
parsing the query in my code separating out negations, assertions and such
and building the final SOLR query to issue. I simply use the boost as given
by the user. If none given, I use a default boost for title & url matches.
-
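The approach described above (expanding each parsed term across fields and falling back to default boosts) might look roughly like the following sketch. It is one interpretation only: the field names `content`, `title`, and `url` come from the original question in this thread, while the default boost values and the rule that a user-supplied boost applies to every field are assumptions made for illustration.

```python
# Hypothetical default boosts for title and url matches (not from the thread).
DEFAULT_BOOSTS = {"title": 2.0, "url": 1.5}


def expand_term(term, user_boost=None):
    """Expand one query term (or quoted phrase) across content, title and url.

    If the user supplied a boost, it is applied to every field clause.
    Otherwise title and url receive the default boosts and content
    receives no boost at all.
    """
    clauses = []
    for field in ("content", "title", "url"):
        boost = user_boost if user_boost is not None else DEFAULT_BOOSTS.get(field)
        clause = "%s:%s" % (field, term)
        if boost is not None:
            clause += "^%s" % boost
        clauses.append(clause)
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    parts = [
        expand_term("term1", 2.0),
        expand_term("term2"),
        expand_term('"term3 term4"'),
    ]
    print(" OR ".join(parts))
```

Negations and required terms would need their own handling before this step; the sketch covers only the boost fallback.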
dd a new "
> On Thu, Dec 4, 2008 at 6:39 PM, ashokc <[EMAIL PROTECTED]> wrote:
>>
>> The SOLR wiki says
>>
>>>>3. Make sure both indexes you want to merge are closed.
>>
>> What exactly does 'closed' mean?
>
> If you
The SOLR wiki says
>>3. Make sure both indexes you want to merge are closed.
What exactly does 'closed' mean?
1. Do I need to stop SOLR search on both indexes before running the merge
command? So a brief downtime is required?
Or do I simply prevent any 'updates/deletes' to these indices during
Here is the problem I am trying to solve. I have to use the Standard Request
Handler.
Query (can be quite complex, as it gets built from an advanced search form):
term1^2.0 OR term2 OR "term3 term4"
I have 3 fields - content (the default search field), title and url.
Any matches in the title or
Hi,
I have set
but it is not taking effect. It continues to take it as OR. I am working
with the latest nightly build 11/20/2008
For a query like
term1 term2
Debug shows
content:term1 content:term2
Bug?
Thanks
- ashok
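One way to check whether the configured default operator is being picked up is to override it per request with the `q.op` parameter and compare the debug output. The sketch below only builds the request URL. The host, port, and path are assumptions (a local Solr at its usual default port), not values from this post.

```python
from urllib.parse import urlencode


def select_url(base_url, query, default_operator="AND"):
    """Build a /select URL that sets the default operator for this request
    and turns on debug output so the parsed query can be inspected."""
    params = {"q": query, "q.op": default_operator, "debugQuery": "on"}
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance.
    print(select_url("http://localhost:8983/solr", "term1 term2"))
```

If the override is honored, the parsed query in the debug section would be expected to mark both terms as required rather than listing them as plain optional clauses.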