You can convert Chinese words to pinyin and use n-grams to search for
phonetically similar words.
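For example, if the pinyin conversion is done on the client side before
indexing, an n-gram field type along these lines might work (an untested
sketch; the type name and gram sizes are just examples):

  <fieldType name="text_pinyin_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="4"/>
    </analyzer>
  </fieldType>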
On Wed, Feb 8, 2012 at 11:10 AM, Floyd Wu wrote:
> Hi there,
>
> Has anyone here ever implemented phonetic search, especially with
> Chinese (traditional/simplified), using Solr or Lucene?
>
> Please share some
>
> This is not unusual, but there's also not much reason to give this much
> memory in your case. This is the cache that is hit when a user pages
> through a result set. Your numbers would seem to indicate one of two things:
> 1> your window is smaller than 2 pages, see solrconfig.xml,
>
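The window itself is set in solrconfig.xml with something like this (the
values are only illustrative):

  <queryResultWindowSize>20</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>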
Experience has shown that it is much faster to run Solr with a small
amount of memory and let the rest of the RAM be used by the operating
system "disk cache". That is, the OS is very good at keeping the right
disk blocks in memory, much better than Solr.
How much RAM is in the server and how much
Hi Guys,
I am using Solr 3.5, and would like to use an fq like
'getField(getDoc(uuid:workspace_${workspaceId})), "isPublic"):true?
- workspace_${workspaceId}: workspaceId is an indexed field.
- getDoc(uuid:concat("workspace_", workspaceId): return the document whose
uuid is "workspace_${workspaceI
It seems like a bug to me.
Can you open a ticket? Thank you.
Koji Sekiguchi from iPhone
On 2012/02/08, at 13:32, Shyam Bhaskaran wrote:
> Hi Koji,
>
> Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200 I
> see improvements; below is the highlighted value:
>
> "The synthesis t
Hi Koji,
Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200 I
see improvements; below is the highlighted value:
"The synthesis tool only supports the resolution functions for
std_logic and std_logic_vector."
But in other cases I also see that some of the words break in
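For reference, the highlighting parameters I'm describing look roughly like
this (the field name and fragsize are just examples, not the exact request):

  hl=true&hl.useFastVectorHighlighter=true&hl.fl=content&hl.fragsize=150&hl.bs.maxScan=200&hl.bs.chars=.!?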
hello all,
I am struggling with getting solr.WordDelimiterFilterFactory to behave as is
indicated in the Solr book (Smiley) on page 54.
The example in the book reads like this:
>>
Here is an example exercising all options:
WiFi-802.11b to Wi, Fi, WiFi, 802, 11, 80211, b, WiFi80211b
<<
essentia
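For reference, a filter declaration with all of those options switched on
looks something like this (my own sketch, not the book's exact listing):

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="1"
          splitOnCaseChange="1" preserveOriginal="0"/>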
: This all seems a bit too much work for such a real-world scenario?
You haven't really told us what your scenario is.
You said you want to split tokens on whitespace, full-stop (aka:
period) and comma only, but then in response to some suggestions you added
comments about other things that you neve
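If that really is the whole requirement, something as simple as a pattern
tokenizer might do it (an untested sketch):

  <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s.,]+"/>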
(12/02/08 1:54), Shyam Bhaskaran wrote:
Hi Koji,
I have tried using hl.bs.type=SENTENCE and still no improvement.
We are storing PDF extracted content in the field which has termVectors enabled.
For example, the field contains the following data extracted from a PDF:
"User-defined resolution function
Hi,
I am trying to get Solr 3.3.0 to process Arabic search requests using its
admin interface. I have successfully managed to set it up on Tomcat
using the URIEncoding attribute, but fail miserably on WebLogic 10.
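On Tomcat the working setup was just the connector attribute in server.xml,
something like this (port and protocol are whatever you already use):

  <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>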
Invoking the URL http://localhost:7012/solr/select/?q=? returns the
XML below:
Are you able to explain how I would create another field to fit my scenario?
-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl]
Sent: Tuesday, February 07, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms
Well, if you want both multi word and single
Well, if you want both multi word and single words I guess you will have to
create another field :) Or make queries like you suggested.
A custom tokenizer/tokenfilter could set the position increment when a newline
comes through as well.
Erik
On Feb 7, 2012, at 15:28, Erick Erickson wrote:
> Well, this is a common approach. Someone has to split up the
> input as "sentences" (whatever they are). Putting them in multi-valued
It doesn't seem to do it for me. My field type is:
I am using edism
I simulated a hierarchical faceting browsing scheme using facet.prefix.
However, it seems there can only be one facet.prefix per field. For OR
queries, the browsing scheme requires multiple facet prefixes. For example:
fq=facet1:term1 OR facet1:term2 OR facet1:term3
Something like the above
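If counts per branch are enough, one workaround might be to send one
facet.query per prefix instead of facet.prefix (field and terms are made up):

  facet.query=facet1:term1*&facet.query=facet1:term2*&facet.query=facet1:term3*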
Isn't that what autoGeneratePhraseQueries="true" is for?
Yes, you could do that. I guess numbers will give you trouble
under all circumstances.
You may be able to do something like search against your non-
phonetic field with higher boosts to preferentially do those
matches.
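For example, something like this with dismax (the field names are just
placeholders):

  defType=dismax
  qf=content^2.0 content_phonetic^0.4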
Best
Erick
On Tue, Feb 7, 2012 at 2:30 PM, Dirk Högemann
wrote:
> Thanks Eri
Well, this is a common approach. Someone has to split up the
input as "sentences" (whatever they are). Putting them in multi-valued
fields is trivial.
Then you confine things to within sentences, then you start searching
phrases with a slop less than your incrementGap...
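For example, with positionIncrementGap="100" on the field, a phrase query
like body:"fluent german"~50 can never match across values (the field name
is made up).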
Best
Erick
On Tue, Feb 7
I suppose I could translate every user query to include the term with quotes.
e.g. if someone searches for stock syrup I send a query like:
q=stock syrup OR "stock syrup"
Seems like a bit of a hack though; is there a better way of doing this?
Zac
-Original Message-
From: Zac Smith
Sent
Hi, all...
I have a small problem retrieving the full set of query responses I need
and would appreciate any help.
I have a query string as follows:
+((Title:"sales") (+Title:sales) (TOC:"sales") (+TOC:sales)
(Keywords:"sales") (+Keywords:sales) (text:"sales") (+text:sales)
(sales)) +(RepType:"W
Thanks Erick.
In the first place we thought of removing numbers with a pattern filter.
Setting inject to false will have the "same" effect.
If we want to be able to search for numbers in the content, this solution
will not work, but another field without phonetic filtering and searching in
both fields
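Roughly what I have in mind for the schema (just a sketch, names made up):

  <field name="content" type="text_phonetic" indexed="true" stored="true"/>
  <field name="content_plain" type="text" indexed="true" stored="false"/>
  <copyField source="content" dest="content_plain"/>

and then query both fields.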
On 07.02.2012 15:12, Erick Erickson wrote:
> Right, I suspect you're hitting merges.
Guess so.
> How often are you committing?
One time, after all work is done.
> In other words, why are you committing explicitly?
> It's often better to use commitWithin on the add command
> and just let Solr do i
Walter Underwood wrote
>
> Looking at SOLR-1335 and the wiki, I'm not quite sure of the final
> behavior for this.
>
> These properties are per-core, and not visible in other cores, right?
>
>
Yes, they are.
Walter Underwood wrote
>
>
> Are variables substituted in solr.xml, so I can swap in
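For illustration, per-core properties are declared along these lines (the
names are made up), and ${myProp} can then be referenced from that core's
solrconfig.xml:

  <core name="core0" instanceDir="core0">
    <property name="myProp" value="something"/>
  </core>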
This all seems a bit too much work for such a real-world scenario?
---
IntelCompute
Web Design & Local Online Marketing
http://www.intelcompute.com
On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan
wrote:
>> I'm still finding matches across
>> newlines
>>
>> index...
>>
>> i am fluent
>>
Thank you. I'll try NRT and some post-filter :)
On Tue, Feb 7, 2012 at 3:09 PM, Erick Erickson wrote:
> You have several options:
> 1> if you can go to trunk (bleeding edge, I admit), you can
> get into the near real time (NRT) stuff.
> 2> You could maintain essentially a post-filter step wh
Hi Koji,
I have tried using hl.bs.type=SENTENCE and still no improvement.
We are storing PDF extracted content in the field which has termVectors enabled.
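The field is declared along these lines (paraphrasing; the name is just an
example):

  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>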
For example, the field contains the following data extracted from a PDF:
"User-defined resolution functions. The synthesis tool only supports the
r
(12/02/08 0:50), Shyam Bhaskaran wrote:
Hi,
We are using Solr 4.0 along with FVH and there is an issue we are facing while
highlighting.
For our requirement we want the highlighted search result to start at the
beginning of the sentence, and we need help getting this done.
As of now this i
Hi,
We are using Solr 4.0 along with FVH and there is an issue we are facing while
highlighting.
For our requirement we want the highlighted search result to start at the
beginning of the sentence, and we need help getting this done.
As of now this is not happening and the highlighted output
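For reference, the boundary scanner that is supposed to handle this is the
breakIterator one; in the example solrconfig.xml it is declared roughly like
this (values illustrative) and selected per request with
hl.boundaryScanner=breakIterator&hl.bs.type=SENTENCE:

  <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
    <lst name="defaults">
      <str name="hl.bs.type">WORD</str>
      <str name="hl.bs.language">en</str>
      <str name="hl.bs.country">US</str>
    </lst>
  </boundaryScanner>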
See below...
On Tue, Feb 7, 2012 at 8:21 AM, Pranav Prakash wrote:
> Based on the hit ratio of my caches, they seem to be pretty low. Here they
> are. What are typical values for your production setup? What are some of
> the things that can be done to improve the ratios?
>
> queryResultCache
>
>
Right, I suspect you're hitting merges. How often are you
committing? In other words, why are you committing explicitly?
It's often better to use commitWithin on the add command
and just let Solr do its work without explicitly committing.
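For example, in the XML update message (the 10-second value is arbitrary):

  <add commitWithin="10000">
    <doc>...</doc>
  </add>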
Going forward, this is fixed in trunk by the DocumentWriter
You have several options:
1> if you can go to trunk (bleeding edge, I admit), you can
get into the near real time (NRT) stuff.
2> You could maintain essentially a post-filter step where
your app maintains a list of deleted messages and
removes them from the response. This will cause
So the obvious question is "what is your
performance like without the distance filters?"
Without that knowledge, we have no clue whether
the modifications you've made had any hope of
speeding up your response times
As for the docs, any improvements you'd like to
contribute would be happily re
What happens if you do NOT inject? Setting inject="false"
stores only the phonetic reduction, not the original text. In that
case your false match on "13" would go away
Not sure what that means for the rest of your app though.
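For example (the encoder is whatever you're already using):

  <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>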
Best
Erick
On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann
wrote:
You're probably looking at a custom tokenizer and/or filter chain here. Or
at least creatively combining the ones that exist. The admin/analysis
page will be your friend.
Even if you define these as synonyms, the rest of the analysis chain may
break them up so you really have to look at the effect
You could try to isolate the bottleneck by testing the indexing speed
from the local machine hosting Solr. Also tools like iostat or sar
might give you more details about the disk side.
Yes, I am doing different stuff to isolate the bottleneck. I'm also profiling
the JVM. And I am using iostat, top a
Based on the hit ratio of my caches, they seem to be pretty low. Here they
are. What are typical values for your production setup? What are some of
the things that can be done to improve the ratios?
queryResultCache
lookups : 3234602
hits : 496
hitratio : 0.00
inserts : 3234239
evictions : 323014
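For reference, these caches are configured in solrconfig.xml; the stock
example settings look something like this (sizes are illustrative):

  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>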
> I'm still finding matches across
> newlines
>
> index...
>
> i am fluent
> german racing
>
> search...
>
> "fluent german"
>
> Any suggestions?
You can use a multiValued field for this. Split your document on newlines
at the client side.
i am fluent
german racing
positionIncreme
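For example (the field and type names are just placeholders):

  <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>

with positionIncrementGap="100" (or similar) on the "text" field type, so
that phrase queries with slop below 100 can't match across lines.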
I'm still finding matches across newlines
index...
i am fluent
german racing
search...
"fluent german"
Any suggestions? I've currently got this in wdftypes.txt for
WordDelimiterfilterfactory
\u000A => ALPHANUM
\u000B => ALPHANUM
\u000C => ALPHANUM
\u000D => ALPHANUM
# \u000D\u000A => ALPHA
OK, I'll try.
But I think:
If I index a zip archive containing some PDF files and afterwards run a
query in Solr, I only see the list of the PDF titles in my archive, but it
can't search inside the individual documents.
I read in the Tika documentation that "Package formats can contain multiple
separate docu
On Mon, Feb 6, 2012 at 5:55 PM, Per Steffensen wrote:
> Sami Siren skrev:
>
>> On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen
>> wrote:
>>
>>
>>
>>>
>>> Actually right now, I am trying to find out what my bottleneck is. The
>>> setup
>>> is more complex, than I would bother you with, but basicall
hi all,
we have used Solr to provide search services in many products. I found
that for each product we have to do some configuration and write query
expressions.
Our users are not used to this. They are familiar with SQL, and they may
describe it like this: I want a query that can search books whose
hi everybody, I have the following entities. I added the jar file into the
WEB-INF/lib folder and I don't know how to specify the field names in
schema.xml. Please help me, anybody.
http://test.xxx.com"; appKey="qto9gjtI68pi7JRxVZ8Z"
lastUpdate="${dataimporter.last_index_time}" />
http://abcs.xxx.com"; p