Date faceting and memory leaks
I have been running load tests using JMeter against a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterate over 100 query terms a number of times against the index. The GC does not seem to reclaim the heap after the test run is completed, and Solr runs into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as follows (as part of the "appends" section in the request handler):

{!ex=last_modified}last_modified:[NOW-30DAY TO *]
{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]
{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]
{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]
{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]
{!ex=last_modified}last_modified:[* TO NOW-730DAY]

The last_modified field is a TrieDateField with a precisionStep of 6. I have played with the filterCache settings, but they have no effect, as the date field cache seems to be managed by the Lucene FieldCache. Please help, as I could be struggling with this for days. Thanks in advance.
Re: Date faceting and memory leaks
No, I still have the OOM issue with repeated facet query requests on the date field. I forgot to mention that I am running a 64-bit IBM 1.5 JVM. I also tried the Sun 1.6 JVM with and without your GC arguments; the GC pattern is different, but the heap size does not drop as the test goes on. I tested with a single thread from JMeter just to make sure there is ample room for GC to clean house. JMeter fires requests one after another without pause, but I assume that should not affect GC. It is clear to me that the date facet queries have a major impact here, as I can run the load test with facets on other fields with no problem (the JVM heap size stabilizes at a certain level over time).
Re: Date faceting and memory leaks
Chris, Thanks for the detailed response. No, I am not using date faceting but facet queries for the facet display. Here is the full configuration of my "dismax" query handler:

dismax
explicit
0.01
title text^0.5 domain^0.1 nature^0.1 author
title text
recip(ms(NOW,last_modified),3.16e-11,1,1)
url,title,domain,nature,src,last_modified,text,sz
2<-1 5<-2 6<90%
100
*:*
on
title,text
0
3
text
400
regex
{!ex=src}src
{!ex=domain}domain
{!ex=nature}nature
{!ex=last_modified}last_modified:[NOW-30DAY TO *]
{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]
{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]
{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]
{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]
{!ex=last_modified}last_modified:[* TO NOW-730DAY]

Cache settings:

I am monitoring the Solr JVM heap memory usage via remote JConsole; the image below shows how the heap size keeps increasing as more facet query requests are sent to Solr via JMeter: http://n3.nabble.com/file/n825038/memory-1.jpg

The following is the request URL pattern:

select?rows=0&facet=true&facet.mincount=1&facet.method=enum&q=${query}&qt=dismax

where ${query} is selected randomly from a list of 100 query terms.

The date rounding suggestion is a very good one; I will rerun the test and report back on the cache settings. I remember my filterCache hit ratio is around 0.7. I do use the tagged results for multi-select display of facet values, but in this case there is no fq in the load test request URL. Thanks again; I will report back on the re-run with date rounding.
Re: Date faceting and memory leaks
Chris, I just completed the re-run, and your date rounding tip saved my day. I now realize that "NOW" as a timestamp is a very bad idea for query caching, as its value is never the same twice. NOW/DAY at least makes a set of facet queries reusable from the cache for a period of time. It turns out you can help with your insight given just the little fraction of information provided. Thanks again! -Yao
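P.S. for later readers: the rounded variant of the queries above would look roughly like this (a sketch assuming day granularity is acceptable for the UI; the key point is that every value is stable for a whole day, so the cached filters can be reused):

  {!ex=last_modified}last_modified:[NOW/DAY-30DAYS TO *]
  {!ex=last_modified}last_modified:[NOW/DAY-90DAYS TO NOW/DAY-30DAYS]
  {!ex=last_modified}last_modified:[NOW/DAY-180DAYS TO NOW/DAY-90DAYS]
  {!ex=last_modified}last_modified:[NOW/DAY-365DAYS TO NOW/DAY-180DAYS]
  {!ex=last_modified}last_modified:[NOW/DAY-730DAYS TO NOW/DAY-365DAYS]
  {!ex=last_modified}last_modified:[* TO NOW/DAY-730DAYS]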
RE: Date faceting and memory leaks
Just to close the loop: I was fooling around with all the cache settings trying to figure out my problem, so the oversized filterCache was set as part of those experiments. It did not cause any memory issue in this case. After the date rounding adjustment, I re-ran the queries with 15 threads and 6,000 requests and got a throughput of 1,500 requests/minute while using only a little more than 0.5 GB of heap memory. The Solr admin statistics page shows the filterCache has a hitratio of 0.99, with 103,800 lookups and 103,773 hits, so I take it as 99%. Have a nice day. -Yao

From: Chris Hostetter-3 [via Lucene] [mailto:ml-node+825052-1711725506-201...@n3.nabble.com]
Sent: Monday, May 17, 2010 9:04 PM
To: Ge, Yao (Y.)
Subject: Re: Date faceting and memory leaks

: Cache settings:

that's a monster filterCache ... I can easily imagine it causing an OOM if your heap is only 5G.

: The date rounding suggest is a very good one, I will need to rerun the test
: and report back on the cache setting. I remember my filterCache hit ratio is
: around 0.7. I did use the tagged results for multi-select display of facet

a "hit ratio" of "0.7", or a "0.7% hit rate"? ... with that many unique facet queries, I can't imagine you were getting a 70% hit rate. I'm betting that if you monitor the filterCache size and hit rate as you run your test, you'll see it just grow and grow until the OOM, and if you analyze the heap dumps you'll probably see the cache hanging on to a ton of DocSets that will never be used again.

: values but in this case there is no fq in the load test request URL.

I've never tested this, so I can't say for sure, but if it turns out that the filterCache is not your problem, then perhaps there is something wonky with the filter-query exclusion code in cases like this -- where you explicitly exclude a tagged fq but that fq doesn't exist. The way to rule it out would be to remove the exclusion from your configs and test it that way to see if the behavior is the same.

-Hoss
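For anyone tuning this later: the filterCache is configured in solrconfig.xml; a modest, illustrative setting (not the poster's actual values, which weren't preserved) looks like:

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>

With rounded date facet queries, a small number of entries is reused constantly, which is exactly why the hit ratio jumped to 0.99 above.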
Solr read-only core
Is there a way to open a Solr index/core in read-only mode?
RE: Solr read-only core
My motivation is more from the performance perspective than the functional perspective. I was hoping that by opening the Solr index/core read-only, the underlying Lucene IndexReader could be opened in read-only mode for optimum query performance (removing the overhead of multi-thread synchronization).
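For reference, at the Lucene level (2.9.x, which Solr 1.4 ships with) a read-only reader is requested with a boolean flag; a minimal sketch, with a hypothetical index path:

  import java.io.File;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.store.FSDirectory;

  public class ReadOnlyReaderDemo {
      public static void main(String[] args) throws Exception {
          // Path is illustrative; point it at a real index directory.
          FSDirectory dir = FSDirectory.open(new File("/path/to/solr/data/index"));
          // 'true' = read-only: skips write-lock handling and avoids
          // per-call synchronization in isDeleted(), which helps concurrent queries.
          IndexReader reader = IndexReader.open(dir, true);
          try {
              System.out.println("numDocs: " + reader.numDocs());
          } finally {
              reader.close();
          }
      }
  }

Whether Solr exposes a switch for this is exactly the open question here.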
[SolrCloud] shard hash ranges changed after restoring backup
Hi all, My team at work maintains a SolrCloud 5.3.2 cluster with multiple collections configured with sharding and replication. We recently backed up our Solr indexes using the built-in backup functionality. After the cluster was restored from the backup, we noticed that atomic updates of documents occasionally fail with the error message 'missing required field [...]'. The exceptions are thrown on a host on which the document to be updated is not stored. From this we deduce that there is a problem with finding the right host by the hash of the uniqueKey. Indeed, our investigation so far shows that for at least one collection in the new cluster, the shards now have different hash ranges assigned. We checked the hash ranges by querying /admin/collections?action=CLUSTERSTATUS. Find below the shard hash ranges of one collection that we debugged.

Old cluster:
  shard1_0  8000 - aaa9
  shard1_1       - d554
  shard2_0  d555 - fffe
  shard2_1       - 2aa9
  shard3_0  2aaa - 5554
  shard3_1       - 7fff

New cluster:
  shard1  8000 - aaa9
  shard2       - d554
  shard3  d555 -
  shard4  0    - 2aa9
  shard5  2aaa - 5554
  shard6       - 7fff

Note that the shard names differ because the old cluster's shards were split. As you can see, the ranges of shard3 and shard4 differ from the old cluster. This change of hash ranges matches the symptoms we are currently experiencing. We found the JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750, in which David Smiley comments: "shard hash ranges aren't restored; this error could be disastrous". It seems that this is what happened to us. We would like to hear some suggestions on how we could recover from this problem. Best, Gary
Re: [SolrCloud] shard hash ranges changed after restoring backup
Hi Erick, I should add that our Solr cluster is in production and new documents are constantly indexed. The new cluster has been up for three weeks now. The problem was discovered only now because in our use case atomic updates and real-time gets are mostly performed on new documents. With almost absolute certainty there are already documents in the index that were distributed to the shards according to the new hash ranges. If we just changed the hash ranges in ZooKeeper, the index would still be in an inconsistent state. Is there any way to recover from this without having to re-index all documents? Best, Gary

2016-06-15 19:23 GMT+02:00 Erick Erickson :
> Simplest, though a bit risky, is to manually edit the znode and
> correct the znode entry. There are various tools out there, including
> one that ships with Zookeeper (see the ZK documentation).
>
> Or you can use the zkcli scripts (the Zookeeper ones) to get the znode
> down to your local machine, edit it there and then push it back up to ZK.
>
> I'd do all this with my Solr nodes shut down, then ensure that my ZK
> ensemble was consistent after the update etc.
>
> Best,
> Erick
>
> On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao wrote:
>> [...]
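For concreteness, a sketch of the fetch-edit-push cycle Erick describes, assuming Solr's bundled zkcli script and a ZooKeeper at localhost:2181 (paths are illustrative; stop the Solr nodes and back up the file first):

  # pull the cluster state down from ZooKeeper
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
      -cmd getfile /clusterstate.json /tmp/clusterstate.json

  # ... hand-edit the "range" entries for the affected shards ...

  # push the corrected file back up
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
      -cmd putfile /clusterstate.json /tmp/clusterstate.json

Note this only fixes the routing metadata; as Gary points out above, documents already indexed under the wrong ranges are a separate problem.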
SolrCloud result correctness compared with single core
Hi Guys, Since the main scoring mechanism is based on tf/idf, will the same query run against SolrCloud return different results than running it against a single core with the same data set, given that idf only counts document frequency inside one core?

E.g., assume I have 100GB of data:
A) Index the data using a single core
B) Index the data using SolrCloud with two cores (each holding a 50GB index)

If I then run the same query, like 'apple', against A and B, will I get different results?

Regards, Yandong
Re: SolrCloud result correctness compared with single core
Pretty helpful, thanks Erick!

2015-01-24 9:48 GMT+08:00 Erick Erickson :
> You might, but probably not enough to notice. At 50G, the tf/idf
> stats will _probably_ be close enough that you won't be able to tell.
>
> That said, distributed tf/idf has recently been implemented, but
> you need to ask for it; see SOLR-1632. This is Solr 5.0, though.
>
> I've rarely seen it matter except in fairly specialized situations.
> Consider a single core: deleted documents still count towards
> some of the tf/idf stats, so your scoring could theoretically
> change after, say, an optimize.
>
> The so-called "bottom line" is that yes, the scoring may change, but
> IMO not any more radically than was possible with single cores,
> and I wouldn't worry about it unless I had evidence that it was
> biting me.
>
> Best,
> Erick
>
> On Fri, Jan 23, 2015 at 2:52 PM, Yandong Yao wrote:
> > [...]
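For later readers: the opt-in distributed tf/idf from SOLR-1632 (Solr 5.0+) is enabled per core in solrconfig.xml; a minimal sketch:

  <!-- Use globally consistent doc-frequency stats across shards when scoring.
       Without this, each shard scores with its local df, as discussed above. -->
  <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>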
how to support "implicit trailing wildcards"
Hi everyone, How can I support an 'implicit trailing wildcard *' using Solr? E.g., as in Google: searching 'umoun' should match 'umount', and searching 'mounta' should match 'mountain'.

From my point of view, there are several ways, each with disadvantages:

1) Use EdgeNGramFilterFactory, so that 'umount' is indexed as 'u', 'um', 'umo', 'umou', 'umoun', 'umount'. The disadvantages are: a) the index size increases dramatically; b) it matches terms with no real relationship, e.g. 'mount' will also match 'mountain'. (An illustrative fieldType follows at the end of this message.)

2) Use two-pass searching: the first pass searches the term dictionary through TermsComponent using the given keyword, then the first matched term from the term dictionary is used to search again. E.g., when a user enters 'umoun', TermsComponent matches 'umount', and then 'umount' is used for the search. The disadvantages are: a) I need to parse the query string so as to recognize meta keywords such as 'AND', 'OR', '+', '-', '"' (this is made more complex because I am using the PHP client); b) the returned hit count is not for the original search string, which will influence other components such as an auto-suggest component based on user search history and hit counts.

3) Write a custom SearchComponent, though I have no idea where/how to start.

Is there any other way to do this in Solr? Any feedback/suggestions are welcome! Thanks very much in advance!
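For option 1, an illustrative fieldType (not from the original post) that applies edge n-grams at index time only, so query terms are matched as typed:

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- 'umount' is indexed as u, um, umo, umou, umoun, umount -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

minGramSize/maxGramSize control the index-size blowup mentioned in disadvantage 1a; raising minGramSize to 2 or 3 trims the worst of it.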
Re: how to support "implicit trailing wildcards"
Hi Bastian, Sorry for not making it clear: I also want exact matches to score higher than wildcard matches. That means if I search 'mount', documents with 'mount' should score higher than documents with 'mountain', whereas 'mount*' seems to treat 'mount' and 'mountain' the same.

Besides, I also want the query to be processed by the analyzer, but per http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F, wildcard, prefix, and fuzzy queries are not passed through the analyzer. The rationale is that if I search 'mounted', I also want documents with 'mount' to match.

So it seems the built-in wildcard search cannot satisfy my requirements, if I understand correctly. Thanks very much!

2010/8/9 Bastian Spitzer
> Wildcard-Search is already built in, just use:
>
> ?q=umoun*
> ?q=mounta*
>
> -Original Message-
> From: yandong yao [mailto:yydz...@gmail.com]
> Sent: Monday, 9. August 2010 15:57
> To: solr-user@lucene.apache.org
> Subject: how to support "implicit trailing wildcards"
>
> [...]
Re: how to support "implicit trailing wildcards"
Hi Jan, It seems q=mount OR mount* produces a different sort order than q=mount for documents containing 'mount'. I changed it to q=mount^100 OR (mount?* -mount)^1.0, and it tests well. Thanks very much!

2010/8/10 Jan Høydahl / Cominvent
> Hi,
>
> You don't need to duplicate the content into two fields to achieve this.
> Try this:
>
> q=mount OR mount*
>
> The exact match will always get a higher score than the wildcard match
> because wildcard matches use "constant score".
>
> Making this work for multi-term queries is a bit trickier, but something
> along these lines:
>
> q=(mount OR mount*) AND (everest OR everest*)
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 10. aug. 2010, at 09.38, Geert-Jan Brits wrote:
>
> > You could satisfy this by making 2 fields:
> > 1. exactmatch
> > 2. wildcardmatch
> >
> > Use copyField in your schema to copy 1 --> 2.
> >
> > q=exactmatch:mount+wildcardmatch:mount*&q.op=OR
> > This would score exact matches above (solely) wildcard matches.
> >
> > Geert-Jan
> >
> > 2010/8/10 yandong yao
> >> [...]
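For completeness, a sketch of Geert-Jan's two-field variant (the field names are his examples; the type name is a placeholder):

  <field name="exactmatch" type="text" indexed="true" stored="false"/>
  <field name="wildcardmatch" type="text" indexed="true" stored="false"/>
  <copyField source="exactmatch" dest="wildcardmatch"/>

with queries of the form q=exactmatch:mount+wildcardmatch:mount*&q.op=OR, so a document matching the exact term collects score contributions from both fields, while a wildcard-only match scores from just one.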
A question on WordDelimiterFilterFactory
Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (the relevant part of schema.xml is pasted at the bottom of this email).

1. Steps to reproduce:
1.1 The indexed sample document contains only one sentence: "This is a TechNote."
1.2 The query is: q=TechNote
1.3 Result: no matches are returned, although the above sentence clearly contains the word 'TechNote'.

2. Output when enabling debugQuery
Turning on debugQuery (http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=) gives the following information:

  TechNote
  TechNote
  PhraseQuery(all:"tech note")
  all:"tech note"
  id:001
  0.0 = fieldWeight(all:"tech note" in 0), product of:
    0.0 = tf(phraseFreq=0.0)
    0.61370564 = idf(all: tech=1 note=1)
    0.25 = fieldNorm(field=all, doc=0)

It seems the raw query string is converted to the phrase query "tech note", whose term frequency is 0, so there are no matches.

3. Result from the admin/analysis.jsp page
From analysis.jsp, the query 'TechNote' appears to match the input document (matching terms are the ones highlighted on the page):

Index Analyzer
  WhitespaceTokenizerFactory {}:
    pos 1: This [0,4]   pos 2: is [5,7]   pos 3: a [8,9]   pos 4: TechNote. [10,19]
  SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}:
    pos 1: This [0,4]   pos 2: is [5,7]   pos 3: a [8,9]   pos 4: TechNote. [10,19]
  WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}:
    pos 1: This [0,4]   pos 2: is [5,7]   pos 3: a [8,9]   pos 4: Tech [10,14]   pos 5: Note [14,18], TechNote [10,18]
  LowerCaseFilterFactory {}:
    pos 1: this   pos 2: is   pos 3: a   pos 4: tech   pos 5: note, technote
  SnowballPorterFilterFactory {protected=protwords.txt, language=English}:
    pos 1: this   pos 2: is   pos 3: a   pos 4: tech   pos 5: note, technot

Query Analyzer
  WhitespaceTokenizerFactory {}:
    pos 1: TechNote [0,8]
  SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}:
    pos 1: TechNote [0,8]
  WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}:
    pos 1: Tech [0,4]   pos 2: Note [4,8]
  LowerCaseFilterFactory {}:
    pos 1: tech   pos 2: note
  SnowballPorterFilterFactory {protected=protwords.txt, language=English}:
    pos 1: tech   pos 2: note

4. My questions are:
4.1: Why do debugQuery and analysis.jsp have different results?
4.2: From my understanding, during indexing the word 'TechNote' is converted to 1) 'technote' and 2) 'tech note' according to my config in schema.xml, and at query time 'TechNote' is converted to 'tech note', so it SHOULD match. Am I right?
4.3: Why is the frequency of the phrase 'tech note' 0 in the debugQuery output (0.0 = tf(phraseFreq=0.0))?

Any suggestions/comments are absolutely welcome!

5. fieldType definition in schema.xml

Thanks very much!
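A sketch of the fieldType in question, reconstructed from the filter names and parameters visible in the analysis.jsp output above (ordering and attributes inferred, not guaranteed verbatim):

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <!-- query side differs: catenateWords=0, catenateNumbers=0 -->
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
  </fieldType>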
Re: A question on WordDelimiterFilterFactory
Hi Robert, I am using Solr 1.4, will try with 1.4.1 tomorrow. Thanks very much!

Regards,
Yandong Yao

2010/9/14 Robert Muir
> did you index with solr 1.4 (or are you using solr 1.4)?
>
> at a quick glance, it looks like it might be this:
> https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in 1.4.1
>
> On Tue, Sep 14, 2010 at 5:40 AM, yandong yao wrote:
> > [...]
Re: A question on WordDelimiterFilterFactory
After upgrading to 1.4.1, it is fixed. Thanks very much for your help!

Regards,
Yandong Yao

2010/9/14 yandong yao
> Hi Robert,
>
> I am using Solr 1.4, will try with 1.4.1 tomorrow.
>
> Thanks very much!
>
> Regards,
> Yandong Yao
>
> 2010/9/14 Robert Muir
>> did you index with solr 1.4 (or are you using solr 1.4)?
>>
>> at a quick glance, it looks like it might be this:
>> https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in 1.4.1
>>
>> [...]
Re: Need help for solr searching case insensitive item
Sounds like a WordDelimiterFilter config issue; please refer to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory. Also, it will help if you can provide: 1) the tokenizer/filter config in your schema file, and 2) the analysis.jsp output from the admin page. (A minimal example follows below the quoted message.)

2010/10/26 wu liu
> Hi all,
>
> I just noticed a weird thing happening in my Solr search results:
> if I do a search for "ecommons", it does not find "eCommons"; instead,
> if I do a search for "eCommons", I only get the matches for "eCommons",
> but not "ecommons".
>
> I cannot figure out why.
>
> please help me
>
> Thanks very much in advance
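For reference, a minimal sketch of a case-insensitive analysis chain (illustrative, not the poster's actual schema); the essential point is that LowerCaseFilterFactory runs at both index and query time, so 'ecommons' and 'eCommons' normalize to the same term:

  <fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- applied at index AND query time because only one <analyzer> is declared -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

If WordDelimiterFilter is also in the chain, its splitOnCaseChange/catenate options must be consistent between index and query, which is why the analysis.jsp output is the first thing to check.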
How to run many MoreLikeThis request efficiently?
Hi Solr gurus, I have two sets of documents in one SolrCore; each set has about 1M documents, with different document types, say 'type1' and 'type2'. Many documents in the first set are very similar to 1 or 2 documents in the second set. What I want is: for each document in set 2, return the most similar document in set 1, using either 'MoreLikeThisHandler' or 'MoreLikeThisComponent'.

Currently I use the following approach to get the result, but it sends far too many requests to the Solr server serially. Is there any way to improve this besides multi-threading? Thanks very much!

  for each document in set 2 whose type is 'type2'
      run a MoreLikeThis request against the Solr server and get the most similar document
  end

Regards, Yandong
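For concreteness, a sketch of the per-document request with SolrJ (field names 'id', 'type' and the similarity field 'content' are assumptions for illustration, as is the /mlt handler being registered in solrconfig.xml):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocumentList;

  public class MltLookup {
      private final HttpSolrServer server =
              new HttpSolrServer("http://localhost:8983/solr/collection1");

      public SolrDocumentList mostSimilar(String docId) throws SolrServerException {
          SolrQuery q = new SolrQuery();
          q.setRequestHandler("/mlt");            // MoreLikeThisHandler
          q.setQuery("id:" + docId);              // seed: one 'type2' document
          q.set("mlt.fl", "content");             // field(s) to compute similarity on (assumed)
          q.setFilterQueries("type:type1");       // restrict candidates to set 1
          q.setRows(1);                           // only the single best match
          QueryResponse rsp = server.query(q);
          return rsp.getResults();
      }
  }

This is one HTTP round-trip per type2 document, which is exactly the serial cost being asked about.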
Re: How to run many MoreLikeThis request efficiently?
Any comments on this? Thanks very much in advance!

2013/1/9 Yandong Yao
> [...]
Re: How to run many MoreLikeThis request efficiently?
Hi Otis, I really appreciate your help on this! I will go with multi-threading first, and then provide a custom component if performance is not good enough.

Regards, Yandong

2013/1/10 Otis Gospodnetic
> Patience, young Yandong :)
>
> Multi-threading *in your application* is the way to go. Alternatively, one
> could write a custom SearchComponent that is called once and inside of
> which the whole work is done after just one call to it. This component
> could then write the output somewhere, like in a new index, since making a
> blocking call to it may time out.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jan 9, 2013 6:07 PM, "Yandong Yao" wrote:
> > [...]
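A minimal sketch of the multi-threaded client-side approach (the thread count and the MltLookup helper from the earlier sketch are illustrative assumptions):

  import java.util.List;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;

  public class ParallelMlt {
      public static void run(final MltLookup lookup, List<String> type2Ids)
              throws InterruptedException {
          ExecutorService pool = Executors.newFixedThreadPool(8); // tune to server capacity
          for (final String docId : type2Ids) {
              pool.submit(new Runnable() {
                  public void run() {
                      try {
                          // one MLT round-trip per document, now issued in parallel
                          lookup.mostSimilar(docId);
                          // ... store/aggregate the result here ...
                      } catch (Exception e) {
                          e.printStackTrace();
                      }
                  }
              });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
      }
  }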
Re: Index optimize takes more than 40 minutes for 18M documents
Thanks Walter for the info; we will disable optimize then and do more testing.

Regards, Yandong

2013/2/22 Walter Underwood
> That seems fairly fast. We index about 3 million documents in about half
> that time. We are probably limited by the time it takes to get the data
> from MySQL.
>
> Don't optimize. Solr automatically merges index segments as needed.
> Optimize forces a full merge. You'll probably never notice the difference,
> either in disk space or speed.
>
> It might make sense to force merge (optimize) if you reindex everything
> once per day and have no updates in between. But even then it may be a
> waste of time.
>
> You need lots of free disk space for merging, whether a forced merge or
> automatic. Free space equal to the size of the index is usually enough, but
> the worst case can need double the size of the index.
>
> wunder
>
> On Feb 21, 2013, at 9:20 AM, Yandong Yao wrote:
>
> > Hi Guys,
> >
> > I am using Solr 4.1 and have indexed 18M documents using the solrj
> > ConcurrentUpdateSolrServer (each document contains 5 fields, and the
> > average length is less than 1k).
> >
> > 1) It takes 70 minutes to index those documents without optimize on my
> > Mac (10.8). How is that performance: slow, fast or average?
> >
> > 2) It takes about 40 minutes to optimize those documents. Below is the
> > top output; there are lots of FAULTS -- what does this mean?
> >
> > Processes: 118 total, 2 running, 8 stuck, 108 sleeping, 719 threads    00:56:52
> > Load Avg: 1.48, 1.56, 1.73  CPU usage: 6.63% user, 6.40% sys, 86.95% idle
> > SharedLibs: 31M resident, 0B data, 6712K linkedit.
> > MemRegions: 34734 total, 5801M resident, 39M private, 638M shared.
> > PhysMem: 982M wired, 3600M active, 3567M inactive, 8150M used, 38M free.
> > VM: 254G vsize, 1285M framework vsize, 1469887(368) pageins, 1095550(0) pageouts.
> > Networks: packets: 14842595/9661M in, 14777685/9395M out.
> > Disks: 820048/43G read, 523814/53G written.
> >
> > PID COMMAND %CPU TIME #TH #WQ #POR #MRE RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH
> > 4585 java 11.7 02:52:01 32 1483 342 3866M+ 6724K 3856M+ 4246M 6908M 4580 4580 sleepin 501 1490340+ 402 3000781+ 231785+ 15044055+ 10033109+
> >
> > 3) If I don't run optimize, what is the impact? Bigger disk size or
> > slower query performance?
> >
> > Following is my index config in solrconfig.xml:
> >
> > 100 10 10 30 false
> >
> > Thanks very much in advance!
> >
> > Regards,
> > Yandong
How to use nested query in fq?
Hi Guys, I am using Solr 3.5 and would like to use an fq along the lines of getField(getDoc(uuid:workspace_${workspaceId}), "isPublic"):true, where:

- workspace_${workspaceId}: workspaceId is an indexed field.
- getDoc(uuid:workspace_${workspaceId}): returns the document whose uuid is "workspace_${workspaceId}".
- getField(getDoc(uuid:workspace_${workspaceId}), "isPublic"): returns the matched document's isPublic field.

The use case is that I have workspace objects, and a workspace contains many sub-objects, such as work files, comments, datasets and so on. A workspace has an 'isPublic' field: if this field is true, then all registered users can access this workspace and all its sub-objects; otherwise, only workspace members can. So I want to use an fq to determine whether the document in question belongs to a public workspace or not.

Is this possible? If not, how would I implement a feature like this -- a ValueSourcePlugin? Any guidance or example on this? Or is there a better solution? It is possible to add an 'isPublic' field to all sub-objects, but that makes index updates more complex, so I am trying to find a better solution.

Thanks very much in advance!

Regards, Yandong
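For later readers: Solr 4.0 and later ship a join query parser that expresses this directly; a hedged sketch assuming each sub-object stores its parent workspace's id in a field called workspaceUuid (a hypothetical name):

  fq={!join from=uuid to=workspaceUuid}isPublic:true

This runs isPublic:true against the workspace documents, collects their uuid values, and keeps only sub-objects whose workspaceUuid matches one of them. On Solr 3.5 itself, the practical options remain the denormalized isPublic field or a custom component, as discussed above.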
Re: Faster Solr Indexing
I have similar issues using DIH, and org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand) consumes most of the time when indexing 10K rows (each row is about 70K):

- DIH nextRow takes about 10 seconds in total
- If the index uses a whitespace tokenizer and lower-case filter, the addDoc() method takes about 80 seconds
- If the index uses a whitespace tokenizer, lower-case filter and WDF, addDoc takes about 112 seconds
- If the index uses a whitespace tokenizer, lower-case filter, WDF and a Porter stemmer, addDoc takes about 145 seconds

We have more than a million rows in total, and I am wondering whether I am doing something wrong, or whether there is any way to improve the performance of addDoc(). Thanks very much in advance!

The configuration is:
1) JVM: -Xms256M -Xmx1048M -XX:MaxPermSize=512m
2) Solr version 3.5
3) solrconfig.xml (almost copied from Solr's example/solr directory; see the note after the quoted message): false 10 64 2147483647 1000 1 native

2012/3/11 Peyman Faratin
> Hi
>
> I am trying to index 12MM docs faster than is currently happening in Solr
> (using solrj). We have identified solr's add method as the bottleneck (and
> not commit - which is tuned ok through mergeFactor and maxRamBufferSize and
> jvm ram).
>
> Adding 1000 docs is taking approximately 25 seconds. We are making sure we
> add and commit in batches. And we've tried both CommonsHttpSolrServer and
> EmbeddedSolrServer (assuming removing http overhead would speed things up
> with embedding) but the difference is marginal.
>
> The docs being indexed are on average 20 fields long, mostly indexed but
> none stored. The major size contributors are two fields:
>
>    - content, and
>    - shingledContent (populated using copyField of content).
>
> The length of the content field is (likely) Gaussian distributed (a few
> large docs of 50-80K tokens, but the majority around 2K tokens). We use
> shingledContent to support phrase queries and content for unigram queries
> (following the advice of the Solr Enterprise Search Server book, p. 305,
> section "The Solution: Shingling").
>
> Clearly the size of the docs is a contributor to the slow adds (confirmed
> by removing these 2 fields, which halves the indexing time). We've tried
> compressed=true also but that is not working.
>
> Any guidance on how to support our application logic (without having to
> change the schema too much) and speed up the indexing (from the current 212
> days for 12MM docs) would be much appreciated.
>
> thank you
>
> Peyman
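For context, the bare values in item 3 above most plausibly line up with a stock 3.5 indexDefaults section along these lines (a guess: the element names are inferred from typical 3.5 configs, and the stray '1' is left unmapped):

  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>64</ramBufferSizeMB>
    <maxFieldLength>2147483647</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <lockType>native</lockType>
  </indexDefaults>

If that reading is right, raising ramBufferSizeMB (e.g. to 256) is the usual first knob for slow addDoc() runs, since it reduces how often in-memory segments are flushed to disk.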
SolrCloud: how to index documents into a specific core and how to search against that core?
Hi Guys, I use the following commands to start Solr Cloud according to the Solr Cloud wiki:

  yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
  yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

Then I created several cores using the CoreAdmin API (http://localhost:8983/solr/admin/cores?action=CREATE&name=<core name>&collection=collection1), and clusterstate.json shows the following topology:

collection1:
  -- shard1:
     -- collection1
     -- CoreForCustomer1
     -- CoreForCustomer3
     -- CoreForCustomer5
  -- shard2:
     -- collection1
     -- CoreForCustomer2
     -- CoreForCustomer4

1) Index: I use the following command to index the mem.xml file in the exampledocs directory:

  yydzero:exampledocs bjcoe$ java -Durl=http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
  SimplePostTool: version 1.4
  SimplePostTool: POSTing files to http://localhost:8983/solr/coreForCustomer3/update..
  SimplePostTool: POSTing file mem.xml
  SimplePostTool: COMMITting Solr index changes.

And now the Solr admin UI shows that 'coreForCustomer1', 'coreForCustomer3' and 'coreForCustomer5' have 3 documents each (mem.xml has 3 documents) and the other 2 cores have 0 documents.

*Question 1:* Is this expected behavior? How do I index documents into a specific core?

*Question 2:* If SolrCloud doesn't support this yet, how could I extend it to support this feature (indexing documents into a particular core)? Where should I start -- the hashing algorithm?

*Question 3:* Why are the documents also indexed into 'coreForCustomer1' and 'coreForCustomer5'? The default replica count for documents is 1, right?

Then I tried to index some documents into 'coreForCustomer2':

  $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar post.jar ipod_video.xml

'coreForCustomer2' still has 0 documents, and the documents in ipod_video.xml are indexed to the cores for customers 1/3/5.

*Question 4:* Why does this happen?

2) Search: I use "http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml" to search against 'coreForCustomer2', and it returns all documents in the whole collection even though this core has no documents at all.

Then I use "http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2", and it returns 0 documents.

*Question 5:* So if I want to search against a particular core, I need to use the 'shards' parameter with the SolrCore name as the parameter value, right?

Thanks very much in advance!

Regards, Yandong
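A side note for anyone debugging the same thing: a single core can also be queried in isolation with distrib=false, which turns off the distributed search logic for that request (an illustrative request, not a recommended production pattern):

  http://localhost:8983/solr/coreForCustomer2/select?q=*:*&distrib=false

This explains the two observations above: without it (or the shards parameter), the request fans out across the whole collection; with it, only the named core's own index is searched.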
Re: SolrCloud: how to index documents into a specific core and how to search against that core?
Hi Darren, Thanks very much for your reply. The reason I want to control core indexing/searching is that I want to use one core to store one customer's data (all customers share the same config): e.g. customer 1 uses coreForCustomer1 and customer 2 uses coreForCustomer2.

Is there any better way than using a different core for each customer? Another way might be to use a different collection for each customer, though I am not sure how many collections Solr Cloud can support. Which way is better in terms of flexibility/scalability (supposing there are tens of thousands of customers)?

Regards, Yandong

2012/5/22 Darren Govoni
> Why do you want to control what gets indexed into a core and then
> know which core to search? That's the kind of "knowing" that SolrCloud
> solves. In SolrCloud, it handles the distribution of documents across
> shards and retrieves them regardless of which node is searched from.
> That is the point of "cloud": you don't know the details of where
> exactly documents are being managed (i.e. they are cloudy). It can
> change and re-balance from time to time. SolrCloud performs the
> distributed search for you; therefore, when you search a node/core
> with no documents, all the results from the "cloud" are retrieved
> regardless. This is considered "A Good Thing".
>
> It requires a change in thinking about indexing and searching.
>
> On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
> > [...]
Re: SolrCloud: how to index documents into a specific core and how to search against that core?
Hi Mark, Darren, Thanks very much for your help; I will try a collection for each customer then.

Regards, Yandong

2012/5/22 Mark Miller
> I think the key is this: you want to think of a SolrCore on a single-node
> Solr installation as a collection on a multi-node SolrCloud installation.
>
> So if you would use multiple SolrCores with a standard Solr setup, you should
> be using multiple collections in SolrCloud. If you were going to try to do
> everything in one SolrCore, that would be like putting everything in one
> collection in SolrCloud. I don't think it generally makes sense to try to
> work at the SolrCore level when working with SolrCloud. This will be made
> clearer once we add a simple collections API.
>
> So I think your choice should be similar to using a single node: do you
> want to put everything in one 'collection' and use a filter to separate
> customers (with all its caveats and limitations), or do you want to use a
> collection per customer? You can always start up more clusters if you reach
> any limits.
>
> On May 22, 2012, at 10:08 AM, Darren Govoni wrote:
>
> > I'm curious what the solrcloud experts say, but my suggestion is to try
> > not to over-engineer the search architecture on solrcloud. For example,
> > what is the benefit of managing which cores are indexed and searched?
> > Having to know those details, in my mind, works against the automation in
> > solrcloud, but maybe there's a good reason you want to do it this way.
> >
> > --- Original Message ---
> > On 5/22/2012 07:35 AM Yandong Yao wrote:
> > > [...]
Count is inconsistent between facet and stats
Hi Guys, Steps to reproduce:
1) Download apache-solr-4.0.0-ALPHA
2) cd example; java -jar start.jar
3) cd exampledocs; ./post.sh *.xml
4) Use the StatsComponent to get stats for the field 'popularity' faceted by 'cat'; the 'count' for 'electronics' is 3:

http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&stats=true&stats.field=popularity&stats.facet=cat

{
  "stats_fields": {
    "popularity": {
      "min": 0, "max": 10, "count": 14, "missing": 0,
      "sum": 75, "sumOfSquares": 503,
      "mean": 5.357142857142857, "stddev": 2.7902892835178013,
      "facets": {
        "cat": {
          "music":         { "min": 10, "max": 10, "count": 1, "missing": 0, "sum": 10, "sumOfSquares": 100, "mean": 10, "stddev": 0 },
          "monitor":       { "min": 6, "max": 6, "count": 2, "missing": 0, "sum": 12, "sumOfSquares": 72, "mean": 6, "stddev": 0 },
          "hard drive":    { "min": 6, "max": 6, "count": 2, "missing": 0, "sum": 12, "sumOfSquares": 72, "mean": 6, "stddev": 0 },
          "scanner":       { "min": 6, "max": 6, "count": 1, "missing": 0, "sum": 6, "sumOfSquares": 36, "mean": 6, "stddev": 0 },
          "memory":        { "min": 0, "max": 7, "count": 3, "missing": 0, "sum": 12, "sumOfSquares": 74, "mean": 4, "stddev": 3.605551275463989 },
          "graphics card": { "min": 7, "max": 7, "count": 2, "missing": 0, "sum": 14, "sumOfSquares": 98, "mean": 7, "stddev": 0 },
          "electronics":   { "min": 1, "max": 7, "count": 3, "missing": 0, "sum": 9, "sumOfSquares": 51, "mean": 3, "stddev": 3.4641016151377544 }
        }
      }
    }
  }
}

5) Facet on 'cat', and the count for 'electronics' is 14:

http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&facet=true&facet.field=cat

{
  "cat": [
    "electronics", 14,
    "memory", 3,
    "connector", 2,
    "graphics card", 2,
    "hard drive", 2,
    "monitor", 2,
    "camera", 1,
    "copier", 1,
    "multifunction printer", 1,
    "music", 1,
    "printer", 1,
    "scanner", 1,
    "currency", 0,
    "search", 0,
    "software", 0
  ]
}

So the StatsComponent reports a count of 3 for the 'electronics' cat, while the FacetComponent reports 14. Is this a bug?

Following is the field definition for 'cat'.

Thanks, Yandong
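For reference, in the stock 4.0.0-ALPHA example schema the 'cat' field is a multivalued string (worth double-checking against your local copy):

  <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>

The multiValued="true" part matters when comparing the two components' numbers, since a single document can contribute to several 'cat' buckets.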
mergeindex: what happens if there is deletion during index merging
Hi guys,

From http://wiki.apache.org/solr/MergingSolrIndexes, it says 'Using "srcCore", care is taken to ensure that the merged index is not corrupted even if writes are happening in parallel on the source index'.

What does this mean? If there are deletion requests during merging, will the deletions be processed correctly after merging finishes?

1) e.g.: I have an existing core 'core0', and I want to merge cores 'core1' and 'core2' into 'core0', so I will use http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&srcCore=core1&srcCore=core2 . While the merge is happening, core0, core1 and core2 receive deletion requests to delete some old documents. Will the final core 'core0' contain all content from 'core1' and 'core2', with all documents matching the deletion criteria deleted?

2) And if core0, core1 and core2 are processing deletion requests when a core merge request comes in, what will happen? Will the merge request block until the deletions finish on all cores?

Thanks very much in advance!

Regards, Yandong
Re: mergeindex: what happens if there is deletion during index merging
Hi Shalin, Thanks very much for your detailed explanation! Regards, Yandong 2012/8/21 Shalin Shekhar Mangar > On Tue, Aug 21, 2012 at 8:47 AM, Yandong Yao wrote: > > > Hi guys, > > > > From http://wiki.apache.org/solr/MergingSolrIndexes, it said 'Using > > "srcCore", care is taken to ensure that the merged index is not corrupted > > even if writes are happening in parallel on the source index'. > > > > What does it means? If there are deletion request during merging, will > this > > deletion be processed correctly after merging finished? > > > > Solr keeps an instance of the IndexReader for each srcCore which is a > static snapshot of the index at the time of the merge request. This static > snapshot is merged to the target core. Therefore any insert/delete request > made to the srcCores after the merge request will not affect the merged > index. > > > > > > 1) > > eg: I have an existing core 'core0', and I want to merge core 'core1' > and > > 'core2' to core 'core0', so I will use > > > > > http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&srcCore=core1&srcCore=core2 > > , > > > > During the merging happens, core0, core1, core2 have received deletion > > request to delete some old documents, will the final core 'core0' > contains > > all content from 'core1' and 'core2' and also all documents matches > > deletion criteria has been deleted? > > > > The final core0 will not have documents deleted by requests made on core0. > However, documents deleted on core1 and core2 will still be in core0 if the > merge started before those requests were made. > > > > > > 2) > > And if core0, core1, and core2 are processing deletion request, at the > same > > time core merge request comes in, what will happen then? Will merge > request > > block until deletion finished on all cores? > > > > I believe core0 will continue to process deletion requests concurrently > with the merge. As for core1 and core2, since a merge reserves their > IndexReader, the answer depends on when a commit happens on core1 and > core2. If, for example, 2 deletions were made on core1 and then a commit > was issued (or autoCommit happened) and then the merge was triggered then > the final core0 will not have those documents but it may still have docs > deleted after the commit. > > > > > > Thanks very much in advance! > > > > Regards, > > Yandong > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
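Building on the explanation that the merge works off a static snapshot of each srcCore, one way to make the outcome deterministic is to pause updates and commit on the source cores immediately before the merge, so the snapshot already contains every deletion. A sketch using the core names from this thread:

  # flush pending deletes on the source cores first
  http://localhost:8983/solr/core1/update?commit=true
  http://localhost:8983/solr/core2/update?commit=true
  # then trigger the merge into core0
  http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&srcCore=core1&srcCore=core2
  # finally commit on core0 to make the merged documents visible
  http://localhost:8983/solr/core0/update?commit=true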
Re: Limiting facets for huge data - setting indexed=false in schema.xml
Having a large number of fields is not the same as having a large number of facets. Facets are something you would display to users as an aid for query refinement or navigation. There is no way for a user to use 3700 facets at the same time. So it is more a question of how to determine which facets to fetch at search time, based on the user's actions or on certain predefined configurations.

I have written an application with some 30 facetable fields on millions of records, and I also ran into the cost of calculating all facets, since the server is limited in the cache space and CPU cycles available for facet calculations. I then realized: why display all these facets regardless of whether the user wants to see them or not? I changed the approach to fetch only a minimum set of facets by default and to open the rest of the facet fields on demand (using AJAX). I was able to dramatically improve the response time by spreading the facet loading over time. There are still issues with the total facet cache size when you have a large number of available facets, but you need to realistically evaluate what a large number of facets actually means to a user. I don't think a typical user interface showing more than 10 filters at once is any more effective than one that starts with a small number of filters and progressively shows more on demand (hierarchical facets?).

Rahul R wrote:
>
> Hello,
> We are trying to get Solr to work for a really huge parts database. Details
> of the database
> - 55 million parts
> - Totally 3700 properties (facets). But each record will not have value for
> all properties.
> - Most of these facets are defined as dynamic fields within the Solr Index
>
> We were getting really unacceptable timing while doing faceting/searches on
> an index created with this database. With only one user using the system,
> query times are in excess of 1 minute. With more users concurrently using
> the system, the response times are further high.
>
> We thought that by limiting the number of properties that are available for
> faceting, the performance can be improved. To test this, we enabled only 6
> properties for faceting by setting indexed=true (in schema.xml) for only
> these properties. All other properties which are defined as dynamic
> properties had indexed=false. The observations after this change :
>
> - Index size reduced by a meagre 5 % only
> - Performance did not improve. Infact during PSR run we observed that it
> degraded.
>
> My questions:
> - Will reducing the number of facets improve faceting and search
> performance ?
> - Is there a better way to reduce the number of facets ?
> - Will having a large number of properties defined as dynamic fields, reduce
> performance ?
>
> Thank you.
>
> Regards
> Rahul

-- View this message in context: http://www.nabble.com/Limiting-facets-for-huge-data---setting-indexed%3Dfalse-in-schema.xml-tp24751763p24761778.html Sent from the Solr - User mailing list archive at Nabble.com.
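A sketch of the on-demand pattern described above, with hypothetical field names (category, brand, price): the initial search asks for only the default facets, and an AJAX follow-up (rows=0) fetches a facet field only when the user expands it:

  # initial request: documents plus the default facets
  /select?q=laptop&facet=true&facet.mincount=1&facet.field=category&facet.field=brand
  # user expands "price": facet-only request, no documents returned
  /select?q=laptop&rows=0&facet=true&facet.mincount=1&facet.field=price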
Re: Item Facet
Are your product_name* fields numeric fields (integer or float)? Dals wrote: > > Hi... > > Is there any way to group values like shopping.yahoo.com or > shopper.cnet.com do? > > For instance, I have documents like: > > doc1 - product_name1 - value1 > doc2 - product_name1 - value2 > doc3 - product_name1 - value3 > doc4 - product_name2 - value4 > doc5 - product_name2 - value5 > doc6 - product_name2 - value6 > > I'd like to have a result grouping by product name with the value > range per product. Something like: > > product_name1 - (value1 to value3) > product_name2 - (value4 to value6) > > It is not like the current facet because the information is grouped by > item, not the entire result. > > Any idea? > > Thanks! > > David Lojudice Sobrinho > > -- View this message in context: http://www.nabble.com/Item-Facet-tp24853669p24865535.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Google Side-By-Side UI
Yes. I think it would be a very helpful tool for tuning search relevance - you could run a controlled experiment with your target audience to understand their responses to parameter changes. We plan to use this feature to benchmark Lucene/Solr against our in-house commercial search engine - it will be an interesting test.

Lance Norskog-2 wrote:
>
> http://googleenterprise.blogspot.com/2009/08/compare-enterprise-search-relevance.html
>
> This is really cool, and a version for Solr would help in doing
> relevance experiments. We don't need the "select A or B" feature, just
> seeing search result sets side-by-side would be great.
>
> --
> Lance Norskog
> goks...@gmail.com

-- View this message in context: http://www.nabble.com/Google-Side-By-Side-UI-tp25719087p25719806.html Sent from the Solr - User mailing list archive at Nabble.com.
DIH - Export to XML
Is there a way to have the Data Import Handler dump data to a Solr feed-format XML file? -- View this message in context: http://old.nabble.com/DIH---Export-to-XML-tp26138213p26138213.html Sent from the Solr - User mailing list archive at Nabble.com.
encountered the "Cannot allocate memory" when calling snapshooter program after optimize command
Hi,

I configured Solr to listen for the postOptimize event and call the snapshooter program after an optimize command. It works well when the Java heap size is set to less than 4G. But if I increase the Java heap size to 5G, the snapshooter program can't be successfully called after the optimize command; the error message is:

SEVERE: java.io.IOException: Cannot run program "/home/solr_1.3/solr/bin/snapshooter" (in directory "/home/solr_1.3/solr/bin"): java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at java.lang.Runtime.exec(Runtime.java:593)

Here is my server platform:
OS: CentOS 5.2 x86_64
Memory: 8G
Solr: 1.3

Any suggestion is appreciated.

Thanks, Justin
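A plausible explanation, though it is an assumption this thread never confirmed: Runtime.exec() forks the JVM before exec'ing snapshooter, and on Linux the fork must momentarily reserve address space equal to the parent process, so a 5G heap on an 8G machine can fail with error=12 even though snapshooter itself is tiny. If that is the cause, allowing memory overcommit is one common workaround:

  # sketch, only applicable if the fork/overcommit theory above is the cause
  sysctl -w vm.overcommit_memory=1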
Query Boost Functions
I have a field named "last-modified" that I would like to use in the bf (Boost Functions) parameter: recip(rord(last-modified),1,1000,1000) in the DisMaxRequestHandler. However the Solr query parser complains about the syntax of the formula. I think it is related to the hyphen in the field name. I have tried adding single and double quotes around the field name, but that didn't help. Can a field name contain a hyphen in boost functions? How do I do it? If not, where do I find the field-name special-character restrictions? -Yao -- View this message in context: http://www.nabble.com/Query-Boost-Functions-tp23595860p23595860.html Sent from the Solr - User mailing list archive at Nabble.com.
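As far as I know, the function-query parser treats '-' as the minus operator, so a hyphenated field name cannot be referenced in a boost function; the usual workaround is to reindex under an underscore name. A sketch under that assumption (the field type is illustrative):

  <!-- schema.xml: rename the field -->
  <field name="last_modified" type="date" indexed="true" stored="true"/>
  <!-- the dismax bf now parses cleanly -->
  <str name="bf">recip(rord(last_modified),1,1000,1000)</str>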
Re: Solr Shard - Strange results
Maybe you want to try with docNumber field type as "string" and see it would make a difference. CB-PO wrote: > > I'm not quite sure what logs you are talking about, but in the > tomcat/logs/catalina.out logs, i found the following [note, i can't > copy/paste, so i am typing up a summary]: > > I execute command: > localhost:8080/bravo/select?q=fred&rows=102&start=0&shards=localhost:8080/alpha,localhost:8080/bravo > > In this example, alpha has 27 instances of "fred", while bravo has 0. > > Then in the catalina.out: > > -There is the request for the command i sent, shards parameters and all. > it has the proper queryString. > -Then I see the two requests sent to the shards, apha and bravo. These > two requests weave between each other until they are finished: > INFO: REQUEST URI =/alpha/select > INFO: REQUEST URI =/bravo/select > The parameters have changed to: > > wt=javabin&fsv=true&version=2.2&f1=docNumber,score&q=fred&rows=102&isShard=true&start=0 > > -Then 2 INFO's scroll across: > INFO: [] webapp=/bravo path=/select > params={wt=javabin&fsv=true&version=2.2&f1=docNumber,score&q=fred&rows=102&isShard=true&start=0} > hits=0 status=0 QTime=1 > INFO: [] webapp=/alpha path=/select > params={wt=javabin&fsv=true&version=2.2&f1=docNumber,score&q=fred&rows=102&isShard=true&start=0} > hits=27 status=0 QTime=1 > **Note, hits=27 > > -Then i see some octet-streams being transferred, with status 200, so > those are OK. > > -The i see something peculiar: > It calls alpha with the following parameters: > wt=javabin&version=2.2&ids=ABC-1353,ABC-408,ABC-1355,ABC-1824,ABC-1354,FRED-ID-27,55&q=fred&rows=102¶meter=isShard=true&start=0 > > Performing this query on my own (without the wt=javabin) gives me > numFound=2, the result-set I get back from the overarching query. > Changing it to rows=10, it gives me numFound=2, and 2 's. This is > not the strange functionality I was seeing with the overarching query and > the mis-matched "numfound" and 's. > > This does beg the question.. why did it add: > "ids=ABC-1353,ABC-408,ABC-1355,ABC-1824,ABC-1354,FRED-ID-27,55" to the > query? They are the format that would be under docNumber, if that helps.. > Any thoughts? I will do some research on those particular ID numbered > docs, in the mean time. > > Here's the configuration information. I only posted the difference from > the default files in the solr/example/solr/conf > > [solrconfig.xml] > > ${solr.data.dir:/data/indices/bravo/solr/data > >class="org.apache.solr.handler.dataimport.DataImportHandler"> > >name="config">/data/indices/bravo/solr/conf/data-config.xml > > > > > > [schema.xml] > > >stored="true" /> >/> >/> >/> >/> >/> >/> >/> >/> >/> > > docNumber > column2 > > > > [data-config.xml] > >url="jdbc:metamatrix:b...@mms://hostname:port" user="username" > password="password"/> > > > > > > > > > > > > > > > > > > > > > Yonik Seeley-2 wrote: >> >> On Fri, May 15, 2009 at 4:11 PM, CB-PO wrote: >>> Yeah, the first thing I thought of was that perhaps there was something >>> wrong >>> with the uniqueKey and they were clashing between the indexes, however >>> upon >>> visual inspection of the data the field we are using as the unique key >>> in >>> each of the indexes is grossly different between the two databases, so >>> there >>> is no chance of them clashing. >> >> Yes, but is the same fieldname and FieldType used for both indexes? 
>> (that's sort of a requirement) >> >> You might also try looking at the logs for the exact requests that >> were sent to each shard as part of the distributed request, and >> manually sending those requests and inspecting the results. That >> should tell you if the shard requests or responses are weird, or if >> it's the top-level combining logic that's causing this. >> >> -Yonik >> http://www.lucidimagination.com >> >> > > -- View this message in context: http://www.nabble.com/Solr-Shard---Strange-results-tp23561201p23601624.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler Template Transformer
It took me a while to understand that to use the TemplateTransformer (http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/TemplateTransformer.html), none of the variables used to build the template (e.g. ${e.firstName}, ${e.lastName}, etc.) can contain null values. I hope the documentation can do a better job of explaining this. Also, it would be nice to simply pad null values with a blank string. Should this be considered an enhancement? -- View this message in context: http://www.nabble.com/DataImportHandler-Template-Transformer-tp23609267p23609267.html Sent from the Solr - User mailing list archive at Nabble.com.
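Until such an enhancement exists, one workaround is to blank out nulls in a ScriptTransformer that runs before the TemplateTransformer. A minimal sketch, with hypothetical column names taken from the example above (ScriptTransformer needs a JDK 6 runtime):

  <dataConfig>
    <script><![CDATA[
      function padNulls(row) {
        // replace null columns with empty strings so TemplateTransformer can proceed
        if (row.get('firstName') == null) row.put('firstName', '');
        if (row.get('lastName') == null) row.put('lastName', '');
        return row;
      }
    ]]></script>
    <document>
      <entity name="e" query="..." transformer="script:padNulls,TemplateTransformer">
        <field column="fullName" template="${e.firstName} ${e.lastName}"/>
      </entity>
    </document>
  </dataConfig>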
spell checking
Can someone help by providing a tutorial-like introduction on how to get spell checking working in Solr? It appears many steps are required before the spell-checking functions can be used. It also appears that a dictionary (a list of correctly spelled words) is required to set up the spell checker. Can anyone validate my impression?

Thanks. -- View this message in context: http://www.nabble.com/spell-checking-tp23835427p23835427.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: spell checking
Yes, I did. I was not able to grasp the concept of how to make spell checking work. For example, the wiki page says a spell-check index needs to be built, but it does not say how to do it. Does Solr build the index out of thin air? Is the index built from the main index? Or is the index built from a dictionary or word list? Please help.

Grant Ingersoll-6 wrote:
>
> Have you gone through: http://wiki.apache.org/solr/SpellCheckComponent
>
> On Jun 2, 2009, at 8:50 AM, Yao Ge wrote:
>
>> Can someone help providing a tutorial like introduction on how to get
>> spell-checking work in Solr. It appears many steps are requires before the
>> spell-checkering functions can be used. It also appears that a dictionary (a
>> list of correctly spelled words) is required to setup the spell checker. Can
>> anyone validate my impression?
>>
>> Thanks.
>> --
>> View this message in context: http://www.nabble.com/spell-checking-tp23835427p23835427.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search

-- View this message in context: http://www.nabble.com/spell-checking-tp23835427p23840843.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: spell checking
Sorry for not being able to get my point across. I know the syntax that triggers an index build for spell checking. I actually ran the command and saw some additional files created in the data\spellchecker1 directory. What I don't understand is what is in there, as I cannot get Solr to make spell suggestions using the query structure documented in the wiki.

Can anyone tell me what happens after the default spell-check index is built? In my case, I used copyField to copy a couple of text fields into a field called "spell". These fields are the original text; they are the ones with the typos that I need to run spell checking against. How can this original data be used as a base for spell checking? How does Solr know which words are correctly spelled? ... ...

Yao Ge wrote:
>
> Can someone help providing a tutorial like introduction on how to get
> spell-checking work in Solr. It appears many steps are requires before the
> spell-checkering functions can be used. It also appears that a dictionary
> (a list of correctly spelled words) is required to setup the spell
> checker. Can anyone validate my impression?
>
> Thanks.

-- View this message in context: http://www.nabble.com/spell-checking-tp23835427p23841373.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: spell checking
Excellent. Now everything makes sense to me. :-) The spell-checking suggestion is the closest variant of the user's input that actually exists in the main index. The so-called "correction" is relative to the indexed text, so there is no need for a brute-force list of all correctly spelled words. Maybe we should call this "alternative search terms" or "suggested search terms" instead of spell checking. "Spell checking" is misleading, as there is no right or wrong in spelling here; there are only popular (term frequency?) alternatives.

Thanks for the insight.

Otis Gospodnetic wrote:
>
> Hello,
>
> In short, the assumption behind this type of SC is that the text in the
> main index is (mostly) correctly spelled. When the SC finds query
> terms that are close in spelling to words indexed in SC, it offers
> spelling suggestions/correction using those presumably correctly spelled
> terms (there are other parameters that control the exact behaviour, but
> this is the idea).
>
> Solr (Lucene's spellchecker, which Solr uses under the hood, actually)
> turns the input text (values from those fields you copy to the spell field)
> into so called n-grams. You can see that if you open up the SC index with
> something like Luke. Please see
> http://wiki.apache.org/jakarta-lucene/SpellChecker .
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: Yao Ge
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, June 2, 2009 5:34:07 PM
>> Subject: Re: spell checking
>>
>> Sorry for not be able to get my point across.
>>
>> I know the syntax that leads to a index build for spell checking. I actually
>> run the command saw some additional file created in data\spellchecker1
>> directory. What I don't understand is what is in there as I can not trick
>> Solr to make spell suggestions based on the documented query structure in
>> wiki.
>>
>> Can anyone tell me what happened after when the default spell check is
>> built? In my case, I used copyField to copy a couple of text fields into a
>> field called "spell". These fields are the original text, they are the ones
>> with typos that I need to run spell check on. But how can these original
>> data be used as a base for spell checking? How does Solr know what are
>> correctly spelled words?
>>
>> multiValued="true"/>
>>
>> multiValued="true"/>
>> ...
>>
>> multiValued="true"/>
>> ...
>>
>> Yao Ge wrote:
>> >
>> > Can someone help providing a tutorial like introduction on how to get
>> > spell-checking work in Solr. It appears many steps are requires before the
>> > spell-checkering functions can be used. It also appears that a dictionary
>> > (a list of correctly spelled words) is required to setup the spell
>> > checker. Can anyone validate my impression?
>> >
>> > Thanks.
>> >
>>
>> --
>> View this message in context: http://www.nabble.com/spell-checking-tp23835427p23841373.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://www.nabble.com/spell-checking-tp23835427p23844050.html Sent from the Solr - User mailing list archive at Nabble.com.
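For reference, an index-based spellchecker of the kind discussed in this thread is typically wired up as below (the field, type and directory names are the common examples, not the poster's actual config):

  <!-- schema.xml: aggregate the fields to check into one "spell" field -->
  <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="spell"/>
  <copyField source="text" dest="spell"/>

  <!-- solrconfig.xml: build the spellcheck index from the "spell" field -->
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
    </lst>
  </searchComponent>

A request with &spellcheck=true&spellcheck.build=true builds the n-gram index; after that, &spellcheck=true&spellcheck.q=whatevar returns the closest indexed variants.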
Faceting on text fields
I am indexing a database with over 1 million rows. Two of the fields contain unstructured text, but the size of each field is limited (256 characters).

I came up with an idea to visualize the text fields as a text cloud by turning the two text fields into facets. The font weight and size of each facet value (word) is derived from the facet counts. I used a simpler field type so that there is no stemming of these facet values:

The facet query is considerably slower compared to the other facets built from structured database fields (with highly repeated values). What I found interesting is that even after I constrained the search results to just a few hundred hits using other facets, these text facets are still very slow.

I understand that text fields are not good candidates for faceting, as they can contain a very large number of unique values. But why is it still slow after the matching documents are reduced to hundreds? Is it because the whole filter is cached (regardless of the matching docs) and I don't have enough filter cache size to fit the whole list?

The following is my filterCache setting:

Lastly, what I really want is to give the user a chance to visualize and filter on the top relevant words in the free-text fields. Is there an alternative to the facet-field approach? Term vectors? I can do client-side processing based on the top N (say 100) hits, but that is my last option. -- View this message in context: http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on text fields
Yes. I am using 1.3. When is 1.4 due for release? Yonik Seeley-2 wrote: > > Are you using Solr 1.3? > You might want to try the latest 1.4 test build - faceting has changed a > lot. > > -Yonik > http://www.lucidimagination.com > > On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge wrote: >> >> I am index a database with over 1 millions rows. Two of fields contain >> unstructured text but size of each fields is limited (256 characters). >> >> I come up with an idea to use visualize the text fields using text cloud >> by >> turning the two text fields in facets. The weight of font and size is of >> each facet value (words) derived from the facet counts. I used simpler >> field >> type so that the there is no stemming to these facet values: >> > positionIncrementGap="100" >>> >> >> >> > ignoreCase="true" expand="false"/> >> > words="stopwords.txt"/> >> > generateWordParts="0" generateNumberParts="0" catenateWords="1" >> catenateNumbers="1" catenateAll="0"/> >> >> >> >> >> >> The facet query is considerably slower comparing to other facets from >> structured database fields (with highly repeated values). What I found >> interesting is that even after I constrained search results to just a few >> hunderd hits using other facets, these text facets are still very slow. >> >> I understand that text fields are not good candidate for faceting as it >> can >> contain very large number of unique values. However why it is still slow >> after my matching documents is reduced to hundreds? Is it because the >> whole >> filter is cached (regardless the matching docs) and I don't have enough >> filter cache size to fit the whole list? >> >> The following is my filterCahce setting: >> > autowarmCount="128"/> >> >> Lastly, what I really want to is to give user a chance to visualize and >> filter on top relevant words in the free-text fields. Are there >> alternative >> to facet field approach? term vectors? I can do client side process based >> on >> top N (say 100) hits for this but it is my last option. >> -- >> View this message in context: >> http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Faceting-on-text-fields-tp23872891p23876051.html Sent from the Solr - User mailing list archive at Nabble.com.
Query Filter fq with OR operator
If I want to use the OR operator with multiple query filters, I can do:

fq=popularity:[10 TO *] OR section:0

Is there a more efficient alternative to this? -- View this message in context: http://www.nabble.com/Query-Filter-fq-with-OR-operator-tp23895837p23895837.html Sent from the Solr - User mailing list archive at Nabble.com.
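A note on the standard fq semantics (general Solr behavior, not specific to this thread): separate fq parameters are intersected (ANDed), so an OR across filters has to live inside a single fq, while independently cached filters each get their own parameter:

  # one filterCache entry matching the union of the two clauses
  fq=popularity:[10 TO *] OR section:0
  # two independently cached filters, intersected (AND)
  fq=popularity:[10 TO *]&fq=section:0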
Re: Faceting on text fields
Michael,

Thanks for the update! I definitely need to get a 1.4 build and see if it makes a difference.

BTW, maybe instead of using faceting for text mining/clustering/visualization purposes, we could build a separate feature in SOLR for this. Many of the commercial search engines I have experience with (Google Search Appliance, Vivisimo etc.) provide dynamic term clustering based on the top N ranked documents (N is a configurable parameter). When a facet field is highly fragmented (say a text field), the existing set-intersection-based approach might no longer be optimal. Aggregating term vectors over the top N docs might be more attractive. Another feature I would really appreciate is search-time n-gram term clustering. Maybe that is better suited to the "spell checker", as it is just a different way to display alternative search terms.

-Yao

Michael Ludwig-4 wrote:
>
> Yao Ge schrieb:
>
>> The facet query is considerably slower comparing to other facets from
>> structured database fields (with highly repeated values). What I found
>> interesting is that even after I constrained search results to just a
>> few hunderd hits using other facets, these text facets are still very
>> slow.
>>
>> I understand that text fields are not good candidate for faceting as
>> it can contain very large number of unique values. However why it is
>> still slow after my matching documents is reduced to hundreds? Is it
>> because the whole filter is cached (regardless the matching docs) and
>> I don't have enough filter cache size to fit the whole list?
>
> Very interesting questions! I think an answer would both require and
> further an understanding of how filters work, which might even lead to
> a more general guideline on when and how to use filters and facets.
>
> Even though faceting appears to have changed in 1.4 vs 1.3, it would
> still be interesting to understand the 1.3 side of things.
>
>> Lastly, what I really want to is to give user a chance to visualize
>> and filter on top relevant words in the free-text fields. Are there
>> alternative to facet field approach? term vectors? I can do client
>> side process based on top N (say 100) hits for this but it is my last
>> option.
>
> Also a very interesting data mining question! I'm sorry I don't have any
> answers for you. Maybe someone else does.
>
> Best,
>
> Michael Ludwig

-- View this message in context: http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on text fields
Thanks for insight Otis. I have no awareness of ClusteringComponent until now. It is time to move to Solr 1.4 -Yao Otis Gospodnetic wrote: > > > Yao, > > Solr can already cluster top N hits using Carrot2: > http://wiki.apache.org/solr/ClusteringComponent > > I've also done ugly "manual counting" of terms in top N hits. For > example, look at the right side of this: > http://www.simpy.com/user/otis/tag/%22machine+learning%22 > > Something like http://www.sematext.com/product-key-phrase-extractor.html > could also be used. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Yao Ge >> To: solr-user@lucene.apache.org >> Sent: Tuesday, June 9, 2009 3:46:13 PM >> Subject: Re: Faceting on text fields >> >> >> Michael, >> >> Thanks for the update! I definitely need to get a 1.4 build see if it >> makes >> a difference. >> >> BTW, maybe instead of using faceting for text >> mining/clustering/visualization purpose, we can build a separate feature >> in >> SOLR for this. Many of commercial search engines I have experiences with >> (Google Search Appliance, Vivisimo etc) provide dynamic term clustering >> based on top N ranked documents (N is a parameter can be configured). >> When >> facet field is highly fragmented (say a text field), the existing set >> intersection based approach might no longer be optimum. Aggregating term >> vectors over top N docs might be more attractive. Another features I can >> really appreciate is to provide search time n-gram term clustering. Maybe >> this might be better suited for "spell checker" as it just a different >> way >> to display the alternative search terms. >> >> -Yao >> >> >> Michael Ludwig-4 wrote: >> > >> > Yao Ge schrieb: >> > >> >> The facet query is considerably slower comparing to other facets from >> >> structured database fields (with highly repeated values). What I found >> >> interesting is that even after I constrained search results to just a >> >> few hunderd hits using other facets, these text facets are still very >> >> slow. >> >> >> >> I understand that text fields are not good candidate for faceting as >> >> it can contain very large number of unique values. However why it is >> >> still slow after my matching documents is reduced to hundreds? Is it >> >> because the whole filter is cached (regardless the matching docs) and >> >> I don't have enough filter cache size to fit the whole list? >> > >> > Very interesting questions! I think an answer would both require and >> > further an understanding of how filters work, which might even lead to >> > a more general guideline on when and how to use filters and facets. >> > >> > Even though faceting appears to have changed in 1.4 vs 1.3, it would >> > still be interesting to understand the 1.3 side of things. >> > >> >> Lastly, what I really want to is to give user a chance to visualize >> >> and filter on top relevant words in the free-text fields. Are there >> >> alternative to facet field approach? term vectors? I can do client >> >> side process based on top N (say 100) hits for this but it is my last >> >> option. >> > >> > Also a very interesting data mining question! I'm sorry I don't have >> any >> > answers for you. Maybe someone else does. >> > >> > Best, >> > >> > Michael Ludwig >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html >> Sent from the Solr - User mailing list archive at Nabble.com. 
> > > -- View this message in context: http://www.nabble.com/Faceting-on-text-fields-tp23872891p23965401.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on text fields
FYI. I did a direct integration of Carrot2 with SolrJ, using a separate Ajax call from the UI that clusters the terms in the two text fields over the top 100 hits. It got performance comparable to the other facets in terms of response time.

In terms of algorithms, they list two, "Lingo" and "STC", which I don't recognize. But I think at least one of them might use SVD (http://en.wikipedia.org/wiki/Singular_value_decomposition).

-Yao

Otis Gospodnetic wrote:
>
> I'd call it related (their application in search encourages exploration),
> but also distinct enough to never mix them up. I think your assessment
> below is correct, although I'm not familiar with the details of Carrot2
> any more (was once), so I can't tell you exactly which algo is used under
> the hood.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: Michael Ludwig
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, June 10, 2009 9:41:54 AM
>> Subject: Re: Faceting on text fields
>>
>> Otis Gospodnetic schrieb:
>> >
>> > Solr can already cluster top N hits using Carrot2:
>> > http://wiki.apache.org/solr/ClusteringComponent
>>
>> Would it be fair to say that clustering as detailed on the page you're
>> referring to is a kind of dynamic faceting? The faceting not being done
>> based on distinct values of certain fields, but on the presence (and
>> frequency) of terms in one field?
>>
>> The main difference seems to be that with faceting, grouping criteria
>> (facets) are known beforehand, while with clustering, grouping criteria
>> (the significant terms which create clusters - the cluster keys) have
>> yet to be determined. Is that a correct assessment?
>>
>> Michael Ludwig

-- View this message in context: http://www.nabble.com/Faceting-on-text-fields-tp23872891p23980124.html Sent from the Solr - User mailing list archive at Nabble.com.
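For anyone who wants to replicate this, the sketch below shows the shape of such an integration with the Carrot2 Java API; the query string and the title/description field names are hypothetical, and this is a reconstruction of the approach rather than the actual code used:

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.carrot2.clustering.lingo.LingoClusteringAlgorithm;
  import org.carrot2.core.Cluster;
  import org.carrot2.core.Controller;
  import org.carrot2.core.ControllerFactory;
  import org.carrot2.core.Document;
  import org.carrot2.core.ProcessingResult;

  public class ClusterTopHits {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
      // fetch the top 100 hits with the two text fields to cluster on
      SolrQuery q = new SolrQuery("engine").setRows(100).setFields("title", "description");
      List<Document> docs = new ArrayList<Document>();
      for (SolrDocument d : solr.query(q).getResults()) {
        docs.add(new Document((String) d.getFieldValue("title"),
                              (String) d.getFieldValue("description")));
      }
      // cluster the raw text of the top hits with the Lingo algorithm
      Controller controller = ControllerFactory.createSimple();
      ProcessingResult result = controller.process(docs, "engine", LingoClusteringAlgorithm.class);
      for (Cluster c : result.getClusters()) {
        System.out.println(c.getLabel() + " (" + c.getAllDocuments().size() + ")");
      }
    }
  }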
Re: Faceting on text fields
BTW, Carrot2 has a very impressive Clustering Workbench (based on eclipse) that has built-in integration with Solr. If you have a Solr service running, it is a just a matter of point the workbench to it. The clustering results and visualization are amazing. (http://project.carrot2.org/download.html). Yao Ge wrote: > > FYI. I did a direct integration with Carrot2 with Solrj with a separate > Ajax call from UI for top 100 hits to clusters terms in the two text > fields. It gots comparable performance to other facets in terms of > response time. > > In terms of algorithms, their listed two "Lingo" and "STC" which I don't > reconize. But I think at least one of them might have used SVD > (http://en.wikipedia.org/wiki/Singular_value_decomposition). > > -Yao > > > Otis Gospodnetic wrote: >> >> >> I'd call it related (their application in search encourages exploration), >> but also distinct enough to never mix them up. I think your assessment >> below is correct, although I'm not familiar with the details of Carrot2 >> any more (was once), so I can't tell you exactly which algo is used under >> the hood. >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Message >>> From: Michael Ludwig >>> To: solr-user@lucene.apache.org >>> Sent: Wednesday, June 10, 2009 9:41:54 AM >>> Subject: Re: Faceting on text fields >>> >>> Otis Gospodnetic schrieb: >>> > >>> > Solr can already cluster top N hits using Carrot2: >>> > http://wiki.apache.org/solr/ClusteringComponent >>> >>> Would it be fair to say that clustering as detailed on the page you're >>> referring to is a kind of dynamic faceting? The faceting not being done >>> based on distinct values of certain fields, but on the presence (and >>> frequency) of terms in one field? >>> >>> The main difference seems to be that with faceting, grouping criteria >>> (facets) are known beforehand, while with clustering, grouping criteria >>> (the significant terms which create clusters - the cluster keys) have >>> yet to be determined. Is that a correct assessment? >>> >>> Michael Ludwig >> >> >> > > -- View this message in context: http://www.nabble.com/Faceting-on-text-fields-tp23872891p23980959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query Filter fq with OR operator
I would like to submit a JIRA issue for this. Can anyone point me to where to go?

-Yao

Otis Gospodnetic wrote:
>
> Brian,
>
> Opening a JIRA issue if it doesn't already exist is the best way. If you
> can provide a patch, even better!
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: brian519
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, June 16, 2009 1:32:41 PM
>> Subject: Re: Query Filter fq with OR operator
>>
>> This feature is very important to me .. should I post something on the dev
>> forum? Not sure what the proper protocol is for adding a feature to the
>> roadmap
>>
>> Thanks,
>> Brian.
>> --
>> View this message in context: http://www.nabble.com/Query-Filter-fq-with-OR-operator-tp23895837p24059181.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://www.nabble.com/Query-Filter-fq-with-OR-operator-tp23895837p24222170.html Sent from the Solr - User mailing list archive at Nabble.com.
Faceting with MoreLikeThis
Does Solr support faceting on MoreLikeThis search results? -- View this message in context: http://www.nabble.com/Faceting-with-MoreLikeThis-tp24356166p24356166.html Sent from the Solr - User mailing list archive at Nabble.com.
Filtering MoreLikeThis results
I could not find any support in http://wiki.apache.org/solr/MoreLikeThis for restricting MLT results to certain subsets. I passed along an fq parameter and it was ignored. Since we cannot incorporate the filters into the query itself (which is used to retrieve the target document for the similarity comparison), it appears there is no way to filter MLT results. BTW, I am using Solr 1.3. Please let me know if there is a way (other than hacking the source code) to do this. Thanks! -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering MoreLikeThis results
I am not sure about the parameters for MLT the requestHandler plugin. Can one of you share the solrconfig.xml entry for MLT? Thanks in advance. -Yao Bill Au wrote: > > I have been using the StandardRequestHandler (ie /solr/select). fq does > work with the MoreLikeThisHandler. I will switch to use that. Thanks. > > Bill > > On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese > wrote: > >> >> At least in trunk, if you request for: >> http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price[100<http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price%5B100>TO >> 200] >> It will filter the MoreLikeThis results >> >> >> Bill Au wrote: >> > >> > I think fq only works on the main response, not the mlt matches. I >> found >> > a >> > couple of releated jira: >> > >> > http://issues.apache.org/jira/browse/SOLR-295 >> > http://issues.apache.org/jira/browse/SOLR-281 >> > >> > If I am reading them correctly, I should be able to use DIsMax and >> > MoreLikeThis together. I will give that a try and report back. >> > >> > Bill >> > >> > >> > On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese >> > wrote: >> > >> >> >> >> Using MoreLikeThisHandler you can use fq to filter your results. As >> far >> >> as >> >> I >> >> know bq are not allowed. >> >> >> >> >> >> Bill Au wrote: >> >> > >> >> > I have been trying to restrict MoreLikeThis results without any luck >> >> also. >> >> > In additional to restricting the results, I am also looking to >> >> influence >> >> > the >> >> > scores similar to the way boost query (bq) works in the >> >> > DisMaxRequestHandler. >> >> > >> >> > I think Solr's MoreLikeThis depends on Lucene's contrib queries >> >> > MoreLikeThis, or at least it used to. Has anyone looked into >> enhancing >> >> > Solrs' MoreLikeThis to support bq and restricting mlt results? >> >> > >> >> > Bill >> >> > >> >> > On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge wrote: >> >> > >> >> >> >> >> >> I could not find any support from >> >> >> http://wiki.apache.org/solr/MoreLikeThison >> >> >> how to restrict MLT results to certain subsets. I passed along a fq >> >> >> parameter and it is ignored. Since we can not incorporate the >> filters >> >> in >> >> >> the >> >> >> query itself which is used to retrieve the target for similarity >> >> >> comparison, >> >> >> it appears there is no way to filter MLT results. BTW. I am using >> Solr >> >> >> 1.3. >> >> >> Please let me know if there is way (other than hacking the source >> >> code) >> >> >> to >> >> >> do this. Thanks! >> >> >> -- >> >> >> View this message in context: >> >> >> >> >> >> http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html >> >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> >> >> >> > >> >> > >> >> >> >> -- >> >> View this message in context: >> >> >> http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24377360.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering MoreLikeThis results
The answer to my owner question: ... ... would work. -Yao Yao Ge wrote: > > I am not sure about the parameters for MLT the requestHandler plugin. Can > one of you share the solrconfig.xml entry for MLT? Thanks in advance. > -Yao > > > Bill Au wrote: >> >> I have been using the StandardRequestHandler (ie /solr/select). fq does >> work with the MoreLikeThisHandler. I will switch to use that. Thanks. >> >> Bill >> >> On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese >> wrote: >> >>> >>> At least in trunk, if you request for: >>> http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price[100<http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price%5B100>TO >>> 200] >>> It will filter the MoreLikeThis results >>> >>> >>> Bill Au wrote: >>> > >>> > I think fq only works on the main response, not the mlt matches. I >>> found >>> > a >>> > couple of releated jira: >>> > >>> > http://issues.apache.org/jira/browse/SOLR-295 >>> > http://issues.apache.org/jira/browse/SOLR-281 >>> > >>> > If I am reading them correctly, I should be able to use DIsMax and >>> > MoreLikeThis together. I will give that a try and report back. >>> > >>> > Bill >>> > >>> > >>> > On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese >>> > wrote: >>> > >>> >> >>> >> Using MoreLikeThisHandler you can use fq to filter your results. As >>> far >>> >> as >>> >> I >>> >> know bq are not allowed. >>> >> >>> >> >>> >> Bill Au wrote: >>> >> > >>> >> > I have been trying to restrict MoreLikeThis results without any >>> luck >>> >> also. >>> >> > In additional to restricting the results, I am also looking to >>> >> influence >>> >> > the >>> >> > scores similar to the way boost query (bq) works in the >>> >> > DisMaxRequestHandler. >>> >> > >>> >> > I think Solr's MoreLikeThis depends on Lucene's contrib queries >>> >> > MoreLikeThis, or at least it used to. Has anyone looked into >>> enhancing >>> >> > Solrs' MoreLikeThis to support bq and restricting mlt results? >>> >> > >>> >> > Bill >>> >> > >>> >> > On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge wrote: >>> >> > >>> >> >> >>> >> >> I could not find any support from >>> >> >> http://wiki.apache.org/solr/MoreLikeThison >>> >> >> how to restrict MLT results to certain subsets. I passed along a >>> fq >>> >> >> parameter and it is ignored. Since we can not incorporate the >>> filters >>> >> in >>> >> >> the >>> >> >> query itself which is used to retrieve the target for similarity >>> >> >> comparison, >>> >> >> it appears there is no way to filter MLT results. BTW. I am using >>> Solr >>> >> >> 1.3. >>> >> >> Please let me know if there is way (other than hacking the source >>> >> code) >>> >> >> to >>> >> >> do this. Thanks! >>> >> >> -- >>> >> >> View this message in context: >>> >> >> >>> >> >>> http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html >>> >> >> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >> >>> >> >> >>> >> > >>> >> > >>> >> >>> >> -- >>> >> View this message in context: >>> >> >>> http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html >>> >> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >>> >> >>> > >>> > >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> > > -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24380408.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting with MoreLikeThis
Faceting on MLT requires the use of the MoreLikeThisHandler. The standard request handler, while providing support for MLT via a search component, does not return facets on MLT results. To enable the MLT handler, add an entry like the one below to your solrconfig.xml:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler"/>

The query parameter syntax for faceting remains the same as with the standard request handler.

-Yao

Yao Ge wrote:
>
> Does Solr support faceting on MoreLikeThis search results?

-- View this message in context: http://www.nabble.com/Faceting-with-MoreLikeThis-tp24356166p24380459.html Sent from the Solr - User mailing list archive at Nabble.com.
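Putting the pieces of this thread together, a faceted MLT request against that handler might look like the sketch below (the content_mlt and cat field names are borrowed from other threads here, so treat them as placeholders):

  http://localhost:8983/solr/mlt?q=id:10&mlt.fl=content_mlt&facet=true&facet.mincount=1&facet.field=cat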
Re: A big question about Solr and SolrJ range query ?
use Solr's Filter Query parameter "fq": fq=x:[10 TO 100]&fq=y:[20 TO 300]&fl=title -Yao huenzhao wrote: > > Hi all: > > Suppose that my index have 3 fields: title, x and y. > > I know one range(10 < x < 100) can query liks this: > > http://localhost:8983/solr/select?q=x:[10 TO 100]&fl=title > > If I want to two range(10 < x <100 AND 20 < y < 300) query like > > SQL(select title where x>10 and x < 100 and y > 20 and y < 300) > > by using Solr range query or SolrJ, but not know how to implement. Anybody > know ? Thanks > > Email: enzhao...@gmail.com > > -- View this message in context: http://www.nabble.com/A-big-question-about-Solr-and-SolrJ-range-query---tp24384416p24384540.html Sent from the Solr - User mailing list archive at Nabble.com.
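The question also asked about SolrJ; here is a minimal equivalent sketch (the server URL and class names are the 1.x-era defaults; use {10 TO 100} instead of [10 TO 100] if the bounds must be strictly exclusive):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class RangeQueryExample {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrQuery q = new SolrQuery("*:*");
      // each filter query is intersected: 10 <= x <= 100 AND 20 <= y <= 300
      q.addFilterQuery("x:[10 TO 100]", "y:[20 TO 300]");
      q.setFields("title");
      QueryResponse rsp = server.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }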
Re: about defaultSearchField
Try with fl=* or fl=*,score added to your request string. -Yao Yang Lin-2 wrote: > > Hi, > I have some problems. > For my solr progame, I want to type only the Query String and get all > field > result that includ the Query String. But now I can't get any result > without > specified field. For example, query with "tina" get nothing, but > "Sentence:tina" could. > > I hava adjusted the *schema.xml* like this: > > >>> stored="true" multiValued="true"/> >>> stored="true" multiValued="true"/> >>> stored="true" multiValued="true"/> >>> multiValued="true"/> >> >>> multiValued="true"/> >> >> >> Sentence >> >> >> allText >> >> >> >> >> >> >> >> > > > I think the problem is in , but I don't know how to > fix > it. Could anyone help me? > > Thanks > Yang > > -- View this message in context: http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html Sent from the Solr - User mailing list archive at Nabble.com.
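If fl=* does not help, the schema fragments in the question hint that the unfielded search may not be pointed at a populated field. A sketch of the likely intended setup, reusing the Sentence/allText names from the post (this is an assumption about the poster's schema, not a confirmed fix):

  <!-- copy the per-field content into one catch-all field -->
  <copyField source="Sentence" dest="allText"/>
  <!-- make unfielded queries like "tina" search that field -->
  <defaultSearchField>allText</defaultSearchField>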
Re: Solr's MLT query call doesn't work
A couple of things, your mlt.fl value, must be part of fl. In this case, content_mlt is not included in fl. I think the fl parameter value need to be comma separated. try fl=title,author,content_mlt,score -Yao SergeyG wrote: > > Hi, > > Recently, while implementing the MoreLikeThis search, I've run into the > situation when Solr's mlt query calls don't work. > > More specifically, the following query: > > http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt= > 5&mlt.interestingTerms=details&fl=title+author+score > > brings back just the doc with id=10 and nothing else. While using the > GetMethod approach (putting /mlt explicitely into the url), I got back > some results. > > I've been trying to solve this problem for more than a week with no luck. > If anybody has any hint, please help. > > Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c) > GetMethod (/select). > > Thanks a lot. > > Regards, > Sergey Goldberg > > > Here're the logs: > > a) Solr (http://localhost:8080/solr/select) > 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/select > params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt= > true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} > hits=1 status=0 QTime=172 > > INFO MLTSearchRequestProcessor:49 - SolrServer url: > http://localhost:8080/solr > INFO MLTSearchRequestProcessor:67 - solrQuery> > q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt= > 5&mlt.interestingTerms=details&fl=title+author+score > INFO MLTSearchRequestProcessor:73 - Number of docs found = 1 > INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612 > > > b) GetMethod (http://localhost:8080/solr/mlt) > 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/mlt > params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.max > qt=5&mlt.interestingTerms=details} status=0 QTime=15 > > INFO MLT2SearchRequestProcessor:76 - encoding="UTF-8"?> > > 0 name="QTime">0 maxScore="2.098612">2.098612S.G. name="title">SG_Book umFound="4" start="0" maxScore="0.28923997"> name="score">0.28923997O. > HenryS.G.Four Million, > The0.08667877 name="author">Katherine MosbyThe Season > of Lillian Dawes name="score">0.07947738Jerome K. > JeromeThree Men in a > Boat name="score">0.047219563Charles > OliverS.G.ABC's of > Science name="content_mlt:ye">1.0 name="content_mlt:tobin">1.0 name="content_mlt:a">1.0 name="content_mlt:i">1.0 name="content_mlt:his">1.0 > > > > c) GetMethod (http://localhost:8080/solr/select) > 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/select > params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt. > maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16 > > INFO MLT2SearchRequestProcessor:80 - encoding="UTF-8"?> > > 0 name="QTime">16title author > scorecontent_mlt name="q">id:105 name="mlt.interestingTerms">details name="response" numFound="1" start="0" maxScore="2.098612"> name="score">2.098612S.G. 
name="title">SG_Book name="rawquerystring">id:10id:10 name="parsedq > uery">id:10id:10 name="explain"> > 2.098612 = (MATCH) weight(id:10 in 3), product of: > 0.9994 = queryWeight(id:10), product of: > 2.0986123 = idf(docFreq=1, numDocs=5) > 0.47650534 = queryNorm > 2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of: > 1.0 = tf(termFreq(id:10)=1) > 2.0986123 = idf(docFreq=1, numDocs=5) > 1.0 = fieldNorm(field=id, doc=3) > OldLuceneQParser name="timing">16.0 name="time">0.0 name="org.apache.solr.handler.component.QueryComponent"> name="time">0.0 name="org.apache.solr.handler.component.FacetComponent"> name="time">0.00.0 name="org.apache.solr.handler.component.HighlightComponent"> name="time">0.0 name="org.apache.solr.handler.component.DebugComponent"> name="time">0.0 name="time">16.0 name="org.apache.solr.handler.component.QueryComponent"> name="time">0.0 name="org.apache.solr.handler.component.FacetComponent"> name="time">0.0 name="org.apache.solr.handler.component.MoreLikeThisComponent"> name="time">0.0 name="org.apache.solr.handler.component.HighlightComponent"> name="time">0.0 name="org.apache.solr.handler.component.DebugComponent"> name="time">16.0 > > > > And here're the relevant entries from solrconfig.xml: > > default="true"> > > > explicit > id,title,author,score > on > > > > > > 1 > 10 > > > -- View this message in context: http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391918.html Sent from the Solr - User mailing list archive at Nabble.com.
DIH delta import - last modified date
I am struggling with the concept of delta import in DIH. According to the documentation, the delta import automatically records the last index timestamp and makes it available for use in the delta query. However, in many cases, when the last_modified timestamp in the database lags behind the current time, the last index timestamp is not a good basis for the delta query. Can I pick a different mechanism to generate "last_index_time", using a timestamp computed from the database (such as from a column of the database)? -- View this message in context: http://old.nabble.com/DIH-delta-import---last-modified-date-tp27231449p27231449.html Sent from the Solr - User mailing list archive at Nabble.com.
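For reference, the stock mechanism stores last_index_time in dataimport.properties and exposes it as ${dataimporter.last_index_time}; a typical delta setup looks like the sketch below (the table and column names are hypothetical). I am not aware of a built-in way to substitute a database-computed timestamp, but a lag can be applied directly in the SQL of deltaQuery:

  <entity name="item" pk="id"
          query="select * from item"
          deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'"
          deltaImportQuery="select * from item where id='${dataimporter.delta.id}'"/>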
hl.maxAlternateFieldLength defaults in solrconfig.xml
It appears the hl.maxAlternateFieldLength parameter's default setting in solrconfig.xml does not take effect. I can only get it to work by explicitly sending the parameter in the client request. It is not a big deal, but it appears to be a bug. -- View this message in context: http://old.nabble.com/hl.maxAlternateFieldLength-defaults-in-solrconfig.xml-tp27542463p27542463.html Sent from the Solr - User mailing list archive at Nabble.com.
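For comparison, this is the standard way to set such a default in a handler's defaults section, which is what one would expect to be honored (a sketch of the usual pattern, not the poster's actual config):

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="hl">true</str>
      <str name="hl.maxAlternateFieldLength">200</str>
    </lst>
  </requestHandler>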
RE: Solr Search problem; cannot search the existing word in the index content
Modify the maxFieldLength setting in solrconfig.xml and try again; by default Solr will only index the first 10,000 tokens of each field.

Best Regards, Yandong

-Original Message-
From: Mint o_O! [mailto:mint@gmail.com]
Sent: June 3, 2010 13:58
To: solr-user@lucene.apache.org
Subject: Re: Solr Search problem; cannot search the existing word in the index content

Thanks for you advice. I did as you said and i still cannot search my content.

One thing i notice here i can search for only the words within first 100 rows or maybe bigger than this not sure but not all. So is it the limitation of the index it self?

When I create another sample content with only small amount of data. It's working great!!!

My content is around 1.2M. I stored it as the text field as in the schema.xml sample file. Anyone has the same issue with me?

thanks,
Mint

On Tue, May 18, 2010 at 1:58 PM, Lance Norskog wrote:
> backslash*rhode
> \*rhode may work.
>
> On Mon, May 17, 2010 at 7:23 AM, Erick Erickson wrote:
> > A couple of things:
> > 1> try searching with &debugQuery=on attached to your URL, that'll
> > give you some clues.
> > 2> It's really worthwhile exploring the admin pages for a while, it'll also
> > give you a world of information. It takes a while to understand what the
> > various pages are telling you, but you'll come to rely on them.
> > 3> Are you really searching with leading and trailing wildcards or is that
> > just the mail changing bolding? Because this is tricky, very tricky. Search
> > the mail archives for "leading wildcard" to see lots of discussion of this
> > topic.
> >
> > You might back off a bit and try building up to wildcards if that's what
> > you're doing
> >
> > HTH
> > Erick
> >
> > On Mon, May 17, 2010 at 1:11 AM, Mint o_O! wrote:
> >
> >> Hi,
> >>
> >> I'm working on the index/search project recently and i found solr which is
> >> very fascinating to me.
> >>
> >> I followed the test successful from the tutorial page. Starting up jetty
> >> and
> >> run adding new xml (user:~/solr/example/exampledocs$ *java -jar post.jar
> >> *.xml*) so far so good at this stage.
> >>
> >> Now i have create my own testing westpac.xml file with real data I intend
> >> to
> >> implement, putting in exampledocs and again ran the command
> >> (user:~/solr/example/exampledocs$ *java -jar post.jar westpac.xml*).
> >> Everything went on very well however when i searched for "*rhode*" which is
> >> in the content. And Index returned nothing.
> >>
> >> Could anyone guide me what I did wrong why i couldn't search for that word
> >> even though that word is in my index content.
> >>
> >> thanks,
> >>
> >> Mint
> >>
>
> --
> Lance Norskog
> goks...@gmail.com
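Assuming maxFieldLength is indeed the culprit (the symptom of only early content being searchable matches its default token cap), raising it in solrconfig.xml and reindexing should make the whole 1.2M document searchable:

  <!-- default is 10000 tokens per field; raise it (effectively unlimited here) -->
  <maxFieldLength>2147483647</maxFieldLength>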
RE: Permissions and user to acess administrative interface
I can only speak from my experience with Tomcat. First make sure the authentication realm you need is enabled by checking server.xml. I added a few roles in tomcat-users.xml and added individual user ids/passwords to those roles. For example, you can separate Search, Update and Admin roles. Then modify web.xml to map different modules to different roles.

-Yao

-Original Message-
From: Em [mailto:mailformailingli...@yahoo.de]
Sent: Monday, February 13, 2012 11:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Permissions and user to acess administrative interface

Hi Anderson,

you will need to rearrange the JSPs a little bit to do what you want. If you do so, you can create rules via .htaccess.

Otherwise I would suggest you to look for a commercial distribution of Solr which might fit your needs.

Regards,
Em

Am 13.02.2012 16:48, schrieb Anderson vasconcelos:
> Hi All
>
> Is there some way to add users and permissions on SOLR administration page?
> I need to restrict the access of users in the administration page. I Just
> wanna expose the query section for determinate user. Addition, i wanna to
> restrict the access of the cores per user. Somethings like that:
>
> Core 1 - Users : John, Paul, Carter
> Full Interface: John, Paul
> Only search interface: Carter
> Core 2 -Users: John , Mary
> Full Interface: John
> Only search interface: Mary
>
> Is that possible?
>
> Thanks
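A sketch of the web.xml mapping described above (the role and path names are examples, and container-managed BASIC auth is assumed to be configured in server.xml/tomcat-users.xml):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr admin</web-resource-name>
      <url-pattern>/admin/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr</realm-name>
  </login-config>
  <security-role>
    <role-name>admin</role-name>
  </security-role>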
RE: Item Facet
If you can reindex, simply rebuild the index with a new field produced by combining the existing fields.

-Yao

-----Original Message-----
From: David Lojudice Sobrinho [mailto:dalss...@gmail.com]
Sent: Thursday, August 06, 2009 4:17 PM
To: solr-user@lucene.apache.org
Subject: Item Facet

Hi...

Is there any way to group values the way shopping.yahoo.com or shopper.cnet.com do? For instance, I have documents like:

doc1 - product_name1 - value1
doc2 - product_name1 - value2
doc3 - product_name1 - value3
doc4 - product_name2 - value4
doc5 - product_name2 - value5
doc6 - product_name2 - value6

I'd like to have the results grouped by product name, with the value range per product. Something like:

product_name1 - (value1 to value3)
product_name2 - (value4 to value6)

It is not like the current facets, because the information is grouped per item, not over the entire result. Any idea?

Thanks!

David Lojudice Sobrinho
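One way to read the reindexing suggestion above (field names are hypothetical): collapse each product into a single document at index time, precomputing the range to display:

    <add>
      <doc>
        <field name="product_name">product_name1</field>
        <field name="value_min">value1</field>
        <field name="value_max">value3</field>
      </doc>
      <doc>
        <field name="product_name">product_name2</field>
        <field name="value_min">value4</field>
        <field name="value_max">value6</field>
      </doc>
    </add>

The trade-off is that per-offer detail is lost, so this only fits when the grouped view is the primary display.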
RE: Date faceting and memory leaks
I do not have any GC-specific settings on the command line. I had tried to force a GC collection via JConsole at the end of the run, but it didn't seem to do anything to the heap size.

-Yao

-----Original Message-----
From: Antonio Lobato [mailto:alob...@symplicity.com]
Sent: Monday, May 17, 2010 2:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Date faceting and memory leaks

What garbage collection settings are you running at the command line when starting Solr?

On May 17, 2010, at 2:41 PM, Yao wrote:

>
> I have been running load testing using JMeter on a Solr 1.4 index with ~4
> million docs. I notice a steady JVM heap size increase as I iterator 100
> query terms a number of times against the index. The GC does not seems to
> claim the heap after the test run is completed. It will run into OutOfMemory
> as I repeat the test or increase the number of threads/users.
>
> The date facet queries are specified as following (as part of "append"
> section in request handler):
>
> {!ex=last_modified}last_modified:[NOW-30DAY TO *]
> {!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]
> {!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]
> {!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]
> {!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]
> {!ex=last_modified}last_modified:[* TO NOW-730DAY]
>
> The last_modified field is a TrieDateField with a precisionStep of 6.
>
> I have played for filterCache setting but does not have any effects as the
> date field cache seems be managed by Lucene FieldCahce.
>
> Please help as I can be struggling with this for days. Thanks in advance.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html
> Sent from the Solr - User mailing list archive at Nabble.com.

---
Antonio Lobato
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8101
alob...@symplicity.com
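For anyone reproducing this, a typical HotSpot command line for watching GC behavior during such a test looks something like the following; the heap sizes and collector choice are illustrative assumptions, not a recommendation, and IBM JVM options differ:

    java -Xms2g -Xmx2g \
         -XX:+UseConcMarkSweepGC \
         -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log \
         -jar start.jar

The gc.log shows whether full collections are actually running and how much heap they reclaim, which separates "GC never runs" from "GC runs but the FieldCache entries are still referenced".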
Facet.query
When multiple facet queries are specified, are they combined with OR or AND?

-Yao
RE: Facet.query
Never mind. I should have read the example (http://wiki.apache.org/solr/SimpleFacetParameters#head-1da3ab3995bc4abcdce8e0f04be7355ba19e9b2c) first.

From: Ge, Yao (Y.)
Sent: Thursday, April 19, 2007 10:41 PM
To: 'solr-user@lucene.apache.org'
Subject: Facet.query

When multiple facet queries are specified, are they combined with OR or AND?

-Yao
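The answer, for anyone finding this later: multiple facet.query parameters are not combined at all; each one is evaluated independently against the result set and returns its own count. For example (field and ranges invented for illustration):

    select?q=ipod&rows=0
      &facet=true
      &facet.query=price:[0 TO 100]
      &facet.query=price:[100 TO *]

returns two separate counts, one per query.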
solr java client code and XML schema
It looks like there is no published Java client for Solr - what a surprise. I would assume one would be very useful for integrating Solr into existing apps. Has anyone parsed the standard XML response into Java objects? I would like to create a strongly typed object hierarchy instead of a bunch of Collections and Maps. The current XML schema is very light, and I was having a hard time writing Digester rules. I am in the middle of writing an XSLT to transform the response into "easier" XML tags for Digester (such as <status>0</status>... instead of <int name="status">0</int>...). Is there a good reason for Solr not to have a richer XML schema?

-Yao
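Absent a published client, one workable route is plain JAXP with XPath, which copes with Solr's name-attribute convention better than Digester does. A minimal sketch; the URL and the title field are assumptions for illustration:

    // Pull typed values out of Solr's generic XML response with XPath.
    import java.net.URL;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class SolrXmlClient {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new URL("http://localhost:8983/solr/select?q=fuel").openStream());
            XPath xp = XPathFactory.newInstance().newXPath();
            // Solr's elements are typed (<int>, <str>) and distinguished by
            // their name attribute, hence the predicates below.
            String status = xp.evaluate(
                    "/response/lst[@name='responseHeader']/int[@name='status']", doc);
            NodeList titles = (NodeList) xp.evaluate(
                    "/response/result/doc/str[@name='title']", doc, XPathConstants.NODESET);
            System.out.println("status=" + status + ", titles=" + titles.getLength());
            for (int i = 0; i < titles.getLength(); i++) {
                System.out.println(titles.item(i).getTextContent());
            }
        }
    }

From here it is a short step to mapping each doc element onto a hand-written typed bean.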
RE: Faceted count syntax (exclude zeros)...
There is a bug related to "facet.mincount" in the incubating version:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg03269.html

-Yao

-----Original Message-----
From: escher2k [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 01, 2007 2:00 AM
To: solr-user@lucene.apache.org
Subject: Faceted count syntax (exclude zeros)...

I am trying to execute a faceted count on a field called "load_id" and want to exclude 0s. The URL below doesn't seem to be excluding zeros.

http://localhost:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=load_id&facet=true&facet.limit=-1&facet.field=load_id&facet.mincount=1&rows=0

Result (facet counts from the relevant part of the XML): 0, 0, 80, 81, 77, 62, 31061

Thanks.
--
View this message in context: http://www.nabble.com/Faceted-count-syntax-%28exclude-zeros%29...-tf3673535.html#a10264961
Sent from the Solr - User mailing list archive at Nabble.com.
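If you are stuck on the affected version, the older parameter that facet.mincount superseded may still work there (worth verifying against your exact build):

    &facet.zeros=false

It was the original switch for suppressing zero-count facet values.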
Look ahead queries
I am planning to develop look-ahead queries with Solr, so that as the user types query terms, a list of related terms is shown in a popup window (similar to Google Suggest). It will be a series of small AJAX calls to Solr with wildcards. So if the user types "fuel", a look-ahead query will be sent to Solr in the form "fuel *". The user will end up seeing relevant terms like "fuel consumption", "fuel leaks", "fuel tank", etc. In this case, I will likely limit the queries to certain fields only, and some post-processing is required to get the final list of suggestions. Let me know if someone has already done this, or if there are better ways to accomplish it. I figure Solr's caching will make this type of application more efficient than a straight Lucene integration. Thanks.

-Yao
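A lighter-weight alternative to wildcard queries worth considering here: facet.prefix walks the term dictionary directly instead of scoring documents. A sketch, where the field name is an assumption and q=*:* simply makes the facet run over the whole index:

    select?q=*:*&rows=0
      &facet=true
      &facet.field=text
      &facet.prefix=fuel
      &facet.limit=10

This returns single indexed terms, so multi-word suggestions like "fuel tank" would require the field to be indexed with shingles (token pairs).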