Replication failed without an error =(

2012-04-24 Thread stockii
Hello,

Does anyone have an idea how I can figure out why my replication failed? I
got no errors. =(

My configuration:

Two servers, both master and slave at the same time. Only one server
receives updates and so acts as the master. On the slave, replication is
started via cron. If one server crashes, I can easily switch the master to
the slave; this works because both are master AND slave at the same time.

This worked well, but no replication has run since I deleted the
pollInterval. Could that be the reason?

thx

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 
1 Core with 45 Million Documents other Cores < 200.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-failed-without-an-error-tp3934655p3934655.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-04-24 Thread elisabeth benoit
Hello,

I'd like to resume this post.

The only way I found to keep synonyms from being split into words when
synonyms.txt is parsed is to use the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory"/>

in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory"
instructs SynonymFilterFactory not to break synonyms into words on white
spaces when parsing the synonyms file.

So now it works fine: "mairie" is mapped to "hotel de ville", and when I
send the request q="hotel de ville" (quotes are mandatory to prevent the
analyzer from splitting "hotel de ville" on white spaces), I get answers
containing the word "mairie".

But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
doesn't work!!!

CATEGORY_ANALYZED has the same field type as the default search field. This
means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
ville", Solr uses the same analyzer, the one with the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory"/>.

Does anyone have a clue what is different between q analysis behaviour and
fq analysis behaviour?

Thanks a lot
Elisabeth

2012/4/12 elisabeth benoit 

> oh, that's right.
>
> thanks a lot,
> Elisabeth
>
>
> 2012/4/11 Jeevanandam Madanagopal 
>
>> Elisabeth -
>>
>> As you described, the mapping below might suit your need:
>> mairie => hotel de ville, mairie
>>
>> "mairie" gets expanded to "hotel de ville" and "mairie" at index time, so
>> both "mairie" and "hotel de ville" are searchable on the document.
>>
>> However, the white space tokenizer splitting at query time will still be
>> a problem, as described by Markus.
>>
>> --Jeevanandam
>>
>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>>
>> > > Have you tried the "=>" mapping instead? Something like
>> > > hotel de ville => mairie
>> > > might work for you.
>> > Yes, thanks, I've tried it, but from what I understand it doesn't solve
>> > my problem, since it means "hotel de ville" will be replaced by "mairie"
>> > at index time (I use synonyms only at index time). So when a user asks
>> > for "hôtel de ville", it won't match.
>> >
>> > In fact, at index time I have "mairie" in my data, but I want the user
>> > to be able to request "mairie" or "hôtel de ville" and get "mairie" as
>> > an answer, and not get "mairie" as an answer when requesting "hôtel".
>> >
>> >
>> > > [quoted: the whitespace tokenizer will still split your
>> > > query at query time]
>> > OK, I guess this means I have a problem. There is no simple solution,
>> > since at query time my tokenizer does split on white spaces.
>> >
>> > I guess my problem is more or less one of the problems discussed in
>> >
>> >
>> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
>> >
>> >
>> > Thanks a lot for your answers,
>> > Elisabeth
>> >
>> >
>> >
>> >
>> >
>> > 2012/4/10 Erick Erickson 
>> >
>> >> Have you tried the "=>" mapping instead? Something
>> >> like
>> >> hotel de ville => mairie
>> >> might work for you.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>> >>  wrote:
>> >>> Hello,
>> >>>
>> >>> I've read several posts on this issue, but can't find a real
>> >>> solution to my multi-word synonym matching problem.
>> >>>
>> >>> I have in my synonyms.txt an entry like
>> >>>
>> >>> mairie, hotel de ville
>> >>>
>> >>> and my index-time analyzer is configured as follows for synonyms:
>> >>>
>> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>> ignoreCase="true" expand="true"/>
>> >>>
>> >>> The problem I have is that now "mairie" matches "hotel", and I would
>> >>> only want "mairie" to match "hotel de ville" and "mairie".
>> >>>
>> >>> When I look into the analyzer, I see that "mairie" is mapped into
>> >>> "hotel", and the words "de" and "ville" are added in second and third
>> >>> position. To change that, I tried
>> >>>
>> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>> ignoreCase="true" expand="true"
>> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one
>> >>> post)
>> >>>
>> >>> and I can now see in the analyzer that "mairie" is mapped to "hotel
>> >>> de ville", but now when I query "hotel de ville", it doesn't match
>> >>> "mairie" at all.
>> >>>
>> >>> Does anyone have a clue what I'm doing wrong?
>> >>>
>> >>> I'm using Solr 3.4.
>> >>>
>> >>> Thanks,
>> >>> Elisabeth
>> >>
>>
>>
>


Re: Replication failed without an error =(

2012-04-24 Thread stockii
Before this problem, I hit this one:
https://issues.apache.org/jira/browse/SOLR-1781

View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-failed-without-an-error-tp3934655p3934813.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Group by distance

2012-04-24 Thread ravicv
Use group=true and group.field in your query.
Your Solr version should be 3.4 or above.

Thanks,
Ravi

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-by-distance-tp3934876p3934886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Group by distance

2012-04-24 Thread ViruS
I think this can only work when I have many records at the same position.
My problem is grouping by short distance, like I said in my last mail:
about 10 km. I need to put markers on the map of Poland and display them.
Right now I have 100k records, but in the future I will have about 2
million, so I must send grouped records.
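Solr 3.x has no built-in grouping by geographic distance, but the ~10 km bucketing described here can be approximated by assigning each point to a coarse grid cell and collapsing each cell to one marker, either client-side or by indexing the cell key as its own field and passing that field to group.field. A rough sketch of the idea, assuming plain (lat, lon) pairs (0.1° of latitude is roughly 11 km):

```python
from collections import defaultdict

def grid_key(lat, lon, cell_deg=0.1):
    """Bucket a coordinate into a grid cell; 0.1 deg of latitude is ~11 km."""
    return (round(lat / cell_deg), round(lon / cell_deg))

def group_markers(points, cell_deg=0.1):
    """Group (lat, lon) points by grid cell; return one centroid per cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[grid_key(lat, lon, cell_deg)].append((lat, lon))
    return {
        key: (sum(p[0] for p in pts) / len(pts),   # centroid latitude
              sum(p[1] for p in pts) / len(pts))   # centroid longitude
        for key, pts in cells.items()
    }

# Two markers near Warsaw collapse into one cell; Krakow stays separate.
points = [(52.23, 21.01), (52.24, 21.02), (50.06, 19.94)]
groups = group_markers(points)
```

The grid key computed at index time would be the field that Solr's group.field collapses on, so the server only returns one document per cell.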

Best,
Piotr

On 24 April 2012 12:08, ravicv  wrote:

> [...]



-- 
Piotr (ViruS) Sikora
E-mail/JID: vi...@hostv.pl
http://piotrsikora.pl


Auto suggest on indexed file content filtered based on user

2012-04-24 Thread prakash_ajp
I am trying to implement an auto-suggest feature. The search feature already
exists and searches on file content in user's allotted workspace.

The following is from my schema that will be used for search indexing:

   [field definitions for 'Text' and 'UserName' stripped by the list archive]

The search result is filtered by the user name. The suggest is implemented
as a searchComponent, and the field 'Text' used by the suggester would have
to be filtered the same way the search is. The problem with this approach
is that suggest works on a single field, and there is no way to include the
UserName field as a filter.

What's the best way out from here?

Thanks in advance!
Jay
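One common workaround (not something the stock Solr 3.x suggest component supports) is to build the suggestion dictionary from keys that are prefixed with the user name, so that an ordinary prefix lookup is automatically scoped to one user. A toy sketch of the idea in Python (the user and term values are made up):

```python
from bisect import bisect_left, insort

class UserScopedSuggester:
    """Toy suggester whose keys are 'user|term', so an ordinary prefix
    scan is automatically restricted to one user's terms."""

    def __init__(self):
        self._keys = []  # kept sorted so prefix lookup is a binary search

    def add(self, user, term):
        insort(self._keys, f"{user}|{term}")

    def suggest(self, user, prefix, limit=5):
        start = f"{user}|{prefix}"
        i = bisect_left(self._keys, start)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(start):
            out.append(self._keys[i].split("|", 1)[1])
            if len(out) == limit:
                break
            i += 1
        return out

s = UserScopedSuggester()
s.add("jay", "solar")
s.add("jay", "solr")
s.add("ann", "solstice")
```

In Solr terms this would mean indexing a dedicated field whose values look like jay|solr and issuing prefix queries against it, instead of pointing the suggester at the shared 'Text' field.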

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3934565.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Deciding whether to stem at query time

2012-04-24 Thread Andrew Wagner
Ah, this is a really good point. Still seems like it has the downsides of
#2, though, much bigger space requirements and possibly some time lost on
queries.

On Mon, Apr 23, 2012 at 3:35 PM, Walter Underwood wrote:

> There is a third approach. Create two fields and always query both of
> them, with the exact field given a higher weight. This works great and
> performs well.
>
> It is what we did at Netflix and what I'm doing at Chegg.
>
> wunder
>
> On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
>
> > So I just realized the other day that stemming basically happens at index
> > time. If I'm understanding correctly, there's no way to allow a user to
> > specify, at run time, whether to stem particular words or not based on a
> > single index. I think there are two options, but I'd love to hear that
> I'm
> > wrong:
> >
> > 1.) Incrementally build up a white list of words that don't stem very
> > well. To pick a random example out of the blue, "light" isn't super
> > closely related to "lighter", so I might choose not to stem it. If I
> > wanted to do this, I think (if I understand correctly)
> > StemmerOverrideFilter would help me out. I'm not a big fan of this
> > approach.
> >
> > 2.) Index all the text in two fields, once with stemming and once
> > without. Then build some kind of option into the UI for specifying
> > whether to stem the words or not, and search the appropriate field.
> > Unfortunately, this would roughly double the size of my index, and
> > probably affect query times too. Plus, the UI would probably suck.
> >
> > Am I missing an option? Has anyone tried one of these approaches?
> >
> > Thanks!
> > Andrew
>
>
>
>
>
>


Searching on fields with White Spaces

2012-04-24 Thread Shubham Srivastava
I have a custom fieldtype with the below config

[custom fieldType definition stripped by the list archive]

I have autocomplete configured on the same field, which gives me results as
expected. A new use case is to search "kualalumpur" (or, say, "newyork")
without spaces and return Kuala Lumpur and New York, which happen to be the
original values.

What would be the recommended solution?

Regards,
Shubham
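One way to handle this (a sketch of the idea, not a specific Solr recipe) is to index, alongside the normal word tokens, a concatenated no-space form of the full name, which is roughly what a shingle filter with an empty separator or a catenate-all word-delimiter setup would produce. Simulated in Python:

```python
def index_variants(name):
    """Tokens a schema might index for a place name: the lowercase word
    tokens plus a concatenated no-space form (roughly what a shingle or
    catenate-style filter would emit)."""
    tokens = name.lower().split()
    return set(tokens) | {"".join(tokens)}

# Build a tiny inverted index over the variant tokens.
index = {}
for place in ["Kuala Lumpur", "New York"]:
    for tok in index_variants(place):
        index.setdefault(tok, set()).add(place)

def search(query):
    """Match either a single word token or the space-free whole name."""
    q = query.lower()
    return index.get(q, set()) | index.get(q.replace(" ", ""), set())
```

The no-space variant only needs to exist in the index; the stored value stays "Kuala Lumpur", so the autocomplete display is unchanged.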




Re: Multi-words synonyms matching

2012-04-24 Thread Jeevanandam


usage of q and fq

q => is typically the main query for the search request

fq => is a Filter Query, generally used to restrict the super set of
documents without influencing the score (more info:
http://wiki.apache.org/solr/CommonQueryParameters#q)


For example:

q="hotel de ville" ===> returns 100 documents

q="hotel de ville"&fq=price:[100 To *]&fq=roomType:"King size Bed" ===> 
returns 40 documents from super set of 100 documents



hope this helps!

- Jeevanandam


On 24-04-2012 3:08 pm, elisabeth benoit wrote:

> Hello,
>
> I'd like to resume this post.
>
> The only way I found to keep synonyms from being split into words when
> synonyms.txt is parsed is to use the line
>
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"
> tokenizerFactory="solr.KeywordTokenizerFactory"/>
>
> in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory"
> instructs SynonymFilterFactory not to break synonyms into words on white
> spaces when parsing the synonyms file.
>
> So now it works fine: "mairie" is mapped to "hotel de ville", and when I
> send the request q="hotel de ville" (quotes are mandatory to prevent the
> analyzer from splitting "hotel de ville" on white spaces), I get answers
> containing the word "mairie".
>
> But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"),
> it doesn't work!
>
> CATEGORY_ANALYZED has the same field type as the default search field.
> This means that when I send q="hotel de ville" and
> fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer, the
> one with the filter line above.
>
> Does anyone have a clue what is different between q analysis behaviour
> and fq analysis behaviour?
>
> Thanks a lot
> Elisabeth

> [...]


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread Jeevanandam


can you please share a sample query?

-Jeevanandam


On 24-04-2012 1:49 pm, prakash_ajp wrote:
> I am trying to implement an auto-suggest feature. The search feature
> already exists and searches on file content in the user's allotted
> workspace.
>
> The search result is filtered by the user name. The suggest is
> implemented as a searchComponent, and the field 'Text' used by the
> suggester would have to be filtered the same way the search is. The
> problem with this approach is that suggest works on a single field, and
> there is no way to include the UserName field as a filter.
>
> What's the best way out from here?
>
> Thanks in advance!
> Jay


Recovery - too many updates received since start

2012-04-24 Thread Trym R. Møller

Hi

I am seeing a Solr node lose its connection to ZooKeeper and re-establish
it. After Solr reconnects to ZooKeeper, it begins to recover. The
connection was gone for approximately 10 seconds, and meanwhile the leader
slice received some documents (maybe about 1000). Solr fails the peer-sync
update with the log message:

Apr 21, 2012 10:13:40 AM org.apache.solr.update.PeerSync sync
WARNING: PeerSync: core=mycollection_slice21_shard1 
url=zk-1:2181,zk-2:2181,zk-3:2181 too many updates received since start 
- startingUpdates no longer overlaps with our currentUpdates


Looking into PeerSync and UpdateLog, I can see that 100 updates is the
maximum a shard is allowed to be behind. Is it correct that this is not
configurable, and what is the reason for choosing 100?

I suppose one must weigh the work needed to replicate the full index
against the performance loss and resource usage of enlarging the UpdateLog?


Any comments regarding this is greatly appreciated.

Best regards Trym
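For what it's worth, the 100-update window does appear to be hardcoded in PeerSync in this era; later Solr releases expose it in solrconfig.xml as numRecordsToKeep on the update log (the value below is only an example, and this setting may not exist in the version in question):

```xml
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <!-- keep more entries so peer sync can bridge longer outages -->
  <int name="numRecordsToKeep">1000</int>
</updateLog>
```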


JDBC import yields no data

2012-04-24 Thread Hasan Diwan
I'm trying to migrate from RDBMS to the Lucene ecosystem. To do this, I'm
trying to use the JDBC importer[1]. My configuration is given below:

[dataConfig with the JDBC dataSource, document and entity definitions
stripped by the list archive]

And the resulting response for the query "*:*":

% curl "http://192.168.1.6:8995/solr/db/select/?q=*%3A*"

[response XML stripped by the list archive; it showed status 0, QTime 1,
the echoed query *:*, and no result documents]

The SQL query does work properly, the relevant jars are in the lib
subdirectory. Help? -- H
-- 
Sent from my mobile device
1. http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
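For reference, an empty result set often means the import never ran or was never committed rather than a problem with the select query itself. The handler registration in solrconfig.xml that the import URL must point at looks like this (the handler name and config file name may differ in your setup):

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```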


Recover - Read timed out

2012-04-24 Thread Trym R. Møller

Hi

I am seeing a Solr node lose its connection to ZooKeeper and re-establish
it. After Solr reconnects to ZooKeeper, it begins to recover its replicas.
The connection was gone for approximately 10 seconds, and meanwhile the
leader slice received some documents (maybe about 1000). Solr fails to
update using peer sync and afterwards fails to do a full replication, with
the log message below. The Solr node the documents are replicated from
doesn't log anything while the replication is in progress. The full
replication keeps failing with the "read timed out" for about 10 hours,
and then Solr gives up.


1. How can I get more information about why the read timeout happens?
2. It seems the Solr node it replicates from leaks an HTTP connection (and
a thread) each time, reaching about 18,000 threads in 8 hours.


Any comments are welcome.

Best regards Trym

Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to 
recover:org.apache.solr.client.solrj.SolrServerException: 
http://solr-ip:8983/solr/mycollection_slice21_shard2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
at 
org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
at 
org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
at 
org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)

Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)

... 8 more


Re: Multi-words synonyms matching

2012-04-24 Thread elisabeth benoit
Yes, thanks, but this is NOT my question.

I was wondering why I get multiple matches with q="hotel de ville" and no
match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm
searching the same Solr fieldType.

Why is the q parameter behaving differently in that case? Why do the
quotes work in one case and not in the other?

Does anyone know?

Thanks,
Elisabeth

2012/4/24 Jeevanandam 

>
> usage of q and fq
>
> q => is typically the main query for the search request
>
> fq => is Filter Query; generally used to restrict the super set of
> documents without influencing score (more info.
> http://wiki.apache.org/solr/CommonQueryParameters#q)
>
> For example:
> 
> q="hotel de ville" ===> returns 100 documents
>
> q="hotel de ville"&fq=price:[100 To *]&fq=roomType:"King size Bed" ===>
> returns 40 documents from super set of 100 documents
>
>
> hope this helps!
>
> - Jeevanandam
>
>
>
> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
>
>> [...]

debugging junit test with eclipse

2012-04-24 Thread Bernd Fehling
I have tried all the hints from the internet for debugging a JUnit test of
Solr 3.6 under Eclipse, but didn't succeed.

Eclipse and everything else is running, compiling, and debugging with
RunJettyRun, and the tests have no errors. Ant from the command line is
also running with Ivy, e.g.

ant -Dtestmethod=testUserFields -Dtestcase=TestExtendedDismaxParser
test-solr-core

But I can't get a single test running under JUnit from Eclipse so that I
can step into it for debugging.

Any idea what's going wrong?

Regards
Bernd


Re: Deciding whether to stem at query time

2012-04-24 Thread Otis Gospodnetic
Hi Andrew,

This would not necessarily increase the size of your index that much - you
don't need to store both fields, just one of them if you really need it
for highlighting or displaying.  If not, just index.

Otis 

Performance Monitoring for Solr - 
http://sematext.com/spm/solr-performance-monitoring



>
> From: Andrew Wagner 
>To: solr-user@lucene.apache.org 
>Sent: Tuesday, April 24, 2012 7:21 AM
>Subject: Re: Deciding whether to stem at query time
> 
>[...]

Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Mindaugas Žakšauskas
Hi,

I maintain a distributed system which Solr is part of. The data kept in
Solr is "permissioned", and permissions are currently implemented by
taking the original user query and adding certain bits to it that make it
return less data in the search results. Now I am at the point where I need
to go over this functionality and try to improve it.

Changing this to send two separate queries (q=...&fq=...) would be the
first logical thing to do, however I was thinking of an extra
improvement. Instead of generating filter query, converting it into a
String, sending over the HTTP just to parse it by Solr again - would
it not be better to take generated Lucene fq query, serialize it using
Java serialization, convert it to, say, Base64 and then send and
deserialize it on the Solr end? Has anyone tried doing any performance
comparisons on this topic?

I am particularly concerned about this because in extreme cases my
filter queries can be very large (1000s of characters long) and we
already had to do tweaks as the size of GET requests would exceed
default limits. And yes, we could move to POST but I would like to
minimize both the amount of data that is sent over and the time taken
to parse large queries.

Thanks in advance.

m.
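One thing worth checking before committing to this design: Base64 inflates any payload by a third, and object serialization adds per-object metadata, so the serialized form of a large filter query can easily be bigger than the query string it replaces. A quick Python sketch of the size arithmetic (pickle standing in for Java serialization, the field name is made up):

```python
import base64
import pickle

# A long filter query in its plain string form, as it is sent today.
terms = [f"group:{i}" for i in range(200)]
fq = " OR ".join(terms)

# Stand-in for Java-serializing the parsed query object.
blob = pickle.dumps(terms)
encoded = base64.b64encode(blob)  # what would actually travel over HTTP

# Base64 maps every 3 input bytes to 4 output characters.
ratio = len(encoded) / len(blob)
```

So the win, if there is one, would have to come from skipping query parsing on the Solr side, not from a smaller payload on the wire.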


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Benson Margulies
2012/4/24 Mindaugas Žakšauskas :
> [...]

I'm about to try out a contribution for serializing queries as JSON using
Jackson. I've previously done this by serializing my own data structure
and putting the JSON into a custom query parameter.


> [...]


Re: Deciding whether to stem at query time

2012-04-24 Thread Paul Libbrecht

On 24 Apr 2012, at 17:16, Otis Gospodnetic wrote:
> This would not necessarily increase the size of your index that much -
> you don't need to store both fields, just one of them if you really need
> it for highlighting or displaying.  If not, just index.

I second this.
The query expansion process is far from slow... you can easily expand to
tens of fields with a fairly small penalty.

Where you do pay a penalty is with stored fields... these need to be
avoided as much as reasonably possible. As long as you keep them small,
the legendary performance of Solr will still hold.

paul
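The indexed/stored distinction behind this advice is set per field in schema.xml: indexed="true" makes a field searchable, while stored="true" keeps the raw value for display and highlighting. A sketch of the two-field setup with only one stored copy (the field and type names here are placeholders):

```xml
<field name="body_exact"   type="text_ws" indexed="true" stored="true"/>
<field name="body_stemmed" type="text_en" indexed="true" stored="false"/>
<copyField source="body_exact" dest="body_stemmed"/>
```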

Re: Deciding whether to stem at query time

2012-04-24 Thread Andrew Wagner
I'm sorry, I'm missing something. What's the difference between "storing"
and "indexing" a field?

On Tue, Apr 24, 2012 at 10:28 AM, Paul Libbrecht  wrote:

> [...]


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Mindaugas Žakšauskas
On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies  wrote:
> I'm about to try out a contribution for serializing queries in
> Javascript using Jackson. I've previously done this by serializing my
> own data structure and putting the JSON into a custom query parameter.

Thanks for your reply. Appreciate your effort, but I'm not sure if I
fully understand the gain.

Having data in JSON would still require it to be converted into Lucene
Query at the end which takes space & CPU effort, right? Or are you
saying that having query serialized into a structured data blob (JSON
in this case) makes it somehow easier to convert it into Lucene Query?

I only thought about Java serialization because:
- it's rather close to the in-object format
- the mechanism is rather stable and is an established standard in Java/JVM
- Lucene Queries seem to implement java.io.Serializable (haven't done
a thorough check but looks good on the surface)
- other conversions (e.g. using Xtream) are either slow or require
custom annotations. I personally don't see how would Lucene/Solr
include them in their core classes.

Anyway, it would still be interesting to hear if anyone could
elaborate on query parsing complexity.

m.


RE: JDBC import yields no data

2012-04-24 Thread Dyer, James
You might also want to show us your "dataimport" handler configuration from 
solrconfig.xml and also the url you're using to start the data import.  When 
it's complete, browsing to "http://192.168.1.6:8995/solr/db/dataimport" (or 
whatever the DIH handler name is in your config) should say "indexing complete" 
and also the number of documents it imported.  Also, if you have "commit=false" 
in your config, it won't issue a commit so you won't see the documents.

If it fails, your servlet container's logs should have a stack trace or 
something indicating what the failure was.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Hasan Diwan [mailto:hasan.di...@gmail.com] 
Sent: Tuesday, April 24, 2012 8:51 AM
To: solr-user@lucene.apache.org
Subject: JDBC import yields no data

I'm trying to migrate from RDBMS to the Lucene ecosystem. To do this, I'm
trying to use the JDBC importer[1]. My configuration is given below:

  [db-data-config.xml quoted here, but its XML markup was stripped by the archive]

And the result of querying "*:*":
% curl "http://192.168.1.6:8995/solr/db/select/?q=*%3A*"

  [response XML stripped by the archive: status 0, QTime 1, the echoed
  query "*:*", and no result documents]

The SQL query does work properly, the relevant jars are in the lib
subdirectory. Help? -- H
-- 
Sent from my mobile device
1. http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
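
For readers following the wiki page above, a minimal db-data-config.xml might look like the sketch below. The driver, connection URL, table, and column names are illustrative assumptions, not the poster's actual configuration:

```xml
<dataConfig>
  <!-- assumed H2 connection details; substitute your own driver and url -->
  <dataSource type="JdbcDataSource" driver="org.h2.Driver"
              url="jdbc:h2:tcp://localhost/~/test" user="sa" password=""/>
  <document>
    <!-- hypothetical table "item" with columns id and name -->
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

Each field element maps a result-set column onto a schema.xml field, so every column you select needs a matching (or dynamic) field in the schema.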


Re: Group by distance

2012-04-24 Thread Erick Erickson
What do you mean by "grouped"? It's relatively easy to return
only documents within a certain radius, and it's also easy to
return the results ordered by distance.

Here's a good place to start:
http://wiki.apache.org/solr/SpatialSearch#geofilt_-_The_distance_filter

Best
Erick

On Tue, Apr 24, 2012 at 6:33 AM, ViruS  wrote:
> I think this can only work when I have many records in the same position.
> My problem is to group by short distance... as I said in my last mail...
> about 10km.
> I need to put markers on the country of Poland and display them.
> Now I have 100k records, but in the future I will have about 2 million
> records, so I must send grouped records.
>
> Best,
> Piotr
>
> On 24 April 2012 12:08, ravicv  wrote:
>
>> Use group=true and group.field in your query.
>> And your solr version should be solr 3.4 and above.
>>
>> Thanks,
>> Ravi
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Group-by-distance-tp3934876p3934886.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Piotr (ViruS) Sikora
> E-mail/JID: vi...@hostv.pl
> http://piotrsikora.pl


Stats Component and solrj

2012-04-24 Thread Erik Fäßler
Hey all,

I'd like to know how many terms I have in a particular field in a search. In 
other words, I want to know how many facets I have in that field. I use string 
fields, there are no numbers. I wanted to use the Stats Component and use its 
"count" value. When trying this out in the browser, everything works like 
expected.
However, when I want to do the same thing in my Java web app, I get an error 
because in FieldStatsInfo.class it says

 min = (Double)entry.getValue();

Where 'entry.getValue()' is a String because I have a string field here. Thus, 
I get an error that String cannot be cast to Double.
In the browser I just got a String returned here, probably relative to a 
lexicographical order.

I switched the Stats Component on with

query.setGetFieldStatistics("authors");

Where 'authors' is a field with author names.
Is it possible that solrj doesn't yet work with the Stats Component on string 
fields? I tried Solr 3.5 and 3.6 without success. Is there another easy way to 
get the count I want? Will solrj be fixed? Or am I just making an error?
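
The cast failure described above is an ordinary unchecked downcast going wrong at runtime; a standalone illustration of the same failure shape (this is not SolrJ code, just a sketch):

```java
public class StatsCastDemo {
    // Mimics FieldStatsInfo reading the "min" entry: the response value is a
    // String for a string field, but the client casts it to Double anyway.
    static String readMin(Object value) {
        try {
            Double min = (Double) value;   // fails when value is a String
            return "min=" + min;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(readMin(1.5));          // numeric field: min=1.5
        System.out.println(readMin("Abbott, A.")); // string field: ClassCastException
    }
}
```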

Best regards,

Erik

correct location in chain for EdgeNGramFilterFactory ?

2012-04-24 Thread geeky2
hello all,

i want to experiment with the EdgeNGramFilterFactory at index time.

i believe this needs to go in post-tokenization - but i am doing a pattern
replace as well as other things.

should the EdgeNGramFilterFactory go in right after the pattern replace?

  [field type / analyzer chain markup lost in the archive]

*put EdgeNGramFilterFactory here ===> ?*


thanks for any help,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/correct-location-in-chain-for-EdgeNGramFilterFactory-tp3935589p3935589.html
Sent from the Solr - User mailing list archive at Nabble.com.
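
One common answer, sketched as an index-time chain: the edge n-gram filter usually goes last, after tokenization, pattern replacement, and lowercasing, so that grams are built from the final form of each token. The field type name and filter parameters below are illustrative assumptions, not the poster's configuration:

```xml
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- hypothetical cleanup standing in for the poster's pattern replace -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^A-Za-z0-9]" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- build prefix grams from the fully normalized token -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note the query-side analyzer deliberately omits the n-gram filter: the user's typed prefix should match the indexed grams as-is.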


Re: Multi-words synonyms matching

2012-04-24 Thread Erick Erickson
Elisabeth:

What shows up in the debug section of the response when you add
&debugQuery=on? There should be some bit of that section like:
"parsed_filter_queries"

My other question is "are you absolutely sure that your
CATEGORY_ANALYZED field has the correct content?". How does it
get populated?

Nothing jumps out at me here

Best
Erick

On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
 wrote:
> yes, thanks, but this is NOT my question.
>
> I was wondering why I have multiple matches with q="hotel de ville" and no
> match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both case I'm
> searching in the same solr fieldType.
>
> Why is q parameter behaving differently in that case? Why do the quotes
> work in one case and not in the other?
>
> Does anyone know?
>
> Thanks,
> Elisabeth
>
> 2012/4/24 Jeevanandam 
>
>>
>> usage of q and fq
>>
>> q => is typically the main query for the search request
>>
>> fq => is Filter Query; generally used to restrict the super set of
>> documents without influencing score (more info:
>> http://wiki.apache.org/solr/CommonQueryParameters#q )
>>
>> For example:
>> 
>> q="hotel de ville" ===> returns 100 documents
>>
>> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===>
>> returns 40 documents from super set of 100 documents
>>
>>
>> hope this helps!
>>
>> - Jeevanandam
>>
>>
>>
>> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
>>
>>> Hello,
>>>
>>> I'd like to resume this post.
>>>
>>> The only way I found to not split synonyms into words in synonyms.txt is
>>> to use the line
>>>
>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"
>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>
>>>
>>> in schema.xml
>>>
>>> where tokenizerFactory="solr.KeywordTokenizerFactory"
>>>
>>> instructs SynonymFilterFactory not to break synonyms into words on white
>>> spaces when parsing synonyms file.
>>>
>>> So now it works fine, "mairie" is mapped into "hotel de ville" and when I
>>> send request q="hotel de ville" (quotes are mandatory to prevent analyzer
>>> to split hotel de ville on white spaces), I get answers with word
>>> "mairie".
>>>
>>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
>>> doesn't work!!!
>>>
>>> CATEGORY_ANALYZED is same field type as default search field. This means
>>> that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
>>> ville", solr uses the same analyzer, the one with the line
>>>
>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"
>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>.
>>>
>>> Anyone as a clue what is different between q analysis behaviour and fq
>>> analysis behaviour?
>>>
>>> Thanks a lot
>>> Elisabeth
>>>
>>> 2012/4/12 elisabeth benoit 
>>>
>>>  oh, that's right.

 thanks a lot,
 Elisabeth


 2012/4/11 Jeevanandam Madanagopal 

  Elisabeth -
>
> As you described, below mapping might suit for your need.
> mairie => hotel de ville, mairie
>
> mairie gets expanded to "hotel de ville" and "mairie" at index time.  So
> "mairie" and "hotel de ville" searchable on document.
>
> However, still white space tokenizer splits at query time will be a
> problem as described by Markus.
>
> --Jeevanandam
>
> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>
> > < Have you tried the "=>" mapping instead? Something like
> > hotel de ville => mairie >
> > Yes, thanks, I've tried it but from what I undestand it doesn't solve
> my
> > problem, since this means hotel de ville will be replace by mairie at
> > index time (I use synonyms only at index time). So when user will ask
> > "hôtel de ville", it won't match.
> >
> > In fact, at index time I have mairie in my data, but I want user to be
> able
> > to request "mairie" or "hôtel de ville" and have mairie as answer, and
> not
> > have mairie as an answer when requesting "hôtel".
> >
> >
> > < ... your white space tokenizer splits ... at query time ... >
> > Ok, I guess this means I have a problem. No simple solution since at
> query
> > time my tokenizer do split on white spaces.
> >
> > I guess my problem is more or less one of the problems discussed in
> >
> >
>
> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> >
> >
> > Thanks a lot for your answers,
> > Elisabeth
> >
> >
> >
> >
> >
> > 2012/4/10 Erick Erickson 
> >
> >> Have you tried the "=>" mapping instead? Something
> >> like
> >> hotel de ville => mairie
> >> might work for you.
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
> >>  wrote:
> >>> Hello,
> >>>
> >>

Re: Deciding whether to stem at query time

2012-04-24 Thread Erick Erickson
When you set stored="true" in your schema, a verbatim copy of
the raw input is placed in the *.fdt file. That is the information
returned when you specify the "fl" parameter for instance.

When you set indexed="true", the input is analyzed and the
resulting terms are placed in the inverted index and are
searchable.

The two are essentially completely orthogonal, even though you
specify them at the same time.

So, a field that's stored but not indexed would be displayable
to the user, but no searches could be performed on it.

A field indexed but not stored can be searched, but the information
is not retrievable for display.

Why are there two options? Well, you may use copyField to
index the data two different ways for two different purposes, as
in this thread. Putting the verbatim data in twice is wasteful,
you only ever need it once.

Why store in the first place? Because all that gets into the
inverted index is the results of the analysis. So if you indexed
"story" with stemming turned on, it might result in "stori" being
in the index. And if you use phonetic filters, it's much worse,
your terms will be something like "UNT4" or "KMPT" which are
totally unsuitable to show the user. So if you want to _search_
phonetically but display the field to the user, you would both
index and store.

And even if you could recover the terms from the inverted
index as they were fed in, it would be a very expensive
process. Luke does this, you might try reconstructing
a document with Luke to see what a reconstructed doc
looks like, and how long it takes.
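
The stored/indexed distinction above can be sketched in schema.xml terms (the field and type names here are illustrative):

```xml
<!-- searchable and displayable: analyzed terms go to the index, raw text to *.fdt -->
<field name="title"       type="text_stemmed" indexed="true"  stored="true"/>
<!-- searchable only: a second, differently analyzed copy with no duplicate storage -->
<field name="title_exact" type="string"       indexed="true"  stored="false"/>
<!-- displayable only: kept verbatim, never searched -->
<field name="thumb_url"   type="string"       indexed="false" stored="true"/>
<copyField source="title" dest="title_exact"/>
```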

Hope that helps
Erick

On Tue, Apr 24, 2012 at 10:40 AM, Andrew Wagner  wrote:
> I'm sorry, I'm missing something. What's the difference between "storing"
> and "indexing" a field?
>
> On Tue, Apr 24, 2012 at 10:28 AM, Paul Libbrecht  wrote:
>
>>
>> On 24 Apr 2012, at 17:16, Otis Gospodnetic wrote:
>> > This would not necessarily increase the size of your index that much -
>> you don't need to store both fields, just 1 of them if you really need it for
>> highlighting or displaying.  If not, just index.
>>
>> I second this.
>> The query expansion process is far from being a slow thing... you can
>> easily expand to tens of fields with a fairly small penalty.
>>
>> Where you have a penalty is at stored fields... these need to be really
>> carefully avoided as much as possible.
>> As long as you keep them small, the legendary performance of SOLR will
>> still hold.
>>
>> paul


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Erick Erickson
In general, query parsing is such a small fraction of the total time that,
almost no matter how complex, it's not worth worrying about. To see
this, attach &debugQuery=on to your query and look at the timings
in the "prepare" and "process" portions of the response. I'd be
very sure that it was a problem before spending any time trying to make
the transmission of the data across the wire more efficient, my first
reaction is that this is premature optimization.

Second, you could do this on the server side with a custom query
component if you chose. You can freely modify the query
over there and it may make sense in your situation.

Third, consider "no cache filters", which were developed for
expensive filter queries, ACL being one of them. See:
https://issues.apache.org/jira/browse/SOLR-2429

Fourth, I'd ask if there's a way to reduce the size of the FQ
clause. Is this on a particular user basis or groups basis?
If you can get this down to a few groups that would help. Although
there's often some outlier who is member of thousands of
groups :(.

Best
Erick


2012/4/24 Mindaugas Žakšauskas :
> On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies  
> wrote:
>> I'm about to try out a contribution for serializing queries in
>> Javascript using Jackson. I've previously done this by serializing my
>> own data structure and putting the JSON into a custom query parameter.
>
> Thanks for your reply. Appreciate your effort, but I'm not sure if I
> fully understand the gain.
>
> Having data in JSON would still require it to be converted into Lucene
> Query at the end which takes space & CPU effort, right? Or are you
> saying that having query serialized into a structured data blob (JSON
> in this case) makes it somehow easier to convert it into Lucene Query?
>
> I only thought about Java serialization because:
> - it's rather close to the in-object format
> - the mechanism is rather stable and is an established standard in Java/JVM
> - Lucene Queries seem to implement java.io.Serializable (haven't done
> a thorough check but looks good on the surface)
> - other conversions (e.g. using Xtream) are either slow or require
> custom annotations. I personally don't see how would Lucene/Solr
> include them in their core classes.
>
> Anyway, it would still be interesting to hear if anyone could
> elaborate on query parsing complexity.
>
> m.


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread geeky2
hello,

thank you for the reply,

yes - master has been indexed.

ok - makes sense - the polling interval needs to change

i did check the solr war file on both boxes (master and slave).  they are
identical.  actually - if they were not identical - this would point to a
different issue altogether - since our deployment infrastructure - rolls the
war file to the slaves when you do a deployment on the master.

this has me stumped - not sure what to check next.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3935699.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Mindaugas Žakšauskas
Hi Erick,

Thanks for looking into this and for the tips you've sent.

I am leaning towards custom query component at the moment, the primary
reason for it would be to be able to squeeze the amount of data that
is sent over to Solr. A single round trip within the same datacenter
is worth around 0.5 ms [1] and if query doesn't fit into a single
ethernet packet, this number effectively has to double/triple/etc.

Regarding cache filters - I was actually thinking the opposite:
caching ACL queries (filter queries) would be beneficial as those tend
to be the same across multiple search requests.

[1] 
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//people/jeff/stanford-295-talk.pdf
, slide 13

m.

On Tue, Apr 24, 2012 at 4:43 PM, Erick Erickson  wrote:
> In general, query parsing is such a small fraction of the total time that,
> almost no matter how complex, it's not worth worrying about. To see
> this, attach &debugQuery=on to your query and look at the timings
> in the "prepare" and "process" portions of the response. I'd be
> very sure that it was a problem before spending any time trying to make
> the transmission of the data across the wire more efficient, my first
> reaction is that this is premature optimization.
>
> Second, you could do this on the server side with a custom query
> component if you chose. You can freely modify the query
> over there and it may make sense in your situation.
>
> Third, consider "no cache filters", which were developed for
> expensive filter queries, ACL being one of them. See:
> https://issues.apache.org/jira/browse/SOLR-2429
>
> Fourth, I'd ask if there's a way to reduce the size of the FQ
> clause. Is this on a particular user basis or groups basis?
> If you can get this down to a few groups that would help. Although
> there's often some outlier who is member of thousands of
> groups :(.
>
> Best
> Erick
>
>
> 2012/4/24 Mindaugas Žakšauskas :
>> On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies  
>> wrote:
>>> I'm about to try out a contribution for serializing queries in
>>> Javascript using Jackson. I've previously done this by serializing my
>>> own data structure and putting the JSON into a custom query parameter.
>>
>> Thanks for your reply. Appreciate your effort, but I'm not sure if I
>> fully understand the gain.
>>
>> Having data in JSON would still require it to be converted into Lucene
>> Query at the end which takes space & CPU effort, right? Or are you
>> saying that having query serialized into a structured data blob (JSON
>> in this case) makes it somehow easier to convert it into Lucene Query?
>>
>> I only thought about Java serialization because:
>> - it's rather close to the in-object format
>> - the mechanism is rather stable and is an established standard in Java/JVM
>> - Lucene Queries seem to implement java.io.Serializable (haven't done
>> a thorough check but looks good on the surface)
>> - other conversions (e.g. using Xtream) are either slow or require
>> custom annotations. I personally don't see how would Lucene/Solr
>> include them in their core classes.
>>
>> Anyway, it would still be interesting to hear if anyone could
>> elaborate on query parsing complexity.
>>
>> m.


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread prakash_ajp
Right now, the query is a very simple one, something like q=text. Basically,
it would return ['textview', 'textviewer', ..]

But the issue is, the 'textviewer' could be from a file that is out of
bounds for this user. So, ultimately I would like to include the userName in
the query. As mentioned earlier, userName is another field in the main
index.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3935765.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread Rahul Warawdekar
Hi,

In the Solr wiki, for replication, the master url is defined as follows:
http://master_host:port/solr/corename/replication

This url does not contain "admin" in its path, whereas in the master url
you provided, you have an additional "admin" in the url.
Not very sure if this might be an issue, but you can just try removing
"admin" and check if replication works.
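
For reference, the slave side of the ReplicationHandler is configured in solrconfig.xml roughly as below; the host, port, and core name are placeholders. pollInterval is what drives automatic polling: without it, the slave only replicates when triggered by hand:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- note: no "admin" segment in the path -->
    <str name="masterUrl">http://master_host:port/solr/corename/replication</str>
    <!-- hh:mm:ss between polls; omit it and the slave stops polling on its own -->
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
```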


On Tue, Apr 24, 2012 at 11:49 AM, geeky2  wrote:

> hello,
>
> thank you for the reply,
>
> yes - master has been indexed.
>
> ok - makes sense - the polling interval needs to change
>
> i did check the solr war file on both boxes (master and slave).  they are
> identical.  actually - if they were not indentical - this would point to a
> different issue altogether - since our deployment infrastructure - rolls
> the
> war file to the slaves when you do a deployment on the master.
>
> this has me stumped - not sure what to check next.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3935699.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread Jeevanandam Madanagopal
On Apr 24, 2012, at 9:37 PM, prakash_ajp wrote:

> Right now, the query is a very simple one, something like q=text. Basically,
> it would return ['textview', 'textviewer', ..]
   hmm, so you're using the default query field

> 
> But the issue is, the 'textviewer' could be from a file that is out of
> bounds for this user. So, ultimately I would like to include the userName in
> the query. As mentioned earlier, userName is another field in the main
> index.
   and you'd like to filter the result set by the userName field value
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3935765.html
> Sent from the Solr - User mailing list archive at Nabble.com.

in this scenario the 'fq' parameter will help you achieve your desired result.
Please refer http://wiki.apache.org/solr/CommonQueryParameters#fq

try this   q=text&fq=userName:"prakash"

Let us know!

-Jeevanandam



Re: JDBC import yields no data

2012-04-24 Thread Hasan Diwan
On 24 April 2012 07:49, Dyer, James  wrote:

> You might also want to show us your "dataimport" handler configuration
> from solrconfig.xml and also the url you're using to start the data import.
>  When it's complete, browsing to "
> http://192.168.1.6:8995/solr/db/dataimport" (or whatever the DIH handler
> name is in your config) should say "indexing complete" and also the number
> of documents it imported.  Also, if you have "commit=false" in your config,
> it won't issue a commit so you won't see the documents.
>

solrconfig.xml:

  [solrconfig.xml quoted in full, but its XML markup was stripped by the
  archive; only element text of what appears to be the stock Solr example
  configuration survives, including a dataimport requestHandler whose
  config file is db-data-config.xml]



The dataimport url I'm using is
http://192.168.1.6:8995/solr/db/dataimport?command=full-import

The servlet log doesn't show any errors. I appreciate the kind assistance.
-- H
-- 
Sent from my mobile device


Re: JDBC import yields no data

2012-04-24 Thread Gora Mohanty
On 24 April 2012 22:22, Hasan Diwan  wrote:
[...]
> The dataimport url I'm using is
> http://192.168.1.6:8995/solr/db/dataimport?command=full-import

And, does it show you any output? As James mentions, it should
say "busy" while the data import is running, and "indexing completed"
when done. Also, is the above URL correct? /solr/db/ looks a little
odd, but that could have to do with how you have Solr set up.

My other guess would be that your JDBC set up is not correct.
For testing, you could try to simplify it by not using
net.sf.log4jdbc.DriverSpy , and trying directly with the H2
database JDBC driver.

Regards,
Gora


RE: JDBC import yields no data

2012-04-24 Thread Dyer, James
After you issue the full-import command with the url you gave:

http://192.168.1.6:8995/solr/db/dataimport?command=full-import

Paste the url into a web browser without the "command" parameter:

http://192.168.1.6:8995/solr/db/dataimport

It should be giving you status as to how many database calls it has made, and how 
many rows it has read & documents it has indexed.  Keep refreshing the page until it 
is done.  When it finishes, you should get either a Success or a Failure message.  Is 
it saying success or failure?  Also, how many documents does it say it indexed?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Hasan Diwan [mailto:hasan.di...@gmail.com]
Sent: Tuesday, April 24, 2012 11:52 AM
To: solr-user@lucene.apache.org
Subject: Re: JDBC import yields no data

On 24 April 2012 07:49, Dyer, James  wrote:

> You might also want to show us your "dataimport" handler configuration
> from solrconfig.xml and also the url you're using to start the data import.
>  When it's complete, browsing to "
> http://192.168.1.6:8995/solr/db/dataimport" (or whatever the DIH handler
> name is in your config) should say "indexing complete" and also the number
> of documents it imported.  Also, if you have "commit=false" in your config,
> it won't issue a commit so you won't see the documents.
>

[...]


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Erick Erickson
If you're assembling an fq clause, this is all done for you, although
you need to take some care to form the fq clause _exactly_
the same way each time. Think of the filterCache as a key/value
map where the key is the raw fq text and the value is the docs
satisfying that query.

So fq=acl:(a OR b) will not, for instance, match
 fq=acl:(b OR a)
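
The cache-key point can be made concrete: if the fq string is built from a user's group list, sorting the groups first guarantees that logically identical filters produce byte-identical strings and so share one filterCache entry. A sketch; the field name "acl" and the group values are assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AclFilterBuilder {
    // Canonicalize group order so "b, a" and "a, b" yield the same fq text,
    // and therefore the same filterCache key on the Solr side.
    static String buildAclFilter(List<String> groups) {
        return groups.stream()
                .sorted()
                .collect(Collectors.joining(" OR ", "acl:(", ")"));
    }

    public static void main(String[] args) {
        String f1 = buildAclFilter(Arrays.asList("b", "a"));
        String f2 = buildAclFilter(Arrays.asList("a", "b"));
        System.out.println(f1);            // acl:(a OR b)
        System.out.println(f1.equals(f2)); // true: one cache entry, not two
    }
}
```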

FWIW
Erick

2012/4/24 Mindaugas Žakšauskas :
> Hi Erick,
>
> Thanks for looking into this and for the tips you've sent.
>
> I am leaning towards custom query component at the moment, the primary
> reason for it would be to be able to squeeze the amount of data that
> is sent over to Solr. A single round trip within the same datacenter
> is worth around 0.5 ms [1] and if query doesn't fit into a single
> ethernet packet, this number effectively has to double/triple/etc.
>
> Regarding cache filters - I was actually thinking the opposite:
> caching ACL queries (filter queries) would be beneficial as those tend
> to be the same across multiple search requests.
>
> [1] 
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//people/jeff/stanford-295-talk.pdf
> , slide 13
>
> m.
>
> On Tue, Apr 24, 2012 at 4:43 PM, Erick Erickson  
> wrote:
>> In general, query parsing is such a small fraction of the total time that,
>> almost no matter how complex, it's not worth worrying about. To see
>> this, attach &debugQuery=on to your query and look at the timings
>> in the "prepare" and "process" portions of the response. I'd be
>> very sure that it was a problem before spending any time trying to make
>> the transmission of the data across the wire more efficient, my first
>> reaction is that this is premature optimization.
>>
>> Second, you could do this on the server side with a custom query
>> component if you chose. You can freely modify the query
>> over there and it may make sense in your situation.
>>
>> Third, consider "no cache filters", which were developed for
>> expensive filter queries, ACL being one of them. See:
>> https://issues.apache.org/jira/browse/SOLR-2429
>>
>> Fourth, I'd ask if there's a way to reduce the size of the FQ
>> clause. Is this on a particular user basis or groups basis?
>> If you can get this down to a few groups that would help. Although
>> there's often some outlier who is member of thousands of
>> groups :(.
>>
>> Best
>> Erick
>>
>>
>> 2012/4/24 Mindaugas Žakšauskas :
>>> On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies  
>>> wrote:
 I'm about to try out a contribution for serializing queries in
 Javascript using Jackson. I've previously done this by serializing my
 own data structure and putting the JSON into a custom query parameter.
>>>
>>> Thanks for your reply. Appreciate your effort, but I'm not sure if I
>>> fully understand the gain.
>>>
>>> Having data in JSON would still require it to be converted into Lucene
>>> Query at the end which takes space & CPU effort, right? Or are you
>>> saying that having query serialized into a structured data blob (JSON
>>> in this case) makes it somehow easier to convert it into Lucene Query?
>>>
>>> I only thought about Java serialization because:
>>> - it's rather close to the in-object format
>>> - the mechanism is rather stable and is an established standard in Java/JVM
>>> - Lucene Queries seem to implement java.io.Serializable (haven't done
>>> a thorough check but looks good on the surface)
>>> - other conversions (e.g. using XStream) are either slow or require
>>> custom annotations. I personally don't see how would Lucene/Solr
>>> include them in their core classes.
>>>
>>> Anyway, it would still be interesting to hear if anyone could
>>> elaborate on query parsing complexity.
>>>
>>> m.


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread geeky2
that was it!

thank you.

i did notice something else in the logs now ...

what is the meaning or implication of the message "Connection reset"?



2012-04-24 12:59:19,996 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 12:59:39,998 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
*2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Master at:
http://bogus:bogusport/somepath/somecore/replication/ is not available.
Index fetch failed. Exception: Connection reset*
2012-04-24 13:00:19,998 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:40,004 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:59,992 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:19,993 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:39,992 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:59,989 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:19,990 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:39,989 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:59,991 INFO  [org.a

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3936107.html
Sent from the Solr - User mailing list archive at Nabble.com.
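
For reference, a minimal slave-side replication handler config with polling looks roughly like this (host, port, core name, and interval are placeholders, not taken from this thread):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- URL of the master core's replication handler -->
    <str name="masterUrl">http://master-host:8983/solr/somecore/replication</str>
    <!-- HH:mm:ss between polls; without this the slave does not poll on its own -->
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
```

If pollInterval is removed, replication must be triggered explicitly, e.g. via the replication handler's fetchindex command.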


RE: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread Klostermeyer, Michael
I'm new to Solr, but I would think the fq=[username] would work here.

http://wiki.apache.org/solr/CommonQueryParameters#fq

Mike
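
As a sketch, that request might look like the following (the userName field and its value are assumptions based on this thread):

```
/solr/select?q=text&fq=userName:jdoe
```

The fq clause restricts results to that user's documents without affecting relevance scoring.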

-Original Message-
From: prakash_ajp [mailto:prakash_...@yahoo.com] 
Sent: Tuesday, April 24, 2012 11:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Auto suggest on indexed file content filtered based on user

Right now, the query is a very simple one, something like q=text. Basically, it 
would return ['textview', 'textviewer', ..]

But the issue is, the 'textviewer' could be from a file that is out of bounds 
for this user. So, ultimately I would like to include the userName in the 
query. As mentioned earlier, userName is another field in the main index.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3935765.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread prakash_ajp
I read on a couple of other web pages that fq is not supported for suggester.
I even tried the query and it doesn't help. My understanding was, when the
suggest (spellcheck) index is built, only the field chosen is considered for
queries and the other fields from the main index are not available for
filtering purposes once the index is created.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3936144.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread Jeevanandam Madanagopal
Yes, only the field the spellcheck index was built from is used for suggest
queries. I believe we are discussing two separate parts here: filtering
documents in the search handler using the fq parameter, and spell suggestions.

Let's say you have a field for spellcheck, used to build the spell dictionary,
and use copyField to populate that spell field so the dictionary gets created.

Then refer to the spellcheck component in the default search handler's
'last-components' section, like below:

 <arr name="last-components">
   <str>spellcheck</str>
 </arr>

You will then be able to apply document filtering and the spellcheck params in
the search handler while querying.

Detailed info: http://wiki.apache.org/solr/SpellCheckComponent [probably you
might have already gone through it :) ]

-Jeevanandam
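
Pulling those pieces together, a sketch of the whole setup (field, type, and handler names are illustrative, not taken verbatim from this thread):

```xml
<!-- schema.xml: copy the source content into a dedicated spell field -->
<field name="spell" type="textSpell" indexed="true" stored="false"/>
<copyField source="content" dest="spell"/>

<!-- solrconfig.xml: build the dictionary from that field and hook the
     component into the default search handler -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```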


On Apr 25, 2012, at 12:01 AM, prakash_ajp wrote:

> I read on a couple of other web pages that fq is not supported for suggester.
> I even tried the query and it doesn't help. My understanding was, when the
> suggest (spellcheck) index is built, only the field chosen is considered for
> queries and the other fields from the main index are not available for
> filtering purposes once the index is created.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3936144.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Field names w/ leading digits cause strange behavior

2012-04-24 Thread bleakley
When specifying a field name that starts with a digit (or digits) in the "fl"
parameter, Solr returns those leading digits as both the field name and the
field value. For example, using the nightly build
"apache-solr-4.0-2012-04-24_08-27-47" I run:

java -jar start.jar
and
java -jar post.jar solr.xml monitor.xml

If I then add a field to the field list that starts with a digit (
localhost:8983/solr/select?q=*:*&fl=24 ) the results look like:
...

24

...

If I try fl=24_7, it looks like everything after the underscore is truncated
...

24

...

And if I try fl=3test, it looks like everything after the last digit is
truncated
...

3

...

If I have an actual value for that field (say I've indexed 24_7 to be "true"
) I get back that value as well as the behavior above.
...

true
24

...

Is it ok to have fields that start with digits? If so, is there a different
way to specify them using the "fl" parameter? Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936354.html
Sent from the Solr - User mailing list archive at Nabble.com.


embedded solr populating field of type LatLonType

2012-04-24 Thread Jason Cunning
Hi,

I have a question concerning the spatial field type LatLonType and populating 
it via an embedded solr server in java.

So far I've only ever had to index simple types like boolean, float, and 
string. This is the first complex type. So I'd like to use the following field 
definition for example in my schema:



And then I'd like to populate this field in Java, as in the following pseudo
code:

public SolrInputDocument populate(AppropriateJavaType coordinate) {

SolrInputField inputField = new SolrInputField("coordinate");
inputField.addValue(coordinate, 1.0F);

SolrInputDocument inputDocument = new SolrInputDocument();
inputDocument.put("coordinate", inputField);

return inputDocument;
}

My question is, what is the AppropriateJavaType for populating a solr field of 
type LatLonType?

Thank you for your time.
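
For what it's worth, a LatLonType value is normally supplied as a single "lat,lon" string (the field type splits it into its two sub-fields at index time), so the appropriate Java type in SolrJ is simply String. The equivalent XML update message would be (values illustrative):

```xml
<add>
  <doc>
    <field name="id">doc1</field>
    <!-- latitude, then longitude, comma-separated -->
    <field name="coordinate">45.17614,-93.87341</field>
  </doc>
</add>
```

In the pseudo code above, that would mean something like inputField.addValue("45.17614,-93.87341", 1.0F).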

Re: correct location in chain for EdgeNGramFilterFactory ?

2012-04-24 Thread Erick Erickson
Well, what effect do you _want_?

I'd probably put it after the PorterStemFilterFactory. As it is, it'll
form a bunch of ngrams, then WordDelimiterFilterFactory will
try to break them up according to _its_ rules and eventually
you'll be sending absolute gibberish to the stemmer. I mean
what is the stemmer going to think of (starting out with running)
ru, run, runn, runni, runnin, running?

I suggest you spend some time with admin/analysis with various
orderings to understand better how all the parts interact.

Best
Erick
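
Following that ordering, an index-time chain might look like the sketch below (gram sizes and the reduced filter set are illustrative, not the poster's exact config):

```xml
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- ngrams last, after stemming, so grams are built from stemmed terms -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>
```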

On Tue, Apr 24, 2012 at 11:20 AM, geeky2  wrote:
> hello all,
>
> i want to experiment with the EdgeNGramFilterFactory at index time.
>
> i believe this needs to go in post tokenization - but i am doing a pattern
> replace as well as other things.
>
> should the EdgeNGramFilterFactory go in right after the pattern replace?
>
>
>
>
>     positionIncrementGap="100">
>      
>        
>
>
>         words="stopwords.txt" enablePositionIncrements="true"/>
>         replacement="" replace="all"/>
>
> *put EdgeNGramFilterFactory here ===> ?*
>
>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> preserveOriginal="1"/>
>        
>         protected="protwords.txt"/>
>        
>      
>      
>        
>         words="stopwords.txt" enablePositionIncrements="true"/>
>         replacement="" replace="all"/>
>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
>        
>         protected="protwords.txt"/>
>        
>      
>    
>
> thanks for any help,
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/correct-location-in-chain-for-EdgeNGramFilterFactory-tp3935589p3935589.html
> Sent from the Solr - User mailing list archive at Nabble.com.


faceted searches - design question - facet field not part of qf search fields

2012-04-24 Thread geeky2


hello all,

this is more of a design / newbie question on how others combine faceted
search fields in to their requestHandlers.

say you have a request handler set up like below.

does it make sense (from a design perspective) to add a faceted search field
that is NOT part of the main search fields (itemNo, productType, brand) in
the qf param?

for example, augment the requestHandler below to include a faceted search on
itemDesc?

would this be confusing ? - to be searching across three fields - but
offering faceted suggestions on itemDesc?

just trying to understand how others approach this

thanks

  

  edismax
  all
  10
  itemNo^1.0 productType^.8 brand^.5
  *:*


 

  false

  



  


--
View this message in context: 
http://lucene.472066.n3.nabble.com/faceted-searches-design-question-facet-field-not-part-of-qf-search-fields-tp3936509p3936509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread Erick Erickson
I don't know if there is a really good solution here. The problem is that
suggester (and the trunk FST version) simply traverse the terms in
the index. There's not even a real concept of those terms belonging to
any document. Since your security level is on a document basis, that
makes things hard.

How many users do you have? And do you ever expect to search
across more than one user's files? If not, you could consider having
one core per user. Then the suggestions would be correct and since
the searches would be against the user's core, they'd never see
any documents they didn't own.

But that solution has some complexity involved, and if you have a zillion
users it can be difficult to get right.

You could consider having separate (dynamically-defined) fields that
had the suggestion list for each individual user. That would be
administratively easier. Then your suggestions would simply go against
that user's suggestion field (suggestion_user1, e.g.).

None of this is elegant, but this is not an elegant problem given how
Solr is structured.

Best
Erick
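
A sketch of that per-user field idea (all names hypothetical):

```xml
<!-- schema.xml: one suggestion field per user via a dynamic field -->
<dynamicField name="suggestion_*" type="text_suggest" indexed="true" stored="false"/>
```

The indexing application would copy each document's suggestable text into suggestion_<owner>, and suggestion lookups for a given user would then target only that user's field.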

On Tue, Apr 24, 2012 at 2:31 PM, prakash_ajp  wrote:
> I read on a couple of other web pages that fq is not supported for suggester.
> I even tried the query and it doesn't help. My understanding was, when the
> suggest (spellcheck) index is built, only the field chosen is considered for
> queries and the other fields from the main index are not available for
> filtering purposes once the index is created.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3936144.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: faceted searches - design question - facet field not part of qf search fields

2012-04-24 Thread Erick Erickson
No problem here at all, it's done all the time. Consider a popular
facet series "in the last day", "in the last week", "in the last month"...
There's no reason you have to facet on the fields that are
searched on.

The user has search terms like "my dog has fleas" and your query
looks like
q=my dog has fleas&fq=timestamp:[NOW/DAY TO NOW/DAY+1DAY]
and the user sees all documents with those terms added since midnight
last night. No confusion at all...

Best
Erick
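
Tying this back to the original question, faceting on itemDesc alongside the three qf search fields would just be something like (illustrative, shown unencoded):

```
q=dishwasher&defType=edismax&qf=itemNo^1.0 productType^.8 brand^.5&facet=true&facet.field=itemDesc
```

and when the user picks a facet value, the follow-up request adds fq=itemDesc:"chosen value".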


On Tue, Apr 24, 2012 at 4:28 PM, geeky2  wrote:
>
>
> hello all,
>
> this is more of a design / newbie question on how others combine faceted
> search fields in to their requestHandlers.
>
> say you have a request handler set up like below.
>
> does it make sense (from a design perspective) to add a faceted search field
> that is NOT part of the main search fields (itemNo, productType, brand) in
> the qf param?
>
> for example, augment the requestHandler below to include a faceted search on
> itemDesc?
>
> would this be confusing ? - to be searching across three fields - but
> offering faceted suggestions on itemDesc?
>
> just trying to understand how others approach this
>
> thanks
>
>   default="false">
>    
>      edismax
>      all
>      10
>      itemNo^1.0 productType^.8 brand^.5
>      *:*
>    
>    
>     
>    
>      false
>    
>  
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/faceted-searches-design-question-facet-field-not-part-of-qf-search-fields-tp3936509p3936509.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field names w/ leading digits cause strange behavior

2012-04-24 Thread Erick Erickson
Hmmm, this does NOT happen on 3.6, and it DOES happen on
trunk. Sure sounds like a JIRA to me, would you mind raising one?

I can't imagine this is desired behavior, it's just weird.

Thanks for pointing this out!
Erick

On Tue, Apr 24, 2012 at 3:38 PM, bleakley  wrote:
> When specifying a field name that starts with a digit (or digits) in the "fl"
> parameter solr returns both the field name and field value as the those
> digits. For example, using nightly build
> "apache-solr-4.0-2012-04-24_08-27-47" I run:
>
> java -jar start.jar
> and
> java -jar post.jar solr.xml monitor.xml
>
> If I then add a field to the field list that starts with a digit (
> localhost:8983/solr/select?q=*:*&fl=24 ) the results look like:
> ...
> 
> 24
> 
> ...
>
> if I try fl=24_7 it looks like everything after the underscore is truncated
> ...
> 
> 24
> 
> ...
>
> and if I try fl=3test it looks like everything after the last digit is
> truncated
> ...
> 
> 3
> 
> ...
>
> If I have an actual value for that field (say I've indexed 24_7 to be "true"
> ) I get back that value as well as the behavior above.
> ...
> 
> true
> 24
> 
> ...
>
> Is it ok the have fields that start with digits? If so, is there a different
> way to specify them using the "fl" parameter? Thanks!
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936354.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: faceted searches - design question - facet field not part of qf search fields

2012-04-24 Thread Chris Hostetter
: 
: The user as search terms like "my dog has fleas" and your query
: looks like
: q=my dog has fleas&fq=timestamp:[NOW/DAY TO NOW/DAY+1DAY]
: and the user sees all documents with those terms added since midnight
: last night. No confusion at all...

right ... whether the facets are useful or confusing has nothing to do with
whether the fields are in your "qf" ... what matters is what you *do* with
those facet counts once you have them.

if you offer the user the ability to filter on a constraint (which is what
most people do with facet info), then as long as you generate that filter
using the same field, as an fq, everything should make sense.

if instead you just try to add the constraint to your main "q" query 
string, as an additional clause, then that is likely to make no sense at 
all, since the terms from your facet field may not have any bearing on the 
fields you are querying against.


-Hoss


Re: Field names w/ leading digits cause strange behavior

2012-04-24 Thread bleakley
Thank you for verifying the issue. I've created a ticket at
https://issues.apache.org/jira/browse/SOLR-3407

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936599.html
Sent from the Solr - User mailing list archive at Nabble.com.


Title Boosting and IDF

2012-04-24 Thread Tavi Nathanson
Hey everyone,

My documents have "title" and "body" fields. The title field often has far fewer
terms than the body field. IDF, as a result, will have a profound effect in
the title field compared to the body field.

I currently have the title field boosted by 4x relative to the body field.
While I want matches in the title field to result in higher scores than
matches in the body field, I don't believe I want the title to completely
trump the body. I've seen this happen when a rare term is present in the
title field, and IDF combines with the 4x boost to wreak havoc.

I'd like to get your thoughts on the following:

- Is it standard practice to avoid boosting the title field much, because of
the (generally) high IDF of title field terms?
- Are there other strategies for handling the high IDF of a title field?

Thanks!
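
For context, that 4x boost in edismax terms is simply (illustrative):

```
defType=edismax&qf=title^4.0 body^1.0
```

Note that length normalization already favors short fields like title independently of IDF; setting omitNorms="true" on the title field removes that length effect, which can be tuned separately from the boost itself.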

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Title-Boosting-and-IDF-tp3936709p3936709.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread Doug Mittendorf
Another option is to use faceting (via the facet.prefix param) for your 
auto-suggest.  It's not as fast and scalable as using one of the 
Suggester implementations, but it does allow arbitrary fq parameters to 
be included in the request to limit the results.


http://wiki.apache.org/solr/SimpleFacetParameters#Facet_prefix_.28term_suggest.29

Doug
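
A sketch of such an auto-suggest request combining facet.prefix with a per-user filter (field names are assumptions based on this thread):

```
/solr/select?q=*:*&rows=0&facet=true&facet.field=content_suggest&facet.prefix=text&facet.limit=10&fq=userName:jdoe
```

Facet counts are computed only over documents matching the fq, so suggestions never leak terms from files the user cannot see.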

On 04/24/2012 04:30 PM, Erick Erickson wrote:

I don't know if there is a really good solution here. The problem is that
suggester (and the trunk FST version) simply traverse the terms in
the index. there's not even a real concept of those terms belonging to
any document. Since your security level is on a document basis, that
makes things hard.

How many users do you have? And do you ever expect to search
across more than one user's files? If not, you could consider having
one core per user. Then the suggestions would be correct and since
the searches would be against the user's core, they'd never see
any documents they didn't own.

But that solution has some complexity involved, and if you have a zillion
users it can be difficult to get right.

You could consider having separate (dynamically-defined) fields that
had the suggestion list for each individual user. that would be
administratively easier. Then you suggestions would simply go against
that user's suggestion field (suggestion_user1 e.g.).

None of this is elegant, but this is not an elegant problem given how
Solr is structured.

Best
Erick

On Tue, Apr 24, 2012 at 2:31 PM, prakash_ajp  wrote:

I read on a couple of other web pages that fq is not supported for suggester.
I even tried the query and it doesn't help. My understanding was, when the
suggest (spellcheck) index is built, only the field chosen is considered for
queries and the other fields from the main index are not available for
filtering purposes once the index is created.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3936144.html
Sent from the Solr - User mailing list archive at Nabble.com.




QueryElevationComponent and distributed search

2012-04-24 Thread srinir
Hi,

I am using solr 3.6. I saw in Solr wiki that QueryElevationComponent is not
supported for distributed search. 

https://issues.apache.org/jira/browse/SOLR-2949

When I checked the above ticket, it looks like it's fixed in Solr 4.0. Does
anyone have any idea when a stable version of Solr 4.0 will be released
(approx. time frame)? If not, are these changes independent of other Solr 4.0
changes, so that I can just copy this patch into my setup for now? I would like
to use Solr 3.6 because I want a stable version in production.


Thanks
Srini

--
View this message in context: 
http://lucene.472066.n3.nabble.com/QueryElevationComponent-and-distributed-search-tp3936998p3936998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread prakash_ajp
The first one may not work because the number of users can be big. Besides,
the users can simply register themselves and start using it. It won't work
if an admin has to intervene in the registration process.

The second could work, I guess. But the problem would be data duplication, as
users might also share permissions to the same files and folders. I understand
my requirement is a little complicated.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3937368.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread prakash_ajp
Is it true that faceting is case sensitive? That would be disastrous for our
requirement :(

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3937370.html
Sent from the Solr - User mailing list archive at Nabble.com.
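
Faceting returns raw indexed terms, so case sensitivity depends entirely on the field's analysis chain: a chain that includes a lowercase filter yields all-lowercase facet values. A hedged sketch (names illustrative):

```xml
<fieldType name="suggest_lower" class="solr.TextField">
  <analyzer>
    <!-- keep the whole value as one token, then lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<copyField source="content" dest="content_suggest"/>
```

With this, facet.prefix matching is effectively case-insensitive as long as the client lowercases the prefix before sending it.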