Re: need a way so that solr return result for misspelled terms

2011-11-24 Thread meghana
Okay, I am not very familiar with it. Can I use the Lucene query parser with
Solr to make this fuzzy search possible?

Erik Hatcher-4 wrote
> 
> Sure... if you're using the "lucene" query parser and put a ~ after every
> term in the query :)
> 
> But that would mean that either the users or your application do this.
> 
>   Erik
> 
> On Nov 23, 2011, at 09:03 , meghana wrote:
> 
>> Hi Erik, 
>> 
>> Thanks for your reply. i come to know  that  Lucene provides the fuzzy
>> search by applying tilde("~") symbol at the end of search with like
>> delll~0.8
>> 
>> can we apply such fuzzy logic in solr in any way?
>> 
>> Thanks 
>> Meghana
>> Erik Hatcher-4 wrote
>>> 
>>> Meghana -
>>> 
>>> There's currently no facility in Solr to return results for suggestions
>>> automatically.  You'll have to code this into your client to make
>>> another
>>> request to Solr for the suggestions returned from the first request.
>>> 
>>> Erik
>>> 
>>> On Nov 23, 2011, at 07:58 , meghana wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have configured spellchecker component in my solr. it works with custom
>>>> request handler (however its not working with standard request handler, but
>>>> this is not concern at now) . but its returning suggestions for the matching
>>>> spells, instead of it we want that we can directly get result for relative
>>>> spells of misspelled search term.
>>>> 
>>>> Can we do this. 
>>>> Any help much appreciated.
>>>> Meghana
>>>> 
>>>> 
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530769.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3533046.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autocomplete(terms) performance problem

2011-11-24 Thread roySolr
Thanks, it looks great!

In the near future I will give it a try.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3533066.html


Re: Integrating Surround Query Parser

2011-11-24 Thread Erik Hatcher

On Nov 23, 2011, at 09:56 , Ahmet Arslan wrote:

> 
>> is this is the trunk of solr 4.0 ,
>> can't i implement in solr 3.1 .?
> 
> Author of the patch would know answer to this. But why not use trunk?

I spent a fair bit of time yesterday on making a 3.x compatible patch but have 
not completed that work yet.  It's a bit more work because of the dependency in 
the build system.   I may not be able to get back to this for some weeks yet.  
The SurroundQParserPlugin is really all you need to make this work, just need 
to get the compilation bit fixed (as things changed from 3.x to trunk with 
contrib/modules). 

Rahul - if you'd like to see this done, feel free to take a stab at it.  I'll 
tinker with it as I have time.

Erik



Re: need a way so that solr return result for misspelled terms

2011-11-24 Thread Erik Hatcher
The default query parser in Solr is the "lucene" one.  q=term~ 

But there is nothing that automatically makes terms fuzzy with the ~ at the 
end.  (and fuzzy queries only work on individual terms, not terms inside of 
"phrases").

Erik
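
As a rough illustration of what "your application does this" could look like,
here is a minimal Python sketch that appends ~ to each bare term of a user
query before it is sent as q. The function name make_fuzzy and the rule for
skipping operators and quoted phrases are assumptions made for this example;
Solr itself does not provide such a helper.

```python
import re

# Boolean/range keywords of the Lucene query syntax that must stay as they are.
OPERATORS = {"AND", "OR", "NOT", "TO"}

def make_fuzzy(query, similarity=None):
    """Append ~ (or ~<similarity>) to every bare term in `query`.

    Quoted phrases are passed through unchanged, because fuzzy matching
    applies to individual terms, not to terms inside a phrase.
    """
    suffix = "~" if similarity is None else "~%s" % similarity
    # Split into quoted phrases and whitespace-separated tokens.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    out = []
    for tok in tokens:
        if tok.startswith('"') or tok in OPERATORS or "~" in tok:
            out.append(tok)  # leave phrases, operators and already-fuzzy terms
        else:
            out.append(tok + suffix)
    return " ".join(out)

print(make_fuzzy("delll laptop"))              # delll~ laptop~
print(make_fuzzy('slor "exact phrase"', 0.8))  # slor~0.8 "exact phrase"
```

The sketch does not handle field prefixes, wildcards or escaped characters, so
a real client would need a more careful tokenizer.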





complex phrase plugin install

2011-11-24 Thread Rahul Mehta
Hi,

I want to install the complex phrase plugin from this comment:
https://issues.apache.org/jira/browse/SOLR-1604?focusedCommentId=12923982&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12923982

I did the following steps and got an error:


   - configure maven path variable in .bashrc
     (http://maven.apache.org/download.html#Installation)
   - download the ComplexPhrase.zip
   - run the mvn -e package command in ComplexPhrase Folder

[INFO]
[ERROR] BUILD ERROR
[INFO]
[INFO] Error configuring: org.apache.maven.plugins:maven-resources-plugin. Reason: ERROR: Cannot override read-only parameter: resources in goal: resources:resources
[INFO]
[INFO] Trace
org.apache.maven.lifecycle.LifecycleExecutionException: Error configuring: org.apache.maven.plugins:maven-resources-plugin. Reason: ERROR: Cannot override read-only parameter: resources in goal: resources:resources
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:723)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
    at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
    at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
    at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
    at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
Caused by: org.apache.maven.plugin.PluginConfigurationException: Error configuring: org.apache.maven.plugins:maven-resources-plugin. Reason: ERROR: Cannot override read-only parameter: resources in goal: resources:resources
    at org.apache.maven.plugin.DefaultPluginManager.validatePomConfiguration(DefaultPluginManager.java:1157)
    at org.apache.maven.plugin.DefaultPluginManager.getConfiguredMojo(DefaultPluginManager.java:705)
    at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:468)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
    ... 17 more
[INFO]
  


Please suggest how to solve this error.

-- 
Thanks & Regards

Rahul Mehta


Re: complex phrase plugin install

2011-11-24 Thread Ahmet Arslan
> I want to install complex phrase plugin this one.
> https://issues.apache.org/jira/browse/SOLR-1604?focusedCommentId=12923982&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12923982
> 
> I had done following step and got an error :

'mvn package' works for me. (Apache Maven 3.0.3)


Re: complex phrase plugin install

2011-11-24 Thread meghana
Is this for wildcard search and for searching misspelled words? I need to do
the same in my application.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/complex-phrase-plugin-install-tp3533123p3533182.html


Re: need a way so that solr return result for misspelled terms

2011-11-24 Thread meghana
Hi Erik,
I am sorry, I did not quite follow you. Are you saying that the tilde (~)
works for a single term only?
For example, if I have a sentence like "i like solr speed for searching." and I
search with slor~, will it not work because the term is "inside of phrases"?
Or did I misunderstand you? Please clarify.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3533198.html


highlighting on range query

2011-11-24 Thread Rahul Mehta
Hello,

I want to get the results of a range query with highlighting.

E.g. this query:
http://localhost:8983/solr/select?q=field1:[5000%20TO%206000]&fl=field2&hl=on&rows=5&wt=json&indent=on&hl.fl=field3

does not return anything in the highlighting section.

Please suggest how I can get the highlighted result.

-- 
Thanks & Regards

Rahul Mehta


Re: need a way so that solr return result for misspelled terms

2011-11-24 Thread Ahmet Arslan
> Hi Erik ,
> I am sorry , i did not get you exactly. do you tries to say
> that tilde (~)
> works for single term only.
> Say for ex. i have sentence like "i like solr speed for
> searching." and i
> try to search with slor~ , then it will not work bcoz it
> "inside of phrases"
> ? or i misunderstood you. plz clarify.

Please refer to documentation:

http://wiki.apache.org/solr/SolrQuerySyntax

http://lucene.apache.org/java/3_4_0/queryparsersyntax.html


Re: highlighting on range query

2011-11-24 Thread Ahmet Arslan
> I want to have result of a range query with highlighted
> Result.

http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm


Re: Integrating Surround Query Parser

2011-11-24 Thread Rahul Mehta
Okay, thanks for reply.



-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Hi Ahmet,

I passed &hl.highlightMultiTerm=true in the request, but field1 is still not
showing up in the highlighting.

http://localhost:8983/solr/select?q=field1:[5000%20TO%206000]&fl=field2&hl=on&rows=5&wt=json&indent=on&hl.fl=field3&hl.highlightMultiTerm=true

I am using Solr 3.1.

Do I need to install a patch, or is there anything else I need to do?






On Thu, Nov 24, 2011 at 3:36 PM, Ahmet Arslan  wrote:

> > I want to have result of a range query with highlighted
> > Result.
>
> http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm
>



-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Ahmet Arslan
> I passed &hl.highlightMultiTerm=true in request ,* but
> still field1 is not
> coming in hightlighting.*
> 
> http://localhsot:8983/solr/select?q=field1:[5000%20TO%206000]&fl=field2&hl=on&rows=5&wt=json&indent=on&hl.fl=field3&hl.highlightMultiTerm=true
> 

As wiki says "If the SpanScorer is also being used..." which means you need to 
add &hl.usePhraseHighlighter=true too.


Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Oh sorry, I forgot to tell you that I also added &hl.usePhraseHighlighter=true,
but still no result is coming.

On Thu, Nov 24, 2011 at 5:14 PM, Ahmet Arslan  wrote:

> > I passed &hl.highlightMultiTerm=true in request ,* but
> > still field1 is not
> > coming in hightlighting.*
> >
> >
> http://localhsot:8983/solr/select?q=field1:[5000%20TO%206000]&fl=field2&hl=on&rows=5&wt=json&indent=on&hl.fl=field3&hl.highlightMultiTerm=true
> >
>
> As wiki says "If the SpanScorer is also being used..." which means you
> need to add &hl.usePhraseHighlighter=true too.
>



-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Ahmet Arslan
> oh sorry forgot to tell you that i
> added &hl.usePhraseHighlighter=true this
> also , but still no result is coming .

Did you specify field1 in hl.fl parameter?

Plus you need to mark field1 as indexed="true" and stored="true" to enable 
highlighting.

http://wiki.apache.org/solr/FieldOptionsByUseCase
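
Putting the parameters mentioned in this thread together, the request could be
built as in the following minimal Python sketch. The helper name
build_highlight_url and the localhost base URL are assumptions for
illustration; the field names come from the example query in the thread. The
sketch only shows how the parameters combine and does not settle whether a
range query on this particular field type can be highlighted on Solr 3.1.

```python
from urllib.parse import urlencode

def build_highlight_url(base="http://localhost:8983/solr/select"):
    """Build a select URL that asks for highlighting on the queried field."""
    params = [
        ("q", "field1:[5000 TO 6000]"),
        ("fl", "field2"),
        ("rows", 5),
        ("wt", "json"),
        ("indent", "on"),
        ("hl", "on"),
        ("hl.fl", "field1"),                  # highlight the field the query matches on
        ("hl.highlightMultiTerm", "true"),    # range queries are multi-term queries
        ("hl.usePhraseHighlighter", "true"),  # the wiki ties highlightMultiTerm to the SpanScorer
    ]
    return base + "?" + urlencode(params)

print(build_highlight_url())
```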



Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Yes, I tried specifying hl.fl=field1, and field1 is indexed and
stored.


On Thu, Nov 24, 2011 at 5:23 PM, Ahmet Arslan  wrote:

> > oh sorry forgot to tell you that i
> > added &hl.usePhraseHighlighter=true this
> > also , but still no result is coming .
>
> Did you specify field1 in hl.fl parameter?
>
> Plus you need you mark field1 as indexed="true" and stored="true" to
> enable highlighting.
>
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
>


-- 
Thanks & Regards

Rahul Mehta


Re: highlighting on range query

2011-11-24 Thread Rahul Mehta
Any other suggestions?



-- 
Thanks & Regards

Rahul Mehta


inconsistent JVM crash with version 4.0-SNAPSHOT

2011-11-24 Thread Lasse Aagren
Hi,

We are running Solr-Lucene 4.0-SNAPSHOT (1199777M - hudson - 2011-11-09 
14:58:50) on several servers running:

64bit Debian Squeeze (6.0.3)
OpenJDK6 (b18-1.8.9-0.1~squeeze1)
Tomcat 6.0.28 (6.0.28-9+squeeze1)

Some of the servers have 48G RAM, and in that case Java has 16G (-Xmx16g); 
others have 96G RAM, and in that case Java has 48G (-Xmx48G).

We are seeing some inconsistent crashes of tomcat's JVM under different 
Solr/Lucene operations/circumstances. Sadly we can't replicate it. 

It doesn't happen often, but often enough that we can't rely on it in 
production.

When it happens, something like the following appears in the logs:

==
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f6c318d0902, pid=16516, tid=139772378892032
#
# JRE version: 6.0_18-b18
# Java VM: OpenJDK 64-Bit Server VM (14.0-b16 mixed mode linux-amd64 )
# Derivative: IcedTea6 1.8.9
# Distribution: Debian GNU/Linux 6.0.2 (squeeze), package 6b18-1.8.9-0.1~squeeze1
# Problematic frame:
# j  org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(Lorg/apache/lucene/index/IndexReader$AtomicReaderContext;Lorg/apache/lucene/util/Bits;)Lorg/apache/lucene/search/DocIdSet;+193
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid16516.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#
==

Every time it happens the problematic frame is:

Problematic frame:
# j  org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(Lorg/apache/lucene/index/IndexReader$AtomicReaderContext;Lorg/apache/lucene/util/Bits;)Lorg/apache/lucene/search/DocIdSet;+193

And /tmp/hs_err_pid16516.log is attached to this mail.

Has anyone seen this before? 

Please don't hesitate to ask for further specification about our setup.

Best regards,
-- 
Lasse  Aagren
DTU Library
---
Technical University of Denmark
Technical Information Center of Denmark
Anker Engelunds Vej 1
Building 101D
2800 Kgs. Lyngby
Direct +45 45257229
Mobile +45 40516542
l...@dtic.dtu.dk
http://www.dtic.dtu.dk/







Index a null text field

2011-11-24 Thread jawedshamshedi
Hi all,

I am indexing a table that has a field by the name of solr_keywords of type
text in mysql. And it contains null values also. While creating index in
solr, this field is not getting indexed.

Any help will be appreciated. 

Thanks


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-a-null-text-field-tp3533636p3533636.html


solr-user@lucene.apache.org

2011-11-24 Thread Tomasz Wegrzanowski
On 22 November 2011 14:28, Jan Høydahl  wrote:
> Why do you need spaces in the replacement?
>
> Try pattern="\+" replacement="plus" - it will cause the transformed 
> charstream to contain as many tokens as the original and avoid the 
> highlighting crash.

I tried that, it still crashes.

Replacing it with single character, including single non-ASCII
character, doesn't cause a crash.

I'm sort of tempted to just reuse some CJK character, and synonym filter
it to mean "plus".


highlighting performance poor with *.tar, *.gz files

2011-11-24 Thread Shyam Bhaskaran
Hi,

We observe that highlighting of search results takes too much time, 
especially when highlighting terms for archive files like *.gz, *.tar, *.zip.
What could be the reason behind it? Is it because these files are unzipped and 
then highlighted from the index at display time?
Or does it depend on the size of the file? Is there any way to improve the 
search and highlighter performance for these kinds of archive files 
(*.tar, *.zip etc.)?

Let me know if there is any workaround for improving the highlighting and 
search performance for these kinds of files.

-Shyam


Fwd: Clustering and FieldType

2011-11-24 Thread Geetu Ambwani


Begin forwarded message:

> From: Geetu Ambwani 
> Date: November 23, 2011 2:52:38 PM EST
> To: solr-user-i...@lucene.apache.org
> Subject: Clustering and FieldType
> 

> Hi
> Trying to use carrot2 for clustering search results. I have it setup except 
> it seems to treat the field as regular text instead of applying some custom 
> filters I have. 
> 
> So my schema says something like
>  omitNorms="true"/>
>  compressed="true"/>
>  
> ic_text is our internal fieldtype with some custom analysers that strip out 
> certain special characters from the text. 
> 
> My solrconfig has something like this setup in our default search handler. 
> true
> default
> true
> 
> title
> 
> content
> 
> In my search results, I see clusters but the labels on these clusters have 
> the special characters in them - which means that the clustering must be 
> running on raw text and not on the "ic_text" field. 
> Can someone let me know if this is the default setup and if there is a way to 
> fix this ?
> Thanks !
> Geetu
> 


Re: Huge Performance: Solr distributed search

2011-11-24 Thread Artem Lokotosh
>> Can you merge, e.g. 3 shards together or is it much effort for your team?
> Yes, we can merge. We'll try to do this and review how it will works

Merging does not help :(
I've tried merging two shards into one and three shards into one, but the
results are similar to those of the first configuration with 30 shards.
This solution also has one big minus: the optimization process may take
more time.

>> In our setup we currently have 16 shards with ~30GB each, but we rarely
>> search in all of them at once

How many documents per shard do you have in your setup?
Is there any difference between Tomcat, Jetty or others?
Have you configured your servlet container beyond the default configuration?


On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh  wrote:
>> Is this log from the frontend SOLR (aggregator) or from a shard?
> from aggregator
>
>> Can you merge, e.g. 3 shards together or is it much effort for your team?
> Yes, we can merge. We'll try to do this and review how it will works
> Thanks, Dmitry
>
> Any another ideas?
>
> On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan  wrote:
>> Hello,
>>
>> Is this log from the frontend SOLR (aggregator) or from a shard?
>> Can you merge, e.g. 3 shards together or is it much effort for your team?
>>
>> In our setup we currently have 16 shards with ~30GB each, but we rarely
>> search in all of them at once.
>>
>> Best,
>> Dmitry
>>
>> On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh  wrote:
>>
> --
> Best regards,
> Artem Lokotosh        mailto:arco...@gmail.com
>

-- 
Best regards,
Artem Lokotosh        mailto:arco...@gmail.com


Re: Huge Performance: Solr distributed search

2011-11-24 Thread Artem Lokotosh
> How big are the documents you return (how many fields, avg KB per doc, etc.)?

I have the following schema in my solr configuration
27M–30M docs and 12-15 GB for each shard, 0.5KB per doc

> Does performance get much better if you only request top 100, or top
> 10 documents instead of top 1000?

             |    10 |    100 |   1000 |    2000
-------------|-------|--------|--------|--------
MIN          |   124 |    146 |    237 |     747
AVG          |   832 |   4666 |  16130 |   72542
MAX          |  3602 |  30197 |  57339 |  159482
QUERIES/5MIN |    75 |     73 |     49 |      51

> What if you only request a couple fields, instead of fl=*?
> What if you only search 10 shards instead of 30?

Results are similar to the table above; btw I need to receive all fields from
the shards.

Another problem: I use solrmeter or a simple bash script to check the
search speed. I get QTime from 16K to 24K for the first ~20 queries, then from
50K to 100K for the next ~20 queries, until the servlet goes down.
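
To check whether the time grows linearly with the number of rows requested,
the AVG row of the table above can be divided by the row count. This is a
small Python sketch under two assumptions that the message does not state
explicitly: the columns are the number of rows requested, and the values are
QTime in milliseconds.

```python
# AVG figures from the table above, keyed by rows requested.
avg_qtime = {10: 832, 100: 4666, 1000: 16130, 2000: 72542}

def per_row_cost(timings):
    """Return the average time per requested row for each measurement."""
    return {rows: t / rows for rows, t in sorted(timings.items())}

for rows, cost in per_row_cost(avg_qtime).items():
    print("rows=%5d  avg=%6d  per-row=%.2f" % (rows, avg_qtime[rows], cost))
```

With linear growth the per-row figure would stay roughly constant. Here it
falls between 10 and 1000 rows and then more than doubles between 1000 and
2000 rows, so the growth is not linear over the measured range.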

On Wed, Nov 23, 2011 at 5:55 PM, Robert Stewart  wrote:
> If you request 1000 docs from each shard, then aggregator is really
> fetching 30,000 total documents, which then it must merge (re-sort
> results, and take top 1000 to return to client).  Its possible that
> SOLR merging implementation needs optimized, but it does not seem like
> it could be that slow.  How big are the documents you return (how many
> fields, avg KB per doc, etc.)?  I would take a look at network to make
> sure that is not some bottleneck, and also to make sure there is not
> some underlying issue making 30 concurrent HTTP requests from the
> aggregator.  I am not an expert in Java, but under .NET there is a
> setting that limits concurrent out-going HTTP requests from a process
> that must be over-ridden via configuration, otherwise by default is
> very limiting.
>
> Does performance get much better if you only request top 100, or top
> 10 documents instead of top 1000?
>
> What if you only request a couple fields, instead of fl=*?
>
> What if you only search 10 shards instead of 30?
>
> I would collect those numbers and try to determine if time increases
> linearly or not as you increase shards and/or # of docs.
>
>
>
>
>
> On Wed, Nov 23, 2011 at 9:55 AM, Artem Lokotosh  wrote:
>>> If the response time from each shard shows decent figures, then aggregator
>>> seems to be a bottleneck. Do you btw have a lot of concurrent users?
>> For now is not a problem, but we expect from 1K to 10K of concurrent users
>> and maybe more
>> On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan  wrote:
>>> If the response time from each shard shows decent figures, then aggregator
>>> seems to be a bottleneck. Do you btw have a lot of concurrent users?
>>>
>>> On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh  wrote:
>>>
>>>> > Is this log from the frontend SOLR (aggregator) or from a shard?
>>>> from aggregator
>>>>
>>>> > Can you merge, e.g. 3 shards together or is it much effort for your team?
>>>> Yes, we can merge. We'll try to do this and review how it will works
>>>> Thanks, Dmitry
>>>>
>>>> Any another ideas?

>>
>> --
>> Best regards,
>> Artem Lokotosh        mailto:arco...@gmail.com
>>
>



-- 
Best regards,
Artem Lokotosh        mailto:arco...@gmail.com


Attempting to achieve something similar to PostgreSQL's pg_trgm / K-NN combo with Solr

2011-11-24 Thread Matt Patterson
Hello,

I'm working on using trigrams for similarity matching on some data, where 
there's a canonical name and lots of personalised variants, e.g.:

canonical: "My Wonderful Thing"
variant: "My Wonderful Thing (for Matt Patterson)"

Using the pg_trgm 
(http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1#Extensions) index 
type and the K-Nearest-Neighbour operator in Postgres 9.1 I get pretty good 
results, and I want to do something similar using Solr - for one it feels like 
there's a lot more room to tweak and optimise this than with Postgres. Being 
new to Solr, I'm a little unsure about exactly what to do. I've set up a test 
Solr instance using a configuration like this: https://gist.github.com/1391468.

This is working, in as much as it's returning results, but the data set I'm 
working with is somewhat polluted, and even with regular manual cleaning 
probably will always be a bit polluted. So, we have names in the data like:

"My Wonderful Thing"
"My Wonderful Thing (for Somebody Else)"
"My Wonderful Thing (for Yet Another Person)"

I really want the canonical version to be returned first in the results list, 
and the setup I have now is returning results like:

* "My Wonderful Thing (for Somebody Else)"
* "My Wonderful Thing (for Yet Another Person)"
* "My Wonderful Thing"
* "Other name with Wonderful or Thing in it"

With the Postgres pg_trgm index and <-> K-NN operator I get results like

* "My Wonderful Thing"
* "My Wonderful Thing (for Somebody Else)"
* "My Wonderful Thing (for Yet Another Person)"
* "Other name with Wonderful or Thing in it"

Which is better, and I guess the difference is to do with the way that the 
distance between search term and results are calculated. 

So, is there something I can do to change the way ranking is calculated? Also, 
is there a good place to start reading about this kind of similarity searching 
and Solr?  Everything I've looked at so far seems to cover this kind of n-gram 
approach very lightly at best.

Thanks,

Matt
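
For reference, the kind of distance calculation described above can be
sketched in a few lines of Python. This is a simplified approximation of
pg_trgm-style similarity (shared trigrams divided by all distinct trigrams,
with each word padded by two leading spaces and one trailing space), not the
actual PostgreSQL implementation; the function names are made up for this
example.

```python
import re

def trigrams(text):
    """Return the set of trigrams of `text`, padding each word with two
    leading spaces and one trailing space (roughly as pg_trgm does)."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

query = "My Wonderful Thing"
names = [
    "My Wonderful Thing (for Somebody Else)",
    "My Wonderful Thing (for Yet Another Person)",
    "My Wonderful Thing",
]
# Rank the names by similarity to the query, best match first.
for name in sorted(names, key=lambda n: similarity(query, n), reverse=True):
    print("%.2f  %s" % (similarity(query, name), name))
```

Because the score divides by the union of trigrams, extra text in a variant
lowers its score, so the exact name ranks first. A ranking that only rewards
matching trigrams, without a comparable penalty for the unmatched ones, would
have no reason to prefer the shorter name.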

RE: Index a null text field

2011-11-24 Thread Young, Cody
Hello,

We'll need more information please. How are you indexing the documents?
DataImportHandler? Xml Updates?

Can you show us the relevant parts of your schema? (Field definition and
data type for the field)

Are you getting any error messages in the log files?

Tell us more about your environment. Windows? Linux?

Thanks,
Cody



Re: WordDelimiterFilter MultiPhraseQuery case insesitive Issue

2011-11-24 Thread Uomesh
Hi,

I tried with preserveOriginal="1" and reindexed too, but still no result.

Thanks,
Umesh

On Wed, Nov 23, 2011 at 5:33 PM, Shawn Heisey-4 [via Lucene] <
ml-node+s472066n3532405...@n3.nabble.com> wrote:

> On 11/23/2011 2:54 PM, Uomesh wrote:
>
> > Hi,
> >
> > case insensitive search is not working if I use WordDelimiterFilter
> > splitOnCaseChange="1"
> >
> > I am searching for the word norton and here is the result
> >
> > norton: returns results
> > Norton: returns results
> > but
> > nOrton: no results
> >
> > I want nOrton to return results too. Please help. Below is my field type.
>
> Try adding preserveOriginal="1" to your WDF options.  You may not need
> to actually reindex before you see results, but it would be a good idea
> to reindex.  This will result in an increase in your index size.
>
> Thanks,
> Shawn
>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/WordDelimiterFilter-MultiPhraseQuery-case-insesitive-Issue-tp3532209p3534518.html
Sent from the Solr - User mailing list archive at Nabble.com.
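
A rough sketch of why nOrton may behave differently from norton and Norton. It assumes only the lowercase-to-uppercase split rule that splitOnCaseChange="1" enables; the real WordDelimiterFilter has more rules, and the poster's field type is not shown in the thread, so the function name and the lowercasing step are illustrative assumptions.

```python
import re

def split_on_case_change(token):
    """Split a token wherever a lowercase letter is followed by an uppercase one.

    Simplified stand-in for the splitOnCaseChange rule only; this is not the
    actual WordDelimiterFilter implementation.
    """
    # Insert a space at each lower->upper boundary, then split on whitespace.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()

def analyze(token):
    # Assumed analysis chain: split on case change, then lowercase each part.
    return [part.lower() for part in split_on_case_change(token)]

norton_parts = analyze("norton")
capital_parts = analyze("Norton")
mixed_parts = analyze("nOrton")
```

Under these assumptions the query nOrton becomes two tokens ("n" followed by "orton"), which would not match a document indexed as the single token "norton", while the other two spellings stay as one token.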

solrQueryParser defaultOperator

2011-11-24 Thread toto
Hi,
I installed Apache Solr and integrated it with a Drupal website. Everything
works perfectly. The default search operator is OR, so I changed it in my
schema.xml as follows:



But it does not seem to work. For example, when I search for "bakery california",
Solr returns all results that contain "bakery" OR "california". 

Is there any way to fix it?

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrQueryParser-defaultOperator-tp3534984p3534984.html
Sent from the Solr - User mailing list archive at Nabble.com.
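
One way to check whether the operator setting is being picked up at all is to set it per request with the q.op parameter, which the standard (lucene) query parser accepts. The sketch below only builds such a URL; the base URL and the helper name are assumptions, and if the Drupal integration sends its queries through the dismax parser, the mm (minimum should match) parameter is probably what controls this instead.

```python
from urllib.parse import urlencode

def build_select_url(base_url, user_query, require_all_terms=True):
    """Build a Solr /select URL, optionally forcing AND between terms.

    base_url is a hypothetical Solr location; adjust it to your install.
    """
    params = {"q": user_query}
    if require_all_terms:
        # q.op overrides the default operator for this request only.
        params["q.op"] = "AND"
    return base_url + "/select?" + urlencode(params)

url = build_select_url("http://localhost:8983/solr", "bakery california")
```

If the result count drops with q.op=AND but not with the schema.xml change alone, the schema setting is likely not reaching the parser that handles the request.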


remove answers with identical scores

2011-11-24 Thread Fred Zimmerman
I have a corpus that has a lot of identical or nearly identical documents.
I'd like to return only the unique ones (excluding the "nearly identical"
which are redirects).  I notice that all the identical/nearly identicals
have identical Solr scores. How can I tell Solr to throw out all the
successive documents in an answer set that have identical scores?

doc 1 score 5.0
doc 2  score 5.0
doc 3 score 5.0
doc 4 score 4.9

skip docs 2 and 3

bring back 10 docs with unique scores
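
The behaviour described above can at least be sketched on the client side. This is a minimal illustration, not a Solr feature: it assumes the results arrive as a list of dicts with a "score" key, already sorted by score, and the function name is made up.

```python
def drop_repeated_scores(docs, limit=10):
    """Keep the first doc of each run of identical scores, up to `limit` docs."""
    kept = []
    last_score = None
    for doc in docs:
        if doc["score"] == last_score:
            # Same score as the previous doc: treat it as a duplicate and skip.
            continue
        kept.append(doc)
        last_score = doc["score"]
        if len(kept) == limit:
            break
    return kept

results = [
    {"id": "doc1", "score": 5.0},
    {"id": "doc2", "score": 5.0},
    {"id": "doc3", "score": 5.0},
    {"id": "doc4", "score": 4.9},
]
unique = drop_repeated_scores(results)
```

Note that the client has to request more than 10 rows to end up with 10 unique scores, and that two unrelated documents can share a score by coincidence.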


Re: Huge Performance: Solr distributed search

2011-11-24 Thread Mark Miller
On Thu, Nov 24, 2011 at 12:09 PM, Artem Lokotosh wrote:

> > How big are the documents you return (how many fields, avg KB per doc,
> > etc.)?
> I have the following schema in my Solr configuration:
> <field name="field1" type="text" indexed="true" stored="false"/>
> <field name="field2" type="text" indexed="true" stored="true"/>
> <field name="field3" type="text" indexed="true" stored="true"/>
> <field name="field4" type="tlong" indexed="true" stored="true"/>
> <field name="field5" type="tdate" indexed="true" stored="true"/>
> <field name="field6" type="text" indexed="true" stored="true"/>
> <field name="field7" type="text" indexed="true" stored="true"/>
> <field name="field8" type="tlong" indexed="true" stored="true"/>
> <field name="field9" type="text" indexed="true" stored="true"/>
> <field name="field10" type="tdate" indexed="true" stored="true"/>
> <field name="field11" type="text" indexed="true" stored="true"/>
> <field name="id" type="string" indexed="true" stored="true" required="true"/>
> 27M–30M docs and 12-15 GB for each shard, 0.5KB per doc
> > Does performance get much better if you only request top 100, or top 10
> > documents instead of top 1000?
>              |    10 |    100 |   1000 |   2000
> -------------|-------|--------|--------|--------
> MIN          |   124 |    146 |    237 |    747
> AVG          |   832 |   4666 |  16130 |  72542
> MAX          |  3602 |  30197 |  57339 | 159482
> QUERIES/5MIN |    75 |     73 |     49 |     51
> > What if you only request a couple fields, instead of fl=*?
> > What if you only search 10 shards instead of 30?
> Results are similar to the table above. By the way, I need to receive all
> fields from the shards.
> Another problem: I use solrmeter or a simple bash script to check the
> search speed. I get QTime from 16K to 24K for the first ~20 queries, then
> from 50K to 100K for the next ~20 queries, and so on until the servlet goes down.
> On Wed, Nov 23, 2011 at 5:55 PM, Robert Stewart wrote:
> > If you request 1000 docs from each shard, then aggregator is really
> > fetching 30,000 total documents, which then it must merge (re-sort
> > results, and take top 1000 to return to client).  It's possible that
> > Solr's merging implementation needs to be optimized, but it does not seem like
> > it could be that slow.  How big are the documents you return (how many
> > fields, avg KB per doc, etc.)?  I would take a look at network to make
> > sure that is not some bottleneck, and also to make sure there is not
> > some underlying issue making 30 concurrent HTTP requests from the
> > aggregator.  I am not an expert in Java, but under .NET there is a
> > setting that limits concurrent out-going HTTP requests from a process
> > that must be over-ridden via configuration, otherwise by default is
> > very limiting.
> >
> > Does performance get much better if you only request top 100, or top
> > 10 documents instead of top 1000?
> >
> > What if you only request a couple fields, instead of fl=*?
> >
> > What if you only search 10 shards instead of 30?
> >
> > I would collect those numbers and try to determine if time increases
> > linearly or not as you increase shards and/or # of docs.
> >
> >
> >
> >
> >
> > On Wed, Nov 23, 2011 at 9:55 AM, Artem Lokotosh wrote:
> >>> If the response time from each shard shows decent figures, then
> >>> aggregator seems to be a bottleneck. Do you btw have a lot of concurrent
> >>> users?
> >> For now it is not a problem, but we expect from 1K to 10K concurrent
> >> users and maybe more.
> >>
> >> On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan wrote:
> >>> If the response time from each shard shows decent figures, then
> >>> aggregator seems to be a bottleneck. Do you btw have a lot of concurrent
> >>> users?
> >>>
> >>> On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh wrote:
> >>>
> >>>> > Is this log from the frontend SOLR (aggregator) or from a shard?
> >>>> from aggregator
> >>>>
> >>>> > Can you merge, e.g. 3 shards together or is it much effort for your
> >>>> > team?
> >>>> Yes, we can merge. We'll try to do this and review how it works.
> >>>> Thanks, Dmitry
> >>>>
> >>>> Any other ideas?
> >>>>
> >>
> >> --
> >> Best regards,
> >> Artem Lokotosh mailto:arco...@gmail.com
> >>
> >
>
>
>
> --
> Best regards,
> Artem Lokotosh mailto:arco...@gmail.com
>


When you search each shard, are you positive that you are using all of the
same parameters? You are sure you are hitting request handlers that are
configured exactly the same and sending exactly the same queries?

In my experience, the overhead for distrib search is usually very low.

What types of queries are you trying?

-- 
- Mark

http://www.lucidimagination.com
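
For reference, the aggregator step Robert describes in the quoted text (fetch the top N from every shard, re-sort, keep the top N) can be sketched roughly as below. This is not Solr's actual merge code; hits are modelled as (score, doc_id) tuples and the function name is made up for illustration.

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists (each sorted by descending score) and keep
    the overall top `rows` hits."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))

shard_a = [(9.1, "a1"), (7.4, "a2"), (2.0, "a3")]
shard_b = [(8.3, "b1"), (7.9, "b2")]
shard_c = [(9.5, "c1"), (1.2, "c2")]
top = merge_shard_results([shard_a, shard_b, shard_c], rows=4)

# With 30 shards and rows=1000 the aggregator receives up to
# 30 * 1000 = 30,000 hits and discards all but 1,000 of them.
fetched = 30 * 1000
```

The merge itself is cheap compared with moving 30,000 hits over the network, which fits Robert's suggestion to look at the network and at the concurrent requests to the shards.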


RE: Index a null text field

2011-11-24 Thread jawedshamshedi
Hi Cody,

Thanks for the reply.

Please find the details of what I am doing below. 

Yes, I am using the DataImportHandler, and the relevant snippet from
solrconfig.xml is given below.



data-config.xml



The data-config.xml is given below.




 












schema.xml


 


   
 
 
 
  
 
  

 

 
 un_id

 
 ST_Name

The data types in MySQL are given below.

keyword         text
start_bidprice  float(12,2)
end_date        datetime
start_bidprice  float(12,2)
start_date      datetime


For some fields that are simple floats, the index is being created. I also
added zeroDateTimeBehavior=convertToNull to the url in data-config.xml, but
to no avail.

Please help. Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-a-null-text-field-tp3533636p3535376.html
Sent from the Solr - User mailing list archive at Nabble.com.


server down caused by complex query

2011-11-24 Thread Jason, Kim
Hi all

Lately our Solr server goes down frequently, because our users send very
long and complex queries with asterisk and near operators.
Sometimes the near operator exceeds 1,000 and almost all keywords include an
asterisk.
When such a query is sent to the server, the JVM memory fills up (we allocate
110G to the JVM).
After that, the server is effectively down.

We also have an old version of the k2 engine.
k2 does not go down for the same query.
k2 uses more I/O than memory.

Can we control Solr's memory usage?
Or is there any other solution?
(We are using Solr 1.4.)

Thanks in advance.
Jason

--
View this message in context: 
http://lucene.472066.n3.nabble.com/server-down-caused-by-complex-query-tp3535506p3535506.html
Sent from the Solr - User mailing list archive at Nabble.com.