Re: Solr - Index Concurrency - Is it possible to have multiple threads write to same index?

2012-08-23 Thread Mikhail Khludnev
I know the following drawbacks of EmbeddedSolrServer:

   - org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(),
   which is called when handling an update request, creates a lot of garbage
   in memory and bloats the heap with expensive XML.
   - org.apache.solr.response.BinaryResponseWriter.getParsedResponse(SolrQueryRequest,
   SolrQueryResponse) does something similar on the response side; it just
   bloats your heap.

To me, your task is covered by Multiple Cores. Anyway, if you are OK with
EmbeddedSolrServer, let it be. Just be aware of the stream updates feature:
http://wiki.apache.org/solr/ContentStream

My average indexing speed estimate assumes fairly small docs, less than 1K
(the kind always used for micro-benchmarking).

Heavy analysis is the key argument for invoking updates in multiple threads.
What are your CPU stats during indexing?
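A minimal JDK-only sketch of the multi-threaded update pattern under discussion; `indexDoc` is a stand-in for a real `SolrServer.add(...)` call (SolrJ is omitted so the sketch stays self-contained), and the thread count simply matches the available cores. Whether concurrent adds are safe should be verified for your Solr version, though the update handler is generally designed for them:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexSketch {
    static final AtomicInteger indexed = new AtomicInteger();

    // Stand-in for solrServer.add(doc); analysis-heavy indexing parallelizes well.
    static void indexDoc(int docId) {
        indexed.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        int docs = 1000;
        // One worker per core: indexing with heavy analysis is CPU-bound.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < docs; i++) {
            final int id = i;
            pool.submit(() -> indexDoc(id));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("indexed " + indexed.get() + " docs");
    }
}
```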




On Thu, Aug 23, 2012 at 7:52 AM, ksu wildcats wrote:

> Thanks for the reply Mikhail.
>
> For our needs, speed is more important than flexibility, and we have huge
> text files (e.g. blogs/articles of ~2 MB each) that need to be read from our
> filesystem and then stored into the index.
>
> We have our app creating a separate core per client (dynamically), and
> there is one instance of EmbeddedSolrServer for each core that's used for
> adding documents to the index.
> Each document has about 10 fields and one of the field has ~2MB data stored
> (stored = true, analyzed=true).
> Also we have logic built into our webapp to dynamically create the solr
> config files
> (solrConfig & schema per core - filters/analyzers/handler values can be
> different for each core)
> for each core before creating an instance of EmbeddedSolrServer for that
> core.
> Another reason to go with EmbeddedSolrServer is to reduce overhead of
> transporting large data (~2 MB) over http/xml.
>
> We use this setup for building our master index which then gets replicated
> to slave servers
> using replication scripts provided by solr.
> We also have solr admin ui integrated into our webapp (using admin jsp &
> handlers from solradmin ui)
>
> We have been using this MultiCore setup for more than a year now and so far
> we haven't run into any issues with EmbeddedSolrServer integrated into our
> webapp.
> However I am now trying to figure out the impact if we allow multiple
> threads sending request to EmbeddedSolrServer (same core) for adding docs
> to
> index simultaneously.
>
> Our understanding was that EmbeddedSolrServer would give us better
> performance than HTTP Solr for our needs.
> It's quite possible that we are wrong and HTTP Solr would have given us
> similar or better performance.
>
> Also based on documentation from SolrWiki I am assuming that
> EmbeddedSolrServer API is same as the one used by Http Solr.
>
> That said, can you please tell us if there is any specific downside to
> using EmbeddedSolrServer that could cause issues for us down the line.
>
> I am also interested in your comment below about indexing 1 million docs
> in a few minutes. Ideally we would like to get to that speed.
> I am assuming this depends on the size of the doc and the type of
> analyzer/tokenizer/filters being used. Correct?
> Can you please share (or point me to documentation on) how to get this
> speed for 1 million docs.
> >>  - one million is a fairly small amount; on average it should be indexed
> >> in a few minutes. I doubt that you really need to distribute indexing
>
> Thanks
> -K
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Index-Concurrency-Is-it-possible-to-have-multiple-threads-write-to-same-index-tp4002544p4002776.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Querying top n of each facet value

2012-08-23 Thread Tanguy Moal
Hello Kiran,

I think you can try turning grouping on and asking Solr to group on the
"Category" field.

Nevertheless, this will *not* ensure that groups are returned in
facet-count order, nor will it ensure the mincount per group.

Hope this helps,

--
Tanguy

2012/8/23 Kiran Jayakumar 

> Hi everyone,
>
> I am building an auto complete feature, which facets by a field called
> "Category". I want to return a minimum number of documents per facet (say
> min=1 & max=5).
>
> The facet output is something like
>
> Category
> A: 500
> B: 10
> C: 2
>
> By default, it is returning 10 documents of category A.
>
> I want it to return a total of 10 documents, with at least 1 document for
> each facet value. Is it possible to accomplish that with a single query?
>
> Thanks
>
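Concretely, the grouped request described above might look like this (parameter values are illustrative; group.limit caps the documents returned per Category group):

```
/select?q=<user input>&group=true&group.field=Category&group.limit=1&rows=10
```

As noted, group ordering follows the query's sort rather than facet counts, and there is no per-group mincount guarantee.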


Can't extract Outlook message files

2012-08-23 Thread Alexander Cougarman
Hi. We're trying to use the following curl command to perform an "extract
only" of *.MSG files, but it blows up:

   curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F
"myfile=@92.msg"

If we do this, it works fine:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true"
-F "myfile=@92.msg"

We've tried a variety of MSG files and they all produce the same error; they 
all have content in them. What are we doing wrong?

Here's the exception the extractOnly=true command generates:




Error 500 null

org.apache.solr.common.SolrException
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@aaf063
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
... 23 more
Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 26 more


HTTP ERROR 500
Problem accessing /solr/update/extract. Reason:
null

org.apache.solr.common.SolrException
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(Requ

Re: javabin binary format specification

2012-08-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is no spec documented anywhere. It is all in this single file:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java

On Wed, Jul 25, 2012 at 6:47 PM, Ahmet Arslan  wrote:

> > Sorry, but I could not find any spec on the binary format
> > SolrJ is
> > using. Can you point me to an URL if any?
>
> Maybe this?
> https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/BinaryResponseWriter.java
>



-- 
-
Noble Paul


in solr4.0 where to set dataimport.properties

2012-08-23 Thread rayvicky
I cannot find where to set dataimport.properties in Solr 4.0 beta.
Can anyone help me?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/in-solr4-0-where-to-set-dataimport-properties-tp4002766.html
Sent from the Solr - User mailing list archive at Nabble.com.


where to set dataimport.properties in solr4.0 beta

2012-08-23 Thread rayvicky
dataimport.properties :
interval=1
port=
server=localhost
doc.id=
params=/select?qt\=/dataimport&command\=delta-import&clean\=false&commit\=true
webapp=solr
syncEnabled=1
last_index_time=2012-06-27 13\:05\:18
doc.last_index_time=2012-06-27 13\:05\:18
syncCores=

How do I set these in Solr 4.0? Please be specific. Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/where-to-set-dataimport-properties-in-solr4-0-beta-tp4002767.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search is slow for URL fields of type String.

2012-08-23 Thread Karthick Duraisamy Soundararaj
Srini,
    What's the size of your index? You say that searching on a 'string'
fieldType takes 400 milliseconds, but did you try searching on any
fieldType other than string? If so, how much time did it take?

On Wed, Aug 22, 2012 at 10:35 AM, srinalluri  wrote:

> This is string fieldType:
>
>  />
>
> These are the fields using the 'string' fieldType:
>
>multiValued="true" />
>multiValued="true" />
>
> And this the sample query:
> /select/?q=url:http\://
> www.foxbusiness.com/personal-finance/2012/08/10/social-change-coming-from-gas-prices-to-rent-prices-and-beyond/
> AND image_url:*
>
> Each query like this is taking around 400 milliseconds. What changes can I
> make to the fieldType to improve query performance?
>
> thanks
> Srini
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/search-is-slow-for-URL-fields-of-type-String-tp4002662.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-23 Thread Erick Erickson
First, I'm no Tomcat expert, but here's the Tomcat Solr
page; you've probably already seen it:
http://wiki.apache.org/solr/SolrTomcat

But I'm guessing that you may have old jars around
somewhere and things are getting confused. I'd
blow away the whole thing and start over; whenever
I start copying jars around, I always lose track of
what's where.

Have you successfully had any other Solr operate
under Tomcat?

Sorry I can't be more help
Erick

On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri
 wrote:
> Hi,
>
> I tried to start the solr-4.0.0-BETA with tomcat-6.0.20 but does not work.
> I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps. Then I 
> copied the directory apache-solr-4.0.0-BETA\example\solr to 
> C:\home\solr-4.0-beta and adjusted the file 
> $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to point the 
> solr/home to C:/home/solr-4.0-beta. With this configuration, when I startup 
> tomcat I got:
>
> SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion 
> 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, 
> LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33, LUCENE_34, 
> LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in format 'VV'
>
> So I changed the line in solrconfig.xml:
>
> LUCENE_40
>
> to
>
> LUCENE_CURRENT
>
> So I got a new error:
>
> Caused by: java.lang.ClassNotFoundException: solr.NRTCachingDirectoryFactory
>
> This class is within the file apache-solr-core-4.0.0-BETA.jar but for some 
> reason classloader of the class is not loaded. I then moved all jars in 
> $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to $TOMCAT_HOME\lib.
> After this setup, I got a new error:
>
> SEVERE: java.lang.ClassCastException: 
> org.apache.solr.core.NRTCachingDirectoryFactory can not be cast to 
> org.apache.solr.core.DirectoryFactory
>
> So I changed the line in solrconfig.xml:
>
>  
> class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>
> to
>
>  
> class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>
>
> So I got a new error:
>
> Caused by: java.lang.ClassCastException: 
> org.apache.solr.spelling.DirectSolrSpellChecker can not be cast to 
> org.apache.solr.spelling.SolrSpellChecker
>
> How can I resolve the problem of classloader?
> How can I resolve the problem of cast of NRTCachingDirectoryFactory and 
> DirectSolrSpellChecker?
> I can not startup the solr 4.0 beta with tomcat.
> Thanks,
>
>
>
>


Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-23 Thread Karthick Duraisamy Soundararaj
Not sure if this can help, but once I had a similar problem with Solr 3.6.0
where Tomcat refused to find a class that existed. I deleted Tomcat's
webapp directory and then it worked fine.

On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson wrote:

> First, I'm no Tomcat expert, but here's the Tomcat Solr
> page; you've probably already seen it:
> http://wiki.apache.org/solr/SolrTomcat
>
> But I'm guessing that you may have old jars around
> somewhere and things are getting confused. I'd
> blow away the whole thing and start over, whenever
> I start copying jars around I always lose track of
> what's where.
>
> Have you successfully had any other Solr operate
> under Tomcat?
>
> Sorry I can't be more help
> Erick
>
> On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri
>  wrote:
> > Hi,
> >
> > I tried to start the solr-4.0.0-BETA with tomcat-6.0.20 but does not
> work.
> > I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps. Then I
> copied the directory apache-solr-4.0.0-BETA\example\solr to
> C:\home\solr-4.0-beta and adjusted the file
> $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to point
> the solr/home to C:/home/solr-4.0-beta. With this configuration, when I
> startup tomcat I got:
> >
> > SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion
> 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23,
> LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33,
> LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in format 'VV'
> >
> > So I changed the line in solrconfig.xml:
> >
> > LUCENE_40
> >
> > to
> >
> > LUCENE_CURRENT
> >
> > So I got a new error:
> >
> > Caused by: java.lang.ClassNotFoundException:
> solr.NRTCachingDirectoryFactory
> >
> > This class is within the file apache-solr-core-4.0.0-BETA.jar but for
> some reason classloader of the class is not loaded. I then moved all jars
> in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to
> $TOMCAT_HOME\lib.
> > After this setup, I got a new error:
> >
> > SEVERE: java.lang.ClassCastException:
> org.apache.solr.core.NRTCachingDirectoryFactory can not be cast to
> org.apache.solr.core.DirectoryFactory
> >
> > So I changed the line in solrconfig.xml:
> >
> >  >
> class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >
> > to
> >
> >  >
> class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>
> >
> > So I got a new error:
> >
> > Caused by: java.lang.ClassCastException:
> org.apache.solr.spelling.DirectSolrSpellChecker can not be cast to
> org.apache.solr.spelling.SolrSpellChecker
> >
> > How can I resolve the problem of classloader?
> > How can I resolve the problem of cast of NRTCachingDirectoryFactory and
> DirectSolrSpellChecker?
> > I can not startup the solr 4.0 beta with tomcat.
> > Thanks,
> >
> >
> >
> >
>



-- 
--
Karthick D S
Master's in Computer Engineering ( Software Track )
Syracuse University
Syracuse - 13210
New York
United States of America


Re: Edismax parser weird behavior

2012-08-23 Thread Erick Erickson
What do you get when you specify &debugQuery=on (&debug=query in 4.x)?
In other words, what does the parsed query look like?

Best
Erick
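For reference, the debug request would be along these lines (query value illustrative):

```
/select?q=NOT auto&debugQuery=on
```

The parsedquery entry in the debug section then shows exactly how edismax expanded the input.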

On Wed, Aug 22, 2012 at 8:13 AM, amitesh116  wrote:
> Hi, I am experiencing 2 strange behaviors in edismax:
> edismax is configured to default to OR (using mm=0)
> In total there are 700 results
> 1. Search for *auto* = *50 results*
>    Search for *NOT auto* gives *651 results*.
> Mathematically, it should give only 650 results for *NOT auto*.
>
> 2. Search for *auto* = *50 results*
>    Search for *car* = *100 results*
> Search for *auto and car* = *10 results*
> Since we have set mm=0, it should behave like OR, and the results for auto
> and car should be at least 100
>
> Please help me understand these two issues. Is this normal behavior? Do I
> need to tweak the query? Or do I need to look into the config or schema xml
> files.
>
> Thanks in Advance
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Edismax-parser-weird-behavior-tp4002626.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Boosting documents matching in a specific shard

2012-08-23 Thread Husain, Yavar
I am aware that IDF is not distributed. Suppose I have to boost, or give a
higher rank to, documents matching in a specific shard; how can I
accomplish that?



Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-23 Thread Erick Erickson
Tom:

I think my comments were that grouping on a field where there is
a unique value _per document_ chewed up a lot of resources.
Conceptually, there's a bucket for each unique group value, and
grouping on a file path is just asking for trouble.

But the memory used for grouping should max out as a function of
the number of unique values in the grouped field.

Best
Erick

On Wed, Aug 22, 2012 at 11:32 PM, Lance Norskog  wrote:
> Yes, distributed grouping works, but grouping takes a lot of
> resources. If you can avoid it in distributed mode, so much the better.
>
> On Wed, Aug 22, 2012 at 3:35 PM, Tom Burton-West  wrote:
>> Thanks Tirthankar,
>>
>> So the issue is memory use for sorting. I'm not sure I understand how
>> sorting of grouping fields is involved with the defaults and field
>> collapsing, since the default sorts by relevance, not by grouping field. On
>> the other hand, I don't know much about how field collapsing is implemented.
>>
>> So far the few tests I've made haven't revealed any memory problems.  We
>> are using very small string fields for grouping and I think that we
>> probably only have a couple of cases where we are grouping more than a few
>> thousand docs.   I will try to find a query with a lot of docs per group
>> and take a look at the memory use using JConsole.
>>
>> Tom
>>
>>
>> On Wed, Aug 22, 2012 at 4:02 PM, Tirthankar Chatterjee <
>> tchatter...@commvault.com> wrote:
>>
>>>  Hi Tom,
>>>
>>> We had an issue where we are keeping millions of docs in a single node and
>>> we were trying to group them on a string field which is nothing but full
>>> file path… that caused SOLR to go out of memory…
>>>
>>>
>>> Erick has explained nicely in the thread as to why it won’t work and I had
>>> to find another way of architecting it. 
>>>
>>>
>>> How do you think this is different in your case. If you want to group by a
>>> string field with thousands of similar entries I am guessing you will face
>>> the same issue. 
>>>
>>>
>>> Thanks,
>>>
>>> Tirthankar
>>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com


Re: search is slow for URL fields of type String.

2012-08-23 Thread Erick Erickson
There was just a thread on this; it may be your
&image_url:*

try removing this clause just to test response time. If
that shows a vast improvement, try adding a boolean
field has_image_url, and then add an fq clause like
&fq=has_image_url:true

Best
Erick

On Wed, Aug 22, 2012 at 10:35 AM, srinalluri  wrote:
> This is string fieldType:
>
> 
>
> These are the filelds using 'string' fieldType:
>
>multiValued="true" />
>multiValued="true" />
>
> And this the sample query:
> /select/?q=url:http\://www.foxbusiness.com/personal-finance/2012/08/10/social-change-coming-from-gas-prices-to-rent-prices-and-beyond/
> AND image_url:*
>
> Each query like this is taking around 400 milliseconds. What changes can I
> make to the fieldType to improve query performance?
>
> thanks
> Srini
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/search-is-slow-for-URL-fields-of-type-String-tp4002662.html
> Sent from the Solr - User mailing list archive at Nabble.com.
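A sketch of the boolean-field workaround described above, assuming the schema.xml conventions of the Solr 3.x/4.x examples (the field name is illustrative); it would be populated at index time whenever image_url is non-empty:

```xml
<!-- illustrative; added alongside the existing url/image_url fields -->
<field name="has_image_url" type="boolean" indexed="true" stored="false"/>
```

Queries then replace the open-ended image_url:* wildcard with &fq=has_image_url:true, which is cached in the filterCache after its first use.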


Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-23 Thread Erick Erickson
Maybe you can spoof this by using an "fq" clause instead,
such as &fq=body:*?

The first one will be slow, but after that it'll use the filterCache.

FWIW,
Erick

On Wed, Aug 22, 2012 at 4:51 PM, david3s  wrote:
> Ok, I'll take your suggestion, but I would still be really happy if the
> wildcard searches behaved a little more intelligently (body:* not looking for
> everything in the body). More like when you do "q=*:*" it doesn't really
> search for everything in every field.
>
> Thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002743.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Query regarding multi core search

2012-08-23 Thread ravicv
Hi,

How is sorting done in Solr with multiple cores, say 20 cores? In a
multi-core search it should search in all cores and then sort on the complete
results. Please correct me if I am wrong.

In our scenario we are executing the same query on 4 cores and finally sorting
the results based on one field. It works well. But I want to implement
something similar within Solr. Can anyone suggest some code or a blog
about this?

I have tried some approaches, but they take more memory :(

Thanks,
Ravi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-regarding-multi-core-search-tp4002847.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search is slow for URL fields of type String.

2012-08-23 Thread Jack Krupansky
And we should probably add a doc note with this same advice since it is an 
easy "mistake" to make.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Thursday, August 23, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: search is slow for URL fields of type String.

There was just a thread on this; it may be your
&image_url:*

try removing this clause just to test response time. If
that shows a vast improvement, try adding a boolean
field has_image_url, and then add an fq clause like
&fq=has_image_url:true

Best
Erick

On Wed, Aug 22, 2012 at 10:35 AM, srinalluri  wrote:

This is string fieldType:

/>


These are the filelds using 'string' fieldType:

  
  

And this the sample query:
/select/?q=url:http\://www.foxbusiness.com/personal-finance/2012/08/10/social-change-coming-from-gas-prices-to-rent-prices-and-beyond/
AND image_url:*

Each query like this is taking around 400 milliseconds. What changes can I
make to the fieldType to improve query performance?

thanks
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-is-slow-for-URL-fields-of-type-String-tp4002662.html
Sent from the Solr - User mailing list archive at Nabble.com. 




Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-23 Thread lboutros
You could add a default value to your field via the schema:



and then your query could be :

-body:mynuvalue

but I prefer Chris's solution, which is what I usually do.

Ludovic.
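The field definition Ludovic refers to might look roughly like this (field and sentinel names are illustrative, not from the original message):

```xml
<field name="body" type="text" indexed="true" stored="true" default="mynuvalue"/>
```

Documents indexed without a body then receive the sentinel value, so the query -body:mynuvalue matches exactly the documents that have real body content.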







-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002872.html
Sent from the Solr - User mailing list archive at Nabble.com.


Porting Lucene Index to Solr: ERROR:SCHEMA-INDEX-MISMATCH

2012-08-23 Thread Petra Lehrmann

Hello all!

I already posted this question to Stackoverflow 
(http://stackoverflow.com/questions/12027451/solr-net-query-argumentexception-in-windows-forms) 
but as is, either my question is too specific or just too trivial. I 
don't know. But maybe I'm just trying to put the cart before the horse.


I have a C# Windows Forms application up and running. It uses the
Lucene.Net library, with which I created a Lucene index (off of a
Postgres database). There are some articles which have more than one
value, so I decided to take numeric fields into account and used them in
my application as:


var valueField = new NumericField(internalname, Field.Store.YES, true);
valueField.SetDoubleValue(value);
doc.Add(valueField);

I can open my Lucene index in Luke and see all those nice fields I
made, so there should be no problem with the index; plus, my application
searches and displays the result sets of the Lucene index quite fine.

So I thought about trying Solr and read that I could use the Lucene 
index at hand - I just had to edit the schema.xml file from Solr, which 
I did. For the numeric field variables of the Lucene index I read 
somewhere that I have to use TrieFields, so I updated my schema.xml as 
follows:


[...]

[...]

For those fields, which use the numeric field, I had Solr's TrieDoubleField
in mind and changed them. Firing up Solr on Tomcat and hitting it with a
search query such as "LampCount:1" returned all the right documents. But
the XML output always says:

ERROR:SCHEMA-INDEX-MISMATCH,stringValue=1


This could be the reason why my C# application is not running properly
(using the solrnet library as the bridge between the Solr instance and the
application) and always throws an ArgumentException when hitting my
solrnet implementation with:

var results = solr.Query("LampCount:1");

But first things first: I'm not sure why there is this index mismatch
and how to solve it. Maybe I just didn't understand the explanation of
TrieFields or the port from NumericFields?


Any help would be greatly appreciated. :)

Greetings from Germany,

Petra







Error java.lang.NoSuchFieldError: rsp when using jteam spatial search module

2012-08-23 Thread leberknecht
Hi guys, I'm getting an error when using the GeoDistance component from
jteam (http://info.orange11.nl) when firing a query with a spatial tag:

error:
~~~
SEVERE: java.lang.NoSuchFieldError: rsp
at
nl.jteam.search.solrext.spatial.GeoDistanceComponent.process(GeoDistanceComponent.java:64)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
[...] 
(for full stacktrace see http://pastebin.com/rqMYZDb5)

solrconfig.xml:
~

lat
lng


As it seems, the GeoDistance component is loaded correctly and its "process"
method is called, but the passed ResponseBuilder object is missing the rsp
field. According to the ResponseBuilder.java source, the rsp member is
defined, so I'm wondering what the heck is going on here... (I'm not very
experienced with Java; I thought that either the passed param is of the wrong
type, or it should have the members defined in the class...)
I googled and found some other dudes with the same problem, but I didn't
find the solution.
Solr 3.6.1, jteam spatial module 2.0RC4, GeoDistance process-method:

public class GeoDistanceComponent extends SearchComponent {
[...]
public void process(ResponseBuilder responseBuilder) throws IOException {
[...]
Map idsByDocument = new HashMap();
SolrDocumentList documentList = SolrUtil.docListToSolrDocumentList(
responseBuilder.getResults().docList,
responseBuilder.req.getSearcher(),
responseBuilder.rsp.getReturnFields(),
idsByDocument);

This is the line that produces the error. I get the same error on Tomcat 7
and Jetty.

Can anyone help / give a hint? If you need more information let me know.

Cheers :)
leber



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-java-lang-NoSuchFieldError-rsp-when-using-jteam-spatial-search-module-tp4002884.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr4 BETA "group.ngroups" count

2012-08-23 Thread Lenzner
Hello,

I have a problem using grouped queries and the 'group.ngroups' parameter.
When I run the following request

/select?q=&group=true&group.field=personId&group.ngroups=true&wt=xml

the response looks like this:


  
<lst name="grouped">
  <lst name="personId">
    <int name="matches">11</int>
    <int name="ngroups">6</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">106.12345</str>
        ...
      </lst>
      <lst>
        <str name="groupValue">106.12312</str>
        ...
      </lst>
      <lst>
        <str name="groupValue">101.12313</str>
        ...
      </lst>
      <lst>
        <str name="groupValue">101.12312</str>
        ...
      </lst>
    </arr>
  </lst>
</lst>


I expected ngroups to be 4, because it is the total count of all groups
that match my query.

The 'matches' value is right, and the 11 docs are distributed over the 4
groups of my response, but I have no idea what ngroups is counting in this
case. Can anybody explain to me what the meaning of ngroups is?

regards
Norman Lenzner

Re: The way to customize ranking?

2012-08-23 Thread Karthick Duraisamy Soundararaj
Hi
 You might add an int  field "Search Rule" that identifies the type of
search.
 example
Search Rule  Description
 0  Unpaid Search
 1  Paid Search - Rule 1
 2  Paid Serch - Rule 2

You can use filterqueries (http://wiki.apache.org/solr/CommonQueryParameters)
 like fq:  Search Rule :[1 TO *]

Alternatively, You can even use a boolean field to identify whether or not
a search is paid and then an addtitional field that identifies the type of
 paid search.
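A small sketch of the filter-query semantics suggested above, with the predicate also simulated client-side (the field name `search_rule` and the documents are hypothetical, made up for illustration):

```python
from urllib.parse import urlencode

# Hypothetical documents, each tagged with an int rule (0 = unpaid search).
docs = [
    {"id": "a", "search_rule": 0},
    {"id": "b", "search_rule": 1},
    {"id": "c", "search_rule": 2},
]

# fq=search_rule:[1 TO *] keeps only paid results; the same predicate
# applied client-side for illustration:
paid = [d for d in docs if d["search_rule"] >= 1]

# The equivalent request parameters (in Solr, fq is cached separately from q).
params = urlencode({"q": "pizza", "fq": "search_rule:[1 TO *]"})
print([d["id"] for d in paid])
print(params)
```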

--
karthick

On Thu, Aug 23, 2012 at 11:16 AM, Nicholas Ding wrote:

> Hi
>
> I'm working on Solr to build a local business search in China. We have a
> special requirement from advertiser. When user makes a search, if the
> results contain paid advertisements, those ads need to be moved on the top
> of results. For different ads, they have detailed rules about which comes
> first.
>
> Could anyone offer me some suggestions how I customize the ranking based on
> my requirement?
>
> Thanks
> Nicholas
>


Re: The way to customize ranking?

2012-08-23 Thread Nicholas Ding
Thank you, but I don't want to filter those ads.

For example, when a user makes a search like q=Car
Result list:
1. Ford Automobile (score 10)
2. Honda Civic (score 9)
...
...
...
99. Paid Ads (score 1, Ad has own field to identify it's an Ad)

What I want to find is a way to make the score of "Paid Ads" higher than
"Ford Automobile". Basically, the result structure will look like

- [Paid Ads Section]
[Most valuable Ads 1]
[Most valuable Ads 2]
[Less valuable Ads 1]
[Less valuable Ads 2]
- [Relevant Results Section]


On Thu, Aug 23, 2012 at 11:33 AM, Karthick Duraisamy Soundararaj <
karthick.soundara...@gmail.com> wrote:

> Hi
>  You might add an int  field "Search Rule" that identifies the type of
> search.
>  example
> Search Rule  Description
>  0  Unpaid Search
>  1  Paid Search - Rule
> 1
>  2  Paid Serch - Rule 2
>
> You can use filterqueries (
> http://wiki.apache.org/solr/CommonQueryParameters)
>  like fq:  Search Rule :[1 TO *]
>
> Alternatively, You can even use a boolean field to identify whether or not
> a search is paid and then an addtitional field that identifies the type of
>  paid search.
>
> --
> karthick
>
> On Thu, Aug 23, 2012 at 11:16 AM, Nicholas Ding  >wrote:
>
> > Hi
> >
> > I'm working on Solr to build a local business search in China. We have a
> > special requirement from advertiser. When user makes a search, if the
> > results contain paid advertisements, those ads need to be moved on the
> top
> > of results. For different ads, they have detailed rules about which comes
> > first.
> >
> > Could anyone offer me some suggestions how I customize the ranking based
> on
> > my requirement?
> >
> > Thanks
> > Nicholas
> >
>


Re: search is slow for URL fields of type String.

2012-08-23 Thread Erik Hatcher
Also note, emphasizing what Erick said below, that with this type of "has a 
value in field" type clause, it works fine as an fq as that gets cached so you 
only take the performance hit once on it.  Generally this is a clause that is 
reused so be sure to peel it off as an fq rather than AND'ing it to a q(uery) 
parameter.  

The advice to make a separate has_<field> field (or <field>_size) is the best 
advice, but when dealing with low cardinality fields it's not really an issue 
to use something like category:* where there are only a handful of category 
values in use.
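A sketch of the has_<field> idea at index time, using hypothetical documents (in a real setup the schema would declare has_image_url as a boolean field, and queries would send fq=has_image_url:true instead of image_url:*):

```python
def add_has_field(doc, field="image_url"):
    # Derive a cheap boolean at index time so queries can use
    # fq=has_image_url:true instead of the expensive image_url:* clause.
    out = dict(doc)
    out["has_" + field] = bool(doc.get(field))
    return out

docs = [
    {"id": "1", "url": "http://example.com/a", "image_url": "http://example.com/a.jpg"},
    {"id": "2", "url": "http://example.com/b"},
]
indexed = [add_has_field(d) for d in docs]
print([(d["id"], d["has_image_url"]) for d in indexed])
```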

Erik

On Aug 23, 2012, at 08:51 , Jack Krupansky wrote:

> And we should probably add a doc note with this same advice since it is an 
> easy "mistake" to make.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Erick Erickson
> Sent: Thursday, August 23, 2012 8:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: search is slow for URL fields of type String.
> 
> There was just a thread on this; it may be your
> &image_url:*
> 
> try removing this clause just to test response time. If
> that shows a vast improvement, try adding a boolean
> field has_image_url, and then add a fq clause like
> &fq=has_image_url:true
> 
> Best
> Erick
> 
> On Wed, Aug 22, 2012 at 10:35 AM, srinalluri  wrote:
>> This is the string fieldType:
>> 
>> <fieldType name="string" class="solr.StrField" />
>> 
>> These are the fields using the 'string' fieldType:
>> 
>> <field name="url" type="string" multiValued="true" />
>> <field name="image_url" type="string" multiValued="true" />
>> 
>> And this the sample query:
>> /select/?q=url:http\://www.foxbusiness.com/personal-finance/2012/08/10/social-change-coming-from-gas-prices-to-rent-prices-and-beyond/
>> AND image_url:*
>> 
>> Each query like this takes around 400 milliseconds. What changes can I
>> make to the fieldType to improve query performance?
>> 
>> thanks
>> Srini
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/search-is-slow-for-URL-fields-of-type-String-tp4002662.html
>> Sent from the Solr - User mailing list archive at Nabble.com. 
> 



RE: Cloud assigning incorrect port to shards

2012-08-23 Thread Buttler, David
I am using the jetty container from the example.  The only thing I have done is 
change the schema to match up my documents rather than the example

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, August 22, 2012 5:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Cloud assigning incorrect port to shards

What container are you using?

Sent from my iPhone

On Aug 22, 2012, at 3:14 PM, "Buttler, David"  wrote:

> Hi,
> I have set up a Solr 4 beta cloud cluster.  I have uploaded a config 
> directory, and linked it with a configuration name.
> 
> I have started two solr on two computers and added a couple of shards using 
> the Core Admin function on the admin page.
> 
> When I go to the admin cloud view, the shards all have the computer name and 
> port attached to them.  BUT, the port is the default port (8983), and not the 
> port that I assigned on the command line.  I can still connect to the correct 
> port, and not the reported port.  I anticipate that this will lead to errors 
> when I get to doing distributed query, as zookeeper seems to be collecting 
> incorrect information.
> 
> Any thoughts as to why the incorrect port is being stored in zookeeper?
> 
> Thanks,
> Dave


Data Import Handler - Could not load driver - com.microsoft.sqlserver.jdbc.SQLServerDriver - SOLR 4 Beta

2012-08-23 Thread awb3667
Hello,

I was able to get the DIH working in SOLR 3.6.1 (placed the sqljdbc4.jar
file in the lib directory, etc). Everything worked great. Tried to get
everything working in SOLR 4 beta (on the same dev machine connecting to
same db, etc) and was unable to due to the sql driver not loading.

What i've done:
1. SOLR 4 admin comes up fine(configured solrconfig.xml and schema.xml)
2. Dropped the sqljdbc4.jar in the lib directory
3. Added sqljdbc4.jar to classpath
4. Added dataimporthandler to solrconfig.xml:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

5. Even tried jtds which also gave me errors that the driver could not be
loaded.
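An alternative to copying the jar into a lib directory or onto the container classpath is to load it explicitly from solrconfig.xml with `<lib>` directives; a sketch (the dir paths below are assumptions - adjust them to wherever the jars actually live, relative to the core's instanceDir):

```xml
<!-- Load the DataImportHandler jar and the SQL Server JDBC driver explicitly.
     The paths are hypothetical and resolve relative to the core's instanceDir. -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
<lib dir="../../lib/" regex="sqljdbc4\.jar" />
```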

Here is my datasource in the data-config.xml (DIH config file):



Here is the error i get when trying to use jdbc connector: 
SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.microsoft.sqlserver.jdbc.SQLServerDriver Processing
Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.microsoft.sqlserver.jdbc.SQLServerDriver Processing
Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Could not load driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:114)
at
org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:62)
at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:354)
at
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:99)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:53)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:74)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:430)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
... 5 more
Caused by: java.lang.ClassNotFoundException: Unable to load
com.microsoft.sqlserver.jdbc.SQLServerDriver or
org.apache.solr.handler.dataimport.com.microsoft.sqlserver.jdbc.SQLServerDriver
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112)
... 12 more
Caused by: org.apache.solr.common.SolrException: Error loading class
'com.microsoft.sqlserver.jdbc.SQLServerDriver'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:438)
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)
... 13 more
Caused by: java.lang.ClassNotFoundException:
com.microsoft.sqlserver.jdbc.SQLServerDriver
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:422)
... 14 more



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Could-not-load-driver-com-microsoft-sqlserver-jdbc-SQLServerDriver-SOLR-4-Beta-tp4002902.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error java.lang.NoSuchFieldError: rsp when using jteam spatial search module

2012-08-23 Thread leberknecht
...FYI: works fine with 3.6.0 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-java-lang-NoSuchFieldError-rsp-when-using-jteam-spatial-search-module-tp4002884p4002904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.0 beta : Is collection1 hard coded somewhere?

2012-08-23 Thread Tom Burton-West
I removed the string "collection1" from my solr.xml file in solr home and
modified my solr.xml file as follows:

  

  
Then I restarted Solr.

However, I keep getting messages about
"Can't find resource 'solrconfig.xml' in classpath or
'/l/solrs/dev/solrs/4.0/1/collection1/conf/'"
And the log messages show that Solr is trying to create the collection1
instance

"Aug 23, 2012 12:06:02 PM org.apache.solr.core.CoreContainer create
INFO: Creating SolrCore 'collection1' using instanceDir:
/l/solrs/dev/solrs/4.0/3/collection1
Aug 23, 2012 12:06:02 PM org.apache.solr.core.SolrResourceLoader <init> "
I think somehow the previous solr.xml configuration is being stored on disk
somewhere and loaded.

Any clues?

Tom


Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
update:

as an experiment - i changed the query to a wildcard (9030*) instead of an
explicit value (9030)

example:

QUERY="http://$SERVER.intra.searshc.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030*&rows=2000&debugQuery=on&fl=*,score";

this resulted in a results list that appears much more rational from a sort
order perspective -

however - the wildcard query is not acceptable from a performance stand
point.

any input or illumination would be appreciated ;)

thank you

itemNo, score, rankNo, partCnt

  [9030],1.0,10353,1
[90302   ],1.0,6849,1
[9030P   ],1.0,444,1
[903093  ],1.0,51,1
[9030430 ],1.0,47,1
[9030],1.0,37,1
[903057-9010 ],1.0,26,1
[903061-9010 ],1.0,20,1
[903046-9010 ],1.0,18,1
[903056-9010 ],1.0,14,1
[903095  ],1.0,14,1
[90303-MR1-000   ],1.0,14,1
[903097-9050 ],1.0,12,1
[903046-9011 ],1.0,12,1
[903097-9010 ],1.0,11,1
[903097-9040 ],1.0,11,1
[903063-9100 ],1.0,6,1
[903066-9011 ],1.0,6,1
[903098  ],1.0,3,1




--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4002919.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
looks like the original complete list of the results did not get attached to
this thread 

here is a snippet of the list.

what i am trying to demonstrate is the difference in scoring and,
ultimately, sorting - and the breadth of documents (a few hundred) between
the two documents of interest (9030 and 90302).

thank you,

itemNo, score, rankNo, partCnt

  [9030],12.014701,10353,1
[9030],12.014701,37,1
[9030],12.014701,1,1
[9030   ],12.014701,0,167
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[9030],12.014701,0,1
[PC-9030],7.509188,0,169
[58-9030 ],7.509188,0,1
[9030-1R ],7.509188,0,1
[903028-9030 ],7.509188,0,1
[903139-9030 ],7.509188,0,1
[903091-9030 ],7.509188,0,1
[903099-9030 ],7.509188,0,1
[903153-9030 ],7.509188,0,1
[031-9030],7.509188,0,1
[308-9030],7.509188,0,1
[9030-6010   ],7.509188,0,1
[9030-6010   ],7.509188,0,1
[9030-6006   ],7.509188,0,1
[9030-6008   ],7.509188,0,1
[9030-6008   ],7.509188,0,1
[9030-6001   ],7.509188,0,1
[9030-6003   ],7.509188,0,1
[9030-6006   ],7.509188,0,1
[208568-9030 ],7.509188,0,1
[79-9030 ],7.509188,0,1
[33-9030 ],7.509188,0,1
[M-9030  ],7.509188,0,1

... a few hundred more ...

[LGQ9030PQ1 ],0.41475832,0,150
[LEQ9030PQ0 ],0.41475832,0,124
[LEQ9030PQ1 ],0.41475832,0,123
[CWE9030BCE ],0.41475832,0,115
[PJDS9030Z   ],0.29327843,0,1
[8A-CT9-030-010  ],0.29327843,0,1
[RDT9030A],0.29327843,0,1
[PJDG9030Z   ],0.29327843,0,1
[90302   ],0.20737916,6849,1



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4002922.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 beta : Is collection1 hard coded somewhere?

2012-08-23 Thread Tom Burton-West
I did not describe the problems correctly.

I have 3 solr shards with solr homes .../solrs/4.0/1, .../solrs/4.0/2 and
.../solrs/4.0/3

For shard 1 I have a solr.xml file with the modifications described in the
previous message.  For that instance, it appears that the problem is that
the semantics of specifying the instancedir have changed between 3.6 and
4.0.

I specified the instancedir as  instanceDir="."

However, I get this error in the log:

"Cannot create directory: /l/solrs/dev/solrs/4.0/1/./data/index"

Note that instead of using Solr home /l/solrs/dev/solrs/4.0/1 (what I would
expect for the relative path "."), that Solr appears to be appending "." to
Solr home.
The solr.xml file says that paths are relative to the "installation
directory".  Perhaps that needs to be clarified in the file.


For shards 2 and 3, I tried not using a solr.xml file and I did not create
a "collection1" subdirectory.  For these solr instances, I got the messages
about collection1 and files not being found in the $SOLRHOME/collection1
path

" Can't find resource 'solrconfig.xml' in classpath or
'/l/solrs/dev/solrs/4.0/3/collection1/conf/',
cwd=/l/local/apache-tomcat-dev"
Looking at the logs it appears that "collection1" is specified as the
default core somewhere:

Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: /l/solrs/dev/solrs/4.0/3/solr.xml
Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer <init>
INFO: New CoreContainer 1281149250
Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: no solr.xml file found - using default
Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer load
INFO: Loading CoreContainer using Solr Home: '/l/solrs/dev/solrs/4.0/3/'
Aug 23, 2012 12:42:47 PM org.apache.solr.core.SolrResourceLoader <init>
INFO: Creating SolrCore 'collection1' using instanceDir:
/l/solrs/dev/solrs/4.0/3/collection1

Is this default of "collection1" specified in some other config file or
hardcoded into Solr somewhere?

If using a core is mandatory with Solr 4.0 , the CoreAdmin wiki page and
the release notes should point this out.


Tom


Re: The way to customize ranking?

2012-08-23 Thread Karthick Duraisamy Soundararaj
I can't think of a way you can achieve this in one request. Can you make two
different solr requests? If so, you can make one with fq=paidSearch:0 and the
other with fq=paidSearch:[1 TO *].
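A sketch of stitching the two responses together client-side, paid block first (the `adValue` field, ids, and values are hypothetical):

```python
def merge_paid_first(paid_docs, organic_docs):
    # Paid block first, most valuable ads on top; then the relevant results.
    paid_sorted = sorted(paid_docs, key=lambda d: d["adValue"], reverse=True)
    return paid_sorted + organic_docs

paid = [
    {"id": "ad-less", "adValue": 1},
    {"id": "ad-most", "adValue": 9},
]
organic = [{"id": "ford"}, {"id": "honda"}]
merged = merge_paid_first(paid, organic)
print([d["id"] for d in merged])
```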

On Thu, Aug 23, 2012 at 11:45 AM, Nicholas Ding wrote:

> Thank you, but I don't want to filter those ads.
>
> For example, when user make a search like q=Car
> Result list:
> 1. Ford Automobile (score 10)
> 2. Honda Civic (score 9)
> ...
> ...
> ...
> 99. Paid Ads (score 1, Ad has own field to identify it's an Ad)
>
> What I want to find is a way to make the score of "Paid Ads" higher than
> "Ford Automobile". Basically, the result structure will look like
>
> - [Paid Ads Section]
> [Most valuable Ads 1]
> [Most valuable Ads 2]
> [Less valuable Ads 1]
> [Less valuable Ads 2]
> - [Relevant Results Section]
>
>
> On Thu, Aug 23, 2012 at 11:33 AM, Karthick Duraisamy Soundararaj <
> karthick.soundara...@gmail.com> wrote:
>
> > Hi
> >  You might add an int  field "Search Rule" that identifies the type
> of
> > search.
> >  example
> > Search Rule  Description
> >  0  Unpaid Search
> >  1  Paid Search -
> Rule
> > 1
> >  2  Paid Serch -
> Rule 2
> >
> > You can use filterqueries (
> > http://wiki.apache.org/solr/CommonQueryParameters)
> >  like fq:  Search Rule :[1 TO *]
> >
> > Alternatively, You can even use a boolean field to identify whether or
> not
> > a search is paid and then an addtitional field that identifies the type
> of
> >  paid search.
> >
> > --
> > karthick
> >
> > On Thu, Aug 23, 2012 at 11:16 AM, Nicholas Ding  > >wrote:
> >
> > > Hi
> > >
> > > I'm working on Solr to build a local business search in China. We have
> a
> > > special requirement from advertiser. When user makes a search, if the
> > > results contain paid advertisements, those ads need to be moved on the
> > top
> > > of results. For different ads, they have detailed rules about which
> comes
> > > first.
> > >
> > > Could anyone offer me some suggestions how I customize the ranking
> based
> > on
> > > my requirement?
> > >
> > > Thanks
> > > Nicholas
> > >
> >
>


RES: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-23 Thread Claudio Ranieri
I made this installation on a new tomcat.
With Solr 3.4.*, 3.5.*, and 3.6.* it works with the jars in
$TOMCAT_HOME/webapps/solr/WEB-INF/lib, but with solr 4.0 beta it doesn't work. I 
needed to add the jars to $TOMCAT_HOME/lib.
The problem with the cast seems to be in the source code. 


-Mensagem original-
De: Karthick Duraisamy Soundararaj [mailto:karthick.soundara...@gmail.com] 
Enviada em: quinta-feira, 23 de agosto de 2012 09:22
Para: solr-user@lucene.apache.org
Assunto: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

Not sure if this can help. But once I had a similar problem with Solr 3.6.0 
where tomcat refused to find one of the classes that existed. I deleted the 
tomcat's webapp directory and then it worked fine.

On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson wrote:

> First, I'm no Tomcat expert. Here's the Tomcat Solr page, but 
> you've probably already seen it:
> http://wiki.apache.org/solr/SolrTomcat
>
> But I'm guessing that you may have old jars around somewhere and 
> things are getting confused. I'd blow away the whole thing and start 
> over, whenever I start copying jars around I always lose track of 
> what's where.
>
> Have you successfully had any other Solr operate under Tomcat?
>
> Sorry I can't be more help
> Erick
>
> On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri 
>  wrote:
> > Hi,
> >
> > I tried to start the solr-4.0.0-BETA with tomcat-6.0.20 but does not
> work.
> > I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps. 
> > Then I
> copied the directory apache-solr-4.0.0-BETA\example\solr to 
> C:\home\solr-4.0-beta and adjusted the file 
> $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to 
> point the solr/home to C:/home/solr-4.0-beta. With this configuration, 
> when I startup tomcat I got:
> >
> > SEVERE: org.apache.solr.common.SolrException: Invalid 
> > luceneMatchVersion
> 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, 
> LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32, 
> LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in 
> format 'VV'
> >
> > So I changed the line in solrconfig.xml:
> >
> > LUCENE_40
> >
> > to
> >
> > LUCENE_CURRENT
> >
> > So I got a new error:
> >
> > Caused by: java.lang.ClassNotFoundException:
> solr.NRTCachingDirectoryFactory
> >
> > This class is within the file apache-solr-core-4.0.0-BETA.jar but 
> > for
> some reason classloader of the class is not loaded. I then moved all 
> jars in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to 
> $TOMCAT_HOME\lib.
> > After this setup, I got a new error:
> >
> > SEVERE: java.lang.ClassCastException:
> org.apache.solr.core.NRTCachingDirectoryFactory can not be cast to 
> org.apache.solr.core.DirectoryFactory
> >
> > So I changed the line in solrconfig.xml:
> >
> >  >
> class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >
> > to
> >
> >  >
> class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>
> >
> > So I got a new error:
> >
> > Caused by: java.lang.ClassCastException:
> org.apache.solr.spelling.DirectSolrSpellChecker can not be cast to 
> org.apache.solr.spelling.SolrSpellChecker
> >
> > How can I resolve the problem of classloader?
> > How can I resolve the problem of cast of NRTCachingDirectoryFactory 
> > and
> DirectSolrSpellChecker?
> > I can not startup the solr 4.0 beta with tomcat.
> > Thanks,
> >
> >
> >
> >
>



--
--
Karthick D S
Master's in Computer Engineering ( Software Track ) Syracuse University 
Syracuse - 13210 New York United States of America


Re: Solr 4.0 beta : Is collection1 hard coded somewhere?

2012-08-23 Thread Tom Burton-West
The answer is yes.   "collection1" is defined as the default core name in
CoreContainer.java on line 94 or so.   I have opened a jira issue for this
and other issues related to the documentation of solr.xml and Solr core
configuration issues for Solr 4.0

https://issues.apache.org/jira/browse/SOLR-3753

On Thu, Aug 23, 2012 at 1:04 PM, Tom Burton-West  wrote:

> I did not describe the problems correctly.
>
> I have 3 solr shards with solr homes .../solrs/4.0/1  .../solrs/4.0/2 and
> .../solrs/4.0/2solrs/3
>
> For shard 1 I have a solr.xml file with the modifications described in the
> previous message.  For that instance, it appears that the problem is that
> the semantics of specifing the instancedir have changed between 3.6 and
> 4.0.
>
> I specified the instancedir as  instanceDir="."
>
> However, I get this error in the log:
>
> "Cannot create directory: /l/solrs/dev/solrs/4.0/1/./data/index"
>
> Note that instead of using Solr home /l/solrs/dev/solrs/4.0/1 (what I
> would expect for the relative path "."), that Solr appears to be appending
> "." to Solr home.
> The solr.xml file says that paths are relative to the "installation
> directory".  Perhaps that needs to be clarified in the file.
>
>
> For shards 2 and 3, I tried not using a solr.xml file and I did not create
> a "collection1" subdirectory.  For these solr instances, I got the messages
> about collection1 and files not being found in the $SOLRHOME/collection1
> path
>
> " Can't find resource 'solrconfig.xml' in classpath or
> '/l/solrs/dev/solrs/4.0/3/collection1/conf/',
> cwd=/l/local/apache-tomcat-dev"
> Looking at the logs it appears that "collection1" is specified as the
> default core somewhere:
>
> Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer$Initializer
> initialize
> INFO: looking for solr.xml: /l/solrs/dev/solrs/4.0/3/solr.xml
> Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer 
> INFO: New CoreContainer 1281149250
> Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer$Initializer
> initialize
> INFO: no solr.xml file found - using default
> Aug 23, 2012 12:42:47 PM org.apache.solr.core.CoreContainer load
> INFO: Loading CoreContainer using Solr Home: '/l/solrs/dev/solrs/4.0/3/'
> Aug 23, 2012 12:42:47 PM org.apache.solr.core.SolrResourceLoader 
> INFO: Creating SolrCore 'collection1' using instanceDir:
> /l/solrs/dev/solrs/4.0/3/collection1
>
> Is this default of "collection1" specified in some other config file or
> hardcoded into Solr somewhere?
>
> If using a core is mandatory with Solr 4.0 , the CoreAdmin wiki page and
> the release notes should point this out.
>
>
> Tom
>
>
>
>


Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-23 Thread Mikhail Khludnev
Tom,
Feel free to find my benchmark results for two alternative joining
approaches.
http://blog.griddynamics.com/2012/08/block-join-query-performs.html

Regards

On Thu, Aug 23, 2012 at 4:40 PM, Erick Erickson wrote:

> Tom:
>
> I think my comments were that grouping on a field where there was
> a unique value _per document_ chewed up a lot of resources.
> Conceptually, there's a bucket for each unique group value. And
> grouping on a file path is just asking for trouble.
>
> But the memory used for grouping should max out as a function of
> the number of unique values in the grouped field.
>
> Best
> Erick
>
> On Wed, Aug 22, 2012 at 11:32 PM, Lance Norskog  wrote:
> > Yes, distributed grouping works, but grouping takes a lot of
> > resources. If you can avoid in distributed mode, so much the better.
> >
> > On Wed, Aug 22, 2012 at 3:35 PM, Tom Burton-West 
> wrote:
> >> Thanks Tirthankar,
> >>
> >> So the issue is memory use for sorting.  I'm not sure I understand how
> >> sorting of grouping fields  is involved with the defaults and field
> >> collapsing, since the default sorts by relevance not grouping field.  On
> >> the other hand I don't know much about how field collapsing is
> implemented.
> >>
> >> So far the few tests I've made haven't revealed any memory problems.  We
> >> are using very small string fields for grouping and I think that we
> >> probably only have a couple of cases where we are grouping more than a
> few
> >> thousand docs.   I will try to find a query with a lot of docs per group
> >> and take a look at the memory use using JConsole.
> >>
> >> Tom
> >>
> >>
> >> On Wed, Aug 22, 2012 at 4:02 PM, Tirthankar Chatterjee <
> >> tchatter...@commvault.com> wrote:
> >>
> >>>  Hi Tom,
> >>>
> >>> We had an issue where we are keeping millions of docs in a single node
> and
> >>> we were trying to group them on a string field which is nothing but
> full
> >>> file path… that caused SOLR to go out of memory…
> >>>
> >>>
> >>> Erick has explained nicely in the thread as to why it won’t work and I
> had
> >>> to find another way of architecting it. 
> >>>
> >>>
> >>> How do you think this is different in your case. If you want to group
> by a
> >>> string field with thousands of similar entries I am guessing you will
> face
> >>> the same issue. 
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Tirthankar
> >>> ***Legal Disclaimer***
> >>> "This communication may contain confidential and privileged material
> for
> >>> the
> >>> sole use of the intended recipient. Any unauthorized review, use or
> >>> distribution
> >>> by others is strictly prohibited. If you have received the message in
> >>> error,
> >>> please advise the sender by reply email and delete the message. Thank
> you."
> >>> **
> >>>
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Holy cow do I love 4.0's admin screen

2012-08-23 Thread geeky2
Andy,

we are not running solr 4.0 here in production.

can you elaborate on your comment related to your polling script written in
ruby and how the new data import status screen makes your polling app
obsolete?

i wrote my own polling app (in shell) to work around the very same issues:

http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html

thx for the post



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Holy-cow-do-I-love-4-0-s-admin-screen-tp4002912p4002936.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Holy cow do I love 4.0's admin screen

2012-08-23 Thread Andy Lester
> can you elaborate on your comment related to your polling script written in
> ruby and how the new data import status screen makes your polling app
> obsolete?

The 4.0 admin tools have a screen that gives the status in the web app, so I 
don't have to run the CLI tool to check the indexing status.

However, it will still be necessary if I need to wait for indexing to complete 
in, for example, a Makefile or a script.

xoxo
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



using tie parameter of edismax to raise a score (disjunction max query)?

2012-08-23 Thread geeky2

Hello all,

this "more specific" question is related to my earlier post at:
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-td4002897.html

i am reading here about the tie parameter:
http://wiki.apache.org/solr/ExtendedDisMax?highlight=%28edismax%29#tie_.28Tie_breaker.29

*can i use the edismax, tie= parameter, to "raise" the following score?*

my goal is to raise the total score of this document (see score snippet
below) to 9.11329.

to do this - would i use tie=0.0 to make a pure "disjunction max query" --
only the maximum scoring sub query contributes to the final score?


  
*0.20737723* = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
0.022755474 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  9.11329 = idf(docFreq=2565, maxDocs=8566704)
  0.0027743944 = queryNorm
*9.11329* = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
  1.0 = tf(termFreq(itemNo:9030)=1)
  9.11329 = idf(docFreq=2565, maxDocs=8566704)
  1.0 = fieldNorm(field=itemNo, doc=1796597)
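For reference, the combination the tie parameter controls is: final = max(sub-scores) + tie * (sum of the remaining sub-scores). With tie=0.0 only the best sub-query counts; with tie=1.0 all of them are summed. A sketch using the sub-scores from this thread:

```python
def dismax_score(sub_scores, tie=0.0):
    # (e)dismax: the best sub-query dominates; tie lets the others leak in.
    top = max(sub_scores)
    return top + tie * (sum(sub_scores) - top)

# With a single matching sub-query, tie changes nothing: the max of
# [0.20737723] is 0.20737723 for any tie value.
print(dismax_score([0.20737723], tie=0.0))
print(dismax_score([12.014634, 0.20737723], tie=0.0))   # pure max
print(dismax_score([12.014634, 0.20737723], tie=1.0))   # plain sum
```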


thank you








--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-tie-parameter-of-edismax-to-raise-a-score-disjunction-max-query-tp4002935.html
Sent from the Solr - User mailing list archive at Nabble.com.


need help understanding an issue with scoring

2012-08-23 Thread geeky2
hello,

i am trying to understand the "debug" output from a query, and specifically
- how scores for two (2) documents are derived and why they are so far
apart.

the user is entering 9030 for the search

the search is rightfully returning the top document - the question, however,
is why the document with id 90302 is so far down on the list.  

i have attached a text file i generated with xslt, pulling the document
information.  the text file has the itemNo, the rankNo and the partCnt.  the
sort order of the response handler is:

  score desc, rankNo desc, partCnt desc



if you look at the text file - you will see that 90302 is 174th on the
list!  90302 has a rankNo of 6849 - and i would think that would drive it
much higher on the list and therefore much closer to 9030.

from a business perspective, 9030 is one of our top selling parts, as is
90302.  they need to be closer together in the results instead of
separated by 170+ documents that have a rankNo of 0.

i have also CnP the response handler that is being used - below

can someone help me understand the scoring so i can correct this?

this is the scoring for the two documents:

  
12.014634 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
0.022755474 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  9.11329 = idf(docFreq=2565, maxDocs=8566704)
  0.0027743944 = queryNorm
9.11329 = (MATCH) fieldWeight(itemNo:9030 in 2308681), product of:
  1.0 = tf(termFreq(itemNo:9030)=1)
  9.11329 = idf(docFreq=2565, maxDocs=8566704)
  1.0 = fieldNorm(field=itemNo, doc=2308681)
  12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),
product of:
1.0 = tf(termFreq(itemNoExactMatchStr:9030)=1)
12.014634 = idf(docFreq=140, maxDocs=8566704)
1.0 = fieldNorm(field=itemNoExactMatchStr, doc=2308681)





  
0.20737723 = (MATCH) max of:
  0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
0.022755474 = queryWeight(itemNo:9030^0.9), product of:
  0.9 = boost
  9.11329 = idf(docFreq=2565, maxDocs=8566704)
  0.0027743944 = queryNorm
9.11329 = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
  1.0 = tf(termFreq(itemNo:9030)=1)
  9.11329 = idf(docFreq=2565, maxDocs=8566704)
  1.0 = fieldNorm(field=itemNo, doc=1796597)
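as a cross-check, the explain lines above multiply out exactly; a small sketch using the values from the debug output:

```python
boost = 0.9
idf = 9.11329
query_norm = 0.0027743944
query_weight = boost * idf * query_norm        # 0.022755474

tf = 1.0
field_norm = 1.0
field_weight = tf * idf * field_norm           # 9.11329

itemno_score = query_weight * field_weight     # 0.20737723, doc 1796597's total

# doc 2308681 also matches itemNoExactMatchStr, and "max of" keeps the
# larger sub-query: 1.0 * 12.014634 * 1.0
exact_score = 1.0 * 12.014634 * 1.0
doc_2308681_score = max(itemno_score, exact_score)   # 12.014634
```

so the 12.014634 vs 0.20737723 gap is entirely the itemNoExactMatchStr match that 90302 lacks; rankNo never enters the score at all, it only acts as a secondary sort key.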



  <requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8
        brand^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemDescFacet</str>
      <str name="facet.field">brandFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
  </requestHandler>
 
thank you for any help




--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: The way to customize ranking?

2012-08-23 Thread François Schiettecatte
I would create two indices, one with your content and one with your ads. This 
approach would allow you to precisely control how many ads you pull back and 
how you merge them into the results, and you would be able to control schemas, 
boosting, default fields, etc. for each index independently.

Best regards

François

On Aug 23, 2012, at 11:45 AM, Nicholas Ding  wrote:

> Thank you, but I don't want to filter those ads.
> 
> For example, when a user makes a search like q=Car
> Result list:
> 1. Ford Automobile (score 10)
> 2. Honda Civic (score 9)
> ...
> ...
> ...
> 99. Paid Ads (score 1; the Ad has its own field to identify it as an Ad)
> 
> What I want to find is a way to make the score of "Paid Ads" higher than
> "Ford Automobile". Basically, the result structure will look like
> 
> - [Paid Ads Section]
>[Most valuable Ads 1]
>[Most valuable Ads 2]
>[Less valuable Ads 1]
>[Less valuable Ads 2]
> - [Relevant Results Section]
> 
> 
> On Thu, Aug 23, 2012 at 11:33 AM, Karthick Duraisamy Soundararaj <
> karthick.soundara...@gmail.com> wrote:
> 
>> Hi
>> You might add an int  field "Search Rule" that identifies the type of
>> search.
>> example
>>Search Rule  Description
>> 0  Unpaid Search
>> 1  Paid Search - Rule
>> 1
>> 2  Paid Search - Rule 2
>> 
>> You can use filterqueries (
>> http://wiki.apache.org/solr/CommonQueryParameters)
>> like fq:  Search Rule :[1 TO *]
>> 
>> Alternatively, you can even use a boolean field to identify whether or not
>> a search is paid, and then an additional field that identifies the type of
>> paid search.
>> 
>> --
>> karthick
>> 
>> On Thu, Aug 23, 2012 at 11:16 AM, Nicholas Ding >> wrote:
>> 
>>> Hi
>>> 
>>> I'm working on Solr to build a local business search in China. We have a
>>> special requirement from advertisers. When a user makes a search, if the
>>> results contain paid advertisements, those ads need to be moved to the
>> top
>>> of results. For different ads, they have detailed rules about which comes
>>> first.
>>> 
>>> Could anyone offer me some suggestions on how to customize the ranking based
>> on
>>> my requirement?
>>> 
>>> Thanks
>>> Nicholas
>>> 
>> 



Re: need help understanding an issue with scoring

2012-08-23 Thread Jack Krupansky

What is your query and "qf"?

The first doc gets its high score due to a match on the 
"itemNoExactMatchStr" field which the second doc doesn't have:


12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),

With a low document frequency (inverts to high inverse document frequency):

12.014634 = idf(docFreq=140, maxDocs=8566704)
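Both idf values can be reproduced from the classic Lucene/Solr DefaultSimilarity formula, idf = 1 + ln(maxDocs / (docFreq + 1)); a quick sketch:

```python
import math

def idf(doc_freq, max_docs):
    # DefaultSimilarity: the rarer the term, the higher the idf
    return 1.0 + math.log(max_docs / (doc_freq + 1))

idf(2565, 8566704)   # ~9.11329   (itemNo:9030)
idf(140, 8566704)    # ~12.014634 (itemNoExactMatchStr:9030)
```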

-- Jack Krupansky

-Original Message- 
From: geeky2

Sent: Thursday, August 23, 2012 11:44 AM
To: solr-user@lucene.apache.org
Subject: need help understanding an issue with scoring

hello,

i am trying to understand the "debug" output from a query, and specifically
- how scores for two (2) documents are derived and why they are so far
apart.

the user is entering 9030 for the search

the search is rightfully returning the top document, however - the question
is why is the document with id 90302 so far down on the list.

i have attached a text file i generated with xslt, pulling the document
information.  the text file has the itemNo, the rankNo and the partCnt.  the
sort order of the response handler is:

 score desc, rankNo desc, partCnt desc



if you look at the text file - you will see that 90302 is 174th on the
list!  90302 has a rankNo of 6849 - and i would think that would drive it
much higher on the list and therefore much closer to 9030.

what is happening from a business perspective - is - 9030 is one of our top
selling parts as is 90302.  they need to be closer together in the results
instead of separated by 170+ documents that have a rankNo of 0.

i have also CnP the response handler that is being used - below

can someone help me understand the scoring so i can correct this?

this is the scoring for the two documents:

 
12.014634 = (MATCH) max of:
 0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of:
   0.022755474 = queryWeight(itemNo:9030^0.9), product of:
 0.9 = boost
 9.11329 = idf(docFreq=2565, maxDocs=8566704)
 0.0027743944 = queryNorm
   9.11329 = (MATCH) fieldWeight(itemNo:9030 in 2308681), product of:
 1.0 = tf(termFreq(itemNo:9030)=1)
 9.11329 = idf(docFreq=2565, maxDocs=8566704)
 1.0 = fieldNorm(field=itemNo, doc=2308681)
 12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681),
product of:
   1.0 = tf(termFreq(itemNoExactMatchStr:9030)=1)
   12.014634 = idf(docFreq=140, maxDocs=8566704)
   1.0 = fieldNorm(field=itemNoExactMatchStr, doc=2308681)





 
0.20737723 = (MATCH) max of:
 0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of:
   0.022755474 = queryWeight(itemNo:9030^0.9), product of:
 0.9 = boost
 9.11329 = idf(docFreq=2565, maxDocs=8566704)
 0.0027743944 = queryNorm
   9.11329 = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of:
 1.0 = tf(termFreq(itemNo:9030)=1)
 9.11329 = idf(docFreq=2565, maxDocs=8566704)
 1.0 = fieldNorm(field=itemNo, doc=1796597)



  <requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8
        brand^.5</str>
      <str name="q.alt">*:*</str>
      <str name="sort">score desc, rankNo desc, partCnt desc</str>
      <str name="facet">true</str>
      <str name="facet.field">itemDescFacet</str>
      <str name="facet.field">brandFacet</str>
      <str name="facet.field">divProductTypeIdFacet</str>
    </lst>
  </requestHandler>

thank you for any help




--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Bitmap field in solr

2012-08-23 Thread Andy Lester

On Aug 23, 2012, at 2:54 PM, Rohit Harchandani wrote:

> Hi all,
> Is there any way to have a bitmap field in Solr?
> I have a use case where I need to search specific attributes of a document.
> Rather than having an is_A, is_B, is_C (all related to each other), etc.,
> how would I store all this data in a single field and still be able to
> query it? Can it be done in any way apart from storing them as strings in
> a text field?


You can have a field that is multiValued.  It still needs a base type, like 
"string" or "int".  For instance, in my book database, I have a field called 
"classifications" and it is multivalued.  



A classification of 1 means "spiralbound", and 2 means "large print" and 3 
means "multilingual" and so on.  So if my user wants to search for a 
multilingual book, I search for "classifications:3".  If you want spiralbound 
large print, you'd search for "classifications:1 classifications:2".
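A toy sketch of this multiValued approach, assuming AND semantics for the combined query (the code numbers follow the classifications example; the book ids are hypothetical):

```python
# 1 = spiralbound, 2 = large print, 3 = multilingual
books = {
    "book_a": {1, 2},       # spiralbound, large print
    "book_b": {3},          # multilingual
    "book_c": {1, 2, 3},
}

def search(required):
    # classifications:1 AND classifications:2 -> every code must be present
    return sorted(b for b, codes in books.items() if set(required) <= codes)

search([1, 2])   # ['book_a', 'book_c']
search([3])      # ['book_b', 'book_c']
```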

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: need help understanding an issue with scoring

2012-08-23 Thread geeky2
hello,


this is the query i am using:

$ cat goquery.sh
#!/bin/bash

# usage: goquery.sh <server> <port>
SERVER=$1
PORT=$2

QUERY="http://$SERVER.blah.blah.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030&rows=2000&debugQuery=on&fl=*,score"

# quote the expansion so the URL reaches curl as a single argument
curl -v "$QUERY"




--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4002969.html
Sent from the Solr - User mailing list archive at Nabble.com.


recommended SSD

2012-08-23 Thread Peyman Faratin
Hi

Is there an SSD brand and spec that the community recommends for an index of
size 56G with mostly reads? We are evaluating this one

http://www.newegg.com/Product/Product.aspx?Item=N82E16820227706

thank you

Peyman




Re: The way to customize ranking?

2012-08-23 Thread Savvas Andreas Moysidis
Could you not apply this logic in your Solr client prior to displaying
the results?
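A minimal sketch of that client-side arrangement, partitioning paid ads ahead of the organic hits before display (the is_ad and ad_rank fields are hypothetical):

```python
def arrange(docs):
    # paid ads first, ordered by their own rank;
    # organic results keep their relevance order
    ads = sorted((d for d in docs if d.get("is_ad")),
                 key=lambda d: d["ad_rank"], reverse=True)
    organic = [d for d in docs if not d.get("is_ad")]
    return ads + organic

docs = [
    {"id": "ford automobile", "score": 10},
    {"id": "honda civic", "score": 9},
    {"id": "ad-2", "is_ad": True, "ad_rank": 1},
    {"id": "ad-1", "is_ad": True, "ad_rank": 5},
]
[d["id"] for d in arrange(docs)]
# ['ad-1', 'ad-2', 'ford automobile', 'honda civic']
```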

On 23 August 2012 20:56, François Schiettecatte
 wrote:
> I would create two indices, one with your content and one with your ads. This 
> approach would allow you to precisely control how many ads you pull back and 
> how you merge them into the results, and you would be able to control 
> schemas, boosting, default fields, etc. for each index independently.
>
> Best regards
>
> François
>
> On Aug 23, 2012, at 11:45 AM, Nicholas Ding  wrote:
>
>> Thank you, but I don't want to filter those ads.
>>
>> For example, when a user makes a search like q=Car
>> Result list:
>> 1. Ford Automobile (score 10)
>> 2. Honda Civic (score 9)
>> ...
>> ...
>> ...
>> 99. Paid Ads (score 1; the Ad has its own field to identify it as an Ad)
>>
>> What I want to find is a way to make the score of "Paid Ads" higher than
>> "Ford Automobile". Basically, the result structure will look like
>>
>> - [Paid Ads Section]
>>[Most valuable Ads 1]
>>[Most valuable Ads 2]
>>[Less valuable Ads 1]
>>[Less valuable Ads 2]
>> - [Relevant Results Section]
>>
>>
>> On Thu, Aug 23, 2012 at 11:33 AM, Karthick Duraisamy Soundararaj <
>> karthick.soundara...@gmail.com> wrote:
>>
>>> Hi
>>> You might add an int  field "Search Rule" that identifies the type of
>>> search.
>>> example
>>>Search Rule  Description
>>> 0  Unpaid Search
>>> 1  Paid Search - Rule
>>> 1
>>> 2  Paid Search - Rule 2
>>>
>>> You can use filterqueries (
>>> http://wiki.apache.org/solr/CommonQueryParameters)
>>> like fq:  Search Rule :[1 TO *]
>>>
>>> Alternatively, You can even use a boolean field to identify whether or not
>>> a search is paid and then an additional field that identifies the type of
>>> paid search.
>>>
>>> --
>>> karthick
>>>
>>> On Thu, Aug 23, 2012 at 11:16 AM, Nicholas Ding >>> wrote:
>>>
 Hi

 I'm working on Solr to build a local business search in China. We have a
 special requirement from advertisers. When a user makes a search, if the
 results contain paid advertisements, those ads need to be moved to the
>>> top
 of results. For different ads, they have detailed rules about which comes
 first.

 Could anyone offer me some suggestions on how to customize the ranking based
>>> on
 my requirement?

 Thanks
 Nicholas

>>>
>


Re: recommended SSD

2012-08-23 Thread François Schiettecatte
You should check this at pcper.com:

http://pcper.com/ssd-decoder

http://pcper.com/content/SSD-Decoder-popup

Specs for a wide range of SSDs.

Best regards

François


On Aug 23, 2012, at 5:35 PM, Peyman Faratin  wrote:

> Hi
> 
> Is there an SSD brand and spec that the community recommends for an index of
> size 56G with mostly reads? We are evaluating this one
> 
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820227706
> 
> thank you
> 
> Peyman
> 
> 



Re: Solr 4.0 Beta missing example/conf files?

2012-08-23 Thread Erik Hatcher
Tom - 

I corrected, on both trunk and 4_x, a reference to solr/conf (to 
solr/collection1/conf) in tutorial.html.  I didn't see anything in 
example/README that needed fixing.  Was there something that is awry there that 
needs correcting that I missed?   If so, feel free to file a JIRA marked for 
4.0 so we can be sure to fix it before final release.

Thanks,
Erik

On Aug 22, 2012, at 16:32 , Tom Burton-West wrote:

> Thanks Markus!
> 
> Should the README.txt file in solr/example be updated to reflect this?
> Is that something I need to enter a JIRA issue for?
> 
> Tom
> 
> On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma
> wrote:
> 
>> Hi - The example has been moved to collection1/
>> 
>> 
>> 
>> -Original message-
>>> From:Tom Burton-West 
>>> Sent: Wed 22-Aug-2012 20:59
>>> To: solr-user@lucene.apache.org
>>> Subject: Solr 4.0 Beta missing example/conf files?
>>> 
>>> Hello,
>>> 
>>> Usually in the example/solr file in Solr distributions there is a
>> populated
>>> conf file.  However in the distribution I downloaded of solr 4.0.0-BETA,
>>> there is no /conf directory.   Has this been moved somewhere?
>>> 
>>> Tom
>>> 
>>> ls -l apache-solr-4.0.0-BETA/example/solr
>>> total 107
>>> drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin
>>> drwxr-sr-x 3 tburtonw dlps   22 Jun 28 09:21 collection1
>>> -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt
>>> -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml
>>> -rw-r--r-- 1 tburtonw dlps  501 May 29 13:02 zoo.cfg
>>> 
>> 



Re: The way to customize ranking?

2012-08-23 Thread Nicholas Ding
Yes, I think two separate calls to Solr could solve my problem. But I
really want to reduce the number of HTTP requests to Solr; if I could write
a Solr extension and put my ranking logic inside it, that would be perfect.

On Thu, Aug 23, 2012 at 5:53 PM, Savvas Andreas Moysidis <
savvas.andreas.moysi...@gmail.com> wrote:

> Could you not apply this logic in your solr client prior to displaying
> the results?
>
> On 23 August 2012 20:56, François Schiettecatte
>  wrote:
> > I would create two indices, one with your content and one with your ads.
> This approach would allow you to precisely control how many ads you pull
> back and how you merge them into the results, and you would be able to
> control schemas, boosting, defaults fields, etc for each index
> independently.
> >
> > Best regards
> >
> > François
> >
> > On Aug 23, 2012, at 11:45 AM, Nicholas Ding 
> wrote:
> >
> >> Thank you, but I don't want to filter those ads.
> >>
> >> For example, when a user makes a search like q=Car
> >> Result list:
> >> 1. Ford Automobile (score 10)
> >> 2. Honda Civic (score 9)
> >> ...
> >> ...
> >> ...
> >> 99. Paid Ads (score 1; the Ad has its own field to identify it as an Ad)
> >>
> >> What I want to find is a way to make the score of "Paid Ads" higher than
> >> "Ford Automobile". Basically, the result structure will look like
> >>
> >> - [Paid Ads Section]
> >>[Most valuable Ads 1]
> >>[Most valuable Ads 2]
> >>[Less valuable Ads 1]
> >>[Less valuable Ads 2]
> >> - [Relevant Results Section]
> >>
> >>
> >> On Thu, Aug 23, 2012 at 11:33 AM, Karthick Duraisamy Soundararaj <
> >> karthick.soundara...@gmail.com> wrote:
> >>
> >>> Hi
> >>> You might add an int  field "Search Rule" that identifies the type
> of
> >>> search.
> >>> example
> >>>Search Rule  Description
> >>> 0  Unpaid Search
> >>> 1  Paid Search -
> Rule
> >>> 1
> >>> 2  Paid Search -
> Rule 2
> >>>
> >>> You can use filterqueries (
> >>> http://wiki.apache.org/solr/CommonQueryParameters)
> >>> like fq:  Search Rule :[1 TO *]
> >>>
> >>> Alternatively, You can even use a boolean field to identify whether or
> not
> >>> a search is paid and then an additional field that identifies the
> type of
> >>> paid search.
> >>>
> >>> --
> >>> karthick
> >>>
> >>> On Thu, Aug 23, 2012 at 11:16 AM, Nicholas Ding   wrote:
> >>>
>  Hi
> 
>  I'm working on Solr to build a local business search in China. We
> have a
>  special requirement from advertisers. When a user makes a search, if the
>  results contain paid advertisements, those ads need to be moved to the
> >>> top
>  of results. For different ads, they have detailed rules about which
> comes
>  first.
> 
>  Could anyone offer me some suggestions on how to customize the ranking
> based
> >>> on
>  my requirement?
> 
>  Thanks
>  Nicholas
> 
> >>>
> >
>


Re: Querying top n of each facet value

2012-08-23 Thread Kiran Jayakumar
Thank you Tanguy. This seems to work:

group = true
group.field = Category
group.limit = 5

http://wiki.apache.org/solr/FieldCollapsing

group.limit

[number]

The number of results (documents) to return for each group. Defaults to 1.
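As a sketch, the full request is just those grouping parameters added to a normal select query (the host, handler path, and q value are assumptions):

```python
from urllib.parse import urlencode

params = {
    "q": "camera",           # the user's autocomplete input (hypothetical)
    "group": "true",
    "group.field": "Category",
    "group.limit": 5,        # up to 5 documents returned per Category group
    "wt": "json",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
# .../select?q=camera&group=true&group.field=Category&group.limit=5&wt=json
```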

On Thu, Aug 23, 2012 at 1:33 AM, Tanguy Moal  wrote:

> Hello Kiran,
>
> I think you can try turning grouping on and asking Solr to group on the
> "Category" field.
>
> Nevertheless, this will *not* ensure that groups are returned in facet
> count order, nor will it ensure a mincount per group.
>
> Hope this helps,
>
> --
> Tanguy
>
> 2012/8/23 Kiran Jayakumar 
>
> > Hi everyone,
> >
> > I am building an auto complete feature, which facets by a field called
> > "Category". I want to return a minimum number of documents per facet (say
> > min=1 & max=5).
> >
> > The facet output is something like
> >
> > Category
> > A: 500
> > B: 10
> > C: 2
> >
> > By default, it is returning 10 documents of category A.
> >
> > I want it to return a total of 10 documents, with at least 1 document for
> > each facet value. Is it possible to accomplish that with a single query ?
> >
> > Thanks
> >
>


Re: Cloud assigning incorrect port to shards

2012-08-23 Thread Mark Miller
Can you post your solr.xml file?

On Thursday, August 23, 2012, Buttler, David wrote:

> I am using the jetty container from the example.  The only thing I have
> done is change the schema to match up my documents rather than the example
>
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com ]
> Sent: Wednesday, August 22, 2012 5:50 PM
> To: solr-user@lucene.apache.org 
> Subject: Re: Cloud assigning incorrect port to shards
>
> What container are you using?
>
> Sent from my iPhone
>
> On Aug 22, 2012, at 3:14 PM, "Buttler, David" 
> >
> wrote:
>
> > Hi,
> > I have set up a Solr 4 beta cloud cluster.  I have uploaded a config
> directory, and linked it with a configuration name.
> >
> > I have started two Solr instances on two computers and added a couple of shards
> using the Core Admin function on the admin page.
> >
> > When I go to the admin cloud view, the shards all have the computer name
> and port attached to them.  BUT, the port is the default port (8983), and
> not the port that I assigned on the command line.  I can still connect to
> the correct port, and not the reported port.  I anticipate that this will
> lead to errors when I get to doing distributed query, as zookeeper seems to
> be collecting incorrect information.
> >
> > Any thoughts as to why the incorrect port is being stored in zookeeper?
> >
> > Thanks,
> > Dave
>


-- 
- Mark

http://www.lucidimagination.com


Solr Index problem

2012-08-23 Thread ranmatrix S
Hi,

I have set up Solr to index data from an Oracle DB through the DIH handler.
Through the Solr admin I can see the DB connection is successful and data
is retrieved from the DB, but it is not added to the index. The message is
"0 documents added" even though I can see that 9 records are returned.

The schema and fields in db-data-config.xml are one and the same.

Please suggest if anything I should look for.

-- 
Regards,
Ran...


Re: Solr Index problem

2012-08-23 Thread Andy Lester

On Aug 23, 2012, at 4:46 PM, ranmatrix S  wrote:

> The schema and fields in db-data-config.xml are one and the same.

Please attach or post both the schema and the DIH config XML files so we can 
see them.  The DIH can be pretty tricky.

You say you can see 9 records are returned back.  How do you see that?

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



RE: Solr Index problem

2012-08-23 Thread Swati Swoboda
Are you committing? You have to commit for the documents to actually be added.
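With the DataImportHandler you can also force the commit on the import request itself; a hedged sketch (the host and handler path are assumptions):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/dataimport"   # assumed DIH endpoint
params = {"command": "full-import", "commit": "true"}
url = base + "?" + urlencode(params)
# .../dataimport?command=full-import&commit=true
```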

-Original Message-
From: ranmatrix S [mailto:ranmat...@gmail.com] 
Sent: Thursday, August 23, 2012 5:46 PM
To: solr-user@lucene.apache.org
Subject: Solr Index problem

Hi,

I have set up Solr to index data from an Oracle DB through the DIH handler.
Through the Solr admin I can see the DB connection is successful and data is
retrieved from the DB, but it is not added to the index. The message is "0
documents added" even though I can see that 9 records are returned.

The schema and fields in db-data-config.xml are one and the same.

Please suggest if anything I should look for.

--
Regards,
Ran...


Re: Solr 4.0 Beta missing example/conf files?

2012-08-23 Thread Tom Burton-West
Thanks Erik!

What confused me in the README is that it wasn't clear what
files/directories need to be in Solr home and what files/directories need
to be in SolrHome/corename.  For example, the /conf and /data directories
are now under the core subdirectory.  What about /lib and /bin?  Will a
core use a conf file in SolrHome/conf if there is no
SolrHome/collection1/conf directory?

Also, when upgrading from a previous Solr setup that doesn't use a core, I
was definitely confused about whether or not it is mandatory to have a core
with Solr 4.0.  And when I tried not using a solr.xml file, it was very
weird to still get a message about a missing collection1 core directory.

See this JIRA issue:https://issues.apache.org/jira/browse/SOLR-3753

Tom


On Thu, Aug 23, 2012 at 7:56 PM, Erik Hatcher wrote:

> Tom -
>
> I corrected, on both trunk and 4_x, a reference to solr/conf (to
> solr/collection1/conf) in tutorial.html.  I didn't see anything in
> example/README that needed fixing.  Was there something that is awry there
> that needs correcting that I missed?   If so, feel free to file a JIRA
> marked for 4.0 so we can be sure to fix it before final release.
>
> Thanks,
> Erik
>
> On Aug 22, 2012, at 16:32 , Tom Burton-West wrote:
>
> > Thanks Markus!
> >
> > Should the README.txt file in solr/example be updated to reflect this?
> > Is that something I need to enter a JIRA issue for?
> >
> > Tom
> >
> > On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma
> > wrote:
> >
> >> Hi - The example has been moved to collection1/
> >>
> >>
> >>
> >> -Original message-
> >>> From:Tom Burton-West 
> >>> Sent: Wed 22-Aug-2012 20:59
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Solr 4.0 Beta missing example/conf files?
> >>>
> >>> Hello,
> >>>
> >>> Usually in the example/solr file in Solr distributions there is a
> >> populated
> >>> conf file.  However in the distribution I downloaded of solr
> 4.0.0-BETA,
> >>> there is no /conf directory.   Has this been moved somewhere?
> >>>
> >>> Tom
> >>>
> >>> ls -l apache-solr-4.0.0-BETA/example/solr
> >>> total 107
> >>> drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin
> >>> drwxr-sr-x 3 tburtonw dlps   22 Jun 28 09:21 collection1
> >>> -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt
> >>> -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml
> >>> -rw-r--r-- 1 tburtonw dlps  501 May 29 13:02 zoo.cfg
> >>>
> >>
>
>


Re: Solr Custom Filter Factory - How to pass parameters?

2012-08-23 Thread KnightRider
Can someone please point me to some samples of how to implement custom
SolrEventListeners?

What's the default behavior of Solr when no SolrEventListeners are
configured in solrconfig.xml?

I am trying to understand how a custom listener fits in with the default
listeners (if there are any).

Thanks
-K'Rider



-
Thanks
-K'Rider
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Custom-Filter-Factory-How-to-pass-parameters-handle-PostProcessing-tp4002217p4003014.html
Sent from the Solr - User mailing list archive at Nabble.com.