Remove "-" from search query

2010-11-18 Thread ionysis

Hi

I have the following search query "LTJ Bukem - Horizons"

When this is processed by Solr it subtracts Horizons from the search, e.g.

+text:LTJ +text:Bukem -text:Horizons

I do not want to subtract Horizons from the search, but to ignore/remove the
minus sign, to end up with...

+text:LTJ +text:Bukem +text:Horizons

I tried adding the minus to the stopwords.txt, but no success.

Any thoughts would be greatly appreciated!

thanks James




Re: Remove "-" from search query

2010-11-18 Thread Ahmet Arslan
> I have the following search query "LTJ Bukem - Horizons"
> 
> When this is processed by solr it subtracts Horizons from
> the search e.g.
> 
> +text:LTJ
> +text:Bukem -text:Horizons
> 
> I do not want to subtract Horizons from the search, but to
> ignore/remove the
> minus sign, to end up with...
> 
> +text:LTJ
> +text:Bukem +text:Horizons
> 
> I tried adding the minus to the stopwords.txt, but no
> success.

Can't you process your query on the client side? Either remove the minus or
escape it with a backslash: LTJ Bukem \- Horizons

On the server side there are a few things you can do, but it is easier to do
it on the client side. You can use the raw query parser and set
autoGeneratePhraseQueries to false in your field type, but that setting
requires Solr trunk.
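A minimal client-side sketch in Java (plain string handling, nothing
Solr-specific; the variable names are just illustrative):

    String raw = "LTJ Bukem - Horizons";
    // Option 1: drop the stand-alone minus so nothing gets negated
    String cleaned = raw.replaceAll("\\s+-\\s+", " ");   // "LTJ Bukem Horizons"
    // Option 2: escape it so the query parser treats it as a literal
    String escaped = raw.replace(" - ", " \\- ");        // "LTJ Bukem \- Horizons"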


  


Re: DateFormatTransformer issue with value 0000-00-00T00:00:00Z

2010-11-18 Thread gwk
While the year zero exists, month zero and day zero don't. And while
APIs ofttimes accept those values (i.e. day zero is the last day of the
previous month), the ISO 8601 spec does not accept them as far as I know.


On 11/18/2010 4:26 AM, Dennis Gearon wrote:

I thought that that value was a perfectly valid one for ISO 8601 time?

http://en.wikipedia.org/wiki/Year_zero


  Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: gwk
To: solr-user@lucene.apache.org
Sent: Wed, November 17, 2010 2:12:16 AM
Subject: Re: DateFormatTransformer issue with value 0000-00-00T00:00:00Z

On 11/16/2010 1:41 PM, Shanmugavel SRD wrote:

Hi,
 I am having a field as below in my feed:
0000-00-00T00:00:00Z

 I have configured the field as below in data-config.xml:

[field configuration stripped by the mailing-list archive]
 But after indexing, the field value becomes like this
0002-11-30T00:00:00Z

 I want to have the value as '0000-00-00T00:00:00Z' after indexing also.
Could anyone help on this?

PS: I am using solr 1.4.1

As 0000-00-00T00:00:00Z isn't a valid date, I don't think Solr's date
field will accept it. Assuming this is MySQL, you can use the
zeroDateTimeBehavior connection string option, i.e.
mysql://user:passw...@mysqlhost/database?zeroDateTimeBehavior=convertToNull
This will make the mysql driver return those values as NULL instead of
all-zero dates.
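A sketch of where that option goes, assuming the usual DataImportHandler
JdbcDataSource setup (host, database and credentials are placeholders):

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://mysqlhost/database?zeroDateTimeBehavior=convertToNull"
                user="user" password="password"/>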

Regards,

gwk





JMX Cache values are wrong

2010-11-18 Thread dan sutton
Hi,

I've used three different JMX clients to query

solr/:id=org.apache.solr.search.FastLRUCache,type=queryResultCache
and
solr/:id=org.apache.solr.search.FastLRUCache,type=documentCache

beans and they appear to return old cache information.

As new searchers come online, the newer caches don't appear to be
registered, perhaps?
I can see this when I query JMX for the 'description' attribute and
the regenerator JMX output shows a different
org.apache.solr.search.SolrIndexSearcher to that which appears in the
stats.jsp page.

Any ideas as to what's gone wrong ... anyone else experience this?

>From registry.jsp:

Solr Specification Version: 1.4.0.2010.09.10.17.10.36
Solr Implementation Version: 1.4.1-dev exported
Lucene Specification Version: 2.9.1
Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

Cheers,
Dan


Re: Must require quote with single word token query?

2010-11-18 Thread Ahmet Arslan
This is happening because the query parser pre-tokenizes your query on
whitespace. It is tokenized before it reaches your query analyzer.
And you are using KeywordTokenizer in your field definition.

Is there a special reason for you to use KeywordTokenizer ?


--- On Thu, 11/18/10, Chamnap Chhorn  wrote:

> From: Chamnap Chhorn 
> Subject: Re: Must require quote with single word token query?
> To: solr-user@lucene.apache.org
> Date: Thursday, November 18, 2010, 5:19 AM
> Thanks for your reply. Here is some
> other details:
> 
> 1. Keyphrase field definition:
> <field name="keyphrase" type="text_keyword" indexed="true" stored="false"
> multiValued="true"/>
> 
> 2. I'm using solr 1.4.
> 
> 3. My dismax definition is the original configuration after
> install solr:
> <requestHandler name="dismax" class="solr.SearchHandler" >
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="qf">
>       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>     </str>
>     <str name="pf">
>       text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
>     </str>
>     <str name="bf">
>       popularity^0.5 recip(price,1,1000,1000)^0.3
>     </str>
>     <str name="fl">
>       id,name,price,score
>     </str>
>     <str name="mm">
>       2&lt;-1 5&lt;-2 6&lt;90%
>     </str>
>     <int name="ps">100</int>
>     <str name="q.alt">*:*</str>
>     <str name="hl.fl">text features name</str>
>     <str name="f.name.hl.fragsize">0</str>
>     <str name="f.name.hl.alternateField">name</str>
>     <str name="f.text.hl.fragmenter">regex</str>
>   </lst>
> </requestHandler>
> 
> 4. Here is the result returned for the original query "smart mobile":
> 
> (response header: status 0, QTime 1; request params - names stripped by
> the archive - values: on, uuid,name,fap, true, smart mobile, keyphrase,
> dismax)
>
> rawquerystring: smart mobile
> querystring: smart mobile
> parsedquery: +((DisjunctionMaxQuery((keyphrase:smart))
> DisjunctionMaxQuery((keyphrase:mobile)))~2) ()
> parsedquery_toString: +(((keyphrase:smart) (keyphrase:mobile))~2) ()
> QParser: DisMaxQParser
> (per-component timing details omitted; all 0.0-1.0 ms)
> 
> 
> 5. Here is the parsed query with "smart mobile" (with quotes), which
> returns the result:
>
> rawquerystring: "smart mobile"
> querystring: "smart mobile"
> parsedquery: +DisjunctionMaxQuery((keyphrase:smart mobile)) ()
> parsedquery_toString: +(keyphrase:smart mobile) ()
>
> explain for doc D297A64B-D4BA-4445-B63E-726E5A4F758D:
> 4.503682 = (MATCH) sum of:
>   4.503682 = (MATCH) fieldWeight(keyphrase:smart mobile in 13092), product of:
>     1.0 = tf(termFreq(keyphrase:smart mobile)=1)
>     10.29413 = idf(docFreq=1, maxDocs=21748)
>     0.4375 = fieldNorm(field=keyphrase, doc=13092)
> 
> 6. Here I tried to use the automatic phrase query (pf parameter); it
> doesn't return any results:
> http://localhost:8081/solr/select?q=smart%20mobile&qf=keyphrase&pf=keyphrase&debugQuery=on&defType=dismax
>
> rawquerystring: smart mobile
> querystring: smart mobile
> parsedquery: +((DisjunctionMaxQuery((keyphrase:smart))
> DisjunctionMaxQuery((keyphrase:mobile)))~2)
> DisjunctionMaxQuery((keyphrase:smart mobile))
> parsedquery_toString: +(((keyphrase:smart) (keyphrase:mobile))~2)
> (keyphrase:smart mobile)
>
> QParser: DisMaxQParser
> 
> Thanks
> Chamnap
> 
> On Wed, Nov 17, 2010 at 8:10 PM, Erick Erickson 
> wrote:
> 
> > Try qt=dismax or deftype=dismax, I was also getting 0
> results with
> > defType on 1.4.1. I'll see what's up with that...
> >
> > But if that doesn't work...
> >
> > May we see your dismax definition too? You shouldn't
> need the
> > quotes, so something's wrong somewhere
> >
> > What version of Solr are you using?
> >
> > Also, please post the results of running your original
> query
> > with &debugQuery=on
> >
> > Best
> > Erick
> >
> > On Tue, Nov 16, 2010 at 10:28 PM, Chamnap Chh

Multivalued field search...

2010-11-18 Thread Dario Rigolin
I think this question is more related to Lucene query search, but I'm posting
here because I feel more like a "Solr User" :-)

I have a multivalued field named field1 containing codes separated by a space


doc1
A BB1 B BB2 C BB3
A CC1 B CC2 C CC3


doc2
A BB1 B FF2 C FF3
A YY1 B BB2 C KK3


I would like that my query: 

q=field1:("A BB1" AND "A BB2")

returns only doc1. At the moment it is returning both doc1 and doc2.

Is there any way to "force" the query to match within a single field value,
instead of treating the "multivalued" content as one unique string?
Looking at proximity search, I saw that it works only on the distance
between two terms, not between two phrases.

Any suggestion or ideas?

Thank you.

Dario


Re: Multivalued field search...

2010-11-18 Thread Dario Rigolin
On Thursday, November 18, 2010 12:36:40 pm Dario Rigolin wrote:

Sorry wrong query:
 
q=field1:("A BB1" AND "B BB2")

Dario


Upgrading from SOLR 1.3 to 3

2010-11-18 Thread Moritz Krinke
Hello,

i have a running solr 1.3 installation and would like to migrate it to
solr 3 in order to get speed improvements by using the multiple threads
for indexing.

When starting SOLR 3, i get the following error message:
SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'textfc'
specified on field descr

I'm using the exact same schema.xml as with solr 1.3.
In the schema.xml, the fieldType "textfc" is specified as follows:

[fieldType definition stripped by the mailing-list archive]
Any ideas why this does not work?

Thanks a lot,
Moritz




Re: ranged and boolean query

2010-11-18 Thread Peter Blokland
hi,

On Wed, Nov 17, 2010 at 05:00:04PM +0100, Peter Blokland wrote:
 
>>> pubdate:([* TO NOW] OR (NOT *))

i've gone back to the examples provided with solr 1.4.1. the
standard example has 19 documents, one of which has a date-field
called 'incubationdate_dt'. so the query 

incubationdate_dt:[* TO NOW]

is expected to return 1 document, which it does. the query

-incubationdate_dt:* 

is expected to return 18 documents, which it does. however,

incubationdate_dt:[* TO NOW] (-incubationdate_dt:*)

which should (imho) return all 19 documents just returns the
one document that has such a field.

can anyone confirm whether or not this is expected behavior, and
if so, why ?

-- 
CUL8R, Peter.

www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™


Re: case insensitive sort and LowerCaseFilterFactory

2010-11-18 Thread Erick Erickson
On a quick glance: You're trying to sort on a tokenized field,
which is not good. Not good at all. You'll go 'round and 'round
and it'll never give you what you want.

Consider your example "Whithers, Alfred Robert" The
WhitespaceTokenizer breaks this up into three tokens
(I'm ignoring everything but whitespace and lowercase)
"whithers", "alfred" and "robert" and puts these tokens in
the index. What does sorting on this field mean? Should
this be put in Ws? As? Rs?

The usual process is to use copyField to put the input into
an un-tokenized field and sort on that field. The stock schema.xml
comes with an "alphaOnlySort" type that should be a good place to start.
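A minimal sketch of that setup, assuming the stock alphaOnlySort type and
using the primaryName field from your mail (the _sort field name is just
illustrative):

    <field name="primaryName_sort" type="alphaOnlySort" indexed="true" stored="false"/>
    <copyField source="primaryName" dest="primaryName_sort"/>

Then sort on primaryName_sort instead of primaryName.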

Best
Erick

On Wed, Nov 17, 2010 at 11:15 PM, Scott Yeadon wrote:

> Sorry, looks like it was a data-related issue, apologies for the noise
> (although if anyone spots anything dodgy in the config feel free to let me
> know).
>
> Scott.
>
>
> On 18/11/10 2:21 PM, Scott Yeadon wrote:
>
>> Hi,
>>
>> I'm running solr-tomcat 1.4.0 on Ubuntu and have an issue with the sorting
>> of results. According to this page
>> http://web.archiveorange.com/archive/v/AAfXfzy5Tm1uDy5mYW3B I should be
>> able to configure the LowerCaseFilterFactory to ensure results will be
>> indexed and returned in a case-insensitive manner; however, this does not
>> appear to be working for me. Is someone able to check my field config to
>> confirm it is ok (and if anyone has any advice on making this work it would
>> be appreciated - my issue is the same as that in the provided link (that is,
>> upper case and lower case are being ordered separately instead of being
>> interspersed). The sort field I'm using is of type text as defined below.
>>
>> The text field type is configured as follows:
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>             />
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>             catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="..." protected="protwords.txt"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>             />
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>>             catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="..." protected="protwords.txt"/>
>>   </analyzer>
>> </fieldType>
>>
>> When I sort on a primaryName field (which is a "text" field as defined
>> above) for example, I get records listed out of order as in the following
>> example:
>> - Withers, Alfred Robert (1863–1956)
>> - Young, Charles (1838–1916)
>> - de Little, Ernest (1868–1926)
>> - de Pledge, Thomas Frederick (1867–1954)
>> - von Bibra, William (1876–1926)
>>
>> I imagine I'm missing something obvious, the obvious workaround is a
>> namesort field however from the above post it looks like this can be
>> avoided.
>>
>> Scott.
>>
>>
>>
>


Re: Dismax is failing with json response writer

2010-11-18 Thread Erick Erickson
What version of Solr are you using? Could we see the actual query
you're sending?

And the dismax definition, and perhaps the relevant parts of schema.xml.

There's not much information to go on here to help debug this.

Best
Erick

On Thu, Nov 18, 2010 at 1:21 AM, sivaprasad wrote:

>
> Hi,
>
> I am using the dismax query parser. When I want the response in JSON, I am
> giving wt=json. Here it is throwing the below exception.
>
> HTTP Status 500 - null java.lang.NullPointerException at
> org.apache.solr.search.DocSlice$1.score(DocSlice.java:121) at
>
> org.apache.solr.request.JSONWriter.writeDocList(JSONResponseWriter.java:502)
> at
>
> org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:141)
> at
>
> org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:182)
> at
>
> org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:297)
> at
>
> org.apache.solr.request.JSONWriter.writeResponse(JSONResponseWriter.java:92)
> at
>
> org.apache.solr.request.JSONResponseWriter.write(JSONResponseWriter.java:51)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
> at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
>
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
>
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
>
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
>
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
>
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
> at
>
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> at java.lang.Thread.run(Thread.java:619)
>
>
> But the same is working with the standard request handler.
>
> Does anyone have an idea how to fix this?
>
> Regards,
> Siva
>


Re: Does edismax support wildcard queries?

2010-11-18 Thread Erick Erickson
Well, the claim is that eDismax supports "full Lucene syntax", so I
assume so.

Here's the JIRA: https://issues.apache.org/jira/browse/SOLR-1553
which indicates that you
have two choices: Trunk or the 3.x build, see
https://hudson.apache.org/hudson/

Best
Erick

On Thu, Nov 18, 2010 at 1:38 AM, Swapnonil Mukherjee <
swapnonil.mukher...@gettyimages.com> wrote:

> Hi Everybody,
>
> We have started to use the dismax query handler, but one serious
> limitation is that it does not support wildcard queries. I think I
> have 2 ways to overcome this problem:
>
> 1. Apply some old patches to the dismax parser itself from here
> https://issues.apache.org/jira/browse/SOLR-756
> 2. Or start using the Solr trunk which will allow me to switch to edismax.
> I am especially hopeful of moving to Solr trunk and using edismax, as I
> believe this will help me support fuzzy search in the future as well.
>
> So my question is does edismax support wildcard queries? I could not
> understand by looking at the source code though.
>
> Thanks
> Swapnonil Mukherjee
>
>
>
>


Re: Reading Solr Index directly

2010-11-18 Thread Erick Erickson
See below:

On Thu, Nov 18, 2010 at 2:59 AM, Sasank Mudunuri  wrote:

> Hi,
>
> I've been poking around the JavaDocs a bit, and it looks like it's possible
> to directly read the index using the Solr Java API. Hoping to clarify a
> couple of things --
>
> 1) Do I need to read the index with Solr APIs, or can I use Lucene
> (PyLucene
> is particularly attractive...)? If so, how wary should I be about the
> Lucene
> version number?
>
Shouldn't be any problem to use Lucene (whatever). The only real issue is
that you have to be sure the analysis chain you use in Lucene matches the
one used to index the data, or you'll get surprising results. But that only
really counts if you're searching.

The version should be OK, the underlying Lucene will barf when you open
a reader if the versions are incompatible.


> 2) Is there anything I should worry about in terms of opening a read-only
> reader against an active Solr instance? Or will this just block?
>
> Any number of r/o searchers can be open against an index, it makes
no difference whether Solr does this or your Lucene app. Simultaneous
writer *processes* are another story (threads within a process are OK).

You won't see *changes* that Solr makes to the index unless you
reopen the underlying readers, there's no magic notification you'll
get either, if you care you'll have to check periodically somehow.


> 3) Anything else that jumps out at gotchas?
>
> Nope. But fair warning, this isn't something I've had to do, I'm replying
based on "general principles" so caveat emptor.


> I couldn't find any pages about how to do this. I'm happy to compile any
> responses for inclusion on the Solr wiki.
>
> thanks!
> sasank
>


Re: Meaning of avgTimePerRequest & avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Erick Erickson
avgTimePerRequest is the total time spent servicing X requests divided by X
(in milliseconds, I believe).
If no searches are being processed, this number doesn't change.
It's a measure of how long it takes, on average, to service a single
request.

avgRequestsPerSecond is the total number of requests divided by the total
time since you started Solr. It's really a measure of how fast requests
come in.
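A quick illustration with made-up numbers: if Solr has been up for 500
seconds, has handled 1,000 requests, and has spent a cumulative 150 seconds
inside the handler, then avgRequestsPerSecond = 1000 / 500 = 2.0, while
avgTimePerRequest = 150,000 ms / 1,000 = 150 ms.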

Best
Erick

On Thu, Nov 18, 2010 at 3:57 AM, Shanmugavel SRD
wrote:

>
> Can anyone please explain me about avgTimePerRequest & avgRequestsPerSecond
> in SOLR stats page?
>
> Thanks,
> Shanmugavel SRD
>


Spell-Check Component Functionality

2010-11-18 Thread rajini maski
All,

I am trying to apply the Solr spell check component functionality to our
data.

The configuration I set up by updating solrconfig.xml and schema.xml is as
follows; please let me know if there are any errors in it.

 I am not getting any suggestions in suggestion tags of solr output xml.

I indexed word "Crust" to the field textSpell that is enabled for spell
check and then I searched for
"Curst"

The queries i tried were :
http://localhost:8909/solr/spell?q=Curst&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true

http://localhost:8909/solr/spell?q=Cruste&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true&spellcheck.dictionary=default


The CONFIG.XML :



<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker2</str>
  </lst>
  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


SCHEMA:

[textSpell field type and field definitions stripped by the mailing-list archive]

If there is any error in the above that is preventing spell check from
working, please let me know.

The output I am getting is a null suggestions element:

[response XML stripped by the mailing-list archive]

Regards,
Rajani Maski


RE: Does edismax support wildcard queries?

2010-11-18 Thread Thumuluri, Sai
It does support wildcard queries - we are using that feature from
edismax

-Original Message-
From: Swapnonil Mukherjee [mailto:swapnonil.mukher...@gettyimages.com] 
Sent: Thursday, November 18, 2010 1:39 AM
To: solr-user@lucene.apache.org
Subject: Does edismax support wildcard queries?

Hi Everybody,

We have started to use the dismax query handler, but one serious
limitation is that it does not support wildcard queries. I think
I have 2 ways to overcome this problem:

1. Apply some old patches to the dismax parser itself from here
https://issues.apache.org/jira/browse/SOLR-756
2. Or start using the Solr trunk which will allow me to switch to
edismax. I am especially hopeful of moving to Solr trunk and using
edismax, as I believe this will help me support fuzzy search in the future
as well.

So my question is does edismax support wildcard queries? I could not
understand by looking at the source code though.

Thanks
Swapnonil Mukherjee





Re: Meaning of avgTimePerRequest & avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread mesenthil

We have recently upgraded some of our solr instances to 1.4.1 from 1.3.
Interestingly, both these parameter values increased after our upgrade.
When avgRequestsPerSecond increases, avgTimePerRequest should increase too,
but it is not doing so in our case...

Any thoughts ?


Re: Upgrading from SOLR 1.3 to 3

2010-11-18 Thread Shawn Heisey
I did a quick grep through the directory listing of the Solr 3.1 source, 
the only part of your analysis chain that came up empty was 
HTMLStripWhitespaceTokenizerFactory.  I think you'll have to replace it 
with something like this:

<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
Also, the ISOLatin1AccentFilterFactory  is deprecated, replaced with 
ASCIIFoldingFilterFactory.  It's still around, but could be removed at 
any time.


Shawn


On 11/18/2010 4:56 AM, Moritz Krinke wrote:

Hello,

i have a running solr 1.3 installation and would like to migrate it to
solr 3 in order to get speed improvements by using the multiple threads
for indexing.

When starting SOLR 3, i get the following error message:
SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'textfc'
specified on field descr

I'm using the exact same schema.xml as with solr 1.3.
In the schema.xml, the fieldType "textfc" is specified as follows:

[fieldType definition stripped by the mailing-list archive]


Any ideas why this does not work?

Thanks a lot,
Moritz






Re: Does edismax support wildcard queries?

2010-11-18 Thread Swapnonil Mukherjee
Thanks. I downloaded and built the Solr trunk to test for wildcard queries and,
as you guys reported, edismax does support wildcards and it works beautifully.

My next challenge will be to apply these patches 

1. https://issues.apache.org/jira/browse/SOLR-1553
2. https://issues.apache.org/jira/browse/SOLR-2058

to the apache-solr-1.4.1 as using the trunk is something which we cannot do. 

On 18-Nov-2010, at 7:28 PM, Thumuluri, Sai wrote:

> It does support wildcard queries - we are using that feature from
> edismax
> 
> -Original Message-
> From: Swapnonil Mukherjee [mailto:swapnonil.mukher...@gettyimages.com] 
> Sent: Thursday, November 18, 2010 1:39 AM
> To: solr-user@lucene.apache.org
> Subject: Does edismax support wildcard queries?
> 
> Hi Everybody,
> 
> We have started to use the dismax query handler, but one serious
> limitation is that it does not support wildcard queries. I think
> I have 2 ways to overcome this problem:
> 
> 1. Apply some old patches to the dismax parser itself from here
> https://issues.apache.org/jira/browse/SOLR-756
> 2. Or start using the Solr trunk which will allow me to switch to
> edismax. I am especially hopeful of moving to Solr trunk and using
> edismax, as I believe this will help me support fuzzy search in the future
> as well.
> 
> So my question is does edismax support wildcard queries? I could not
> understand by looking at the source code though.
> 
> Thanks
> Swapnonil Mukherjee
> 
> 
> 



Re: Spell-Check Component Functionality

2010-11-18 Thread Peter Karich

 Hi Rajani,

some notes:
 * try spellcheck.q=curst, or leave spellcheck.q out completely and use q
 * compared to the normal q parameter, spellcheck.q can have a different
analyzer/tokenizer and is used if present
 * do not set spellcheck.build=true for every request (creating the
spellcheck index can be very expensive)
 * once you have spellcheck working, embed the spellcheck component into
your normal query handler (see the sketch below); otherwise you need to
query 2 times ...
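A minimal sketch of that embedding, assuming the stock SearchHandler setup
(the handler name and defaults are illustrative):

    <requestHandler name="/select" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">default</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>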


Regards,
Peter.


All,

I am trying to apply the Solr spell check component functionality to our
data.

The configuration I set up by updating solrconfig.xml and schema.xml is as
follows; please let me know if there are any errors in it.

  I am not getting any suggestions in suggestion tags of solr output xml.

I indexed word "Crust" to the field textSpell that is enabled for spell
check and then I searched for
"Curst"

The queries i tried were :
http://localhost:8909/solr/spell?q=Curst&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true

http://localhost:8909/solr/spell?q=Cruste&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true&spellcheck.dictionary=default


[original configuration snipped - see the message earlier in this thread]

Regards,
Rajani Maski




--
http://jetwick.com twitter search prototype



Re: Upgrading from SOLR 1.3 to 3

2010-11-18 Thread Moritz Krinke
Thanks for the tip. This seems to work ;)
But now I ran into another problem - I'm trying to use the "threads"
parameter on my entities in order to speed up index creation. As soon
as I use the threads parameter (e.g. threads="2") I get the following
errors in my log:

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT name AS section_name, shortname as
section_shortname FROM cat where id='739'
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.JdbcDataSource
$ResultSetIterator.(JdbcDataSource.java:251)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:208)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
at
org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRow(ThreadedEntityProcessorWrapper.java:84)
at org.apache.solr.handler.dataimport.DocBuilder
$EntityRunner.runAThread(DocBuilder.java:438)
at org.apache.solr.handler.dataimport.DocBuilder
$EntityRunner.run(DocBuilder.java:391)
at org.apache.solr.handler.dataimport.DocBuilder
$EntityRunner.runAThread(DocBuilder.java:458)
at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access
$000(DocBuilder.java:345)
at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner
$1.run(DocBuilder.java:398)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.sql.SQLException: Streaming result set
com.mysql.jdbc.rowdatadyna...@15b57613 is still active. No statements
may be issued when any streaming result sets are open and in use on a
given connection. Ensure that you have called .close() on any active
streaming result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:930)
at
com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2694)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1868)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2642)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1544)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:672)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:625)
at org.apache.solr.handler.dataimport.JdbcDataSource
$ResultSetIterator.(JdbcDataSource.java:244)
... 13 more



Is the threads parameter supposed to work with jdbc/mysql?
As far as I understand the error message, Solr's JDBC/MySQL layer tries to
use the same MySQL connection for multiple statements, which does not work.

Have I misunderstood the usage of the threads parameter?
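One workaround worth trying - a sketch only, under the assumption that
giving each entity its own connection avoids the streaming clash; the
entity names, queries and the ${item.cat_id} placeholder are illustrative -
is to declare a named dataSource per entity:

    <dataSource name="ds-outer" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host/db" user="user" password="password"/>
    <dataSource name="ds-inner" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host/db" user="user" password="password"/>
    <document>
      <entity name="item" dataSource="ds-outer" query="SELECT id, cat_id FROM item">
        <entity name="section" dataSource="ds-inner"
                query="SELECT name AS section_name, shortname AS section_shortname
                       FROM cat WHERE id='${item.cat_id}'"/>
      </entity>
    </document>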

Thanks again,
Moritz

Am Donnerstag, den 18.11.2010, 07:11 -0700 schrieb Shawn Heisey:
> I did a quick grep through the directory listing of the Solr 3.1 source, 
> the only part of your analysis chain that came up empty was 
> HTMLStripWhitespaceTokenizerFactory.  I think you'll have to replace it 
> with something like this:
> 
> <charFilter class="solr.HTMLStripCharFilterFactory"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 
> Also, the ISOLatin1AccentFilterFactory  is deprecated, replaced with 
> ASCIIFoldingFilterFactory.  It's still around, but could be removed at 
> any time.
> 
> Shawn
> 
> 
> On 11/18/2010 4:56 AM, Moritz Krinke wrote:
> > Hello,
> >
> > i have a running solr 1.3 installation and would like to migrate it to
> > solr 3 in order to get speed improvements by using the multiple threads
> > for indexing.
> >
> > When starting SOLR 3, i get the following error message:
> > SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'textfc'
> > specified on field descr
> >
> > I'm using the exact same schema.xml as with solr 1.3.
> > In the schema.xml, the fieldType "textfc" is specified as follows:
> >
> > [fieldType XML mostly stripped by the archive; the surviving fragments
> > show a WordDelimiterFilterFactory with generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="0" on the index side (catenateWords="0"
> > catenateNumbers="0" on the query side), a SynonymFilterFactory with
> > synonyms="synonyms.txt" ignoreCase="true" expand="true", and a filter
> > protected by protwords.txt]

Respect token order in matches

2010-11-18 Thread Robert Gründler
Hi,

is there a way to make solr respect the order of token matches when the query 
is a multi-term string?

Here's an example:

Query String: "John C"

Indexed Strings:

- "John Cage"
- "Cargill John"

This will return both indexed strings as a result. However, "Cargill John" 
should not match in that case, because the order 
of the tokens is not the same as in the query.

Here's the fieldtype:

[fieldType definition stripped by the mailing-list archive; a partially
preserved copy is quoted in the reply from Markus Jelsma later in this
digest]
Is there a way to achieve this using this fieldtype?


thanks!






Re: Save the file sent to the ExtractingRequestHandler locally on the server.

2010-11-18 Thread Chad Salamon
I'm using solr as a part of what could be described as a content
management system. I don't have a problem with uploading the files
independently of Solr, but I'm trying to avoid sending excess data.
I'm also trying to avoid any solutions that are system dependent.
Perhaps another option would be to create a servlet that handled the
file upload and then forwarded the data to Solr using SolrJ.
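A sketch of that forwarding with SolrJ 1.4 (the URL, file path and literal
values are illustrative):

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ExtractForwarder {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Forward the saved upload to the ExtractingRequestHandler
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/data/uploads/report.pdf"));
        req.setParam("literal.id", "report.pdf");   // unique document id
        req.setParam("uprefix", "attr_");           // prefix for unmapped fields
        req.setParam("fmap.content", "content");    // map extracted body to 'content'
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
      }
    }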

On Thu, Nov 18, 2010 at 2:06 AM, Kaustuv Royburman
 wrote:
> A possible solution is to use a directory on the server to upload the files.
> Monitor the directory for new uploads and then post the documents to the
> solr using curl.
>
> If you are using a linux based server you can use inotifywatch to monitor
> the folder for new file uploads and then use the following curl command
>
> curl
> "http://<host>:<port>/solr/update/extract?literal.id=<id>&uprefix=attr_&fmap.content=content&literal.type=binaryfile&literal.url=<url>"
> -F "myfile=@<filename>"
>
> curl http://<host>:<port>/solr/update --data-binary '<commit/>' -H
> 'Content-type:text/xml; charset=utf-8'
>
> Replace the following with actual values :
> <host>:<port> : host and port of the Solr server
> <filename> : actual file name with extension
> <id> : unique document id
> <url> : link to the file
>
>
>
> ---
> Regards,
> Kaustuv Royburman
>
> Senior Software Developer
> infoservices.in
> DLF IT Park,
> Rajarhat, 1st Floor, Tower - 3
> Major Arterial Road,
> Kolkata - 700156,
> India
>
> On Thursday 18 November 2010 12:10 PM, Lance Norskog wrote:
>>
>> Upload the files independently of Solr. Solr is not a content management
>> system.
>> One problem is getting the links put together so that the link that comes
>> out with the document can be turned into a link the user can open.
>>
>> Chad Salamon wrote:
>>>
>>> I would like to save files sent to the ExtractingRequestHandler on the
>>> server processing it, and provide a link to the file in the solr
>>> document. I currently am running a solr core as a part of a larger web
>>> app, and I would like to publish the files as a part of that same web
>>> app. This way, both solr and the files can be behind the same security
>>> filters (Spring Security).
>>>
>>> I can think of two ways to do this - one would be to extend
>>> ExtractingRequestHandler to grab the files and then save them where I
>>> want to. The other would be to upload the files independently of Solr
>>> and then send them to the ExtractingRequestHandler through remote
>>> streaming.
>>>
>>> Any other suggestions would be appreciated. Thanks.
>>
>


Re: Meaning of avgTimePerRequest & avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Erick Erickson
No, that's not true. As long as you're not limited by some resource,
avgRequestsPerSecond can grow without impacting avgTimePerRequest
much.

avgTimePerRequest is the elapsed time from the beginning of Solr handling
the request to the end, measured in clock time. Say it takes 100 ms. Of
that 100 ms, let's say (and these are examples, not real data) that 50ms
is sitting around waiting for disk. A second query could be being serviced
in that same interval, so at 1 QPS you'd have an avgTimePerRequest
of 100 ms. Ditto at 2 QPS.

That said, at some point you *will* get a relationship between the two
numbers, when some resource is being used 100%. At that point
as your QPS rises, so will your avgTimePerRequest.

Of course it's not that clean, but that's the idea...

Best
Erick

On Thu, Nov 18, 2010 at 9:02 AM, mesenthil <
senthilkumar.arumu...@mtvncontractor.com> wrote:

>
> We have recently upgraded some of our solr instances to 1.4.1 from 1.3.
> Interestingly both these parameter values got increased after our upgrade.
> When avgRequestsPerSecond increases, avgTimePerRequest should increase too,
> but it is not doing so in our case...
>
> Any thoughts ?
>


Re: Issue with copyField when updating document

2010-11-18 Thread Pramod Goyal
I am using the solr admin to query the document. The returned document is
showing old values.

Lance,
I will not be able to post my configuration but i will create a simple
schema just to highlight the issue.

On Wed, Nov 17, 2010 at 9:56 PM, Erick Erickson wrote:

> How are you looking at the document? You mention using admin,
> are you searching?
>
> Because if you're looking at *terms* rather then the document,
> you should be aware that deleting a document does NOT remove
> the terms from the index, it just marks the doc as deleted.
>
> An optimize will remove the deleted document's terms.
>
> As Lance says, though, if you're displaying the document you
> should not be seeing the original values.
>
> Best
> Erick
>
> On Tue, Nov 16, 2010 at 11:50 PM, Pramod Goyal  >wrote:
>
> > Hi,
> > I am facing a issue with copyFields in SOlr. Here is what i am doing
> >
> > Schema:
> >
> >   <field name="ID" .../>
> >   <field name="product" ... multiValued="true"/>
> >   <field name="product_copy" ... multiValued="true"/>
> >   <copyField source="product" dest="product_copy"/>
> >
> >
> > I insert a document with, say, ID as 100 and product as sampleproduct.
> > When I view the document in the solr admin page I see the correct value
> > for the product_copy field (same as the product field).
> > Next I try to update this document, and for the field product I give 2
> > values: sampleproduct and testproduct. When I view the document on the
> > solr admin now it shows me 3 values in the copy field, i.e. sampleproduct,
> > testproduct and sampleproduct (the initial value in the copy field is
> > retained even on update).
> >
> > Why is copyField retaining the old values when the original field value
> > has been updated? If I update the document multiple times, all the old
> > values are still retained in the copyField.
> >
> > Note that I am using the solrJ api to insert the document. I tried
> > setting null values for the copy field when I am updating the document
> > but it didn't solve the problem.
> >
>


Re: Issue with copyField when updating document

2010-11-18 Thread Pramod Goyal
Hi,
Forgot to mention solr version number:

Solr Implementation Version: 2010-04-30_08-05-41 939580 - hudson -
2010-04-30 08:37:22

On Thu, Nov 18, 2010 at 10:50 PM, Pramod Goyal wrote:

> I am using the solr admin to query the document. The returned document is
> showing old values.
>
> Lance,
> I will not be able to post my configuration but i will create a simple
> schema just to highlight the issue.
>
>
> On Wed, Nov 17, 2010 at 9:56 PM, Erick Erickson 
> wrote:
>
>> How are you looking at the document? You mention using admin,
>> are you searching?
>>
>> Because if you're looking at *terms* rather then the document,
>> you should be aware that deleting a document does NOT remove
>> the terms from the index, it just marks the doc as deleted.
>>
>> An optimize will remove the deleted document's terms.
>>
>> As Lance says, though, if you're displaying the document you
>> should not be seeing the original values.
>>
>> Best
>> Erick
>>
>> On Tue, Nov 16, 2010 at 11:50 PM, Pramod Goyal > >wrote:
>>
>> > Hi,
>> > I am facing a issue with copyFields in SOlr. Here is what i am doing
>> >
>> > Schema:
>> >
>> >   <field name="ID" .../>
>> >   <field name="product" ... multiValued="true"/>
>> >   <field name="product_copy" ... multiValued="true"/>
>> >   <copyField source="product" dest="product_copy"/>
>> >
>> >
>> > I insert a document with, say, ID as 100 and product as sampleproduct.
>> > When I view the document in the solr admin page I see the correct value
>> > for the product_copy field (same as the product field).
>> > Next I try to update this document, and for the field product I give 2
>> > values: sampleproduct and testproduct. When I view the document on the
>> > solr admin now it shows me 3 values in the copy field, i.e. sampleproduct,
>> > testproduct and sampleproduct (the initial value in the copy field is
>> > retained even on update).
>> >
>> > Why is copyField retaining the old values when the original field value
>> > has been updated? If I update the document multiple times, all the old
>> > values are still retained in the copyField.
>> >
>> > Note that I am using the solrJ api to insert the document. I tried
>> > setting null values for the copy field when I am updating the document
>> > but it didn't solve the problem.
>> >
>>
>
>


[solved] Re: Multivalued field search...

2010-11-18 Thread Dario Rigolin
On Thursday, November 18, 2010 12:42:49 pm Dario Rigolin wrote:
> On Thursday, November 18, 2010 12:36:40 pm Dario Rigolin wrote:
> 
> Sorry wrong query:
> 
> q=field1:("A BB1" AND "B BB2")
> 
> Dario

q=field1:("A BB1 B BB2"~10)

I discovered that proximity search works well with multiple terms
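One caveat - an assumption from how Lucene positions multivalued fields,
not something verified against this schema: the slop only stays within a
single value if it is smaller than the field type's positionIncrementGap,
e.g.

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">

With a gap of 100, a ~10 proximity query cannot span two values; with a gap
of 0 it could match across them.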

Ciao.

Dario.


Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich

 Hi Peter!


* I believe the NRT patches are included in the 4.x trunk. I don't
think there's any support as yet in 3x (uses features in Lucene 3.0).


I'll investigate how much effort it is to update to solr4


* For merging, I'm talking about commits/writes. If you merge while
commits are going on, things can get a bit messy (maybe on source
cores this is ok, but I have a feeling it's not).


ok


* For moving data to an 'offline' read-only core, this is the trickiest bit.
We do this today by using a round-robin chain of remote shards and 2
local cores. At the boundary time (e.g. 1 day), the 'active' core is
replicated locally, then this local replica is replicated to the next
shard in the chain. Once everything is complete, the local replica is
discarded, and the 'active' core is cleaned, being careful not to
delete any new data since the replicated commit point.


Maybe I didn't fully understand what you explained: but doesn't this
mean that you'll have one index per day?
Or are you overwriting, via replicating, every shard and the number of 
shard is fixed?
And why are you replicating from the local replica to the next shard? 
(why not directly from active to next shard?)


Regards,
Peter.


Re: Meaning of avgTimePerRequest & avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Shanmugavel SRD

Erick,
   Thanks a lot for explaining about these two fields.
   Could you please let us know which one we have to look for if we have to
monitor the performance? avgTimePerRequest OR avgRequestsPerSecond.

Thanks,
SRD


Re: Dismax - Boosting

2010-11-18 Thread Solr User
Ahmet,

I modified the schema as follows (added more fields for faceting):

[field definitions stripped by the mailing-list archive]
Also added copy fields as below:

[copyField definitions stripped by the mailing-list archive]

With the above changes I am not getting any facet data back.

Why is the facet data not returning, and what mistake did I make in the
schema?

Thanks,
Solr User

On Wed, Nov 17, 2010 at 6:42 PM, Ahmet Arslan  wrote:

>
>
> Wow you facet on many fields :
>
> author,pubyear,format,series,season,imprint,category,award,age,reading,grade,price
>
> The fields you facet on should be of an untokenized type: string, int,
> tint, date, etc.
>
> The fields you want full text search, e.g. the ones you specify in qf, pf
> parameter should be text type.
> (title subtitle authordesc shortdesc imprint category isbn13 isbn10 format
> series season bisacsub award)
>
> If you have common fields, for example category, you need two copies of
> that: one string, one text. That way you can both full-text search and
> facet on it. Use copyField for this.
>
> <copyField source="category" dest="category_string"/>
>
> Example document:
> category: electronic devices
>
>
> query electronic will return it, and facets on category_string will be
> displayed as :
>
> electronic devices (1)
>
> not :
>
> electronic (1)
> devices (1)
>
>
>
> --- On Wed, 11/17/10, Solr User  wrote:
>
> > From: Solr User 
> > Subject: Re: Dismax - Boosting
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, November 17, 2010, 11:31 PM
>  > Ahmet,
> >
> > Thanks for the reply and it was very helpful.
> >
> > The query that I used before changing to dismax was:
> >
> >
> /solr/tradecore/spell/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true
> >
> > The above query use to return all the data related to
> > facets, data and also
> > any suggestions related to spelling mistakes properly.
> >
> > The configuration after modifying using dismax is as
> > below:
> >
> > Schema.xml:
> >
> > [field definitions: names and types stripped by the mailing-list archive]
> >
> > SolrConfig.xml:
> >
> > <requestHandler name="..." class="solr.SearchHandler" default="true">
> >  <lst name="defaults">
> >   <str name="defType">dismax</str>
> >   <str name="echoParams">explicit</str>
> >   <str name="qf">
> >    title^9.0 subtitle^3.0 author^1.0 desc shortdesc imprint category
> >    isbn13 isbn10 format series season bisacsub award
> >   </str>
> >   <str name="...">*</str>
> >   [remaining parameters stripped by the mailing-list archive]
> >  </lst>
> > </requestHandler>
> >
> > The query that I used after changing to dismax is:
> >
> >
> solr/tradecore/select/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true
> >
> >
> > The following are the issues that I am having after
> > modifying to dismax:
> >
> > 1. Facet data is not coming back correctly; a lot of extra data is
> > coming. Why, and how do I fix it?
> > 2. How to use spell checker request handler along with
> > dismax?
> >
> > Thanks,
> > Murali
> >
> > On Mon, Nov 15, 2010 at 5:38 PM, Ahmet Arslan 
> > wrote:
> >
> > > > 1. Do we need to change the above DisMax handler
> > > > configuration as per our
> > > > requirements? Or Leave it as it is? What
> > changes?
> > >
> > > Yes, you need to edit it, at least the field names. Does your schema
> > > have a field named sku?
> > >
> > 

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Sturge
> Maybe I didn't fully understood what you explained: but doesn't this mean
> that you'll have one index per day?
> Or are you overwriting, via replicating, every shard and the number of shard
> is fixed?
> And why are you replicating from the local replica to the next shard? (why
> not directly from active to next shard?)

Yes, you can have one index per day (for us, our boundary is typically
1 month, so it is less of an issue).
The 'oldest' replica in the round robin is overwritten, yes. We use
fixed shard numbers, but you don't have to.
Does yours need to be once a day?
We used our own round robin code because it was pre-Solr Cloud...
I'm not too familiar with them, but I believe it's certainly worth
having a look at Solr Cloud or Katta - could be useful here in
dynamically allocating shards.

Peter



On Thu, Nov 18, 2010 at 5:41 PM, Peter Karich  wrote:
>  Hi Peter!
>
>> * I believe the NRT patches are included in the 4.x trunk. I don't
>> think there's any support as yet in 3x (uses features in Lucene 3.0).
>
> I'll investage how much effort it is to update to solr4
>
>> * For merging, I'm talking about commits/writes. If you merge while
>> commits are going on, things can get a bit messy (maybe on source
>> cores this is ok, but I have a feeling it's not).
>
> ok
>
>> * For moving data to a an 'offline' read-only core, this is the trickiest
>> bit.
>> We do this today by using a round-robin chain of remote shards and 2
>> local cores. At the boundary time (e.g. 1 day), the 'active' core is
>> replicated locally, then this local replica is replicated to the next
>> shard in the chain. Once everything is complete, the local replica is
>> discarded, and the 'active' core is cleaned, being careful not to
>> delete any new data since the replicated commit point.
>
> Maybe I didn't fully understood what you explained: but doesn't this mean
> that you'll have one index per day?
> Or are you overwriting, via replicating, every shard and the number of shard
> is fixed?
> And why are you replicating from the local replica to the next shard? (why
> not directly from active to next shard?)
>
> Regards,
> Peter.
>


LockReleaseFailedException

2010-11-18 Thread Robert Gründler
Hi,

I'm suddenly getting a LockReleaseFailedException when starting a full-import
using the DataImportHandler:

org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a 
NativeFSLock which is held by another indexer component


This worked without problems until just now. Is there some lock file I can
remove to unlock the index again?


thanks.

-robert





WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich

 Hi,

I am going crazy, but which config is necessary to include the missing doc2?
I have:
doc1 tw:aBc
doc2 tw:abc

Now a query "aBc" returns only doc1, although when I try doc2 from
admin/analysis.jsp
then the term text 'abc' of the index gets highlighted as intended.
I even indexed a simple example (no stopwords, no protwords, no
synonyms) via* and
tried this with the normal and dismax handler, but I cannot make it
work :-/


What have I misunderstood?

Regards,
Peter.





<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="0" preserveOriginal="1"/>
    <filter class="..."/>
    <filter class="..." protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..."/>
    <filter class="..." synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="..." words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="0" preserveOriginal="1"/>
    <filter class="..."/>
    <filter class="..." protected="protwords.txt"/>
  </analyzer>
</fieldType>

--


*
books.csv:

id,tw
1,aBc
2,abc

curl http://localhost:8983/solr/update/csv?commit=true --data-binary 
@books.csv -H 'Content-type:text/plain; charset=utf-8'




Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Markus Jelsma
Hi,

Please add preserveOriginal="1"  to your WDF [1] definition and reindex (or 
just try with the analysis page).

This will make sure the original input token is being preserved along the 
newly generated tokens. If you then pass it all through a lowercase filter, it 
should match both documents.

[1]: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Cheers,


>   Hi,
> 
> I am going crazy but which config is necessary to include the missing doc
> 2? I have:
> doc1 tw:aBc
> doc2 tw:abc
> 
> Now a query "aBc" returns only doc 1 although when I try doc2 from
> admin/analysis.jsp
> then the term text 'abc' of the index gets highlighted as intended.
> I even indexed a simple example (no stopwords, no protwords, no
> synonyms) via* and
> tried this with the normal and dismax handler but I cannot make it
> working :-/
> 
> What have I misunderstood?
> 
> Regards,
> Peter.
> 
> 
> 
> 
> 
>   generateWordParts="1" generateNumberParts="1"
> catenateAll="0" preserveOriginal="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
>  ignoreCase="true" expand="true"/>
>  words="stopwords.txt" enablePositionIncrements="true" />
>   generateWordParts="1" generateNumberParts="1"
> catenateAll="0" preserveOriginal="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> --
> 
> 
> *
> books.csv:
> 
> id,tw
> 1,aBc
> 2,abc
> 
> curl http://localhost:8983/solr/update/csv?commit=true --data-binary
> @books.csv -H 'Content-type:text/plain; charset=utf-8'


Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich




 Does yours need to be once a day?


No, I only thought you used one day :-)
So you don't - or do you have 31 shards?



 having a look at Solr Cloud or Katta - could be useful
 here in dynamically allocating shards.


ah, thx! I will take a look at it (after trying solr4)!

Regards,
Peter.



Maybe I didn't fully understood what you explained: but doesn't this mean
that you'll have one index per day?
Or are you overwriting, via replicating, every shard and the number of shard
is fixed?
And why are you replicating from the local replica to the next shard? (why
not directly from active to next shard?)

Yes, you can have one index per day (for us, our boundary is typically
1 month, so is less of an issue).
The 'oldest' replica in the round robin is overwritten, yes. We use
fixed shard numbers, but you don't have to.
Does yours need to be once a day?
We used our own round robin code because it was pre-Solr Cloud...
I'm not too familiar with them, but I believe it's certainly worth
having a look at Solr Cloud or Katta - could be useful here in
dynamically allocating shards.

Peter



On Thu, Nov 18, 2010 at 5:41 PM, Peter Karich  wrote:

  Hi Peter!


* I believe the NRT patches are included in the 4.x trunk. I don't
think there's any support as yet in 3x (uses features in Lucene 3.0).

I'll investage how much effort it is to update to solr4


* For merging, I'm talking about commits/writes. If you merge while
commits are going on, things can get a bit messy (maybe on source
cores this is ok, but I have a feeling it's not).

ok


* For moving data to a an 'offline' read-only core, this is the trickiest
bit.
We do this today by using a round-robin chain of remote shards and 2
local cores. At the boundary time (e.g. 1 day), the 'active' core is
replicated locally, then this local replica is replicated to the next
shard in the chain. Once everything is complete, the local replica is
discarded, and the 'active' core is cleaned, being careful not to
delete any new data since the replicated commit point.

Maybe I didn't fully understood what you explained: but doesn't this mean
that you'll have one index per day?
Or are you overwriting, via replicating, every shard and the number of shard
is fixed?
And why are you replicating from the local replica to the next shard? (why
not directly from active to next shard?)

Regards,
Peter.




--
http://jetwick.com twitter search prototype



Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich



Hi,

Please add preserveOriginal="1"  to your WDF [1] definition and reindex (or
just try with the analysis page).


but it is already there!?




Regards,
Peter.


Hi,

Please add preserveOriginal="1"  to your WDF [1] definition and reindex (or
just try with the analysis page).

This will make sure the original input token is being preserved along the
newly generated tokens. If you then pass it all through a lowercase filter, it
should match both documents.

[1]:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Cheers,



   Hi,

I am going crazy but which config is necessary to include the missing doc
2? I have:
doc1 tw:aBc
doc2 tw:abc

Now a query "aBc" returns only doc 1 although when I try doc2 from
admin/analysis.jsp
then the term text 'abc' of the index gets highlighted as intended.
I even indexed a simple example (no stopwords, no protwords, no
synonyms) via* and
tried this with the normal and dismax handler but I cannot make it
work :-/

What have I misunderstood?

Regards,
Peter.


















--


*
books.csv:

id,tw
1,aBc
2,abc

curl http://localhost:8983/solr/update/csv?commit=true --data-binary
@books.csv -H 'Content-type:text/plain; charset=utf-8'


Re: Respect token order in matches

2010-11-18 Thread Markus Jelsma
Hi,

I'm not sure which QParser you're using, but with the DismaxQParser you can
specify slop on explicit phrase queries. Did you set it? It can make a
difference. Check it out:

http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29
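
For example, a minimal sketch (field name, host and port are assumptions) where
qs=0 requires the terms of a user-entered phrase to be adjacent and in order:

http://localhost:8983/solr/select?defType=dismax&qf=name&qs=0&q=%22john+cage%22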

Cheers,

> Hi,
> 
> is there a way to make solr respect the order of token matches when the
> query is a multi-term string?
> 
> Here's an example:
> 
> Query String: "John C"
> 
> Indexed Strings:
> 
> - "John Cage"
> - "Cargill John"
> 
> This will return both indexed strings as a result. However, "Cargill John"
> should not match in that case, because the order of the tokens is not the
> same as in the query.
> 
> Here's the fieldtype:
> 
> <fieldtype name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=""
> replace="all" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=""
> replace="all" />
>   </analyzer>
> </fieldtype>
> Is there a way to achieve this using this fieldtype?
> 
> 
> thanks!


Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:22 PM, Peter Karich  wrote:
>
>> Hi,
>>
>> Please add preserveOriginal="1"  to your WDF [1] definition and reindex
>> (or
>> just try with the analysis page).
>
> but it is already there!?
>
>                          generateWordParts="1" generateNumberParts="1"
> catenateAll="0" preserveOriginal="1"/>
>
>
> Regards,
> Peter.
>

Peter,

I recently had this issue, and I had to set splitOnCaseChange="0" to
keep the word delimiter filter from doing what you describe. Can you
try that and see if it helps?
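
For reference, a minimal sketch of the changed filter line (every attribute
except splitOnCaseChange is copied from your definition):

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateAll="0" preserveOriginal="1"
        splitOnCaseChange="0"/>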

- Ken


Reindex Solr Using Tomcat

2010-11-18 Thread Eric Martin
Hi,

 

I searched google and the wiki to find out how I can force a full re-index
of all of my content and I came up with zilch. My goal is to be able to
adjust the weight settings, re-index  my entire database and then search my
site and view the results of my weight adjustments.

 

I am using Tomcat 5.x and Solr 1.4.1. Weird how I couldn't find this info. I
must have missed it. Anyone know where to find it?

 

Eric

 



Re: Meaning of avgTimePerRequest & avgRequestsPerSecond in SOLR stats page

2010-11-18 Thread Erick Erickson
avgTimePerRequest is the important one for your users. They don't care if
you're processing
a million QPS, they care how long *their* query took. But you also have to
pay attention
to the longest response times.

Best
Erick

On Thu, Nov 18, 2010 at 12:54 PM, Shanmugavel SRD
wrote:

>
> Erick,
>   Thanks a lot for explaining about these two fields.
>   Could you please let us know which one we have to look for if we have to
> monitor the performance? avgTimePerRequest OR avgRequestsPerSecond.
>
> Thanks,
> SRD
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Meaning-of-avgTimePerRequest-avgRequestsPerSecond-in-SOLR-stats-page-tp1922692p1925407.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Reindex Solr Using Tomcat

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:33 PM, Eric Martin  wrote:
> Hi,
>
>
>
> I searched google and the wiki to find out how I can force a full re-index
> of all of my content and I came up with zilch. My goal is to be able to
> adjust the weight settings, re-index  my entire database and then search my
> site and view the results of my weight adjustments.
>
>
>
> I am using Tomcat 5.x and Solr 1.4.1. Weird how I couldn't find this info. I
> must have missed it. Anyone know where to find it?
>
>
>
> Eric
>

Eric,

How you re-index SOLR determines which method you wish to use. You can
either use the UpdateHandler using a POST of an XML file [1], or you
can use the DataImportHandler (DIH) [2]. There exist other means, but
these two should be sufficient to get started. How did you import your
initial index in the first place?

[1] http://wiki.apache.org/solr/UpdateXmlMessages
[2] http://wiki.apache.org/solr/DataImportHandler
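
For example (host, port and handler paths are assumptions), a full re-import
via the DIH can be triggered with:

curl 'http://localhost:8080/solr/dataimport?command=full-import&clean=true'

and a wipe-and-repost through the update handler could look like:

curl http://localhost:8080/solr/update -H 'Content-type:text/xml' --data-binary '<delete><query>*:*</query></delete>'
curl http://localhost:8080/solr/update -H 'Content-type:text/xml' --data-binary @docs.xml
curl http://localhost:8080/solr/update -H 'Content-type:text/xml' --data-binary '<commit/>'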


Re: Dismax - Boosting

2010-11-18 Thread Erick Erickson
The changes that you made have no relevance to the fields you named
in your query. Things like author, format, etc. You have to ask to
facet by your new fields...

And if you did send a different query, did you reindex after your config
changes?

It would be better if you made a habit of showing the results of
your query with &debugQuery=on to help us diagnose problems, otherwise
we're just guessing...

Best
Erick

On Thu, Nov 18, 2010 at 1:05 PM, Solr User  wrote:

> Ahmet,
>
> I modified the schema as follows: (Added more fields for faceting)
>
>
>  omitNorms="true" />
>
>  multiValued="true" omitNorms="true" />
>
>  multiValued="true" omitNorms="true" />
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
>  multiValued="true" omitNorms="true" />
>
> 
>
>  multiValued="true" omitNorms="true" />
>
>  multiValued="true" omitNorms="true" />
>
> 
>
> 
>
> 
>
> 
>
> 
>
>  omitNorms="true"/>
>
> 
>
>  omitNorms="true"/>
>
>  multiValued="true" omitNorms="true"/>
>
>  omitNorms="true"/>
>
>  omitNorms="true"/>
>
>  omitNorms="true"/>
>
>  omitNorms="true"/>
>
>  multiValued="true" omitNorms="true"/>
>
>  multiValued="true" omitNorms="true"/>
>
>  omitNorms="true"/>
>
>  omitNorms="true"/>
>
>  omitNorms="true"/>
>
>  omitNorms="true"/>
>
> Also added Copy Fields as below:
>
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
> With the above changes I am not getting any facet data as a result.
>
> Why is that the facet data not returning and what mistake I did with the
> schema?
>
> Thanks,
> Solr User
>
> On Wed, Nov 17, 2010 at 6:42 PM, Ahmet Arslan  wrote:
>
> >
> >
> > Wow you facet on many fields :
> >
> >
> author,pubyear,format,series,season,imprint,category,award,age,reading,grade,price
> >
> > The fields you facet on should be untokenized type: string, int, tint
> date
> > etc.
> >
> > The fields you want full text search, e.g. the ones you specify in qf, pf
> > parameter should be text type.
> > (title subtitle authordesc shortdesc imprint category isbn13 isbn10
> format
> > series season bisacsub award)
> >
> > If you have common fields, for example category, you need two copies of
> > that:
> > one string, one text, so that you can both full-text search and facet on it.
> > Use a copy field for this.
> >
> > <copyField source="category" dest="category_string"/>
> >
> > Example document:
> > category: electronic devices
> >
> >
> > query electronic will return it, and facets on category_string will be
> > displayed as :
> >
> > electronic devices (1)
> >
> > not :
> >
> > electronic (1)
> > devices (1)
> >
> >
> >
> > --- On Wed, 11/17/10, Solr User  wrote:
> >
> > > From: Solr User 
> > > Subject: Re: Dismax - Boosting
> > > To: solr-user@lucene.apache.org
> > > Date: Wednesday, November 17, 2010, 11:31 PM
> >  > Ahmet,
> > >
> > > Thanks for the reply and it was very helpful.
> > >
> > > The query that I used before changing to dismax was:
> > >
> > >
> >
> /solr/tradecore/spell/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true
> > >
> > > The above query use to return all the data related to
> > > facets, data and also
> > > any suggestions related to spelling mistakes properly.
> > >
> > > The configuration after modifying using dismax is as
> > > below:
> > >
> > > Schema.xml:
> > >
> > > > > indexed="true" stored="true"
> > > omitNorms="true" />
> > > > > indexed="true" stored="true"
> > > multiValued="true" omitNorms="true" />
> > > > > indexed="true" stored="true"
> > > multiValued="true" omitNorms="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="false" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="false" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true"
> > > multiValued="true" omitNorms="true" />
> > > > > indexed="false" stored="true" />
> > > > > indexed="true" stored="true"
> > > multiValued="true" omitNorms="true" />
> > > > > indexed="true" stored="true"
> > > multiValued="true" omitNorms="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="false" stored="true" />
> > > > > indexed="true" stored="true" />
> > > > > indexed="true" stored="true"
> > > omitNorms="true"/>
> > > > > indexed="true" stored="true"/>
> > >
> > > SolrConfig.xml:
> > >
> > >> > class="solr.SearchHandler" 

RE: Reindex Solr Using Tomcat

2010-11-18 Thread Eric Martin
Ah, I am using the ApacheSolr module in Drupal and used Nutch to insert the data 
into the Solr index. When I was using Jetty I could just delete the data directory 
contents over ssh and then restart the service, forcing the reindex.

Currently, the ApacheSolr module for Drupal allows for a 200-record re-index 
every cron run, but that is too slow for me. During implementation and testing I 
would prefer to re-index the entire database, as I have over 400k records. 

I appreciate your help. My mind was searching for a command on the CLI that 
would just tell Solr to re-index the entire database and be done with it.

-Original Message-
From: Ken Stanley [mailto:doh...@gmail.com] 
Sent: Thursday, November 18, 2010 12:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Reindex Solr Using Tomcat

On Thu, Nov 18, 2010 at 3:33 PM, Eric Martin  wrote:
> Hi,
>
>
>
> I searched google and the wiki to find out how I can force a full re-index
> of all of my content and I came up with zilch. My goal is to be able to
> adjust the weight settings, re-index  my entire database and then search my
> site and view the results of my weight adjustments.
>
>
>
> I am using Tomcat 5.x and Solr 1.4.1. Weird how I couldn't find this info. I
> must have missed it. Anyone know where to find it?
>
>
>
> Eric
>

Eric,

How you re-index SOLR determines which method you wish to use. You can
either use the UpdateHandler using a POST of an XML file [1], or you
can use the DataImportHandler (DIH) [2]. There exist other means, but
these two should be sufficient to get started. How did you import your
initial index in the first place?

[1] http://wiki.apache.org/solr/UpdateXmlMessages
[2] http://wiki.apache.org/solr/DataImportHandler



Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich



Peter,

I recently had this issue, and I had to set splitOnCaseChange="0" to
keep the word delimiter filter from doing what you describe. Can you
try that and see if it helps?

- Ken



Hi Ken,

yes, this would solve my problem,
but then I would lose the match for 'SuperMario' if I query 'mario', right?

This is not an option for me at the moment. Maybe it's a bug in Solr? 
Again, the admin page says
"all is fine", but when I query via HTTP (or SolrJ) it does not return 
doc2. Strange.


Regards,
Peter.


Containers running SOLR: supported or unsupported?

2010-11-18 Thread Dyer, James
We're working on a budget for an environment to begin using SOLR in 
Production, and the question came up about whether or not we should pay for 
commercial support on the container that SOLR runs under.  We've pretty much 
decided to run on JBOSS simply because that's what we use company-wide.  But 
we're not sure if a community edition would suffice or if we should pay for 
commercial support.

In my reading, it sounds like almost everyone is using either Tomcat or 
Jetty... and probably unsupported?  So I'm asking if anyone here has any 
guidance on the subject.  If anyone has a story of how they had 
difficulty with SOLR and the container it ran under and having support helped, or if 
you wish you had support and didn't, I'd like to hear it.

I appreciate your feedback and advice.  Thanks.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Sturge
> no, I only thought you use one day :-)
> so you don't or do you have 31 shards?
>

No, we use 1 shard per month - e.g. 7 shards will hold 7 month's of data.
It can be set to 1 day, but you would need to have a huge amount of
data in a single day to warrant doing that.



On Thu, Nov 18, 2010 at 8:20 PM, Peter Karich  wrote:
>
>
>>  Does yours need to be once a day?
>
> no, I only thought you use one day :-)
> so you don't or do you have 31 shards?
>
>
>>  having a look at Solr Cloud or Katta - could be useful
>>  here in dynamically allocating shards.
>
> ah, thx! I will take a look at it (after trying solr4)!
>
> Regards,
> Peter.
>
>
>>> Maybe I didn't fully understand what you explained: but doesn't this mean
>>> that you'll have one index per day?
>>> Or are you overwriting, via replicating, every shard and the number of
>>> shard
>>> is fixed?
>>> And why are you replicating from the local replica to the next shard?
>>> (why
>>> not directly from active to next shard?)
>>
>> Yes, you can have one index per day (for us, our boundary is typically
>> 1 month, so is less of an issue).
>> The 'oldest' replica in the round robin is overwritten, yes. We use
>> fixed shard numbers, but you don't have to.
>> Does yours need to be once a day?
>> We used our own round robin code because it was pre-Solr Cloud...
>> I'm not too familiar with them, but I believe it's certainly worth
>> having a look at Solr Cloud or Katta - could be useful here in
>> dynamically allocating shards.
>>
>> Peter
>>
>>
>>
>> On Thu, Nov 18, 2010 at 5:41 PM, Peter Karich  wrote:
>>>
>>>  Hi Peter!
>>>
 * I believe the NRT patches are included in the 4.x trunk. I don't
 think there's any support as yet in 3x (uses features in Lucene 3.0).
>>>
>>> I'll investigate how much effort it is to update to solr4
>>>
 * For merging, I'm talking about commits/writes. If you merge while
 commits are going on, things can get a bit messy (maybe on source
 cores this is ok, but I have a feeling it's not).
>>>
>>> ok
>>>
 * For moving data to an 'offline' read-only core, this is the
 trickiest
 bit.
 We do this today by using a round-robin chain of remote shards and 2
 local cores. At the boundary time (e.g. 1 day), the 'active' core is
 replicated locally, then this local replica is replicated to the next
 shard in the chain. Once everything is complete, the local replica is
 discarded, and the 'active' core is cleaned, being careful not to
 delete any new data since the replicated commit point.
>>>
>>> Maybe I didn't fully understand what you explained: but doesn't this mean
>>> that you'll have one index per day?
>>> Or are you overwriting, via replicating, every shard and the number of
>>> shard
>>> is fixed?
>>> And why are you replicating from the local replica to the next shard?
>>> (why
>>> not directly from active to next shard?)
>>>
>>> Regards,
>>> Peter.
>>>
>
>
> --
> http://jetwick.com twitter search prototype
>
>


using DIH with mets/alto file sets

2010-11-18 Thread Fred Gilmore
mets/alto is an XML standard for describing physical objects.  In this 
case, we're describing books.  The mets file holds the metadata (author, 
title, etc.); the alto file is the physical description (words on the 
page, formatting of the page).  So it's a one (mets) to many (alto) 
relationship.


the directory structure:

/our/collection/IDxxx/:

IDxxx-mets.xml
ALTO/

/our/collection/IDxxx/ALTO/:

IDxxx-ALTO001.xml
IDxxx-ALTO002.xml

ie. an xml file per scanned book page.

Beyond the ID number as part of the file names, the mets file contains 
no reference to the alto children.  The alto children do contain a 
reference to the jpg page scan, which is labelled with the ID number as 
part of the name.


The idea is to create a full text index of the alto content, accompanied 
by the author/title info from the mets file for purposes of results 
display.  The first try with this is attempting a recursive 
FileDataSource approach.


It was relatively easy to create a "content" field which holds the text 
of the page (each word is actually an attribute of a separate tag), but 
I'm having difficulty determining how I'm going to conditionally add the 
author and title data from the METS file to the rows created with the 
ALTO content field.  It'll involve regex'ing out the ID number 
associated with both the mets and alto filenames for starters, but even 
at that, I don't see how to keep it straight since it's not one mets=one 
alto and it's also not a static string for the entire index.
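
For the ID-extraction piece, a minimal sketch using the TemplateTransformer and
RegexTransformer already in the entity's transformer chain (the column names and
the exact regex are assumptions):

<field column="path"    template="${landscapes.fileAbsolutePath}"/>
<field column="book_id" regex="^.*/([^/]+?)-(?:mets|ALTO\d+)\.xml$" sourceColName="path"/>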


thanks for any hints you can provide.

Fred
University of Texas at Austin
==
data-config.xml thus far:




<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="landscapes" processor="FileListEntityProcessor" fileName=".xml$" recursive="true"
            baseDir="/home/utlol/htdocs/lib-landscapes-new/publications/">
      <entity name="..."
              stream="true"
              pk="filename"
              url="${landscapes.fileAbsolutePath}"
              processor="XPathEntityProcessor"
              forEach="/mets | /alto"
              transformer="TemplateTransformer,RegexTransformer,LogTransformer"
              logTemplate=" processing ${landscapes.fileAbsolutePath}"
              logLevel="info"
              >
        <field column="title" xpath="/mets/dmdSec/mdWrap/xmlData/mods/titleInfo/title" />
        <field column="filename" xpath="/alto/Description/sourceImageInformation/fileName" />
        <field column="content" xpath="/alto/Layout/Page/PrintSpace/TextBlock/TextLine/String/@CONTENT" />
      </entity>
    </entity>
  </document>
</dataConfig>
==
METS example:


http://www.w3.org/2001/XMLSchema-instance"; 
xmlns="http://www.loc.gov/METS/"; 
xsi:schemaLocation="http://www.loc.gov/METS/ 
http://schema.ccs-gmbh.com/docworks/version20/mets-docworks.xsd"; 
xmlns:MODS="http://www.loc.gov/mods/v3"; 
xmlns:mix="http://www.loc.gov/mix/"; 
xmlns:xlink="http://www.w3.org/1999/xlink"; TYPE="METAe_Monograph" 
LABEL="ENVIRONMENTAL GEOLOGIC ATLAS OF THE TEXAS COASTAL ZONE- 
Kingsville Area">



CCS docWORKS/METAe Version 6.3-0
docWORKS-ID: 1677








ENVIRONMENTAL GEOLOGIC ATLAS OF THE TEXAS COASTAL ZONE- 
Kingsville Area



L F. Brown, Jr., J. H. McGowen, T. J. Evans, 
C. G.

Groat

aut



W. L.
Fisher

aut




ALTO example:


http://www.w3.org/2001/XMLSchema-instance"; 
xsi:noNamespaceSchemaLocation="http://schema.ccs-gmbh.com/metae/alto-1-1.xsd"; 
xmlns:xlink="http://www.w3.org/TR/xlink";>


mm10

/Docworks/IN/GeologyBooks/txu-oclc-6917337/txu-oclc-6917337-009.jpg




CCS Content Conversion Specialists GmbH, 
Germany

CCS docWORKS
6.3-0.93




ABBYY (BIT Software), Russia
FineReader
7.0















HEIGHT="2345"/>
HEIGHT="314"/>
HEIGHT="2345">
HEIGHT="28" STYLEREFS="TXT_0 PAR_CENTER">


CONTENT="Preface" WC="0.98" CC="000"/>







simple production set up

2010-11-18 Thread lee carroll
Hi I'm pretty new to SOLR and interested in getting an idea about a simple
standard way of setting up a production SOLR service. I have read the FAQs
and the wiki around SOLR security and performance but have not found much on
a best practice architecture. I'm particularly interested in best practices
around DOS prevention, securing the SOLR web app and setting up dev, test,
production indexes.

Any pointers, links to resources would be great. Thanks in advance

Lee C


Re: simple production set up

2010-11-18 Thread Markus Jelsma
Hi,

It's a common practice not to use Solr as a frontend. Almost all deployed 
instances live in the backend near the database servers. And if Solr is being 
put to the front, it's still being secured by a proxy.

Setting up staging and production instances depend on your need. If the load 
is small, you can run two Solr cores [1] on the same instance and if the load 
is high you'd just separate them, the same goes for development and test 
instances.

[1]: http://wiki.apache.org/solr/CoreAdmin
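
For example, a minimal two-core solr.xml sketch (core names and directories are
placeholders):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="staging"    instanceDir="staging" />
    <core name="production" instanceDir="production" />
  </cores>
</solr>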

Cheers,

> Hi I'm pretty new to SOLR and interested in getting an idea about a simple
> standard way of setting up a production SOLR service. I have read the FAQs
> and the wiki around SOLR security and performance but have not found much
> on a best practice architecture. I'm particularly interested in best
> practices around DOS prevention, securing the SOLR web app and setting up
> dev, test, production indexes.
> 
> Any pointers, links to resources would be great. Thanks in advance
> 
> Lee C


Re: Reindex Solr Using Tomcat

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:42 PM, Eric Martin  wrote:
> Ah, I am using an ApacheSolr module in Drupal and used nutch to insert the 
> data into the Solr index. When I using Jetty I could just delete the data 
> contents in sshd and then restart the service forcing the reindex.
>
> Currently, the ApacheSolr module for Drupal allows for a 200 record re-index 
> every cron run, but that is too slow for me. During implantation and testing 
> I would prefer to re-index the entire database as I have over 400k records.
>
> I appreciate your help. My mind was searching for a command on the CLI that 
> would just tell solr to reindex the entire dbase and be done with it.
>

Eric,

From what I could find, this looks to be your best bet:
http://drupal.org/node/267543.

- Ken


Experiencing lots of full GC runs

2010-11-18 Thread Simon Wistow
We currently have a 30G index with 73M of .tii files running on a 
machine with 4 Intel 2.27GHz Xeons with 15G of memory.

About once a second a process indexes ~10-20 smallish documents using 
the XML Update Handler. A commit happens after every update. However we 
see this behaviour even if the indexer isn't running.

The system is running under Tomcat6 with Solr 1.4.1 955763M - mark - 
2010-06-17 18:06:42 and Lucene 2.9.3 951790 - 2010-06-06 01:30:55

Our GC settings (the least bad we've found so far) currently look like:

-XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode 
-XX:+UseParNewGC
-XX:NewSize=5G 
-XX:SurvivorRatio=3 
-Xmx10G -Xss10M 
-XX:CMSInitiatingOccupancyFraction=40 
-XX:+UseCMSInitiatingOccupancyOnly

Everything is fine until we start to try and search at which point 
performance goes to hell with multi second response times and frequent 
full GC runs (approx every 15 seconds) looking like

2372.886: [Full GC 2372.886: [CMS2378.577: [CMS-concurrent-mark: 
5.912/5.913 secs] [Times: user=6.10 sys=0.01, real=5.91 secs] 
 (concurrent mode failure): 5242879K->5242879K(5242880K), 18.2557740 
secs] 9437183K->9409440K(9437184K), [CMS Perm : 30246K->30242K(50552K)] 
icms_dc=100 , 18.2558680 secs] [Times: user=18.20 sys=0.05, real=18.26 
secs] 

Looking at top, jsvc is using 100% of CPU.

I'm baffled - I've had way bigger indexes than this before with no 
performance problems. At first I suspected the frequent updates, but the fact 
that it happens even when the indexer isn't running seems to put paid to 
that.

One salient point - because of the frequent updates we don't have a 
queryResultCache configured.

Any ideas? Hints? Tips?

Simon



 


Re: Experiencing lots of full GC runs

2010-11-18 Thread Simon Wistow
On Fri, Nov 19, 2010 at 12:01:09AM +, me said:
> I'm baffled - I've had way bigger indexes than this before with no 
> performance problems. At first I suspected the frequent updates, but the fact 
> that it happens even when the indexer isn't running seems to put paid to 
> that.

More information:

- The index has ~30 million smallish documents 

- Once a slow query has been executed, all other queries, even ones which 
had previously been slow but tolerable (response times ~1s), become 
incredibly slow

- Once the process has turned slow only a kill -9 will bring it down

- Upgrading to a recent nightly build of Solr (3.1-2010-11-18_05-27-29 
1036325 - hudson - 2010-11-18 05:41:58) has made things even slower

- I'd check with 4.0.x if someone can point me at a tool that can 
migrate indexes. I seem to be unable to find one and Lucene 3.0 informs 
me that it's incompatible with 2.9.x 









Re: Must require quote with single word token query?

2010-11-18 Thread Chamnap Chhorn
Well, this field is a keyphrase. I want to make it a case-insensitive
single-token field. It matches only when the user types the same as the data in
Solr.

What's wrong with that? Can it be done in another way?

On Thu, Nov 18, 2010 at 6:08 PM, Ahmet Arslan  wrote:

> This happening because query parser pre-tokenizes your query using whites
> paces. It is tokenized before it reaches your query analyzer.
> And you are using KeywordTokenizer in your field definition.
>
> Is there a special reason for you to use KeywordTokenizer ?
>
>
> --- On Thu, 11/18/10, Chamnap Chhorn  wrote:
>
> > From: Chamnap Chhorn 
> > Subject: Re: Must require quote with single word token query?
> > To: solr-user@lucene.apache.org
> > Date: Thursday, November 18, 2010, 5:19 AM
> > Thanks for your reply. Here is some
> > other details:
> >
> > 1. Keyphrase field definition:
> > > type="text_keyword" indexed="true" stored="false"
> > multiValued="true"/>
> >
> > 2. I'm using solr 1.4.
> >
> > 3. My dismax definition is the original configuration after
> > install solr:
> > <requestHandler name="dismax" class="solr.SearchHandler" >
> >  <lst name="defaults">
> >   <str name="defType">dismax</str>
> >   <str name="echoParams">explicit</str>
> >   <float name="tie">0.01</float>
> >   <str name="qf">
> >      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
> >   </str>
> >   <str name="pf">
> >      text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
> >   </str>
> >   <str name="bf">
> >      popularity^0.5 recip(price,1,1000,1000)^0.3
> >   </str>
> >   <str name="fl">
> >      id,name,price,score
> >   </str>
> >   <str name="mm">
> >      2&lt;-1 5&lt;-2 6&lt;90%
> >   </str>
> >   <int name="ps">100</int>
> >   <str name="q.alt">*:*</str>
> >   <str name="hl.fl">text features name</str>
> >   <str name="f.name.hl.fragsize">0</str>
> >   <str name="f.name.hl.alternateField">name</str>
> >   <str name="f.text.hl.fragmenter">regex</str>
> >  </lst>
> > </requestHandler>
> >
> > 4. Here is the result returned back of the original query:
> > smart mobile.
> >
> > 
> >
> > 
> >
> > 
> >  0
> >
> >  1
> >  
> >
> >   on
> >   uuid,name,fap
> >
> >   true
> >   smart mobile
> >
> >   keyphrase
> >   dismax
> >
> >  
> > 
> > 
> >
> > 
> >
> >  smart mobile
> >
> >  smart mobile
> >   > name="parsedquery">+((DisjunctionMaxQuery((keyphrase:smart))
> > DisjunctionMaxQuery((keyphrase:mobile)))~2) ()
> >
> >   > name="parsedquery_toString">+(((keyphrase:smart)
> > (keyphrase:mobile))~2) ()
> >
> >  
> >  DisMaxQParser
> >
> >  
> >  
> >
> >  
> >   1.0
> >
> >   
> >  > name="time">1.0
> >
> >  > name="org.apache.solr.handler.component.QueryComponent">
> >   > name="time">1.0
> >
> > 
> >  > name="org.apache.solr.handler.component.FacetComponent">
> >
> >   > name="time">0.0
> > 
> >
> >  > name="org.apache.solr.handler.component.MoreLikeThisComponent">
> >   > name="time">0.0
> >
> > 
> >  > name="org.apache.solr.handler.component.HighlightComponent">
> >
> >   > name="time">0.0
> > 
> >
> >  > name="org.apache.solr.handler.component.StatsComponent">
> >   > name="time">0.0
> >
> > 
> >  > name="org.apache.solr.handler.component.DebugComponent">
> >
> >   > name="time">0.0
> >
> > 
> >
> >   
> >   
> >  > name="time">0.0
> >
> >  > name="org.apache.solr.handler.component.QueryComponent">
> >   > name="time">0.0
> >
> > 
> >  > name="org.apache.solr.handler.component.FacetComponent">
> >
> >   > name="time">0.0
> > 
> >
> >  > name="org.apache.solr.handler.component.MoreLikeThisComponent">
> >   > name="time">0.0
> >
> > 
> >  > name="org.apache.solr.handler.component.HighlightComponent">
> >
> >   > name="time">0.0
> >
> > 
> >
> >  > name="org.apache.solr.handler.component.StatsComponent">
> >   > name="time">0.0
> >
> > 
> >  > name="org.apache.solr.handler.component.DebugComponent">
> >
> >   > name="time">0.0
> > 
> >
> >   
> >
> >  
> > 
> > 
> >
> > 5. Here is parsed query with "smart mobile" (with quotes)
> > which returns the
> > result:
> >
> > 
> >  "smart
> > mobile"
> >
> >  "smart mobile"
> >
> >
> >   > name="parsedquery">+DisjunctionMaxQuery((keyphrase:smart
> > mobile)) ()
> >
> >  +(keyphrase:smart
> > mobile) ()
> >
> >  
> >> name="D297A64B-D4BA-4445-B63E-726E5A4F758D">
> >
> > 4.503682 = (MATCH) sum of:
> >   4.503682 = (MATCH) fieldWeight(keyphrase:smart
> > mobile in 13092), product of:
> > 1.0 = tf(termFreq(keyphrase:smart mobile)=1)
> > 10.29413 = idf(docFreq=1, maxDocs=21748)
> > 0.4375 = fieldNorm(field=keyphrase,
> > doc=13092)
> >
> > 
> >  
> >
> > 6. Here, I tried to use automatic phrase query (pf
> > parameter): doesn't
> > return any results.
> >
> http://localhost:8081/solr/select?q=smart%20mobile&qf=keyphrase&pf=keyphrase&debugQuery=on&defType=dismax
> >
> >  smart mobile
> >
> >  smart mobile
> >   > name="parsedquery">+((DisjunctionMaxQuery((keyphrase:smart))
> > DisjunctionMaxQuery((keyphrase:mobile)))~2)
> > DisjunctionMaxQuery((keyphrase:smart mobile))
>

Re: Experiencing lots of full GC runs

2010-11-18 Thread Lance Norskog
Does it need 10G to run? Have you cycled it down to, say, 4-5G as a
test? Large memory sizes can just cause more garbage collection.

What is the disk activity when this happens? Do you have paging turned
on? I generally turn it off - having things go into page-thrash mode is
lame.

How many searchers are open? With commits that quick, you might be
building up old searchers which hold their index data open.

On Thu, Nov 18, 2010 at 5:49 PM, Simon Wistow  wrote:
> On Fri, Nov 19, 2010 at 12:01:09AM +, me said:
>> I'm baffled - I've had way bigger indexes than this before with no
>> performance problems. At first I suspected the frequent updates, but the fact
>> that it happens even when the indexer isn't running seems to put paid to
>> that.
>
> More information:
>
> - The index has ~30 million smallish documents
>
> - Once a slow query has been executed all other queries, even ones which
> had previously been slow but tolerable (response times ~1s) become
> incredibly slow
>
> - Once the process has turned slow only a kill -9 will bring it down
>
> - Upgrading to a recent nightly build of Solr (3.1-2010-11-18_05-27-29
> 1036325 - hudson - 2010-11-18 05:41:58) has made things even slower
>
> - I'd check with 4.0.x if someone can point me at a tool that can
> migrate indexes. I seem to be unable to find one and Lucene 3.0 informs
> me that it's incompatible with 2.9.x
>
>
>
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: using DIH with mets/alto file sets

2010-11-18 Thread Lance Norskog
Some ideas:

XPathEntityProcessor parses a very limited XPath syntax. However, you
can supply an XSL stylesheet via the entity's xsl attribute, and that gets
applied instead.

With this, you might be able to create an XPath that selects out every
combination that you want.
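
A rough sketch of that (the stylesheet name is hypothetical, and
useSolrAddSchema assumes the XSL emits ready-made <add><doc> documents):

<entity name="mets" processor="XPathEntityProcessor"
        url="${landscapes.fileAbsolutePath}"
        xsl="xslt/mets-flatten.xsl"
        useSolrAddSchema="true"/>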

A second option: SOLR-1499 is an entity processor that fetches from a
Solr instance. You could index all of the Alto records in one pass,
then fetch back all records for each Mets record that you want to
associate with an Alto record, and re-index the Alto document with
the new data.

But really this sounds like a database join problem.

On Thu, Nov 18, 2010 at 1:09 PM, Fred Gilmore  wrote:
> mets/alto is an xml standard for describing physical objects.  In this case,
> we're describing books.  The mets file holds the metadata (author, title,
> etc.), the alto file is the physical description (words on the page,
> formatting of the page).  So it's a one (mets) to many (alto) relationship.
>
> the directory structure:
>
> /our/collection/IDxxx/:
>
> IDxxx-mets.xml
> ALTO/
>
> /our/collection/IDxxx/ALTO/:
>
> IDxxx-ALTO001.xml
> IDxxx-ALTO002.xml
>
> ie. an xml file per scanned book page.
>
> Beyond the ID number as part of the file names, the mets file contains no
> reference to the alto children.  The alto children do contain a reference to
> the jpg page scan, which is labelled with the ID number as part of the name.
>
> The idea is to create a full text index of the alto content, accompanied by
> the author/title info from the mets file for purposes of results display.
>  The first try with this is attempting a recursive FileDataSource approach.
>
> It was relatively easy to create a "content" field which holds the text of
> the page (each word is actually an attribute of a separate tag), but I'm
> having difficulty determining how I'm going to conditionally add the author
> and title data from the METS file to the rows created with the ALTO content
> field.  It'll involve regex'ing out the ID number associated with both the
> mets and alto filenames for starters, but even at that, I don't see how to
> keep it straight since it's not one mets=one alto and it's also not a static
> string for the entire index.
>
> thanks for any hints you can provide.
>
> Fred
> University of Texas at Austin
> ==
> data-config.xml thus far:
>
> 
> 
> 
>  processor="FileListEntityProcessor" fileName=".xml$" recursive="true"
> baseDir="/home/utlol/htdocs/lib-landscapes-new/publications/">
>  stream="true"
> pk="filename"
> url="${landscapes.fileAbsolutePath}"
> processor="XPathEntityProcessor"
> forEach="/mets | /alto"
> transformer="TemplateTransformer,RegexTransformer,LogTransformer"
> logTemplate=" processing ${landscapes.fileAbsolutePath}"
> logLevel="info"
>>
>
> 
> 
> 
>
>
>  xpath="/mets/dmdSec/mdWrap/xmlData/mods/titleInfo/title" />
> 
>  xpath="/alto/Description/sourceImageInformation/fileName" />
>  xpath="/alto/Layout/Page/PrintSpace/TextBlock/TextLine/String/@CONTENT" />
> 
> 
> 
> 
> ==
> METS example:
>
> 
> http://www.w3.org/2001/XMLSchema-instance";
> xmlns="http://www.loc.gov/METS/";
> xsi:schemaLocation="http://www.loc.gov/METS/
> http://schema.ccs-gmbh.com/docworks/version20/mets-docworks.xsd";
> xmlns:MODS="http://www.loc.gov/mods/v3"; xmlns:mix="http://www.loc.gov/mix/";
> xmlns:xlink="http://www.w3.org/1999/xlink"; TYPE="METAe_Monograph"
> LABEL="ENVIRONMENTAL GEOLOGIC ATLAS OF THE TEXAS COASTAL ZONE- Kingsville
> Area">
> 
> 
> CCS docWORKS/METAe Version 6.3-0
> docWORKS-ID: 1677
> 
> 
> 
> 
> 
> 
> 
> ENVIRONMENTAL GEOLOGIC ATLAS OF THE TEXAS COASTAL ZONE-
> Kingsville Area
> 
> 
> L F. Brown, Jr., J. H. McGowen, T. J. Evans, C.
> G.
> Groat
> 
> aut
> 
> 
> 
> W. L.
> Fisher
> 
> aut
> 
> 
>
> 
> ALTO example:
>
> 
> http://www.w3.org/2001/XMLSchema-instance";
> xsi:noNamespaceSchemaLocation="http://schema.ccs-gmbh.com/metae/alto-1-1.xsd";
> xmlns:xlink="http://www.w3.org/TR/xlink";>
> 
> mm10
> 
> /Docworks/IN/GeologyBooks/txu-oclc-6917337/txu-oclc-6917337-009.jpg
> 
> 
> 
> 
> CCS Content Conversion Specialists GmbH,
> Germany
> CCS docWORKS
> 6.3-0.93
> 
> 
> 
> 
> ABBYY (BIT Software), Russia
> FineReader
> 7.0
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  HEIGHT="2345"/>
>  HEIGHT="314"/>
>  HEIGHT="2345">
>  STYLEREFS="TXT_0 PAR_CENTER">
> 
>  CONTENT="Preface" WC="0.98" CC="000"/>
> 
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Issue with copyField when updating document

2010-11-18 Thread Lance Norskog
Have you tried removing the index files and rebuilding it from
scratch? The index could be corrupted. It's rare, but it does happen.

On Thu, Nov 18, 2010 at 9:30 AM, Pramod Goyal  wrote:
> Hi,
> Forgot to mention solr version number:
>
> Solr Implementation Version: 2010-04-30_08-05-41 939580 - hudson -
> 2010-04-30 08:37:22
>
> On Thu, Nov 18, 2010 at 10:50 PM, Pramod Goyal wrote:
>
>> I am using the solr admin to query the document. The returned document is
>> showing old values.
>>
>> Lance,
>> I will not be able to post my configuration but i will create a simple
>> schema just to highlight the issue.
>>
>>
>> On Wed, Nov 17, 2010 at 9:56 PM, Erick Erickson 
>> wrote:
>>
>>> How are you looking at the document? You mention using admin,
>>> are you searching?
>>>
>>> Because if you're looking at *terms* rather then the document,
>>> you should be aware that deleting a document does NOT remove
>>> the terms from the index, it just marks the doc as deleted.
>>>
>>> An optimize will remove the deleted document's terms.
>>>
>>> As Lance says, though, if you're displaying the document you
>>> should not be seeing the original values.
>>>
>>> Best
>>> Erick
>>>
>>> On Tue, Nov 16, 2010 at 11:50 PM, Pramod Goyal >> >wrote:
>>>
>>> > Hi,
>>> >     I am facing a issue with copyFields in SOlr. Here is what i am doing
>>> >
>>> > Schema:
>>> >
>>> >   <field name="id" type="string" indexed="true" stored="true"/>
>>> >   <field name="product" type="text" indexed="true" stored="true" multiValued="true"/>
>>> >   <field name="product_copy" type="text" indexed="true" stored="true" multiValued="true"/>
>>> >
>>> >   <copyField source="product" dest="product_copy"/>
>>> >
>>> >
>>> > I insert a document with say ID as 100 and product as sampleproduct.
>>> When i
>>> > view the document in the solr admin page i see the correct value for
>>> > the product_copy field ( same as the prodcut field ).
>>> > Next i try to update this document and for the field product i give 2
>>> > values
>>> > sampleproduct and testproduct. When i view the document on the solr
>>> admin
>>> > now
>>> > it show me 3 values in the copy field i.e. sampleproduct, testproduct
>>> and
>>> > sampleproduct ( the initial value in the copy field is retained even on
>>> > update ).
>>> >
>>> > Why is copyFiled retaining the old values when the original field value
>>> has
>>> > been updated. If i update the document multiple times all the old values
>>> > are
>>> > still retained in the copyField.
>>> >
>>> > Note that i am using the solrJ api to insert the document. I tried
>>> setting
>>> > null values for the copy field when i am updating the document but it
>>> didnt
>>> > solve the problem.
>>> >
>>>
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Master/Slave High CPU Usage

2010-11-18 Thread Lance Norskog
If they are on the same server, you do not need to replicate.

If you only do queries, the query server can use the same index
directory as the master. Works quite well. Both have to have the same
LockPolicy in solrconfig.xml. For security reasons, I would run the
query server as a different user who has read-only access to the
index; that way it cannot touch the index.
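
In solrconfig.xml that's the lockType setting; a minimal sketch (the value shown
is only an example - the point is that both instances must agree):

<!-- in the <indexDefaults> (or <mainIndex>) section of both solrconfig.xml files -->
<lockType>native</lockType>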

On Wed, Nov 17, 2010 at 11:28 PM, Ofer Fort  wrote:
> anybody?
>
> On Wed, Nov 17, 2010 at 12:09 PM, Ofer Fort  wrote:
>>
>> Hi, I'm working with Erez,
>> we experienced this again, and this time the slave index folder didn't 
>> contain the index.XXX folder, only one index folder.
>> if we shutdown the slave, the CPU on the master was normal, as soon as we 
>> started the slave again, the CPU went up to 100% again.
>> thanks for any help
>> ofer
>>
>> On Wed, Nov 17, 2010 at 11:15 AM, Erez Zarum  wrote:
>>>
>>> Hi all,
>>> We've been seeing this for the second time already.
>>> I have a solr (1.4.1) master and a slave. both are located on the same 
>>> machine (16GB RAM, 4GB allocated to the slave and 3GB to the master)
>>> All our updates are going towards the master, and all the queries are 
>>> towards the slave.
>>> Once in a while the slave gets OutOfMemoryError. This is not the big 
>>> problem (I have about 100M documents)
>>> The problem is that from that moment the CPU of the slave AND the master is 
>>> almost 100%.
>>> If i shutdown the slave, the CPU of the master drops.
>>> If i start the slave again, the CPU is 100% again.
>>> I have the replication set on commit and startup.
>>> I see that in the data folder contains three index folders: index, 
>>> index.XXXYYY and  index.XXXYYY.ZZZ
>>>
>>> The only way i was able to get pass it (worked two times already), is to 
>>> shutdown the two servers, and to copy all the index of the master to the 
>>> slave, and start them again.
>>> From that moment and on, they continue to work and replicate with a very 
>>> reasonable CPU usage.
>>>
>>> Our guess is that it failed to replicate due to the OOM and since then 
>>> tries to do a full replication again and again?
>>> but why is the CPU of the master so high?
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Spell-Check Component Functionality

2010-11-18 Thread rajini maski
Hello Peter,
Thanks For reply :)I did spellcheck.q=Curst as you said ...Query is
like:

http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true



I am getting this error :(

HTTP Status 500 - null java.lang.NullPointerException at
java.io.StringReader.(Unknown Source) at
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at
org.apache.solr.search.QParser.getQuery(QParser.java:131) at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at


What does the error mean...? What do I need to do about it? Any mistake in the
config?

The config.xml and schema are included in the mail below, FYI. Please let
me know if anyone knows why this error occurs.

Awaiting reply
Rajani Maski


On Thu, Nov 18, 2010 at 8:09 PM, Peter Karich  wrote:

>  Hi Rajani,
>
> some notes:
>  * try spellcheck.q=curst or completely without spellcheck.q but with q
>  * compared to the normal q parameter spellcheck.q can have a different
> analyzer/tokenizer and is used if present
>  * do not do spellcheck.build=true for every request (creating the
> spellcheck index can be very expensive)
>  * if you got spellcheck working embed the spellcheck component into your
> normal query component. otherwise you need to query 2 times ...
>
> Regards,
> Peter.
>
>
>  All,
>>
>> I am trying apply the Solr spell check component functionality to our
>> data.
>>
>> The configuration set up I needed to make for it by updating config.xml
>> and
>> schema.xml is done as follows..
>> Please let me know if any errors in it.
>>
>>  I am not getting any suggestions in suggestion tags of solr output xml.
>>
>> I indexed word "Crust" to the field textSpell that is enabled for spell
>> check and then I searched for
>> "Curst"
>>
>> The queries i tried were :
>>
>> http://localhost:8909/solr/spell?q=Curst&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true
>>
>>
>> http://localhost:8909/solr/spell?q=Cruste&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true&spellcheck.dictionary=default
>>
>>
>> The CONFIG.XML :
>>
>> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>>  <lst name="spellchecker">
>>   <str name="name">default</str>
>>   <str name="field">spell</str>
>>   <str name="spellcheckIndexDir">./spellchecker</str>
>>  </lst>
>>
>>  <lst name="spellchecker">
>>   <str name="name">jarowinkler</str>
>>   <str name="field">lowerfilt</str>
>>   <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
>>   <str name="spellcheckIndexDir">./spellchecker2</str>
>>  </lst>
>>  <str name="queryAnalyzerFieldType">textSpell</str>
>> </searchComponent>
>>
>> <requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
>>  <lst name="defaults">
>>   <str name="spellcheck.dictionary">default</str>
>>   <str name="spellcheck.onlyMorePopular">false</str>
>>   <str name="spellcheck.extendedResults">false</str>
>>   <str name="spellcheck.count">1</str>
>>  </lst>
>>  <arr name="last-components">
>>   <str>spellcheck</str>
>>  </arr>
>> </requestHandler>
>>
>>
>>
>> SCHEMA:
>>
>> <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>     <filter class="solr.StandardFilterFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>     <filter class="solr.StandardFilterFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <field name="textSpell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
>>
>>
>> If there is any error in the above that is preventing spell check from
>> working, please let me know.
>>
>> The output I am getting is like null suggestions:
>>
>> <lst name="spellcheck">
>>   <lst name="suggestions"/>
>> </lst>
>>
>>
>> Regards,
>> Rajani Maski
>>
>>
>
> --
> http://jetwick.com twitter search prototype
>
>


Re: Spell-Check Component Functionality

2010-11-18 Thread rajini maski
And if I try:
http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true
&q=Curst&

The XML OUTPUT IS

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">Curst</str>
      <str name="spellcheck.q">Curst</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

No suggestion tags at all...

If I try:
http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true
&q=Crust&

The XML OUTPUT IS

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">Crust</str>
      <str name="spellcheck.q">Curst</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="...">Crust</str>
    </doc>
  </result>
</response>

No suggestion tags..

What is the proper configuration for this? Is there any specific article
written on Solr spell check other than the solr-wiki page? I am not getting a
clear idea about this component from the wiki.

Awaiting replies..
Rajani Maski


On Fri, Nov 19, 2010 at 11:32 AM, rajini maski wrote:

> Hello Peter,
> Thanks For reply :)I did spellcheck.q=Curst as you said ...Query is
> like:
>
>
> http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true
>
>
>
> I am getting this error :(
>
> HTTP Status 500 - null java.lang.NullPointerException at
> java.io.StringReader.(Unknown Source) at
> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at
> org.apache.solr.search.QParser.getQuery(QParser.java:131) at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
>
>
> What is the error mean ... ? what do I need to do for this.. Any mistake in
> config?
>
> The config.xml and schema I have attached in the mail below FYI..Please let
> me know if anyone know why is this error..
>
> Awaiting reply
> Rajani Maski
>
>
> On Thu, Nov 18, 2010 at 8:09 PM, Peter Karich  wrote:
>
>>  Hi Rajani,
>>
>> some notes:
>>  * try spellcheck.q=curst or completely without spellcheck.q but with q
>>  * compared to the normal q parameter spellcheck.q can have a different
>> analyzer/tokenizer and is used if present
>>  * do not do spellcheck.build=true for every request (creating the
>> spellcheck index can be very expensive)
>>  * if you got spellcheck working embed the spellcheck component into your
>> normal query component. otherwise you need to query 2 times ...
>>
>> Regards,
>> Peter.
>>
>>
>>  All,
>>>
>>> I am trying apply the Solr spell check component functionality to our
>>> data.
>>>
>>> The configuration set up I needed to make for it by updating config.xml
>>> and
>>> schema.xml is done as follows..
>>> Please let me know if any errors in it.
>>>
>>>  I am not getting any suggestions in suggestion tags of solr output xml.
>>>
>>> I indexed word "Crust" to the field textSpell that is enabled for spell
>>> check and then I searched for
>>> "Curst"
>>>
>>> The queries i tried were :
>>>
>>> http://localhost:8909/solr/spell?q=Curst&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true
>>>
>>>
>>> http://localhost:8909/solr/spell?q=Cruste&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.q=true&spellcheck.dictionary=default
>>>
>>>
>>> The CONFIG.XML :
>>>
>>> 
>>> 
>>>   default
>>>   spell
>>>   ./spellchecker
>>> 
>>>
>>> 
>>> 
>>>   jarowinkler
>>>   lowerfilt
>>>   >>
>>> name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance
>>>   ./spellchecker2
>>> 
>>>  textSpell
>>> 
>>>
>>> 
>>> 
>>> default
>>>   
>>>   false
>>>   
>>>   false
>>>   

Re: Spell-Check Component Functionality

2010-11-18 Thread Shanmugavel SRD

Did you configure the below in your default request handler?

<arr name="last-components">
  <str>spellcheck</str>
</arr>
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Spell-Check-Component-Functionality-tp1923954p1929124.html
Sent from the Solr - User mailing list archive at Nabble.com.


Doubts regarding Multiple Keyword Search

2010-11-18 Thread Pawan Darira
Hi

I am searching for the keywords: ad testing (without quotes). I want results
containing both words at the top, but it is giving me results containing the
words: ad test. Is that correct, or is there some logic behind it, i.e. will it
consider the word "test" as well?

Please help

-- 
Thanks,
Pawan Darira


how about another SolrIndexSearcher.numDocs method?

2010-11-18 Thread kafka0102
In my app, I want to get the number of matching docs for some queries. I see SolrIndexSearcher 
has two methods:
public int numDocs(Query a, DocSet b)
public int numDocs(Query a, Query b)

But these don't fit my case. For the search params, I get q and fq, and q's results 
are not in the filterCache, but the above methods both use the filterCache. So I think 
a method like:
public int numDocs(Query q, List<Query> fqs) (q without the filterCache, fqs with the 
filterCache)
would be fine.
And now, I cannot extend SolrIndexSearcher because of SolrCore. What should I do 
to solve the problem?
thanks.
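
One possible workaround, as a minimal sketch against SolrIndexSearcher's other
public methods in 1.4 (variable names are illustrative): build the filter DocSet
from the fqs (these still go through the filterCache), then run the main query
with zero rows and read the total hit count. The main query goes through the
queryResultCache rather than being stored as a filterCache entry.

DocSet filtered = searcher.getDocSet(fqs);                   // List<Query> fqs, served via filterCache
DocList hits = searcher.getDocList(q, filtered, null, 0, 0); // offset=0, len=0: we only need the count
int numDocs = hits.matches();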