Re: Facet count problem

2010-04-19 Thread Marco Martinez
Hi Ranveer,

The wrong facet counts are caused by the tokenized field that you are
using. If you want to facet on the whole string, use a fieldType that
doesn't split the field into tokens, such as the string field type.
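
To illustrate why a tokenized field changes facet counts, here is a minimal
Python sketch (not Solr code). The sample values and the lowercase/whitespace
"analysis" are assumptions made for illustration; a real text fieldType
applies more filters (stop words, stemming, and so on).

```python
from collections import Counter

# Hypothetical field values for three documents.
values = ["Tech News", "World News", "Blog"]

# Tokenized (text-like) field: facets are counted per indexed token,
# once per document that contains the token.
tokenized_facets = Counter(
    token for value in values for token in set(value.lower().split())
)

# Untokenized (string-like) field: facets are counted per whole value.
string_facets = Counter(values)

print(tokenized_facets)
print(string_facets)
```

With the tokenized variant, "news" is counted for two documents and no facet
value corresponds to the original strings, which is the kind of surprise
described above.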

Regards,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/4/19 Ranveer Kumar 

> Hi Erick,
>
> My schema configuration is following.
>
>
>  
>  
>
>
>
>
>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>  
>  
>  
>  
>
>
>
>   
> ignoreCase="true" expand="true"/>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>  
>
>
>
> 
>
> 
>  
>
>
>
>
>
> On Mon, Apr 19, 2010 at 6:22 AM, Erick Erickson wrote:
>
> > Can we see the actual field definitions from your schema file?
> > Ahmet's question is vital and is best answered if you'll
> > copy/paste the relevant configuration entries. But based
> > on what you *have* posted, I'd guess you're trying to
> > facet on tokenized fields, which is not recommended.
> >
> > You might take a look at:
> > http://wiki.apache.org/solr/UsingMailingLists, it'll help you
> > frame your questions in a manner that gets you your
> > answers as fast as possible.
> >
> > Best
> > Erick
> >
> > On Sun, Apr 18, 2010 at 12:59 PM, Ranveer Kumar wrote:
> >
> > > I am using text for type, which is static. For example: type is a
> > > field and I am using type for categorization. For the news type I am
> > > using news and for blog I am using blog. type is a text field.
> > >
> > > On Apr 17, 2010 8:38 PM, "Ahmet Arslan"  wrote:
> > >
> > > > > I am facing problem to get facet result count. I must be wrong
> > > > > somewhere. I am getting proper ...
> > > > Are you faceting on a tokenized field? What is the fieldType of your
> > > > field?
> > >
> >
>


Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Hello,

I am confused about the proper usage of the Boolean operators AND, OR and NOT.
Could somebody please provide an easy-to-understand explanation?

Thanks,
Sandhya


Re: LucidWorks Solr

2010-04-19 Thread MitchK

Andy, I think it is important to know what a stemmer really is.

It reduces words to a common base form, the stem. That stem is not always
the real base form of the word, but for the system it acts as one, since
all derived forms are reduced to the same string.
That's a stemmer.

Because of this, there can't be a single stemmer that works for every
language: every language has its own rules for reducing a word to its
stem.

If you apply an English stemmer to a German document, the results might
be unexpected. However, sometimes it still works well enough.

Keep in mind that this is an algorithm. It is not important whether the
created stem is a real word. It is only important that most of the
derived forms are reduced to the same base form. Please ask if
something is not clear.
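
As a rough illustration of the points above, here is a toy suffix-stripping
stemmer in Python. It is only a sketch: the suffix list and the y-to-i rule
are assumptions chosen for this example and are far simpler than the Porter
or KStem algorithms.

```python
def toy_stem(word):
    """Strip one common English suffix, then map a trailing 'y' to 'i'."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("y"):
        word = word[:-1] + "i"
    return word


# All English forms collapse to the same string, which is not a real word.
english = [toy_stem(w) for w in ("study", "studies", "studied", "studying")]

# English rules applied to German forms: one form happens to be reduced,
# another is left untouched.
german = [toy_stem(w) for w in ("Kind", "Kindes", "Kinder")]

print(english)
print(german)
```

The English forms all end up as "studi", which is fine for matching even
though it is not a word. The German forms only partly collapse, which is the
"sometimes it still works" effect.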

KStem:
The wiki[1] says that KStem is less aggressive than the standard stemmer.
I guess this means that it has more rules for how to reduce a word to its
stem, and that the results might therefore be better.


[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help using boolean operators

2010-04-19 Thread MitchK

Hello Sandhya,

title:(star AND wars NOT sdi)
This query matches every document whose title contains "star" *and* "wars"
but *not* the term "sdi" (SDI => Strategic Defense Initiative => the media
often used the term "star wars" to describe the project).

title:(star OR wars)
This query matches every document whose title contains "star" *or* "wars".

If your default operator (defaultOperator in your schema.xml) is OR, you
don't need to add the "OR" operator to your query.

default operator: OR
title:(star wars)
-> the same as title:(star OR wars)

default operator: AND
title:(star wars)
-> the same as title:(star AND wars)

default operator: AND
title:(star wars NOT sdi)
-> the same as title:(star AND wars NOT sdi)
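
The effect of the default operator can be sketched in a few lines of Python.
This is a toy matcher over hypothetical titles, not the Lucene query parser;
the function name and the sample data are assumptions for illustration.

```python
def title_matches(title, terms, default_operator="OR", not_terms=()):
    """Return True if the title satisfies the terms under the given operator."""
    words = set(title.lower().split())
    # Any prohibited term rules the document out.
    if any(term in words for term in not_terms):
        return False
    hits = [term in words for term in terms]
    return all(hits) if default_operator == "AND" else any(hits)


# Hypothetical document titles.
titles = ["star wars", "star trek", "sdi star wars program"]

or_hits = [t for t in titles if title_matches(t, ["star", "wars"], "OR")]
and_hits = [t for t in titles if title_matches(t, ["star", "wars"], "AND")]
and_not_hits = [
    t for t in titles
    if title_matches(t, ["star", "wars"], "AND", not_terms=["sdi"])
]

print(or_hits)
print(and_hits)
print(and_not_hits)
```

With OR every title matches, with AND "star trek" drops out, and adding
NOT sdi leaves only "star wars".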

Hope this helps.

Kind regards
- Mitch 


Re: Query regarding "copyField"

2010-04-19 Thread MitchK

Hello Sandhya,

Please show us your schema.xml so that we can check whether
something might be wrong there.

However, if the source of a copyField is "description" and the destination
is "description_stemmed", you can query both: description and
description_stemmed. There will be no error.

- Mitch


Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Hi,

I have the following filter for a field named "myText"



This enables stemming, I guess.

My questions are:

1) Can I disable stemming for the same field at the query time?
2) Do I need to copyField "myText" to "nonStemText", where "nonStemText"
is not configured with the PorterFilterFactory?

regards,
Naga


Re: Stemming - disable at query time - reg.

2010-04-19 Thread Rafał Kuć
Hello!

If you want to have both a non-stemmed and a stemmed field, you should
use copyField.

Even if it were possible to disable the snowball filter at query time,
you would still have stemmed tokens written in the index.


> Hi,

> I have the following filter for a field named "myText"

>  protected="protwords.txt"/>

> This enables stemming, I guess.

> My questions are:

> 1) Can I disable stemming for the same field at the query time?
> 2) Do I need to copyField the "myText" to "nonStemText", wherein
> "nonStemText" is not configured with the PorterFilterFactory.

> regards,
> Naga



-- 
Regards,
 Rafał Kuć



Re: Stemming - disable at query time - reg.

2010-04-19 Thread MitchK

Naga,

1) Yes, it is possible. 

  
  
   
  
   
  
 ... define those filters which you want to apply at query-time 
  


2) I am not sure whether I understand your question correctly:
You do not need to copyField your myText field if it is okay for you that
the indexed data of the myText field is stemmed and the query is not.
For example: if the original data consists of the sentence "I am working",
then after stemming it (maybe) looks like "I am work". If you
query against this with the term "working", there will be no match unless
you stem your query string, too.
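
The mismatch described in 2) can be reproduced with a small Python sketch.
The toy stemmer and the whitespace analysis below are assumptions for
illustration only; they stand in for whatever analyzers are configured in
the schema.

```python
def toy_stem(word):
    """Very rough stand-in for a stemming filter."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text, stem):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


# Index-time analysis with stemming: only stemmed terms reach the index.
indexed_terms = set(analyze("I am working", stem=True))


def matches(query, stem_query):
    """True if any analyzed query term exists in the index."""
    return any(t in indexed_terms for t in analyze(query, stem=stem_query))


print(indexed_terms)
print(matches("working", stem_query=False))
print(matches("working", stem_query=True))
print(matches("work", stem_query=False))
```

An unstemmed query for "working" finds nothing because the index only
contains "work", while a stemmed query, or a query for "work" itself, matches.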

Hope this helps.

- Mitch


Re: Stemming - disable at query time - reg.

2010-04-19 Thread Rafał Kuć
Hello!

MitchK posted the right solution; my post may be confusing ;( Sorry
for that.





-- 
Regards,
 Rafał Kuć



RE: Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Thank you Mitch! I will try that.

regards,
Naga





RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thank You Mitch.

I have a query mentioned below : (my defaultOperator is set to "AND")

(field1 : This is a good string AND field2 : This is a good string AND field3 : 
This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR 
field4 : HTMLDocument) AND field5 : doc)

This is not giving me the desired results.

I want all documents with field1 = ' This is a good string' and field2 = 'This 
is a good string'  and field3 = ' This is a good string' and (field4 = 
'ASCIIDocument' or ' BinaryDocument' or ' HTMLDocument') and field5 = 'doc' to 
be returned.

I am not sure why this is not giving me the desired results.
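
One thing worth checking (a sketch, not a confirmed diagnosis of this case):
in the Lucene query syntax a field prefix applies only to the term, quoted
phrase, or parenthesized group that immediately follows it, so in
field1:This is a good string only "This" is searched in field1 and the
remaining words go to the default field. The Python helpers below, whose
names are hypothetical, build the query with quoted phrases instead. Whether
a phrase matches the whole stored value still depends on each field's type.

```python
def field_phrase(field, text):
    """Return field:"text" with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def any_of(field, values):
    """Return a parenthesized OR group over several values of one field."""
    return "(" + " OR ".join(field_phrase(field, v) for v in values) + ")"


phrase = "This is a good string"
query = " AND ".join(
    [
        field_phrase("field1", phrase),
        field_phrase("field2", phrase),
        field_phrase("field3", phrase),
        any_of("field4", ["ASCIIDocument", "BinaryDocument", "HTMLDocument"]),
        field_phrase("field5", "doc"),
    ]
)

print(query)
```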

Thanks,
Sandhya



RE: Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Hi Mitch,

I have defined my field like:


  





  
  





  


I have indexed two documents with "working" and "worked" values and when I 
search for "working" it is not giving me any results, whereas when I search for 
"work" it is giving me two results.

What should I do to get results for the query "working"?

regards,
Naga



RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Also, one of the fields here, *field3* is a dynamic field. All the other fields 
except this field, are copied into "text" with copyField.

Thanks,
Sandhya



Wildcard search in phrase query using spanquery

2010-04-19 Thread Maddy.Jsh

I need to perform a wildcard search in a phrase query. I have 2 documents
containing the text "how do impair" and "how to improve". I want to be able
to find both documents by searching (how to im*). There is a provision in
Lucene which allows me to perform this operation using SpanWildcardQuery and
keeping the span length at 0.

http://mail-archives.apache.org/mod_mbox//lucene-java-user/200707.mbox/%3c469df09f.9030...@gmail.com%3e


I tried proximity search in Solr but it didn't work with wildcards. Is there
any other provision to perform a wildcard search in a phrase query?

Any suggestions?

Maddy.


Query 2 Cores

2010-04-19 Thread Lee Smith
Hey All

I have 2 cores which have been used with Tika to index files.

I would like to run one query on both at once, as I will be searching the
attr_content field.

If I test each core separately I get 1 and 17 results, but with shards I
get just 17 results.

Here is my example query

http://localhost8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test

Is this the correct way to query 2 cores at once ?
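
A point that may explain the 17 results: in Solr distributed search the
shards parameter lists every shard to be queried, and the core that receives
the request is not added automatically, so a request listing only core2
returns only core2's documents. The Python sketch below builds a URL that
lists both cores. The host, port, and core names are taken from the example
above and are otherwise assumptions.

```python
from urllib.parse import urlencode

host = "localhost:8983"
cores = ["core1", "core2"]

# List every core that should take part in the distributed query.
shards = ",".join(f"{host}/solr/{core}" for core in cores)

params = {"shards": shards, "q": "attr_content:test"}

# Any core can receive the request; it only coordinates the search.
url = f"http://{host}/solr/{cores[0]}/select?" + urlencode(params, safe=":/,")

print(url)
```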

Hope you can help

Lee

Re: Stemming - disable at query time - reg.

2010-04-19 Thread Alejandro Marqués Rodríguez
Hi Naga,

I think you should add the same filter to the query configuration:


 
   
   
   
   

 
 

   
   
   
   
**
 
   

That way stemming is applied to the query, so it would search for "work"
instead of "working" and, therefore you should be able to retrieve both
"worked" and "working".

You can see the different transformations applied by the analyzers at query
and index time in the "analysis" link on the Solr admin page, so you can
check why a given query doesn't match some text.

In this case I think you should get:

Index: Working -> Work (Applies stemming)
Query: Working -> Working (Doesn't apply stemming)

So "working" won't match "work"

Regards


2010/4/19 Naga Darbha 

> Hi Mitch,
>
> I have defined my field like:
>
> positionIncrementGap="100">
>  
>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
>  protected="protwords.txt"/>
>  
>  
> 
> ignoreCase="true" expand="true"/>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
>  
>
>
> I have indexed two documents with "working" and "worked" values and when I
> search for "working" it is not giving me any results, whereas when I search
> for "work" it is giving me two results.
>
> What should I be doing to get the query results for "working".
>
> regards,
> Naga
>
> -Original Message-
> From: Naga Darbha [mailto:ndar...@opentext.com]
> Sent: Monday, April 19, 2010 2:45 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Stemming - disable at query time - reg.
>
> Thank you Mitch! I will try that.
>
> regards,
> Naga
>
>
>
> -Original Message-
> From: MitchK [mailto:mitc...@web.de]
> Sent: Monday, April 19, 2010 2:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Stemming - disable at query time - reg.
>
>
> Naga,
>
> 1) Yes, it is possible.
> 
>  
>  
>   language="English" protected="protwords.txt"/>
>  
>   
>  
> ... define those filters which you want to apply at query-time
>  
> 
>
> 2) I am not sure whether I understand your question right:
> You do not need to copyField your myText-field, if it is okay for you that
> the indexed data of the myText-field is stemmed and the query is not.
> For example: if the original data consists of the sentence "I am working"
> than it (maybe) looks like this after it is stemmed "I am work". If you
> query against this with the term "working" there will be no match, if you
> don't stem your querystring, too.
>
> Hope this helps.
>
> - Mitch
> --
> View this message in context:
> http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p729171.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Solr throws TikaException while parsing sample PDF

2010-04-19 Thread Praveen Agrawal
Hi Grant,
I tried the command line tool of Tika 0.7 (the newest version), and it parsed
the file. I believe Solr 1.4 contains Tika 0.4.
Do you suggest upgrading to the new Tika? Can I upgrade only Tika in Solr 1.4,
or do I need to wait until Solr ships with the new Tika?
Thanks.


On Sun, Apr 18, 2010 at 11:24 PM, Grant Ingersoll wrote:

> Can you extract content from this using Tika's standalone command line
> tool?  PDF's are notorious for problems in extracting.  To me, it looks like
> a bug in PDFBox.  I would try to isolate it down to there and then send, if
> possible, the sample document to PDFBox and see if they can come up w/ a
> fix.
>
> -Grant
>
> On Apr 18, 2010, at 1:12 PM, pk wrote:
>
> >
> > Hi,
> > while posting a sample pdf (that comes with Solr dist'n) to solr, i'm
> > getting a TikaException.
> > Using Solr-1.4, SolrJ (StreamingUpdateSolrServer) for posting pdf to
> solr.
> > Other sample pdfs can be parsed and indexed successfully.. I;m getting
> same
> > error with some other pdfs also (but adobe reader can open them fine, so
> i
> > dont think they have an issue in formatting or are corrupt etc)... Here
> is
> > the trace...
> >
> > 
> > found uploaded file : C:\solr_1.4.0\docs\Installing Solr in Tomcat.pdf ::
> > size=286242
> > Apr 18, 2010 10:31:34 PM
> org.apache.solr.update.processor.LogUpdateProcessor
> > finish
> > INFO: {} 0 640
> > Apr 18, 2010 10:31:34 PM org.apache.solr.common.SolrException log
> > SEVERE: org.apache.solr.common.SolrException:
> > org.apache.tika.exception.TikaException: Una
> > ble to extract PDF content
> >at
> >
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu
> > mentLoader.java:211)
> >at
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStrea
> > mHandlerBase.java:54)
> >at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.jav
> > a:131)
> >at
> >
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(Re
> > questHandlers.java:233)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241
> > )
> >at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFil
> > terChain.java:215)
> >at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain
> > .java:188)
> >at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:
> > 213)
> >at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:
> > 172)
> >at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> >at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
> >at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:10
> > 8)
> >at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
> >at
> >
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873)
> >at
> >
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConn
> > ection(Http11BaseProtocol.java:665)
> >at
> >
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:5
> > 28)
> >at
> >
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorke
> > rThread.java:81)
> >at
> >
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:6
> > 89)
> >at java.lang.Thread.run(Thread.java:595)
> > Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF
> > content
> >at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:58)
> >at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
> >at
> > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
> >at
> > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
> >at
> >
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu
> > mentLoader.java:190)
> >... 20 more
> > Caused by: java.util.zip.ZipException: incorrect header check
> >at
> > java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
> >at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
> >at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
> >at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
> >at
> org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
> >at org.pdfbox.pdfparser.PDFStreamParser.(PDFStreamParser.java:101)
> >at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132)
> >a

Re: Solr throws TikaException while parsing sample PDF

2010-04-19 Thread Koji Sekiguchi

Praveen Agrawal wrote:

Hi Grant,
I tried command line of Tika v-0.7(newest), and it parsed the file.. I
believe Solr1.4 contains 0.4 version of Tika.
Do you suggest to upgrade to new Tika? Can i upgrade only tika in Solr-1.4?
or i need to wait till Solr ships with new Tika?
Thanks.
  

Solr trunk uses Tika 0.7. I'm not SolrCell user, so this is just an FYI.

Koji

--
http://www.rondhuit.com/en/



Howto build a function query using the 'query' function

2010-04-19 Thread Villemos, Gert
I want to build a function expression for a dismax request handler 'bf'
field, to boost a document if it is referenced by other documents,
i.e. the more often a document is referenced, the higher the boost.

 

Something like

linear(query(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 1), 0.01, 1)

Intended to mean:

if count is 0, then the boost is 0*0.01 + 1 = 1
if count is 1, then the boost is 1*0.01 + 1 = 1.01
if count is 100, then the boost is 100*0.01 + 1 = 2
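
The intended arithmetic can be checked with a short Python sketch. It only
mirrors the formula linear(x, m, c) = m*x + c; the reference counts are
plain numbers here, since how to obtain them from Solr is exactly the open
question.

```python
def linear(x, m, c):
    """Same formula as the linear(x, m, c) function: m * x + c."""
    return m * x + c


# Boost for a document referenced 0, 1 and 100 times, with m=0.01 and c=1.
boosts = {count: linear(count, 0.01, 1) for count in (0, 1, 100)}

print(boosts)
```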

 

However the query function
(http://wiki.apache.org/solr/FunctionQuery#query) seems to only be able
to return the score of the query results, not the count of results.

 

How can I do this?

 

Thanks,

Gert.






OutOfMemoryError when using query with sort

2010-04-19 Thread Hamid Vahedi
Hi, I am using Solr running on Windows Server 2008 32-bit.

I added about 100 million articles to Solr without setting the stored
attribute (only the document id is stored; the index size is about 164 GB).
When I run a query without sort, it returns doc ids within a few ms, but
when I add a sort parameter, I get the error below:

HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap space
 at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:560)
 at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
 at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:525)
 at org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:391)
 at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
 at org.apache.lucene.search.Searcher.search(Searcher.java:171)
 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
 at ...

Note: I set the max heap size to 1600 MB (the Tomcat service does not start
with a larger heap), but the problem is not solved.
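
A back-of-the-envelope estimate, as a hedged sketch: the trace fails inside
FieldCacheImpl$LongCache, which suggests the sort field is loaded as one
long value per document. Assuming roughly 100 million documents and 8 bytes
per long, the array alone needs about 800 MB, before counting anything else
on the heap. The numbers below are estimates based on the figures in this
message, not measurements.

```python
def field_cache_bytes(num_docs, bytes_per_value=8):
    """Estimate the size of a one-value-per-document sort array."""
    return num_docs * bytes_per_value


num_docs = 100_000_000  # "about 100 million" articles

long_sort_bytes = field_cache_bytes(num_docs)     # 8-byte long values
int_sort_bytes = field_cache_bytes(num_docs, 4)   # 4-byte int values

print(long_sort_bytes / 2**20, "MiB for a long sort field")
print(int_sort_bytes / 2**20, "MiB for an int sort field")
```

Together with the roughly 450 MB reported for the term index, this would
leave very little of a 1600 MB heap.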

I checked the heap dump file with MAT and see this info:

org.apache.lucene.index.ReadOnlySegmentReader @ 0x253508e8
Shallow Size: 80 B  Retained Size: 449,4 MB

Problem Suspect 1
One instance of "org.apache.lucene.index.ReadOnlySegmentReader" loaded
by "org.apache.catalina.loader.WebappClassLoader @ 0x25350c80" occupies
471.244.848 (97,44%) bytes. The memory is accumulated in one instance of
"org.apache.lucene.index.TermInfosReader" loaded by
"org.apache.catalina.loader.WebappClassLoader @ 0x25350c80".

Keywords
org.apache.lucene.index.ReadOnlySegmentReader
org.apache.catalina.loader.WebappClassLoader @ 0x25350c80
org.apache.lucene.index.TermInfosReader

How can I decrease the segment file size to solve this problem?

Thanks in advance
Hamid



  

Re: LucidWorks Solr

2010-04-19 Thread Darren Govoni
Regarding stemmers, I ditched them altogether a long time ago in favor
of a dictionary of morphologies of all known words (for any given
language). A simple lookup of any word morphology thus produces the set,
including the correct stem.

Works great. 100% of the time.

Just a tip from me.
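
A dictionary lookup of this kind could be sketched in Python as follows. The
tiny MORPHOLOGY table is a hypothetical stand-in for a full dictionary of
word forms, and the function name is made up for this example.

```python
# Hypothetical morphology dictionary: stem -> all known forms.
MORPHOLOGY = {
    "work": {"work", "works", "worked", "working"},
    "go": {"go", "goes", "went", "gone", "going"},
}

# Reverse index: every known form points back to its stem.
FORM_TO_STEM = {
    form: stem for stem, forms in MORPHOLOGY.items() for form in forms
}


def lookup(word):
    """Return (stem, all forms) for a known word, or None if unknown."""
    stem = FORM_TO_STEM.get(word.lower())
    if stem is None:
        return None
    return stem, MORPHOLOGY[stem]


print(lookup("went"))
print(lookup("Working"))
print(lookup("xyzzy"))
```

Unlike a rule-based stemmer, this handles irregular forms such as "went",
but it returns nothing for words missing from the dictionary.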






Ampersand in searchstring. how to replace ?

2010-04-19 Thread stockii

Hello,

I didn't find anything about my problem.

How can I replace an ampersand at index time?

My autosuggest words contain ampersands. How can I replace this sign (&)?


PatternReplaceCharFilterFactory? How do I use this factory?

Or RegexTransformer?
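
Whichever component is used, the underlying idea is a regular-expression
pattern plus a replacement string. The Python sketch below only illustrates
that idea; it is not Solr configuration, the choice of " and " as the
replacement is an assumption, and Java regular expressions (which Solr uses)
differ from Python's in some details.

```python
import re

# Match an ampersand together with any surrounding whitespace.
AMPERSAND = re.compile(r"\s*&\s*")


def replace_ampersand(text, replacement=" and "):
    """Replace every ampersand (and the spaces around it) in the text."""
    return AMPERSAND.sub(replacement, text)


print(replace_ampersand("rock & roll"))
print(replace_ampersand("AT&T"))
print(replace_ampersand("no ampersand here"))
```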

Thanks for your help ;)


Re: LucidWorks Solr

2010-04-19 Thread Andy
Thanks for the explanation Mitch.

You're right. There can't be universal stemmers.

What about multi-language stemmers? I'm mostly interested in English, Spanish, 
German, French, Italian. Are there any stemmers that would handle those 
languages?

If not, what's the recommended way to deal with documents in multiple languages?



  


Re: LucidWorks Solr

2010-04-19 Thread Andy
Thanks for the tip.

Is there any publicly available dictionary of morphologies that I could use?
Or did you build your own?


--- On Mon, 4/19/10, Darren Govoni  wrote:

> From: Darren Govoni 
> Subject: Re: LucidWorks Solr
> To: solr-user@lucene.apache.org
> Date: Monday, April 19, 2010, 7:39 AM
> Regarding stemmers, I ditched them
> altogether a long time ago in favor
> of a dictionary of morphologies of all known words (for any
> given
> language). A simple lookup of any word morphology thus
> produces the set,
> including the correct stem.
> 
> Works great. 100% of the time.
> 
> Just a tip from me.
> 
> 
> On Mon, 2010-04-19 at 00:36 -0800, MitchK wrote:
> 
> > Andy, I think it is important to know what a stemmer
> really is.
> > 
> > It reduces words to their infinitves. Those
> infinitives do not refer to the
> > real infinitive everytime, but however: for the
> system, it is an infinitive,
> > since all its derivates could be reduced to the same
> form.
> > Thats a stemmer.
> > 
> > According to this, there can't exist a stemmer for
> every language, because
> > every language has got its own rules of how to reduce
> a word to its
> > infinitive.
> > 
> > If you apply a stemmer for english language on a
> german document, the
> > results might be unexpected. However, sometimes it
> still works good enough. 
> > 
> > Keep in mind that this is an algorithm. It is not
> important whether the
> > created infinitive is the real infinitive. It is only
> important that most of
> > the derivate forms can be reduced to the same basic
> form. Please ask, if
> > something is not clear.
> > 
> > KStem:
> > The wiki[1] says that KStem is less aggressive as the
> standard stemmer.
> > I guess that this means that there are more rules for
> how to reduce a word
> > to its infinitive and according to this the results
> might be better.
> > 
> > 
> > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
> > 
> > Kind regards
> > - Mitch
> 
> 
> 


  


Fwd: [Dbworld] Survey on Web Geo-Spatial Open-Source Technologies

2010-04-19 Thread Paul Libbrecht

maybe of interest to those doing geo-search in solr?

paul

Begin forwarded message:


From: "Gavin McArdle" 
Date: 19 April 2010 14:46:05 GMT+02:00
To: dbwo...@cs.wisc.edu
Subject: [Dbworld] Survey on Web Geo-Spatial Open-Source Technologies
Reply-To: dbworld_ow...@yahoo.com

[Apologies for cross-posting]

Hi everybody, I am part of the Spatial Information Systems Group in  
University College Dublin.


We are conducting a survey on Open-Source technologies with  
particular focus on Geo-Spatial projects. Our goal is to collect  
first-hand knowledge about a number of Open-Source projects active  
on the Internet.


With this work we hope to identify strong and weak points of each  
project in order to give some guidelines for future directions to  
the Open-Source community and potential developers in relation to  
Geo-Spatial research. Therefore we would like to ask you to take an  
anonymous questionnaire on these technologies.


The questionnaire consists of a few simple questions about your  
experience with the software in terms of usability, stability,  
interoperability and so on.


Estimated completion time: about 1 minute

Link to the questionnaire: http://bit.ly/geospatial-opensource-survey


Projects included in this survey: GeoServer, MapServer, PostGIS,  
MySQL, Hibernate Spatial, Ruby on Rails, Grails, Proj.4, GeoTools,  
Java Topology Suite, OpenLayers, JsExt, Prototype, MooTools


Feel free to contact us at andrea.ballatore [at] ucd.ie if you have  
any questions, comments or recommendations about this survey.



Thank you for your attention,

Spatial Information Systems Group,

School of Computer Science and Informatics,

University College Dublin





[ANN] Carrot2 3.3.0 released

2010-04-19 Thread Stanislaw Osinski
Dear All,

We're pleased to announce the 3.3.0 release of Carrot2, which significantly
improves the scalability of the clustering algorithms (up to 7 times faster
clustering in the case of the STC algorithm) and fixes a number of minor issues.

Release notes:
http://project.carrot2.org/release-3.3.0-notes.html

Download:
http://download.carrot2.org

JIRA issues:
http://issues.carrot2.org/secure/IssueNavigator.jspa?jqlQuery=project+%3D+CARROT+AND+fixVersion+%3D+%223.3.0%22+ORDER+BY+priority+DESC%2C+key+DESC


Similar improvements are available in Lingo3G, the real-time document
clustering engine from Carrot Search.


Thanks!

Dawid Weiss, Stanislaw Osinski
Carrot Search, i...@carrot-search.com


is solr ignored my filters ?

2010-04-19 Thread stockii

hey.

sorry for this ... stupid question ;)

when i perform an import of my data i use some filters. how can i really
be sure that solr used my configured filters and analyzer ?

when i search in solr, the result looks 100% like before the import.

thx =)
-- 
View this message in context: 
http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729646.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is solr ignored my filters ?

2010-04-19 Thread Erik Hatcher
Analyzers/Tokenizers/TokenFilters operate on the text that gets  
indexed.  Stored text remains exactly as you sent it in.


Erik

On Apr 19, 2010, at 9:53 AM, stockii wrote:

> hey.
>
> sry for this ... stupid question ;)
>
> when i perform an import from my data is use some filters. how can i really
> be sure that solr used my configured filters and analyzer ?
>
> when i search in solr the result looks 100% like bevor an import.
>
> th =)
> --
> View this message in context:
> http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729646.html
> Sent from the Solr - User mailing list archive at Nabble.com.




Re: is solr ignored my filters ?

2010-04-19 Thread Sven Maurmann

Hi,

could you provide at least some information? Usually you
can be 100% sure that Solr uses the configuration it is
provided with.

Cheers,
Sven

--On Monday, 19 April 2010 05:53 -0800 stockii wrote:

> hey.
>
> sry for this ... stupid question ;)
>
> when i perform an import from my data is use some filters. how can i
> really be sure that solr used my configured filters and analyzer ?
>
> when i search in solr the result looks 100% like bevor an import.
>
> th =)
> --
> View this message in context:
> http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729646.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: LucidWorks Solr

2010-04-19 Thread darren
There have been some open source ones. I don't have the links handy at
the moment[1]. I parsed an electronic dictionary and
generated a database of each word and its morphologies. I got tired of
lame stemmers that were wrong half the time. Computers are fast enough to
do lookups on 150,000 words nowadays; there's no need for fuzzy
algorithms here, IMO.

Good luck!

[1] Google will turn up some, I think.

> Thanks for the tip.
>
> Are there any publicly available dictionary of morphologies that I could
> use? Or did you build your own one?
>
>
> --- On Mon, 4/19/10, Darren Govoni  wrote:
>
>> From: Darren Govoni 
>> Subject: Re: LucidWorks Solr
>> To: solr-user@lucene.apache.org
>> Date: Monday, April 19, 2010, 7:39 AM
>> Regarding stemmers, I ditched them
>> altogether a long time ago in favor
>> of a dictionary of morphologies of all known words (for any
>> given
>> language). A simple lookup of any word morphology thus
>> produces the set,
>> including the correct stem.
>>
>> Works great. 100% of the time.
>>
>> Just a tip from me.
>>
>>
>> On Mon, 2010-04-19 at 00:36 -0800, MitchK wrote:
>>
>> > Andy, I think it is important to know what a stemmer
>> really is.
>> >
>> > It reduces words to their infinitves. Those
>> infinitives do not refer to the
>> > real infinitive everytime, but however: for the
>> system, it is an infinitive,
>> > since all its derivates could be reduced to the same
>> form.
>> > Thats a stemmer.
>> >
>> > According to this, there can't exist a stemmer for
>> every language, because
>> > every language has got its own rules of how to reduce
>> a word to its
>> > infinitive.
>> >
>> > If you apply a stemmer for english language on a
>> german document, the
>> > results might be unexpected. However, sometimes it
>> still works good enough.
>> >
>> > Keep in mind that this is an algorithm. It is not
>> important whether the
>> > created infinitive is the real infinitive. It is only
>> important that most of
>> > the derivate forms can be reduced to the same basic
>> form. Please ask, if
>> > something is not clear.
>> >
>> > KStem:
>> > The wiki[1] says that KStem is less aggressive as the
>> standard stemmer.
>> > I guess that this means that there are more rules for
>> how to reduce a word
>> > to its infinitive and according to this the results
>> might be better.
>> >
>> >
>> > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
>> >
>> > Kind regards
>> > - Mitch
>>
>>
>>
>
>
>
>



Re: is solr ignored my filters ?

2010-04-19 Thread stockii

okay.

as an example: i want to check if WordDelimiterFilterFactory works correctly. And i
want to experiment with substring search using EdgeNGram...

i have a problem with this string: "Kamera-Wasserwaage" ...

i think solr should filter it like this:

Kamera-Wasserwaage
-> Kamera
-> Wasserwaage

but i want solr to split Wasserwaage into -> Wasser -> Waage and
wasserwaage. But this only works with WasserWaage. grml...

so i want to see how it is indexed.
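
A rough sketch of why only "WasserWaage" gets split: a word-delimiter filter configured with generateWordParts="1" and splitOnCaseChange="1" breaks tokens at delimiters such as hyphens and at lower-to-upper case changes, and "Wasserwaage" has neither. The Python function below is a simplified stand-in for illustration only, not Solr's implementation; the name split_word_parts is hypothetical and only ASCII letters are handled.

```python
import re


def split_word_parts(token):
    """Toy approximation of word-delimiter splitting (not Solr's code).

    Splits on non-alphanumeric delimiters and on lower-to-upper case
    changes. Only ASCII letters and digits are considered here.
    """
    parts = []
    # First split on delimiters such as '-' or '_'.
    for chunk in re.split(r"[^0-9A-Za-z]+", token):
        if not chunk:
            continue
        # Mark each lowercase-to-uppercase boundary with a space, then split.
        marked = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", chunk)
        parts.extend(marked.split())
    return parts


if __name__ == "__main__":
    print(split_word_parts("Kamera-Wasserwaage"))
    print(split_word_parts("WasserWaage"))
```

Since there is no case change inside "Wasserwaage", a rule like this has nothing to split on; some dictionary-based step would be needed for that.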


-- 
View this message in context: 
http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729699.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is solr ignored my filters ?

2010-04-19 Thread Michael Kuhlmann
On 19.04.2010 16:09, stockii wrote:
> so i want to see how it is indexed. 
> 
> 
Go to the admin panel, open the schema browser, and set the number of
shown tokens to 1 or something.

-Michael



Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
If you're submitting this:

field1 : This is a good string

then you're searching in "field1" ONLY for "This". The tokens "is",
"a", "good" and "string" are being searched against your default
search field as defined in your schema.

Have you tried parenthesizing?

Try the SOLR admin page for looking at how a query is parsed and/or
attach &debugQuery=on to your http request to see how the query
actually works.
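
To make the parenthesizing suggestion concrete, here is a minimal Python sketch that builds the two usual forms of a field-scoped clause (all terms grouped under the field, or the text as an exact phrase) and a URL carrying debugQuery=on. The helper names and the base URL are assumptions for illustration, and the sketch does not escape special query characters inside the text.

```python
from urllib.parse import urlencode


def field_group(field, text):
    """Scope every term of `text` to `field` by parenthesizing."""
    return f"{field}:({text})"


def field_phrase(field, text):
    """Match `text` in `field` as one exact phrase."""
    return f'{field}:"{text}"'


def debug_url(base, query):
    """Build a select URL with debugQuery=on to inspect the parsed query."""
    return base + "?" + urlencode({"q": query, "debugQuery": "on"})


if __name__ == "__main__":
    # Hypothetical base URL; adjust host, port and core to your setup.
    base = "http://localhost:8983/solr/select"
    q = field_group("field1", "This is a good string")
    print(q)
    print(debug_url(base, q))
```

The grouped form leaves the terms subject to the default operator, while the phrase form requires the words to appear together and in order.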

HTH
Erick

On Mon, Apr 19, 2010 at 5:47 AM, Sandhya Agarwal wrote:

> Also, one of the fields here, *field3* is a dynamic field. All the other
> fields except this field, are copied into "text" with copyField.
>
> Thanks,
> Sandhya
>
> -Original Message-
> From: Sandhya Agarwal [mailto:sagar...@opentext.com]
> Sent: Monday, April 19, 2010 2:55 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Help using boolean operators
>
> Thank You Mitch.
>
> I have a query mentioned below : (my defaultOperator is set to "AND")
>
> (field1 : This is a good string AND field2 : This is a good string AND
> field3 : This is a good string AND (field4 : ASCIIDocument OR field4 :
> BinaryDocument OR field4 : HTMLDocument) AND field5 : doc)
>
> This is not giving me the desired results.
>
> I want all documents with field1 = ' This is a good string' and field2 =
> 'This is a good string'  and field3 = ' This is a good string' and (field4 =
> 'ASCIIDocument' or ' BinaryDocument' or ' HTMLDocument') and field5 = 'doc'
> to be returned.
>
> I am not sure why this is not giving me the desired results.
>
> Thanks,
> Sandhya
>
> -Original Message-
> From: MitchK [mailto:mitc...@web.de]
> Sent: Monday, April 19, 2010 2:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Help using boolean operators
>
>
> Hello Sandhya,
>
> title: star AND wars NOT sdi
> This query will match every document where "star" *and* "wars" occur but
> *not* the term "sdi" (SDI => Strategic Defense Initiative => in the media
> there was often the term star wars used to describe the project).
>
> title: star OR wars
> This query will match every document where "star" *or* "wars" occur.
>
> If your standard operator (defined in your schema.xml) is the OR, you don't
> need to add the "OR" operator to your query.
>
> Standard-operator: OR
> title: star wars
> This is the same as title: star OR wars
>
> standard-operator: AND
> title: star wars
> - > the same as title: star AND wars
>
> standard-operator: AND
> title: star wars NOT sdi
> is the same as: title: star AND wars NOT sdi
>
> Hope this helps.
>
> Kind regards
> - Mitch
> --
> View this message in context:
> http://n3.nabble.com/Help-using-boolean-operators-tp729102p729135.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: is solr ignored my filters ?

2010-04-19 Thread stockii

oh, yes, thx, but

we have 800 000 items ... how do i find the right one this way ? XD 
-- 
View this message in context: 
http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729749.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is solr ignored my filters ?

2010-04-19 Thread Michael Kuhlmann
On 19.04.2010 16:29, stockii wrote:
> 
> oha, yes thx but
> 
> we have 800 000 items ... to find the right in this way ? XD 

Then use the TermsComponent: http://wiki.apache.org/solr/TermsComponent

-Michael


Re: Ampersand in searchstring. how to replace ?

2010-04-19 Thread Ahmet Arslan

> I didnt find any about my problem...
> 
> how can i replace an ampersand in indextime ? 
> 
> my autosuggest words are haveing ampersands. how can i
> replace this sign (&)
> ???
> 

Easiest way is to use MappingCharFilterFactory before your tokenizer.



mapping.txt will be placed under solrhome/conf directory and contain this line :

"&" => " " 

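The configuration snippet seems to have been stripped by the archive; with MappingCharFilterFactory it would normally be a charFilter element that references mapping.txt and sits before the tokenizer in the analyzer. As a hedged illustration of the effect only (not Solr's code), the Python sketch below applies such a mapping to the raw text and then tokenizes on whitespace. The function names and the sample input are made up.

```python
def apply_char_mapping(text, mapping):
    """Replace each mapped source string before tokenization (toy version).

    The real char filter matches greedily while streaming; this sketch
    simply applies longer source strings first.
    """
    for source in sorted(mapping, key=len, reverse=True):
        text = text.replace(source, mapping[source])
    return text


def whitespace_tokenize(text):
    """Split on runs of whitespace, like a very plain tokenizer."""
    return text.split()


if __name__ == "__main__":
    mapping = {"&": " "}  # same rule as the mapping.txt line above
    print(whitespace_tokenize(apply_char_mapping("Black&Decker", mapping)))
```

Because the replacement happens before tokenization, the tokenizer never sees the ampersand at all.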

  


Re: Wildcard search in phrase query using spanquery

2010-04-19 Thread Ahmet Arslan
> I need to perform wildcard search in phrase query. I have 2
> documents
> containing text "how do impair" and "how to improve". I
> want to be able to
> search both documents by searching (how to im*). There is a
> provision in
> lucene which allows me to perform this operation using
> SpanWildcardQuery and
> keeping span length to 0. 
> 
> http://mail-archives.apache.org/mod_mbox//lucene-java-user/200707.mbox/%3c469df09f.9030...@gmail.com%3e
> 
> 
> I tried proximity search in solr but it didn't work with
> wildcard. Is there
> any other provision to perform wildcard search in phrase
> query?

With https://issues.apache.org/jira/browse/SOLR-1604  you can use * operator 
inside phrases, e.g. "how to im*" 



  


best practice handling html content

2010-04-19 Thread Markus.Rietzler
hello,

we want to index and search in our intranet documents.
the field "body" contains html-tags.

in our schema.xml we have a fieldType text_de (see the end of this mail)
which uses the charFilter solr.HTMLStripCharFilterFactory at index time,
so this is no problem. the text is put into the index without any html. i can
search over this field; html entities like &auml; for a german umlaut
(ä) work, &nbsp; is filtered out correctly, german language support works, etc.

now comes the problem. the field body is defined like



so we index it and also store the content. on the result page, when we
print body or the highlighting on body, we get all the html tags back. this sounds
correct, as the HTML filter only works on the indexed text...

so my question is: what is the best way to handle this case? strip out all html
before adding the document to the index?
or let solr do the html filtering and then apply some additional filtering in the
GUI frontend when printing the search result?

or have i misunderstood something?

thank you

markus


 schema.xml 


  








  
  






  



Caching of search results, caching proxy

2010-04-19 Thread Andy
I'm setting up my Solr index to be updated every x minutes.

Does Solr cache the result of a search, so that the next time the same search
is requested it recognizes that the index has not changed and simply
returns the previous result from cache without processing the search again?

If Solr doesn't do that, can Tomcat or Jetty be configured to cache a 
dynamically generated result for x minutes and serve that from cache until it 
expires?

Or would I need to use a caching reverse proxy like Squid or Varnish to do that?

Please share your experience - do you actually set up some caching system like 
this?




  


Re: Caching of search results, caching proxy

2010-04-19 Thread Ahmet Arslan

> I'm setting up my Solr index to be
> updated every x minutes.
> 
> Does Solr cache the result of a search, and then when next
> time the same search is requested, it'd recognize that the
> Index has not changed and therefore just return the previous
> result from cache without processing the search again?

Yes. http://wiki.apache.org/solr/SolrCaching

Also http://wiki.apache.org/solr/SolrAndHTTPCaches


  


Re: Stemming - disable at query time - reg.

2010-04-19 Thread MitchK

In addition to Alejandro's posting, I would say that you don't need to
specify separate analyzers for index time and query time, since it *seems* (maybe I
am wrong) like you want to use the same functionality at index and
query time.

Hope this helps

- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p730019.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: best practice handling html content

2010-04-19 Thread Ahmet Arslan

> we want to index and search in our intranet documents.
> the field "body" contains html-tags.
> 
> in our schema.xml we have a fieldType text_de (see at the
> end of this mail) which uses charFilter
> solr.HTMLStripCharFilterFactory with index. 
> so this is no problem. the text is put into the index
> without any html. i can do search over this field, also html
> entities like ä for a german umlaut (ä) do work,
>   are filtered out correct, support for german
> language etc.
> 
> so now comes the problem. the field body is defined like
> 
>  stored="true" />
> 
> so we do index it and also store the content. on the result
> page when we are printing body or the highlighing on body we
> have all the html tags back. sounds correct, as the
> HTML-Filter only works on the indexing...
> 
> so my question is, how is the best way to handle this case?
> strip out all html before adding the document to the index.

I think this is the best way to do it if you want to display html-stripped 
content.  By doing so you will save disk space too. 

Similar discussion: http://search-lucene.com/m/hyKqg1MJEDL
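
If the stripping is done on the client before the document is sent to Solr, something along these lines could work. This is only a sketch using Python's standard html.parser module; the function name strip_html is made up, and real intranet pages may need extra care (for example script or style blocks, which this sketch does not skip).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and lets the parser decode entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    """Return the text content of `markup` with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    print(strip_html("<p>K&auml;se&nbsp;&amp; <b>Brot</b></p>"))
```

The stored field then already contains plain text, so display and highlighting need no further filtering.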






Big problem with solr in an official server.

2010-04-19 Thread Ariel
Hi everybody:

I have a big problem with the amount of memory Solr is using on a server.
I start Solr with the "java -jar start.jar" command on an Ubuntu server;
the start.jar process is using 7 GB of memory and it is
considerably affecting the performance of the server.
I would like to know how to configure it to use a limited amount of memory
while keeping good performance. Do I need to migrate Solr to an Apache Tomcat
servlet container to improve memory usage?
Could you help me please?
Thanks in advance.
Regards


Re: Help using boolean operators

2010-04-19 Thread MitchK

Erick,

I am a little bit confused, because I wasn't aware of this fact (and have
never noticed any wrong behaviour... maybe because I used the
dismax-handler).
How should I search for 
field1: This is a good string 
without doing something like
field1:this field1:is ... ?
If I quote the whole thing, Solr would search for the whole phrase (and only
the whole phrase), or am I wrong?

I would test it, if I can, but unfortunately it's not possible at the
moment. 

Thank you!

Mitch
-- 
View this message in context: 
http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Big problem with solr in an official server.

2010-04-19 Thread Ahmet Arslan

> Hi everybody:
> 
> I have a big problem with solr in a server with the memory
> size it is using,
> I am setting up Solr with "java -jar start.jar" command in
> an ubuntu server,
> the process start.jar is using 7Gb of  memory in the
> server and it is
> affecting considerably the performance of the server.
> I would want to know how to configure it to use a limited
> memory size with
> high performance results, Do I need to migrate the solr to
> an apache tomcat
> servlet container to improve the memory performance ???

Recent post about the "java -jar start.jar" :
http://search-lucene.com/m/atxZc2MSKig2/run+in+background






Re: LucidWorks Solr

2010-04-19 Thread MitchK

I am curious:
The idea behind a stemmer is not that it produces the correct infinitive for
a given word. The idea is that it always produces the same infinitive for any
derivative of the word. 

What happens if there is an unknown word, for example slang?
How does your solution work here? Does it scale? 

Thank you for sharing experiences. :)

- Mitch
-- 
View this message in context: 
http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is solr ignored my filters ?

2010-04-19 Thread MitchK

How should Solr know that Wasserwaage consists of "Wasser" and "Waage"?
You are looking for an extra filter like
DictionaryCompoundWordTokenFilter. 
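
The general idea of dictionary-based decompounding can be sketched in a few lines of Python. This is a simplified illustration, not the Lucene filter itself; the function name, the min_subword parameter and the tiny dictionary are assumptions.

```python
def decompound(token, dictionary, min_subword=4):
    """Return the token plus dictionary words found inside it (toy version)."""
    lowered = token.lower()
    found = [token]  # keep the original token as well
    for start in range(len(lowered)):
        for end in range(start + min_subword, len(lowered) + 1):
            piece = lowered[start:end]
            # Emit every dictionary word that occurs as a substring.
            if piece in dictionary and piece != lowered:
                found.append(piece)
    return found


if __name__ == "__main__":
    words = {"wasser", "waage", "kamera"}  # hypothetical dictionary
    print(decompound("Wasserwaage", words))
```

Without such a word list there is nothing in "Wasserwaage" itself that tells a filter where to split.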

Kind regards
- Mitch


stockii wrote:
> 
> okay. 
> 
> as example. i want to check if WordDelimiterFactory works correct. And i
> want to experimant with search in substrings with edgengram...
> 
> i have the problem with that string: "Kamera-Wasserwaage" ... 
> 
> so i think solr should filter this like this.
> 
> Kamera-Wasserwaage 
> -> Kamera
> -> Wasserwaage
> 
> but i want that solr split Wasserwaage into -> Wasser ->Waage and
> wasserwaage. But this only works with WasserWaage. grml... 
> 
> so i want to see how it is indexed. 
> 
> 
> 
-- 
View this message in context: 
http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p730071.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Big problem with solr in an official server.

2010-04-19 Thread Ariel
I have just read the post, but it doesn't say whether the memory problems
are related to starting Solr that way. The Jetty web server is used when I start
Solr that way, so I assumed that memory problems should not happen,
because Jetty should manage how the memory is used.

So are you really sure I must migrate to a non-Jetty web server? Is
that what you recommend?
Thanks in advance again.
Regards
Ariel

On Mon, Apr 19, 2010 at 12:27 PM, Ahmet Arslan  wrote:

>
> > Hi everybody:
> >
> > I have a big problem with solr in a server with the memory
> > size it is using,
> > I am setting up Solr with "java -jar start.jar" command in
> > an ubuntu server,
> > the process start.jar is using 7Gb of  memory in the
> > server and it is
> > affecting considerably the performance of the server.
> > I would want to know how to configure it to use a limited
> > memory size with
> > high performance results, Do I need to migrate the solr to
> > an apache tomcat
> > servlet container to improve the memory performance ???
>
> Recent post about the "java -jar start.jar" :
> http://search-lucene.com/m/atxZc2MSKig2/run+in+background
>
>
>
>
>


Re: LucidWorks Solr

2010-04-19 Thread Erick Erickson
This is a little bit of hijacking going on here, but

It's algorithmic. That is, there isn't a list of variants that
stem to the same infinitive, and your statement
"always the same infinitive for any derivative of the word"
isn't quite what happens.

Stemmers will always produce the same infinitive for any given
word, just the opposite of what you said. But it is NOT guaranteed
that a stemmer will always produce the same infinitive for all
derivatives. Rather it just does a pretty darn good job with some
anomalies because the rules don't cover all the edge cases.

Their *goal* is to do it perfectly, but we all know about unachievable
goals...
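
A toy example may make the distinction clearer. The Python sketch below is a deliberately crude suffix stripper (not Porter, KStem or any stemmer shipped with Solr; the rules are made up): it is deterministic for each input word, yet it does not bring every derivative of "run" to the same form.

```python
def toy_stem(word):
    """Strip one common English suffix (toy rule set, for illustration)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["walk", "walked", "walking", "walks", "run", "running", "ran"]:
        print(w, "->", toy_stem(w))
```

All forms of "walk" collapse to one stem, while "run", "running" and "ran" end up as three different stems: the rules simply do not cover those cases.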

HTH
Erick

On Mon, Apr 19, 2010 at 12:28 PM, MitchK  wrote:

>
> I am curious:
> The idea behind a stemmer is not that he produces the correct infinitive
> for
> a given word. The idea is that he produces always the same infintive for
> any
> derivate of the word.
>
> What would be, if there is an unknown word? For example something like
> slang? How does your solution works here? Does it scale?
>
> Thank you for sharing experiences. :)
>
> - Mitch
> --
> View this message in context:
> http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: is solr ignored my filters ?

2010-04-19 Thread stockii

yes, that's what i'm saying to my boss... 

but i just found another solution ;)

->

i use EdgeNGram only for my product names and search with an OR operator in
my default "text" field and in the productname field. that way i find all
substrings :D
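
For reference, a small Python sketch of what front-edge n-grams look like. It is an illustration only, not the Solr filter; the function name is made up, and min_gram and max_gram stand in for the minGramSize and maxGramSize settings. Note that edge n-grams are prefixes of each token, so they match the beginnings of words rather than arbitrary inner substrings.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the front-edge n-grams (prefixes) of a token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    print(edge_ngrams("waage", 2, 4))
    # "waage" is not a prefix of "wasserwaage", so it is not among its grams.
    print("waage" in edge_ngrams("wasserwaage"))
```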
-- 
View this message in context: 
http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p730102.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Big problem with solr in an official server.

2010-04-19 Thread Geek Gamer
if you want to limit the memory used by the java process you could use
java -XmxNg -jar start.jar
where N is the maximum heap size in gigabytes for the JVM that runs the jetty
container (for example, -Xmx2g).

On Mon, Apr 19, 2010 at 10:05 PM, Ariel  wrote:

> I have just read the post, but it doesn't said if the problems with memory
> are associated with that way, the jetty web server it is used when I start
> solr that way, then I supposed that problems with memory should not happen
> because jetty must administrate the way the memory is used.
>
> Then are you really sure I must migrate to a non jetty web server ??? It is
> that what you recommend ?
> Thanks in advance again.
> Regards
> Ariel
>
> On Mon, Apr 19, 2010 at 12:27 PM, Ahmet Arslan  wrote:
>
> >
> > > Hi everybody:
> > >
> > > I have a big problem with solr in a server with the memory
> > > size it is using,
> > > I am setting up Solr with "java -jar start.jar" command in
> > > an ubuntu server,
> > > the process start.jar is using 7Gb of  memory in the
> > > server and it is
> > > affecting considerably the performance of the server.
> > > I would want to know how to configure it to use a limited
> > > memory size with
> > > high performance results, Do I need to migrate the solr to
> > > an apache tomcat
> > > servlet container to improve the memory performance ???
> >
> > Recent post about the "java -jar start.jar" :
> > http://search-lucene.com/m/atxZc2MSKig2/run+in+background
> >
> >
> >
> >
> >
>


Re: Big problem with solr in an official server.

2010-04-19 Thread Ariel
And what is the recommended maximum memory size I should use? Is there a
recommended value?
Regards.


On Mon, Apr 19, 2010 at 12:44 PM, Geek Gamer  wrote:

> if you want to limit the use of memory by the java process you could use
> java -XmxNGB
> where N is the amount of memory you want to limit to jetty container.
>
> On Mon, Apr 19, 2010 at 10:05 PM, Ariel  wrote:
>
> > I have just read the post, but it doesn't said if the problems with
> memory
> > are associated with that way, the jetty web server it is used when I
> start
> > solr that way, then I supposed that problems with memory should not
> happen
> > because jetty must administrate the way the memory is used.
> >
> > Then are you really sure I must migrate to a non jetty web server ??? It
> is
> > that what you recommend ?
> > Thanks in advance again.
> > Regards
> > Ariel
> >
> > On Mon, Apr 19, 2010 at 12:27 PM, Ahmet Arslan 
> wrote:
> >
> > >
> > > > Hi everybody:
> > > >
> > > > I have a big problem with solr in a server with the memory
> > > > size it is using,
> > > > I am setting up Solr with "java -jar start.jar" command in
> > > > an ubuntu server,
> > > > the process start.jar is using 7Gb of  memory in the
> > > > server and it is
> > > > affecting considerably the performance of the server.
> > > > I would want to know how to configure it to use a limited
> > > > memory size with
> > > > high performance results, Do I need to migrate the solr to
> > > > an apache tomcat
> > > > servlet container to improve the memory performance ???
> > >
> > > Recent post about the "java -jar start.jar" :
> > > http://search-lucene.com/m/atxZc2MSKig2/run+in+background
> > >
> > >
> > >
> > >
> > >
> >
>


Re: Big problem with solr in an official server.

2010-04-19 Thread Ahmet Arslan

> And what is the recommended max size
> memory I should use ??? Is there anyone
> recommended ???

What is your index size?


  


Re: LucidWorks Solr

2010-04-19 Thread MitchK

Yes, you are right, thank you Erick.
I missed this point and thought only of common cases, not of special ones. 

However, one can combine the mentioned solutions and different stem filters
in different fields, so that one can be quite (not absolutely) sure that in
most cases the application works as expected. 

- Mitch
-- 
View this message in context: 
http://n3.nabble.com/LucidWorks-Solr-tp727341p730160.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Big problem with solr in an official server.

2010-04-19 Thread MitchK

Wasn't there a good posting on lucidworks.com?
The title was something like "deadly sins" or so.

There are some good suggestions on things like that :).

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Big-problem-with-solr-in-an-official-server-tp730049p730168.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fwd: Query 2 Cores

2010-04-19 Thread Lee Smith
Any ideas about my question below?

Lee

Begin forwarded message:

> From: Lee Smith 
> Date: 19 April 2010 11:19:45 GMT+01:00
> To: solr-user@lucene.apache.org
> Subject: Query 2 Cores
> Reply-To: solr-user@lucene.apache.org
> 
> Hey All
> 
> I have 2 cores which have been used with tika to do index files.
> 
> I would like to do one query on both at once as I will be searching 
> attr_content field.
> 
> If I do a test on each core I get 1 & 17 results but trying with shards I 
> just get 17 results.
> 
> Here is my example query
> 
> http://localhost8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test
> 
> Is this the correct way to query 2 cores at once ?
> 
> Hope you can help
> 
> Lee



Re: Fwd: Query 2 Cores

2010-04-19 Thread Shawn Heisey

On 4/19/2010 11:09 AM, Lee Smith wrote:

> http://localhost8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test
>
> Is this the correct way to query 2 cores at once ?


This should do what you want:

http://localhost:8983/solr/core1/select?shards=localhost:8983/solr/core1,localhost:8983/solr/core2&q=attr_content:test
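
The difference from the original request is that the shards parameter lists every core to be searched, including the one that receives the request. A small Python sketch for building such a URL follows; the helper name is made up, and the host, port and core names are simply the ones from this thread.

```python
from urllib.parse import urlencode


def distributed_select_url(receiving_core, shard_cores, query):
    """Build a select URL whose shards parameter lists every core to search."""
    shards = ",".join(shard_cores)
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    params = urlencode({"shards": shards, "q": query}, safe=":/,")
    return f"http://{receiving_core}/select?{params}"


if __name__ == "__main__":
    cores = ["localhost:8983/solr/core1", "localhost:8983/solr/core2"]
    print(distributed_select_url(cores[0], cores, "attr_content:test"))
```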



Re: LucidWorks Solr

2010-04-19 Thread darren
My use requires more correct processing of language than what you define
as a stemmer. My experience with stemmers is that even for words
that have no stem, they make up a new word. I consider those false
positives.

My approach is based on the need to recognize that walk, walked, walking
all refer to the same lemma "walk", as is correct in grammar (not some
stemmer algorithm's choice).

It scales fine. In fact, I use Lucene with an Instantiated in-memory index to
perform the lookups, but one could easily use MySQL or something else.

Darren
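
To make the distinction concrete, here is a minimal sketch contrasting a dictionary lookup with a rule-based suffix stripper. Everything in it is an assumption for illustration: the `LEMMAS` table stands in for whatever lookup store is actually used (the message only says a Lucene in-memory index or MySQL), and `naive_stem` is a toy, not Porter or any stemmer shipped with Solr.

```python
# Hypothetical lemma table: each inflected form maps to its grammatical lemma.
LEMMAS = {"walk": "walk", "walks": "walk", "walked": "walk", "walking": "walk"}

def lemmatize(word):
    """Return the dictionary lemma, or the word unchanged if it is unknown."""
    return LEMMAS.get(word.lower(), word)

def naive_stem(word):
    """Toy suffix stripper: blindly drop a common ending if enough letters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

for w in ("walked", "walking", "string", "news"):
    print(w, "->", lemmatize(w), "|", naive_stem(w))
```

Both approaches agree on "walked" and "walking", but the suffix stripper also turns "string" into "str" and "news" into "new", which is the kind of false positive described above. The lookup leaves unknown words untouched instead.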

>
> I am curious:
> The idea behind a stemmer is not that it produces the correct infinitive
> for a given word. The idea is that it always produces the same infinitive
> for any derivative of the word.
>
> What would happen if there is an unknown word? For example, something like
> slang? How does your solution work here? Does it scale?
>
> Thank you for sharing experiences. :)
>
> - Mitch
> --
> View this message in context:
> http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



Re: LucidWorks Solr

2010-04-19 Thread darren
> This is a little bit of hijacking going on here, but
You are right. Accept my regrets.


> It's algorithmic. That is, there isn't a list of variants that
> stem to the same infinitive, and your statement
> "always the same infinitive for any derivative of the word"
> isn't quite what happens.
>
> Stemmers will always produce the same infinitive for any given
> word, just the opposite of what you said. But it is NOT guaranteed
> that a stemmer will always produce the same infinitive for all
> derivatives. Rather it just does a pretty darn good job with some
> anomalies because the rules don't cover all the edge cases.
>
> Their *goal* is to do it perfectly, but we all know about unachievable
> goals...
>
> HTH
> Erick
>
> On Mon, Apr 19, 2010 at 12:28 PM, MitchK  wrote:
>
>>
>> I am curious:
>> The idea behind a stemmer is not that it produces the correct infinitive
>> for a given word. The idea is that it always produces the same infinitive
>> for any derivative of the word.
>>
>> What would happen if there is an unknown word? For example, something like
>> slang? How does your solution work here? Does it scale?
>>
>> Thank you for sharing experiences. :)
>>
>> - Mitch
>> --
>> View this message in context:
>> http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>



Re: LucidWorks Solr

2010-04-19 Thread Erick Erickson
no big deal, just wanted to mention.

On Mon, Apr 19, 2010 at 1:24 PM,  wrote:

> > This is a little bit of hijacking going on here, but
> You are right. Accept my regrets.
>
>
> > It's algorithmic. That is, there isn't a list of variants that
> > stem to the same infinitive, and your statement
> > "always the same infinitive for any derivative of the word"
> > isn't quite what happens.
> >
> > Stemmers will always produce the same infinitive for any given
> > word, just the opposite of what you said. But it is NOT guaranteed
> > that a stemmer will always produce the same infinitive for all
> > derivatives. Rather it just does a pretty darn good job with some
> > anomalies because the rules don't cover all the edge cases.
> >
> > Their *goal* is to do it perfectly, but we all know about unachievable
> > goals...
> >
> > HTH
> > Erick
> >
> > On Mon, Apr 19, 2010 at 12:28 PM, MitchK  wrote:
> >
> >>
> >> I am curious:
> >> The idea behind a stemmer is not that it produces the correct infinitive
> >> for a given word. The idea is that it always produces the same infinitive
> >> for any derivative of the word.
> >>
> >> What would happen if there is an unknown word? For example, something like
> >> slang? How does your solution work here? Does it scale?
> >>
> >> Thank you for sharing experiences. :)
> >>
> >> - Mitch
> >> --
> >> View this message in context:
> >> http://n3.nabble.com/LucidWorks-Solr-tp727341p730059.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
>
>


synonym filter and offsets

2010-04-19 Thread Joe Calderon
hello *, im having issues with the synonym filter altering token offsets,

my input text is
"saturday night live"
it is tokenized by the whitespace tokenizer yielding 3 tokens
[saturday, 0,8], [night, 9, 14], [live, 15,19]

on indexing these are passed through a synonym filter that has this line
saturday night live => snl, saturday night live


i now end up with four tokens
[saturday, 0, 19], [snl, 0, 19], [night, 0, 19], [live, 0,19]

what i want is
[saturday, 0,8], [snl, 0,19], [night, 9, 14], [live, 15,19]


when using the highlighter i want to make it so only the relevant part
of the text is highlighted, how can i fix my filter chain?


thx much
--joe
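
To pin down the target output, here is a small plain-Python model of the desired behaviour. It is only a sketch of the token stream asked for above, not Lucene's synonym filter and not a fix for the filter chain; `whitespace_tokens` and `inject_synonym` are hypothetical helpers.

```python
import re

def whitespace_tokens(text):
    """Split on whitespace, keeping (term, start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def inject_synonym(tokens, phrase, synonym):
    """Add `synonym` spanning the matched phrase; original offsets stay untouched."""
    words = phrase.split()
    terms = [term for term, _, _ in tokens]
    out = list(tokens)
    # Walk backwards so earlier insert positions stay valid.
    for i in range(len(terms) - len(words), -1, -1):
        if terms[i:i + len(words)] == words:
            start = tokens[i][1]                   # start of the first matched token
            end = tokens[i + len(words) - 1][2]    # end of the last matched token
            out.insert(i + 1, (synonym, start, end))
    return out

tokens = whitespace_tokens("saturday night live")
print(tokens)
print(inject_synonym(tokens, "saturday night live", "snl"))
```

Only the injected "snl" token covers the whole phrase (0 to 19); "saturday", "night" and "live" keep the offsets the tokenizer gave them, which is what a highlighter needs in order to mark just the relevant part.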


Re: LucidWorks Solr

2010-04-19 Thread Otis Gospodnetic
Andy,

This will help with smooth injection of your multilingual documents into Solr 
(multilingual either in the sense of 1 doc containing fields in multiple 
languages or 1 index containing documents in different languages):

  http://sematext.com/products/multilingual-indexer/index.html

Re your other question about open-source morpho dictionaries - I don't know of 
any.  Last time I looked for dictionaries I learned that they cost money.  That 
said, the market for datasets is starting to grow, so you may be able to find 
more and cheaper dictionaries now.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Andy
> To: solr-user@lucene.apache.org
> Sent: Mon, April 19, 2010 8:45:40 AM
> Subject: Re: LucidWorks Solr
>
> Thanks for the explanation Mitch.
>
> You're right. There can't be universal stemmers.
>
> What about multi-language stemmers? I'm mostly interested in English,
> Spanish, German, French, Italian. Are there any stemmers that would
> handle those languages?
>
> If not, what's the recommended way to deal with documents in multiple
> languages?
>
> --- On Mon, 4/19/10, MitchK <mitc...@web.de> wrote:
>
>> From: MitchK <mitc...@web.de>
>> Subject: Re: LucidWorks Solr
>> To: solr-user@lucene.apache.org
>> Date: Monday, April 19, 2010, 4:36 AM
>>
>> Andy, I think it is important to know what a stemmer really is.
>>
>> It reduces words to their infinitives. Those infinitives do not refer
>> to the real infinitive every time, but for the system it is an
>> infinitive, since all its derivatives can be reduced to the same form.
>> That's a stemmer.
>>
>> According to this, there can't exist a stemmer for every language,
>> because every language has its own rules for how to reduce a word to
>> its infinitive.
>>
>> If you apply a stemmer for the English language to a German document,
>> the results might be unexpected. However, sometimes it still works
>> well enough.
>>
>> Keep in mind that this is an algorithm. It is not important whether
>> the created infinitive is the real infinitive. It is only important
>> that most of the derived forms can be reduced to the same basic form.
>> Please ask if something is not clear.
>>
>> KStem:
>> The wiki[1] says that KStem is less aggressive than the standard
>> stemmer. I guess that this means that there are more rules for how to
>> reduce a word to its infinitive, and according to this the results
>> might be better.
>>
>> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
>>
>> Kind regards
>> - Mitch
>> --
>> View this message in context:
>> http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: LucidWorks Solr

2010-04-19 Thread Andy

> Andy,
> 
> This will help with smooth injection of your multilingual
> documents into Solr (multilingual either in the sense of 1
> doc containing fields in multiple languages or 1 index
> containing documents in different languages):
> 
>   http://sematext.com/products/multilingual-indexer/index.html


Otis,

Thanks for the info.

Is multilingual indexer an open source project or a commercial product? That 
web page doesn't mention anything about either open source or a price, so it's 
hard to tell.







Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
Did you try parenthesizing:
field1:(This is a good string)

You can try lots of things easily by going to
http://localhost:8983/solr/admin/form.jsp
and clicking the "debug enable" checkbox...

HTH
Erick

On Mon, Apr 19, 2010 at 12:23 PM, MitchK  wrote:

>
> Erick,
>
> I am a little bit confused, because I wasn't aware of this fact (and have
> never noticed any wrong behaviour... maybe because I used the
> dismax-handler).
> How should I search for
> field1: This is a good string
> without doing something like
> field1:this field1:is ... ?
> If I quote the whole thing, Solr would search for the whole phrase (and
> only
> the whole phrase), or am I wrong?
>
> I would test it, if I can, but unfortunately it's not possible at the
> moment.
>
> Thank you!
>
> Mitch
> --
> View this message in context:
> http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Help using boolean operators

2010-04-19 Thread Erik Hatcher
Careful though... the Solr admin page is for *analysis* testing, not  
query parsing.  I saw that mentioned earlier too.  To test query  
parsing, submit your query to http://localhost:8983/solr/select?q=your_query&debugQuery=true 
 and look at the parsed query output.


Erik
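
For anyone scripting this, a minimal sketch of building that debug URL with proper escaping. The helper name `debug_query_url` is made up, and the base URL assumes the default single-core setup on localhost:8983 used in this thread.

```python
from urllib.parse import urlencode

def debug_query_url(query, base="http://localhost:8983/solr/select"):
    """Return a select URL with debugQuery=true and the query string escaped."""
    return f"{base}?{urlencode({'q': query, 'debugQuery': 'true'})}"

debug_url = debug_query_url("field1:(This is a good string)")
print(debug_url)
```

Opening the resulting URL against a running Solr should return the normal response plus the debug section containing the parsed query.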

On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote:

> Did you try parenthesizing:
> field1:(This is a good string)
>
> You can try lots of things easily by going to
> http://localhost:8983/solr/admin/form.jsp
> and clicking the "debug enable" checkbox...
>
> HTH
> Erick
>
> On Mon, Apr 19, 2010 at 12:23 PM, MitchK  wrote:
>
>> Erick,
>>
>> I am a little bit confused, because I wasn't aware of this fact (and have
>> never noticed any wrong behaviour... maybe because I used the
>> dismax-handler).
>> How should I search for
>> field1: This is a good string
>> without doing something like
>> field1:this field1:is ... ?
>> If I quote the whole thing, Solr would search for the whole phrase (and
>> only the whole phrase), or am I wrong?
>>
>> I would test it, if I can, but unfortunately it's not possible at the
>> moment.
>>
>> Thank you!
>>
>> Mitch
>> --
>> View this message in context:
>> http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html
>> Sent from the Solr - User mailing list archive at Nabble.com.




Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
Hmmm, I *thought* I saw the XML response with the parsed query in it, did I
miss the details *again*?

Erick

On Mon, Apr 19, 2010 at 7:15 PM, Erik Hatcher wrote:

> Careful though... the Solr admin page is for *analysis* testing, not query
> parsing.  I saw that mentioned earlier too.  To test query parsing, submit
> your query to
> http://localhost:8983/solr/select?q=your_query&debugQuery=true and look at
> the parsed query output.
>
>Erik
>
>
> On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote:
>
>> Did you try parenthesizing:
>> field1:(This is a good string)
>>
>> You can try lots of things easily by going to
>> http://localhost:8983/solr/admin/form.jsp
>> and clicking the "debug enable" checkbox...
>>
>> HTH
>> Erick
>>
>> On Mon, Apr 19, 2010 at 12:23 PM, MitchK  wrote:
>>
>>
>>> Erick,
>>>
>>> I am a little bit confused, because I wasn't aware of this fact (and have
>>> never noticed any wrong behaviour... maybe because I used the
>>> dismax-handler).
>>> How should I search for
>>> field1: This is a good string
>>> without doing something like
>>> field1:this field1:is ... ?
>>> If I quote the whole thing, Solr would search for the whole phrase (and
>>> only
>>> the whole phrase), or am I wrong?
>>>
>>> I would test it, if I can, but unfortunately it's not possible at the
>>> moment.
>>>
>>> Thank you!
>>>
>>> Mitch
>>> --
>>> View this message in context:
>>> http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>


Re: Help using boolean operators

2010-04-19 Thread Erik Hatcher
Ah sorry... my bad.  You're right.   I thought you were referring to  
the admin analysis.jsp page, but I misread and replied too quickly.   
You're spot on, Erick.


Erik


On Apr 19, 2010, at 7:21 PM, Erick Erickson wrote:

> Hmmm, I *thought* I saw the XML response with the parsed query in it, did I
> miss the details *again*?
>
> Erick
>
> On Mon, Apr 19, 2010 at 7:15 PM, Erik Hatcher wrote:
>
>> Careful though... the Solr admin page is for *analysis* testing, not query
>> parsing.  I saw that mentioned earlier too.  To test query parsing, submit
>> your query to
>> http://localhost:8983/solr/select?q=your_query&debugQuery=true and look at
>> the parsed query output.
>>
>> Erik
>>
>> On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote:
>>
>>> Did you try parenthesizing:
>>> field1:(This is a good string)
>>>
>>> You can try lots of things easily by going to
>>> http://localhost:8983/solr/admin/form.jsp
>>> and clicking the "debug enable" checkbox...
>>>
>>> HTH
>>> Erick
>>>
>>> On Mon, Apr 19, 2010 at 12:23 PM, MitchK  wrote:
>>>
>>>> Erick,
>>>>
>>>> I am a little bit confused, because I wasn't aware of this fact (and have
>>>> never noticed any wrong behaviour... maybe because I used the
>>>> dismax-handler).
>>>> How should I search for
>>>> field1: This is a good string
>>>> without doing something like
>>>> field1:this field1:is ... ?
>>>> If I quote the whole thing, Solr would search for the whole phrase (and
>>>> only the whole phrase), or am I wrong?
>>>>
>>>> I would test it, if I can, but unfortunately it's not possible at the
>>>> moment.
>>>>
>>>> Thank you!
>>>>
>>>> Mitch
>>>> --
>>>> View this message in context:
>>>> http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.







Highlighting apostrophe

2010-04-19 Thread Blargy

I have the following text field:


  





  

...
   


When I search for women's, womens or women I correctly get back all the
results I want. However when I use the highlighting feature it only
highlights women in the women's cases. How can I highlight the whole word
women's including the apostrophe?

Thanks
-- 
View this message in context: 
http://n3.nabble.com/Highlighting-apostrophe-tp731155p731155.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting apostrophe

2010-04-19 Thread Blargy

Same general question about highlighting the full word "sunglasses" when I
search for glasses. Is this possible?

Thanks
-- 
View this message in context: 
http://n3.nabble.com/Highlighting-apostrophe-tp731155p731305.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Yes, both have same filters, so we can avoid specifying analyzer type.

- Naga

-Original Message-
From: MitchK [mailto:mitc...@web.de] 
Sent: Monday, April 19, 2010 9:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Stemming - disable at query time - reg.


In addition to Alejandro's post, I would say that you don't need to
specify separate analyzers for index time and query time, since it *seems*
(maybe I am wrong) like you want to use the same functionality at index and
query time.

Hope this helps

- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Stemming-disable-at-query-time-reg-tp729152p730019.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thanks Erick. Using parentheses works. 

With parentheses, the query q=field1: (this is a good string) is parsed as 
follows:

+field1:this +field1:good +field1:string

Is that OK to do?

Thanks,
Sandhya
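
As a rough illustration of how to read that parsed output: each `+` marks a required clause, and "is" and "a" have dropped out, presumably removed as stopwords by the field's analyzer. The sketch below only mimics that result in plain Python. The stopword set and the assumption of an AND default operator are inferred from the output above, not taken from the actual schema, and `expand_grouped_query` is a made-up helper.

```python
# Assumption: inferred from the terms missing in the parsed output above.
STOPWORDS = {"is", "a"}

def expand_grouped_query(field, text, stopwords=STOPWORDS):
    """Mimic field:(w1 w2 ...) with an AND default: one required clause per kept term."""
    terms = [w.lower() for w in text.split() if w.lower() not in stopwords]
    return " ".join(f"+{field}:{term}" for term in terms)

parsed = expand_grouped_query("field1", "this is a good string")
print(parsed)
```

Read this way, the query requires all three remaining terms to be present in field1 but, unlike a quoted phrase, does not require them to be adjacent or in order.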

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, April 20, 2010 4:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Help using boolean operators

Did you try parenthesizing:
field1:(This is a good string)

You can try lots of things easily by going to
http://localhost:8983/solr/admin/form.jsp
and clicking the "debug enable" checkbox...

HTH
Erick

On Mon, Apr 19, 2010 at 12:23 PM, MitchK  wrote:

>
> Erick,
>
> I am a little bit confused, because I wasn't aware of this fact (and have
> never noticed any wrong behaviour... maybe because I used the
> dismax-handler).
> How should I search for
> field1: This is a good string
> without doing something like
> field1:this field1:is ... ?
> If I quote the whole thing, Solr would search for the whole phrase (and
> only
> the whole phrase), or am I wrong?
>
> I would test it, if I can, but unfortunately it's not possible at the
> moment.
>
> Thank you!
>
> Mitch
> --
> View this message in context:
> http://n3.nabble.com/Help-using-boolean-operators-tp729102p730051.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr throws TikaException while parsing sample PDF

2010-04-19 Thread Praveen Agrawal
I'm using the Solr 1.4 distribution, with Solr Cell. Can I upgrade only Tika
within the Solr 1.4 distribution? If yes, is there a guide?
Thanks.


On Mon, Apr 19, 2010 at 4:36 PM, Koji Sekiguchi  wrote:

> Praveen Agrawal wrote:
>
>> Hi Grant,
>> I tried command line of Tika v-0.7(newest), and it parsed the file.. I
>> believe Solr1.4 contains 0.4 version of Tika.
>> Do you suggest to upgrade to new Tika? Can i upgrade only tika in
>> Solr-1.4?
>> or i need to wait till Solr ships with new Tika?
>> Thanks.
>>
>>
> Solr trunk uses Tika 0.7. I'm not SolrCell user, so this is just an FYI.
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>