date:20110222

Re: Problem with XML encode UFT-8

2011-02-22 Thread Jan Høydahl

Hi,

Please explain some more.
a) What version of Solr?
b) Are you trying to feed XML or PDF?
c) What request handler are you feeding to? /update or /update/extract ?
d) Can you copy/paste some more lines from the error log?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 21. feb. 2011, at 15.02, jayronsoares wrote:

> 
> Hi, I'm using solr py to index PDF files; however, when I run the script,
> it shows me this error:
> 
> An invalid XML character (Unicode: 0xc) was found in the element content of
> the document.
> 
> Could someone give me some help?
> 
> cheers
> jayron
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Any-new-python-libraries-tp493419p2545020.html
> Sent from the Solr - User mailing list archive at Nabble.com.
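
Editor's note: the error quoted above says that character 0xC (form feed) appeared in the XML sent to Solr. XML 1.0 does not allow most control characters, and text extracted from PDFs often contains form feeds at page breaks. A minimal sketch of one possible client-side workaround in Python, assuming the extracted text is available as a string before it is posted; the function name is hypothetical and not part of solr py:

```python
import re

# Matches every character that is NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=" "):
    """Replace characters that XML 1.0 forbids (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    # A form feed (0x0C) between two pages of extracted text.
    print(repr(strip_invalid_xml_chars("page one\x0cpage two")))
```

Whether this applies depends on the answers to Jan's questions; if the PDF is sent to /update/extract as a binary upload, the cleanup would have to happen elsewhere.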

Re: Faceting

2011-02-22 Thread Jan Høydahl

Hi,

Even if the customer types a correct product name, how do you know that 
merchant A and merchant B both have registered that exact product in the same 
way?

Merchant A may say as product name "White Sony LCD TV XY123" and the other says 
"Sony XY123 LCD TV", colour=white

If you're serious about a price comparison service, I think you need to invest in 
finding which products are the same before indexing, and then tagging them with 
some unique normalized name. Then, after a search, you show a facet on 
that normalized name, and only once the user has selected the correct facet 
can you be 100% certain that you're comparing apples to apples.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 22. feb. 2011, at 07.23, Praveen Parameswaran wrote:

> Hi ,
> @Tommaso @Jan Høydahl Thanks for the response :)
> 
> I've done it almost the same way Tommaso suggested, and yes, it's about
> 70-80% accurate.
> I understand the contradiction in search: you want customers to find stuff
> without the exact right wording (recall) and at the same time you want the
> query to be precise (precision).
> 
> In my scenario both cases are present, but mostly a customer would
> know which product name he is searching for and will be interested in
> comparing the prices that different merchants offer. What I feel is that
> maybe the "search" itself has to be classified based on context.
> 
> Would it be possible in Solr to have the following:
> 1. A customer searches with the correct product name and gets accurate
> results.
> 2. A customer searches with a keyword, or without the exact name, and gets
> the most relevant results.
> 
> The 2nd part is fine, as it's working well. The 1st part is where I'm struggling.
> 
> thanks
> Praveen
> 
> On Mon, Feb 21, 2011 at 5:23 PM, Tommaso Teofili
> wrote:
> 
>> Hi Praveen,
>> as far as I understand you have to set the type of the field(s) you are
>> searching over to be conservative.
>> So, for example, you won't include stemmer and lowercase filters and will use
>> only a whitespace tokenizer; moreover, you should search with the default
>> operator set to AND.
>> Then faceting over those field(s) will depend on those type settings.
>> You may find the following wiki page useful:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> My 2 cents,
>> 
>> 
>> 2011/2/21 Praveen Parameswaran 
>> 
>>> Hi,
>>> 
>>> Is it possible to have 100% accuracy for facet counts using Solr? Since
>>> this is for a product price comparison site, I need the search to
>>> return accurate results. For example, if I search "sony lcd Tv" I do not
>>> want "sony Led Tv" to be returned in the results. Please let me know if
>>> this is possible and how.
>>> 
>>> 
>>> Thanks
>>> 
>>> Prav
>>> 
>>
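
Editor's note: Jan's suggestion in this thread is to tag equivalent products with a unique normalized name before indexing and then facet on that name. A minimal sketch in Python of what such a normalization step could look like, using the two example product names from his message. The colour list and the function name are assumptions made for illustration; real product matching would need much more than this.

```python
import re

# Illustrative only: words treated as attributes rather than part of the name.
COLOUR_WORDS = {"white", "black", "silver"}


def normalized_product_key(name):
    """Build an order-independent key from a merchant's product name."""
    tokens = set(re.findall(r"[a-z0-9]+", name.lower()))
    return " ".join(sorted(t for t in tokens if t not in COLOUR_WORDS))


if __name__ == "__main__":
    print(normalized_product_key("White Sony LCD TV XY123"))
    print(normalized_product_key("Sony XY123 LCD TV"))
```

Both names map to the same key, which could be stored in a string field at index time and used as the facet field.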

Configure 2 or more Tomcat instances.

2011-02-22 Thread rajini maski

   I have a tomcat6.0 instance running in my system, with
connector port-8090, shutdown port -8005 ,AJP/1.3  port-8009 and redirect
port-8443  in server.xml (path = C:\Program Files\Apache Software
Foundation\Tomcat 6.0\conf\server.xml)

   How do I configure one more independent Tomcat instance
on the same system? I went through many sites but couldn't get
this working. If anyone knows the proper configuration steps, please reply.

Regards,
Rajani Maski

Re: Configure 2 or more Tomcat instances.

2011-02-22 Thread Jonathan DeMello

Hey Rajani,

From what I've seen, you just need to copy the Tomcat folder and change the
following ports in server.xml: shutdown, connector, and AJP. Then you can start
the instances independently.

Regards,

Jonathan


On Tue, Feb 22, 2011 at 3:15 PM, rajini maski  wrote:

>   I have a tomcat6.0 instance running in my system, with
> connector port-8090, shutdown port -8005 ,AJP/1.3  port-8009 and redirect
> port-8443  in server.xml (path = C:\Program Files\Apache Software
> Foundation\Tomcat 6.0\conf\server.xml)
>
>   How do I configure one more independent Tomcat instance
> on the same system? I went through many sites but couldn't get
> this working. If anyone knows the proper configuration steps, please reply.
>
> Regards,
> Rajani Maski
>
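
Editor's note: the advice in this message (copy the Tomcat folder, then change the shutdown, connector and AJP ports in the copy's server.xml) can be sketched as a small Python script. This is only an illustration under stated assumptions: the sample XML is a cut-down stand-in for a real server.xml using the port numbers from the question, the function name is hypothetical, and ElementTree drops XML comments when it rewrites a file, so editing by hand may be preferable.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for conf/server.xml with the ports from the question.
SAMPLE_SERVER_XML = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8090" protocol="HTTP/1.1" redirectPort="8443"/>
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443"/>
  </Service>
</Server>"""


def shift_tomcat_ports(server_xml, offset):
    """Return server.xml text with every port moved up by `offset`."""
    root = ET.fromstring(server_xml)
    # The port attribute on <Server> is the shutdown port.
    if root.get("port") is not None:
        root.set("port", str(int(root.get("port")) + offset))
    # Each <Connector> has its own port and usually a redirectPort.
    for connector in root.iter("Connector"):
        for attr in ("port", "redirectPort"):
            value = connector.get(attr)
            if value is not None:
                connector.set(attr, str(int(value) + offset))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(shift_tomcat_ports(SAMPLE_SERVER_XML, 100))
```

With an offset of 100 the second instance would use 8105, 8190, 8109 and 8543, none of which clash with the first instance.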

solr indexing

2011-02-22 Thread satya swaroop

Hi all,
   Out of keen interest in Solr's indexing mechanism, I started digging into
the Solr indexing code (/update/extract) and read about the index file formats
and the scoring procedure. I have a question about this:
1) Scoring uses both dynamic and precalculated values (doc boost, field
boost, lengthNorm). If a term in the index matches nearly one million docs,
does Solr calculate the score for each and every doc containing the term and
then take the top docs from the index? Or is there some mechanism that limits
the score calculation to only particular docs?

If anybody knows about this, or of any documentation regarding it, please
let me know.


Regards,
satya

Re: Any plan to make Field Collapsing available for distributed search?

2011-02-22 Thread Koji Sekiguchi


(11/02/22 13:46), Andy wrote:

Hello,

I'm looking into Field Collapsing. According to the documentation one limitation is that 
"distributed search support for result grouping has not yet been implemented."

Just wondered if there's any plan to add distributed search support to field 
collapsing. Or is there any technical obstacle that makes such a feature 
unlikely?

Thanks

Andy


Andy,

There is an open ticket for it:
https://issues.apache.org/jira/browse/SOLR-2066

Koji
--
http://www.rondhuit.com/en/

disable replication in a persistent way

2011-02-22 Thread Ahmet Arslan

Hello,

solr/replication?command=disablepoll disables replication on slave(s). However 
it is not persistent. After solr/tomcat restart, slave(s) will continue 
polling. 

Is there a built-in way to disable replication on slave side in a persistent 
manner?

Currently I am using system property substitution along with 
solrcore.properties file to simulate this.


${enable.slave:false} 

#solrcore.properties in slave
enable.master=true

And modify solrcore.properties with a custom solr request handler after the 
disablepoll command, to make it persistent. It seems that there is no existing 
mechanism to write solrconfig.properties file, am I correct?

Thanks,
Ahmet

Re: Datetime problems with dataimport

2011-02-22 Thread MOuli


Ok i got it.

It should look like yyyy-mm-ddThh:mm:ssZ
for example: 2011-02-22T15:07:00Z
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Datetime-problems-with-dataimport-tp2545654p2552477.html
Sent from the Solr - User mailing list archive at Nabble.com.
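
Editor's note: the format MOuli arrived at (date, a literal T, time, and a trailing Z, as in 2011-02-22T15:07:00Z) can be produced from a Python datetime as in the sketch below. The helper name is hypothetical, and treating naive datetimes as already being in UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone

# strftime pattern matching the example 2011-02-22T15:07:00Z
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_date(value):
    """Format a datetime the way the Solr date field in this thread expects."""
    if value.tzinfo is not None:
        # Convert aware datetimes to UTC; the trailing Z means UTC.
        value = value.astimezone(timezone.utc)
    # Naive datetimes are assumed to be UTC already.
    return value.strftime(SOLR_DATE_FORMAT)


if __name__ == "__main__":
    print(to_solr_date(datetime(2011, 2, 22, 15, 7, 0)))
```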

Re: Datetime problems with dataimport

2011-02-22 Thread Adam Estrada

I logged an issue in Jira that relates to this and it looks like Yonik picked 
it up.

https://issues.apache.org/jira/browse/SOLR-2286

Adam


On Feb 22, 2011, at 9:07 AM, MOuli wrote:

> 
> Ok i got it.
> 
> It should look like yyyy-mm-ddThh:mm:ssZ
> for example: 2011-02-22T15:07:00Z

Multiple Blocked threads on UnInvertedField.getUnInvertedField() & SegmentReader$CoreReaders.getTermsReader

2011-02-22 Thread Rachita Choudhary

Hi Solr Users,

We are upgrading from Solr 1.3 to Solr 1.4.1.
While using Solr 1.3 , we were seeing multiple blocking active threads on
"org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal() ".

To utilize the benefits of NIO we upgraded to Solr 1.4.1, and now see a
different set of blocked threads on
"org.apache.solr.request.UnInvertedField.getUnInvertedField()" and
"SegmentReader$CoreReaders.getTermsReader".
Due to this, QTimes shoot up from a few hundred to thousands of
msec, even going up to 30-40 secs for a single query.

- The blocked threads show up after a few thousand queries.
- We do not have faceting and sorting on the same fields.
- Our facet fields are multivalued text fields, but no large text values are
present.
- Index size - around 10 GB
- We have not specified any method for faceting in our schema.xml.
- Our field value cache settings are:
  <fieldValueCache class="solr.FastLRUCache" size="175" autowarmCount="0" showItems="10" />

Can someone please tell us why we are seeing these blocked threads?
Also, if they are related to our field value cache, then a cache of size 175
would fill up after very few initial queries; shouldn't we then see the
blocked threads right after that?
What difference will it make if we set "facet.method = enum"?
Is this all related to fieldValueCache or is there some other configuration
which we need to set to avoid these blocking threads?

Thanks,
Rachita

Cache values example:

facetField1_27443 :
{field=facet1_27443,memSize=4214884,tindexSize=52,time=22,phase1=15,nTerms=4,bigTerms=0,termInstances=6,uses=1}

facetField1_70 :
{field=facetField1_70,memSize=4223310,tindexSize=308,time=28,phase1=21,nTerms=636,bigTerms=0,termInstances=14404,uses=1}

facetField2 : 
{field=facetField2,memSize=4262644,tindexSize=3156,time=273,phase1=267,nTerms=12188,bigTerms=0,termInstances=1255522,uses=7031}

Stack trace for "org.apache.solr.request.UnInvertedField.getUnInvertedField()" - BLOCKED:

 at org.apache.solr.request.UnInvertedField.getUnInvertedField (UnInvertedField.java:837)
 at org.apache.solr.request.SimpleFacets.getTermCounts (SimpleFacets.java:250)
 at org.apache.solr.request.SimpleFacets.getFacetFieldCounts (SimpleFacets.java:283)
 at org.apache.solr.request.SimpleFacets.getFacetCounts (SimpleFacets.java:166)
 at org.apache.solr.handler.component.FacetComponent.process (FacetComponent.java:72)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody (SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute (SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute (SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter (SolrDispatchFilter.java:241)
 at com.caucho.server.dispatch.FilterFilterChain.doFilter (FilterFilterChain.java:87)
 at com.caucho.server.webapp.WebAppFilterChain.doFilter (WebAppFilterChain.java:187)
 at com.caucho.server.dispatch.ServletInvocation.service (ServletInvocation.java:266)
 at com.caucho.server.http.HttpRequest.handleRequest (HttpRequest.java:270)
 at com.caucho.server.port.TcpConnection.run (TcpConnection.java:678)
 at com.caucho.util.ThreadPool$Item.runTasks (ThreadPool.java:721)
 at com.caucho.util.ThreadPool$Item.run (ThreadPool.java:643)
 at java.lang.Thread.run (Thread.java:595)


Stack trace for "org.apache.lucene.index.SegmentReader$CoreReaders.getTermsReader()" - BLOCKED:

 at org.apache.lucene.index.SegmentReader$CoreReaders.getTermsReader (SegmentReader.java:170)
 at org.apache.lucene.index.SegmentTermDocs.<init> (SegmentTermDocs.java:52)
 at org.apache.lucene.index.SegmentReader.termDocs (SegmentReader.java:987)
 at org.apache.lucene.index.IndexReader.termDocs (IndexReader.java:1102)
 at org.apache.lucene.index.SegmentReader.termDocs (SegmentReader.java:981)
 at org.apache.solr.search.SolrIndexReader.termDocs (SolrIndexReader.java:320)
 at org.apache.solr.search.SolrIndexSearcher.getDocSetNC (SolrIndexSearcher.java:640)
 at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet (SolrIndexSearcher.java:563)
 at org.apache.solr.search.SolrIndexSearcher.numDocs (SolrIndexSearcher.java:1422)
 at com.askme.solrenhancements.facet.ExtendedFacet.getCustomFacetCount (ExtendedFacet.java:132)
 at com.askme.solrenhancements.facet.ExtendedFacet.getCustomFacetCount (ExtendedFacet.java:92)
 at com.askme.solrenhancements.facet.ExtendedFacet.getFacetAdditionalInfo (ExtendedFacet.java:69)
 at com.askme.solrenhancements.facet.ExtendedFacet.getFacetInfo (ExtendedFacet.java:56)
 at com.askme.solrenhancements.facet.CustomFacetComponent.process (CustomFacetComponent.java:43)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody (SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute (SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute (SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter (SolrDi

Re: Datetime problems with dataimport

2011-02-22 Thread MOuli


Can you give me an example?

Should it look like 2011-02-22'T'14:55:20 or 2011-02-22T14:55:20 or
2011-02-22 14:55:20? I tested each of these formats, but got the
exception every time.

Invalid Date String:'2009-12-09'T'00:00:00'
Invalid Date String:'2009-12-09 00:00:00'
Invalid Date String:'2009-12-09T00:00:00'

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Datetime-problems-with-dataimport-tp2545654p2552422.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Configure 2 or more Tomcat instances.

2011-02-22 Thread Paul Libbrecht

Rajini,

you need to make the (~3) ports defined in conf/server.xml different.

paul


Le 22 févr. 2011 à 12:15, rajini maski a écrit :

>   I have a tomcat6.0 instance running in my system, with
> connector port-8090, shutdown port -8005 ,AJP/1.3  port-8009 and redirect
> port-8443  in server.xml (path = C:\Program Files\Apache Software
> Foundation\Tomcat 6.0\conf\server.xml)
> 
>   How do I configure one more independent Tomcat instance
> on the same system? I went through many sites but couldn't get
> this working. If anyone knows the proper configuration steps, please reply.
> 
> Regards,
> Rajani Maski

Re: Multiple Blocked threads on UnInvertedField.getUnInvertedField() & SegmentReader$CoreReaders.getTermsReader

2011-02-22 Thread Bill Bell

+1 for more investigation

Bill Bell
Sent from mobile


On Feb 22, 2011, at 7:13 AM, Rachita Choudhary  
wrote:

> Hi Solr Users,
> 
> We are upgrading from Solr 1.3 to Solr 1.4.1.
> While using Solr 1.3 , we were seeing multiple blocking active threads on
> "org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal() ".
> 
> To utilize the benefits of NIO, on upgrading to Solr 1.4.1, we see other
> type of multiple blocking threads on
> "org.apache.solr.request.UnInvertedField.getUnInvertedField()  &
> 
> SegmentReader$CoreReaders.getTermsReader".
> Due to this, the QTimes shoots up from few hundreds to thousand of
> msec.. even going upto 30-40 secs for a single query.
> 
> - The multiple blocking threads show up after few thousands of queries.
> - We do not have faceting and sorting on the same fields.
> - Our facet fields are multivalued text fields, but no large text values are
> present.
> - Index size - around 10 GB
> - We have not specified any method for faceting in our schema.xml.
> - Our field value cache settings are:
>   <fieldValueCache class="solr.FastLRUCache"
>                    size="175"
>                    autowarmCount="0"
>                    showItems="10"
>   />
> 
> Can someone please tell us the why we are seeing these blocked threads ?
> Also if they are related to our field value cache , then a cache of size 175
> will be filled up with very few initial queries and right after that we
> should see multiple blocking threads ?
> What difference it will make if we have "facet.method = enum" ?
> Is this all related to fieldValueCache or is there some other configuration
> which we need to set to avoid these blocking threads?
> 
> Thanks,
> Rachita
> 
> *Cache values example:
> *facetField1_27443 :
> {field=facet1_27443,memSize=4214884,tindexSize=52,time=22,phase1=15,nTerms=4,bigTerms=0,termInstances=6,uses=1}
> 
> facetField1_70 :
> {field=facetField1_70,memSize=4223310,tindexSize=308,time=28,phase1=21,nTerms=636,bigTerms=0,termInstances=14404,uses=1}
> 
> facetField2 : 
> {field=facetField2,memSize=4262644,tindexSize=3156,time=273,phase1=267,nTerms=12188,bigTerms=0,termInstances=1255522,uses=7031}
> *
> Stack trace for
> "org.apache.solr.request.UnInvertedField.getUnInvertedField() -
> BLOCKED"*
> 
> at org.apache.solr.request.UnInvertedField.getUnInvertedField
> (UnInvertedField.java:837)
> at org.apache.solr.request.SimpleFacets.getTermCounts (SimpleFacets.java:250)
> at org.apache.solr.request.SimpleFacets.getFacetFieldCounts
> (SimpleFacets.java:283)
> at org.apache.solr.request.SimpleFacets.getFacetCounts (SimpleFacets.java:166)
> at org.apache.solr.handler.component.FacetComponent.process
> (FacetComponent.java:72)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody
> (SearchHandler.java:195)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest
> (RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute (SolrCore.java:1316)
> at org.apache.solr.servlet.SolrDispatchFilter.execute
> (SolrDispatchFilter.java:338)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter
> (SolrDispatchFilter.java:241)
> at com.caucho.server.dispatch.FilterFilterChain.doFilter
> (FilterFilterChain.java:87)
> at com.caucho.server.webapp.WebAppFilterChain.doFilter
> (WebAppFilterChain.java:187)
> at com.caucho.server.dispatch.ServletInvocation.service
> (ServletInvocation.java:266)
> at com.caucho.server.http.HttpRequest.handleRequest (HttpRequest.java:270)
> at com.caucho.server.port.TcpConnection.run (TcpConnection.java:678)
> at com.caucho.util.ThreadPool$Item.runTasks (ThreadPool.java:721)
> at com.caucho.util.ThreadPool$Item.run (ThreadPool.java:643)
> at java.lang.Thread.run (Thread.java:595)
> 
> 
> *org.apache.lucene.index.SegmentReader$CoreReaders.getTermsReader() -
> BLOCKED*
> 
> at org.apache.lucene.index.SegmentReader$CoreReaders.getTermsReader
> (SegmentReader.java:170)
> at org.apache.lucene.index.SegmentTermDocs.<init> (SegmentTermDocs.java:52)
> at org.apache.lucene.index.SegmentReader.termDocs (SegmentReader.java:987)
> at org.apache.lucene.index.IndexReader.termDocs (IndexReader.java:1102)
> at org.apache.lucene.index.SegmentReader.termDocs (SegmentReader.java:981)
> at org.apache.solr.search.SolrIndexReader.termDocs (SolrIndexReader.java:320)
> at org.apache.solr.search.SolrIndexSearcher.getDocSetNC
> (SolrIndexSearcher.java:640)
> at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet
> (SolrIndexSearcher.java:563)
> at org.apache.solr.search.SolrIndexSearcher.numDocs
> (SolrIndexSearcher.java:1422)
> at com.askme.solrenhancements.facet.ExtendedFacet.getCustomFacetCount
> (ExtendedFacet.java:132)
> at com.askme.solrenhancements.facet.ExtendedFacet.getCustomFacetCount
> (ExtendedFacet.java:92)
> at com.askme.solrenhancements.facet.ExtendedFacet.getFacetAdditionalInfo
> (ExtendedFacet.java:69)
> at com.askme.solrenhancements.facet.ExtendedFacet.getFacetInfo
> (ExtendedFacet.java:56)
> at com.askme.solrenhancements.facet.CustomFacetComponent.process
> (CustomFacetComponent.java:4

Re: Question About Highlighting

2011-02-22 Thread Ahsan Iqbal

Hi All

I even tried that (Appending &hl.usePhraseHighlighter=true) but it still
does not work.

Please help
Regards
Ahsan Iqbal

On Fri, Feb 18, 2011 at 12:30 AM, Ahmet Arslan  wrote:

> > I had a requirement to implement phrase proximity like ["a
> > b c" w/5 "d e f"] for
> > this i have implemented a custom query parser plug in which
> > I make use of nested
> > span queries to fulfill this requirement. Now it looks that
> > documents are
> > filtered correctly, but there is an issue in highlighting
> > that also highlights
> > the terms that are alone(not in phrase), can some body
> > suggest me a fix to this
> > issue
> >
>
> Appending &hl.usePhraseHighlighter=true should work.
>
>
>
>

Question about Nested Span Near Query

2011-02-22 Thread Ahsan Iqbal

Hi All

I have a requirement to implement queries that involve phrase proximity.
For example, a user should be able to search "ab cd" w/5 "de fg", meaning both
phrases as a whole should be within 5 words of each other. For this I implemented
a query parser that makes use of nested span queries, so the above query would be
parsed as

spanNear([spanNear([Contents:ab, Contents:cd], 0, true),
spanNear([Contents:de, Contents:fg], 0, true)], 5, false)

Queries like this seem to work really well when the phrases are small, but when
the phrases are large this doesn't work well. Now my question: is there any
limitation in SpanNearQuery that prevents it from handling large phrases in this
way?

please help

Regards
Ahsan

Tokenizer that Protects Phrases

2011-02-22 Thread David Yang

Hi,

 

I am trying to tokenize a string field of products. Two different
products are: "camera", "security camera". What I would like is for
"security camera" to be treated differently to "camera" - and only be
displayed when the search is for "security camera", otherwise, the
results should only display "camera". 

 

In other words, even though they share the English word "camera", their
meanings are different.

 

Now my guess about the best way to deal with this is just to manually
provide a file of word groups that together form a single token, e.g. "laptop
battery", "security camera". Kind of like protwords, but
protphrases.

 

Is this a good idea to solve this problem? How do I implement it if it
is the right way? If there is a better way of dealing with this what is
it?

 

Thanks for your time,

David
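
Editor's note: the "protphrases" idea described in this message (a hand-maintained list of phrases that must stay together as one token) can be illustrated outside Solr with a short Python sketch. It shows only the tokenization logic; the function name is hypothetical and this is not an existing Solr filter.

```python
def tokenize_with_protected_phrases(text, protected_phrases):
    """Split on whitespace, but keep listed phrases together as one token."""
    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(
        (p.lower().split() for p in protected_phrases), key=len, reverse=True
    )
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        for phrase in phrases:
            if phrase and words[i:i + len(phrase)] == phrase:
                tokens.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            # No protected phrase starts here; emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


if __name__ == "__main__":
    protected = ["security camera", "laptop battery"]
    print(tokenize_with_protected_phrases("Security camera", protected))
    print(tokenize_with_protected_phrases("Digital camera", protected))
```

With this scheme a search for "camera" produces the token "camera" and does not match a document whose only token is "security camera", which is the behaviour the message asks for.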

Re: Multiple Blocked threads on UnInvertedField.getUnInvertedField() & SegmentReader$CoreReaders.getTermsReader

2011-02-22 Thread Yonik Seeley

On Tue, Feb 22, 2011 at 9:13 AM, Rachita Choudhary
 wrote:
> Hi Solr Users,
>
> We are upgrading from Solr 1.3 to Solr 1.4.1.
> While using Solr 1.3 , we were seeing multiple blocking active threads on
> "org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal() ".
>
> To utilize the benefits of NIO, on upgrading to Solr 1.4.1, we see other
> type of multiple blocking threads on
> "org.apache.solr.request.UnInvertedField.getUnInvertedField()  &
>
> SegmentReader$CoreReaders.getTermsReader".
> Due to this, the QTimes shoots up from few hundreds to thousand of
> msec.. even going upto 30-40 secs for a single query.
>
> - The multiple blocking threads show up after few thousands of queries.
> - We do not have faceting and sorting on the same fields.
> - Our facet fields are multivalued text fields, but no large text values are
> present.
> - Index size - around 10 GB
> - We have not specified any method for faceting in our schema.xml.
> - Our field value cache settings are:
>   <fieldValueCache class="solr.FastLRUCache"
>                    size="175"
>                    autowarmCount="0"
>                    showItems="10"
>   />
>
> Can someone please tell us the why we are seeing these blocked threads ?
> Also if they are related to our field value cache , then a cache of size 175
> will be filled up with very few initial queries and right after that we
> should see multiple blocking threads ?
> What difference it will make if we have "facet.method = enum" ?

fc method on a multivalued field instantiates an UnInvertedField (like
a multi-valued field cache) which can take some time.
Just like sorting, you may want to use some warming faceting queries
to make sure that real queries don't pay the cost of the initial entry
construction.

From your fieldValueCache statistics, it looks like the number of
terms is low enough that the enum method may be fine here.

-Yonik
http://lucidimagination.com


> Is this all related to fieldValueCache or is there some other configuration
> which we need to set to avoid these blocking threads?
>
> Thanks,
> Rachita
>
> *Cache values example:
> *facetField1_27443 :
> {field=facet1_27443,memSize=4214884,tindexSize=52,time=22,phase1=15,nTerms=4,bigTerms=0,termInstances=6,uses=1}
>
> facetField1_70 :
> {field=facetField1_70,memSize=4223310,tindexSize=308,time=28,phase1=21,nTerms=636,bigTerms=0,termInstances=14404,uses=1}
>
> facetField2 : 
> {field=facetField2,memSize=4262644,tindexSize=3156,time=273,phase1=267,nTerms=12188,bigTerms=0,termInstances=1255522,uses=7031}
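
Editor's note: Yonik's reply in this message suggests two options: warm the cache with faceting queries, or try the enum method. A minimal Python sketch of building such a request URL follows. The base URL and the helper name are assumptions; facet.method and facet.field are the standard Solr faceting parameters, and facetField2 is taken from the cache statistics quoted above.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, method="enum"):
    """Build a select URL that facets on the given fields with one method."""
    params = [
        ("q", query),
        ("rows", "0"),            # only the facet counts are of interest
        ("facet", "true"),
        ("facet.method", method),  # "enum" or "fc"
    ]
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance.
    print(facet_query_url("http://localhost:8983/solr", "*:*", ["facetField2"]))
```

The same parameters could be used in a warming query so that the first real user query does not pay for building the UnInvertedField entry.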

Snipet in results

2011-02-22 Thread Rosa (Anuncios)


Hi,

I would like to have a Google-like snippet of 2-3 lines from each doc in my 
search results.


Something like:

TITLE <- full title of doc
Description <- extracts the sentence or some text before and after the 
keywords, with highlighting, and merges a couple of these extracted pieces 
together


Thanks for your help,

Rosa

Sorting - bad performance

2011-02-22 Thread Jon Drukman

The performance factors wiki says:
"If you do a lot of field based sorting, it is advantageous to add explicitly
warming queries to the "newSearcher" and "firstSearcher" event listeners in your
solrconfig which sort on those fields, so the FieldCache is populated prior to
any queries being executed by your users."

I've got an index with 24+ million docs of forum posts from users.  I want to be
able to get a given user's posts sorted by date.  It's taking 20 seconds right
now.  What would I put in the newSearcher/firstSearcher listeners to make that quicker?  Is
there any other general approach I can use to speed up sorting?

The schema looks like

 
   
   
   
   
   
 

cistring is a case-insensitive string type i created:

UpdateProcessor and copyField

2011-02-22 Thread Teruhiko Kurosaka

Can fields created by copyField instructions be processed by
UpdateProcessors?
Or can only raw input fields be processed?

So far my experiments suggest the latter.


T. "Kuro" Kurosaka

Indexing languages, dataimporthandler

2011-02-22 Thread Greg Georges

Hello all,

I have just gone through the mailing list and have set up my different field 
type analysers for my 6 different languages in my schema.xml. Here is my 
question. I am using the dataimporthandler to import data from my database into 
my index. In my table, the documentname column's data can be in any of the 6 
languages. Let's say I want to index this data and apply the different language 
analysers for certain cases; what would be the best way in my case? The real 
problem is that I do not know the language of the string in the documentname 
column when I create my index, therefore I cannot apply the correct field type. 
Should I create a custom transformer?

Thanks

Greg

Re: Snipet in results

2011-02-22 Thread Leonardo Souza

http://wiki.apache.org/solr/HighlightingParameters

[ ]'s
Leonardo Souza
 °v°   Linux user #375225
 /(_)\   http://counter.li.org/
 ^ ^



On Tue, Feb 22, 2011 at 3:39 PM, Rosa (Anuncios) <
rosaemailanunc...@gmail.com> wrote:

> Hi,
>
> I would like to have a Google-like snippet of 2-3 lines from each doc in my
> search results.
>
> Something like:
>
> TITLE <- full title of doc
> Description <- extracts the sentence or some text before and after the
> keywords, with highlighting, and merges a couple of these extracted pieces
> together
>
> Thanks for your help,
>
> Rosa
>
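
Editor's note: the wiki page linked in this reply documents the highlighting request parameters. A minimal Python sketch of a request asking for a couple of short highlighted fragments per document is shown below. The base URL, the helper name and the field names title and description are assumptions; hl, hl.fl, hl.snippets and hl.fragsize are parameters listed on that page.

```python
from urllib.parse import urlencode


def snippet_query_url(base_url, query, field="description", snippets=2, fragsize=100):
    """Build a select URL that returns highlighted fragments for each hit."""
    params = {
        "q": query,
        "fl": "title",                 # stored field shown in full (assumed name)
        "hl": "true",                  # turn highlighting on
        "hl.fl": field,                # field to take fragments from (assumed name)
        "hl.snippets": str(snippets),  # fragments per document
        "hl.fragsize": str(fragsize),  # approximate fragment length in characters
    }
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(snippet_query_url("http://localhost:8983/solr", "solr highlighting"))
```

The fragments come back in a separate highlighting section of the response, keyed by document id, and can be joined on the client to form the description line.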

DIH and updating specific record

2011-02-22 Thread Olson, Ron

Hi all-

I am trying to determine if there is a way to tell Solr to update its index 
for a specific record ID in the database. All the examples and 
documentation seem to discuss using a "last updated" date/time field, but in 
this case modifying the table would not be an option. Instead, I'd like to 
invoke Solr's DIH delta query with a specific ID to say "here's something new 
or updated, please update your index with it".

I apologize if this is a trivial thing, but I can't seem to find any 
documentation on how to do it.

Thanks,

Ron


DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is unauthorized and strictly prohibited. If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.

RE: XML Stripping from DIH

2011-02-22 Thread Olson, Ron

Thanks a lot! I thought I'd looked on this page but didn't see this one, not 
sure why.

I greatly appreciate it!

Ron

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Sunday, February 20, 2011 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: XML Stripping from DIH

Ron,

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Olson, Ron" 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, February 18, 2011 4:05:15 PM
> Subject: XML Stripping from DIH
>
> Hi all-
>
> I have some XML in a database that I am trying to index and store; I am
> interested in the various pieces of text, but none of the tags. I've been
> trying to figure out a way to strip all the tags out, but haven't found
> anything within Solr to do so; the XML parser seems to want XPath to get the
> various element values, when all I want is to turn the whole thing into one
> blob of text, regardless of whether it makes any "contextual" sense.
>
> Is there something in Solr to do this, or is it something I'd have to write
> myself (which I'm willing to do if necessary)?
>
> Thanks for any  info,
>
> Ron
>
>



Re: Indexing languages, dataimporthandler

2011-02-22 Thread Teruhiko Kurosaka

Greg,

You could use copyField to copy the column in question to 6 fields, one
for each of your 6 languages,
and hope that each of the analyzers does something reasonable without
crashing.
Or apply the white-space tokenizer and hope for the best?

If the column has long enough text, you could try a language detector.
My company, Basis Technology, sells one, and it can plug into Solr easily.
http://www.basistech.com/language-identification/
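
If a detector is available, picking the target field is a small step. Below is a minimal sketch in Python, not tied to any particular product: the `detect` callable, the `documentname_<lang>` field names and the fallback field are all hypothetical and only illustrate the idea of routing the column to one per-language field.

```python
def route_field(text, detect, supported, fallback="documentname_general"):
    """Pick a per-language field name for a value of unknown language.

    detect    -- any callable returning a language code for the text
                 (hypothetical; plug in whatever detector you use)
    supported -- set of language codes that have their own field type
    fallback  -- hypothetical catch-all field (e.g. whitespace tokenized)
    """
    lang = detect(text)
    if lang in supported:
        return "documentname_" + lang
    return fallback


# Stand-in detector for demonstration only; it always answers "fr".
print(route_field("Bonjour tout le monde", lambda t: "fr", {"en", "fr", "de"}))
```

The same decision could live in a DIH transformer or in the client that feeds Solr; the sketch only shows the routing logic.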

On 2/22/11 11:50 AM, "Greg Georges"  wrote:

>Hello all,
>
>I have just gone through the mailing list and have set up my different
>field type analysers for my 6 different languages in my schema.xml. Here
>is my question. I am using the dataimporthandler to import data from my
>database into my index. In my table, the documentname column's data can
>be in any of the 6 languages. Let's say I want to index this data and
>apply the different language analysers for certain cases; what would be
>the best way in my case? The real problem is that I do not know the
>language of the string in the documentname column when I create my index,
>therefore I cannot apply the correct field type. Should I create a custom
>transformer?
>
>Thanks
>
>Greg

T. "Kuro" Kurosaka, 415-227-9600x122, 617-386-7122(direct)

Re: Passing parameters to DataImportHandler

2011-02-22 Thread Chris Hostetter


: It'd be nice to be able to pass HTTP parameters into DataImportHandler
: that'd be passed into the SQL as parameters, is this possible?

there is a specific sub-section about this in the docs...

http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters


-Hoss

Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Stephen Duncan Jr

I'm trying to use the date boost from
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
as a bf parameter to my dismax handler.  The problem is that, depending on the
value of NOW, documents in a similar range (date values within a few seconds of
each other) sometimes round to equal scores and sometimes do not, which changes
their sort order (when equal, the sort falls back to a secondary sort).  This,
in turn, screws up paging.

The problem is that score is rounded to a lower level of precision than what
the suggested formula produces as a difference between two values within
seconds of each other.  It seems to me if I could round the value to minutes
or hours, where the difference will be large enough to not be rounded-out,
then I wouldn't have problems with order changing on me.  But it's not legal
syntax to specify something like:
recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

Is this a problem anyone has faced and solved?  Anyone have suggested
solutions, other than indexing a copy of the date field that's rounded to
the hour?
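
To make the rounding effect concrete, here is a rough sketch in Python. It assumes the boost ends up in a 32-bit float (as Lucene scores do) and only looks at the recip() term on its own, with the constants from the formula above; the one-year document age and the one-second gap between the two documents are made-up inputs.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """recip(x, m, a, b) = a / (m*x + b), stored with float32 precision."""
    return to_float32(a / (m * age_ms + b))


def tie_pattern(age_ms, gap_ms=1000, steps=60, step_ms=1000):
    """For successive values of NOW (step_ms apart), report whether two
    documents whose dates are gap_ms apart get an identical boost."""
    return [
        boost(age_ms + k * step_ms) == boost(age_ms + gap_ms + k * step_ms)
        for k in range(steps)
    ]


one_year_ms = 365 * 24 * 3600 * 1000  # hypothetical document age
pattern = tie_pattern(one_year_ms)
print("tied on", pattern.count(True), "of", len(pattern), "requests")
```

With these inputs the two documents tie on most requests but not on all of them, which is the instability described above: whether the secondary sort kicks in depends on when the request is made.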

--
Stephen Duncan Jr
www.stephenduncanjr.com

RE: DIH and updating specific record

2011-02-22 Thread David Yang

Chris Hostetter answered this just recently:
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters

My addition:
Pass a parameter like command=delta-import&idz=31415
And access it via 'sql where id=${dataimporter.request.idz}'

If the idz is a string you might need to prequote the idz value.
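
For what it's worth, a small Python sketch of building such a request. The host, port and handler path are assumptions about a default-style setup, and `idz` is simply the parameter name used above; on the DIH side it is read as ${dataimporter.request.idz}.

```python
from urllib.parse import urlencode


def delta_import_url(base_url, record_id):
    """Build a DIH delta-import URL that carries the record id as 'idz'."""
    params = {"command": "delta-import", "idz": record_id}
    return base_url + "?" + urlencode(params)


# Hypothetical host and handler path; adjust to your deployment.
print(delta_import_url("http://localhost:8983/solr/dataimport", 31415))
# http://localhost:8983/solr/dataimport?command=delta-import&idz=31415
```

urlencode also takes care of escaping if the id contains characters that are not URL-safe.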

-Original Message-
From: Olson, Ron [mailto:rol...@lbpc.com] 
Sent: Tuesday, February 22, 2011 3:18 PM
To: solr-user@lucene.apache.org
Subject: DIH and updating specific record

Hi all-

I am trying to determine if there is a way to tell Solr to update its
index for a specific record in the database, identified by ID. All the examples
and documentation seem to discuss using a "last updated" date/time
field, but in this case modifying the table would not be an option.
Instead, I'd like to invoke Solr's DIH delta query with a specific ID to
say "here's something new or updated, please update your index with it".

I apologize if this is a trivial thing, but I can't seem to find any
documentation on how to do it.

Thanks,

Ron



Re: UpdateProcessor and copyField

2011-02-22 Thread Markus Jelsma

Yes. But did you actually search the mailing list or Solr's wiki? I guess not.

Here it is:
http://wiki.apache.org/solr/UpdateRequestProcessor

> Can fields created by copyField instructions be processed by
> UpdateProcessors?
> Or can only raw input fields be processed?
> 
> So far my experiment is suggesting the latter.
> 
> 
> T. "Kuro" Kurosaka

RE: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread David Yang

One suggestion: use logarithms to compress the large time range into something 
easier to compare: 1/log(ms(NOW,date))

-Original Message-
From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] 
Sent: Tuesday, February 22, 2011 6:03 PM
To: solr-user@lucene.apache.org
Subject: Sort Stability With Date Boosting and Rounding

I'm trying to use
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
as
a bf parameter to my dismax handler.  The problem is, the value of NOW can
cause documents in a similar range (date value within a few seconds of each
other) to sometimes round to be equal, and sometimes not, changing their
sort order (when equal, falling back to a secondary sort).  This, in turn,
screws up paging.

The problem is that score is rounded to a lower level of precision than what
the suggested formula produces as a difference between two values within
seconds of each other.  It seems to me if I could round the value to minutes
or hours, where the difference will be large enough to not be rounded-out,
then I wouldn't have problems with order changing on me.  But it's not legal
syntax to specify something like:
recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

Is this a problem anyone has faced and solved?  Anyone have suggested
solutions, other than indexing a copy of the date field that's rounded to
the hour?

--
Stephen Duncan Jr
www.stephenduncanjr.com

Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Geert-Jan Brits

You could always use a secondary sort as a tie-breaker, e.g. on something
unique like 'documentid'. That would ensure a stable sort.

2011/2/23 Stephen Duncan Jr 

> I'm trying to use
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
> as
> a bf parameter to my dismax handler.  The problem is, the value of NOW can
> cause documents in a similar range (date value within a few seconds of each
> other) to sometimes round to be equal, and sometimes not, changing their
> sort order (when equal, falling back to a secondary sort).  This, in turn,
> screws up paging.
>
> The problem is that score is rounded to a lower level of precision than
> what
> the suggested formula produces as a difference between two values within
> seconds of each other.  It seems to me if I could round the value to
> minutes
> or hours, where the difference will be large enough to not be rounded-out,
> then I wouldn't have problems with order changing on me.  But it's not
> legal
> syntax to specify something like:
> recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
>
> Is this a problem anyone has faced and solved?  Anyone have suggested
> solutions, other than indexing a copy of the date field that's rounded to
> the hour?
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>

Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Markus Jelsma

Hi,

You're right, it's illegal syntax to use other functions in the ms function, 
which is a pity indeed.

However, you reduce the score by 50% for each year. Therefore paging through 
the results shouldn't make that much of a difference because the difference in 
score with NOW+2 minutes has a negligible impact on the total score.

I had some thoughts on this issue as well but I decided the impact was too 
little to bother about.

Cheers,

> I'm trying to use
> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
> as
> a bf parameter to my dismax handler.  The problem is, the value of NOW can
> cause documents in a similar range (date value within a few seconds of each
> other) to sometimes round to be equal, and sometimes not, changing their
> sort order (when equal, falling back to a secondary sort).  This, in turn,
> screws up paging.
> 
> The problem is that score is rounded to a lower level of precision than
> what the suggested formula produces as a difference between two values
> within seconds of each other.  It seems to me if I could round the value
> to minutes or hours, where the difference will be large enough to not be
> rounded-out, then I wouldn't have problems with order changing on me.  But
> it's not legal syntax to specify something like:
> recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
> 
> Is this a problem anyone has faced and solved?  Anyone have suggested
> solutions, other than indexing a copy of the date field that's rounded to
> the hour?
> 
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com

hierarchical faceting, SOLR-792 - confused on config

2011-02-22 Thread kmf


I'm using solr 4.0 and trying to implement a hierarchical faceting example. 
The example I'm trying to implement is taken from the webcast "Mastering the
Power of Faceted Search."
(http://www.lucidimagination.com/solutions/webcasts/faceting)  Around minute
30, Chris Hostetter gives a very nice "tips & tricks" example he described
as Taxonomy facets.  Where I'm confused is how to get the data
indexed/organized into the "taxonomy facets" (0/NonFic, 1/NonFic/Law,
0/NonFic, 1/NonFic/Sci, 0/NonFic, 1/NonFic/Hist, 1/NonFic/Sci,
2/NonFic/Sci/Phys).  Since I'm using DIH to import my data from a DB, do I
create a TemplateTransformer to produce the indexed data?  Do I have to do
something special within schema.xml and/or solrconfig.xml?  

Once I figure out the correct config setup, I assume it's simply a matter of
creating the correct solr query like he describes in the video?

Thanks,
kmf
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/hierarchical-faceting-SOLR-792-confused-on-config-tp2556394p2556394.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: hierarchical faceting, SOLR-792 - confused on config

2011-02-22 Thread Koji Sekiguchi


(11/02/23 8:26), kmf wrote:


I'm using solr 4.0 and trying to implement a hierarchical faceting example.
The example I'm trying to implement is taken from the webcast "Mastering the
Power of Faceted Search."
(http://www.lucidimagination.com/solutions/webcasts/faceting)  Around minute
30, Chris Hostetter gives a very nice "tips & tricks" example he described
as Taxonomy facets.  Where I'm confused is how to get the data
indexed/organized into the "taxonomy facets" (0/NonFic, 1/NonFic/Law,
0/NonFic, 1/NonFic/Sci, 0/NonFic, 1/NonFic/Hist, 1/NonFic/Sci,
2/NonFic/Sci/Phys).  Since I'm using DIH to import my data from a DB, do I
create a TemplateTransformer to produce the indexed data?  Do I have to do
something special within schema.xml and/or solrconfig.xml?

Once I figure out the correct config setup, I assume it's simply a matter of
creating the correct solr query like he describes in the video?

Thanks,
kmf


kmf,

Disclaimer: I haven't seen the webcast yet.

First, SOLR-792 is not for hierarchical faceting. Please see SOLR-64.
Second, please take a look at PathHierarchyTokenizer in trunk and 3x.
It cannot output the depth factor ("0/", "1/", ...), though.
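
Until the tokenizer can do that, the depth-prefixed values from the question could be generated before the data reaches Solr. A minimal sketch in Python, assuming the category is available as a single path string such as "NonFic/Sci/Phys"; the function name is made up for illustration.

```python
def taxonomy_tokens(path, sep="/"):
    """Expand 'NonFic/Sci/Phys' into depth-prefixed facet values."""
    parts = [p for p in path.split(sep) if p]
    return [
        str(depth) + sep + sep.join(parts[:depth + 1])
        for depth in range(len(parts))
    ]


print(taxonomy_tokens("NonFic/Sci/Phys"))
# ['0/NonFic', '1/NonFic/Sci', '2/NonFic/Sci/Phys']
```

Each returned value would go into a multiValued string field, so that faceting with a prefix such as "1/NonFic/" lists the children of NonFic.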

Hmm, does anyone think it would be better if it output the depth
factors to the token type, the payload, or somewhere else?

Koji
--
http://www.rondhuit.com/en/

Re: UpdateProcessor and copyField

2011-02-22 Thread Teruhiko Kurosaka

Markus,

I searched but I couldn't find a definite answer, so I posted this
question.
The article you quoted talks about implementing a copyField-like operation
using UpdateProcessor.  It doesn't talk about the relationship between
the copyField operation proper and UpdateProcessors.

Kuro

On 2/22/11 3:00 PM, "Markus Jelsma"  wrote:

>Yes. But did you actually search the mailing list or Solr's wiki? I guess
>not.
>
>Here it is:
>http://wiki.apache.org/solr/UpdateRequestProcessor
>
>> Can fields created by copyField instructions be processed by
>> UpdateProcessors?
>> Or can only raw input fields be processed?
>> 
>> So far my experiment is suggesting the latter.
>> 
>> 
>> T. "Kuro" Kurosaka

Re: Date Math

2011-02-22 Thread Chris Hostetter


: org.apache.lucene.queryParser.ParseException: Cannot parse 'last_modified:-DAY':
...
: Are they not supported as a short-cut for "NOW-1DAY"?  I'm using Solr 1.4.

No, "-1DAY" is a valid DateMath string (to the DateMathParser) but as a 
field value you must specify a valid date string, which can *end* with a 
DateMath string.  so "NOW-1DAY" is legal, as is 
"2011-02-22T12:34:56Z-1DAY"

Note also: you didn't do "-1DAY" you tried "-DAY" which isn't valid 
anywhere.
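
As a rough illustration of that rule, here is a Python check that accepts a base date (NOW or a full UTC timestamp) optionally followed by date math. The grammar is deliberately simplified (a handful of units, no validation of the calendar values) and is not the actual DateMathParser.

```python
import re

# Simplified subset: a base (NOW or an ISO-8601 UTC instant) followed by
# zero or more operations such as -1DAY, +2MONTHS or /HOUR.
_UNIT = r"(?:YEAR|MONTH|DAY|HOUR|MINUTE|SECOND)S?"
_BASE = r"(?:NOW|\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)"
_OP = r"(?:[+-]\d+" + _UNIT + r"|/" + _UNIT + r")"
_DATE_VALUE = re.compile(_BASE + _OP + r"*")


def is_date_value(text):
    """True if text looks like a date that may end with date math."""
    return _DATE_VALUE.fullmatch(text) is not None


for candidate in ["NOW-1DAY", "2011-02-22T12:34:56Z-1DAY", "-1DAY", "-DAY"]:
    print(candidate, is_date_value(candidate))
```

The first two candidates pass; "-1DAY" fails because it has no base date, and "-DAY" fails because it also lacks the number.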


-Hoss

Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Stephen Duncan Jr

The problem comes when you have results that are all the same natural score
(because you've filtered them, with no primary search, for instance), and
are very close together in time.  Then, as you page through, the order
changes.  So the user experience is that they see duplicate documents, and
miss out on some of the docs in the overall set.  It's not something
negligible that I can ignore.  I either have to come up with a fix for this,
or get rid of the boost function altogether.
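
For reference, the fallback mentioned at the start of this thread (indexing a copy of the date rounded to the hour) needs very little code on the indexing side. A Python sketch, assuming timezone-aware datetimes; which field receives the value (say, a hypothetical manufacturedate_hour_dt) is up to the schema.

```python
from datetime import datetime, timezone


def round_to_hour(dt):
    """Truncate an aware datetime to the start of its hour and format it
    as a UTC timestamp with a trailing 'Z', as Solr date fields expect."""
    utc = dt.astimezone(timezone.utc)
    truncated = utc.replace(minute=0, second=0, microsecond=0)
    return truncated.strftime("%Y-%m-%dT%H:%M:%SZ")


print(round_to_hour(datetime(2011, 2, 22, 18, 3, 41, tzinfo=timezone.utc)))
# 2011-02-22T18:00:00Z
```

Documents in the same hour then share one date value and always tie, while documents in different hours are at least 3,600,000 ms apart.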

Stephen Duncan Jr
www.stephenduncanjr.com


On Tue, Feb 22, 2011 at 6:09 PM, Markus Jelsma
wrote:

> Hi,
>
> You're right, it's illegal syntax to use other functions in the ms
> function,
> which is a pity indeed.
>
> However, you reduce the score by 50% for each year. Therefore paging
> through
> the results shouldn't make that much of a difference because the difference
> in
> score with NOW+2 minutes has a negligible impact on the total score.
>
> I had some thoughts on this issue as well but I decided the impact was too
> little to bother about.
>
> Cheers,
>
> > I'm trying to use
> > http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
> > as
> > a bf parameter to my dismax handler.  The problem is, the value of NOW
> can
> > cause documents in a similar range (date value within a few seconds of
> each
> > other) to sometimes round to be equal, and sometimes not, changing their
> > sort order (when equal, falling back to a secondary sort).  This, in
> turn,
> > screws up paging.
> >
> > The problem is that score is rounded to a lower level of precision than
> > what the suggested formula produces as a difference between two values
> > within seconds of each other.  It seems to me if I could round the value
> > to minutes or hours, where the difference will be large enough to not be
> > rounded-out, then I wouldn't have problems with order changing on me.
>  But
> > it's not legal syntax to specify something like:
> > recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
> >
> > Is this a problem anyone has faced and solved?  Anyone have suggested
> > solutions, other than indexing a copy of the date field that's rounded to
> > the hour?
> >
> > --
> > Stephen Duncan Jr
> > www.stephenduncanjr.com
>

Re: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread Stephen Duncan Jr

No, the problem is that, due to rounding, sometimes the docs ARE considered
ties, and therefore the secondary sort is used, but sometimes they don't
round to exactly equal, and the tiebreaker isn't used, and the results get
shuffled.

Stephen Duncan Jr
www.stephenduncanjr.com


On Tue, Feb 22, 2011 at 6:09 PM, Geert-Jan Brits  wrote:

> You could always use a secondary sort as a tie-breaker, i.e: something
> unique like 'documentid' or something. That would ensure a stable sort.
>
> 2011/2/23 Stephen Duncan Jr 
>
> > I'm trying to use
> >
> >
> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
> > as
> > a bf parameter to my dismax handler.  The problem is, the value of NOW
> can
> > cause documents in a similar range (date value within a few seconds of
> each
> > other) to sometimes round to be equal, and sometimes not, changing their
> > sort order (when equal, falling back to a secondary sort).  This, in
> turn,
> > screws up paging.
> >
> > The problem is that score is rounded to a lower level of precision than
> > what
> > the suggested formula produces as a difference between two values within
> > seconds of each other.  It seems to me if I could round the value to
> > minutes
> > or hours, where the difference will be large enough to not be
> rounded-out,
> > then I wouldn't have problems with order changing on me.  But it's not
> > legal
> > syntax to specify something like:
> > recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)
> >
> > Is this a problem anyone has faced and solved?  Anyone have suggested
> > solutions, other than indexing a copy of the date field that's rounded to
> > the hour?
> >
> > --
> > Stephen Duncan Jr
> > www.stephenduncanjr.com
> >
>
