snapinstaller does not start newSearcher

2015-02-23 Thread alxsss
Hello,

I am using the latest Solr (solr trunk). I run snapinstaller and see that it 
copies the snapshot to the index folder, but the changes are not picked up.

The logs on the slave after running snapinstaller are:

44302 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
44303 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – No 
uncommitted changes. Skipping IW.commit.
44304 [qtp1312571113-14] INFO  org.apache.solr.core.SolrCore  – 
SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
44305 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – 
end_commit_flush
44305 [qtp1312571113-14] INFO  
org.apache.solr.update.processor.LogUpdateProcessor  – [product] webapp=/solr 
path=/update params={} {commit=} 0 57

Restarting Solr gives:

 Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 9 more

Any idea what causes this issue?

Thanks in advance.
Alex.



Re: snapinstaller does not start newSearcher

2015-02-24 Thread alxsss
Hello,

We cannot use replication with the current architecture, so we decided to use 
snapshotter with snapinstaller.

Here is the full stack trace

8937 [coreLoadExecutor-5-thread-3] INFO  
org.apache.solr.core.CachingDirectoryFactory  – Closing directory: 
/home/solr/solr-4.10.1/solr/example/solr/product/data
8938 [coreLoadExecutor-5-thread-3] ERROR org.apache.solr.core.CoreContainer  – 
Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 9 more
Caused by: java.nio.file.NoSuchFileException: 
/home/solr/solr-4.10.1/solr/example/solr/product/data/index/segments_4
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:198)
at 
org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:341)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:792)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 11 more
8943 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  – 
user.dir=/home/solr/solr-4.10.1/solr/example
8943 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  – 
SolrDispatchFilter.init() done
8982 [main] INFO  org.eclipse.jetty.server.AbstractConnector  – Started 
SocketConnector@0.0.0.0:8983

Thanks.
Alex.

 

 

 

-Original Message-
From: Shalin Shekhar Mangar 
To: solr-user 
Sent: Tue, Feb 24, 2015 12:13 am
Subject: Re: snapinstaller does not start newSearcher


Do you mean the snapinstaller (bash) script? Those are legacy scripts.
It's been a long time since they were tested. The ReplicationHandler is the
recommended way to set up replication. If you want to take a snapshot, then
the replication handler has an HTTP-based API which lets you do that.
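(A minimal sketch of that API call, assuming the ReplicationHandler is configured on
the master, with a hypothetical host and the core name from this thread:

curl 'http://localhost:8983/solr/product/replication?command=backup'

This writes a snapshot.<timestamp> directory next to the index directory, which is
the kind of directory snapinstaller-style scripts expect.)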

In any case, do you have the full stack trace for that exception? There
should be another cause nested under it.

On Tue, Feb 24, 2015 at 12:47
PM,  wrote:

> Hello,
>
> I am using latest solr (solr trunk). I run snapinstaller, and see that it
> copies snapshot to index folder but changes are not picked up and
>
> logs in slave after running snapinstaller are
>
> 44302 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 44303 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – No
> uncommitted changes. Skipping IW.commit.
> 44304 [qtp1312571113-14] INFO  org.apache.solr.core.SolrCore  –
> SolrIndexSearcher has not changed - not re-opening:
> org.apache.solr.search.SolrIndexSearcher

Re: snapinstaller does not start newSearcher

2015-03-04 Thread alxsss
I have used the snapshotter API and modified the snapinstaller script, so that it
successfully grabs the snapshot folder and updates the index folder on the slave.
However, it fails to open a newSearcher.
It simply sends a commit command to the slave, but the hasUncommittedChanges
function returns false. That is the reason.

Reloading the collection picks up the changes.

Could reloading cause queries that were sent during this process to return no
results?
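(For reference, a core reload can also be triggered over HTTP; a sketch with a
hypothetical host and the core name used in this thread:

curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=product'

The reload opens a new searcher over the replaced index without restarting Solr.)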

Thanks.
Alex.






more like this generated query

2015-04-27 Thread alxsss
Hello,


I am using solr-4.10.4 with mlt. I noticed that mlt constructs a query which is 
missing some words. For example, for a doc with 
title: Jennifer Lopez 
keywords: Jennifer, concert, Hollywood


the parsedquery generated by mlt for this doc is  title:lopez 
keywords:jennifer keywords:concert keywords:hollywood.
It seems to me that title:jennifer should be there, too.


For another doc that has only a title, the mlt-generated query includes 
keywords:famili. This doc has "family" in the title.
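(One way to see exactly which terms MLT selected, and with what boosts, is
mlt.interestingTerms; a sketch assuming the MoreLikeThisHandler is registered at
/mlt, with hypothetical host, core, and document id values:

curl 'http://localhost:8983/solr/collection1/mlt?q=id:123&mlt.fl=title,keywords&mlt.interestingTerms=details&mlt.mintf=1&mlt.mindf=1'

The mlt.mintf and mlt.mindf thresholds default to 2 and 5, which can drop
low-frequency terms; whether that explains the missing title:jennifer here I
cannot say.)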


Any ideas what is wrong here?


Thanks.
Alex.








spellcheck in solr-4.6-1 distrib=true

2014-03-31 Thread alxsss
Hello,

For queries in SolrCloud in distributed mode, solr-4.6.1 spellcheck does not 
return any suggestions, but it does in non-distributed mode.
Is this a known bug?

Thanks.
Alex.


Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-01 Thread alxsss
It seems to me that you are missing this line

  

under
 

Alex.

 

 

-Original Message-
From: solr-user 
To: solr-user 
Sent: Tue, Apr 1, 2014 5:01 pm
Subject: Re: how do I get search for "fort st john" to match "ft saint john"


Hi Eric.

Sorry, been away.  

The city_index_synonyms.txt file is pretty small as it contains just these
two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't
used either.

My understanding is that Solr would create the appropriate synonym entries
in the index and so treat "fort" and "ft" as equal.

if you have a simple one line schema (that uses the type definition from my
original email) and index "fort saint john", does it work for you?  i.e.
does it return results if you search for "ft st john" and "ft saint john"
and "fort st john"?  

My Solr 4.6.1 instance doesn't.  I am wondering if synonyms just don't work
for all/some words in a phrase.
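(One way to check where the synonym expansion actually happens is the field
analysis handler; a sketch with a hypothetical host, core, and field type name:

curl 'http://localhost:8983/solr/collection1/analysis/field?analysis.fieldtype=city&analysis.fieldvalue=fort+saint+john&analysis.query=ft+st+john&indent=true'

The response lists the tokens after each filter for both the indexed value and the
query, so it is visible whether "fort" is being expanded at index time, query time,
or not at all.)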




 


Re: group.ngroups is set to an incorrect value - specific field types

2014-06-17 Thread alxsss
Hi,


I see a similar problem in our Solr application. Sometimes it reports the number 
of groups as the number of all documents. This started to happen after the upgrade 
from 4.6.1 to 4.8.1.


Thanks.
Alex.



-Original Message-
From: 海老澤 志信 
To: solr-user 
Sent: Tue, Jun 17, 2014 5:24 am
Subject: RE: group.ngroups is set to an incorrect value - specific field types


Hi all

Could anyone have comments on my bug report?

Regards,
Ebisawa


>-Original Message-
>From: 海老澤 志信
>Sent: Friday, June 13, 2014 7:45 PM
>To: 'solr-user@lucene.apache.org'
>Subject: group.ngroups is set to an incorrect value - specific field types
>
>Hi,
>
>I'm using Solr version 4.1.
>I found a bug in group.ngroups. So could anyone kindly take a look at my
>bug report?
>
>If I specify a field of type Double as group.field, the value of group.ngroups is
>set to an incorrect value.
>
>[condition]
>- Double is defined in group.field
>- Documents without the field which is defined as group.field,
>
>[Sample query and Example]
>---
>solr/select?q=*:*&group=true&group.ngroups=true&group.field=Double_Fiel
>d
>
>* "Double_Field" is defined "solr.TrieDoubleField" type.
>---
>When there are 4 documents with the group.field and 6 documents without it,
>the query returns group.ngroups = 10.
>
>But I think that group.ngroups should rightly be 5 in this case.
>
>[Root Cause]
>It seems there is a bug in the source code of Lucene.
>There is a function that compares whether two groups contain the same
>group.field value; it calls MutableValueDouble.compareSameType().
>
>See below the point which seems to be the root cause.
>-
>if (!exists) return -1;
>if (!b.exists) return 1;
>-
>If "exists" is false, it returns -1.
>
>But I think it should return 0 when "exists" and "b.exists" are equal.
>
>[Similar problem]
>There is a similar problem to MutableValueBool.compareSameType().
>Therefore, when you group on a field of type Boolean (solr.BoolField), the
>value of group.ngroups is always 0 or 1.
>
>[Solution]
>I propose the following modifications:
>MutableValueDouble.compareSameType()
>
>===
>--- MutableValueDouble.java
>+++ MutableValueDouble.java
>@@ -54,9 +54,8 @@
> MutableValueDouble b = (MutableValueDouble)other;
> int c = Double.compare(value, b.value);
> if (c != 0) return c;
>-if (!exists) return -1;
>-if (!b.exists) return 1;
>-return 0;
>+if (exists == b.exists) return 0;
>+return exists ? 1 : -1;
>   }
>===
>
>I propose the following modifications: MutableValueBool.compareSameType()
>
>===
>--- MutableValueBool.java
>+++ MutableValueBool.java
>@@ -52,7 +52,7 @@
>   @Override
>   public int compareSameType(Object other) {
> MutableValueBool b = (MutableValueBool)other;
>-if (value != b.value) return value ? 1 : 0;
>+if (value != b.value) return value ? 1 : -1;
> if (exists == b.exists) return 0;
> return exists ? 1 : -1;
>   }
>===
>
>
>Thanks,
>
>Ebisawa


 


regexTransformer returns no results if there is no match

2014-08-11 Thread alxsss
Hello,

I am trying to construct a Wikipedia page URL from the page title using regexTransformer
with



This does not work for titles that have no space, so title_underscore is empty
for them.

Any ideas what is wrong here?

This is with solr-4.8.1

Thanks. Alex.


change character correspondence in icu lib

2014-02-12 Thread alxsss
Hello,

I use
icu4j-49.1.jar,
lucene-analyzers-icu-4.6-SNAPSHOT.jar

for one of the fields in the form 



I need to change the letter that one of the accent characters maps to. I made changes 
to this file

lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

recompiled Solr and Lucene and replaced the above jars with the new ones, but there was no 
change in the indexing and parsing of keywords.

Any ideas where the appropriate change must be made?

Thanks.
Alex.





Re: change character correspondence in icu lib

2014-02-13 Thread alxsss
I found out that the generated files are the same. I think this is because these
lines inside the build file

  Note that the gennorm2 and icupkg tools must be on your PATH. These tools
  are part of the ICU4C package. See http://site.icu-project.org/

are not executed, and the resource files are downloaded from the internet instead.

Any ideas how to fix this issue?

Thanks.
Alex.

 

 

 

-Original Message-
From: Alexandre Rafalovitch 
To: solr-user 
Sent: Wed, Feb 12, 2014 5:20 pm
Subject: Re: change character correspondence in icu lib


Not a direct answer, but the usual next question is: are you
absolutely sure you are using the right jars? Try renaming them and
restarting Solr. If it complains, you got the right ones. If not
Also, unzip those jars and see if your file made it all the way
through the build pipeline.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Feb 13, 2014 at 8:12 AM,   wrote:
> Hello,
>
> I use
> icu4j-49.1.jar,
> lucene-analyzers-icu-4.6-SNAPSHOT.jar
>
> for one of the fields in the form
>
> 
>
> I need to change one of the accent char's corresponding letter. I made 
> changes 
to this file
>
> lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt
>
> recompiled solr and lucene and replaced the above jars with new ones, but no 
change in the indexing and parsing of keywords.
>
> Any ideas where the appropriate change must be made?
>
> Thanks.
> Alex.
>
>
>

 



Re: Incorrect group.ngroups value

2014-08-25 Thread alxsss
Hi,

From the discussion it is not clear whether this is a fixable bug in the case of 
documents being in different shards. If it is fixable, could someone please 
direct me to the part of the code so that I can investigate.

Thanks.
Alex.

 

 

 

-Original Message-
From: Andrew Shumway 
To: solr-user 
Sent: Fri, Aug 22, 2014 8:15 am
Subject: RE: Incorrect group.ngroups value


The Co-location section of this document  
http://searchhub.org/2013/06/13/solr-cloud-document-routing/ 
might be of interest to you.  It mentions the need for using Solr Cloud routing 
to group documents in the same core so that grouping can work properly.

--Andrew Shumway


-Original Message-
From: Bryan Bende [mailto:bbe...@gmail.com] 
Sent: Friday, August 22, 2014 9:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Incorrect group.ngroups value

Thanks Jim.

We've been using the composite id approach where we put group value as the 
leading portion of the id (i.e. groupValue!documentid), so I was expecting all 
of the documents for a given group to be in the same shard, but at least this 
gives me something to look into. I'm still suspicious of something changing 
between 4.6.1 and 4.8.1, because we've had the grouping implemented this way 
for 
a while, and only on the exact day we upgraded did someone bring this problem 
forward. I will keep investigating, thanks.
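(For illustration: with the default compositeId router, the part of the id before
"!" determines the shard, so documents that share the prefix land together. A
sketch with a hypothetical host, collection, and ids:

curl 'http://localhost:8983/solr/mycollection/update?commit=true' -H 'Content-Type: application/json' -d '[{"id":"groupA!doc1"},{"id":"groupA!doc2"}]'

Both documents hash to the same shard because they share the "groupA!" prefix,
which is the precondition for group.ngroups being correct.)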


On Fri, Aug 22, 2014 at 9:18 AM, jim ferenczi 
wrote:

> Hi Bryan,
> This is a known limitations of the grouping.
> https://wiki.apache.org/solr/FieldCollapsing#RequestParameters
>
> group.ngroups:
>
>
> *WARNING: If this parameter is set to true on a sharded environment, 
> all the documents that belong to the same group have to be located in 
> the same shard, otherwise the count will be incorrect. If you are 
> using SolrCloud , consider 
> using "custom hashing"*
>
> Cheers,
> Jim
>
>
>
> 2014-08-21 21:44 GMT+02:00 Bryan Bende :
>
> > Is there any known issue with using group.ngroups in a distributed 
> > Solr using version 4.8.1 ?
> >
> > I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing
> several
> > queries where ngroups will be more than the actual groups returned 
> > in the response. For example, ngroups will say 5, but then there 
> > will be 3
> groups
> > in the response. It is not happening on all queries, only some.
> >
>

 


custom sorting of search result

2014-11-03 Thread alxsss
Hello,


We need to order solr search results according to specific rules. 


I will explain with an example. Let's say Solr returns 1000 results for the query 
"sport". 
These results must be divided into three buckets according to rules that come 
from a database. 
Then one doc must be chosen from each bucket in turn and appended to the results 
until all buckets are empty.


One approach was to modify/override the Solr code where it gets the results, sorts 
them, and returns #rows elements.
However, from the code of the scoreAll function in Weight.java we see that docs have 
only an internal document id and nothing else.


We expected the unique Solr document id in order to match documents with the custom 
scoring.
We also see that the Lucene code hands those doc ids to the scoreAll function, and 
for now we do not want to modify Lucene code and prefer to solve this issue as a 
Solr plugin.


Any ideas are welcome.




Thanks.
Alex. 






additional requests sent to solr

2013-07-18 Thread alxsss
Hello,

I send to Solr (to server1 in the cluster of two servers) the following request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests

INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true}
 hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax}
 hits=97262 status=0 QTime=168


I can understand that the first and the third log records are related to the 
above request, but I cannot understand where the second log comes from. 
I see company__terms and 
{!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms} in it, which 
seem to have nothing to do with the initial request. This is solr-4.2.0.


Any ideas about it are welcome.

Thanks in advance.
Alex.


Re: additional requests sent to solr

2013-08-04 Thread alxsss
Hello,

I still have this issue. Basically, in distributed mode, when facet is true, 
solr-4.2 issues an additional query with 
facet.field={!terms%3D$company__terms}company&isShard=true} where, for example, 
company__terms has all the values from the company facet field.

I have added terms=false to the original query sent to Solr, but it did not 
help.

Does anyone have any idea how to suppress these queries?

Thanks.
Alex.


 

 

 

-Original Message-
From: alxsss 
To: solr-user 
Sent: Fri, Jul 19, 2013 5:00 am
Subject: additional requests sent to solr


Hello,

I send to solr( to server1 in the cluster of two servers) the folowing request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests

INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true}
 
hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 
status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax}
 
hits=97262 status=0 QTime=168


I can understand that the first and the third log records are related to the 
above request, but cannot inderstand where the second log comes from. 
I see in it, company__terms and 
{!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}, whish 
seems does not have anything to do with the initial request. This is solr-4.2.0


Any ideas about it are welcome.

Thanks in advance.
Alex.

 


Re: additional requests sent to solr

2013-08-05 Thread alxsss
I care about performance. Since the data is big, the query with terms 
becomes too long and slows performance. 

bq
---
In general distributed search requires two round trips to the "other" shards."
---

In this case I have three queries to Solr. The third one is the one with {!terms..., 
and I do not understand why it is there.

Thanks.
Alex.

 

 

 

-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Mon, Aug 5, 2013 7:10 pm
Subject: Re: additional requests sent to solr


Why do you care? Is this causing you trouble? In general distributed search
requires two round trips to the "other" shards. The first query gets the
top N, those are returned to the originator (just a list of IDs and sort
criteria,
often score). The originator then assembles the final top N, but then
the actual body of those documents must be fetched from the other
nodes.

Best
Erick


On Mon, Aug 5, 2013 at 2:02 AM,  wrote:

> Hello,
>
> I still have this issue. Basically in distributed mode, when facet is
> true, solr-4.2 issues an additional query with
> facet.field={!terms%3D$company__terms}company&isShard=true} where for
> example
> company__terms have all values from company facet field.
>
> I have added terms=false to the original query sent to solr, but it did
> not help.
>
> Does anyone has any idea how to suppress these queries.
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
>
> -Original Message-
> From: alxsss 
> To: solr-user 
> Sent: Fri, Jul 19, 2013 5:00 am
> Subject: additional requests sent to solr
>
>
> Hello,
>
> I send to solr( to server1 in the cluster of two servers) the folowing
> request
>
>
> http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection
>
> I see in the logs 2 additional requests
>
> INFO: [mycollection] webapp=/solr path=/select
> params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true}
> hits=9118 status=0 QTime=72
>
> Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
> INFO: [mycollection] webapp=/solr path=/select
> params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
> status=0 QTime=6
>
> Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
> INFO: [mycollection] webapp=/solr path=/select params={facet=true&shards=
> server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax
> }
> hits=97262 status=0 QTime=168
>
>
> I can understand that the first and the third log records are related to
> the
> above request, but cannot inderstand where the second log comes from.
> I see in it, company__terms and
> {!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms},
> whish
> seems does not have anything to do with the initial request. This is
> solr-4.2.0
>
>
> Any ideas about it are welcome.
>
> Thanks in advance.
> Alex.
>
>
>

 


Re: additional requests sent to solr

2013-08-11 Thread alxsss
Hi,

Could someone please confirm whether this must be so or whether this is a bug in Solr.

In short, I see three log entries in Solr for one request
http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

for the case when facet=true.  

The third log looks like as 
INFO: [mycollection] webapp=/solr path=/select

params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 status=0 QTime=6

where the company__terms and school__terms values are taken from the facet values
for the company and school fields.

When the data is big this leads to a request containing all the facet values, which
considerably slows performance. This issue is observed in distributed mode
only.

Thanks in advance.
Alex.






Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-21 Thread alxsss
Hello,

We need this feature to be fixed ASAP. So, please let me know which class is 
responsible for combining the spellcheck results from all shards. I will try to 
debug the code.

Thanks in advance.
Alex.

 

 

 

-Original Message-
From: alxsss 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable 

Not sure what this is?

I have

 

spell





  direct
  spell
  solr.DirectSolrSpellChecker
  
  internal
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  spell
  true
  true
  10




 




  


The spell field in our schema is called spell and its type is also called spell.
Here are the requests:


 curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  32
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  






 curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  26
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  

  1
  0
  11
  
paul u soles   
  

(paul u soles)
  



No distrib param

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  




curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
10
  


  
0
0

  



  




Thanks.
Alex.

---Original Message-

From: Dyer, James 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:10 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


You may likely be hitting on a bug with WordBreakSolrSpellChecker in a 
distributed environment.  But to nail it down, we probably need to see both the 
applicable  section of your config and also this section: 
.  Also 
need an example of a query that succeeds non-distributed (with the exact query 
url and output you get) vs the same query url and output in the distributed 
scenario.  Then, without access to your actual index, it might be possible to 
come up with a failing unit test.  With a failing unit test in hand, we have a 
good shot at getting a fix.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com] 
Sent: Tuesday, March 19, 2013 12:39 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

I was testing my custom testhandler. The direct spellchecker also was not working 
in cloud mode. After I added 

  
 spellcheck

to the /select requestHandler it worked, but not the wordbreak spellchecker. I have 
added shards.qt=testhandler to the curl request but it did not solve the issue.

Thanks.
Alex.

 

 

 

-Original Message-
From: Dyer, James 
To: solr-user 
Sent: Tue, Mar 19, 2013 10:30 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


Mark,

I wasn't sure if Alex is actually testing /select, or if the problem is just 
coming up in /testhandler.  Just wanted to verify that before we get into bug 
reports.

DistributedSpellCheckComponentTest does have 1 little Word Break test scenario 
in it, so we know WordBreakSolrSpellChecker at least works some of the time in 
a 


Distributed environment :) .  Ideally, we should probably use a random test for 
stuff like this as adding a bunch of test scenarios would make this 
already-slower-than-molasses test even slower.  On the other hand, we want to 
test as many possibilities as we can.  Based on DSCCT and it being so 
superficial, I really can't vouch too much for my spell check enhancements 
working as well with shards as they do with a single index.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, March 19, 2013 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

My first thought too, but then I saw that he had the spell component in both 
his 


custom testhander and the /select handler, so I'd expect that to work as well.

- Mark

On Mar 19, 2013, at 12:18 PM, "Dyer, James"  
wrote:

> Can you try including in your request the "shards.qt" parameter?  In your 
case, I think you should set it to "testhandler".  See 
h

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-21 Thread alxsss

Hello,

I am debugging the SpellCheckComponent#finishStage. 
 
From the responses I see that not only wordbreak but also directSpellchecker 
does not return some results in distributed mode. 
The request handler I was using had 

true


So, I decided to turn off grouping, and now I see spellcheck results in distributed 
mode.


curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellcheck results,
but

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellchecker to fail.

Could you please point me to the class that may be responsible for this issue?

Thanks.
Alex.
 




-Original Message-
From: Dyer, James 
To: solr-user 
Sent: Thu, Mar 21, 2013 11:23 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


The shard responses get combined in SpellCheckComponent#finishStage .  I highly 
recommend you file a JIRA bug report for this at 
https://issues.apache.org/jira/browse/SOLR 
.  If you write a failing unit test, it would make it much more likely that 
others would help you with a fix.  Of course, if you solve the issue entirely, 
a 
patch would be much appreciated.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Thursday, March 21, 2013 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

We need this feature be fixed ASAP. So, please let me know which class is 
responsible for combining spellcheck results from all shards. I will try to 
debug the code.

Thanks in advance.
Alex.







-Original Message-
From: alxsss 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable 

Not sure what this is?

I have

 

spell





  direct
  spell
  solr.DirectSolrSpellChecker
  
  internal
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  spell
  true
  true
  10




 




  


spell filed in our schema is called spell and its type also is called spell.
Here are requests


 curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  32
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  






 curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  26
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  

  1
  0
  11
  
paul u soles
  

(paul u soles)
  



No distrib param

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  




curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
10
  


  
0
0

  



  




Thanks.
Alex.

---Original Message-

From: Dyer, James 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:10 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


You may likely be hitting on a bug with WordBreakSolrSpellChecker in a
distributed environment.  But to nail it down, we probably need to see both the
applicable  section of your config and also this section:
.  Also
need an example of a query that succeeds non-distributed (with the exact query
url and output you get) vs the same query url and output in the distributed
scenario.  Then, without access to your actual index, it might be possible to
come up with a failing unit test.  With a failing unit test in hand, we have a
good shot at getting a fix.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Tuesday, March 19, 2013 12:39 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

I was testing my custom testhandler. Direct spellchecker also was not working in

cloud. After I added

  
 spellcheck
   
to /select requestHandler it worked but the wordbreak spellchecker. I have added

shards.qt=testhanlder to curl request but it did not solve the issue.

Thanks.
Alex.







-Original Message-
From: Dyer

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Hello,


Further investigation shows the following pattern, for both the direct and 
wordbreak spellcheckers.

Assume that in all cases there are spellchecker results when distrib=false.

In distributed mode (distrib=true):
  case when matches=0
    1. group=true:  no spellcheck results
    2. group=false: there are spellcheck results

  case when matches>0
    1. group=true:  there are spellcheck results
    2. group=false: there are spellcheck results


Do these constitute a failing test case?

Thanks.
Alex.

 

 

-Original Message-
From: alxsss 
To: solr-user 
Sent: Thu, Mar 21, 2013 6:50 pm
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud



Hello,

I am debugging the SpellCheckComponent#finishStage. 
 
>From the responses I see that not only wordbreak, but also directSpellchecker 
does not return some results in distributed mode. 
The request handler I was using had 

true


So, I desided to turn of grouping and I see spellcheck results in distributed 
mode.


curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellchek results 
but

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler
&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellcheker to fail.

Could please you point me to the class that may be responsible to this issue?

Thanks.
Alex.
 




-Original Message-
From: Dyer, James 
To: solr-user 
Sent: Thu, Mar 21, 2013 11:23 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


The shard responses get combined in SpellCheckComponent#finishStage .  I highly 
recommend you file a JIRA bug report for this at 
https://issues.apache.org/jira/browse/SOLR 

.  If you write a failing unit test, it would make it much more likely that 
others would help you with a fix.  Of course, if you solve the issue entirely, 
a 

patch would be much appreciated.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Thursday, March 21, 2013 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

We need this feature be fixed ASAP. So, please let me know which class is 
responsible for combining spellcheck results from all shards. I will try to 
debug the code.

Thanks in advance.
Alex.







-Original Message-
From: alxsss 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable 

Not sure what this is?

I have

 

spell





  direct
  spell
  solr.DirectSolrSpellChecker
  
  internal
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  spell
  true
  true
  10




 




  


spell filed in our schema is called spell and its type also is called spell.
Here are requests


 curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  32
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  






 curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  26
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  

  1
  0
  11
  
paul u soles
  

(paul u soles)
  



No distrib param

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  




curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
10
  


  
0
0

  



  




Thanks.
Alex.

---Original Message-

From: Dyer, James 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:10 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


You may likely be hitting on a bug with WordBreakSolrSpellChecker in a
distributed environment.  But to nail it down, we probably need to see both the
applicable  section of your config and also this section:
.  Also
need an example of a query that succeeds non-distributed (with the exact query
url and output you get) vs the same query url and output in the distributed
scenario.  Then, without access to your actual index, it might be possible to
come up with a failing unit 

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Thanks.

I can fix this, but going over the code it seems it is not easy to figure out where 
the whole request and response come from.

I followed up SpellCheckComponent#finishStage
and found out that SearchHandler#handleRequestBody calls this function. 
However, which part calls handleRequestBody and how its arguments are 
constructed is not clear.


Thanks.
Alex.

 

-Original Message-
From: Dyer, James 
To: solr-user 
Sent: Fri, Mar 22, 2013 2:08 pm
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


Alex,

I added your comments to SOLR-3758 
(https://issues.apache.org/jira/browse/SOLR-3758) 
, which seems to me to be the very same issue.

If you need this to work now and you cannot devise a fix yourself, then 
perhaps a workaround is: if the query returns 0 results, re-issue the query 
with "&rows=0&group=false" (you would omit all other optional components also). 
This will give you back just a spell check result.  I realize this is not 
optimal because it requires the overhead of issuing 2 queries, but if you do it 
only in instances where the user gets nothing (or very little) back maybe it would be 
tolerable?  Then once a viable fix is devised you can remove the extra code from 
your application.
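(A minimal sketch of that fallback request, reusing the handler and query from the
earlier examples in this thread; host and handler names are as assumed there:

curl 'server1:8983/solr/test/testhandler?q=paulusoles&rows=0&group=false&shards.qt=testhandler'

Only the spellcheck section of this second response would be used.)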

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Friday, March 22, 2013 12:53 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,


Further investigation shows the following pattern, for both DirectIndex and 
wordbreak spellchekers.

Assume that in all cases there are spellchecker results when distrib=false

In distributed mode (distrib=true)
  case when matches=0
1. group=true,  no spellcheck results

2. group=false , there are spellcheck results

  case when matches>0
1. group=true, there are spellcheck results
2. group =false, there are spellcheck results


Do these constitute a failing test case?

Thanks.
Alex.





-Original Message-
From: alxsss 
To: solr-user 
Sent: Thu, Mar 21, 2013 6:50 pm
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud



Hello,

I am debugging the SpellCheckComponent#finishStage.

>From the responses I see that not only wordbreak, but also directSpellchecker
does not return some results in distributed mode.
The request handler I was using had

true


So, I desided to turn of grouping and I see spellcheck results in distributed
mode.


curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellchek results
but

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler
&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellcheker to fail.

Could please you point me to the class that may be responsible to this issue?

Thanks.
Alex.





-Original Message-
From: Dyer, James 
To: solr-user 
Sent: Thu, Mar 21, 2013 11:23 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


The shard responses get combined in SpellCheckComponent#finishStage .  I highly
recommend you file a JIRA bug report for this at 
https://issues.apache.org/jira/browse/SOLR

.  If you write a failing unit test, it would make it much more likely that
others would help you with a fix.  Of course, if you solve the issue entirely, a

patch would be much appreciated.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Thursday, March 21, 2013 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

We need this feature be fixed ASAP. So, please let me know which class is
responsible for combining spellcheck results from all shards. I will try to
debug the code.

Thanks in advance.
Alex.







-Original Message-
From: alxsss 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable 

Not sure what this is?

I have

 

spell





  direct
  spell
  solr.DirectSolrSpellChecker
  
  internal
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  spell
  true
  true
  10




 




  


spell filed in our schema is called spell and its type also is called spell.
Here are requests


 curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  32
  
true
testhandler
paulusoles
   

Re: Query slow with termVectors termPositions termOffsets

2013-03-25 Thread alxsss
Did the index size increase after turning on termPositions and termOffsets?
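(A quick way to compare, run on each node before and after reindexing, with a
hypothetical index path:

du -sh /path/to/solr/core/data/index

Term vectors with positions and offsets are stored in the .tv* files, so a size
jump there would be expected.)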

Thanks.
Alex.

 

 

 

-Original Message-
From: Ravi Solr 
To: solr-user 
Sent: Mon, Mar 25, 2013 8:27 am
Subject: Query slow with termVectors termPositions termOffsets


Hello,
We re-indexed our entire core of 115 docs with some of the
fields having termVectors="true" termPositions="true" termOffsets="true";
prior to the reindex we only had termVectors="true". After the reindex
the query component has become very slow. I thought that adding the
termOffsets and termPositions would increase the speed; am I wrong? Several
queries like the one shown below which used to run fine are now very slow.
Can somebody kindly clarify how termOffsets and termPositions affect the query
component?

19076.0
 18972.0
0.0
0.0
0.0
0.0
0.0
0.0
104.0



[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx]
webapp=/solr-admin path=/select
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:("The+Checkup"+OR+"Checkpoint+Washington"+OR+"Post+Carbon"+OR+TSA+OR+"College+Inc."+OR+"Campus+Overload"+OR+"Planet+Panel"+OR+"The+Answer+Sheet"+OR+"Class+Struggle"+OR+"BlogPost"))+OR+(contenttype:"Photo+Gallery"+AND+headline:"day+in+photos")&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+"PC+World"+OR+"CBS+News"+OR+NC8/WJLA+OR+"NewsChannel+8"+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:("Discussion"+OR+"Photo")+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:"Photo+Gallery"+AND+headline:("Drawing+Board"+OR+"Drawing+board"+OR+"drawing+board"))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:("Summary+Box*"+OR+"Video*"+OR+"Post+Sports+Live*")+-slug:(warren*+OR+"history")+-(contenttype:Blog+AND+subheadline:("DC+Schools+Insider"+OR+"On+Leadership"))+contenttype:"Blog"+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2}
hits=4985 status=0 QTime=19044 |#]

Thanks,

Ravi Kiran Bhaskar

 


Re: Spellchecker not working for Solr 4.1

2013-04-11 Thread alxsss
Inside your request handler, try to put spellcheck=true and the name of the 
spellcheck dictionary.
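(For example, supplied directly on the query; a sketch using the core and dictionary
names visible in the config below, and a hypothetical host:

curl 'http://localhost:8983/solr/productindex/select?q=fuacet&spellcheck=true&spellcheck.dictionary=default&spellcheck.collate=true'

The same parameters can instead be set as defaults on the request handler in
solrconfig.xml.)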

hth

Alex.

 

 

 

-Original Message-
From: davers 
To: solr-user 
Sent: Thu, Apr 11, 2013 6:24 pm
Subject: Spellchecker not working for Solr 4.1


This is almost the exact same setup I was using in Solr 3.6; not sure why it's
not working. Here is my setup.


textSpell


   
 default
 spell
 solr.DirectSolrSpellChecker
 
 internal
 
 0.7
 
  2
 
 1
 
 5
 
 4
 
 0.01
 
   
   


   

   

   
 
 
 


  text
  edismax
  0.01
  
sku^9.0 upc^9.1 uniqueid^9.0 series^2.8 productTitle^1.2
productid^9.0 manufacturer^4.0 masterFinish^1.5 theme^1.1 categoryName^0.2
finish^1.4
  
  
text^0.2 productTitle^1.5 manufacturer^4.0 finish^1.9
  
  
linear(popularity_82_i,1,2)^3.0
  
  
uniqueid,productid,manufacturer
  
  
3<-1 5<-2 6<90%
  
  true
  groupid
  true
  100
  3
  10
  true
  10
  100


  spellcheck

  



  





  
  






  



This is what I see in my logs when I attempt a spellcheck

INFO: [productindex] webapp=/solr path=/select
params={spellcheck=false&group.distributed.first=true&tie=0.01&spellcheck.maxCollationTries=100&distrib=false&version=2&NOW=1365729795603&shard.url=solr-shard-1.sys.id.build.com:8080/solr/productindex/|solr-shard-4.sys.id.build.com:8080/solr/productindex/&fl=id,score&df=text&bf=%0a%09%09linear(popularity_82_i,1,2)^3.0%0a%09%09++&group.field=groupid&spellcheck.count=10&qs=3&spellcheck.build=true&mm=%0a%09%093<-1+5<-2+6<90%25%0a%09%09++&group.ngroups=true&spellcheck.maxCollations=10&qf=%0a%09%09sku^9.0+upc^9.1+uniqueid^9.0+series^2.8+productTitle^1.2+productid^9.0+manufacturer^4.0+masterFinish^1.5+theme^1.1+categoryName^0.2+finish^1.4%0a%09%09++&wt=javabin&spellcheck.collate=true&defType=edismax&rows=10&pf=%0a%09%09text^0.2+productTitle^1.5+manufacturer^4.0+finish^1.9%0a%09%09++&start=0&q=fuacet&group=true&isShard=true&ps=100}
status=0 QTime=13
Apr 11, 2013 6:23:15 PM
org.apache.solr.handler.component.SpellCheckComponent finishStage
INFO:
solr-shard-2.sys.id.build.com:8080/solr/productindex/|solr-shard-5.sys.id.build.com:8080/solr/productindex/
null
Apr 11, 2013 6:23:15 PM
org.apache.solr.handler.component.SpellCheckComponent finishStage
INFO:
solr-shard-3.sys.id.build.com:8080/solr/productindex/|solr-shard-6.sys.id.build.com:8080/solr/productindex/
null
Apr 11, 2013 6:23:15 PM
org.apache.solr.handler.component.SpellCheckComponent finishStage
INFO:
solr-shard-1.sys.id.build.com:8080/solr/productindex/|solr-shard-4.sys.id.build.com:8080/solr/productindex/
null




 


Re: solr-cloud performance decrease day by day

2013-04-19 Thread alxsss
How many segments does each shard have, and what is the reason for running multiple 
shards on one machine?
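(A rough way to count segments per shard from the filesystem, since each live
segment normally has one .si file in Lucene 4.x; the path is hypothetical:

ls /path/to/solr/core/data/index/*.si | wc -l

This is only an approximation; files from merged-away segments may linger until the
writer releases them.)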

Alex.

 

 

 

-Original Message-
From: qibaoyuan 
To: solr-user 
Sent: Fri, Apr 19, 2013 12:26 am
Subject: Re: solr-cloud performance decrease day by day


There are 6 shards and they are on one machine, and the JVM heap setting is very 
big; the physical memory is 16GB, the total #docs is about 150k, and the index size 
of each shard is about 1GB. There is indexing while searching; I use auto commit 
every 10 min, and the data comes in at about 100 docs per minute.


On Apr 19, 2013, at 3:17 PM, Furkan KAMACI wrote:

> Could you give more info about your index size and technical details of
> your machine? Maybe you are indexing more data day by day and your RAM
> capability is not enough anymore?
> 
> 2013/4/19 qibaoyuan 
> 
>> Hello,
>>   I am using Solr 4.1.0 and I have used SolrCloud in my product. I have
>> found that at first everything seems good, the search time is fast and the delay is
>> low, but it becomes very slow after days. Does anyone know if there are maybe
>> some params or optimizations to use for SolrCloud?


 


Re: EdgeGram filter

2013-04-23 Thread alxsss
Hi,

I was unable to find more info about LimitTokenCountFilterFactory
in the Solr wiki. Is there any other place to get a thorough description of what it 
does?

Thanks.
Alex.

 

 

 

-Original Message-
From: Jack Krupansky 
To: solr-user 
Sent: Tue, Apr 23, 2013 11:36 am
Subject: Re: EdgeGram filter


Well, you could copy to another field (using copyField) and then have an 
analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and 
then apply the EdgeNGramFilter to that one token. But you would have to 
query explicitly against that other field. Since you are using dismax, you 
should be able to add that second field to the qf parameter. And then remove 
the EdgeNGramFilter from your main field.

-- Jack Krupansky

-Original Message- 
From: hassancrowdc
Sent: Tuesday, April 23, 2013 12:09 PM
To: solr-user@lucene.apache.org
Subject: EdgeGram filter

Hi,

I want to edgeNgram, let's say, this document that has 'difficult contents' so
that if I query (using dismax) q=dif it shows me this result. This is
working fine. But now if I search for q=con it gives me this document as
well. Is there any way to only show this document when I search for 'dif' or
'di'? Basically I want to edgegram 'difficultcontent', not 'difficult' and
'content'. Any help?


Thanks.





 


Re: EdgeGram filter

2013-04-23 Thread alxsss
Hi,

I did not find any descriptions, except constructor and method names. 

Thanks.
Alex.
 

 

 

-Original Message-
From: Markus Jelsma 
To: solr-user 
Sent: Tue, Apr 23, 2013 12:08 pm
Subject: RE: EdgeGram filter


Always check the javadocs. There's a lot of info to be found there:
http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html

 
 
-Original message-
> From:alx...@aim.com 
> Sent: Tue 23-Apr-2013 21:06
> To: solr-user@lucene.apache.org
> Subject: Re: EdgeGram filter
> 
> Hi,
> 
> I was unable to find more info about 
> LimitTokenCountFilterFactory
>  in solr wiki. Is there any other place to get thorough description of what 
> it 
does?
> 
> Thanks.
> Alex.
> 
>  
> 
>  
> 
>  
> 
> -Original Message-
> From: Jack Krupansky 
> To: solr-user 
> Sent: Tue, Apr 23, 2013 11:36 am
> Subject: Re: EdgeGram filter
> 
> 
> Well, you could copy to another field (using copyField) and then have an 
> analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and 
> then apply the EdgeNGramFilter to that one token. But you would have to 
> query explicitly against that other field. Since you are using dismax, you 
> should be able to add that second field to the qf parameter. And then remove 
> the EdgeNGramFilter from your main field.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: hassancrowdc
> Sent: Tuesday, April 23, 2013 12:09 PM
> To: solr-user@lucene.apache.org
> Subject: EdgeGram filter
> 
> Hi,
> 
> I want to edgeNgram let's say this document that has 'difficult contents' so
> that if i query (using disman) q=dif  it shows me this result. This is
> working fine. But now if i search for q=con it gives me this document as
> well. is there any way to only show this document when i search for 'dif' or
> 'di'. basically i want to edgegram 'difficultcontent' not 'difficult' and
> 'content'. Any help?
> 
> 
> Thanks.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/EdgeGram-filter-tp4058337.html
> Sent from the Solr - User mailing list archive at Nabble.com. 
> 
> 
>  
> 

 


whole index in memory

2013-05-31 Thread alxsss
Hello,

I have a Solr index of size 5GB. I am thinking of increasing the cache size to 5 
GB, expecting Solr will put the whole index into memory.

1. Will Solr indeed put the whole index into memory?
2. What are the drawbacks of this approach?

Thanks in advance.
Alex.


Re: document id in nutch/solr

2013-06-24 Thread alxsss
Another way of overriding Nutch fields is to modify the solrindex-mapping.xml file.
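
For illustration, a rough sketch of what that mapping file can look like, assuming 
the Nutch 1.x layout; the field names are just examples:

  <mapping>
    <fields>
      <field dest="content" source="content"/>
      <field dest="title" source="title"/>
      <!-- the kind of entry that sends Nutch's url into Solr's id -->
      <field dest="id" source="url"/>
    </fields>
    <uniqueKey>id</uniqueKey>
  </mapping>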

hth
Alex.

 

 

 

-Original Message-
From: Jack Krupansky 
To: solr-user 
Sent: Sun, Jun 23, 2013 12:04 pm
Subject: Re: document id in nutch/solr


Add the "passthrough" dynamic field to your Solr schema, and then see what 
fields get passed through to Solr from Nutch. Then, add the missing fields 
to your Solr schema and remove the passthrough.



Or, add Solr copyField directives to place fields in existing named 
fields.
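
A rough schema.xml sketch of these two options; the type and field names below are 
placeholders, not taken from the original message:

  <!-- catch-all "passthrough" dynamic field -->
  <dynamicField name="*" type="text_general" indexed="true" stored="true"/>

  <!-- or map an incoming field onto an existing named field -->
  <copyField source="someNutchField" dest="existingField"/>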

Or... talk to the nutch people about how to do field name mapping on the 
nutch side of the fence.

Hold off on UUIDs until you figure all of the above out and everything is 
working without them.

-- Jack Krupansky

-Original Message- 
From: Joe Zhang
Sent: Sunday, June 23, 2013 2:35 PM
To: solr-user@lucene.apache.org
Subject: Re: document id in nutch/solr

Can somebody help with this one, please?


On Fri, Jun 21, 2013 at 10:36 PM, Joe Zhang  wrote:

> A quite standard configuration of nutch seems to autoamtically map "url"
> to "id". Two questions:
>
> - Where is such mapping defined? I can't find it anywhere in
> nutch-site.xml or schema.xml. The latter does define the "id" field as 
> well
> as its uniqueness, but not the mapping.
>
> - Given that nutch nutch has already defined such an id, can i ask solr to
> redefine id as UUID?
> 
>
> - This leads to a related question: do solr and nutch have to have
> IDENTICAL schema.xml?
> 


 


spellchecking in nutch solr

2011-09-01 Thread alxsss


Hello,
I have tried to implement a spellchecker based on the index in Nutch/Solr by adding a 
spell field to schema.xml and making it a copy of the content field. However, this 
doubled the data folder size, and the spell field, as a copy of the content field, 
appears in the XML feed, which is not necessary. Is it possible to implement the 
spellchecker without this issue?
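
For reference, a minimal sketch of the setup being described, assuming a textSpell 
field type; marking the copy stored="false" keeps it out of the XML response while 
the spellcheck dictionary is still built from its indexed terms (the indexed terms 
still add some size, just not a stored duplicate of content):

  <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="content" dest="spell"/>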

Thanks.
Alex.
 


grouping by alpha-numeric field

2011-09-07 Thread alxsss

 

 Hello,

I am trying to group by a field of type string. In the results I see groupValues that 
are parts of the group field.

Any ideas how to fix this.

Thanks.
Alex.






pagination with grouping

2011-09-08 Thread alxsss

 

 Hello,

When trying to implement pagination as in the case without grouping I see two 
issues.
1. With rows=10 the Solr feed displays 10 groups, not 10 results.
2. There is no total number of results with grouping to show the last page.

In detail:
1. I need to display only 10 results on one page. For example, if I have 
group.limit=5 and the first group has 5 docs, the second 3 and the third 2, then 
only these 3 groups must be displayed on the first page.
Currently, specifying rows=10 shows 10 groups, and if we have 5 docs in each 
group then on the first page we will have 50 docs.

2. I need to show the last page, for which I need the total number of results with 
grouping. For example, if I have 5 groups with 5, 4, 3, 2, 1 docs, then 
this total number must be 15.

Any ideas how to achieve this.

Thanks in advance.
Alex.





Re: pagination with grouping

2011-09-12 Thread alxsss
Is case #2 planned to be coded in the future releases?

Thanks.
Alex.

 

 


 

 

-Original Message-
From: Bill Bell 
To: solr-user 
Sent: Thu, Sep 8, 2011 10:17 pm
Subject: Re: pagination with grouping


There are 2 use cases:

1. rows=10 means 10 groups.
2. rows=10 means 10 results (regardless of groups).

I thought there was a total number of groups (ngroups) for case #1.

I don't believe case #2 has been coded.
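
For case #1, a sketch of the grouping parameters under discussion, written as request 
handler defaults; the handler context and the group field name (groupid) are 
placeholders:

  <lst name="defaults">
    <str name="group">true</str>
    <str name="group.field">groupid</str>
    <str name="group.ngroups">true</str>
    <int name="rows">10</int>
  </lst>

Here rows counts groups, and ngroups reports the total number of groups for paging.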

On 9/8/11 2:22 PM, "alx...@aim.com"  wrote:

>
> 
>
> Hello,
>
>When trying to implement pagination as in the case without grouping I see
>two issues.
>1. with rows=10 solr feed displays 10 groups not 10 results
>2. there is no total number of results with grouping  to show the last
>page.
>
>In detail:
>1. I need to display only 10 results in one page. For example if I have
>group.limit=5 and the first group has 5 docs, the second 3 and the third
>2 then only these 3 group must be displayed in the first page.
>Currently specifying rows=10, shows 10 groups and if we have 5 docs in
>each group then in the first page we will have 50 docs.
>
>2.I need to show the last page, for which I need total number of results
>with grouping. For example if I have 5 groups with number of docs 5, 4,
>3,2 1 then this total number must be 15.
>
>Any ideas how to achieve this.
>
>Thanks in advance.
>Alex.
>
>
>



 


apply filter to spell field

2011-09-27 Thread alxsss

 

 Hello,

I have implemented the spellchecker in two ways:
1. Adding a textspell type to schema.xml and making a copy field from the original 
content field, which is of type text.
2. Without adding a new type and copy field, simply adding the name of the spell field, 
content, to solrconfig.xml.

I have an issue in both cases. In case 1 the data folder becomes twice as big, and 
it comes with an additional copy field which is an exact copy of the content field and 
is unnecessary data.
In case 2, suggestions are lowercased versions of the search keywords, i.e. if a user 
searches for "Jessica Alba", Solr suggests "jessica alba". 

So my question is: is it possible to resolve this issue without adding an 
additional type and copy field to the schema.xml?

Thanks.
Alex.




how to achieve google.com like results for phrase queries

2011-11-03 Thread alxsss
Hello,

I use nutch-1.3 crawled results in solr-3.4. I noticed that for two word 
phrases like newspaper latimes, latimes.com is not in results at all.
This may be due to the dismax def type that I use in  request handler 

dismax
url^1.5 id^1.5 content^ title^1.2
url^1.5 id^1.5 content^0.5 title^1.2


 with mm as
2<-1 5<-2 6<90% 

However, changing it to 
1<-1 2<-1 5<-2 6<90% 

and q.op to OR or AND 

do not solve the problem. In this case latimes.com is ranked higher, but still 
is not in first place.
Also, in this case results with both words are ranked very low, almost at the 
end.

We need to be able to achieve the case where latimes.com is placed first, 
then results with both words, and so on.

Any ideas how to modify config to this end?

Thanks in advance.
Alex.



Re: how to achieve google.com like results for phrase queries

2011-11-05 Thread alxsss
Hi Erick,

The term  "newspaper latimes" is not found in latimes.com. However, google 
places it in the first place. My guess is that mm parameter must  not be set as 
2<-1 in order to achieve google.com like ranking for two word phrase queries.

My goal is to set mm parameter in such a way that latimes.com will be ranked in 
1-3rd places and sites with both words will be placed after them. As I wrote in 
my previous letter
setting mm as 1<-1 solves this issue partially. Problem in this case is that 
sites with both words are placed at the bottom or are not in the search results 
at all.

Thanks.
Alex.

 
 

 

-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Sat, Nov 5, 2011 9:01 am
Subject: Re: how to achieve google.com like results for phrase queries


First, the default query operator is ignored by edismax, so that's
not doing anything.

Why would you expect "newspaper latimes" to be found at all in
"latimes.com"? What
proof do you have that the two terms are even in the "latimes.com" document?

You can look at the Query Elevation Component to force certain known
documents to the top of the results based on the search terms, but that's
not a very elegant solution.

What business requirement are you trying to accomplish here? Because as
asked, there's really not enough information to provide a meaningful
suggestion.

Best
Erick

On Thu, Nov 3, 2011 at 7:30 PM,   wrote:
> Hello,
>
> I use nutch-1.3 crawled results in solr-3.4. I noticed that for two word 
phrases like newspaper latimes, latimes.com is not in results at all.
> This may be due to the dismax def type that I use in  request handler
>
> dismax
> url^1.5 id^1.5 content^ title^1.2
> url^1.5 id^1.5 content^0.5 title^1.2
>
>
>  with mm as
> 2<-1 5<-2 6<90%
>
> However, changing it to
> 1<-1 2<-1 5<-2 6<90%
>
> and q.op to OR or AND
>
> do not solve the problem. In this case latimes.com is ranked higher, but 
> still 
is not in the first place.
> Also in this case results with both words are ranked very low, almost at the 
end.
>
> We need to be able to achieve the case when latimes.com is placed in the 
> first 
place then results with both words and etc.
>
> Any ideas how to modify config to this end?
>
> Thanks in advance.
> Alex.
>
>

 
 


Re: how to achieve google.com like results for phrase queries

2011-11-07 Thread alxsss
Solr can also query link (url) text and rank it higher if we specify url in the qf 
field. The only problem is why it does not rank pages with both words higher when mm 
is set as 1<-1. It seems to me that this is a bug.

Thanks.
Alex.

 
 

 

-Original Message-
From: Ted Dunning 
To: solr-user 
Sent: Sat, Nov 5, 2011 8:59 pm
Subject: Re: how to achieve google.com like results for phrase queries


Google achieves their results by using data not found in the web pages
themselves.  This additional data critically includes link text, but also
is derived from behavioral information.



On Sat, Nov 5, 2011 at 5:07 PM,  wrote:

> Hi Erick,
>
> The term  "newspaper latimes" is not found in latimes.com. However,
> google places it in the first place. My guess is that mm parameter must
>  not be set as 2<-1 in order to achieve google.com like ranking for
> two word phrase queries.
>
> My goal is to set mm parameter in such a way that latimes.com will be
> ranked in 1-3rd places and sites with both words will be placed after them.
> As I wrote in my previous letter
> setting mm as 1<-1 solves this issue partially. Problem in this case is
> that sites with both words are placed at the bottom or are not in the
> search results at all.
>
> Thanks.
> Alex.
>
>
>
>
>
>
> -Original Message-
> From: Erick Erickson 
> To: solr-user 
> Sent: Sat, Nov 5, 2011 9:01 am
> Subject: Re: how to achieve google.com like results for phrase queries
>
>
> First, the default query operator is ignored by edismax, so that's
> not doing anything.
>
> Why would you expect "newspaper latimes" to be found at all in
> "latimes.com"? What
> proof do you have that the two terms are even in the "latimes.com"
> document?
>
> You can look at the Query Elevation Component to force certain known
> documents to the top of the results based on the search terms, but that's
> not a very elegant solution.
>
> What business requirement are you trying to accomplish here? Because as
> asked, there's really not enough information to provide a meaningful
> suggestion.
>
> Best
> Erick
>
> On Thu, Nov 3, 2011 at 7:30 PM,   wrote:
> > Hello,
> >
> > I use nutch-1.3 crawled results in solr-3.4. I noticed that for two word
> phrases like newspaper latimes, latimes.com is not in results at all.
> > This may be due to the dismax def type that I use in  request handler
> >
> > dismax
> > url^1.5 id^1.5 content^ title^1.2
> > url^1.5 id^1.5 content^0.5 title^1.2
> >
> >
> >  with mm as
> > 2<-1 5<-2 6<90%
> >
> > However, changing it to
> > 1<-1 2<-1 5<-2 6<90%
> >
> > and q.op to OR or AND
> >
> > do not solve the problem. In this case latimes.com is ranked higher,
> but still
> is not in the first place.
> > Also in this case results with both words are ranked very low, almost at
> the
> end.
> >
> > We need to be able to achieve the case when latimes.com is placed in
> the first
> place then results with both words and etc.
> >
> > Any ideas how to modify config to this end?
> >
> > Thanks in advance.
> > Alex.
> >
> >
>
>
>
>

 


Re: two word phrase search using dismax

2011-11-15 Thread alxsss
Hello,

Thanks for your letter. I investigated further and found out that we have title 
scored more than content in qf field and those docs in the first places have 
one of the words in title but not both of them.
The doc in the first place has only one of the words in the content.
Docs with both words in content are placed after them in around 20th place.

After putting the same score for title and content in the qf field, docs with both 
words in content moved to fifth place. The doc in the first, third and fourth 
places still have only one of the words in content and title.
The doc in the second place has one of the words in title and both words in the 
content but in different places not together.

Thanks.
Alex.
 

-Original Message-
From: Michael Kuhlmann 
To: solr-user 
Sent: Tue, Nov 15, 2011 12:20 am
Subject: Re: two word phrase search using dismax


Am 14.11.2011 21:50, schrieb alx...@aim.com:
> Hello,
>
> I use solr3.4 and nutch 1.3. In request handler we have
> 2<-1 5<-2 6<90%
>
> As fas as I know this means that for two word phrase search match must be 
100%.
> However, I noticed that in most cases documents with both words are ranked 
around 20 place.
> In the first places are documents with one of the words in the phrase.
>
> Any ideas why this happening and is it possible to fix it?

Hi,

are you sure that only one of the words matched in the found documents? 
Have you checked all fields that are listed in the qf parameter? And did 
you check for stemmed versions of your search terms?

If all this is true, you maybe want to give an example.

And AFAIK the mm parameter does not affect the ranking.


 


jetty error, broken pipe

2011-11-19 Thread alxsss
Hello,

I use solr 3.4 with jetty that is included in it. Periodically, I see this 
error in the jetty output

SEVERE: org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at 
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:296)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
...
...
...
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
... 25 more

2011-11-19 20:50:00.060:WARN::Committed before 500 
null||org.mortbay.jetty.EofException|?at 
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at
 org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at 
sun.nio.cs.StreamEncoder.implFlush(S

I searched web and the only advice I get is to upgrade to jetty 6.1, but I 
think the version included in solr is 6.1.26.

Any advise is appreciated.


Thanks.
Alex.


Re: jetty error, broken pipe

2011-11-19 Thread alxsss
I found out that the curl timeout was set to 10, and for queries taking longer than 
10 sec it was closing the connection to Jetty.
I noticed that when the number of docs found is large, Solr takes about 20 sec to 
return results. This is too long. I set caching to off but it did not help.
I think Solr spends too much time finding the total number of docs. Is there a way 
to turn off this count?

Thanks.
Alex.

 

 
-Original Message-
From: Fuad Efendi 
To: solr-user 
Cc: solr-user 
Sent: Sat, Nov 19, 2011 7:24 pm
Subject: Re: jetty error, broken pipe


It's not Jetty. It is a broken TCP pipe due to the client side. It happens when the 
client closes the TCP connection.

And I even had this problem with recent Tomcat 6.

The problem disappeared after I explicitly tuned keep-alive in Tomcat, and started 
using a "monitoring thread" with HttpClient and SolrJ... 

Fuad Efendi
http://www.tokenizer.ca




Sent from my iPad

On 2011-11-19, at 9:14 PM, alx...@aim.com wrote:

> Hello,
> 
> I use solr 3.4 with jetty that is included in it. Periodically, I see this 
error in the jetty output
> 
> SEVERE: org.mortbay.jetty.EofException
>at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
>at 
> org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
>at 
> org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
>at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:296)
>at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
>at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
> ...
> ...
> ...
> Caused by: java.net.SocketException: Broken pipe
>at java.net.SocketOutputStream.socketWrite0(Native Method)
>at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
>at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
>at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
>at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
>... 25 more
> 
> 2011-11-19 20:50:00.060:WARN::Committed before 500 
> null||org.mortbay.jetty.EofException|?at 
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at
 
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at 
sun.nio.cs.StreamEncoder.implFlush(S
> 
> I searched web and the only advice I get is to upgrade to jetty 6.1, but I 
think the version included in solr is 6.1.26.
> 
> Any advise is appreciated.
> 
> 
> Thanks.
> Alex.

 


Re: spellcheck in dismax

2011-11-22 Thread alxsss

 It seems you forgot this
true


 

 

-Original Message-
From: Ruixiang Zhang 
To: solr-user 
Sent: Tue, Nov 22, 2011 11:54 am
Subject: spellcheck in dismax


I put the following into dismax requestHandler, but no suggestion field is
returned.


  true
  false
  1


  spellcheck


But everything works if I put it as a separate requestHandler. Did I miss
something?

Thanks
Richard

 


less search results in prod

2011-12-03 Thread alxsss
Hello,

I have built the solr-3.4.0 data folder on the dev server and copied it to the prod 
server. I made a search for a keyword, then modified the qf and pf params in 
solrconfig.xml, made a search for the same keywords, then restored the qf and pf 
params to their original values. Now Solr returns a much smaller number of docs for 
the same keywords in comparison with the dev server. I tried other keywords and the 
issue is the same. I copied solrconfig.xml from the dev server, but nothing changed. 
I took a look at the statistics; the numDocs and maxDoc values are the same on both 
servers.







Any ideas how to debug this issue?

Thanks in advance.
Alex.


Re: two word phrase search using dismax

2011-12-03 Thread alxsss
Hello,

Here is my request handler



edismax
explicit
0.01
site^1.5 content^0.5 title^1.2
site^1.5 content^0.5 title^1.2
id,title, site
2<-1 5<-2 6<90%
300
true
*:*
content
0
165
title
0
url
regex



I have made a few tests with debugQuery and realised that for two word phrases, Solr 
takes the first word and gives it a score according to the qf param, then takes the 
second word and gives it a score, and so on, but it does not score the whole phrase. 
That is why, if one of the words is in the title and one of them is in the content, 
that doc is given a higher score than one that has both words in the content but none 
in the title.

Ideally, I want to achieve the following order.
1. If one (or both) of the words are in field site, then it must be given 
higher score.
2. Then come docs with both words in the title.
3. Next, docs with both words in the content.
4. And finally docs having either of words in the title and content.

I tried to change mm param to 1<-1 5<-2 6<90%
This allows to achieve 1,4 but not 2,3

Thanks.
Alex.






 

 

 

-Original Message-
From: Chris Hostetter 
To: solr-user 
Sent: Thu, Nov 17, 2011 2:17 pm
Subject: Re: two word phrase search using dismax




: After putting the same score for title and content in qf filed, docs 

: with both words in content moved to fifth place. The doc in the first, 

: third and fourth places still have only one of the words in content and 

: title. The doc in the second place has one of the words in title and 

: both words in the content but in different places not together.



Details matter -- if you send further followup mails, the full details of 
your dismax options and the score explanations from debugQuery are 
necessary to be sure people understand what you are describing (a 
snapshot of reality is far more valuable than a vague description of 
reality).

Offhand, what you are describing sounds correct -- this is what the 
dismax parser is really designed to do.

Even if you have given both title and content equal boosts, your title 
field is probably shorter than your content field, so words matching once 
in title are likely to score higher than the same word matching once in 
content due to length normalization -- and unless you set the "tie" param 
to something really high, the score contribution from the highest scoring 
field (in this case title) will be the dominant factor in the score (it's 
disjunction *max* by default ... if you make tie=1 then it's disjunction 
*sum*).

You haven't mentioned anything about the "pf" param at all, which I can 
only assume means you aren't using it -- the pf param is how you configure 
that scores should be increased if/when all of the words in the query 
string appear together.  I would suggest putting all of the fields in your 
"qf" param in your "pf" param as well.





-Hoss


 


Re: two word phrase search using dismax

2011-12-05 Thread alxsss
Hi Eric, 

After reading more about the pf param I increased the pf boosts a few times, and this 
solved options 2, 3 and 4, but not 1. As an example, for the phrase "newspaper 
latimes", latimes.com is not even in the results, so there is nothing to boost to 
first place, and changing the mm param to 1<-1 5<-2 6<90% solves 
only 1 and 4, but not 2 and 3.

Thanks.
Alex.

 

 

 

-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Mon, Dec 5, 2011 5:52 am
Subject: Re: two word phrase search using dismax


Have you looked at the "pf" (phrase fields)
parameter of edismax?

http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29

Best
Erick

On Sat, Dec 3, 2011 at 7:04 PM,   wrote:
> Hello,
>
> Here is my request handler
>
> 
> 
> edismax
> explicit
> 0.01
> site^1.5 content^0.5 title^1.2
> site^1.5 content^0.5 title^1.2
> id,title, site
> 2<-1 5<-2 6<90%
> 300
> true
> *:*
> content
> 0
> 165
> title
> 0
> url
> regex
> 
> 
>
> I have made a few tests with debugQuery and realised that for two word 
phrases, solr takes the first word and gives it a score according to qf param 
then takes the second word and gives it score and etc, but not to the whole 
phrase. That is why if one of the words is in the title and one of them in the 
content then this doc is given higher score than the one that has both words in 
the content but none in the title.
>
> Ideally, I want to achieve the following order.
> 1. If one (or both) of the words are in field site, then it must be given 
higher score.
> 2. Then come docs with both words in the title.
> 3. Next, docs with both words in the content.
> 4. And finally docs having either of words in the title and content.
>
> I tried to change mm param to 1<-1 5<-2 6<90%
> This allows to achieve 1,4 but not 2,3
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
>
>
>
>
>
> -Original Message-
> From: Chris Hostetter 
> To: solr-user 
> Sent: Thu, Nov 17, 2011 2:17 pm
> Subject: Re: two word phrase search using dismax
>
>
>
>
> : After putting the same score for title and content in qf filed, docs
>
> : with both words in content moved to fifth place. The doc in the first,
>
> : third and fourth places still have only one of the words in content and
>
> : title. The doc in the second place has one of the words in title and
>
> : both words in the content but in different places not together.
>
>
>
> details matter -- if you send futher followup mails the full details of
>
> your dismax options and the score explanations for debugQuery are
>
> neccessary to be sure people understand what you are describing (a
>
> snapshot of reality is far more valuable then a vague description of
>
> reality)
>
>
>
> off hand what you are describing sounds correct -- this is what the
>
> dismax parser is really designed to do.
>
>
>
> even if you have given both title and content equal boosts, your title
>
> field is probably shorter then your content field, so words matching once
>
> in title are likly to score higher then the same word matching once in
>
> content due to length normalization -- and unless you set the "tie" param
>
> to something really high, the score contribution from the highest scoring
>
> field (in this case title) will be the dominant factor in the score (it's
>
> disjunction *max* by default ... if you make tie=1 then it's disjunction
>
> *sum*)
>
>
>
> you haven't mentioned anything about hte "pf" param at all which i can
>
> only assume means you aren't using it -- the pf param is how you configure
>
> that scores should be increased if/when all of the words in teh query
>
> string appear together.  I would suggest putting all of the fields in your
>
> "qf" param in your "pf" param as well.
>
>
>
>
>
> -Hoss
>
>
>

 
 


Re: How to apply relevant Stemmer to each document

2011-12-22 Thread alxsss
Hi Erick,

Why would querying be wrong? 

It is my understanding that if I have, let's say, 3 docs and each of them has been 
indexed with its own language stemmer, then sending a query will search all docs and 
return matching results. Let's say the query is "driving" and one of the docs has 
"drive" and was stemmed by the English stemmer; then it would return 1 result, as 
opposed to 0 docs if I had applied the Russian language stemmer to all docs.

Am I missing something?

Thanks.
Alex.

  

 

 

 

-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Thu, Dec 22, 2011 11:06 am
Subject: Re: How to apply relevant Stemmer to each document


Not really. And it's hard to make sense of how this would work in practice
because stemming the document (even if you could) because that's only
half the battle.

How would querying work then? No matter what language you used
for your stemming, it would be wrong for all the documents that used a
different stemmer (or a stemmer based on a different language).

So I wouldn't hold out too much hope here.

Best
Erick

On Wed, Dec 21, 2011 at 4:09 PM,   wrote:
> Hello,
>
> I would like to know if in the latest version of solr is it possible to apply 
relevant stemmer to each doc depending on its lang field.
> I searched solr-user mailing lists and fount this thread
>
> http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-td3235341.html
>
> but not sure if it was developed into a jira ticket.
>
> Thanks.
> Alex.
>
>

 


can solr automatically search for different punctuation of a word

2012-01-12 Thread alxsss
Hello,

I would like to know if Solr has functionality to automatically search for different 
punctuation of a word. 
For example, if a user searches for the word Uber, and the stemmer is German, then 
Solr looks for both Uber and Über, like in synonyms.

Is it possible to give Solr a file with a list of possible letter substitutions and 
have it search for all possible punctuations?


Thanks.
Alex.


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread alxsss
Depending on your architecture, why not index the same data into two machines? One 
will be your prod, the other your backup.

Thanks.
Alex.

 

 

 

-Original Message-
From: Upayavira 
To: solr-user 
Sent: Thu, Dec 20, 2012 11:51 am
Subject: Re: Pause and resume indexing on SolR 4 for backups


You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
> To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
> that the index is never in a bogus state. All data files are written and 
> flushed to disk, then the segments.* files are written that match the 
> data files. You can capture the files with a set of hard links to create 
> a backup.
> 
> The CheckIndex program will verify the index backup.
> java -cp yourcopy/lucene-core-SOMETHING.jar 
> org.apache.lucene.index.CheckIndex collection/data/index
> 
> lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
> Solr is unpacked.
> 
> On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
> > Hi all.
> >
> > Can anyone advise me of a way to pause and resume SolR 4 so I can 
> > perform a backup? I need to be able to revert to a usable (though not 
> > necessarily complete) index after a crash or other "disaster" more 
> > quickly than a re-index operation would yield.
> >
> > I can't yet afford the "extravagance" of a separate SolR replica just 
> > for backups, and I'm not sure if I'll ever have the luxury. I'm 
> > currently running with just one node, be we are not yet live.
> >
> > I can think of the following ways to do this, each with various 
> > downsides:
> >
> > 1) Just backup the existing index files whilst indexing continues
> > + Easy
> > + Fast
> > - Incomplete
> > - Potential for corruption? (e.g. partial files)
> >
> > 2) Stop/Start Tomcat
> > + Easy
> > - Very slow and I/O, CPU intensive
> > - Client gets errors when trying to connect
> >
> > 3) Block/unblock SolR port with IpTables
> > + Fast
> > - Client gets errors when trying to connect
> > - Have to wait for existing transactions to complete (not sure 
> > how, maybe watch socket FD's in /proc)
> >
> > 4) Pause/Restart SolR service
> > + Fast ? (hopefully)
> > - Client gets errors when trying to connect
> >
> > In any event, the web app will have to gracefully handle 
> > unavailability of SolR, probably by displaying a "down for 
> > maintenance" message, but this should preferably be only a very short 
> > amount of time.
> >
> > Can anyone comment on my proposed solutions above, or provide any 
> > additional ones?
> >
> > Thanks for any input you can provide!
> >
> > -Andy
> >
> 

 


Re: long QTime for big index

2013-02-14 Thread alxsss
Hi,

It is curious to know how many Linux boxes you have and how many cores on each of 
them. It was my understanding that Solr puts into memory all documents found for a 
keyword, not the whole index. So why would it be faster with more cores, when the 
number of selected documents from many separate cores is the same as from one core? 

Thanks.
Alex.

 

 

 

-Original Message-
From: Mou 
To: solr-user 
Sent: Thu, Feb 14, 2013 2:35 pm
Subject: Re: long QTime for big index


Just to close this discussion , we solved the problem by splitting the index.
It turned out that distributed search with 12 cores are faster than
searching two cores.

All queries ,tomcat configuration, jvm configuration remain same. Now
queries are served in milliseconds.


On Thu, Jan 31, 2013 at 9:34 PM, Mou [via Lucene]
 wrote:
> Thank you again.
>
> Unfortunately the index files will not fit in the RAM.I have to try using
> document cache. I am also moving my index to SSD again, we took our index
> off when fusion IO cards failed twice during indexing and index was
> corrupted.Now with the bios upgrade and new driver, it is supposed to be
> more reliable.
>
> Also I am going to look into the client app to verify that it is making
> proper query requests.
>
> Surprisingly when I used a much lower value than default for
> defaultconnectionperhost and maxconnectionperhost in solrmeter , it performs
> very well, the same queries return in less than one sec . I am not sure yet,
> need to run solrmeter with different heap size , with cache and without
> cache etc.
>
> 





 


Re: How do I create two collections on the same cluster?

2013-02-22 Thread alxsss
Hi,

What if you add the new collection to the solr.xml file?
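
A rough sketch of what that could look like in a legacy-style solr.xml (Solr 4.x); 
the instanceDir values are placeholders:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="collection1" instanceDir="collection1"/>
      <core name="collection2" instanceDir="collection2"/>
    </cores>
  </solr>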

Alex.

 

 

 

-Original Message-
From: Shankar Sundararaju 
To: solr-user 
Sent: Thu, Feb 21, 2013 8:51 pm
Subject: How do I create two collections on the same cluster?


I am using Solr 4.1.

I created collection1 consisting of 2 leaders and 2 replicas (2 shards) at
boot time.

After the cluster is up, I am trying to create collection2 with 2 leaders
and 2 replicas just like collection1. I am using following collections API
for that:

http://localhost:7575/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=localhost:8983_solr,localhost:7574_solr,localhost:7575_solr,localhost:7576_solr

Yes, collection2 does get created. But I see a problem - createNodeSet
parameter is not being honored. All 4 nodes are not being used to create
collection2, only 3 are being used. Is this a bug or I don't understand how
this parameter should be used?

What is the best way to create collection2? Can I specify both collections
in solr.xml in the solr home dir in all nodes and launch them? Do I have to
get the configs for collection2 uploaded to zookeeper before I launch the
nodes?

Thanks in advance.

-Shankar

-- 
Regards,
*Shankar Sundararaju
*Sr. Software Architect
ebrary, a ProQuest company
410 Cambridge Avenue, Palo Alto, CA 94306 USA
shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)

 


how to override pre and post tags when useFastVectorHighlighter is set to true

2013-02-22 Thread alxsss
Hello,

I was unable to change the pre and post tags for highlighting when 
useFastVectorHighlighter is set to true. Changing the default tags in 
solrconfig.xml works for the standard highlighter, though. I searched the mailing 
list and the net with no success.
I use solr-4.1.0.
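
For reference, a sketch of the tag defaults involved, under the assumption that the 
standard highlighter reads hl.simple.pre/hl.simple.post while the fast vector 
highlighter reads hl.tag.pre/hl.tag.post:

  <str name="hl.simple.pre">&lt;em&gt;</str>
  <str name="hl.simple.post">&lt;/em&gt;</str>
  <str name="hl.tag.pre">&lt;em&gt;</str>
  <str name="hl.tag.post">&lt;/em&gt;</str>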

Thanks.
Alex.


Re: solr cloud index size is too big

2013-03-04 Thread alxsss
Hi,

It is the index folder. tlog is only a few MB.

I have analysed all the changes and found out that only one field in the schema was 
changed.

This field in non cloud
 

was changed to
 

 in cloud to use fastVectorHighlighting.

Is it possible that this change could double the index size?
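
For illustration, the kind of field change that enables fastVectorHighlighting looks 
roughly like this; the field name and type are placeholders:

  <field name="content" type="text" indexed="true" stored="true"/>

changed to

  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

Term vectors with positions and offsets are stored alongside the index, so they can 
account for a large part of the extra size.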

Thanks.
Alex.

 

 

-Original Message-
From: Jan Høydahl 
To: solr-user 
Sent: Mon, Mar 4, 2013 2:24 pm
Subject: Re: solr cloud index size is too big


Can you tell whether it's the "index" folder that is that large or is it 
including the "tlog" transaction log folder?
If you have a huge transaction log, you need to start sending hard commits more 
often during indexing to flush the tlogs.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

4. mars 2013 kl. 04:16 skrev alx...@aim.com:

> Hello,
> 
> I had a non cloud collection index size around 80G for 15M documents with 
solr-4.1.0. So, I decided to use solr cloud with two shards and sent to solr 
the 
following command
> 
> curl 
> 'http://slave:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=1'
> 
> I tried to put replicationFactor=0 but this command gave an error.  After 
reindexing, into two separate linux boxes with one instances of solr running in 
each of them I see that size of index in each shard is 90GB versus expected 
40GB 
although each of the shards has half (7.5M) of  documents.
> 
> Any ideas what went wrong?
> 
> Thanks.
> Alex.


 


spellchecker does not have suggestion for keywords typed through a non-whitespace delimiter

2013-03-12 Thread alxsss
Hello,

Recently we noticed that solr and its spellchecker do not return results  for 
keywords typed with non-whitespace delimiter.
A user accidentally typed u instead of white space. For example, paulusoles 
instead of paul soles. Solr does not return any results or spellcheck 
suggestion for keyword paulusoles, although it returns results for keywords 
"paul soles", paul, and soles.

search.yahoo.com  returns results for the  keyword paulusoles as if it was 
given keyword paul soles.

Any ideas how to implement this functionality in solr?

text and spell fields are as follows;

  










  









true
direct
true
true
2

This is solr -4.1.0 with cloud feature and index based dictionary.

Thanks.
Alex.


structure of solr index

2013-03-15 Thread alxsss
Hi,

I wondered if solr searches on indexed fields only or on entire index? In more 
detail, let say I have fields id,  title and content, all  indexed, stored. 
Will a search send all these fields to memory or only indexed part of these 
fields? 

Thanks.
Alex.




Re: structure of solr index

2013-03-16 Thread alxsss
Hi,

So, will search time be the same for the case when fields are indexed only vs  
the case when they are indexed and stored?

 

 Thanks.
Alex.

 

-Original Message-
From: Otis Gospodnetic 
To: solr-user 
Sent: Fri, Mar 15, 2013 8:09 pm
Subject: Re: structure of solr index


Hi,

I think you are asking if the original/raw content of those fields will be
read.  No, it won't, not for the search itself.  If you want to
retrieve/return those fields then, of course, they will be read for the
documents being returned.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Mar 15, 2013 at 2:41 PM,  wrote:

> Hi,
>
> I wondered if solr searches on indexed fields only or on entire index? In
> more detail, let say I have fields id,  title and content, all  indexed,
> stored. Will a search send all these fields to memory or only indexed part
> of these fields?
>
> Thanks.
> Alex.
>
>
>

 


Re: structure of solr index

2013-03-18 Thread alxsss

 
---So,"search" time is in no way impacting by the existence or non-existence of 
stored values,




 What about memory? Would it require to increase memeory in order to have the 
same Qtime as in the case of indexed only fields?
For example in the case of indexed fields only index size is 5GB, average Qtime 
is 0.1 sec and memory is 10G. 
In case when the same fields are indexed and stored index size is 50GB. Will 
the Qtime be 0.1s + time for extracting of stored fields?

Another scenario is to store fields in hbase or cassandra, have only indexed 
fields in Solr and after getting id field from solr extract stored values from 
hbase or cassandra. Will this setup be faster than the  one with stored fields 
in Solr?

Thanks.
Alex.

 

-Original Message-
From: Jack Krupansky 
To: solr-user 
Sent: Sat, Mar 16, 2013 9:53 am
Subject: Re: structure of solr index


"Search" depends only on the "index". But... returning field values for each 
of the matched documents does require access to the "stored" values. So, 
"search" time is in no way impacting by the existence or non-existence of 
stored values, but total query processing time would of course include both 
search time and the time to access and format the stored field values.

-- Jack Krupansky

-Original Message- 
From: alx...@aim.com
Sent: Saturday, March 16, 2013 12:48 PM
To: solr-user@lucene.apache.org
Subject: Re: structure of solr index

Hi,

So, will search time be the same for the case when fields are indexed only 
vs  the case when they are indexed and stored?



Thanks.
Alex.



-Original Message-
From: Otis Gospodnetic 
To: solr-user 
Sent: Fri, Mar 15, 2013 8:09 pm
Subject: Re: structure of solr index


Hi,

I think you are asking if the original/raw content of those fields will be
read.  No, it won't, not for the search itself.  If you want to
retrieve/return those fields then, of course, they will be read for the
documents being returned.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Mar 15, 2013 at 2:41 PM,  wrote:

> Hi,
>
> I wondered if solr searches on indexed fields only or on entire index? In
> more detail, let say I have fields id,  title and content, all  indexed,
> stored. Will a search send all these fields to memory or only indexed part
> of these fields?
>
> Thanks.
> Alex.
>
>
>



 


strange behaviour of wordbreak spellchecker in solr cloud

2013-03-18 Thread alxsss
Hello,

I am trying to use the wordbreak spellchecker in solr-4.2 with the cloud feature. We 
have two servers with one shard on each of them.

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'

does not return any results in spellchecker. However, if I specify 
distrib=false only one of these has spellchecker results.

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'

no spellchecker results 

curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'
returns spellchecker results.


My testhandler and select handlers are as follows




edismax
explicit
0.01
host^30  content^0.5 title^1.2 
site^25 content^10 title^22
url,id,title

3<-1 5<-3 6<90%
1

true
content
regex
165
default


direct
wordbreak
on
true
false
2




 spellcheck





  

 
   explicit
   10
   
 







   
 spellcheck





Is this a bug, or does something else have to be done?


Thanks.
Alex.



Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-19 Thread alxsss
Hello,

I was testing my custom testhandler. The direct spellchecker also was not working 
in cloud mode. After I added 

  
 spellcheck

to the /select requestHandler it worked, except for the wordbreak spellchecker. I 
have added shards.qt=testhandler to the curl request but it did not solve the issue.

Thanks.
Alex.

 

 

 

-Original Message-
From: Dyer, James 
To: solr-user 
Sent: Tue, Mar 19, 2013 10:30 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


Mark,

I wasn't sure if Alex is actually testing /select, or if the problem is just 
coming up in /testhandler.  Just wanted to verify that before we get into bug 
reports.

DistributedSpellCheckComponentTest does have 1 little Word Break test scenario 
in it, so we know WordBreakSolrSpellChecker at least works some of the time in 
a 
Distributed environment :) .  Ideally, we should probably use a random test for 
stuff like this as adding a bunch of test scenarios would make this 
already-slower-than-molasses test even slower.  On the other hand, we want to 
test as many possibilities as we can.  Based on DSCCT and it being so 
superficial, I really can't vouch too much for my spell check enhancements 
working as well with shards as they do with a single index.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, March 19, 2013 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

My first thought too, but then I saw that he had the spell component in both 
his 
custom testhander and the /select handler, so I'd expect that to work as well.

- Mark

On Mar 19, 2013, at 12:18 PM, "Dyer, James"  
wrote:

> Can you try including in your request the "shards.qt" parameter?  In your 
case, I think you should set it to "testhandler".  See 
http://wiki.apache.org/solr/SpellCheckComponent?highlight=%28shards\.qt%29#Distributed_Search_Support
 
for a brief discussion.
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> 
> -Original Message-
> From: alx...@aim.com [mailto:alx...@aim.com] 
> Sent: Monday, March 18, 2013 4:07 PM
> To: solr-user@lucene.apache.org
> Subject: strange behaviour of wordbreak spellchecker in solr cloud
> 
> Hello,
> 
> I try to use wordbreak spellchecker in solr-4.2 with cloud feature. We have 
two server with one shard in each of them.
> 
> curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
> curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
> 
> does not return any results in spellchecker. However, if I specify 
distrib=false only one of these has spellchecker results.
> 
> curl 
> 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'
> 
> no spellcheler results 
> 
> curl 
> 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'
> returns spellcheker results.
> 
> 
> My testhandler and select handlers are as follows
> 
> 
> 
> 
> edismax
> explicit
> 0.01
> host^30  content^0.5 title^1.2 
> site^25 content^10 title^22
> url,id,title
> 
> 3<-1 5<-3 6<90%
> 1
> 
> true
> content
> regex
> 165
> default
> 
> 
> direct
> wordbreak
> on
> true
> false
> 2
> 
> 
> 
> 
> spellcheck
> 
> 
> 
> 
> 
>  
>
> 
>   explicit
>   10
>   
> 
>
>
>
>
>
>
>
>   
> spellcheck
>
>
> 
> 
> 
> is this a bug or something else has to be done?
> 
> 
> Thanks.
> Alex.
> 




 
 


Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-19 Thread alxsss
-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable 

Not sure what this is?

I have

 

spell





  direct
  spell
  solr.DirectSolrSpellChecker
  
  internal
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  spell
  true
  true
  10




 




  


The spell field in our schema is called spell and its type is also called spell.
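
For reference, a sketch of how a spellcheck search component with these two checkers 
is typically declared in solrconfig.xml; the parameter names below are an assumption 
based on the stock Solr 4.x example configuration, matched to the values listed above:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">spell</str>
    <lst name="spellchecker">
      <str name="name">direct</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">0.01</float>
    </lst>
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spell</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>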
Here are requests


 curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  32
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  






 curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'




  0
  26
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  

  1
  0
  11
  
paul u soles   
  

(paul u soles)
  



No distrib param

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
false
10
  


  
0
0

  



  




curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'




  0
  24
  
true
testhandler
paulusoles
10
  


  
0
0

  



  




Thanks.
Alex.

---Original Message-

From: Dyer, James 
To: solr-user 
Sent: Tue, Mar 19, 2013 11:10 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


You may likely be hitting on a bug with WordBreakSolrSpellChecker in a 
distributed environment.  But to nail it down, we probably need to see both the 
applicable  section of your config and also this section: 
.  Also 
need an example of a query that succeeds non-distributed (with the exact query 
url and output you get) vs the same query url and output in the distributed 
scenario.  Then, without access to your actual index, it might be possible to 
come up with a failing unit test.  With a failing unit test in hand, we have a 
good shot at getting a fix.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com] 
Sent: Tuesday, March 19, 2013 12:39 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

I was testing my custom testhandler. Direct spellchecker also was not working 
in 
cloud. After I added 

  
 spellcheck

to /select requestHandler it worked but the wordbreak spellchecker. I have 
added 
shards.qt=testhanlder to curl request but it did not solve the issue.

Thanks.
Alex.

 

 

 

-Original Message-
From: Dyer, James 
To: solr-user 
Sent: Tue, Mar 19, 2013 10:30 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


Mark,

I wasn't sure if Alex is actually testing /select, or if the problem is just 
coming up in /testhandler.  Just wanted to verify that before we get into bug 
reports.

DistributedSpellCheckComponentTest does have 1 little Word Break test scenario 
in it, so we know WordBreakSolrSpellChecker at least works some of the time in 
a 

Distributed environment :) .  Ideally, we should probably use a random test for 
stuff like this as adding a bunch of test scenarios would make this 
already-slower-than-molasses test even slower.  On the other hand, we want to 
test as many possibilities as we can.  Based on DSCCT and it being so 
superficial, I really can't vouch too much for my spell check enhancements 
working as well with shards as they do with a single index.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, March 19, 2013 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

My first thought too, but then I saw that he had the spell component in both 
his 

custom testhander and the /select handler, so I'd expect that to work as well.

- Mark

On Mar 19, 2013, at 12:18 PM, "Dyer, James"  
wrote:

> Can you try including in your request the "shards.qt" parameter?  In your 
case, I think you should set it to "testhandler".  See 
http://wiki.apache.org/solr/SpellCheckComponent?highlight=%28shards\.qt%29#Distributed_Search_Support
 

for a brief discussion.
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> 
> -Original Message-
> From: alx...@aim.com [mailto:alx...@aim.com] 
> Sent: Monday, March 18, 2013 4:07 PM
> To: solr-user@lucene.apache.org
> Subject: strange behaviour of wordbreak spellchecker in solr cloud
> 
> Hello,
> 
> I try to use wordbreak spellchecker in solr-4.2 with cloud feature. We have 
two server with one shard i

Re: can solr automatically search for different punctuation of a word

2012-01-30 Thread alxsss

 Hi Chantal,

In the readme file at  solr/contrib/analysis-extras/README.txt it says to add 
the ICU library (in lib/)

Do I also need to add ... and where?

Thanks.
Alex.

 

 

-Original Message-
From: Chantal Ackermann 
To: solr-user 
Sent: Fri, Jan 13, 2012 1:52 am
Subject: Re: can solr automatically search for different punctuation of a word


Hi Alex,



For me, ICUFoldingFilterFactory works very well. It does lowercasing and
removes diacritics (this is what umlauts and accents on letters are
called - punctuation means commas, periods etc.). It will work for any
language, not only German. And it will also handle apostrophes as in
"C'est bien".



ICU requires additional libraries in the classpath. For an in-built solr

solution have a look at ASCIIFoldingFilterFactory.



http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory







Example configuration:
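
A minimal sketch of such an analyzer chain, assuming a StandardTokenizer and a 
made-up type name:

  <fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
    </analyzer>
  </fieldType>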















And dependencies (example for Maven) in addition to solr-core:



<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-icu</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-analysis-extras</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>





Cheers,

Chantal



On Fri, 2012-01-13 at 00:09 +0100, alx...@aim.com wrote:

> Hello,

> 

> I would like to know if solr has a functionality to automatically search for 
> a 

different punctuation of a word. 

> For example if I if a user searches for a word Uber, and stemmer is german 

lang, then solr looks for both Uber and  Über,  like in synonyms.

> 

> Is it possible to give a file with a list of possible substitutions of 
> letters 

to solr and have it search for all possible punctuations?

> 

> 

> Thanks.

> Alex.




 


Re: spellcheck configuration not providing suggestions or corrections

2012-02-13 Thread alxsss
you have put this

 true

Maybe you need to put 
true

 

 Alex.

 

-Original Message-
From: Dyer, James  
To: solr-user 
Sent: Mon, Feb 13, 2012 12:43 pm
Subject: RE: spellcheck configuration not providing suggestions or corrections


That would be it, I think.  Your request is to "/select", but you've put 
spellchecking into "/search".  Try "/search" instead.  Also, I doubt it's the 
problem, but try removing the trailing CRLFs from your query.  Also, typically 
you'd still query against the main field ("itemDesc" in your case) and just use 
"itemDescSpell" from which to build your dictionary.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Monday, February 13, 2012 2:28 PM
To: solr-user@lucene.apache.org
Subject: RE: spellcheck configuration not providing suggestions or corrections

hello 

thank you for the suggestion - however this did not work.

i went in to solrconfig and change the count to 20 - then restarted the
server and then did a reimport.



is it possible that i am not firing the request handler that i think i am
firing ?


  


default

false

true

20
  explicit


  spellcheck

  


query sent to server:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemDescSpell%3Agusket%0D%0A&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.build=true

results:

00trueon0itemDescSpell:gusket
true102.2


 


Re: Help with duplicate unique IDs

2012-03-02 Thread alxsss

 Take a look at  
I think you must use dedup to solve this issue

 

 

-Original Message-
From: Thomas Dowling 
To: solr-user 
Cc: Mikhail Khludnev 
Sent: Fri, Mar 2, 2012 1:10 pm
Subject: Re: Help with duplicate unique IDs


Thanks.  In fact, the behavior I want is overwrite=true.  I want to be 
able to reindex documents, with the same id string, and automatically 
overwrite the previous version.


Thomas
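
(For reference, the attribute being discussed goes on the add command itself;
overwrite defaults to true, so reindexing a document with the same unique key
replaces the old one. The field name and value below are only placeholders:)

<add overwrite="false">
  <doc>
    <field name="id">article-123</field>
  </doc>
</add>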


On 03/02/2012 04:01 PM, Mikhail Khludnev wrote:
> Hello Tomas,
>
> I guess you could just specify overwrite=false
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22
>
>
> On Fri, Mar 2, 2012 at 11:23 PM, Thomas Dowling wrote:
>
>> In a Solr index of journal articles, I thought I was safe reindexing
>> articles because their unique ID would cause the new record in the index to
>> overwrite the old one. (As stated at http://wiki.apache.org/solr/**
>> SchemaXml#The_Unique_Key_Field-
>>  
right?)
>>

 


data/index/segments_u (No such file or directory)

2012-03-19 Thread alxsss
Hello,

I have copied solr's data folder from a dev linux box to a prod one. When starting 
solr I get this error on the prod server. On the dev box, solr starts successfully. 

Caused by: java.io.FileNotFoundException: 
/home/apache-solr-3.5.0/example/solr/data/index/segments_u (No such file or 
directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
at 
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
at 
org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:265)
at 
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:79)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:462)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:405)
at 
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1092)

There is no segments_u file or folder in the dev box.

Thanks in advance.
Alex.



term frequency outweighs exact phrase match

2012-04-10 Thread alxsss
Hello,

I use solr 3.5 with edismax. I have the following issue with phrase search. For 
example, if I have three documents with content like

1. apache apache
2. solr solr
3. apache solr

then a search for apache solr displays the documents in the order 1, 2, 3 instead of 
3, 2, 1, because the term frequency in the first and second documents is higher than 
in the third document. We want the results to be displayed in the order 3, 2, 1, 
since the third document has the exact match.

My request handler is as follows.



edismax
explicit
0.01
host^30  content^0.5 title^1.2
host^30  content^20 title^22 
url,id, site ,title
2<-1 5<-2 6<90%
1
true
*:*
content
0
165
title
0
url
regex
true
true
5
true
site
true


 spellcheck



Any ideas how to fix this issue?

Thanks in advance.
Alex.
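
(The parameter names in the handler above were stripped by the archive; judging from
the values, the relevant defaults were probably something like the following partial
reconstruction - the remaining highlighting/faceting parameters are omitted because
their names cannot be recovered. The pair that matters for this thread is qf, which
applies per-term boosts, versus pf, which boosts documents matching the whole query
as a phrase.)

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2</str>
    <str name="pf">host^30 content^20 title^22</str>
    <str name="fl">url,id,site,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <str name="q.alt">*:*</str>
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
  </lst>
</requestHandler>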


Re: term frequency outweighs exact phrase match

2012-04-12 Thread alxsss
In that case documents 1 and 2 will not be in the results. We need them to also be 
shown in the results, but ranked after the docs with the exact match.
I think omitting term frequency when calculating ranking for phrase queries would 
solve this issue, but I do not see such a parameter in the configs.
I see omitTermFreqAndPositions="true", but I am not sure it is the setting I need, 
because its description is too vague.

Thanks.
Alex.
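
(For reference, that attribute is set on the field or field type in schema.xml, as in
the hypothetical declaration below. Note that it drops positions as well as term
frequencies, and phrase queries need position data, so it is not a drop-in fix for
this problem:)

<field name="content" type="text" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>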


 

 

 

-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Wed, Apr 11, 2012 8:23 am
Subject: Re: term frequency outweighs exact phrase match


Consider boosting on phrase with a SHOULD clause, something
like field:"apache solr"^2..

Best
Erick
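
(With dismax/edismax this is usually expressed either through the pf parameter, which
adds an implicit phrase query over the whole user input, or as an explicit boost query
in the handler defaults; the boost value below is arbitrary:)

<str name="bq">content:"apache solr"^20</str>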


On Tue, Apr 10, 2012 at 12:46 PM,   wrote:
> Hello,
>
> I use solr 3.5 with edismax. I have the following issue with phrase search. 
For example if I have three documents with content like
>
> 1.apache apache
> 2. solr solr
> 3.apache solr
>
> then search for apache solr displays documents in the order 1,.2,3 instead of 
3, 2, 1 because term frequency in the first and second documents is higher than 
in the third document. We want results be displayed in the order as  3,2,1 
since 
the third document has exact match.
>
> My request handler is as follows.
>
> 
> 
> edismax
> explicit
> 0.01
> host^30  content^0.5 title^1.2
> host^30  content^20 title^22 
> url,id, site ,title
> 2<-1 5<-2 6<90%
> 1
> true
> *:*
> content
> 0
> 165
> title
> 0
> url
> regex
> true
> true
> 5
> true
> site
> true
> 
> 
>  spellcheck
> 
> 
>
> Any ideas how to fix this issue?
>
> Thanks in advance.
> Alex.

 


Re: term frequency outweighs exact phrase match

2012-04-13 Thread alxsss
Hello Hoss,

Here are the explain tags for two docs:


0.021646015 = (MATCH) sum of:
  0.021646015 = (MATCH) sum of:
0.02141003 = (MATCH) max plus 0.01 times others of:
  2.84194E-4 = (MATCH) weight(content:apache^0.5 in 3578), product of:
0.0029881175 = queryWeight(content:apache^0.5), product of:
  0.5 = boost
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0013721307 = queryNorm
0.09510804 = (MATCH) fieldWeight(content:apache in 3578), product of:
  2.236068 = tf(termFreq(content:apache)=5)
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.009765625 = fieldNorm(field=content, doc=3578)
  0.021407187 = (MATCH) weight(title:apache^1.2 in 3578), product of:
0.01371095 = queryWeight(title:apache^1.2), product of:
  1.2 = boost
  8.327043 = idf(docFreq=2375, maxDocs=3613605)
  0.0013721307 = queryNorm
1.5613205 = (MATCH) fieldWeight(title:apache in 3578), product of:
  1.0 = tf(termFreq(title:apache)=1)
  8.327043 = idf(docFreq=2375, maxDocs=3613605)
  0.1875 = fieldNorm(field=title, doc=3578)
2.359865E-4 = (MATCH) max plus 0.01 times others of:
  2.359865E-4 = (MATCH) weight(content:solr^0.5 in 3578), product of:
0.004071705 = queryWeight(content:solr^0.5), product of:
  0.5 = boost
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0013721307 = queryNorm
0.05795766 = (MATCH) fieldWeight(content:solr in 3578), product of:
  1.0 = tf(termFreq(content:solr)=1)
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.009765625 = fieldNorm(field=content, doc=3578)

0.021465056 = (MATCH) sum of:
  1.8154096E-4 = (MATCH) sum of:
6.354771E-5 = (MATCH) max plus 0.01 times others of:
  6.354771E-5 = (MATCH) weight(content:apache^0.5 in 638040), product of:
0.0029881175 = queryWeight(content:apache^0.5), product of:
  0.5 = boost
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0013721307 = queryNorm
0.021266805 = (MATCH) fieldWeight(content:apache in 638040), product of:
  1.0 = tf(termFreq(content:apache)=1)
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0048828125 = fieldNorm(field=content, doc=638040)
1.1799325E-4 = (MATCH) max plus 0.01 times others of:
  1.1799325E-4 = (MATCH) weight(content:solr^0.5 in 638040), product of:
0.004071705 = queryWeight(content:solr^0.5), product of:
  0.5 = boost
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0013721307 = queryNorm
0.02897883 = (MATCH) fieldWeight(content:solr in 638040), product of:
  1.0 = tf(termFreq(content:solr)=1)
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0048828125 = fieldNorm(field=content, doc=638040)
  0.021283515 = (MATCH) weight(content:"apache solr"~1^30.0 in 638040), product 
of:
0.42358932 = queryWeight(content:"apache solr"~1^30.0), product of:
  30.0 = boost
  10.290306 = idf(content: apache=126092 solr=25986)
  0.0013721307 = queryNorm
0.050245635 = fieldWeight(content:"apache solr" in 638040), product of:
  1.0 = tf(phraseFreq=1.0)
  10.290306 = idf(content: apache=126092 solr=25986)
  0.0048828125 = fieldNorm(field=content, doc=638040)


 

 

 Although the second doc has an exact match, it is placed after the first one, which 
does not have an exact match.

I use the following request handler



edismax
explicit
0.01
host^30  content^0.5 title^1.2 anchor^1.2
content^30
url,id, site ,title
2<-1 5<-2 6<90%
1
true
*:*
content
0
165
title
0
url
regex
true
true
5
true
site
true


 spellcheck




and the query is as follows 

http://localhost:8983/solr/select/?q=apache solr&version=2.2&start=0&rows=10&indent=on&qt=search&debugQuery=true

Thanks.
Alex.


-Original Message-
From: Chris Hostetter 
To: solr-user 
Sent: Thu, Apr 12, 2012 7:43 pm
Subject: Re: term frequency outweighs exact phrase match



: I use solr 3.5 with edismax. I have the following issue with phrase 
: search. For example if I have three documents with content like
: 
: 1.apache apache
: 2. solr solr
: 3.apache solr
: 
: then search for apache solr displays documents in the order 1,.2,3 
: instead of 3, 2, 1 because term frequency in the first and second 
: documents is higher than in the third document. We want results be 
: displayed in the order as 3,2,1 since the third document has exact 
: match.

you need to give us a lot more info, like what other data is in the 
various fields for those documents, exactly what your query URL looks 
like, and what debugQuery=true gives you back in terms of score 
explanations for each document, because if that sample content is the only 
thing you've got indexed (even if it's in multiple fields), then documents 
#1 and #2 shouldn't even match your query using the mm you've specified...

: 2<-1 5<-2 6<90%

...because 

Re: Removing old documents

2012-05-01 Thread alxsss
Hello,

I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/

without and with -noCommit, and restarted the solr server.

The log shows that 5 documents were removed, but they are still in the search 
results.
Is this a bug, or is something missing?
I use nutch-1.4 and solr 3.5

Thanks.
Alex. 

 

 

 

-Original Message-
From: Markus Jelsma 
To: solr-user 
Sent: Tue, May 1, 2012 7:41 am
Subject: Re: Removing old documents


Nutch 1.4 has a separate tool to remove 404 and redirects documents from your 
index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents 
in one run based on segment data.

On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
> I'm running Nutch, so it's updating the documents, but I'm wanting to
> remove ones that are no longer available.  So in that case, there's no
> update possible.
> 
> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
> 
> mav.p...@holidaylettings.co.uk> wrote:
> > Not sure if there is an automatic way but we do it via a delete query and
> > where possible we update doc under same id to avoid deletes.
> > 
> > On 01/05/2012 13:43, "Bai Shen"  wrote:
> > >What is the best method to remove old documents?  Things that no
> > >generate 404 errors, etc.
> > >
> > >Is there an automatic method or do I have to do it manually?
> > >
> > >THanks.

-- 
Markus Jelsma - CTO - Openindex

 


Re: Removing old documents

2012-05-01 Thread alxsss

 

 all caching is disabled and I restarted jetty. The same results.

Thanks.
Alex.

 

-Original Message-
From: Lance Norskog 
To: solr-user 
Sent: Tue, May 1, 2012 2:57 pm
Subject: Re: Removing old documents


Maybe this is the HTTP caching feature? Solr comes with HTTP caching
turned on by default and so when you do queries and changes your
browser does not fetch your changed documents.
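
(If HTTP caching is the culprit, it can be switched off in solrconfig.xml, for
example along these lines:)

<requestDispatcher handleSelect="true">
  <!-- never send 304/cache headers, so clients always get fresh results -->
  <httpCaching never304="true" />
</requestDispatcher>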

On Tue, May 1, 2012 at 11:53 AM,   wrote:
> Hello,
>
> I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/
>
> without and with -noCommit  and restarted solr server
>
> Log  shows that 5 documents were removed but they are still in the search 
results.
> Is this a bug or something is missing?
> I use nutch-1.4 and solr 3.5
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
> -Original Message-
> From: Markus Jelsma 
> To: solr-user 
> Sent: Tue, May 1, 2012 7:41 am
> Subject: Re: Removing old documents
>
>
> Nutch 1.4 has a separate tool to remove 404 and redirects documents from your
> index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents
> in one run based on segment data.
>
> On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
>> I'm running Nutch, so it's updating the documents, but I'm wanting to
>> remove ones that are no longer available.  So in that case, there's no
>> update possible.
>>
>> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
>>
>> mav.p...@holidaylettings.co.uk> wrote:
>> > Not sure if there is an automatic way but we do it via a delete query and
>> > where possible we update doc under same id to avoid deletes.
>> >
>> > On 01/05/2012 13:43, "Bai Shen"  wrote:
>> > >What is the best method to remove old documents?  Things that no
>> > >generate 404 errors, etc.
>> > >
>> > >Is there an automatic method or do I have to do it manually?
>> > >
>> > >THanks.
>
> --
> Markus Jelsma - CTO - Openindex
>
>



-- 
Lance Norskog
goks...@gmail.com

 


Re: Removing old documents

2012-05-02 Thread alxsss

 

 I use jetty that comes with solr. 
I use solr's dedupe


   
   <updateRequestProcessorChain name="dedupe">
     <processor class="solr.processor.SignatureUpdateProcessorFactory">
       <bool name="enabled">true</bool>
       <str name="signatureField">id</str>
       <bool name="overwriteDupes">true</bool>
       <str name="fields">url</str>
       <str name="signatureClass">solr.processor.Lookup3Signature</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>


and because of this, the id is not the url itself but its encoded signature.

I see solrclean uses url to delete a document.

Is it possible that the issue is because of this mismatch?


Thanks.
Alex.


 

-Original Message-
From: Paul Libbrecht 
To: solr-user 
Sent: Tue, May 1, 2012 11:43 pm
Subject: Re: Removing old documents


With which client?

paul


Le 2 mai 2012 à 01:29, alx...@aim.com a écrit :

> all caching is disabled and I restarted jetty. The same results.


 


Re: Broken pipe error

2012-07-03 Thread alxsss
I had the same problem with jetty. It turned out that a broken pipe happens when the 
application disconnects from jetty. In my case I was using a php client and it 
had a 10 sec restriction on the curl request. When solr took more than 10 sec to 
respond, curl automatically disconnected from jetty.

Hope this can help.

Alex.
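
(The same effect is easy to reproduce with command-line curl, where --max-time plays
the role of the PHP client's 10-second limit; the URL and timeout here are only
illustrative:)

curl --max-time 60 "http://localhost:8983/solr/select?q=apache+solr"

If the server takes longer than the limit, curl drops the connection and the
container logs a broken pipe.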



-Original Message-
From: Jason 
To: solr-user 
Sent: Mon, Jul 2, 2012 7:41 pm
Subject: Broken pipe error


Hi, all

We're independently running three search servers.
One of three servers has bigger index size and more connection users than
the others.
Except that, all configurations are same.
The problem is that this server sometimes gets a broken pipe error,
but I don't know what the problem is.
Please give some ideas.
Thanks in advance.
Jason


error message below...
===
2012-07-03 10:42:56,753 [http-8080-exec-3677] ERROR
org.apache.solr.servlet.SolrDispatchFilter - null:ClientAbortException: 
java.io.IOException: Broken pipe
at
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
at
org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:432)
at
org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309)
at
org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
at
org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:402)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:279)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:732)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2262)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
at sun.nio.ch.IOUtil.write(IOUtil.java:40)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:116)
at
org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:93)
at
org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:156)
at
org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:460)
at
org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:804)
at
org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:644)
at
org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:46)
at
org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:829)
at
org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
at
org.apache.coyote.http11.InternalNioOutputBuffer.doWrite(InternalNioOutputBuffer.java:610)
at org.apache.coyote.Response.doWrite(Response.java:560)
at
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
... 25 more

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Broken-pipe-error-tp3992667.html
Sent from the Solr - User mailing list

Re: Grouping performance problem

2012-07-16 Thread alxsss
What is the RAM of your server and the size of the data folder?



-Original Message-
From: Agnieszka Kukałowicz 
To: solr-user 
Sent: Mon, Jul 16, 2012 6:16 am
Subject: Re: Grouping performance problem


Hi Pavel,

I tried with group.ngroups=false but didn't notice a big improvement.
The times were still about 4000 ms. It doesn't solve my problem.
Maybe this is because of my index type. I have millions of documents but
only about 20 000 groups.

 Cheers
 Agnieszka

2012/7/16 Pavel Goncharik 

> Hi Agnieszka ,
>
> if you don't need number of groups, you can try leaving out
> group.ngroups=true param.
> In this case Solr apparently skips calculating all groups and delivers
> results much faster.
> At least for our application the difference in performance
> with/without group.ngroups=true is significant (have to say, we use
> Solr 3.6).
>
> WBR,
> Pavel
>
> On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
>  wrote:
> > Hi,
> >
> > Is there any way to make grouping searches more efficient?
> >
> > My queries look like:
> >
> /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > For index with 3 mln documents query for all docs with group=true takes
> > almost 4000ms. Because queryResultCache is not used next queries take a
> > long time also.
> >
> > When I remove group=true and leave only faceting the query for all docs
> > takes much less time: the first time ~ 700ms and next runs only
> 200ms
> > because of queryResultCache being used.
> >
> > So with group=true the query is about 20 times slower than without it.
> > Is it possible or is there any way to improve performance with grouping?
> >
> > My application needs grouping feature and all of the queries use it but
> the
> > performance of them is too low for production use.
> >
> > I use Solr 4.x from trunk
> >
> > Agnieszka Kukalowicz
>

 


Re: Grouping performance problem

2012-07-16 Thread alxsss
This is strange. Our data folder is 24GB and java gets 2GB of RAM. We query 
with grouping, ngroups and highlighting, do not return all fields, and query 
time is mostly less than 1 sec; it rarely goes up to 2 sec. We use solr 3.6 and 
turned off all kinds of caching.
Maybe your problem is with caching and displaying all fields?

Hope this may help.

Alex.
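
(As an illustration of not returning all fields, a grouped query in that spirit might
look like the following - the field names are only an example:)

/select?q=apache&group=true&group.field=site&group.ngroups=true&fl=id,title,url&rows=10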



-Original Message-
From: Agnieszka Kukałowicz 
To: solr-user 
Sent: Mon, Jul 16, 2012 10:04 am
Subject: Re: Grouping performance problem


I have server with 24GB RAM. I have 4 shards on it, each of them with 4GB
RAM for java:
JAVA_OPTIONS="-server -Xms4096M -Xmx4096M"
The size is about 15GB for one shard (i use ssd disk for index data).

Agnieszka


2012/7/16 

> What are the RAM of your server and size of the data folder?
>
>
>
> -Original Message-
> From: Agnieszka Kukałowicz 
> To: solr-user 
> Sent: Mon, Jul 16, 2012 6:16 am
> Subject: Re: Grouping performance problem
>
>
> Hi Pavel,
>
> I tried with group.ngroups=false but didn't notice a big improvement.
> The times were still about 4000 ms. It doesn't solve my problem.
> Maybe this is because of my index type. I have millions of documents but
> only about 20 000 groups.
>
>  Cheers
>  Agnieszka
>
> 2012/7/16 Pavel Goncharik 
>
> > Hi Agnieszka ,
> >
> > if you don't need number of groups, you can try leaving out
> > group.ngroups=true param.
> > In this case Solr apparently skips calculating all groups and delivers
> > results much faster.
> > At least for our application the difference in performance
> > with/without group.ngroups=true is significant (have to say, we use
> > Solr 3.6).
> >
> > WBR,
> > Pavel
> >
> > On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
> >  wrote:
> > > Hi,
> > >
> > > Is the any way to make grouping searches more efficient?
> > >
> > > My queries look like:
> > >
> >
> /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> > >
> > > For index with 3 mln documents query for all docs with group=true takes
> > > almost 4000ms. Because queryResultCache is not used next queries take a
> > > long time also.
> > >
> > > When I remove group=true and leave only faceting the query for all docs
> > > takes much more less time: for first time ~ 700ms and next runs only
> > 200ms
> > > because of queryResultCache being used.
> > >
> > > So with group=true the query is about 20 time slower than without it.
> > > Is it possible or is there any way to improve performance with
> grouping?
> > >
> > > My application needs grouping feature and all of the queries use it but
> > the
> > > performance of them is to low for production use.
> > >
> > > I use Solr 4.x from trunk
> > >
> > > Agnieszka Kukalowicz
> >
>
>
>

 


how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-11 Thread alxsss
Hello,

I had Nutch-1.0 crawl, fetch and index a lot of files. Then I needed to
index a few files also. But I know the keywords for those files and their
locations. I need to add them manually. I took a look at two tutorials on the 
wiki, but did not find any info about this issue.
Is there a tutorial on a step-by-step procedure for manually adding data to a nutch 
index using solr?

Thanks in advance.
Alex.


Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-12 Thread alxsss

 Tried to add a new record using



 curl http://localhost:8983/solr/update -H "Content-Type: text/xml" 
--data-binary '

20090512170318
86937aaee8e748ac3007ed8b66477624
0.21189615
test.com
test test
 20090513003210909
 '

I get



status=0, QTime=71



and added records are not found in the search.

Any ideas what went wrong?


Thanks.
Alex.


 

-Original Message-
From: alx...@aim.com
To: solr-user@lucene.apache.org
Sent: Mon, 11 May 2009 12:14 pm
Subject: how to manually add data to indexes generated by nutch-1.0 using solr










Hello,

I had Nutch-1.0 to crawl fetch and index a lot of files. Then I needed to
index a few files also. But I know keywords for those files and their
locations. I need to add them manually. I took a look to two tutorials on the 
wiki, but did not find any info about this issue.
Is there a tutorial on, step by step procedure of adding data to nutch index 
using solr manually?

Thanks in advance.
Alex.



 



Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-13 Thread alxsss

 I forgot to say that when I do 

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" 
--data-binary ''


status=0, QTime=453



and a search for the added keywords gives 0 results. Does status 0 mean that the 
addition was successful?

Thanks.
Alex.


 


 

-Original Message-
From: Erik Hatcher 
To: solr-user@lucene.apache.org
Sent: Tue, 12 May 2009 6:48 pm
Subject: Re: how to manually add data to indexes generated by nutch-1.0 using 
solr









send a <commit/> request afterwards, or you can add ?commit=true to the /update 
request with the adds.

	Erik
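
(The field markup in the curl commands above was stripped by the archive. In general,
a manual add followed by an explicit commit has the shape below; the field names are
placeholders and must match fields that actually exist in the Nutch/Solr schema:)

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary \
'<add><doc>
  <field name="id">86937aaee8e748ac3007ed8b66477624</field>
  <field name="url">test.com</field>
  <field name="title">test test</field>
</doc></add>'

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit/>'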
