search on default field returns less documents

2012-08-08 Thread Shalom
Hi All
we have two fields, 'doc' and 'text'.

'text' is our default field:

<defaultSearchField>text</defaultSearchField>

and we copy the 'doc' field to the 'text' field:

<copyField source="doc" dest="text"/>
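
A minimal sketch of what the two field definitions might look like in schema.xml; the actual definitions were not included in the message, so the field type and attributes here are assumptions:

  <field name="doc"  type="text_general" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>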



When indexing 10 documents that have values with the same prefix in the 'doc'
field, for example ca067-XXX, and then searching for ca067 on the default
field, I get only 5 results.
When searching for ca067 on the 'doc' field I get the expected 10 results.

Does anyone have an idea what is wrong here?

Thank you












Re: search on default field returns less documents

2012-08-09 Thread Shalom
Jack, Thanks for your reply.

We are using solr 3.4.

We use the standard lucene query parser.

I added debugQuery=true; this is the result when searching for ca067 and
getting 5 documents:

rawquerystring: ca067
querystring: ca067
parsedquery: PhraseQuery(text:"ca 067")
parsedquery_toString: text:"ca 067"
0.1108914 = (MATCH) weight(text:"ca 067" in 75), product of:
  1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
  0.1108914 = fieldWeight(text:"ca 067" in 75), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.01953125 = fieldNorm(field=text, doc=75)

0.088713124 = (MATCH) weight(text:"ca 067" in 71), product of:
  1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
  0.088713124 = fieldWeight(text:"ca 067" in 71), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.015625 = fieldNorm(field=text, doc=71)

0.088713124 = (MATCH) weight(text:"ca 067" in 72), product of:
  1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
  0.088713124 = fieldWeight(text:"ca 067" in 72), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.015625 = fieldNorm(field=text, doc=72)

0.06653485 = (MATCH) weight(text:"ca 067" in 74), product of:
  1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
  0.06653485 = fieldWeight(text:"ca 067" in 74), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.01171875 = fieldNorm(field=text, doc=74)

0.0554457 = (MATCH) weight(text:"ca 067" in 73), product of:
  1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
  0.0554457 = fieldWeight(text:"ca 067" in 73), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.009765625 = fieldNorm(field=text, doc=73)



this is the result when searching doc:ca067 and getting 10 documents:

rawquerystring: doc:ca067
querystring: doc:ca067
parsedquery: PhraseQuery(doc:"ca 067")
parsedquery_toString: doc:"ca 067"
1.8805147 = (MATCH) weight(doc:"ca 067" in 71), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 71), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=71)

1.8805147 = (MATCH) weight(doc:"ca 067" in 72), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 72), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=72)

1.8805147 = (MATCH) weight(doc:"ca 067" in 73), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 73), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=73)

1.8805147 = (MATCH) weight(doc:"ca 067" in 74), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 74), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=74)

1.8805147 = (MATCH) weight(doc:"ca 067" in 75), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 75), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=75)

1.8805147 = (MATCH) weight(doc:"ca 067" in 76), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 76), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=76)

1.8805147 = (MATCH) weight(doc:"ca 067" in 77), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 77), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=77)

1.8805147 = (MATCH) weight(doc:"ca 067" in 78), product of:
  0.9994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
  1.8805149 = fieldWeight(doc:"ca 067" in 78), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=78)

1.8805147 = (MATCH) weight(doc:"ca 067" in 79), product of:
  0.9994 = queryWeight(doc

Re: search on default field returns less documents

2012-08-09 Thread Shalom
Thanks Jack.

our schema version is 1.3


we are using the official solr 3.4 release. Actually, we use maven to
download the solr war and artifacts:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr</artifactId>
  <version>3.4.0</version>
  <type>war</type>
</dependency>


No, I did not modify the schema at any time; all documents were indexed with
the same schema.


Yes, we have additional copyFields into the text field. Usually none of them
contain the same text as the document name; it's mostly owner information.

To make the picture clearer:
We are indexing text documents; every document has a db row, and the content
lives on disk.
We index the db with DataImportHandler. Among other columns we index the
document name, which is our 'doc' column, and another field 'docname', which
is the document display name, usually the same as 'doc'. We also index the
document content in a 'content' field (the content is indexed in the same
DataImportHandler process).
We copy 'doc' and 'content' into the 'text' field, plus some other fields,
usually owner information like email addresses etc. It may be that the content
contains the document name or parts of it.
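
A sketch of the copy rules just described; 'doc' and 'content' are named above, while the owner-information field name is hypothetical:

  <copyField source="doc" dest="text"/>
  <copyField source="content" dest="text"/>
  <copyField source="owner_email" dest="text"/>  <!-- hypothetical owner-information field -->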

Thank you





single node causing cluster-wide outage

2014-03-12 Thread Avishai Ish-Shalom
Hi all!

After upgrading to Solr 4.6.1 we encountered a situation where a cluster
outage was traced to a single misbehaving node; after restarting the node
the cluster immediately returned to normal operation.
The bad node had ~420 threads locked on FastLRUCache, and most
httpshardexecutor threads were waiting on apache commons http futures.

Has anyone encountered such a situation? What can we do to prevent
misbehaving nodes from bringing down the entire cluster?

Cheers,
Avishai


Re: single node causing cluster-wide outage

2014-03-13 Thread Avishai Ish-Shalom
A little more information: it seems the issue happens after we get an
OutOfMemory error on a facet query.


On Wed, Mar 12, 2014 at 11:06 PM, Avishai Ish-Shalom
wrote:

> Hi all!
>
> After upgrading to Solr 4.6.1 we encountered a situation where a cluster
> outage was traced to a single node misbehaving, after restarting the node
> the cluster immediately returned to normal operation.
> The bad node had ~420 threads locked on FastLRUCache and most
> httpshardexecutor threads were waiting on apache commons http futures.
>
> Has anyone encountered such a situation? what can we do to prevent
> misbehaving nodes from bringing down the entire cluster?
>
> Cheers,
> Avishai
>


Solr memory usage off-heap

2014-03-18 Thread Avishai Ish-Shalom
Hi,

My solr instances are configured with a 10GB heap (Xmx) but linux shows a
resident size of 16-20GB. Even with thread stacks and permgen taken into
account I'm still far off from these numbers. Could it be that JVM IO
buffers take up so much space? Does lucene use JNI/JNA memory allocations?


Re: Solr memory usage off-heap

2014-03-18 Thread Avishai Ish-Shalom
aha! mmap explains it. thank you.


On Tue, Mar 18, 2014 at 3:11 PM, Shawn Heisey  wrote:

> On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote:
> > My solr instances are configured with 10GB heap (Xmx) but linux shows
> > resident size of 16-20GB. even with thread stack and permgen taken into
> > account i'm still far off from these numbers. Could it be that jvm IO
> > buffers take so much space? does lucene use JNI/JNA memory allocations?
>
> Solr does not do anything off-heap.  There is a project called
> heliosearch underway that aims to use off-heap memory extensively with
> Solr.
>
> There IS some mis-reporting of memory usage, though.  See a screenshot
> that I just captured of top output, sorted by memory usage.  The java
> process at the top of the list is Solr, running under the included Jetty:
>
> https://www.dropbox.com/s/03a3pp510mrtixo/solr-ram-usage-wrong.png
>
> I have a 6GB heap and 52GB of index data on this server.  This makes the
> 62.2GB virtual memory size completely reasonable.  The claimed resident
> memory size is 20GB, though.  If you add that 20GB to the 49GB that is
> allocated to the OS disk cache and the 6GB that it says is free, that's
> 75GB.  I've only got 64GB of RAM on the box, so something is being
> reported wrong.
>
> If I take my 20GB resident size and subtract the 14GB shared size, that
> is closer to reality, and it makes the numbers fit into the actual
> amount of RAM that's on the machine.  I believe the misreporting is
> caused by the specific way that Java uses MMap when opening Lucene
> indexes.  This information comes from what I remember about a
> conversation I witnessed in #lucene or #lucene-dev, not from my own
> exploration.  I believe they said that the MMap methods which don't
> misreport memory usage would not do what Lucene requires.
>
> Thanks,
> Shawn
>
>


Re: Solr memory usage off-heap

2014-03-20 Thread Avishai Ish-Shalom
thanks!


On Tue, Mar 18, 2014 at 4:37 PM, Erick Erickson wrote:

> Avishai:
>
> It sounds like you already understand mmap. Even so you might be
> interested in this excellent writeup of MMapDirectory and Lucene by
> Uwe:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best,
> Erick
>
> On Tue, Mar 18, 2014 at 7:23 AM, Avishai Ish-Shalom
>  wrote:
> > aha! mmap explains it. thank you.
> >
> >
> > On Tue, Mar 18, 2014 at 3:11 PM, Shawn Heisey  wrote:
> >
> >> On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote:
> >> > My solr instances are configured with 10GB heap (Xmx) but linux shows
> >> > resident size of 16-20GB. even with thread stack and permgen taken
> into
> >> > account i'm still far off from these numbers. Could it be that jvm IO
> >> > buffers take so much space? does lucene use JNI/JNA memory
> allocations?
> >>
> >> Solr does not do anything off-heap.  There is a project called
> >> heliosearch underway that aims to use off-heap memory extensively with
> >> Solr.
> >>
> >> There IS some mis-reporting of memory usage, though.  See a screenshot
> >> that I just captured of top output, sorted by memory usage.  The java
> >> process at the top of the list is Solr, running under the included
> Jetty:
> >>
> >> https://www.dropbox.com/s/03a3pp510mrtixo/solr-ram-usage-wrong.png
> >>
> >> I have a 6GB heap and 52GB of index data on this server.  This makes the
> >> 62.2GB virtual memory size completely reasonable.  The claimed resident
> >> memory size is 20GB, though.  If you add that 20GB to the 49GB that is
> >> allocated to the OS disk cache and the 6GB that it says is free, that's
> >> 75GB.  I've only got 64GB of RAM on the box, so something is being
> >> reported wrong.
> >>
> >> If I take my 20GB resident size and subtract the 14GB shared size, that
> >> is closer to reality, and it makes the numbers fit into the actual
> >> amount of RAM that's on the machine.  I believe the misreporting is
> >> caused by the specific way that Java uses MMap when opening Lucene
> >> indexes.  This information comes from what I remember about a
> >> conversation I witnessed in #lucene or #lucene-dev, not from my own
> >> exploration.  I believe they said that the MMap methods which don't
> >> misreport memory usage would not do what Lucene requires.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


hung threads and CLOSE_WAIT sockets

2014-03-06 Thread Avishai Ish-Shalom
Hi,

We've had a strange mishap with a solr cloud cluster (version 4.5.1) where
we observed high search latency. The problem appeared to develop over
several hours until the point where the entire cluster stopped responding
properly.

After investigation we found that the number of threads (both solr and
jetty) gradually rose over several hours until it hit the maximum allowed,
at which point the cluster stopped responding properly. After restarting
several nodes the number of threads dropped and the cluster started
responding again.
We examined nodes that were not restarted and found a high number of
CLOSE_WAIT sockets held by the solr process; these sockets were using a
random local port and remote port 8983, meaning they were outgoing
connections. A thread dump did not show a large number of solr threads and
we were unable to determine which thread(s) were holding these sockets.

has anyone else encountered such a situation?

Regards,
Avishai


Re: hung threads and CLOSE_WAIT sockets

2014-03-07 Thread Avishai Ish-Shalom
SOLR-5216 ?


On Fri, Mar 7, 2014 at 12:13 AM, Mark Miller  wrote:

> It sounds like the distributed update deadlock issue.
>
> It's fixed in 4.6.1 and 4.7.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Mar 6, 2014, at 3:10 PM, Avishai Ish-Shalom 
> wrote:
>
> > Hi,
> >
> > We've had a strange mishap with a solr cloud cluster (version 4.5.1)
> where
> > we observed high search latency. The problem appears to develop over
> > several hours until such point where the entire cluster stopped
> responding
> > properly.
> >
> > After investigation we found that the number of threads (both solr and
> > jetty) gradually rose over several hours until it hit a the maximum
> allowed
> > at which point the cluster stopped responding properly. After restarting
> > several nodes the number of threads dropped and the cluster started
> > responding again.
> > We've examined nodes that were not restarted and found a high number of
> > CLOSE_WAIT sockets held by the solr process; these sockets were using a
> > random local port and 8983 remote port - meaning they were outgoing
> > connections. a thread dump did not show a large number of solr threads
> and
> > we were unable to determine which thread(s) is holding these sockets.
> >
> > has anyone else encountered such a situation?
> >
> > Regards,
> > Avishai
>
>


Large fields storage

2014-12-01 Thread Avishai Ish-Shalom
Hi all,

I have very large documents (as big as 1GB) which I'm indexing and planning
to store in Solr in order to use highlighting snippets. I am concerned
about possible performance issues with such large fields: does storing the
fields require additional RAM beyond what is required to index/fetch/search?
I'm assuming Solr reads only the required data by offset from the storage
and not the entire field. Am I correct in this assumption?

Does anyone on this list have experience to share with such large documents?

Thanks,
Avishai


Re: Large fields storage

2014-12-04 Thread Avishai Ish-Shalom
The use case is not for pdf or documents with images, but for very large text
documents. My question is: does storing the documents degrade performance
more than just indexing without storing? I will only return highlighted
text of limited length and probably never download the entire document.

On Tue, Dec 2, 2014 at 2:15 AM, Jack Krupansky 
wrote:

> In particular, if they are image-intensive, all the images go away. And
> the formatting as well.
>
> -- Jack Krupansky
>
> -Original Message- From: Ahmet Arslan
> Sent: Monday, December 1, 2014 6:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Large fields storage
>
>
> Hi Avi,
>
> I assume your documents are rich documents like pdf word, am I correct?
> When you extract textual content from them, their size will shrink.
>
> Ahmet
>
>
>
> On Tuesday, December 2, 2014 12:11 AM, Avishai Ish-Shalom <
> avis...@fewbytes.com> wrote:
> Hi all,
>
> I have very large documents (as big as 1GB) which i'm indexing and planning
> to store in Solr in order to use highlighting snippets. I am concerned
> about possible performance issues with such large fields - does storing the
> fields require additional RAM over what is required to index/fetch/search?
> I'm assuming Solr reads only the required data by offset from the storage
> and not the entire field. Am I correct in this assumption?
>
> Does anyone on this list has experience to share with such large documents?
>
> Thanks,
> Avishai
>


ReplicationHandler - SnapPull failed to download a file completely.

2013-10-30 Thread Shalom Ben-Zvi Kazaz
we are continuously getting this exception during replication from
master to slave. Our index size is 9.27 GB and we are trying to replicate
a slave from scratch.
It's a different file each time; sometimes we get to 60% replication
before it fails and sometimes only 10%. We have never managed a successful
replication.

30 Oct 2013 18:38:52,884 [explicit-fetchindex-cmd] ERROR
ReplicationHandler - SnapPull failed
:org.apache.solr.common.SolrException: Unable to download
_aa7_Lucene41_0.tim completely. Downloaded 0!=1054090
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1244)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1124)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

I read in some thread that there was a related bug in solr 4.1, but we
are using solr 4.3 and have tried 4.5.1 as well.
It seems that DirectoryFileFetcher sometimes cannot download a file;
the file is downloaded to the slave with size zero.
We are running in a test environment where bandwidth is high.

this is the master setup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
    <str name="commitReserveDuration">00:00:50</str>
  </lst>
</requestHandler>

and the slave setup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr-master.saltdev.sealdoc.com:8081/solr-master</str>
    <!-- two additional settings (values 15 and 30) whose element names did not survive -->
  </lst>
</requestHandler>



Re: ReplicationHandler - SnapPull failed to download a file completely.

2013-10-31 Thread Shalom Ben-Zvii Kazaz
Shawn, thank you for your answer.
For the purpose of testing this we have a test environment where we are not
indexing anymore. We also disabled the DIH delta import, so as I understand
it there shouldn't be any commits on the master.
I also tried with
<str name="commitReserveDuration">50:50:50</str>
and get the same failure.

I tried changing and increasing various parameters on the master and slave,
but no luck yet.
The master is functioning ok; we do have search results, so I assume there
is no index corruption on the master side.
Just to mention, we have done this many times before in the past few
years; this started just now, when we upgraded our solr from version 3.6 to
version 4.3 and reindexed all documents.

If we have no solution soon, and this is holding up an upgrade to our
production site and various customers, do you think we can copy the index
directory from the master to the slave and hope that future replication
will work?

Thank you again.

Shalom





On Wed, Oct 30, 2013 at 10:00 PM, Shawn Heisey  wrote:

> On 10/30/2013 1:49 PM, Shalom Ben-Zvi Kazaz wrote:
>
>> we are continuously getting this exception during replication from
>> master to slave. our index size is 9.27 G and we are trying to replicate
>> a slave from scratch.
>> Its a different file each time , sometimes we get to 60% replication
>> before it fails and sometimes only 10%, we never managed a successful
>> replication.
>>
>
> 
>
>
>  this is the master setup:
>>
>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>   <lst name="master">
>>     <str name="replicateAfter">commit</str>
>>     <str name="replicateAfter">startup</str>
>>     <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
>>     <str name="commitReserveDuration">00:00:50</str>
>>   </lst>
>> </requestHandler>
>>
>
> I assume that you're probably doing commits fairly often, resulting in a
> lot of merge activity that frequently deletes segments.  That
> "commitReserveDuration" parameter needs to be made larger.  I would imagine
> that it takes a lot more than 50 seconds to do the replication - even if
> you've got an extremely fast network, replicating 9.7GB probably takes
> several minutes.
>
> From the wiki page on replication:  "If your commits are very frequent and
> network is particularly slow, you can tweak an extra attribute <str
> name="commitReserveDuration">00:00:10</str>. This is roughly the time
> taken to download 5MB from master to slave. Default is 10 secs."
>
> http://wiki.apache.org/solr/SolrReplication#Master
>
> You've said that your network is not slow, but with that much data, all
> networks are slow.
>
> Thanks,
> Shawn
>
>


Re: [SOLVED] ReplicationHandler - SnapPull failed to download a file completely.

2013-10-31 Thread Shalom Ben-Zvii Kazaz
Removing directory before core close:
/opt/watchdox/solr-slave/data/index.20131031180837277
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG
CachingDirectoryFactory - Removing from cache:
CachedDir<>
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG
CachingDirectoryFactory - Releasing directory:
/opt/watchdox/solr-slave/data/index 1 false
31 Oct 2013 18:10:40,879 [explicit-fetchindex-cmd] ERROR ReplicationHandler
- SnapPull failed :org.apache.solr.common.SolrException: Unable to download
_aa7_Lucene41_0.pos completely. Downloaded 0!=1081710
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1212)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1092)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

31 Oct 2013 18:10:40,910 [http-bio-8080-exec-8] DEBUG
CachingDirectoryFactory - Reusing cached directory:
CachedDir<>




So I upgraded the httpcomponents jars to their latest 4.3.x versions and the
problem disappeared.
The httpcomponents jars, which are dependencies of solrj, were at 4.2.x
versions; I upgraded to httpclient-4.3.1, httpcore-4.3 and httpmime-4.3.1.
I have run the replication a few times now with no problem at all; it is now
working as expected.
It seems that the upgrade is necessary only on the slave side, but I'm going
to upgrade the master too.
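
If the jars are managed through Maven, the upgrade described above corresponds to dependencies along these lines (a sketch only; the actual build layout is an assumption):

  <!-- assumed Maven coordinates for the httpcomponents upgrade mentioned above -->
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpcore</artifactId>
    <version>4.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
    <version>4.3.1</version>
  </dependency>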


Thank you so much for your help.

Shalom








On Thu, Oct 31, 2013 at 6:46 PM, Shawn Heisey  wrote:

> On 10/31/2013 7:26 AM, Shalom Ben-Zvii Kazaz wrote:
> > Shawn, Thank you for your answer.
> > for the purpose of testing it we have a test environment where we are not
> > indexing anymore. We also disabled the DIH delta import. so as I
> understand
> > there shouldn't be any commits on the master.
> > I also tried with
> > 50:50:50
> > and get the same failure.
>
> If it's in an environment where there are no commits, that's really
> odd.  I would suspect underlying filesystem or network issues.  There's
> one problem that's not well known, but is very common - problems with
> NIC firmware, most commonly Broadcom NICs.  These problems result in
> things working correctly almost all the time, but when there is a high
> network load, things break in strange ways, and the resulting errors
> rarely look like they are network-related.
>
> Most embedded NICs are either Broadcom or Realtek, both of which are
> famous for their firmware problems.  Broadcom NICs are very common on
> Dell and HP servers.  Upgrading the firmware (which is not usually the
> same thing as upgrading the driver) is the only fix.  NICs from other
> manufacturers also have upgradable firmware, but don't usually have the
> same kind of high-profile problems as Broadcom.
>
> The NIC firmware might not have anything to do with this problem, but
> it's the only thing left that I can think of.  I personally haven't used
> replication since Solr 1.4.1, but a lot of people do.  I can't say that
> there's no bugs, but so far I'm not seeing the kind of problem reports
> that appear when a bug in a critical piece of the software exists.
>
> Thanks,
> Shawn
>
>


searching both english and japanese

2013-07-07 Thread Shalom Ben-Zvi Kazaz
Hi,
We have a customer that needs support for both english and japanese; a
document can be in either language and we have no indication of the
language of a document. So I know I can construct a schema with both
english and japanese fields and index into them with copyField. I also know
I can detect the language and index only the relevant fields, but I want
to support mixed-language documents, so I think I need to index into both
the english and japanese fields. We are using the standard request handler,
not dismax, and we want to keep using it, as our queries should be on
certain fields with no errors.
Queries are user entered and can be any valid query, like q=lexmark or
q=docname:lexmark AND content:printer. What I think I want is to
add the japanese fields to such a query and end up with "q=docname:lexmark
OR docname_ja:lexmark" or "q=(docname:lexmark AND content:printer) OR
(docname_ja:lexmark AND content_ja:printer)". Of course I cannot ask
the user to do that. Also, we have only one default field, and it must
be japanese or english but not both. I think the default field issue could
be solved by using dismax and specifying multiple default fields with qt,
but we don't use dismax.
We use solrj as our client, and it would be better if I could do
something on the client side rather than on the solr side.

Any help or ideas are appreciated.
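
A rough sketch of the kind of dual-field schema described above; the field names 'docname' and 'content' are taken from the message, while the type names and attributes are assumptions:

  <field name="docname"    type="text_en" indexed="true" stored="true"/>
  <field name="docname_ja" type="text_ja" indexed="true" stored="false"/>
  <field name="content"    type="text_en" indexed="true" stored="true"/>
  <field name="content_ja" type="text_ja" indexed="true" stored="false"/>

  <!-- copyField copies the original source value, so the _ja fields receive the raw text -->
  <copyField source="docname" dest="docname_ja"/>
  <copyField source="content" dest="content_ja"/>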


edismax behaviour with japanese

2013-07-11 Thread Shalom Ben-Zvi Kazaz
Hello,
I have 'text' and 'text_ja' fields, where 'text' uses an english analyzer and
'text_ja' a japanese analyzer; I index into both with copyField from other
fields.
I'm trying to search both fields using edismax and the qf parameter, but I
see strange behaviour from edismax. I wonder if someone can give me a
hint as to what's going on and what I am doing wrong?

When I run this query I can see that solr is searching both fields, but the
text_ja: query contains only a partial text while text: has the complete text:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このたびは

rawquerystring: このたびは
querystring: このたびは
parsedquery: (+DisjunctionMaxQuery((text_ja:たび | text:このたびは)))/no_coord
parsedquery_toString: +(text_ja:たび | text:このたびは)
QParser: ExtendedDismaxQParser



Now, if I remove the last two characters from the query string, solr will
not search text_ja at all, at least that's what I understand from the debug
output:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このた

rawquerystring: このた
querystring: このた
parsedquery: (+DisjunctionMaxQuery((text:このた)))/no_coord
parsedquery_toString: +(text:このた)
QParser: ExtendedDismaxQParser


With another string of japanese text, solr now splits the query into multiple
text_ja queries:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=システムをお買い求めいただき

rawquerystring: システムをお買い求めいただき
querystring: システムをお買い求めいただき
parsedquery: (+DisjunctionMaxQuery((((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)))/no_coord
parsedquery_toString: +(((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)
QParser: ExtendedDismaxQParser




Thank you.


filter result by numFound in Result Grouping

2013-05-09 Thread Shalom Ben-Zvi Kazaz
Hello list,
In one of our searches that uses Result Grouping we need to filter the
results to only those groups that have more than one document in the
group, or more specifically to groups that have two documents.
Is this possible in some way?
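
For context, the grouping request in question would look roughly like this (the grouped field name here is an assumption, not taken from the message):

  .../select?q=<query>&group=true&group.field=docname&group.ngroups=true&group.limit=2

Note that group.limit only controls how many documents are returned within each group; it does not filter out groups by size, which is the gap being asked about here.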

Thank you