RE: Solr 4.0 Optimize query very slow before the optimize end of a few minutes

2013-06-14 Thread Toke Eskildsen
On Fri, 2013-06-14 at 06:59 +0200, Jeffery Wang wrote:
> Time      queryTime(ms)  CPU %  r/s  w/s  rMB/s  wMB/s  IO %
> ...
> 7:30:52   16594          26     36   0    0.14   0      99.3
> 7:30:53   31             80     368  0    42.43  0      94.3
> 7:31:23   28575          41     35   21   0.37   2.36   95.9
> 7:32:22   53399          31     81   39   0.74   2.63   99.5 !!!
> 7:32:23   11             54     155  0    16.46  0      99.6
> 7:33:28   60199          28     30   2    0.12   0.01   99.8 !!

Having a single slow query is expected behaviour, as the reader will have
just opened the merged segment and its caches need to be filled. But I do
not know why you have more than one slow query. Do you use the same query
for each curl call?
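One way to soften the cold-searcher hit described above is a newSearcher
warming listener in solrconfig.xml; a minimal sketch (query and values
assumed, not taken from this thread):

<!-- warm each new searcher so the first user query after a merge
     does not pay the full cache-fill cost -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some_field:some_prefix*</str><str name="rows">1</str></lst>
  </arr>
</listener>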

- Toke Eskildsen, State and University Library, Denmark



RE: Solr 4.0 Optimize query very slow before the optimize end of a few minutes

2013-06-14 Thread Jeffery Wang
Yes, I used the same query URL for each curl call. It is very simple:
http://...q=OS01W:sina*&fl=SecId,OS01W&rows=1&wt=xml&indent=true


-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: 14 June 2013 16:20
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.0 Optimize query very slow before the optimize end of a few 
minutes

On Fri, 2013-06-14 at 06:59 +0200, Jeffery Wang wrote:
> Time      queryTime(ms)  CPU %  r/s  w/s  rMB/s  wMB/s  IO %
> ...
> 7:30:52   16594          26     36   0    0.14   0      99.3
> 7:30:53   31             80     368  0    42.43  0      94.3
> 7:31:23   28575          41     35   21   0.37   2.36   95.9
> 7:32:22   53399          31     81   39   0.74   2.63   99.5 !!!
> 7:32:23   11             54     155  0    16.46  0      99.6
> 7:33:28   60199          28     30   2    0.12   0.01   99.8 !!

Having a single slow query is expected behaviour, as the reader will have
just opened the merged segment and its caches need to be filled. But I do
not know why you have more than one slow query. Do you use the same query
for each curl call?

- Toke Eskildsen, State and University Library, Denmark



Solr Hangs on startup

2013-06-14 Thread Cool Techi
Hi,

We are using a solr4.3 cloud setup, but for some reason Solr fails to start
up. I see the following in the log file, and after this there are no
further logs:

org.apache.solr.search.SolrIndexSearcher  ? Opening Searcher@17586ed7 main
15293 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.core.SolrCore  ? 
[cmn_shard1_replica1] Registered new searcher Searcher@17586ed7 
main{StandardDirectoryReader(segments_zpt:1650158 _6nln(4.3):C2734441/433094 
_6rvw(4.3):C3395530/166626 _6vdg(4.3):C4039667/172929 _6z52(4.3):C4137543/2279 
_770z(4.3):C5879498/97346 _71rz(4.3):C4168660/440273 _74hn(4.3):C5900928/134106 
_7a5j(4.3):C5892645/269769 _7bcs(4.3):C5502048/2562430 
_7gnp(4.3):C2310243/676016 _7klj(4.3):C3753172/184399 _7nxn(4.3):C558455


A thread dump shows the following:

"coreLoadExecutor-3-thread-1" prio=10 tid=0x40b44800 nid=0x27ad 
runnable [0x7fbbff2d6000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.pread0(Native Method)
at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:199)
at sun.nio.ch.IOUtil.read(IOUtil.java:175)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612)
at 
org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:752)
at 
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at 
org.apache.solr.common.util.FastInputStream.peek(FastInputStream.java:61)
at 
org.apache.solr.update.TransactionLog$ReverseReader.next(TransactionLog.java:702)
at 
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:925)
at 
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
at 
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
at org.apache.solr.update.UpdateHandler.(UpdateHandler.java:137)
at org.apache.solr.update.UpdateHandler.(UpdateHandler.java:123)
at 
org.apache.solr.update.DirectUpdateHandler2.(DirectUpdateHandler2.java:95)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
at org.apache.solr.core.SolrCore.(SolrCore.java:805)
at org.apache.solr.core.SolrCore.(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


What could be causing this?

regards,
Ayush
  

Re: Debugging Solr XSL

2013-06-14 Thread Miguel

Hi

You can use an online XSL validator, for example:
http://xslttest.appspot.com/
but I think it is better to use an XSLT editor. I'm sure Visual Studio
includes one.

Regards.

On 13/06/2013 23:45, O. Olson wrote:

Hi,

I am attempting to transform the XML output of Solr to HTML using the
XsltResponseWriter (http://wiki.apache.org/solr/XsltResponseWriter).
This works, but I am wondering if there is a way for me to debug the XSL I
am writing. If there is any problem in the XSL, you simply get a stack
trace in the Solr output.

For example, when adding an HTML link tag to my XSL, I forgot the closing
slash, i.e. I typed ">" instead of "/>". I would just get a stack trace,
with nothing to tell me what I did wrong. Another time I had a template
match that was very specific. I expected it to take precedence over the
more general template. It did not, and I had no clue why. I ultimately put
in a priority to get my expected result.

I am new to XSL. Is there any free tool that would help me debug XSL in a
way that Solr would accept? I have Visual Studio (full version), which has
XSLT debugging, but I have not tried it yet. Would Solr accept as valid
what Visual Studio OKs?

I’m sorry I am new to this. I’d be grateful for any pointers.

Thank you,
O.O.









How spell checker used if indexed document is containing misspelled words

2013-06-14 Thread venkatesham.gu...@igate.com
My data is picked from social media sites, and misspelled words are very
frequent in social text because of the informal mode of communication.
The spellchecker does not work here because the misspelled words are
present in the text corpus, not in the search query. Finding documents
containing all the different misspelled forms of a given word is not
possible using the spellchecker. How should I go ahead with this?





Want to avoid setting the solr.xml in conf/Catalina/localhost

2013-06-14 Thread bsargurunathan
Hi All,

I want to avoid having to place solr.xml in the conf/Catalina/localhost
path for the Tomcat server on Windows.

Please suggest a sample configuration for doing that.



Thanks,
Guru





ngroups does not show correct number of groups when used in SolrCloud

2013-06-14 Thread Markus.Mirsberger

Hi,

I just noticed (after a long time testing and finally looking into the
docs :p) that the ngroups parameter does not show the correct number of
groups when used in anything other than a single-shard environment (in my
case SolrCloud).

Is there another way to get the number of all groups without iterating
through a lot of result sets?
I don't need the values of the groups; I just need the total number of
groups.

Or can this be done with facets, maybe?
I don't need to use grouping, but as far as I know I can't get the total
number of facets without iterating through the result sets.
So this seemed to me the only way to achieve something equal to a distinct
count in SQL.

Any ideas how this can be done with Solr?


Thanks,
Markus



Re: Solr Hangs on startup

2013-06-14 Thread Jack Krupansky
What are the last few lines of the Solr log? No errors, exceptions, or 
warnings?


-- Jack Krupansky

-Original Message- 
From: Cool Techi

Sent: Friday, June 14, 2013 4:49 AM
To: solr-user@lucene.apache.org
Subject: Solr Hangs on startup

Hi,

We are using a solr4.3 cloud setup, but for some reason Solr fails to start
up. I see the following in the log file, and after this there are no
further logs:


org.apache.solr.search.SolrIndexSearcher  ? Opening Searcher@17586ed7 main
15293 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.core.SolrCore  ? 
[cmn_shard1_replica1] Registered new searcher Searcher@17586ed7 
main{StandardDirectoryReader(segments_zpt:1650158 _6nln(4.3):C2734441/433094 
_6rvw(4.3):C3395530/166626 _6vdg(4.3):C4039667/172929 
_6z52(4.3):C4137543/2279 _770z(4.3):C5879498/97346 
_71rz(4.3):C4168660/440273 _74hn(4.3):C5900928/134106 
_7a5j(4.3):C5892645/269769 _7bcs(4.3):C5502048/2562430 
_7gnp(4.3):C2310243/676016 _7klj(4.3):C3753172/184399 _7nxn(4.3):C558455



A thread dump shows the following:

"coreLoadExecutor-3-thread-1" prio=10 tid=0x40b44800 nid=0x27ad 
runnable [0x7fbbff2d6000]

  java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.FileDispatcher.pread0(Native Method)
   at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)
   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:199)
   at sun.nio.ch.IOUtil.read(IOUtil.java:175)
   at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612)
   at 
org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:752)
   at 
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
   at 
org.apache.solr.common.util.FastInputStream.peek(FastInputStream.java:61)
   at 
org.apache.solr.update.TransactionLog$ReverseReader.next(TransactionLog.java:702)
   at 
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:925)
   at 
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
   at 
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)

   at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
   at 
org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
   at 
org.apache.solr.update.UpdateHandler.(UpdateHandler.java:137)
   at 
org.apache.solr.update.UpdateHandler.(UpdateHandler.java:123)
   at 
org.apache.solr.update.DirectUpdateHandler2.(DirectUpdateHandler2.java:95)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
   at 
org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)

   at org.apache.solr.core.SolrCore.(SolrCore.java:805)
   at org.apache.solr.core.SolrCore.(SolrCore.java:618)
   at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)

   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
   at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

   at java.lang.Thread.run(Thread.java:662)


What could be causing this?

regards,
Ayush




data consistency in solrcloud cluster deployed in aws

2013-06-14 Thread Luis Carlos Guerrero Covo
Hi,

I currently have a SolrCloud setup with a single shard and two nodes behind
a load balancer in AWS. I also have an additional node in the cluster which
is outside the load balancer (not receiving any client requests) importing
data into the cluster using data import handler. So that takes my cluster
to 3 nodes, 2 receiving user requests and the single data import node.

I'm experiencing several data replication issues that could be caused by
the irregular setup. The one node that is in the same availability zone as
the data import node (My two nodes are in two different aws availability
zones) is replicating correctly and is never far away from the import
node's generation number. The node that is in a different availability zone
is always lagging behind in terms of index replication. I'm mentioning
availability zones because I see that as the only thing that could be
causing this issue. Am I correct in assuming this? What further steps
could I take to verify why the index is not replicating fast enough to
all nodes?

thanks in advance for any help provided,

Luis Guerrero


Re: Solr using a ridiculous amount of memory

2013-06-14 Thread John Nielsen
Sorry for not getting back to the list sooner. It seems I finally solved
the memory problems by following Toke's advice to split the cores up into
smaller chunks.

After some major refactoring, our 15 cores have now turned into ~500 cores
and our memory consumption has dropped dramatically. Running 200 webshops
now actually uses less memory than our 24 test shops did before.

Thank you to everyone who helped, and especially to Toke.

I looked at the wiki, but could not find any reference to this unintuitive
way of using memory. Did I miss it somewhere?



On Fri, Apr 19, 2013 at 1:30 PM, Erick Erickson wrote:

> Hmmm. There has been quite a bit of work lately to support a couple of
> things that might be of interest (4.3, which Simon cut today, probably
> available to all mid next week at the latest). Basically, you can
> choose to pre-define all the cores in solr.xml (so-called "old style")
> _or_ use the new-style solr.xml which uses "auto-discover" mode to
> walk the indicated directory and find all the cores (indicated by the
> presence of a 'core.properties' file). Don't know if this would make
> your particular case easier, and I should warn you that this is
> relatively new code (although there are some reasonable unit tests).
>
> You also have the option to only load the cores when they are
> referenced, and only keep N cores open at a time (loadOnStartup and
> transient properties).
>
> See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
> http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond
>
> Note, the docs are somewhat sketchy, so if you try to go down this
> route let us know anything that should be improved (or you can be
> added to the list of wiki page contributors and help out!)
>
> Best
> Erick
>
> On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen  wrote:
> >> You are missing an essential part: Both the facet and the sort
> >> structures needs to hold one reference for each document
> >> _in_the_full_index_, even when the document does not have any values in
> >> the fields.
> >>
> >
> > Wow, thank you for this awesome explanation! This is where the penny
> > dropped for me.
> >
> > I will definitely move to a multi-core setup. It will take some time and
> a
> > lot of re-coding. As soon as I know the result, I will let you know!
> >
> >
> >
> >
> >
> >
> > --
> > Med venlig hilsen / Best regards
> >
> > *John Nielsen*
> > Programmer
> >
> >
> >
> > *MCB A/S*
> > Enghaven 15
> > DK-7500 Holstebro
> >
> > Kundeservice: +45 9610 2824
> > p...@mcb.dk
> > www.mcb.dk
>
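For reference, a minimal sketch of the new-style auto-discovery layout Erick
describes above (directory names and property values assumed):

# <solr_home>/cores/shop0001/core.properties -- the presence of this file
# marks the directory as a core in auto-discover mode
name=shop0001
loadOnStartup=false
transient=true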



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: ngroups does not show correct number of groups when used in SolrCloud

2013-06-14 Thread Shreejay
Hi Markus,  

For ngroups to work in a cloud environment you have to make sure that all docs 
belonging to a group reside on the same shard. Custom hashing has been 
introduced in the recent versions of solr cloud. You might want to look into 
that 
https://issues.apache.org/jira/browse/SOLR-2592

All queries on SolrCloud are run individually on each shard, and then the
results are merged. When you run a group query, SolrCloud runs the query on
each shard, and when the results are merged the ngroups from each shard are
added up. This is why ngroups is incorrect when using SolrCloud.
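A minimal sketch of the co-location idea with Solr 4.1+ compositeId routing
(field names assumed): prefix the unique key with the group value and "!",
so that every document of a group hashes to the same shard:

<add>
  <doc>
    <!-- the "groupA!" prefix routes all docs of this group to one shard -->
    <field name="id">groupA!doc1</field>
    <field name="group_field">groupA</field>
  </doc>
</add>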

-- 
Shreejay


On Friday, June 14, 2013 at 5:11, Markus.Mirsberger wrote:

> Hi,
> 
> I just noticed (after a long time testing and finally looking into the
> docs :p) that the ngroups parameter does not show the correct number of
> groups when used in anything other than a single-shard environment (in my
> case SolrCloud).
> 
> Is there another way to get the number of all groups without iterating
> through a lot of result sets?
> I don't need the values of the groups; I just need the total number of
> groups.
> 
> Or can this be done with facets, maybe?
> I don't need to use grouping, but as far as I know I can't get the total
> number of facets without iterating through the result sets.
> So this seemed to me the only way to achieve something equal to a distinct
> count in SQL.
> 
> Any ideas how this can be done with Solr?
> 
> 
> Thanks,
> Markus
> 
> 




Re: Solr using a ridiculous amount of memory

2013-06-14 Thread Toke Eskildsen
On Fri, 2013-06-14 at 14:55 +0200, John Nielsen wrote:
> Sorry for not getting back to the list sooner.

Time not important, only feedback important (apologies to Fifth
Element).

> After some major refactoring, our 15 cores have now turned into ~500 cores
> and our memory consumption has dropped dramatically. Running 200 webshops
> now actually uses less memory than our 24 test shops did before.

That's great to hear. One core/shop also sounds like a cleaner setup.

> I looked at the wiki, but could not find any reference to this unintuitive
> way of using memory. Did I miss it somewhere?

I am not aware of a wikified explanation, but a section on "Why does
Solr use so much memory?" with some suggestions for changes to setup
would seem appropriate. You are not the first to have these kinds of
problems.


Thank you for closing the issue,
Toke Eskildsen



Re: How spell checker used if indexed document is containing misspelled words

2013-06-14 Thread Shreejay
Hi,  

Have you tried this? 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular

Of course this is assuming that your corpus has correct words occurring more 
frequently than incorrect ones!  
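A minimal example request (host, core, and field name assumed):

http://localhost:8983/solr/collection1/select?q=text:informl&spellcheck=true&spellcheck.onlyMorePopular=true&spellcheck.count=5

With onlyMorePopular=true the component only returns suggestions that occur
more frequently in the index than the term queried, which is why the
"correct words are more frequent" assumption matters.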

-- 
Shreejay


On Friday, June 14, 2013 at 2:49, venkatesham.gu...@igate.com wrote:

> My data is picked from social media sites, and misspelled words are very
> frequent in social text because of the informal mode of communication.
> The spellchecker does not work here because the misspelled words are
> present in the text corpus, not in the search query. Finding documents
> containing all the different misspelled forms of a given word is not
> possible using the spellchecker. How should I go ahead with this?
> 
> 
> 
> 
> 




Re: The 'threads' parameter in DIH - SOLR 4.3.0

2013-06-14 Thread Mikhail Khludnev
Hello,

Most of the time users end up coding a multithreaded SolrJ indexer, which I
consider a sad thing. As the contributor of the 3.x fix, I want to share my
vision of the problem. While I did that work, I realized that the join
operation itself is too hard, and perhaps impossible, to make concurrent. I
propose to add concurrency to the outbound and inbound streams instead.

My plan is:
1. Add threads to the outbound flow:
https://issues.apache.org/jira/browse/SOLR-3585 -- it lets DIH avoid
waiting for Solr. I mostly like that code, but recently I realized that it
implements the ConcurrentUpdateSolrServer algorithm; looking forward, I
would prefer to unify some of the core concurrent code between them, or use
something like CUSS inside DIH's SolrWriter.
2. The next problem we've faced is SQLEntityProcessor. It has two modes:
one of them gets miserable performance due to the N+1 problem, and the
cached version is not production-capable with the default heap cache. Our
proposal for it is https://issues.apache.org/jira/browse/SOLR-4799;
unfortunately I have no time to polish the patch.
3. After that, the only thing DIH waits for is JDBC. That can easily be
sped up by implementing a DataSource wrapper with a producer thread and a
bounded queue as a buffer.

If we complete this plan, we will never need to code SolrJ indexers again.

My particular question to you is: what do you need to speed up?

On Thu, Jun 13, 2013 at 11:01 PM, Shawn Heisey  wrote:

> On 6/13/2013 12:08 PM, bbarani wrote:
>
>> I see that the threads parameter has been removed from DIH from all
>> version
>> starting SOLR 4.x. Can someone let me know the best way to initiate
>> indexing
>> in multi threaded mode when using DIH now? Is there a way to do that?
>>
>
> That parameter was removed because it didn't work right, and there was no
> apparent way to fix it.  The change that went into a later 3.6 version was
> a bandaid, not a fix.  I don't know all the details.
>
> There's no way to get multithreading with DIH directly, but you can do it
> indirectly:
>
> Create multiple request handlers with different names, such as
> /dataimport1, /dataimport2, etc.  Configure each handler with settings that
> will pull part of your data source.  Start them so they run concurrently.
>
> Depending on your environment, it may be easier to just write a
> multi-threaded indexing application using the Solr API for your language of
> choice.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Solr 3.5 Optimization takes index file size almost double

2013-06-14 Thread Pravin Bhutada
Hi Viresh,

How much free disk space do you have? If you don't have enough space on
disk, the optimization process stops and rolls back to some intermediate
state.


Pravin




On Fri, Jun 14, 2013 at 2:50 AM, Viresh Modi  wrote:

> Hi Rafal
>
> Here I attached a snapshot of the Solr index files as well.
> Can you look into this? If any other information is required regarding
> it, let me know.
>
>
> Thanks&  Regards,
> Viresh modi
> Mobile: 91 (0) 9714567430
>
>
> On 13 June 2013 17:41, Rafał Kuć  wrote:
>
>> Hello!
>>
>> Do you have some backup after commit in your configuration? It would
>> also be good to see how your index directory looks like, can you list
>> that ?
>>
>> --
>> Regards,
>>  Rafał Kuć
>>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>>
>> > Thanks Rafal for reply...
>>
>> > I agree with you. But actually, after the optimization it does not
>> > reduce in size; it remains double. So is there anything we missed, or
>> > need to do, to achieve the index size reduction?
>>
>> > Is there any special setting we need to configure for replication?
>>
>>
>>
>>
>> > On 13 June 2013 16:53, Rafał Kuć  wrote:
>>
>> >> Hello!
>> >>
>> >> The optimize command needs to rewrite the segments, so while it is
>> >> still working you may see the index size double. However, after it
>> >> is finished the index size will usually be lower than it was before
>> >> the optimize.
>> >>
>> >> --
>> >> Regards,
>> >>  Rafał Kuć
>> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>> >>
>> >> > Hi,
>> >> > I have a Solr 1.4.1 server with an index file size of 428GB. When I
>> >> > upgrade from Solr 1.4.1 to Solr 3.5.0 by the replication method, the
>> >> > size remains the same. But when I optimize the index on the Solr
>> >> > 3.5.0 instance, its size reaches 791GB. What is the solution for
>> >> > keeping the size the same or smaller?
>> >> > I optimize Solr 3.5 with the query:
>> >> > /update?optimize=true&commit=true
>> >>
>> >> > Thanks & regards
>> >> > Viresh Modi
>> >>
>> >>
>>
>>
>
> --
> This email and its attachments are intended for the above named only and
> may be confidential. If they have come to you in error you must take no
> action based on them, nor must you copy or show them to anyone; please
> reply to this email and highlight the error.
>


Re: Solr 3.5 Optimization takes index file size almost double

2013-06-14 Thread Viresh Modi
Hi pravin

I have nearly 2 TB of disk space available for the optimization, and after
the optimization I get a response with a QTime of nearly 7 hours (obviously
reported in milliseconds). So I don't think it is a disk space issue.


Thanks&  Regards,
Viresh modi
Mobile: 91 (0) 9714567430


On 14 June 2013 20:10, Pravin Bhutada  wrote:

> Hi Viresh,
>
> How much free disk space do you have? If you don't have enough space on
> disk, the optimization process stops and rolls back to some intermediate
> state.
>
>
> Pravin
>
>
>
>
> On Fri, Jun 14, 2013 at 2:50 AM, Viresh Modi <
> viresh.m...@highqsolutions.com
> > wrote:
>
> > Hi Rafal
> >
> > Here I attached a snapshot of the Solr index files as well.
> > Can you look into this? If any other information is required regarding
> > it, let me know.
> >
> >
> > Thanks&  Regards,
> > Viresh modi
> > Mobile: 91 (0) 9714567430
> >
> >
> > On 13 June 2013 17:41, Rafał Kuć  wrote:
> >
> >> Hello!
> >>
> >> Do you have some backup after commit in your configuration? It would
> >> also be good to see how your index directory looks like, can you list
> >> that ?
> >>
> >> --
> >> Regards,
> >>  Rafał Kuć
> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
> >>
> >> > Thanks Rafal for reply...
> >>
> >> > I agree with you. But actually, after the optimization it does not
> >> > reduce in size; it remains double. So is there anything we missed,
> >> > or need to do, to achieve the index size reduction?
> >>
> >> > Is there any special setting we need to configure for replication?
> >>
> >>
> >>
> >>
> >> > On 13 June 2013 16:53, Rafał Kuć  wrote:
> >>
> >> >> Hello!
> >> >>
> >> >> The optimize command needs to rewrite the segments, so while it is
> >> >> still working you may see the index size double. However, after it
> >> >> is finished the index size will usually be lower than it was before
> >> >> the optimize.
> >> >>
> >> >> --
> >> >> Regards,
> >> >>  Rafał Kuć
> >> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
> >> >>
> >> >> > Hi,
> >> >> > I have a Solr 1.4.1 server with an index file size of 428GB. When
> >> >> > I upgrade from Solr 1.4.1 to Solr 3.5.0 by the replication method,
> >> >> > the size remains the same. But when I optimize the index on the
> >> >> > Solr 3.5.0 instance, its size reaches 791GB. What is the solution
> >> >> > for keeping the size the same or smaller?
> >> >> > I optimize Solr 3.5 with the query:
> >> >> > /update?optimize=true&commit=true
> >> >>
> >> >> > Thanks & regards
> >> >> > Viresh Modi
> >> >>
> >> >>
> >>
> >>
> >
> > --
> > This email and its attachments are intended for the above named only and
> > may be confidential. If they have come to you in error you must take no
> > action based on them, nor must you copy or show them to anyone; please
> > reply to this email and highlight the error.
> >
>

-- 

--
This email and its attachments are intended for the above named only and 
may be confidential. If they have come to you in error you must take no 
action based on them, nor must you copy or show them to anyone; please 
reply to this email and highlight the error.


Re: Solr 3.5 Optimization takes index file size almost double

2013-06-14 Thread Pravin Bhutada
One thing you can try is optimizing incrementally. Instead of optimizing
down to 1 segment, optimize to 100 segments, then 50, 25, 10, 5, 2, 1.
After each step the index size should go down, and this way you don't have
to wait 7 hours to get some results.
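A sketch of that stepwise optimize using the maxSegments parameter of the
optimize command (host and core name assumed):

# each call merges down to at most N segments; repeat with a smaller N
curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=100&commit=true'
curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=50&commit=true'
# ...continue with 25, 10, 5, 2, and finally 1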


Pravin


On Fri, Jun 14, 2013 at 10:45 AM, Viresh Modi <
viresh.m...@highqsolutions.com> wrote:

> Hi pravin
>
> I have nearly 2 TB of disk space available for the optimization, and after
> the optimization I get a response with a QTime of nearly 7 hours (obviously
> reported in milliseconds). So I don't think it is a disk space issue.
>
>
> Thanks&  Regards,
> Viresh modi
> Mobile: 91 (0) 9714567430
>
>
> On 14 June 2013 20:10, Pravin Bhutada  wrote:
>
> > Hi Viresh,
> >
> > How much free disk space do you have? If you don't have enough space on
> > disk, the optimization process stops and rolls back to some intermediate
> > state.
> >
> >
> > Pravin
> >
> >
> >
> >
> > On Fri, Jun 14, 2013 at 2:50 AM, Viresh Modi <
> > viresh.m...@highqsolutions.com
> > > wrote:
> >
> > > Hi Rafal
> > >
> > > Here I attached a snapshot of the Solr index files as well.
> > > Can you look into this? If any other information is required
> > > regarding it, let me know.
> > >
> > >
> > > Thanks&  Regards,
> > > Viresh modi
> > > Mobile: 91 (0) 9714567430
> > >
> > >
> > > On 13 June 2013 17:41, Rafał Kuć  wrote:
> > >
> > >> Hello!
> > >>
> > >> Do you have some backup after commit in your configuration? It would
> > >> also be good to see how your index directory looks like, can you list
> > >> that ?
> > >>
> > >> --
> > >> Regards,
> > >>  Rafał Kuć
> > >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
> > >>
> > >> > Thanks Rafal for reply...
> > >>
> > >> > I agree with you. But actually, after the optimization it does not
> > >> > reduce in size; it remains double. So is there anything we missed,
> > >> > or need to do, to achieve the index size reduction?
> > >>
> > >> > Is there any special setting we need to configure for replication?
> > >>
> > >>
> > >>
> > >>
> > >> > On 13 June 2013 16:53, Rafał Kuć  wrote:
> > >>
> > >> >> Hello!
> > >> >>
> > >> >> The optimize command needs to rewrite the segments, so while it is
> > >> >> still working you may see the index size double. However, after it
> > >> >> is finished the index size will usually be lower than it was before
> > >> >> the optimize.
> > >> >>
> > >> >> --
> > >> >> Regards,
> > >> >>  Rafał Kuć
> > >> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
> > >> >>
> > >> >> > Hi,
> > >> >> > I have a Solr 1.4.1 server with an index file size of 428GB.
> > >> >> > When I upgrade from Solr 1.4.1 to Solr 3.5.0 by the replication
> > >> >> > method, the size remains the same. But when I optimize the index
> > >> >> > on the Solr 3.5.0 instance, its size reaches 791GB. What is the
> > >> >> > solution for keeping the size the same or smaller?
> > >> >> > I optimize Solr 3.5 with the query:
> > >> >> > /update?optimize=true&commit=true
> > >> >>
> > >> >> > Thanks & regards
> > >> >> > Viresh Modi
> > >> >>
> > >> >>
> > >>
> > >>
> > >
> > > --
> > > This email and its attachments are intended for the above named only
> and
> > > may be confidential. If they have come to you in error you must take no
> > > action based on them, nor must you copy or show them to anyone; please
> > > reply to this email and highlight the error.
> > >
> >
>
> --
>
> --
> This email and its attachments are intended for the above named only and
> may be confidential. If they have come to you in error you must take no
> action based on them, nor must you copy or show them to anyone; please
> reply to this email and highlight the error.
>


Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-14 Thread Snubbel
Hello,

I am upgrading from Solr 4.0 to 4.3, and a test case that worked fine
before is failing since.

I commit 1 documents to Solr, then reload them and add a value to a
multi-valued field with an atomic update.
I commit every 50 documents, so it is not that many at once, because the
multi-valued field already contains many values.
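For reference, a minimal SolrJ sketch of such an atomic "add" update (URL
and field names assumed, not taken from the test case):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicAdd {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // "add" appends to a multi-valued field; "set" would replace it
        Map<String, Object> op = new HashMap<String, Object>();
        op.put("add", "newValue");
        doc.addField("tags", op);
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}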

And at some point, I get this exception:

java.net.SocketException: No buffer space available(maximum connections
reached?): connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:640)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)

I did use the solrconfig and schema from the 4.3 example, just added some
custom fields.

I did not have any config for maximum connections in my solrconfig for 4.0.
Did anyone have that behavior too?

What do I have to configure to fix this?

Best regards, 

Snubbel





Re: Atomic Update Configurations how to?

2013-06-14 Thread Snubbel
Thanks,

I started out with the original solrconfig and schema now, and it works.
I just need to put back everything we are missing and figure out what the
problem really was.

Best regards

Snubbel





Replicas and soft commit

2013-06-14 Thread Giovanni Bricconi
I have recently upgraded our application from solr 3.6 to solr 4.2.1, and I
have just started learning about soft commits and partial updates.

Currently I have one indexing node and 3 replicas of the same core, and
every modification goes through a dih delta index. This is usually ok but I
have some special cases where updates should be made visible very quickly.

As I have seen in my first tests, it is possible to send partial updates
and soft commits to each replica and to the indexer, and when the indexer
gets a hard commit every replica is realigned.

Is this the right approach or am I misunderstanding how to use this
feature?

I don't see soft commits propagating to the replicas when sending updates
to the indexer only: is this expected, or have I missed some configuration
changes when porting the application to Solr 4?
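For reference, a sketch of the solrconfig.xml commit settings involved
(times assumed):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes to stable storage; replication happens on this -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: makes documents visible quickly on the node that indexed them -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>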

Giovanni


Re: ngroups does not show correct number of groups when used in SolrCloud

2013-06-14 Thread Markus.Mirsberger

Hi Shreejay,

Thanks for the info.
I read about this too, but as far as I understand it, this feature is not
really useful in my case.

It means I would have to reindex my documents just to get the grouping that
I need now. It would be OK to do that once, but I would have to do it again
when I want to group on another field in a few weeks.



Thanks,
Markus


On 14.06.2013 20:28, Shreejay wrote:

Hi Markus,

For ngroups to work in a cloud environment you have to make sure that all docs 
belonging to a group reside on the same shard. Custom hashing has been 
introduced in the recent versions of solr cloud. You might want to look into 
that
https://issues.apache.org/jira/browse/SOLR-2592

All queries on SolrCloud are run individually on each shard, and then the
results are merged. When you run a group query, SolrCloud runs the query on
each shard, and when the results are merged the ngroups from each shard are
added up. This is why ngroups is incorrect when using SolrCloud.





Re: Suggest and Filtering

2013-06-14 Thread Brendan Grainger
Hi Otis,

Sorry, I was a bit tired when I wrote that. I think what I'd like is to be
able to spellcheck the suggestions. For example, if a user types in "brayk"
(as opposed to "brake"), I'd still like to get the following suggestions:

brake line
brake condition

Does that make sense?

Thanks
Brendan



On Thu, Jun 13, 2013 at 8:53 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> I think you are talking about wanting instant search?
>
> See https://github.com/fergiemcdowall/solrstrap
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Jun 13, 2013 at 7:43 PM, Brendan Grainger
>  wrote:
> > Hi Solr Guru's
> >
> > I am trying to implement auto suggest where solr would suggest several
> > phrases that would return results as the user types in a query (as
> distinct
> > from autocomplete). e.g. say the user starts typing 'br' and we have
> > documents that contain "brake pads" and "left disc brake", solr would
> > suggest both of those phrases with "brake pads" first. I also want to
> only
> > look at documents that match a given filter query. So say I have a bunch
> of
> > documents for a toyota cressida that contain the bi-gram "brake pads",
> > while the documents for a honda accord don't have any brake pad articles.
> > If the user is filtering on the honda accord I wouldn't want "brake pads"
> > as a suggestion.
> >
> > Right now, I've played with the suggest component and using faceting.
> >
> > Any thoughts?
> >
> > Thanks
> > Brendan
> >
> > --
> > Brendan Grainger
> > www.kuripai.com
>



-- 
Brendan Grainger
www.kuripai.com


Re: The 'threads' parameter in DIH - SOLR 4.3.0

2013-06-14 Thread Java One
Hello,
 
I'm more than happy to contribute to this effort as well.

We are still on Solr 3.5 and never got the Solr 'threads' parameter working
properly. I've heard much of this was fixed in 3.6, but it remained a bit
buggy and was deprecated in later versions. Full support in 4.x is a major
wish-list item, and one I would be willing to sacrifice a few weekends of
my own to help write.
 
Nevertheless, and related to this, it is interesting to see this topic
open, because I just started experimenting with simulating a multi-threaded
import via DIH using the following approach, and it appears to be working
fine, with one caveat.

Here are my steps:

I created a series of similar entities, partitioning the data I'm targeting
by a logical range (i.e. WHERE somefield BETWEEN 'SOME VALUE' AND 'SOME
VALUE'). I have a few of these, but depending on your data you'll need to
experiment. (Be careful not to bring your database to its knees!)

Within solrconfig.xml, I created a corresponding data import handler for
each of these entities.

When I initiate an import, I call each handler, similar to the below
(obviously I've stripped out my server and naming conventions):
 
http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&commit=true
 
http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]&commit=true
 
 
...etc
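A sketch of the corresponding solrconfig.xml handler definitions (handler
and config file names assumed):

<requestHandler name="/dataimport1"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- data-config whose WHERE ... BETWEEN clause covers partition 1 -->
    <str name="config">data-config-part1.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-part2.xml</str>
  </lst>
</requestHandler>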
 
The import seems to run fine, with SIGNIFICANT performance gains. The only
caveat I haven't figured out yet is that one of the threads doesn't appear
to commit its data (although it states that it did). I have turned off
auto-commit as well, but I'm still playing with it.

Either way, an out-of-the-box solution is much preferred.
 
Cheers!
Mike
  

From: Mikhail Khludnev 
To: solr-user  
Sent: Friday, June 14, 2013 9:15 AM
Subject: Re: The 'threads' parameter in DIH - SOLR 4.3.0


Hello,

Most of the time users end up coding a multithreaded SolrJ indexer, which I
consider a sad thing. As the contributor of the 3.x fix, I want to share my
vision of the problem. While I did that work, I realized that the join
operation itself is too hard, and perhaps impossible, to make concurrent. I
propose to add concurrency to the outbound and inbound streams instead.

My plan is:
1. Add threads to the outbound flow:
https://issues.apache.org/jira/browse/SOLR-3585 -- it lets DIH avoid
waiting for Solr. I mostly like that code, but recently I realized that it
implements the ConcurrentUpdateSolrServer algorithm; looking forward, I
would prefer to unify some of the core concurrent code between them, or use
something like CUSS inside DIH's SolrWriter.
2. The next problem we've faced is SQLEntityProcessor. It has two modes:
one of them gets miserable performance due to the N+1 problem, and the
cached version is not production-capable with the default heap cache. Our
proposal for it is https://issues.apache.org/jira/browse/SOLR-4799;
unfortunately I have no time to polish the patch.
3. After that, the only thing DIH waits for is JDBC. That can easily be
sped up by implementing a DataSource wrapper with a producer thread and a
bounded queue as a buffer.

If we complete this plan, we will never need to code SolrJ indexers again.

My particular question to you is: what do you need to speed up?

On Thu, Jun 13, 2013 at 11:01 PM, Shawn Heisey  wrote:

> On 6/13/2013 12:08 PM, bbarani wrote:
>
>> I see that the threads parameter has been removed from DIH from all
>> version
>> starting SOLR 4.x. Can someone let me know the best way to initiate
>> indexing
>> in multi threaded mode when using DIH now? Is there a way to do that?
>>
>
> That parameter was removed because it didn't work right, and there was no
> apparent way to fix it.  The change that went into a later 3.6 version was
> a bandaid, not a fix.  I don't know all the details.
>
> There's no way to get multithreading with DIH directly, but you can do it
> indirectly:
>
> Create multiple request handlers with different names, such as
> /dataimport1, /dataimport2, etc.  Configure each handler with settings that
> will pull part of your data source.  Start them so they run concurrently.
>
> Depending on your environment, it may be easier to just write a
> multi-threaded indexing application using the Solr API for your language of
> choice.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics




Re: Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-14 Thread Shawn Heisey

On 6/14/2013 8:57 AM, Snubbel wrote:

Hello,

I am upgrading from Solr 4.0 to 4.3, and a test case that worked fine
before is failing since.

I commit 1 documents to Solr, then reload them and add a value to a
multi-valued field with an atomic update.
I commit every 50 documents, so it is not that many at once, because the
multi-valued field already contains many values.

And at some point, I get this exception:

java.net.SocketException: No buffer space available(maximum connections
reached?): connect


Looks like a client-side problem, either not enough java heap or you are 
running out of connections because you're using a lot of connections at 
once.  This is happening on the client side, not the server side. That 
may be an indication that you are doing something not quite right, but 
if you actually do intend to create a lot of connections and you are 
using HttpSolrServer, use code similar to this to bump up the max 
connections:


ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 1000);
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 200);
HttpClient client = HttpClientUtil.createClient(params);
String url = "http://localhost:8983/solr/collection1";
SolrServer server = new HttpSolrServer(url, client);

Thanks,
Shawn



Re: Solr Server Add causes java.net.SocketException: No buffer space available

2013-06-14 Thread Travis Low
If it's a windows box, then you may be experiencing a kernel sockets leak
problem.

http://support.microsoft.com/kb/2577795


On Fri, Jun 14, 2013 at 1:20 PM, Shawn Heisey  wrote:

> On 6/14/2013 8:57 AM, Snubbel wrote:
>
>> Hello,
>>
>> I am upgrading from Solr 4.0 to 4.3, and a test case that worked fine
>> before is failing since.
>>
>> I commit 1 documents to Solr, then reload them and add a value to a
>> multi-valued field with an atomic update.
>> I commit every 50 documents, so it is not that many at once, because the
>> multi-valued field already contains many values.
>>
>> And at some point, I get this exception:
>>
>> java.net.SocketException: No buffer space available(maximum connections
>> reached?): connect
>>
>
> Looks like a client-side problem, either not enough java heap or you are
> running out of connections because you're using a lot of connections at
> once.  This is happening on the client side, not the server side. That may
> be an indication that you are doing something not quite right, but if you
> actually do intend to create a lot of connections and you are using
> HttpSolrServer, use code similar to this to bump up the max connections:
>
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 1000);
> params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 200);
> HttpClient client = HttpClientUtil.createClient(params);
> String url = "http://localhost:8983/solr/collection1";
> SolrServer server = new HttpSolrServer(url, client);
>
> Thanks,
> Shawn
>
>


-- 

Travis Low, Director of Development

Centurion Research Solutions, LLC

14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151

703-956-6276 • 703-378-4474 (fax)

http://www.centurionresearch.com

The information contained in this email message is confidential and
protected from disclosure.  If you are not the intended recipient, any use
or dissemination of this communication, including attachments, is strictly
prohibited.  If you received this email message in error, please delete it
and immediately notify the sender.

This email message and any attachments have been scanned and are believed
to be free of malicious software and defects that might affect any computer
system in which they are received and opened. No responsibility is accepted
by Centurion Research Solutions, LLC for any loss or damage arising from
the content of this email.


SolrCloud excluding certain files in conf from zookeeper

2013-06-14 Thread Bill Au
When using SolrCloud, is it possible to exclude certain files in the conf
directory from being loaded into Zookeeper?

We keep our own Solr-related config files in the conf directory, and they
actually differ for each node. Right now the copy in ZooKeeper is
overriding the local copies.

Bill


Re: Debugging Solr XSL

2013-06-14 Thread O. Olson
Thank you Upayavira & Miguel. I decided to use Visual Studio, since I can
at least set breakpoints and do interactive debugging in the UI. I hope
Visual Studio treats XSL the same way Solr does, or else I will have
problems :-).
Thanks again,
O.O.






Re: SolrCloud excluding certain files in conf from zookeeper

2013-06-14 Thread Daniel Collins
We had something similar: we had backup copies of files that were getting
uploaded to ZK, and we didn't want them to be.

The moral I learned from that was that the files for ZK don't need to live
anywhere under the Solr deployment area; they can be in a totally separate
directory structure (in fact, once they are uploaded to ZK, you can delete
them from the filesystem). So I would suggest you put your "local"
configuration files under <solr_home>/<core>/conf, and keep your ZK-related
conf in a separate, parallel area outside <solr_home>.
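With the ZK configs in their own directory, the upload becomes an explicit
step, e.g. with the zkcli tool that ships with Solr 4.x (paths and names
assumed):

# only the named confdir is uploaded; per-node local files never touch ZK
./cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 \
    -confdir /path/to/zk-conf -confname myconf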


-Original Message- 
From: Bill Au

Sent: Friday, June 14, 2013 7:01 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud excluding certain files in conf from zookeeper

When using SolrCloud, is it possible to exclude certain files in the conf
directory from being loaded into Zookeeper?

We are keeping our own solr related config files in the conf directory that
is actually different for each node.  Right now the copy in Zookeeper is
overriding the local copy.

Bill 



retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
I have an index first built with Solr 1.4 and later upgraded to Solr 3.6.
It has 150 million documents, and all docs have a date field that is not
blank (verified by a Solr query).

I am using the following code snippet to retrieve the field values:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;

IndexReader input = IndexReader.open(indexDir);
int maxDoc = input.maxDoc();
for (int i = 0; i < maxDoc; i++) {
    Document d = input.document(i);
    System.out.println(d.get("date"));
}

However, about 100 million docs give null for d.get("date"), while the
other 50 million docs give the right values.

What could be wrong?

Ming-


Re: retrieve datefield value from document

2013-06-14 Thread Michael Della Bitta
Shot in the dark:

You're using Lucene to read the index. That's sort of circumventing all the
typing stuff that Solr does. Solr can deal with an index where some of the
segments are in one format (say 1.4) and others are in another (3.6). Maybe
they're being stored in a format in the newer (or older) segments that
doesn't work with raw retrieval of the values through Lucene in the same
way.

Maybe it's able to retrieve the "stored" value from the indexed
representation in one case rather than needing to store it.

I'd query your index using EmbeddedSolrServer instead and see if that
changes what you see.
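A minimal sketch of that check (assuming Solr 3.6 SolrJ; paths, core name,
and field name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.core.CoreContainer;

public class DateCheck {
    public static void main(String[] args) throws Exception {
        // point at the Solr home containing solr.xml and the core
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer cores = new CoreContainer.Initializer().initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(cores, "collection1");

        QueryResponse rsp =
            server.query(new SolrQuery("*:*").addField("date").setRows(10));
        for (SolrDocument d : rsp.getResults()) {
            // values come back typed through Solr, not as raw Lucene fields
            System.out.println(d.getFieldValue("date"));
        }
        cores.shutdown();
    }
}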


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang wrote:

> I have an index first built with solr1.4 and later upgraded to solr3.6,
> which has 150million documents, and all docs have a datefield which are not
> blank. (verified by solr query).
>
> I am using the following code snippet to retrieve
>
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.store.*;
> import org.apache.lucene.document.*;
>
> IndexReader input = IndexReader.open(indexDir);
> int maxDoc = input.maxDoc();
> for (int i = 0; i < maxDoc; i++) {
>     Document d = input.document(i);
>     System.out.println(d.get("date"));
> }
>
> However, about 100 million docs give null for d.get('date') and about other
> 50 million docs give the right values.
>
> What could be wrong?
>
> Ming-
>


Re: retrieve datefield value from document

2013-06-14 Thread Dmitry Kan
Maybe some documents were marked as deleted? Check with
IndexReader.isDeleted(i) before reading each document.


On Fri, Jun 14, 2013 at 11:25 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Shot in the dark:
>
> You're using Lucene to read the index. That's sort of circumventing all the
> typing stuff that Solr does. Solr can deal with an index where some of the
> segments are in one format (say 1.4) and others are in another (3.6). Maybe
> they're being stored in a format in the newer (or older) segments that
> doesn't work with raw retrieval of the values through Lucene in the same
> way.
>
> Maybe it's able to retrieve the "stored" value from the indexed
> representation in one case rather than needing to store it.
>
> I'd query your index using EmbeddedSolrServer instead and see if that
> changes what you see.
>
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
>
>
> On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang  >wrote:
>
> > I have an index first built with solr1.4 and later upgraded to solr3.6,
> > which has 150million documents, and all docs have a datefield which are
> not
> > blank. (verified by solr query).
> >
> > I am using the following code snippet to retrieve
> >
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.store.*;
> > import org.apache.lucene.document.*;
> >
> > IndexReader input = IndexReader.open(indexDir);
> > int maxDoc = input.maxDoc();
> > for (int i = 0; i < maxDoc; i++) {
> >     Document d = input.document(i);
> >     System.out.println(d.get("date"));
> > }
> >
> > However, about 100 million docs give null for d.get('date') and about
> other
> > 50 million docs give the right values.
> >
> > What could be wrong?
> >
> > Ming-
> >
>


RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-14 Thread Andy Brown
Bryan,

For specifics, I'll refer you back to my original email where I
specified all the fields/field types/handlers I use. Here's a general
overview. 
 
I really only have 3 fields that I index and search against: "name",
"description", and "content". All of which are just general text
(string) fields. I have a catch-all field called "text" that is only
used for querying. It's indexed but not stored. The "name",
"description", and "content" fields are copied into the "text" field. 
 
For partial word matching, I have 4 more fields: "name_par",
"description_par", "content_par", and "text_par". The "text_par" field
has the same relationship to the "*_par" fields as "text" does to the
others (only used for querying). Those partial word matching fields are
of type "text_general_partial" which I created. That field type is
analyzed different than the regular text field in that it goes through
an EdgeNGramFilterFactory with the minGramSize="2" and maxGramSize="7"
at index time. 
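For context, a sketch of what such a partial-match field type might look
like in schema.xml (the exact analyzer chain is assumed, not copied from
the actual config):

<fieldType name="text_general_partial" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index-time only: "brake" -> "br", "bra", "brak", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="7"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>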
 
I query against both "text" and "text_par" fields using edismax deftype
with my qf set to "text^2 text_par^1" to give full word matches a higher
score. This part returns back very fast as previously stated. It's when
I turn on highlighting that I take the huge performance hit. 
 
Again, I'm using the FastVectorHighlighting. The hl.fl is set to "name
name_par description description_par content content_par" so that it
returns highlights for full and partial word matches. All of those
fields have indexed, stored, termPositions, termVectors, and termOffsets
set to "true". 
 
It all seems redundant just to allow for partial word
matching/highlighting but I didn't know of a better way. Does anything
stand out to you that could be the culprit? Let me know if you need any
more clarification. 
 
Thanks! 
 
- Andy 

-Original Message-
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] 
Sent: Wednesday, May 29, 2013 5:44 PM
To: solr-user@lucene.apache.org
Subject: RE: Slow Highlighter Performance Even Using
FastVectorHighlighter

Andy,

> I don't understand why it's taking 7 secs to return highlights. The
size
> of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set
to
> 1024 for this verification purpose and that should be more than
enough.
> The processor is plenty powerful enough as well.
>
> Running VisualVM shows all my CPU time being taken by mainly these 3
> methods:
>
>
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
> nfo.getStartOffset()
>
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
> nfo.getStartOffset()
>
org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap(
> )

That is a strange and interesting set of things to be spending most of
your CPU time on. The implication, I think, is that the number of term
matches in the document for terms in your query (or, at least, terms
matching exact words or the beginning of phrases in your query) is
extremely high. Perhaps that's coming from this "partial word match"
you
mention -- how does that work?

-- Bryan

> My guess is that this has something to do with how I'm handling
partial
> word matches/highlighting. I have setup another request handler that
> only searches the whole word fields and it returns in 850 ms with
> highlighting.
>
> Any ideas?
>
> - Andy
>
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> Sent: Monday, May 20, 2013 1:39 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Slow Highlighter Performance Even Using
> FastVectorHighlighter
>
> My guess is that the problem is those 200M documents.
> FastVectorHighlighter is fast at deciding whether a match, especially
a
> phrase, appears in a document, but it still starts out by walking the
> entire list of term vectors, and ends by breaking the document into
> candidate-snippet fragments, both processes that are proportional to
the
> length of the document.
>
> It's hard to do much about the first, but for the second you could
> choose
> to expose FastVectorHighlighter's FieldPhraseList representation, and
> return offsets to the caller rather than fragments, building up your
own
> snippets from a separate store of indexed files. This would also
permit
> you to set stored="false", improving your memory/core size ratio,
which
> I'm guessing could use some improving. It would require some work, and
> it
> would require you to store a representation of what was indexed
outside
> the Solr core, in some constant-bytes-to-character representation that
> you
> can use offsets with (e.g. UTF-16, or ASCII+entity references).
>
> However, you may not need to do this -- it may be that you just need
> more
> memory for your search machine. Not JVM memory, but memory that the
O/S
> can use as a file cache. What do you have now? That is, how much
memory
> do
> you have that is not used by the JVM or other apps, and how big is
your
> Solr core?
>
> One way to start getting a handle 

Re: data consistency in solrcloud cluster deployed in aws

2013-06-14 Thread Otis Gospodnetic
Yes, sounds like it's because of the second node being in a different
AZ.  In AWS, AZ really means a DC (Data Center), so the node that is
in a different AZ/DC is naturally going to replicate more slowly.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/





On Fri, Jun 14, 2013 at 8:50 AM, Luis Carlos Guerrero Covo
 wrote:
> Hi,
>
> I currently have solrcloud setup with single shards and two nodes behind a
> load balancer in aws. I also have an additional node in the cluster which
> is outside the load balancer (not receiving any client requests) importing
> data into the cluster using data import handler. So that takes my cluster
> to 3 nodes, 2 receiving user requests and the single data import node.
>
> I'm experiencing several data replication issues that could be caused by
> the irregular setup. The one node that is in the same availability zone as
> the data import node (My two nodes are in two different aws availability
> zones) is replicating correctly and is never far away from the import
> node's generation number. The node that is in a different availability zone
> is always lagging behind in terms of index replication. I'm mentioning
> availability zones because I see that as the only thing that could be
> causing this issue. Am I correct in assuming this? What are further steps
> that I could take to verify what could be the cause of the index not
> replicating fast enough to all nodes?
>
> thanks in advance for any help provided,
>
> Luis Guerrero


Re: data consistency in solrcloud cluster deployed in aws

2013-06-14 Thread Luis Carlos Guerrero Covo
Thank you for your reply, Otis. I found two open JIRA issues that may be
related:

https://issues.apache.org/jira/browse/SOLR-4924

https://issues.apache.org/jira/browse/SOLR-4260

We recently changed some settings to make commits happen on a more periodic
basis (every 5 minutes or 25,000 docs). Before, we ran the commits after every
import from DIH, so commits were more frequent and we were not
experiencing this issue. The thing is, I don't think this relates to
availability zones, since I see the generation number change on the lagging
replica every once in a while, but it updates to a version that is 50 or 60
generations behind the leader and the DIH node rather than a recent one.
If this were due to network latency, the versioning would only be a bit
behind.
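
If those periodic commits come from Solr's autoCommit (an assumption;
they could just as well be issued by the indexing client), the settings
described would look roughly like this in solrconfig.xml:

<autoCommit>
  <maxDocs>25000</maxDocs>
  <maxTime>300000</maxTime>  <!-- 5 minutes, in milliseconds -->
</autoCommit>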


On Fri, Jun 14, 2013 at 4:51 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Yes, sounds like it's because of the second node being in a different
> AZ.  In AWS, AZ really means a DC (Data Center), so the node that is
> in a different AZ/DC is naturally going to replicate more slowly.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
>
>
>
>
>
> On Fri, Jun 14, 2013 at 8:50 AM, Luis Carlos Guerrero Covo
>  wrote:
> > Hi,
> >
> > I currently have solrcloud setup with single shards and two nodes behind
> a
> > load balancer in aws. I also have an additional node in the cluster which
> > is outside the load balancer (not receiving any client requests)
> importing
> > data into the cluster using data import handler. So that takes my cluster
> > to 3 nodes, 2 receiving user requests and the single data import node.
> >
> > I'm experiencing several data replication issues that could be caused by
> > the irregular setup. The one node that is in the same availability zone
> as
> > the data import node (My two nodes are in two different aws availability
> > zones) is replicating correctly and is never far away from the import
> > node's generation number. The node that is in a different availability
> zone
> > is always lagging behind in terms of index replication. I'm mentioning
> > availability zones because I see that as the only thing that could be
> > causing this issue. Am I correct in assuming this? What are further steps
> > that I could take to verify what could be the cause of the index not
> > replicating fast enough to all nodes?
> >
> > thanks in advance for any help provided,
> >
> > Luis Guerrero
>



-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047


Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
Michael,

That's what I thought as well.  I would assume an optimization of the index
would rewrite all documents in the newer format then?

Ming-



On Fri, Jun 14, 2013 at 1:25 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Shot in the dark:
>
> You're using Lucene to read the index. That's sort of circumventing all the
> typing stuff that Solr does. Solr can deal with an index where some of the
> segments are in one format (say 1.4) and others are in another (3.6). Maybe
> they're being stored in a format in the newer (or older) segments that
> doesn't work with raw retrieval of the values through Lucene in the same
> way.
>
> Maybe it's able to retrieve the "stored" value from the indexed
> representation in one case rather than needing to store it.
>
> I'd query your index using EmbeddedSolrServer instead and see if that
> changes what you see.
>
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
>
>
> On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang  >wrote:
>
> > I have an index first built with solr1.4 and later upgraded to solr3.6,
> > which has 150million documents, and all docs have a datefield which are
> not
> > blank. (verified by solr query).
> >
> > I am using the following code snippet to retrieve
> >
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.store.*;
> > import org.apache.lucene.document.*;
> >
> > IndexReader input = IndexReader.open(indexDir);
> > int maxDoc = input.maxDoc();
> > for (int i = 0; i < maxDoc; i++) {
> >     Document d = input.document(i);
> >     System.out.println(d.get("date"));
> > }
> >
> > However, about 100 million docs give null for d.get('date') and about
> other
> > 50 million docs give the right values.
> >
> > What could be wrong?
> >
> > Ming-
> >
>


Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
Hi Dmitry,

No, the docs are not deleted.

Ming-


On Fri, Jun 14, 2013 at 1:31 PM, Dmitry Kan  wrote:

> Maybe a document was marked as deleted?
>
> isDeleted:
> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexReader.html#isDeleted(int)
>
>
> On Fri, Jun 14, 2013 at 11:25 PM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > Shot in the dark:
> >
> > You're using Lucene to read the index. That's sort of circumventing all
> the
> > typing stuff that Solr does. Solr can deal with an index where some of
> the
> > segments are in one format (say 1.4) and others are in another (3.6).
> Maybe
> > they're being stored in a format in the newer (or older) segments that
> > doesn't work with raw retrieval of the values through Lucene in the same
> > way.
> >
> > Maybe it's able to retrieve the "stored" value from the indexed
> > representation in one case rather than needing to store it.
> >
> > I'd query your index using EmbeddedSolrServer instead and see if that
> > changes what you see.
> >
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062  | c: +1 917 477 7906
> >
> > appinions inc.
> >
> > “The Science of Influence Marketing”
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions
> > w: appinions.com 
> >
> >
> > On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang  > >wrote:
> >
> > > I have an index first built with solr1.4 and later upgraded to solr3.6,
> > > which has 150million documents, and all docs have a datefield which are
> > not
> > > blank. (verified by solr query).
> > >
> > > I am using the following code snippet to retrieve
> > >
> > > import org.apache.lucene.index.IndexReader;
> > > import org.apache.lucene.store.*;
> > > import org.apache.lucene.document.*;
> > >
> > > IndexReader input = IndexReader.open(indexDir);
> > > int maxDoc = input.maxDoc();
> > > for (int i = 0; i < maxDoc; i++) {
> > >     Document d = input.document(i);
> > >     System.out.println(d.get("date"));
> > > }
> > >
> > > However, about 100 million docs give null for d.get('date') and about
> > other
> > > 50 million docs give the right values.
> > >
> > > What could be wrong?
> > >
> > > Ming-
> > >
> >
>


Solr cloud: zkHost in solr.xml gets wiped out

2013-06-14 Thread Al Wold
Hi,
I'm working on setting up a solr cloud test environment, and the target 
environment I need to put it in has multiple webapps per tomcat instance. With 
that in mind, I wanted/had to avoid putting any configs in system properties. I 
tried putting the zkHost in solr.xml, like this:

> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true" zkHost="...">
>   <cores adminPath="/admin/cores"
>    hostContext="/"/>
> </solr>

Everything works fine when I first start things up, create collections, upload 
docs, search, etc. Creating the collection, however, modifies the solr.xml 
file, and doesn't keep the zkHost setting:

> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
>   <cores adminPath="/admin/cores"
>    hostContext="/">
>     <core instanceDir="directory_shard2_replica1/" transient="false"
>      name="directory_shard2_replica1" collection="directory"/>
>     <core instanceDir="directory_shard1_replica1/" transient="false"
>      name="directory_shard1_replica1" collection="directory"/>
>   </cores>
> </solr>


As a result, once I restart tomcat, it no longer knows it's supposed to 
be talking to zookeeper, so it looks for local configs and blows up. 

I traced this back to the code in CoreContainer.java, in the method 
persistFile(), which seems to contain no code to write out the zkHost when 
it updates solr.xml. I upped the logging on my solr instance to verify this 
code is executing, so I'm pretty sure it's the right spot.

Is anyone else using zkHost in their solr.xml successfully? I can't see how it 
would work given this problem.

Does this seem like a bug? If so, I can probably file a report and submit a 
patch. It seems like this problem may become a non-issue in 5.0, based on 
comments in the code and some of the discussion in JIRA, but I'm not sure how 
far off that is.

Thanks!

-Al Wold



Re: retrieve datefield value from document

2013-06-14 Thread Michael Della Bitta
Yes, that should be what happens. But then I'd guess you wouldn't be able
to retrieve any dates. I've encountered this myself.
On Jun 14, 2013 6:05 PM, "Mingfeng Yang"  wrote:

> Michael,
>
> That's what I thought as well.  I would assume an optimization of the index
> would rewrite all documents in the newer format then?
>
> Ming-
>
>
>
> On Fri, Jun 14, 2013 at 1:25 PM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > Shot in the dark:
> >
> > You're using Lucene to read the index. That's sort of circumventing all
> the
> > typing stuff that Solr does. Solr can deal with an index where some of
> the
> > segments are in one format (say 1.4) and others are in another (3.6).
> Maybe
> > they're being stored in a format in the newer (or older) segments that
> > doesn't work with raw retrieval of the values through Lucene in the same
> > way.
> >
> > Maybe it's able to retrieve the "stored" value from the indexed
> > representation in one case rather than needing to store it.
> >
> > I'd query your index using EmbeddedSolrServer instead and see if that
> > changes what you see.
> >
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062  | c: +1 917 477 7906
> >
> > appinions inc.
> >
> > “The Science of Influence Marketing”
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions
> > w: appinions.com 
> >
> >
> > On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang  > >wrote:
> >
> > > I have an index first built with solr1.4 and later upgraded to solr3.6,
> > > which has 150million documents, and all docs have a datefield which are
> > not
> > > blank. (verified by solr query).
> > >
> > > I am using the following code snippet to retrieve
> > >
> > > import org.apache.lucene.index.IndexReader;
> > > import org.apache.lucene.store.*;
> > > import org.apache.lucene.document.*;
> > >
> > > IndexReader input = IndexReader.open(indexDir);
> > > int maxDoc = input.maxDoc();
> > > for (int i = 0; i < maxDoc; i++) {
> > >     Document d = input.document(i);
> > >     System.out.println(d.get("date"));
> > > }
> > >
> > > However, about 100 million docs give null for d.get('date') and about
> > other
> > > 50 million docs give the right values.
> > >
> > > What could be wrong?
> > >
> > > Ming-
> > >
> >
>


Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
How did you solve the problem then?

Ming


On Fri, Jun 14, 2013 at 3:24 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Yes, that should be what happens. But then I'd guess you'd be able to
> retrieve no dates. I've encountered this myself.
> On Jun 14, 2013 6:05 PM, "Mingfeng Yang"  wrote:
>
> > Michael,
> >
> > That's what I thought as well.  I would assume an optimization of the
> index
> > would rewrite all documents in the newer format then?
> >
> > Ming-
> >
> >
> >
> > On Fri, Jun 14, 2013 at 1:25 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> >
> > > Shot in the dark:
> > >
> > > You're using Lucene to read the index. That's sort of circumventing all
> > the
> > > typing stuff that Solr does. Solr can deal with an index where some of
> > the
> > > segments are in one format (say 1.4) and others are in another (3.6).
> > Maybe
> > > they're being stored in a format in the newer (or older) segments that
> > > doesn't work with raw retrieval of the values through Lucene in the
> same
> > > way.
> > >
> > > Maybe it's able to retrieve the "stored" value from the indexed
> > > representation in one case rather than needing to store it.
> > >
> > > I'd query your index using EmbeddedSolrServer instead and see if that
> > > changes what you see.
> > >
> > >
> > > Michael Della Bitta
> > >
> > > Applications Developer
> > >
> > > o: +1 646 532 3062  | c: +1 917 477 7906
> > >
> > > appinions inc.
> > >
> > > “The Science of Influence Marketing”
> > >
> > > 18 East 41st Street
> > >
> > > New York, NY 10017
> > >
> > > t: @appinions  | g+:
> > > plus.google.com/appinions
> > > w: appinions.com 
> > >
> > >
> > > On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang  > > >wrote:
> > >
> > > > I have an index first built with solr1.4 and later upgraded to
> solr3.6,
> > > > which has 150million documents, and all docs have a datefield which
> are
> > > not
> > > > blank. (verified by solr query).
> > > >
> > > > I am using the following code snippet to retrieve
> > > >
> > > > import org.apache.lucene.index.IndexReader;
> > > > import org.apache.lucene.store.*;
> > > > import org.apache.lucene.document.*;
> > > >
> > > > IndexReader input = IndexReader.open(indexDir);
> > > > int maxDoc = input.maxDoc();
> > > > for (int i = 0; i < maxDoc; i++) {
> > > >     Document d = input.document(i);
> > > >     System.out.println(d.get("date"));
> > > > }
> > > >
> > > > However, about 100 million docs give null for d.get('date') and about
> > > other
> > > > 50 million docs give the right values.
> > > >
> > > > What could be wrong?
> > > >
> > > > Ming-
> > > >
> > >
> >
>


strange solr version error

2013-06-14 Thread Jenny Huang
Hi,

I need to use solrj to do a full data import from a table in a database, and
encountered the solr version error: "java.lang.RuntimeException: Invalid
version (expected 2, but 60) or the data in not in 'javabin' format".  To
figure out what went wrong, I stripped the program down to the bare bones and
let it run a data import against the 'db' core in the solr tutorial
example-DIH (\solr-4.3.0\example\example-DIH), and hit the same version error.

I downloaded and ran the most recent solr-4.3.0 on Windows 7, and pulled the
same version of solrj when writing the small solrj program for data import
(see the maven dependencies below).


<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>4.3.0</version>
</dependency>


The main part of the data import solrj program is very simple; see below.

public class DbDataImportClient {
    public void fullImport(String url) {
        try {
            HttpSolrServer server = new HttpSolrServer(url);
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("qt", "/dataimport");
            params.set("command", "full-import");
            server.query(params);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/#/db";
        new DbDataImportClient().fullImport(url);
    }
}

I have been going through almost all the internet search results for that
error for two days.  The majority of them are about incompatible versions,
which I don't think is my case.  I am at my wits' end about what went
wrong, and really need help with this problem.


Thanks in advance.

-Jenny


Re: retrieve datefield value from document

2013-06-14 Thread Michael Della Bitta
Use EmbeddedSolrServer rather than Lucene directly.
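
A minimal sketch of that approach, assuming the solr.solr.home system
property points at the existing solr home and "collection1" stands in for
the actual core name:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class ReadDates {
    public static void main(String[] args) throws Exception {
        // Loads the cores defined under solr.solr.home, so the schema's
        // field types are applied when stored values are read back.
        CoreContainer container = new CoreContainer.Initializer().initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");

        QueryResponse rsp = server.query(
            new SolrQuery("*:*").addField("date").setRows(10));
        System.out.println(rsp.getResults());

        container.shutdown();
    }
}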
On Jun 14, 2013 6:47 PM, "Mingfeng Yang"  wrote:

> How did you solve the problem then?
>
> MIng
>
>
> On Fri, Jun 14, 2013 at 3:24 PM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > Yes, that should be what happens. But then I'd guess you'd be able to
> > retrieve no dates. I've encountered this myself.
> > On Jun 14, 2013 6:05 PM, "Mingfeng Yang"  wrote:
> >
> > > Michael,
> > >
> > > That's what I thought as well.  I would assume an optimization of the
> > index
> > > would rewrite all documents in the newer format then?
> > >
> > > Ming-
> > >
> > >
> > >
> > > On Fri, Jun 14, 2013 at 1:25 PM, Michael Della Bitta <
> > > michael.della.bi...@appinions.com> wrote:
> > >
> > > > Shot in the dark:
> > > >
> > > > You're using Lucene to read the index. That's sort of circumventing
> all
> > > the
> > > > typing stuff that Solr does. Solr can deal with an index where some
> of
> > > the
> > > > segments are in one format (say 1.4) and others are in another (3.6).
> > > Maybe
> > > > they're being stored in a format in the newer (or older) segments
> that
> > > > doesn't work with raw retrieval of the values through Lucene in the
> > same
> > > > way.
> > > >
> > > > Maybe it's able to retrieve the "stored" value from the indexed
> > > > representation in one case rather than needing to store it.
> > > >
> > > > I'd query your index using EmbeddedSolrServer instead and see if that
> > > > changes what you see.
> > > >
> > > >
> > > > Michael Della Bitta
> > > >
> > > > Applications Developer
> > > >
> > > > o: +1 646 532 3062  | c: +1 917 477 7906
> > > >
> > > > appinions inc.
> > > >
> > > > “The Science of Influence Marketing”
> > > >
> > > > 18 East 41st Street
> > > >
> > > > New York, NY 10017
> > > >
> > > > t: @appinions  | g+:
> > > > plus.google.com/appinions
> > > > w: appinions.com 
> > > >
> > > >
> > > > On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang <
> mfy...@wisewindow.com
> > > > >wrote:
> > > >
> > > > > I have an index first built with solr1.4 and later upgraded to
> > solr3.6,
> > > > > which has 150million documents, and all docs have a datefield which
> > are
> > > > not
> > > > > blank. (verified by solr query).
> > > > >
> > > > > I am using the following code snippet to retrieve
> > > > >
> > > > > import org.apache.lucene.index.IndexReader;
> > > > > import org.apache.lucene.store.*;
> > > > > import org.apache.lucene.document.*;
> > > > >
> > > > > IndexReader input = IndexReader.open(indexDir);
> > > > > int maxDoc = input.maxDoc();
> > > > > for (int i = 0; i < maxDoc; i++) {
> > > > >     Document d = input.document(i);
> > > > >     System.out.println(d.get("date"));
> > > > > }
> > > > >
> > > > > However, about 100 million docs give null for d.get('date') and
> about
> > > > other
> > > > > 50 million docs give the right values.
> > > > >
> > > > > What could be wrong?
> > > > >
> > > > > Ming-
> > > > >
> > > >
> > >
> >
>


yet another optimize question

2013-06-14 Thread Petersen, Robert
Hi guys,

We're on solr 3.6.1 and I've read the discussions about whether to optimize or 
not to optimize.  I decided to try not optimizing our index, as was 
recommended.  We have a little over 15 million docs in our biggest index and a 
32gb heap for our jvm.

Without the optimizes, the index folder seemed to grow in size and quantity of 
files.  There seemed to be an upper limit, but eventually it hit 300 files 
consuming 26gb of space, and that seemed to push our slave farm over the edge: 
we started getting the dreaded OOMs.  We have continuous indexing activity, so 
I stopped the indexer and manually ran an optimize, which brought the index 
down to 9 files consuming 15gb of space, and our slave farm went back to 
acceptable memory usage.

Our merge factor is 10, and we're on java 7.  Before optimizing, I tried on 
one slave machine to go with the latest JVM and tried switching from the CMS 
GC to the G1GC, but it hit the OOM condition even faster.  So it seems like I 
have to continue to schedule a regular optimize.  Right now it has been a 
couple of days since running the optimize and the index is slowly growing 
again, now up to a bit over 19gb.  What do you guys think?  Did I miss 
something that would let us run without doing an optimize?
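
For what it's worth, if the scheduled optimize has to stay, the job can be
as small as this SolrJ sketch (the URL is a placeholder; maxSegments=1
requests the full merge described above):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ScheduledOptimize {
    public static void main(String[] args) throws Exception {
        HttpSolrServer master = new HttpSolrServer("http://master-host:8983/solr");
        // waitFlush=true, waitSearcher=true, maxSegments=1
        master.optimize(true, true, 1);
    }
}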

Robert (Robi) Petersen
Senior Software Engineer
Search Department


Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
Figured out the solution.

The datefield in those documents was stored as binary, so what I should do
is:

import java.nio.ByteBuffer;
import java.util.Date;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Fieldable;

Fieldable df = doc.getFieldable(fname);
byte[] ary = df.getBinaryValue();       // raw bytes of the stored date field
ByteBuffer bb = ByteBuffer.wrap(ary);
long num = bb.getLong();                // treated as milliseconds since epoch
Date dt = DateTools.stringToDate(DateTools.timeToString(num,
    DateTools.Resolution.SECOND));

Then dt holds the date value as a java.util.Date.

Ming-


On Fri, Jun 14, 2013 at 4:20 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Use EmbeddedSolrServer rather than Lucene directly.
> On Jun 14, 2013 6:47 PM, "Mingfeng Yang"  wrote:
>
> > How did you solve the problem then?
> >
> > MIng
> >
> >
> > On Fri, Jun 14, 2013 at 3:24 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> >
> > > Yes, that should be what happens. But then I'd guess you'd be able to
> > > retrieve no dates. I've encountered this myself.
> > > On Jun 14, 2013 6:05 PM, "Mingfeng Yang" 
> wrote:
> > >
> > > > Michael,
> > > >
> > > > That's what I thought as well.  I would assume an optimization of the
> > > index
> > > > would rewrite all documents in the newer format then?
> > > >
> > > > Ming-
> > > >
> > > >
> > > >
> > > > On Fri, Jun 14, 2013 at 1:25 PM, Michael Della Bitta <
> > > > michael.della.bi...@appinions.com> wrote:
> > > >
> > > > > Shot in the dark:
> > > > >
> > > > > You're using Lucene to read the index. That's sort of circumventing
> > all
> > > > the
> > > > > typing stuff that Solr does. Solr can deal with an index where some
> > of
> > > > the
> > > > > segments are in one format (say 1.4) and others are in another
> (3.6).
> > > > Maybe
> > > > > they're being stored in a format in the newer (or older) segments
> > that
> > > > > doesn't work with raw retrieval of the values through Lucene in the
> > > same
> > > > > way.
> > > > >
> > > > > Maybe it's able to retrieve the "stored" value from the indexed
> > > > > representation in one case rather than needing to store it.
> > > > >
> > > > > I'd query your index using EmbeddedSolrServer instead and see if
> that
> > > > > changes what you see.
> > > > >
> > > > >
> > > > > Michael Della Bitta
> > > > >
> > > > > Applications Developer
> > > > >
> > > > > o: +1 646 532 3062  | c: +1 917 477 7906
> > > > >
> > > > > appinions inc.
> > > > >
> > > > > “The Science of Influence Marketing”
> > > > >
> > > > > 18 East 41st Street
> > > > >
> > > > > New York, NY 10017
> > > > >
> > > > > t: @appinions  | g+:
> > > > > plus.google.com/appinions
> > > > > w: appinions.com 
> > > > >
> > > > >
> > > > > On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang <
> > mfy...@wisewindow.com
> > > > > >wrote:
> > > > >
> > > > > > I have an index first built with solr1.4 and later upgraded to
> > > solr3.6,
> > > > > > which has 150million documents, and all docs have a datefield
> which
> > > are
> > > > > not
> > > > > > blank. (verified by solr query).
> > > > > >
> > > > > > I am using the following code snippet to retrieve
> > > > > >
> > > > > > import org.apache.lucene.index.IndexReader;
> > > > > > import org.apache.lucene.store.*;
> > > > > > import org.apache.lucene.document.*;
> > > > > >
> > > > > > IndexReader input = IndexReader.open(indexDir);
> > > > > > int maxDoc = input.maxDoc();
> > > > > > for (int i = 0; i < maxDoc; i++) {
> > > > > >     Document d = input.document(i);
> > > > > >     System.out.println(d.get("date"));
> > > > > > }
> > > > > >
> > > > > > However, about 100 million docs give null for d.get('date') and
> > about
> > > > > other
> > > > > > 50 million docs give the right values.
> > > > > >
> > > > > > What could be wrong?
> > > > > >
> > > > > > Ming-
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: strange solr version error

2013-06-14 Thread Shawn Heisey
On 6/14/2013 4:26 PM, Jenny Huang wrote:
> Hi,
> 
> I need to use solrj to do a full data import from a table in database, and
> encountered the solr version error: "java.lang.RuntimeException: Invalid
> version (expected 2, but 60) or the data in not in 'javabin' format".  To
> figure out what went wrong, I stripped the program to bare bone and let it
> run data import for the 'db' in solr tutorial example-DIH
> (\solr-4.3.0\example\example-DIH), and experienced the same version error.

This error actually means that the response you are getting is HTML
rather than javabin.

The reason you are getting the error is that you have given SolrJ a
URL from the admin UI, not a core base URL for API calls.  Use this instead:

String url = "http://localhost:8983/solr/db";

Thanks,
Shawn
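
With that one change, the snippet from the original mail should work
otherwise unchanged. A sketch:

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/db");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
server.query(params);  // now reaches the core's handler and gets javabin back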



Re: strange solr version error

2013-06-14 Thread Jenny Huang
Thanks a lot, Shawn.  It works now.

Have a nice weekend,

-Jenny






On Fri, Jun 14, 2013 at 6:47 PM, Shawn Heisey  wrote:

> On 6/14/2013 4:26 PM, Jenny Huang wrote:
> > Hi,
> >
> > I need to use solrj to do a full data import from a table in database,
> and
> > encountered the solr version error: "java.lang.RuntimeException: Invalid
> > version (expected 2, but 60) or the data in not in 'javabin' format".  To
> > figure out what went wrong, I stripped the program to bare bone and let
> it
> > run data import for the 'db' in solr tutorial example-DIH
> > (\solr-4.3.0\example\example-DIH), and experienced the same version
> error.
>
> This error actually means that the response you are getting is HTML
> rather than javabin.
>
> The reason you are getting the error is because you have given SolrJ a
> URL from the admin UI, not a core base URL for API calls.  Use this
> instead:
>
> String url = "http://localhost:8983/solr/db";
>
> Thanks,
> Shawn
>
>

