Re: Multi-core and replicated Solr cloud testing. Data-directory mis-configures

2013-03-25 Thread Gopal Patwa
If you use the default directory, then it will use the solr.home directory. I have
tested the Solr Cloud example on a local machine with 5-6 nodes, and the data
directory was created under the core name, like

"example2/solr/collection1/data". You can see the example startup script in the
source code: "solr/cloud-dev/solrcloud-multi-start.sh"

example solrconfig.xml:

  <dataDir>${solr.data.dir:}</dataDir>

On Sun, Mar 24, 2013 at 10:44 PM, Trevor Campbell
wrote:

> I have three indexes which I have set up as three separate cores, using
> this solr.xml config:
>
>   [solr.xml listing stripped by the list archive; of the <cores> element
>   only hostPort="${jetty.port:}" survives]
>
>
> This works just fine as standalone Solr.
>
> I duplicated this setup on the same machine under a completely separate
> solr installation (solr-nodeb) and modified all the data directories to
> point to the directories in nodeb.  This all worked fine.
>
> I then connected the 2 instances together with ZooKeeper, using the settings
> "-Dbootstrap_conf=true -Dcollection.configName=jiraCluster -DzkRun
> -DnumShards=1" for the first instance and "-DzkHost=localhost:9080" for
> the second. (I'm using Tomcat and ports 8080 and 8081 for the 2 Solr
> instances.)
>
> Now the data directories of the second node point to the data directories
> in the first node.
>
> I have tried many settings in the solrconfig.xml for each core but am now
> using absolute paths, e.g.
> /home//solr-4.2.0-nodeb/example/multicore/jira-comment/data
>
> Previously I used
> ${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data}
> but that had the same result.
>
> It seems ZooKeeper is forcing the data directory config from the uploaded
> configuration onto all the nodes in the cluster?
>
> How can I do testing on a single machine? Do I really need identical
> directory layouts on all machines?
>
>
>
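One way to do the single-machine testing Trevor asks about (a sketch restating Gopal's suggestion above; nothing here is confirmed by the thread) is to keep the solrconfig.xml that bootstrap_conf uploads to ZooKeeper free of absolute paths:

  <!-- shared solrconfig.xml: the empty default resolves to each core's own instanceDir/data -->
  <dataDir>${solr.data.dir:}</dataDir>

Because the substitution happens per core on each node, both installations then write under their own directory trees instead of the paths baked into the uploaded config.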


Re: OutOfMemoryError

2013-03-25 Thread Arkadi Colson
I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as 
parameters. I also added -XX:+UseG1GC to the java process. But now the 
whole machine crashes! Any idea why?


Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Mar 22 20:30:01 solr01-gs kernel: [716098.077962] java cpuset=/ mems_allowed=0
Mar 22 20:30:01 solr01-gs kernel: [716098.078019] Pid: 29339, comm: java Not tainted 2.6.32-5-amd64 #1

Mar 22 20:30:01 solr01-gs kernel: [716098.078095] Call Trace:
Mar 22 20:30:01 solr01-gs kernel: [716098.078155] [] ? oom_kill_process+0x7f/0x23f
Mar 22 20:30:01 solr01-gs kernel: [716098.078233] [] ? __out_of_memory+0x12a/0x141
Mar 22 20:30:01 solr01-gs kernel: [716098.078309] [] ? out_of_memory+0x140/0x172
Mar 22 20:30:01 solr01-gs kernel: [716098.078385] [] ? __alloc_pages_nodemask+0x4ec/0x5fc
Mar 22 20:30:01 solr01-gs kernel: [716098.078469] [] ? io_schedule+0x93/0xb7
Mar 22 20:30:01 solr01-gs kernel: [716098.078541] [] ? __do_page_cache_readahead+0x9b/0x1b4
Mar 22 20:30:01 solr01-gs kernel: [716098.078626] [] ? wake_bit_function+0x0/0x23
Mar 22 20:30:01 solr01-gs kernel: [716098.078702] [] ? ra_submit+0x1c/0x20
Mar 22 20:30:01 solr01-gs kernel: [716098.078773] [] ? filemap_fault+0x17d/0x2f6
Mar 22 20:30:01 solr01-gs kernel: [716098.078849] [] ? __do_fault+0x54/0x3c3
Mar 22 20:30:01 solr01-gs kernel: [716098.078921] [] ? handle_mm_fault+0x3b8/0x80f
Mar 22 20:30:01 solr01-gs kernel: [716098.078999] [] ? apic_timer_interrupt+0xe/0x20
Mar 22 20:30:01 solr01-gs kernel: [716098.079078] [] ? do_page_fault+0x2e0/0x2fc
Mar 22 20:30:01 solr01-gs kernel: [716098.079153] [] ? page_fault+0x25/0x30

Mar 22 20:30:01 solr01-gs kernel: [716098.079222] Mem-Info:
Mar 22 20:30:01 solr01-gs kernel: [716098.079261] Node 0 DMA per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079310] CPU0: hi: 0, btch:   1 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.079374] CPU1: hi: 0, btch:   1 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.079439] CPU2: hi: 0, btch:   1 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.079527] CPU3: hi: 0, btch:   1 usd:   0

Mar 22 20:30:01 solr01-gs kernel: [716098.079591] Node 0 DMA32 per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079642] CPU0: hi: 186, btch:  31 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.079706] CPU1: hi: 186, btch:  31 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.079770] CPU2: hi: 186, btch:  31 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.079834] CPU3: hi: 186, btch:  31 usd:   0

Mar 22 20:30:01 solr01-gs kernel: [716098.079899] Node 0 Normal per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079951] CPU0: hi: 186, btch:  31 usd:  17
Mar 22 20:30:01 solr01-gs kernel: [716098.080015] CPU1: hi: 186, btch:  31 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.080079] CPU2: hi: 186, btch:  31 usd:   2
Mar 22 20:30:01 solr01-gs kernel: [716098.080142] CPU3: hi: 186, btch:  31 usd:   0
Mar 22 20:30:01 solr01-gs kernel: [716098.080209] active_anon:2638016 inactive_anon:388557 isolated_anon:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080209]  active_file:68 inactive_file:236 isolated_file:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080210]  unevictable:0 dirty:5 writeback:5 unstable:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080211]  free:16573 slab_reclaimable:2398 slab_unreclaimable:2335
Mar 22 20:30:01 solr01-gs kernel: [716098.080212]  mapped:36 shmem:0 pagetables:24750 bounce:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080575] Node 0 DMA free:15796kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 22 20:30:01 solr01-gs kernel: [716098.081041] lowmem_reserve[]: 0 3000 12090 12090
Mar 22 20:30:01 solr01-gs kernel: [716098.081110] Node 0 DMA32 free:39824kB min:3488kB low:4360kB high:5232kB active_anon:2285240kB inactive_anon:520624kB active_file:0kB inactive_file:188kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4152kB slab_unreclaimable:1640kB kernel_stack:1104kB pagetables:31100kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:89 all_unreclaimable? no
Mar 22 20:30:01 solr01-gs kernel: [716098.081600] lowmem_reserve[]: 0 0 9090 9090
Mar 22 20:30:01 solr01-gs kernel: [716098.081664] Node 0 Normal free:10672kB min:10572kB low:13212kB high:15856kB active_anon:8266824kB inactive_anon:1033604kB active_file:292kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:9308160kB mlocked:0kB dirty:20kB writeback:20kB mappe

SOLR - "Unable to execute query" error - DIH

2013-03-25 Thread kobe.free.wo...@gmail.com
Hello All,

I am trying to index data from a SQL Server view into SOLR using the DIH
with the full-import command. The view has 750K rows and 427 columns. During
the first execution I indexed only the first 50 rows of the view, and the
data got indexed in 10 min. But when I executed the same scenario to index
the complete set of 750K rows, the execution continued for 2 days and then
rolled back, giving me the following error:

"Unable to execute the query: select * from."

Following is my DIH configuration file:

  [data-config.xml listing stripped by the list archive]

As suggested in some of the posts, I did try with batchSize="-1", but it
didn't work out. Please suggest whether this is the correct approach, or
whether any parameter needs to be modified for tuning.

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR - "Unable to execute query" error - DIH

2013-03-25 Thread kobe.free.wo...@gmail.com
In the context of the above scenario, when I try to index a set of 500 rows,
it fetches and indexes around 400-odd rows, then shows no progress and keeps
on executing. What could be the possible cause of this issue? If possible,
please do share whether you have gone through such a scenario, with the
respective details.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051034.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [ANNOUNCE] Solr wiki editing change

2013-03-25 Thread Andrzej Bialecki

On 3/25/13 4:18 AM, Steve Rowe wrote:

The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more 
frequently of late, so the PMC has decided to lock it down in an attempt to 
reduce the work involved in tracking and removing spam.

 From now on, only people who appear on 
http://wiki.apache.org/solr/ContributorsGroup will be able to 
create/modify/delete wiki pages.

Please request either on the solr-user@lucene.apache.org or on 
d...@lucene.apache.org to have your wiki username added to the 
ContributorsGroup page - this is a one-time step.


Please add AndrzejBialecki to this group. Thank you!

--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
 ___.,___,___,___,_._. __<><
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com



storing key value pair in multivalued field solr4.0

2013-03-25 Thread Karunakar Reddy
Hi,
I am using Solr 4.0. I want to store key-value pairs of attributes in a
multivalued field.

For example, I have some documents (products) which have attributes as one
field, and I indexed the attributes as separate documents to power
auto-suggest. Now in some auto-suggest cases I also have to show the facet
count of products; for this I am using Solr 4.0 joins and faceting on the
attributes. Here I want to get both the name and the id of the attributes.
How can I achieve this?

The query looks like the one below:

localhost:8980/solr/searchapp/select?q=%7B!join+from=attr_id+to=prod_attr_id%7Dterms:red&wt=json&indent=true&facet.field=prod_attr_id&facet=true&rows=1000&fl=product_name,product_id
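Decoded (the %7B/%7D escapes are just the braces of the local-params join syntax), that query reads:

  q={!join from=attr_id to=prod_attr_id}terms:red
  &facet=true&facet.field=prod_attr_id
  &rows=1000&fl=product_name,product_id&wt=json&indent=true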


Thanks in advance !


Re: Very slow query when boosting involves an ExternalFileField

2013-03-25 Thread Mikhail Khludnev
Floyd,

I think you need to provide a stack trace or rough profiler sampling.


On Fri, Mar 22, 2013 at 6:23 AM, Floyd Wu  wrote:

> Anybody can point me a direction?
> Many thanks.
>
>
>
> 2013/3/20 Floyd Wu 
>
> > Hi everyone,
> >
> > I have a problem that I have had no luck figuring out.
> >
> > When I issue these queries:
> >
> > Query 1
> > http://localhost:8983/solr/select?q={!boost+b=recip(ms(NOW/HOUR,last_modified_datetime),3.16e-11,1,1)}all:"java"&start=0&rows=10&fl=score,author&sort=score+desc
> >
> > Query 2
> > http://localhost:8983/solr/select?q={!boost+b=sum(ranking,recip(ms(NOW/HOUR,last_modified_datetime)),3.16e-11,1,1)}all:"java"&start=0&rows=10&fl=score,author&sort=score+desc
> >
> > The difference between the two queries is the "boost".
> > The boost function of Query 2 uses a field named ranking, and this field
> > is an ExternalFileField.
> > The external file holds key=value pairs, about 1 lines.
> >
> > Execution time
> > Query 1-->100ms
> > Query 2-->2300ms
> >
> > I tried issuing a Query 3, changing ranking to a constant "1":
> >
> > Query 3
> > http://localhost:8983/solr/select?q={!boost+b=sum(1,recip(ms(NOW/HOUR,last_modified_datetime)),3.16e-11,1,1)}all:"java"&start=0&rows=10&fl=score,author&sort=score+desc
> >
> > Execution time
> > Query 3-->110ms
> >
> > One thing I am sure of: involving the ExternalFileField slows down
> > query execution time significantly. But I have no idea how to solve this
> > problem, as my boost function must calculate the value of the ranking field.
> >
> > Please help on this.
> >
> > PS: I'm using SOLR-4.1
> >
> > Floyd
> >
> >
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
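For context, an ExternalFileField like Floyd's ranking field is typically declared along these lines (a sketch patterned on the stock example schema; apart from the field name, everything here is an assumption), with values read from a file named external_<fieldname> in the index data directory:

  <fieldType name="extFile" class="solr.ExternalFileField"
             keyField="id" defVal="0" indexed="false" stored="false"
             valType="pfloat"/>
  <field name="ranking" type="extFile"/>

  # data/external_ranking: one key=value pair per line
  doc1=0.5
  doc2=1.2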


 


Re: [ANNOUNCE] Solr wiki editing change

2013-03-25 Thread xie kidd
Please add adderllyer to this group. Thank you!

For the ideal, never give up, fighting!


On Mon, Mar 25, 2013 at 5:11 PM, Andrzej Bialecki  wrote:

> On 3/25/13 4:18 AM, Steve Rowe wrote:
>
>> The wiki at http://wiki.apache.org/solr/ has come under attack by
>> spammers more frequently of late, so the PMC has decided to lock it down in
>> an attempt to reduce the work involved in tracking and removing spam.
>>
>> From now on, only people who appear on
>> http://wiki.apache.org/solr/ContributorsGroup will
>> be able to create/modify/delete wiki pages.
>>
>> Please request either on the solr-user@lucene.apache.org or on
>> d...@lucene.apache.org to have your wiki username added to the
>> ContributorsGroup page - this is a one-time step.
>>
>
> Please add AndrzejBialecki to this group. Thank you!
>
> --
> Best regards,
> Andrzej Bialecki
> http://www.sigram.com, blog http://www.sigram.com/blog
>  ___.,___,___,___,_._. __<><
> [___||.__|__/|__||\/|: Information Retrieval, System Integration
> ___|||__||..\|..||..|: Contact: info at sigram dot com
>
>


Undefined field problem.

2013-03-25 Thread Mid Night
Hi,


I recently added a new field (toptipp) to an existing Solr schema.xml and
it worked just fine.  Subsequently I added two more fields (active_cruises
and non_grata) to the schema, and now I get this error:



<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">6</int></lst>
  <lst name="error"><str name="msg">undefined field: "active_cruise"</str><int name="code">400</int></lst>
</response>



My Solr index is populated via a program that creates and uploads a CSV file.
When I view the CSV file, the field "active_cruises" (reported as undefined
above) is populated correctly.  As far as I can tell, when I added the
final fields to the schema, I did exactly the same as when I added
"toptipp": I updated schema.xml and restarted Solr (java -jar start.jar).

I am really at a loss here.  Can someone please help with the answer or by
pointing me in the right direction?  Naturally I'd be happy to provide
further info if needed.


Thanks
MK


Re: Undefined field problem.

2013-03-25 Thread Mid Night
Further to the prev msg:  Here's an extract from my current schema.xml:

   [field definitions stripped by the list archive]
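The stripped lines presumably declared the three fields; a sketch of their likely shape (the types here are assumptions, since the message only says they were changed away from type="int"):

   <field name="toptipp" type="string" indexed="true" stored="true"/>
   <field name="active_cruises" type="string" indexed="true" stored="true"/>
   <field name="non_grata" type="string" indexed="true" stored="true"/>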



The original schema.xml had the last 3 fields in the order toptipp,
active_cruise and non_grata.  Active_cruise and non_grata were also defined
as type="int".  I changed the order and field types in my attempts to fix
the error.





On 25 March 2013 11:21, Mid Night  wrote:

> Hi,
>
>
> I recently added a new field (toptipp) to an existing Solr schema.xml and
> it worked just fine.  Subsequently I added two more fields (active_cruises
> and non_grata) to the schema, and now I get this error:
>
> <response>
>   <lst name="responseHeader"><int name="status">400</int><int name="QTime">6</int></lst>
>   <lst name="error"><str name="msg">undefined field: "active_cruise"</str><int name="code">400</int></lst>
> </response>
>
>
> My solr db is populated via a program that creates and uploads a csv
> file.  When I view the csv file, the field "active_cruises" (given as
> undefined above), is populated correctly.  As far as I can tell, when I
> added the final fields to the schema, I did exactly the same as when I
> added "toptipp".  I updated schema.xml and restarted solr (java -jar
> start.jar).
>
> I am really at a loss here.  Can someone please help with the answer or by
> pointing me in the right direction?  Naturally I'd be happy to provide
> further info if needed.
>
>
> Thanks
> MK
>
>
>
>
>
>
>
>


Re: OutOfMemoryError

2013-03-25 Thread Arkadi Colson
Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7?
Are any extra options needed?


Thanks...

On 03/25/2013 08:34 AM, Arkadi Colson wrote:
I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m 
as parameters. I also added -XX:+UseG1GC to the java process. But now 
the whole machine crashes! Any idea why?


[quoted kernel log snipped; see the first message in this thread]

Re: OutOfMemoryError

2013-03-25 Thread Bernd Fehling
The use of UseG1GC, yes,
but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07).
os.arch: amd64
os.name: Linux
os.version: 2.6.32.13-0.5-xen

Only args are "-XX:+UseG1GC -Xms16g -Xmx16g".
Monitoring shows that 16g is a bit high; I might reduce it to 10g or 12g for
the slaves.
Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g.
Single index, 130 GByte, 43.5 million documents.

Regards,
Bernd


Am 25.03.2013 11:55, schrieb Arkadi Colson:
> Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any
> extra options needed?
> 
> Thanks...
> 
> On 03/25/2013 08:34 AM, Arkadi Colson wrote:
>> I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as 
>> parameters. I also added -XX:+UseG1GC to the java process. But now
>> the whole machine crashes! Any idea why?
>>
>> [quoted kernel log snipped; see the first message in this thread]

Re: Tlog File not removed after hard commit

2013-03-25 Thread Michael Della Bitta
My understanding is that logs stick around for a while just in case they
can be used to catch up a shard that rejoins the cluster.
 On Mar 24, 2013 12:03 PM, "Niran Fajemisin"  wrote:

> Hi all,
>
> We import about 1.5 million documents on a nightly basis using DIH. During
> this time, we need to ensure that all documents make it into index
> otherwise rollback on any errors; which DIH takes care of for us. We also
> disable autoCommit in DIH but instruct it to commit at the very end of the
> import. This is all done through configuration of the DIH config XML file
> and the command issued to the request handler.
>
> We have noticed that the tlog file appears to linger around even after DIH
> has issued the hard commit. My expectation would be that after the hard
> commit has occurred, the tlog file will be removed. I'm obviously
> misunderstanding how this all works.
>
> Can someone please help me understand how this is meant to function?
> Thanks!
>
> -Niran


Retriving results based on SOLR query data.

2013-03-25 Thread atuldj.jadhav
Hi Team,

I want to overcome a sort issue here; the sort feature itself works fine.

I have indexed a few documents in SOLR, each of which has a unique document ID.
Now when I retrieve results from SOLR, they come back automatically sorted.

However, I would like to fetch results based on the sequence I mention in my
SOLR query.

http://hostname:8080/SOLR/browse?q=documentID:D12133 OR documentID:D14423 OR documentID:D912

I want the results in the same order:
 D12133
 D14423
 D912
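A common way to force that ordering (a general technique, not something from this thread) is to boost each ID clause in descending order and sort by score:

  q=documentID:D12133^3 OR documentID:D14423^2 OR documentID:D912^1&sort=score+desc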

Regards,
Atul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Retriving-results-based-on-SOLR-query-data-tp4051076.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [ANNOUNCE] Solr wiki editing change

2013-03-25 Thread Steve Rowe
On Mar 25, 2013, at 3:30 AM, Dawid Weiss  wrote:
> Can you add me to? We have a few pages which we maintain (search results 
> clustering related). My wiki user is DawidWeiss

Added to AdminGroup.

On Mar 25, 2013, at 5:11 AM, Andrzej Bialecki  wrote:
> Please add AndrzejBialecki to this group. Thank you!

Added to AdminGroup.

On Mar 25, 2013, at 5:48 AM, xie kidd  wrote:
> Please add adderllyer to this group. Thank you!

Added to ContributorsGroup.

Re: Timeout occured while waiting response from server

2013-03-25 Thread Erick Erickson
A timeout like this _probably_ means your docs were indexed just fine. I'm
curious why adding the docs takes so long; how many docs are you sending at
a time?

Best
Erick


On Thu, Mar 21, 2013 at 1:31 PM, Benjamin, Roy  wrote:

> I'm calling: m_server.add(docs, 12);
>
> Wondering if the timeout that expired was the one set when the server was
> created?
>
> m_server = new HttpSolrServer(serverUrl);
> m_server.setRequestWriter(new BinaryRequestWriter());
> m_server.setConnectionTimeout(3);
> m_server.setSoTimeout(1);
>
> Also, does the exception always mean the docs were not added?
>
> Thanks
> Roy
>
> Solr 3.6
>
>
> 2013-03-21 10:21:32,487 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2078: Caught error from UDF: checkout.regexudf.SolrAccumulator
> [org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://10.94.238.86:8080/solr]
>
>


Re: Solr 4.2.0 results links

2013-03-25 Thread Erick Erickson
Solr doesn't do anything with links natively; it just echoes back what you
put in. So you're sending file:// links to Solr...

Best
Erick


On Thu, Mar 21, 2013 at 1:40 PM, zeroeffect  wrote:

> While I am still in the beginning phase of Solr, I have been able to index a
> directory of HTML files. I can search keywords and get results. The problem
> I am having is that the links to the HTML documents are file-based rather
> than HTTP-based: I get the link, but it points to file:\\ and not http:\\.
> I have been looking for where to set this information. My setup exports
> database information to individual HTML files, then FTPs them to the Solr
> server to be indexed and accessed on our intranet.
>
> Thank you for your guidance.
>
> ZeroEffect
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-2-0-results-links-tp4049788.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: OutOfMemoryError

2013-03-25 Thread Arkadi Colson

Thanks for the info!
I just upgraded java from 6 to 7...
How exactly do you monitor the memory usage and the effect of the
garbage collector?



On 03/25/2013 01:18 PM, Bernd Fehling wrote:

The use of UseG1GC, yes,
but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07).
os.arch: amd64
os.name: Linux
os.version: 2.6.32.13-0.5-xen

Only args are "-XX:+UseG1GC -Xms16g -Xmx16g".
Monitoring shows that 16g is a bit high; I might reduce it to 10g or 12g for
the slaves.
Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g.
Single index, 130 GByte, 43.5 million documents.

Regards,
Bernd


Am 25.03.2013 11:55, schrieb Arkadi Colson:

Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any
extra options needed?

Thanks...

On 03/25/2013 08:34 AM, Arkadi Colson wrote:

I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as 
parameters. I also added -XX:+UseG1GC to the java process. But now
the whole machine crashes! Any idea why?

[quoted kernel log snipped; see the first message in this thread]

Re: How can I compile and debug Solr from source code?

2013-03-25 Thread Erick Erickson
Furkan:

Stop. Back up. You're making it too complicated. Follow Erik's
instructions. The "ant example" target just compiles all of Solr, just like
the distribution. Then you can go into the example directory and change it to
look just like whatever you want: change the schema, change the solrconfig,
add custom components, etc. There's no difference between that and the
distro. It _is_ the distro, just in a convenient form for running in Jetty.

So you create some custom code (say a filter or whatever). You put the path
to it in your solrconfig in a <lib> directive. In fact I usually point the
<lib> directive out to wherever the code gets built by my IDE for debugging
purposes; then I don't have to copy the jar around.
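A minimal sketch of such a directive (the path is an assumption; relative paths resolve against the core's instanceDir):

  <lib dir="/home/me/my-solr-plugin/build/libs" regex=".*\.jar" />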

I can then set breakpoints in my custom code. I can debug Solr as well.
It's just way cool.

About the only thing I'd add to Hatcher's instructions is the possibility of
specifying "suspend=y" rather than "suspend=n", and that's just if I want
to debug Solr startup code.

BTW, IntelliJ has, under the "edit configurations" section, a "remote"
option that guides you through the flags etc. that Erik pointed out. Eclipse
has something similar, but I use IntelliJ.

Best
Erick


On Thu, Mar 21, 2013 at 8:00 PM, Furkan KAMACI wrote:

> OK, I ran that and see that there is a .war file at
>
> /lucene-solr/solr/dist
>
> Do you know how I can run that ant phase from IntelliJ without the command
> line (there are many phases under the Ant build window)? On the other hand,
> within IntelliJ IDEA, how can I auto-deploy it to Tomcat? All in all, can I
> edit configurations so it will run that ant command and deploy it to
> Tomcat itself?
>
> 2013/3/22 Steve Rowe 
>
> > Perhaps you didn't see what I wrote earlier?:
> >
> > Sounds like you want 'ant dist', which will create the .war and put it
> > into the solr/dist/ directory:
> >
> > PROMPT$ ant dist
> >
> > Steve
> >
> > On Mar 21, 2013, at 7:38 PM, Furkan KAMACI 
> wrote:
> >
> > > I mean I need this: there is a .war file shipped with the Solr source
> > > code. How can I regenerate it (build my code and generate a .war file)
> > > just like that? I will then deploy it to Tomcat.
> > >
> > > 2013/3/22 Furkan KAMACI 
> > >
> > >> Is your suggestion only for the example application? Can I apply it
> > >> to just pure Solr (I don't want to generate the example application,
> > >> because my aim is not just debugging Solr; I want to extend it, and I
> > >> will debug that extended code)?
> > >>
> > >>
> > >> 2013/3/22 Alexandre Rafalovitch 
> > >>
> > >>> That's nice. Can we put that on a Wiki? Or as a quick screencast?
> > >>>
> > >>> Regards,
> > >>>   Alex.
> > >>>
> > >>> Personal blog: http://blog.outerthoughts.com/
> > >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > >>> - Time is the quality of nature that keeps events from happening all
> at
> > >>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > book)
> > >>>
> > >>>
> > >>> On Thu, Mar 21, 2013 at 5:42 PM, Erik Hatcher <
> erik.hatc...@gmail.com
> >  wrote:
> > >>>
> >  Here's my development/debug workflow:
> > 
> >   - "ant idea" at the top-level to generate the IntelliJ project
> >   - cd solr; ant example - to build the full example
> >   - cd example; java -Xdebug
> >  -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005 -jar
> >  start.jar - to launch Jetty+Solr in debug mode
> >   - set breakpoints in IntelliJ, set up a Remote run option
> >  (localhost:5005) in IntelliJ and debug pleasantly
> > 
> >  All the unit tests in Solr run very nicely in IntelliJ too, and for
> > >>> tight
> >  development loops, I spend my time doing that instead of running
> full
> > on
> >  Solr.
> > 
> > Erik
> > 
> > 
> >  On Mar 21, 2013, at 05:56 , Furkan KAMACI wrote:
> > 
> > > I use IntelliJ IDEA 12 and Solr 4.1 on a CentOS 6.4 64-bit computer.
> > >
> > > I have opened the Solr source code in IntelliJ IDEA as the documentation
> > > explains. I want to deploy Solr into Tomcat 7. When I open the project
> > > there are configurations set previously (I used the ant idea command
> > > before I opened the project). However, they are all test configurations,
> > > and some of them did not pass (this is another issue; no need to go into
> > > detail in this e-mail). I have added a Tomcat Local configuration into
> > > the configurations, but I don't know which one is the main method of
> > > Solr, and is there any documentation that explains the code? I.e., I
> > > want to debug what Solr receives when I say -index from Nutch, and what
> > > Solr does.
> > >
> > > I tried something to run the code (I don't think I could generate a .war
> > > or an exploded folder) and this is the error that I get (I didn't point
> > > any artifact for edit configurations):
> > >
> > > Error: Exception thrown by 

Re: Continue to the next record

2013-03-25 Thread Erick Erickson
This has been a long-standing issue with updates; several attempts
have been made to change the behavior, but they haven't gotten
off the ground.

Your options are to send one record at a time, or have error-handling
logic that, say, transmits the docs one at a time whenever a packet fails.

Best
Erick
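In SolrJ terms, that fallback logic might look roughly like the sketch below (an illustration only; the original poster is using post.sh, and the server setup and "id" field here are assumptions):

  import java.io.IOException;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.common.SolrInputDocument;

  // Try the whole batch; on failure, retry one document at a time so a
  // single bad document no longer aborts the rest of the upload.
  void addWithFallback(SolrServer server, List<SolrInputDocument> docs) {
      try {
          server.add(docs);
      } catch (SolrServerException | IOException batchFailure) {
          for (SolrInputDocument doc : docs) {
              try {
                  server.add(doc);
              } catch (SolrServerException | IOException e) {
                  System.err.println("skipped doc " + doc.getFieldValue("id"));
              }
          }
      }
  }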


On Thu, Mar 21, 2013 at 9:21 PM, randolf.julian <
randolf.jul...@dominionenterprises.com> wrote:

> I have an XML file that has several documents in it. For example:
>
> <add>
>   <doc>
>     <field name="...">1</field>
>     <field name="...">MyName1</field>
>   </doc>
>   <doc>
>     <field name="...">2</field>
>     <field name="...">MyName2</field>
>   </doc>
>   <doc>
>     <field name="...">3</field>
>     <field name="...">MyName3</field>
>   </doc>
> </add>
>
> I upload the data using SOLR's post.sh script. For some reason, document 2
> failed, and it caused the post.sh script to stop. How can I make it continue
> to the next document (3) even if it fails on 2?
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Continue-to-the-next-record-tp4049920.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr using a ridiculous amount of memory

2013-03-25 Thread John Nielsen
I apologize for the slow reply. Today has been killer. I will reply to
everyone as soon as I get the time.

I am having difficulties understanding how docValues work.

Should I only add docValues to the fields that I actually use for sorting
and faceting, or to all fields?

Will the docValues magic apply only to the fields I activate docValues on, or
to the entire document when sorting/faceting on a field that has docValues
activated?

I'm not even sure which question to ask. I am struggling to understand this
on a conceptual level.


On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir  wrote:

> On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen  wrote:
>
> > Schema with DocValues attempt at solving problem:
> > http://pastebin.com/Ne23NnW4
> > Config: http://pastebin.com/x1qykyXW
> >
>
> This schema isn't using docvalues, due to a typo in your config.
> it should not be DocValues="true" but docValues="true".
>
> Are you not getting an error? Solr needs to throw an exception if you
> provide invalid attributes to the field. Nothing is more frustrating
> than having a typo or something in your configuration and Solr just
> ignores this, reports no error, and "doesn't work the way you want".
> I'll look into this (I already intend to add these checks to analysis
> factories for the same reason).
>
> Separately, if you really want the terms data and so on to remain on
> disk, it is not enough to "just enable docvalues" for the field. The
> default implementation uses the heap. So if you want that, you need to
> set docValuesFormat="Disk" on the fieldtype. This will keep the
> majority of the data on disk, and only some key datastructures in heap
> memory. This might have significant performance impact depending upon
> what you are doing so you need to test that.
>
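Concretely, the combination described above would look something like this in schema.xml (a sketch; field and type names are placeholders):

  <fieldType name="string_disk" class="solr.StrField" docValuesFormat="Disk"/>
  <field name="itemgroup" type="string_disk" indexed="true" stored="false" docValues="true"/>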



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


RE: SOLR - "Unable to execute query" error - DIH

2013-03-25 Thread Dyer, James
With MS SqlServer, try adding "selectMethod=cursor" to your connection string
and set your batch size to a reasonable amount (or just omit batchSize; DIH
has a default value it will use).
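A sketch of how that looks in a DIH data-config (driver class, URL, credentials and query are placeholders):

  <dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://dbhost;databaseName=mydb;selectMethod=cursor"
                user="solr" password="..." batchSize="500"/>
    <document>
      <entity name="view_rows" query="select * from my_view"/>
    </document>
  </dataConfig>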

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] 
Sent: Monday, March 25, 2013 3:25 AM
To: solr-user@lucene.apache.org
Subject: SOLR - "Unable to execute query" error - DIH

Hello All,

I am trying to index data from a SQL Server view into SOLR using the DIH
with the full-import command. The view has 750K rows and 427 columns. During
the first execution I indexed only the first 50 rows of the view, and the
data got indexed in 10 min. But when I executed the same scenario to index
the complete set of 750K rows, the execution continued for 2 days and then
rolled back, giving me the following error:

"Unable to execute the query: select * from."

Following is my DIH configuration file:

  [data-config.xml listing stripped by the list archive]

As suggested in some of the posts, I did try with batchSize="-1", but it
didn't work out. Please suggest whether this is the correct approach, or
whether any parameter needs to be modified for tuning.

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028.html
Sent from the Solr - User mailing list archive at Nabble.com.




Contributors Group

2013-03-25 Thread Swati Swoboda
Hello,

Can I be added to the contributors group? Username sswoboda.

Thank you.

Swati


Re: Contributors Group

2013-03-25 Thread Steve Rowe

On Mar 25, 2013, at 10:32 AM, Swati Swoboda  wrote:
> Can I be added to the contributors group? Username sswoboda.

Added to solr ContributorsGroup.

Re: Continue to the next record

2013-03-25 Thread randolf.julian
Erick,

Thanks for the info. That's also what I had in mind and that's what I did
since I can't find anything on the web regarding this issue.

Randolf



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Continue-to-the-next-record-tp4049920p4051113.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OutOfMemoryError

2013-03-25 Thread Bernd Fehling
We use munin with the jmx plugin for monitoring all servers and Solr
installations (http://munin-monitoring.org/).

For short-term monitoring we also use jvisualvm, delivered with the Java SE JDK.

Regards
Bernd

Am 25.03.2013 14:45, schrieb Arkadi Colson:
> Thanks for the info!
> I just upgraded java from 6 to 7...
> How exactly do you monitor the memory usage and the effect of the garbage
> collector?
> 
> 
> On 03/25/2013 01:18 PM, Bernd Fehling wrote:
>> The use of UseG1GC, yes,
>> but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM
>> (1.7.0_07).
>> os.arch: amd64
>> os.name: Linux
>> os.version: 2.6.32.13-0.5-xen
>>
>> Only args are "-XX:+UseG1GC -Xms16g -Xmx16g".
>> Monitoring shows that 16g is a bit high; I might reduce it to 10g or 12g for
>> the slaves.
>> Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g.
>> Single index, 130 GByte, 43.5 million documents.
>>
>> Regards,
>> Bernd
>>
>>
>> Am 25.03.2013 11:55, schrieb Arkadi Colson:
>>> Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any
>>> extra options needed?
>>>
>>> Thanks...
>>>
>>> On 03/25/2013 08:34 AM, Arkadi Colson wrote:
 I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as 
 parameters. I also added -XX:+UseG1GC to the java process. But now
 the whole machine crashes! Any idea why?

 [quoted kernel log snipped; see the first message in this thread]

Solr 4 automatic DB updates for sync using Delta query DIH with scheduler

2013-03-25 Thread majiedahamed
Hi,

Please let me know how to get DB changes reflected in my Solr index. I am
using Solr 4 with DIH and a delta query, with the scheduler configured in the
dataimport scheduler properties. Ultimately I want my DB to be in sync with
Solr.

Everything is all set and working, except that every time I modify data in a
DB column, my scheduler automatically adds a new document to Solr; I
therefore get two documents with different _version_ values. What I am
looking for is for the index to be updated as and when the DB columns are
updated. Kindly assist...

with regards
majied



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-automatic-DB-updates-for-sync-using-Delta-query-DIH-with-scheduler-tp4051114.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OutOfMemoryError

2013-03-25 Thread Arkadi Colson
How can I see if GC is actually working? Is it written to the Tomcat
logs as well, or will I only see it in the memory graphs?


BR,
Arkadi
On 03/25/2013 03:50 PM, Bernd Fehling wrote:

We use munin with jmx plugin for monitoring all server and Solr installations.
(http://munin-monitoring.org/)

Only for short time monitoring we also use jvisualvm delivered with Java SE JDK.

Regards
Bernd

Am 25.03.2013 14:45, schrieb Arkadi Colson:

Thanks for the info!
I just upgraded java from 6 to 7...
How exactly do you monitor the memory usage and the effect of the garbage
collector?


On 03/25/2013 01:18 PM, Bernd Fehling wrote:

The use of UseG1GC, yes,
but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07).
os.arch: amd64
os.name: Linux
os.version: 2.6.32.13-0.5-xen

Only args are "-XX:+UseG1GC -Xms16g -Xmx16g".
Monitoring shows that 16g is a bit high; I might reduce it to 10g or 12g for
the slaves.
Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g.
Single index, 130 GByte, 43.5 million documents.

Regards,
Bernd


Am 25.03.2013 11:55, schrieb Arkadi Colson:

Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any
extra options needed?

Thanks...

On 03/25/2013 08:34 AM, Arkadi Colson wrote:

I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as 
parameters. I also added -XX:+UseG1GC to the java process. But now
the whole machine crashes! Any idea why?

[quoted kernel log snipped; see the first message in this thread]
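As for seeing whether GC is actually working: the JVM only reports collections if asked. A minimal sketch using stock HotSpot flags (the log path is an assumption; add these next to -XX:+UseG1GC in Tomcat's JAVA_OPTS):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/tomcat/gc.log

The resulting gc.log records each collection's pause time and the heap occupancy before and after, which answers the question independently of the memory graphs.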

Re: Slow queries for common terms

2013-03-25 Thread Erick Erickson
take a look here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

looking at memory consumption can be a bit tricky to interpret with
MMapDirectory.

But you say "I see the CPU working very hard" which implies that your issue
is just scoring 90M documents. A way to test: try q=*:*&fq=field:book. My
bet is that that will be much faster, in which case scoring is your
choke-point and you'll need to spread that load across more servers, i.e.
shard.
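Spelled out as a request (host, port and field name are whatever the setup uses):

  http://localhost:8983/solr/select?q=*:*&fq=field:book&rows=10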

When running the above, make sure of a couple of things:
1> you haven't run the fq query before (or you have filterCache turned
completely off).
2> you _have_ run a query or two that warms up your low-level caches.
Doesn't matter what, just as long as it doesn't have an fq clause.

Best
Erick



On Sat, Mar 23, 2013 at 3:10 AM, David Parks  wrote:

> I see the CPU working very hard, and at the same time I see 2 MB/sec disk
> access for that 15 seconds. I am not running it this instant, but it seems
> to me that there were more CPU cycles available, so unless it's an issue of
> not being able to multithread it any further, I'd say it's more IO related.
>
> I'm going to set up solr cloud and shard across the 2 servers I have
> available for now. It's not an optimal setup we have while we're in a
> private beta period, but maybe it'll improve things (I've got 2 servers
> with
> 2x 4TB disks in raid-0 shared with the webservers).
>
> I'll work towards some improved IO performance and maybe more shards and
> see
> how things go. I'll also be able to up the RAM in just a couple of weeks.
>
> Are there any settings I should think of in terms of improving cache
> performance when I can give it say 10GB of RAM?
>
> Thanks, this has been tremendously helpful.
>
> David
>
>
> -Original Message-
> From: Tom Burton-West [mailto:tburt...@umich.edu]
> Sent: Saturday, March 23, 2013 1:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Slow queries for common terms
>
> Hi David and Jan,
>
> I wrote the blog post, and David, you are right, the problem we had was
> with
> phrase queries because our positions lists are so huge.  Boolean
> queries don't need to read the positions lists.   I think you need to
> determine whether you are CPU bound or I/O bound.It is possible that
> you are I/O bound and reading the term frequency postings for 90 million
> docs is taking a long time.  In that case, More memory in the machine (but
> not dedicated to Solr) might help because Solr relies on OS disk caching
> for
> caching the postings lists.  You would still need to do some cache warming
> with your most common terms.
>
> On the other hand as Jan pointed out, you may be cpu bound because Solr
> doesn't have early termination and has to rank all 90 million docs in order
> to show the top 10 or 25.
>
> Did you try the OR search to see if your CPU is at 100%?
>
> Tom
>
> On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl 
> wrote:
>
> > Hi
> >
> > There might not be a final cure with more RAM if you are CPU bound.
> > Scoring 90M docs is some work. Can you check what's going on during
> > those
> > 15 seconds? Is your CPU at 100%? Try an (foo OR bar OR baz) search
> > which generates >100mill hits and see if that is slow too, even if you
> > don't use frequent words.
> >
> > I'm sure you can find other frequent terms in your corpus which
> > display similar behaviour, words which are even more frequent than
> > "book". Are you using "AND" as default operator? You will benefit from
> > limiting the number of results as much as possible.
> >
> > The real solution is to shard across N number of servers, until you
> > reach the desired performance for the desired indexing/querying load.
> >
> > --
> > Jan Høydahl, search solution architect Cominvent AS -
> > www.cominvent.com Solr Training - www.solrtraining.com
> >
> >
>
>


Re: Two problems (missing updates and timeouts)

2013-03-25 Thread Erick Erickson
For your first problem I'd be looking at the solr logs and verifying that
1> the update was sent
2> no stack traces are thrown
3> You probably already know all about commits, but just in case, verify that
the commit interval has passed.
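If it helps, a hard-commit policy in solrconfig.xml looks roughly like this
(Solr 4.x syntax; the interval is illustrative):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

Documents only become searchable once a commit that opens a new searcher has run.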

For your second problem, I'm not quite sure where you're setting these
timeouts. SolrJ?

Best
Erick


On Sat, Mar 23, 2013 at 4:23 PM, Aaron Jensen  wrote:

> Hi all,
>
> I'm having two problem with our solr implementation. I don't have a lot of
> detail about them because we're just starting to get into diagnosing them.
> I'm hoping for some help with that diagnosis, ideas, tips, whatever.
>
> Our stack:
>
> Rails
> Sunspot Solr
> sunspot_index_queue
> two solr servers, master and slave, all traffic currently going to master,
> slave is just a replication slave/backup.
>
>
> The first and biggest problem is that we occasionally "lose" updates.
> Something will get added to the database, it will trigger a solr update,
> but then we can't search for that thing. It's just gone. Indexing that
> thing again will have it show up. There are a number of moving parts in our
> stack and this is a relatively new problem. It was working fine for 1.5
> years without a problem. We're considering adding a delayed job that will
> index anything that is newly created a second after it is created just to
> "be sure" but this is a giant hack. Any ideas around this would be helpful.
>
>
>
> The second problem is that we get occasional timeouts. These don't happen
> very often, maybe 5-7/day. Solr is serving at most like 350 requests per
> minute. Our timeouts are set to 2 seconds on read and 1 second on open.
> Average response time is around 20ms. It doesn't seem like any requests
> should be timing out but they are. I have no idea how to debug it either.
> Any ideas?
>
> Thanks,
>
> Aaron
>
>


Re: Solr 4.2 Incremental backups

2013-03-25 Thread Erick Erickson
That's essentially what replication does: it only copies the parts of the
index that have changed. However, when segments merge, that can mean the
entire index needs to be replicated.
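For reference, a master can also take an explicit backup through the
replication handler (a sketch; verify the triggers against your own config):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="backupAfter">optimize</str>
  </lst>
</requestHandler>

or on demand with http://master_host:port/solr/replication?command=backup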

Best
Erick


On Sun, Mar 24, 2013 at 12:08 AM, Sandeep Kumar Anumalla <
sanuma...@etisalat.ae> wrote:

> Hi,
>
> Is there any option to do Incremental backups in Solr 4.2?
>
> Thanks & Regards
> Sandeep A
> Ext : 02618-2856
> M : 0502493820
>
>
>


Re: Too many fields to Sort in Solr

2013-03-25 Thread Erick Erickson
Certainly that will be true for the bare q=*:*; I meant with the boosting
clause added.

Best
Erick


On Sun, Mar 24, 2013 at 7:01 PM, adityab  wrote:

> Thanks Erick. In this query "q=*:*" the Lucene score is always 1
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4050944.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: OutOfMemoryError

2013-03-25 Thread Bernd Fehling
You can also use "-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails 
-Xloggc:gc.log"
as additional options to get a "gc.log" file and see what GC is doing.
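For example (a minimal sketch; heap sizes and the start command depend on your
installation):

java -Xms2g -Xmx8g -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails \
     -Xloggc:gc.log -jar start.jar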

Regards
Bernd

On 25.03.2013 16:01, Arkadi Colson wrote:
> How can I see if GC is actually working? Is it written in the tomcat logs as 
> well or will I only see it in the memory graphs?
> 
> BR,
> Arkadi
> On 03/25/2013 03:50 PM, Bernd Fehling wrote:
>> We use munin with jmx plugin for monitoring all server and Solr 
>> installations.
>> (http://munin-monitoring.org/)
>>
>> Only for short time monitoring we also use jvisualvm delivered with Java SE 
>> JDK.
>>
>> Regards
>> Bernd
>>
>> On 25.03.2013 14:45, Arkadi Colson wrote:
>>> Thanks for the info!
>>> I just upgraded java from 6 to 7...
>>> How exactly do you monitor the memory usage and the affect of the garbage 
>>> collector?
>>>
>>>
>>> On 03/25/2013 01:18 PM, Bernd Fehling wrote:
 The use of UseG1GC, yes,
 but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM 
 (1.7.0_07).
 os.arch: amd64
 os.name: Linux
 os.version: 2.6.32.13-0.5-xen

 Only args are "-XX:+UseG1GC -Xms16g -Xmx16g".
 Monitoring shows that 16g is a bit high, I might reduce it to 10g or 12g 
 for the slaves.
 Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g.
 Single index, 130 GByte, 43.5 million documents.

 Regards,
 Bernd


 On 25.03.2013 11:55, Arkadi Colson wrote:
> Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? 
> Any extra options needed?
>
> Thanks...
>
> On 03/25/2013 08:34 AM, Arkadi Colson wrote:
>> I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as 
>> parameters. I also added -XX:+UseG1GC to the java process. But now
>> the whole machine crashes! Any idea why?
>>
>> Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked 
>> oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.077962] java cpuset=/ 
>> mems_allowed=0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078019] Pid: 29339, comm: java 
>> Not tainted 2.6.32-5-amd64 #1
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078095] Call Trace:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078155] [] ? 
>> oom_kill_process+0x7f/0x23f
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078233] [] ? 
>> __out_of_memory+0x12a/0x141
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078309] [] ? 
>> out_of_memory+0x140/0x172
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078385] [] ? 
>> __alloc_pages_nodemask+0x4ec/0x5fc
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078469] [] ? 
>> io_schedule+0x93/0xb7
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078541] [] ? 
>> __do_page_cache_readahead+0x9b/0x1b4
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078626] [] ? 
>> wake_bit_function+0x0/0x23
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078702] [] ? 
>> ra_submit+0x1c/0x20
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078773] [] ? 
>> filemap_fault+0x17d/0x2f6
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078849] [] ? 
>> __do_fault+0x54/0x3c3
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078921] [] ? 
>> handle_mm_fault+0x3b8/0x80f
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078999] [] ? 
>> apic_timer_interrupt+0xe/0x20
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079078] [] ? 
>> do_page_fault+0x2e0/0x2fc
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079153] [] ? 
>> page_fault+0x25/0x30
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079222] Mem-Info:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079261] Node 0 DMA per-cpu:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079310] CPU0: hi: 0, btch: 
>>   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079374] CPU1: hi: 0, btch: 
>>   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079439] CPU2: hi: 0, btch: 
>>   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079527] CPU3: hi: 0, btch: 
>>   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079591] Node 0 DMA32 per-cpu:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079642] CPU0: hi: 186, 
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079706] CPU1: hi: 186, 
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079770] CPU2: hi: 186, 
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079834] CPU3: hi: 186, 
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079899] Node 0 Normal per-cpu:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079951] CPU0: hi: 186, 
>> btch:  31 usd:  17
>> Mar 22 20:30:01 solr01-gs kernel: [716098.080015] CPU1: hi: 186, 
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-

Re: Undefined field problem.

2013-03-25 Thread Erick Erickson
Unless you're manually typing things and made a typo, your problem is that
your csv file defines:

active_cruises
and your schema has
active_cruise

Note the lack of an 's'...

Best
Erick


On Mon, Mar 25, 2013 at 6:30 AM, Mid Night  wrote:

> Further to the prev msg:  Here's an extract from my current schema.xml:
>
> required="true" />
> stored="true"/>
>
>
>
>
>
> The original schema.xml had the last 3 fields in the order toptipp,
> active_cruise and non_grata.  Active_cruise and non_grata were also defined
> as type="int".  I changed the order and field types in my attempts to fix
> the error.
>
>
>
>
>
> On 25 March 2013 11:21, Mid Night  wrote:
>
> > Hi,
> >
> >
> > I recently added a new field (toptipp) to an existing solr schema.xml and
> > it worked just fine.  Subsequently I added two more fields (active_cruises
> > and non_grata) to the schema and now I get this error:
> >
> > 
> > 
> > <response>
> >   <lst name="responseHeader"><int name="status">400</int><int name="QTime">6</int></lst>
> >   <lst name="error"><str name="msg">undefined field: "active_cruise"</str><int name="code">400</int></lst>
> > </response>
> > 
> >
> >
> > My solr db is populated via a program that creates and uploads a csv
> > file.  When I view the csv file, the field "active_cruises" (given as
> > undefined above), is populated correctly.  As far as I can tell, when I
> > added the final fields to the schema, I did exactly the same as when I
> > added "toptipp".  I updated schema.xml and restarted solr (java -jar
> > start.jar).
> >
> > I am really at a loss here.  Can someone please help with the answer or
> by
> > pointing me in the right direction?  Naturally I'd be happy to provide
> > further info if needed.
> >
> >
> > Thanks
> > MK
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: Tlog File not removed after hard commit

2013-03-25 Thread Erick Erickson
The tlogs will stay there to provide "peer synch" on the last 100 docs. Say
a node somehow gets out of synch. There are two options
1> replay from the log
2> replicate the entire index.

To avoid <2> if possible, the tlog is kept around. In your case, all your
data is put in the tlog file, so the "keep the last 100 docs available"
rule means you'll keep the entire log for the run around until the _next_
run completes, at which point I'd expect the oldest one to be deleted.
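For reference, the transaction log is enabled in solrconfig.xml roughly like
this (Solr 4.x; the directory property is illustrative):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>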

Best
Erick


On Mon, Mar 25, 2013 at 8:40 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> My understanding is that logs stick around for a while just in case they
> can be used to catch up a shard that rejoins the cluster.
>  On Mar 24, 2013 12:03 PM, "Niran Fajemisin"  wrote:
>
> > Hi all,
> >
> > We import about 1.5 million documents on a nightly basis using DIH.
> During
> > this time, we need to ensure that all documents make it into the index,
> > otherwise roll back on any errors; which DIH takes care of for us. We also
> > disable autoCommit in DIH but instruct it to commit at the very end of
> the
> > import. This is all done through configuration of the DIH config XML file
> > and the command issued to the request handler.
> >
> > We have noticed that the tlog file appears to linger around even after
> DIH
> > has issued the hard commit. My expectation would be that after the hard
> > commit has occurred, the tlog file will be removed. I'm obviously
> > misunderstanding how this all works.
> >
> > Can someone please help me understand how this is meant to function?
> > Thanks!
> >
> > -Niran
>


Re: Retriving results based on SOLR query data.

2013-03-25 Thread Erick Erickson
There's no good way that I know of to have Solr do that for you.

But you have the original query so it seems like your app layer could sort
the results accordingly.
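A minimal SolrJ sketch of that re-sort (the field name and IDs are from your
example; 'response' is assumed to be the QueryResponse you already have):

import java.util.*;
import org.apache.solr.common.SolrDocument;

List<String> requestedIds = Arrays.asList("D12133", "D14423", "D912");
Map<String, SolrDocument> byId = new HashMap<String, SolrDocument>();
for (SolrDocument doc : response.getResults()) {
    byId.put((String) doc.getFieldValue("documentID"), doc);
}
// rebuild the list in the order the IDs were given in the query
List<SolrDocument> ordered = new ArrayList<SolrDocument>();
for (String id : requestedIds) {
    SolrDocument doc = byId.get(id);
    if (doc != null) {
        ordered.add(doc);
    }
}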

Best
Erick


On Mon, Mar 25, 2013 at 8:44 AM, atuldj.jadhav wrote:

> Hi Team,
>
> I want to overcome a sort issue here; the sort feature works fine.
>
> I have indexed a few documents in SOLR, which have a unique document ID.
> Now when I retrieve results from SOLR, the results come back automatically sorted.
>
> However I would like to fetch results based on the sequence I mention in my
> SOLR query.
>
> http://hostname:8080/SOLR/browse?q=documentID:D12133 OR documentID:D14423
> OR
> documentID:D912
>
> I want results in same order...
>  D12133
>  D14423
>  D912
>
> Regards,
> Atul
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Retriving-results-based-on-SOLR-query-data-tp4051076.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Query slow with termVectors termPositions termOffsets

2013-03-25 Thread Ravi Solr
Hello,
We re-indexed our entire core of 115 docs with some of the
fields having termVectors="true" termPositions="true" termOffsets="true",
prior to the reindex we only had termVectors="true". After the reindex the
the query component has become very slow. I thought that adding the
termOffsets and termPositions will increase the speed, am I wrong ? Several
queries like the one shown below which used to run fine are now very slow.
Can somebody kindly clarify how termOffsets and termPositions affect query
component ?

19076.0
 18972.0
0.0
0.0
0.0
0.0
0.0
0.0
104.0



[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx]
webapp=/solr-admin path=/select
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:("The+Checkup"+OR+"Checkpoint+Washington"+OR+"Post+Carbon"+OR+TSA+OR+"College+Inc."+OR+"Campus+Overload"+OR+"Planet+Panel"+OR+"The+Answer+Sheet"+OR+"Class+Struggle"+OR+"BlogPost"))+OR+(contenttype:"Photo+Gallery"+AND+headline:"day+in+photos")&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+"PC+World"+OR+"CBS+News"+OR+NC8/WJLA+OR+"NewsChannel+8"+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:("Discussion"+OR+"Photo")+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:"Photo+Gallery"+AND+headline:("Drawing+Board"+OR+"Drawing+board"+OR+"drawing+board"))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:("Summary+Box*"+OR+"Video*"+OR+"Post+Sports+Live*")+-slug:(warren*+OR+"history")+-(contenttype:Blog+AND+subheadline:("DC+Schools+Insider"+OR+"On+Leadership"))+contenttype:"Blog"+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2}
hits=4985 status=0 QTime=19044 |#]

Thanks,

Ravi Kiran Bhaskar


Re: Undefined field problem.

2013-03-25 Thread Jack Krupansky
Generally, you will need to delete the index and completely reindex your 
data if you change the type of a field.
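One way to wipe the index before reindexing (a sketch; the URL assumes the
standard example setup):

curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"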


I don't think that would account for active_cruise being an undefined field 
though.


I did try your scenario with the Solr 4.2 example, and a field named 
active_cruise, and it worked fine for me. The only issue was that existing 
data (e.g., 1 in the int field) was all considered as boolean false after I 
changed the schema and restarted.


-- Jack Krupansky

-Original Message- 
From: Mid Night

Sent: Monday, March 25, 2013 6:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Undefined field problem.

Further to the prev msg:  Here's an extract from my current schema.xml:

  
  
  
  



The original schema.xml had the last 3 fields in the order toptipp,
active_cruise and non_grata.  Active_cruise and non_grata were also defined
as type="int".  I changed the order and field types in my attempts to fix
the error.





On 25 March 2013 11:21, Mid Night  wrote:


Hi,


I recently added a new field (toptipp) to an existing solr schema.xml and
it worked just fine.  Subsequently I added to more fields (active_cruises
and non_grata) to the schema and now I get this error:



<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">6</int></lst>
  <lst name="error"><str name="msg">undefined field: "active_cruise"</str><int name="code">400</int></lst>
</response>




My solr db is populated via a program that creates and uploads a csv
file.  When I view the csv file, the field "active_cruises" (given as
undefined above), is populated correctly.  As far as I can tell, when I
added the final fields to the schema, I did exactly the same as when I
added "toptipp".  I updated schema.xml and restarted solr (java -jar
start.jar).

I am really at a loss here.  Can someone please help with the answer or by
pointing me in the right direction?  Naturally I'd be happy to provide
further info if needed.


Thanks
MK












Re: Contributors Group

2013-03-25 Thread Upayavira
While you're in that mode, could you please add 'Upayavira'.

Thanks!

Upayavira

On Mon, Mar 25, 2013, at 02:41 PM, Steve Rowe wrote:
> 
> On Mar 25, 2013, at 10:32 AM, Swati Swoboda 
> wrote:
> > Can I be added to the contributors group? Username sswoboda.
> 
> Added to solr ContributorsGroup.


Re: Contributors Group

2013-03-25 Thread Steve Rowe
On Mar 25, 2013, at 11:59 AM, Upayavira  wrote:
> While you're in that mode, could you please add 'Upayavira'.

Added to solr ContributorsGroup.


lucene 42 codec

2013-03-25 Thread Mario Casola
Hi,

I noticed that Apache Solr 4.2 uses the Lucene 4.1 codec. How can I
switch to 4.2?

Thanks in advance
Mario


Re: Query slow with termVectors termPositions termOffsets

2013-03-25 Thread alxsss
Did index size increase after turning on termPositions and termOffsets?

Thanks.
Alex.


-Original Message-
From: Ravi Solr 
To: solr-user 
Sent: Mon, Mar 25, 2013 8:27 am
Subject: Query slow with termVectors termPositions termOffsets


Hello,
We re-indexed our entire core of 115 docs with some of the
fields having termVectors="true" termPositions="true" termOffsets="true",
prior to the reindex we only had termVectors="true". After the reindex the
the query component has become very slow. I thought that adding the
termOffsets and termPositions will increase the speed, am I wrong ? Several
queries like the one shown below which used to run fine are now very slow.
Can somebody kindly clarify how termOffsets and termPositions affect query
component ?

19076.0
 18972.0
0.0
0.0
0.0
0.0
0.0
0.0
104.0



[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx]
webapp=/solr-admin path=/select
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:("The+Checkup"+OR+"Checkpoint+Washington"+OR+"Post+Carbon"+OR+TSA+OR+"College+Inc."+OR+"Campus+Overload"+OR+"Planet+Panel"+OR+"The+Answer+Sheet"+OR+"Class+Struggle"+OR+"BlogPost"))+OR+(contenttype:"Photo+Gallery"+AND+headline:"day+in+photos")&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+"PC+World"+OR+"CBS+News"+OR+NC8/WJLA+OR+"NewsChannel+8"+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:("Discussion"+OR+"Photo")+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:"Photo+Gallery"+AND+headline:("Drawing+Board"+OR+"Drawing+board"+OR+"drawing+board"))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:("Summary+Box*"+OR+"Video*"+OR+"Post+Sports+Live*")+-slug:(warren*+OR+"history")+-(contenttype:Blog+AND+subheadline:("DC+Schools+Insider"+OR+"On+Leadership"))+contenttype:"Blog"+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2}
hits=4985 status=0 QTime=19044 |#]

Thanks,

Ravi Kiran Bhaskar



Error creating collection using CORE-API

2013-03-25 Thread yriveiro
Hi,

I'm having an issue when I try to create a collection:


curl "http://192.168.1.142:8983/solr/admin/cores?action=CREATE&name=RT-4A46DF1563_12&collection=RT-4A46DF1563_12&shard=00&collection.configName=reportssBucket-regular"


The curl call has an error because the collection.configName doesn't exist,
so I fixed the curl call to:


curl "http://192.168.1.142:8983/solr/admin/cores?action=CREATE&name=RT-4A46DF1563_12&collection=RT-4A46DF1563_12&shard=00&collection.configName=reportsBucket-regular"


But now I have this stacktrace:

INFO: Creating SolrCore 'RT-4A46DF1563_12' using instanceDir:
/Users/yriveiro/Dump/solrCloud/node00.solrcloud/solr/home/RT-4A46DF1563_12
Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController
createCollectionZkNode
INFO: Check for collection zkNode:RT-4A46DF1563_12
Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController
createCollectionZkNode
INFO: Collection zkNode exists
Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController readConfigName
INFO: Load collection config from:/collections/RT-4A46DF1563_12
Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController readConfigName
SEVERE: Specified config does not exist in ZooKeeper:reportssBucket-regular
Mar 25, 2013 5:15:35 PM org.apache.solr.core.CoreContainer recordAndThrow
SEVERE: Unable to create core: RT-4A46DF1563_12
org.apache.solr.common.cloud.ZooKeeperException: Specified config does not
exist in ZooKeeper:reportssBucket-regular


In fact the collection is in zookeeper as a file and not as a folder. The
question here is: if the CREATE command doesn't find the config, why does it
create a file? And why, after this, can't I run the command again with the
correct syntax without removing the file created by the failed CREATE command?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-creating-collection-using-CORE-API-tp4051156.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange error in Solr 4.2

2013-03-25 Thread skp
I fixed it by setting JVM properties in glassfish.

-Djavax.net.ssl.keyStorePassword=changeit 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-error-in-Solr-4-2-tp4047386p4051159.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tlog File not removed after hard commit

2013-03-25 Thread Niran Fajemisin
Thanks Erick and Michael for the prompt responses.

Cheers,
Niran



>
> From: Erick Erickson 
>To: solr-user@lucene.apache.org 
>Sent: Monday, March 25, 2013 10:21 AM
>Subject: Re: Tlog File not removed after hard commit
> 
>The tlogs will stay there to provide "peer synch" on the last 100 docs. Say
>a node somehow gets out of synch. There are two options
>1> replay from the log
>2> replicate the entire index.
>
>To avoid <2> if possible, the tlog is kept around. In your case, all your
>data is put in the tlog file, so the "keep the last 100 docs available"
>rule means you'll keep the entire log for the run around until the _next_
>run completes, at which point I'd expect the oldest one to be deleted.
>
>Best
>Erick
>
>
>On Mon, Mar 25, 2013 at 8:40 AM, Michael Della Bitta <
>michael.della.bi...@appinions.com> wrote:
>
>> My understanding is that logs stick around for a while just in case they
>> can be used to catch up a shard that rejoins the cluster.
>>  On Mar 24, 2013 12:03 PM, "Niran Fajemisin"  wrote:
>>
>> > Hi all,
>> >
>> > We import about 1.5 million documents on a nightly basis using DIH.
>> During
>> > this time, we need to ensure that all documents make it into the index,
>> > otherwise roll back on any errors; which DIH takes care of for us. We also
>> > disable autoCommit in DIH but instruct it to commit at the very end of
>> the
>> > import. This is all done through configuration of the DIH config XML file
>> > and the command issued to the request handler.
>> >
>> > We have noticed that the tlog file appears to linger around even after
>> DIH
>> > has issued the hard commit. My expectation would be that after the hard
>> > commit has occurred, the tlog file will be removed. I'm obviously
>> > misunderstanding how this all works.
>> >
>> > Can someone please help me understand how this is meant to function?
>> > Thanks!
>> >
>> > -Niran
>>
>
>
>

Re: Multi-core and replicated Solr cloud testing. Data-directory mis-configures

2013-03-25 Thread Trevor Campbell
That example does not work if you have > 1 collection (core) per node; all
cores end up sharing the same index and overwriting one another.


On Mon, Mar 25, 2013 at 6:27 PM, Gopal Patwa  wrote:

> if you use default directory then it will use solr.home directory, I have
> tested solr cloud example on local machine with 5-6 nodes.And data
> directory was created under core name, like
>
> "example2/solr/collection1/data". you could see example startup script from
> source code "solr/cloud-dev/solrcloud-multi-start.sh"
>
> example solrconfig.xml
>
>   ${solr.data.dir:}
>
> On Sun, Mar 24, 2013 at 10:44 PM, Trevor Campbell
> wrote:
>
> > I have three indexes which I have set up as three separate cores, using
> > this solr.xml config.
> >
> >> hostPort="${jetty.port:}">
> > 
> >
> > 
> > 
> >
> > 
> >  >
> >
> > 
> >   
> >
> > This works just fine as a standalone Solr.
> >
> > I duplicated this setup on the same machine under a completely separate
> > solr installation (solr-nodeb) and modified all the data directories to
> > point to the directories in nodeb.  This all worked fine.
> >
> > I then connected the 2 instances together with zoo-keeper using settings
> > "-Dbootstrap_conf=true -Dcollection.configName=**jiraCluster -DzkRun
> > -DnumShards=1" for the first intsance and "-DzkHost=localhost:9080" for
> >  the second. (I'm using tomcat and ports 8080 and 8081 for the 2 Solr
> > instances)
> >
> > Now the data directories of the second node point to the data directories
> > in the first node.
> >
> > I have tried many settings in the solrconfig.xml for each core but am now
> > using absolute paths, e.g.
> > /home//solr-4.2.0-nodeb/example/multicore/jira-comment/data
> >
> > previously I used
> > ${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data}
> > but that had the same result.
> >
> > It seems zookeeper is forcing data directory config from the uploaded
> > configuration on all the nodes in the cluster?
> >
> > How can I do testing on a single machine? Do I really need identical
> > directory layouts on all machines?
> >
> >
> >
>


Re: DocValues and field requirements

2013-03-25 Thread Marcin Rzewucki
Hi Chris,

Thanks for your detailed explanations. The default value is a difficult
limitation, especially for financial figures. I may try a workaround like
the lowest possible number for TrieLongField, but it would be better to
avoid that :)
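If I try it, the workaround you describe below would look roughly like this
in schema.xml (field names and types illustrative):

<field name="amount" type="tlong" indexed="true" stored="true"
       docValues="true" default="0"/>
<field name="amount_exists" type="boolean" indexed="true" stored="true"
       required="true" default="false"/>

with queries then filtering on amount_exists:true and sorting on
"amount_exists desc, amount desc".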

Regards.

On 22 March 2013 20:39, Chris Hostetter  wrote:

>
> : Thank you for your response. Yes, that's strange. By enabling DocValues
> the
> : information about missing fields is lost, which changes the way of
> sorting
> : as well. Adding default value to the fields can change a logic of
> : application dramatically (I can't set default value to 0 for all
> : Trie*Fields fields, because it could impact the results displayed to the
> : end user, which is not good). It's a pity that using DocValues is so
> : limited.
>
> I'm not really up on docvalues, but I asked rmuir about this a bit on IRC:
>
> the crux of the issue is that there are two different docvalue impls, one
> that uses a fixed amount of space per doc (ie: exactly one value per doc)
> and one that allows an ordered set of values per doc (ie: multivalued).
>
> the multivalued docvals impl was wired into solr for multivalued fields,
> and the single valued docvals impl was wired in for the single valued case
> -- but since the single valued docvals impl *has* to have a value
> for every doc, the schema error you encountered was added if you try to
> use it on a field that isn't required or doesn't have a default value --
> to force you to be explicit about which "default" you want, instead of the
> low level lucene "0" default coming into play w/o you knowing about it.
> (as Shawn mentioned)
>
> the multivalued docvals impl could conceivably be used instead for these
> types of single valued fields (ie: to support 0 or 1 values) but there is
> no sorting support for multivalued docvals, so it would cause other
> problems.
>
> One possible workaround for people who want to take advantage of "sort
> missing first/last" type sorting on a docvals type field would be to manage
> the "missing" information yourself in a distinct field which you also
> leveraged in any filtering or sorting on the docvals field.
>
> ie, have a docvalues field "myfield" which is single valued, with some
> configured default value, and then have a "myfield_exists" boolean field
> which is single valued and required.  when indexing docs, if "myfield"
> does/doesn't have a value set "myfield_exists" to accordingly (this would
> be fairly trivial in an updated processor) and then instead of sorting
> just on "myfield desc" you would sort on "myfield_exists (asc|desc),
> myfield desc" (where you pick hte asc or desc depending on wether you want
> docs w/o values first or last).  you would likewise need to filter on
> myfield_exists:true anytime you did queries against the myfield field.
>
>
> (perhaps someone could work on a patch to inject a synthetic field like this
> automatically for fields that are docValues="true" multiValued="false"
> required="false" w/o a defualtValue?)
>
>
> -Hoss
>


Accessing SolrZkClient instance from a plug-in?

2013-03-25 Thread Timothy Potter
I have a custom ValueSourceParser that sets up a Zookeeper Watcher on some
frequently changing metadata that a custom ValueSource depends on.

Basic flow of events is - VSP watches for metadata changes, which triggers
a refresh of some expensive data that my custom ValueSource uses at query
time. Think of the data in Zookeeper as a pointer to some larger dataset
that is computed offline and then loaded into memory for use by my custom
ValueSource.

In my ValueSourceParser, I connect to Zookeeper using an instance of the
SolrZkClient class and am receiving WatchedEvents when my metadata changes
(as expected).

All this works great until core reload happens. From what I can tell,
there's no shutdown hook for ValueSourceParsers, so what's happening is
that my code ends up adding multiple Watchers and thus receives multiple
update events when the metadata changes.

What I need is either

1) a shutdown hook in my VSP that allows me to clean-up the SolrZkClient
instance my code is managing, or

2) access to the ZkController instance owned by the CoreContainer from my
VSP.

For me #2 is better as I'd prefer to just re-use Solr's instance of
SolrZkClient.

I can go and hack either of these in pretty easily but wanted to see if
someone knows a better way to get 1 or 2?

In general, it might be handy to allow plug-ins to get access to the
Zookeeper client SolrCloud is using.

Thanks.
Tim


Re: Multi-core and replicated Solr cloud testing. Data-directory mis-configures

2013-03-25 Thread Trevor Campbell
Solved.

I was able to solve this by removing any reference to dataDir from the
solrconfig.xml.  So in solr.xml for each node I have:
  [solr.xml <cores> block unchanged from the original post: three <core>
  entries, one per index, now with no dataDir set anywhere]

and in solrconfig.xml in each core I have removed the reference to dataDir
completely.




On Tue, Mar 26, 2013 at 8:41 AM, Trevor Campbell wrote:

> That example does not work if you have > 1 collection (core) per node, all
> end up sharing the same index and overwrite one another.
>
>
> On Mon, Mar 25, 2013 at 6:27 PM, Gopal Patwa  wrote:
>
>> if you use default directory then it will use solr.home directory, I have
>> tested solr cloud example on local machine with 5-6 nodes.And data
>> directory was created under core name, like
>>
>> "example2/solr/collection1/data". you could see example startup script
>> from
>> source code "solr/cloud-dev/solrcloud-multi-start.sh"
>>
>> example solrconfig.xml
>>
>>   ${solr.data.dir:}
>>
>> On Sun, Mar 24, 2013 at 10:44 PM, Trevor Campbell
>> wrote:
>>
>> > I have three indexes which I have set up as three separate cores, using
>> > this solr.xml config.
>> >
>> >   > > hostPort="${jetty.port:}">
>> > 
>> >
>> > 
>> > 
>> >
>> > 
>> > > instanceDir="jira-change-**history" >
>> >
>> > 
>> >   
>> >
>> > This works just fine as a standalone Solr.
>> >
>> > I duplicated this setup on the same machine under a completely separate
>> > solr installation (solr-nodeb) and modified all the data directories to
>> > point to the directories in nodeb.  This all worked fine.
>> >
>> > I then connected the 2 instances together with zoo-keeper using settings
>> > "-Dbootstrap_conf=true -Dcollection.configName=**jiraCluster -DzkRun
>> > -DnumShards=1" for the first intsance and "-DzkHost=localhost:9080" for
>> >  the second. (I'm using tomcat and ports 8080 and 8081 for the 2 Solr
>> > instances)
>> >
>> > Now the data directories of the second node point to the data
>> directories
>> > in the first node.
>> >
>> > I have tried many settings in the solrconfig.xml for each core but am
>> now
>> > using absolute paths, e.g.
>> > /home//solr-4.2.0-nodeb/example/multicore/jira-comment/data
>> >
>> > previously I used
>> > ${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data}
>> > but that had the same result.
>> >
>> > It seems zookeeper is forcing data directory config from the uploaded
>> > configuration on all the nodes in the cluster?
>> >
>> > How can I do testing on a single machine? Do I really need identical
>> > directory layouts on all machines?
>> >
>> >
>> >
>>
>
>


Re: Accessing SolrZkClient instance from a plug-in?

2013-03-25 Thread Mark Miller
I don't know the ValueSourceParser from a hole in my head, but it looks like it
has access to the SolrCore with fp.req.getCore()?

If so, it's easy to get the zk stuff

core.getCoreDescriptor().getCoreContainer().getZkController().getZkClient()

From memory, so perhaps with some minor misname.
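A slightly fuller sketch from inside the parser (Solr 4.x names, also from
memory, so verify against your version; other imports omitted):

import org.apache.solr.cloud.ZkController;
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.core.SolrCore;

public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    SolrCore core = fp.getReq().getCore();
    ZkController zkController =
        core.getCoreDescriptor().getCoreContainer().getZkController();
    // zkController is null when Solr isn't running in cloud mode
    SolrZkClient zkClient =
        (zkController == null) ? null : zkController.getZkClient();
    // ... register the watcher with zkClient and build the ValueSource ...
}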

- Mark

On Mar 25, 2013, at 6:03 PM, Timothy Potter  wrote:

> I have a custom ValueSourceParser that sets up a Zookeeper Watcher on some
> frequently changing metadata that a custom ValueSource depends on.
> 
> Basic flow of events is - VSP watches for metadata changes, which triggers
> a refresh of some expensive data that my custom ValueSource uses at query
> time. Think of the data in Zookeeper as a pointer to some larger dataset
> that is computed offline and then loaded into memory for use by my custom
> ValueSource.
> 
> In my ValueSourceParser, I connect to Zookeeper using an instance of the
> SolrZkClient class and am receiving WatchedEvents when my metadata changes
> (as expected).
> 
> All this works great until core reload happens. From what I can tell,
> there's no shutdown hook for ValueSourceParsers, so what's happening is
> that my code ends up adding multiple Watchers and thus receives multiple
> update events when the metadata changes.
> 
> What I need is either
> 
> 1) a shutdown hook in my VSP that allows me to clean-up the SolrZkClient
> instance my code is managing, or
> 
> 2) access to the ZkController instance owned by the CoreContainer from my
> VSP.
> 
> For me #2 is better as I'd prefer to just re-use Solr's instance of
> SolrZkClient.
> 
> I can go and hack either of these in pretty easily but wanted to see if
> someone knows a better way to get 1 or 2?
> 
> In general, it might be handy to allow plug-ins to get access to the
> Zookeeper client SolrCloud is using.
> 
> Thanks.
> Tim



Re: lucene 42 codec

2013-03-25 Thread Chris Hostetter

: I noticed that apache solr 4.2 uses the lucene codec 4.1. How can I
: switch to 4.2?

Unless you've configured something oddly, Solr is already using the 4.2 
codec.  

What you are probably seeing is that the file format for several types of
files hasn't changed from the 4.1 (or even 4.0) versions, so they are 
still used in 4.2 (and confusingly include "Lucene41" in the filenames in 
several cases).

Note that in the 4.2 codec package javadocs, several codec related classes 
are not implemented, and the docs link back to the 4.1 and 4.0 
implementations...

https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html

If you peek inside the Lucene42Codec class you'll also see...

  private final StoredFieldsFormat fieldsFormat = new 
Lucene41StoredFieldsFormat();
  private final TermVectorsFormat vectorsFormat = new 
Lucene42TermVectorsFormat();
  private final FieldInfosFormat fieldInfosFormat = new 
Lucene42FieldInfosFormat();
  private final SegmentInfoFormat infosFormat = new Lucene40SegmentInfoFormat();
  private final LiveDocsFormat liveDocsFormat = new Lucene40LiveDocsFormat();

-Hoss


Re: Accessing SolrZkClient instance from a plug-in?

2013-03-25 Thread Timothy Potter
Brilliant! Thank you - I was focusing on the init method and totally
ignored the FunctionQParser passed to the parse method.

Cheers,
Tim

On Mon, Mar 25, 2013 at 4:16 PM, Mark Miller  wrote:

> I don't know the ValueSourceParser from a hole in my head, but it looks
> like it has access to the solrcore with fp.req.getCore?
>
> If so, it's easy to get the zk stuff
>
> core.getCoreDescriptor.getCoreContainer.getZkController(.getZkClient).
>
> From memory, so perhaps with some minor misname.
>
> - Mark
>
> On Mar 25, 2013, at 6:03 PM, Timothy Potter  wrote:
>
> > I have a custom ValueSourceParser that sets up a Zookeeper Watcher on
> some
> > frequently changing metadata that a custom ValueSource depends on.
> >
> > Basic flow of events is - VSP watches for metadata changes, which
> triggers
> > a refresh of some expensive data that my custom ValueSource uses at query
> > time. Think of the data in Zookeeper as a pointer to some larger dataset
> > that is computed offline and then loaded into memory for use by my custom
> > ValueSource.
> >
> > In my ValueSourceParser, I connect to Zookeeper using an instance of the
> > SolrZkClient class and am receiving WatchedEvents when my metadata
> changes
> > (as expected).
> >
> > All this works great until core reload happens. From what I can tell,
> > there's no shutdown hook for ValueSourceParsers, so what's happening is
> > that my code ends up adding multiple Watchers and thus receives multiple
> > update events when the metadata changes.
> >
> > What I need is either
> >
> > 1) a shutdown hook in my VSP that allows me to clean-up the SolrZkClient
> > instance my code is managing, or
> >
> > 2) access to the ZkController instance owned by the CoreContainer from my
> > VSP.
> >
> > For me #2 is better as I'd prefer to just re-use Solr's instance of
> > SolrZkClient.
> >
> > I can go and hack either of these in pretty easily but wanted to see if
> > someone knows a better way to get 1 or 2?
> >
> > In general, it might be handy to allow plug-ins to get access to the
> > Zookeeper client SolrCloud is using.
> >
> > Thanks.
> > Tim
>
>


Any experience with adding documents batch sizes?

2013-03-25 Thread Benjamin, Roy
My application is update intensive.  The documents are pretty small, less than 
1K bytes.

Just now I'm batching 4K documents with each SolrJ addDocs() call.

Wondering what I should expect with increasing this batch size?  Say 8K docs 
per update?

Thanks

Roy


Solr 3.6





Re: status 400 on posting json

2013-03-25 Thread Patrice Seyed
Hi Jack, I tried putting the schema.xml file (further below) in the
path you specified below, but when i tried to start (java -jar
start.jar) got the message below.

I can try a fresh install like you suggested, but I'm not sure what
would be different. I was using the documentation at
http://lucene.apache.org/solr/4_1_0/tutorial.html using the binary
from zip. Are you suggesting building from source and/or some other
approach? Also, what is the best documentation currently for 4.1
install (for mac), (there are a lot of sites out there.) Thanks in
advance. -Patrice

SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Unknown fieldtype 'string'
specified on field id
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:390)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Mar 25, 2013 7:14:53 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create
core: collection1
at 
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1654)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1039)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: Unknown fieldtype
'string' specified on field id
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:390)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
... 10 more


---

Here's the normal path to the example configuration in Solr 4.1:

.../solr-4.1.0/example/solr/collection1/conf

That's the directory in which the example schema.xml and other
configuration files live.

There is no solr-4.1.0/example/conf directory, unless you managed to
create one yourself.

I suggest that you start with a fresh install of Solr 4.1

As far as keywords, the existing field is set up to be a
comma-separated list of keyword phrases. Of course, you can structure
it any way that your application requires.

-- Jack Krupansky

-Original Message- From: Patrice Seyed
Sent: Saturday, March 16, 2013 2:48 AM
To: solr-user@lucene.apache.org
Subject: Re: status 400 on posting json

Hi,

Re:

-
Is there some place I should indicate what parameters are including in
the json objects send? I was able to test books.json without the
error.

"Yes, in Solr's schema.xml (under the conf/ directory).  See
 for more details.

   Erik Hatcher"

and:

-

"I tried it and I get the same error response! Which is because... I
don't have a field named "datasource".

You need to check the Solr schema.xml for the available fields and
then add any fields that your JSON uses that are not already there. Be
sure to shutdown and restart Solr after editing the schema.

I did notice that there is a "keywords" field, but it is not
multivalued, while you keywords are multivalued.

Or, you can us dynamic fields, such as "datasource_s" and "keywords_ss
("s" for string and a second "s" for multivalued), etc. for your other
fields.

-- Jack Krupansky"

-

Thanks very much for these responses.  I'm still

Re: Problem with DataImportHandler and embedded entities

2013-03-25 Thread Rulian Estivalletti
Did you ever resolve the issue with your full-import only importing 1
document?
I'm monitoring the source db and its only issuing one query, it never
attempts to query for the other documents on the top of the nest.
I'm running into the exact same issue with NO help out there.
Thanks in advance


Solrcloud 4.1 Collection with multiple slices only use

2013-03-25 Thread Chris R
I have two issues and I'm unsure if they are related:

Problem:  After setting up a multiple collection Solrcloud 4.1 instance on
seven servers, when I index the documents they aren't distributed across
the index slices.  It feels as though, I don't actually have a "cloud"
implementation, yet everything I see in the admin interface and zookeeper
implies I do.  I feel as if I'm overlooking something obvious, but have not
been able to figure out what.

Configuration: Seven servers and four collections, each with 12 slices (no
replica shards yet).  Zookeeper configured in a three node ensemble.  When
I send documents to Server1/Collection1 (which holds two slices of
collection1), all the documents show up in a single index shard (core).
 Perhaps related, I have found it impossible to get Solr to recognize the
server names with anything but a literal host="servername" parameter in the
solr.xml.  Hostname parameters, hosts files, network, and DNS are all
configured correctly.

I have a Solr 4.0 single collection set up similarly and it works just
fine.  I'm using the same schema.xml and solrconfig.xml files on the 4.1
implementation with only the luceneMatchVersion changed to LUCENE_41.

sample solr.xml from server1:
[solr.xml markup lost in the archive; it listed this server's cores with a
literal host="servername" attribute]
Thanks
Chris


Re: Any experience with adding documents batch sizes?

2013-03-25 Thread Otis Gospodnetic
Hi,

You'll have to test because there is no general rule that works in all
environments, but from testing this a while back, you will reach the point
of diminishing returns at some point.  You don't mention using
StreamingUpdateSolrServer, so you may want to try that instead:
http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html
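For example (a sketch for Solr 3.6; the URL, queue size and thread count are
illustrative):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

StreamingUpdateSolrServer server =
    new StreamingUpdateSolrServer("http://localhost:8983/solr", 20000, 4);
server.add(docs);    // returns quickly; flushing happens on background threads
server.commit();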

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Mar 25, 2013 at 7:06 PM, Benjamin, Roy  wrote:

> My application is update intensive.  The documents are pretty small, less
> than 1K bytes.
>
> Just now I'm batching 4K documents with each SolrJ addDocs() call.
>
> Wondering what I should expect with increasing this batch size?  Say 8K
> docs per update?
>
> Thanks
>
> Roy
>
>
> Solr 3.6
>
>
>
>


Re: OutOfMemoryError

2013-03-25 Thread Otis Gospodnetic
Arkadi,

jstat -gcutil -h20 <pid> 2000 100 also gives useful info about GC and I use
it a lot for quick insight into what is going on with GC.  SPM (see
http://sematext.com/spm/index.html ) may also be worth using.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/






On Mon, Mar 25, 2013 at 11:01 AM, Arkadi Colson  wrote:

> How can I see if GC is actually working? Is it written in the tomcat logs
> as well or will I only see it in the memory graphs?
>
> BR,
> Arkadi
>
> On 03/25/2013 03:50 PM, Bernd Fehling wrote:
>
>> We use munin with jmx plugin for monitoring all server and Solr
>> installations.
>> (http://munin-monitoring.org/)
>>
>> Only for short time monitoring we also use jvisualvm delivered with Java
>> SE JDK.
>>
>> Regards
>> Bernd
>>
>> On 25.03.2013 14:45, Arkadi Colson wrote:
>>
>>> Thanks for the info!
>>> I just upgraded java from 6 to 7...
>>> How exactly do you monitor the memory usage and the affect of the
>>> garbage collector?
>>>
>>>
>>> On 03/25/2013 01:18 PM, Bernd Fehling wrote:
>>>
 The use of UseG1GC, yes,
 but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM
 (1.7.0_07).
 os.arch: amd64
 os.name: Linux
 os.version: 2.6.32.13-0.5-xen

 Only args are "-XX:+UseG1GC -Xms16g -Xmx16g".
 Monitoring shows that 16g is a bit high, I might reduce it to 10g or
 12g for the slaves.
 Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g.
 Single index, 130 GByte, 43.5 million documents.

 Regards,
 Bernd


 On 25.03.2013 11:55, Arkadi Colson wrote:

> Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7?
> Any extra options needed?
>
> Thanks...
>
> On 03/25/2013 08:34 AM, Arkadi Colson wrote:
>
>> I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m
>> as parameters. I also added -XX:+UseG1GC to the java process. But now
>> the whole machine crashes! Any idea why?
>>
>> Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked
>> oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.077962] java cpuset=/
>> mems_allowed=0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078019] Pid: 29339, comm:
>> java Not tainted 2.6.32-5-amd64 #1
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078095] Call Trace:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078155]
>> [] ? oom_kill_process+0x7f/0x23f
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078233]
>> [] ? __out_of_memory+0x12a/0x141
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078309]
>> [] ? out_of_memory+0x140/0x172
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078385]
>> [] ? __alloc_pages_nodemask+0x4ec/**0x5fc
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078469]
>> [] ? io_schedule+0x93/0xb7
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078541]
>> [] ? __do_page_cache_readahead+**0x9b/0x1b4
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078626]
>> [] ? wake_bit_function+0x0/0x23
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078702]
>> [] ? ra_submit+0x1c/0x20
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078773]
>> [] ? filemap_fault+0x17d/0x2f6
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078849]
>> [] ? __do_fault+0x54/0x3c3
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078921]
>> [] ? handle_mm_fault+0x3b8/0x80f
>> Mar 22 20:30:01 solr01-gs kernel: [716098.078999]
>> [] ? apic_timer_interrupt+0xe/0x20
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079078]
>> [] ? do_page_fault+0x2e0/0x2fc
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079153]
>> [] ? page_fault+0x25/0x30
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079222] Mem-Info:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079261] Node 0 DMA per-cpu:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079310] CPU0: hi: 0,
>> btch:   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079374] CPU1: hi: 0,
>> btch:   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079439] CPU2: hi: 0,
>> btch:   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079527] CPU3: hi: 0,
>> btch:   1 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079591] Node 0 DMA32
>> per-cpu:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079642] CPU0: hi: 186,
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079706] CPU1: hi: 186,
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079770] CPU2: hi: 186,
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079834] CPU3: hi: 186,
>> btch:  31 usd:   0
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079899] Node 0 Normal
>> per-cpu:
>> Mar 22 20:30:01 solr01-gs kernel: [716098.079951] CPU0: hi: 186,
>> btch:  31 usd:  17
>> Mar 22 20:30:

Re: status 400 on posting json

2013-03-25 Thread Jack Krupansky
Your schema has only "fields", but no field "types". Check the Solr example 
schema for reference, and include all of the types defined there unless you 
know that you do not need them. "string" is clearly one that is needed.
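For reference, the example schema declares it inside the <types> section like
this (the exact attribute set varies a bit between versions):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>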


-- Jack Krupansky

-Original Message- 
From: Patrice Seyed

Sent: Monday, March 25, 2013 7:19 PM
To: solr-user@lucene.apache.org
Subject: Re: status 400 on posting json

Hi Jack, I tried putting the schema.xml file (further below) in the
path you specified below, but when i tried to start (java -jar
start.jar) got the message below.

I can try a fresh install like you suggested, but I'm not sure what
would be different. I was using the documentation at
http://lucene.apache.org/solr/4_1_0/tutorial.html using the binary
from zip. Are you suggesting building from source and/or some other
approach? Also, what is the best documentation currently for 4.1
install (for mac), (there are a lot of sites out there.) Thanks in
advance. -Patrice

SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Unknown fieldtype 'string'
specified on field id
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:390)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000)

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

at java.lang.Thread.run(Thread.java:680)
Mar 25, 2013 7:14:53 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create
core: collection1
at 
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1654)

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1039)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: Unknown fieldtype
'string' specified on field id
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:390)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000)

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
... 10 more


---

Here's the normal path to the example configuration in Solr 4.1:

.../solr-4.1.0/example/solr/collection1/conf

That's the directory in which the example schema.xml and other
configuration files live.

There is no solr-4.1.0/example/conf directory, unless you managed to
create one yourself.

I suggest that you start with a fresh install of Solr 4.1.

As far as keywords, the existing field is set up to be a
comma-separated list of keyword phrases. Of course, you can structure
it any way that your application requires.

-- Jack Krupansky

-Original Message- From: Patrice Seyed
Sent: Saturday, March 16, 2013 2:48 AM
To: solr-user@lucene.apache.org
Subject: Re: status 400 on posting json

Hi,

Re:

-
Is there some place I should indicate what parameters are included in
the JSON objects sent? I was able to test books.json without the
error.

"Yes, in Solr's schema.xml (under the conf/ directory).  See
 for more details.

  Erik Hatcher"

and:

-

"I tried it and I get the same error response! Which is because... I
don't have a field named "datasource".

You need to check the Solr schema.xml for the available fields and
then add any fields that your JSON uses that are not already there. Be
sure to shutdown and restart Solr after editing the schema.

I did notice that there is a "keywords" field, but it is not
multivalued, while your keywords are multivalued.
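For example (a sketch only; it assumes the JSON really does use "datasource" and "keywords", and that the types shown earlier exist):

<field name="datasource" type="string" indexed="true" stored="true"/>
<field name="keywords" type="text_general" indexed="true" stored="true"
       multiValued="true"/>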

Or, you can use dynamic fields, such as "datasource_s" and "keywords_ss
("s" fo

Re: Using Solr For a Real Search Engine

2013-03-25 Thread Otis Gospodnetic
Hi,

This question is too open-ended for anyone to give you a good answer.
Maybe you want to ask more specific questions?  As for embedding vs. war,
start with the simpler war deployment and think about the alternatives if
that doesn't work for you.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI wrote:

> If I want to use Solr in a web search engine, what kind of strategy should
> I follow for running Solr? I mean, should I run it via embedded Jetty or
> deploy the war to a container? You should consider that I will have a
> heavy workload on my Solr.
>


Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-25 Thread Mark Miller
I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't 
specify numShards it goes into a mode where it's up to you to distribute 
updates.

- Mark
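For reference, a sketch of the two usual ways numShards gets set in 4.1 (host names, ports and the shard count below are placeholders, not Chris's values); note it only takes effect when the collection is first created:

java -DzkHost=zk1:2181 -DnumShards=12 -jar start.jar

or, via the Collections API:

http://server1:8983/solr/admin/collections?action=CREATE&name=col201301&numShards=12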

On Mar 25, 2013, at 10:29 PM, Chris R  wrote:

> I have two issues and I'm unsure if they are related:
> 
> Problem:  After setting up a multiple collection Solrcloud 4.1 instance on
> seven servers, when I index the documents they aren't distributed across
> the index slices.  It feels as though I don't actually have a "cloud"
> implementation, yet everything I see in the admin interface and zookeeper
> implies I do.  I feel as if I'm overlooking something obvious, but have not
> been able to figure out what.
> 
> Configuration: Seven servers and four collections, each with 12 slices (no
> replica shards yet).  Zookeeper configured in a three node ensemble.  When
> I send documents to Server1/Collection1 (which holds two slices of
> collection1), all the documents show up in a single index shard (core).
> Perhaps related, I have found it impossible to get Solr to recognize the
> server names with anything but a literal host="servername" parameter in the
> solr.xml.  Hostname parameters, host files, network, and DNS are all
> configured correctly.
> 
> I have a Solr 4.0 single collection set up similarly and it works just
> fine.  I'm using the same schema.xml and solrconfig.xml files on the 4.1
> implementation with only the luceneMatchVersion changed to LUCENE_41.
> 
> sample solr.xml from server1
> 
> <solr ...>
>   <cores ... shareSchema="true" zkClientTimeout="6">
>     <core instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
>           dataDir="/solr/col201301/col201301s04sh01/data"/>
>     <core instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
>           dataDir="/solr/col201301/col201301s11sh01/data"/>
>     <core instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01"
>           dataDir="/solr/col201302/col201302s06sh01/data"/>
>     <core instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01"
>           dataDir="/solr/col201303/col201303s01sh01/data"/>
>     <core instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01"
>           dataDir="/solr/col201303/col201303s08sh01/data"/>
>     <core instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01"
>           dataDir="/solr/col201304/col201304s03sh01/data"/>
>     <core instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01"
>           dataDir="/solr/col201304/col201304s10sh01/data"/>
>   </cores>
> </solr>
> 
> Thanks
> Chris



Re: opinion: Stats over the faceting component

2013-03-25 Thread Otis Gospodnetic
Nope, this doesn't find it:
http://search-lucene.com/?q=facet+stats&fc_project=Solr&fc_type=issue

Maybe Anirudha wants to do that?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Mar 21, 2013 at 5:16 AM, Upayavira  wrote:

> Have you made a JIRA ticket for this? This is useful generally, isn't
> it?
>
> Thx, Upayavira
>
> On Thu, Mar 21, 2013, at 03:18 AM, Tirthankar Chatterjee wrote:
> > We have done something similar.
> > Please read
> >
> http://lucene.472066.n3.nabble.com/How-to-modify-Solr-StatsComponent-to-support-stats-query-td4028991.html
> >
> > https://plus.google.com/101157854606139706613/posts/HmYYit3RABM
> >
> > If this is something you wanted.
> >
> > On Mar 20, 2013, at 7:08 PM, Anirudha Jadhav wrote:
> >
> > I want to get an opinion here. Statistics as an independent component is
> > always limited relative to the faceting features (e.g. it does not support
> > date ranges, custom ranges, pivots, etc.).
> >
> > Why not have a parameter to the facet component to compute and return stats?
> >
> > eg. facet.stats=true,facet.stats.stat=min,max,(sum(sqrt(x),log(y),z,0.5))
> >
> > let me know your thoughts,
> >
> > --
> > Anirudha P. Jadhav
> >
> >
> >
>


Re: Solr index Backup and restore of large indexes

2013-03-25 Thread Otis Gospodnetic
Hi,

Try something like this: http://host/solr/replication?command=backup

See: http://wiki.apache.org/solr/SolrReplication

Otis
--
Solr & ElasticSearch Support
http://sematext.com/
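A sketch of what that looks like in practice (the core name, backup location and numberToKeep value are assumptions, not from the thread):

curl 'http://localhost:8983/solr/collection1/replication?command=backup&location=/backups/solr&numberToKeep=3'

The 4.x replication handler has no restore command; restoring is typically done by stopping Solr and copying the snapshot.* directory produced by the backup back over the core's index directory.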





On Thu, Mar 21, 2013 at 3:23 AM, Sandeep Kumar Anumalla
 wrote:
>
> Hi,
>
> We are loading approximately 1TB of index data daily. Please let me know the 
> best procedure for backing up and restoring the indexes. I am using Solr 4.2.
>
>
>
> Thanks & Regards
> Sandeep A
> Ext : 02618-2856
> M : 0502493820
>
>
> 


Re: Shingles Filter Query time behaviour

2013-03-25 Thread Otis Gospodnetic
Hi,

What does your query look like?  Does it look like q=name:dark knight?
 If so, note that only "dark" is going against the "name" field.  Try
q=name:dark name:knight or q=name:"dark knight".

Otis
--
Solr & ElasticSearch Support
http://sematext.com/
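A compact way to see what the query parser does with each form before analysis ("df" stands for whatever the default search field is; this is standard Lucene query-parser behaviour, sketched here for clarity):

q=name:dark knight          ->  name:dark  df:knight    (knight leaves the name field)
q=name:dark name:knight     ->  name:dark  name:knight
q=name:"dark knight"        ->  name:"dark knight"      (phrase, analyzed as one chunk)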





On Mon, Mar 18, 2013 at 6:21 PM, Catala, Francois
 wrote:
> Hello,
>
> I am trying to have the input "darkknight" match documents containing either 
> "dark knight" or "darkknight".
> The reverse should also work ("dark knight" matching both "dark knight" and 
> "darkknight") but it doesn't. Does anyone know why?
>
>
> When I run the following query I get the expected response with the two 
> documents matched:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="fl">name</str>
>       <str name="indent">true</str>
>       <str name="q">name:darkknight</str>
>       <str name="wt">xml</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="2" start="0">
>     <doc>
>       <str name="name">Batman, the darkknight Rises</str>
>     </doc>
>     <doc>
>       <str name="name">Batman, the dark knight Rises</str>
>     </doc>
>   </result>
> </response>
>
>
> HOWEVER, when I run the same query looking for "dark knight" as two words, I 
> get only 1 document matched, as the response shows:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>     <lst name="params">
>       <str name="fl">name</str>
>       <str name="indent">true</str>
>       <str name="q">name:dark knight</str>
>       <str name="wt">xml</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="1" start="0">
>     <doc>
>       <str name="name">Batman, the dark knight Rises</str>
>     </doc>
>   </result>
> </response>
>
> I have these documents as input:
>
> <doc>
>   <field name="id">bat1</field>
>   <field name="name">Batman, the dark knight Rises</field>
> </doc>
> <doc>
>   <field name="id">bat2</field>
>   <field name="name">Batman, the darkknight Rises</field>
> </doc>
>
> And I defined this analyser:
>
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.ShingleFilterFactory"
>             tokenSeparator=""
>             outputUnigrams="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="solr.ShingleFilterFactory"
>             tokenSeparator=""
>             outputUnigrams="true"
>             outputUnigramIfNoNgrams="true"/>
>   </analyzer>


Re: Shingles Filter Query time behaviour

2013-03-25 Thread Jack Krupansky

Or, q=name:(dark knight).

-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Monday, March 25, 2013 11:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Shingles Filter Query time behaviour

Hi,

What does your query look like?  Does it look like q=name:dark knight?
If so, note that only "dark" is going against the "name" field.  Try
q=name:dark name:knight or q=name:"dark knight".

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Mar 18, 2013 at 6:21 PM, Catala, Francois
 wrote:

Hello,

I am trying to have the input "darkknight" match documents containing 
either "dark knight" or "darkknight".
The reverse should also work ("dark knight" matching both "dark knight" and 
"darkknight") but it doesn't. Does anyone know why?



When I run the following query I get the expected response with the two 
documents matched:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="fl">name</str>
      <str name="indent">true</str>
      <str name="q">name:darkknight</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="name">Batman, the darkknight Rises</str>
    </doc>
    <doc>
      <str name="name">Batman, the dark knight Rises</str>
    </doc>
  </result>
</response>


HOWEVER, when I run the same query looking for "dark knight" as two words, I 
get only 1 document matched, as the response shows:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="fl">name</str>
      <str name="indent">true</str>
      <str name="q">name:dark knight</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="name">Batman, the dark knight Rises</str>
    </doc>
  </result>
</response>

I have these documents as input:

<doc>
  <field name="id">bat1</field>
  <field name="name">Batman, the dark knight Rises</field>
</doc>
<doc>
  <field name="id">bat2</field>
  <field name="name">Batman, the darkknight Rises</field>
</doc>

And I defined this analyser:

  <analyzer type="index">
    <tokenizer class="..."/>
    <filter class="solr.ShingleFilterFactory"
            tokenSeparator=""
            outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..."/>
    <filter class="solr.ShingleFilterFactory"
            tokenSeparator=""
            outputUnigrams="true"
            outputUnigramIfNoNgrams="true"/>
  </analyzer>




Re: Query slow with termVectors termPositions termOffsets

2013-03-25 Thread Ravi Solr
Yes, the index size increased after turning on termPositions and termOffsets.

Ravi Kiran Bhaskar

On Mon, Mar 25, 2013 at 1:13 PM,  wrote:

> Did index size increase after turning on termPositions and termOffsets?
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
> -Original Message-
> From: Ravi Solr 
> To: solr-user 
> Sent: Mon, Mar 25, 2013 8:27 am
> Subject: Query slow with termVectors termPositions termOffsets
>
>
> Hello,
> We re-indexed our entire core of 115 docs with some of the
> fields having termVectors="true" termPositions="true" termOffsets="true",
> prior to the reindex we only had termVectors="true". After the reindex
> the query component has become very slow. I thought that adding
> termOffsets and termPositions would increase the speed; am I wrong? Several
> queries like the one shown below which used to run fine are now very slow.
> Can somebody kindly clarify how termOffsets and termPositions affect query
> component ?
>
> <lst name="...">
>   <double name="time">19076.0</double>
>   <lst name="..."><double name="time">18972.0</double></lst>
>   <lst name="..."><double name="time">0.0</double></lst>
>   <lst name="..."><double name="time">0.0</double></lst>
>   <lst name="..."><double name="time">0.0</double></lst>
>   <lst name="..."><double name="time">0.0</double></lst>
>   <lst name="org.apache.solr.handler.component.QueryElevationComponent">
>     <double name="time">0.0</double></lst>
>   <lst name="..."><double name="time">0.0</double></lst>
>   <lst name="..."><double name="time">104.0</double></lst>
> </lst>
>
>
>
> [#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx]
> webapp=/solr-admin path=/select
>
> params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:("The+Checkup"+OR+"Checkpoint+Washington"+OR+"Post+Carbon"+OR+TSA+OR+"College+Inc."+OR+"Campus+Overload"+OR+"Planet+Panel"+OR+"The+Answer+Sheet"+OR+"Class+Struggle"+OR+"BlogPost"))+OR+(contenttype:"Photo+Gallery"+AND+headline:"day+in+photos")&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+"PC+World"+OR+"CBS+News"+OR+NC8/WJLA+OR+"NewsChannel+8"+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:("Discussion"+OR+"Photo")+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:"Photo+Gallery"+AND+headline:("Drawing+Board"+OR+"Drawing+board"+OR+"drawing+board"))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:("Summary+Box*"+OR+"Video*"+OR+"Post+Sports+Live*")+-slug:(warren*+OR+"history")+-(contenttype:Blog+AND+subheadline:("DC+Schools+Insider"+OR+"On+Leadership"))+contenttype:"Blog"+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2}
> hits=4985 status=0 QTime=19044 |#]
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
>
>
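For reference, these flags are per-field attributes in schema.xml; a sketch (the field name is taken from the query log above, the type and other attributes are assumptions):

<field name="fullbody" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

Storing positions and offsets grows the term-vector files, which is consistent with the index-size increase Ravi reports.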


Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-25 Thread Chris R
Interesting, I saw some comments about numShards, but it wasn't ever
specific enough to catch my attention. I will give it a try tomorrow.
Thanks.
On Mar 25, 2013 11:35 PM, "Mark Miller"  wrote:

> I'm guessing you didn't specify numShards. Things changed in 4.1 - if you
> don't specify numShards it goes into a mode where it's up to you to
> distribute updates.
>
> - Mark
>
> On Mar 25, 2013, at 10:29 PM, Chris R  wrote:
>
> > I have two issues and I'm unsure if they are related:
> >
> > Problem:  After setting up a multiple collection Solrcloud 4.1 instance
> on
> > seven servers, when I index the documents they aren't distributed across
> > the index slices.  It feels as though I don't actually have a "cloud"
> > implementation, yet everything I see in the admin interface and zookeeper
> > implies I do.  I feel as if I'm overlooking something obvious, but have not
> > been able to figure out what.
> >
> > Configuration: Seven servers and four collections, each with 12 slices
> (no
> > replica shards yet).  Zookeeper configured in a three node ensemble.
>  When
> > I send documents to Server1/Collection1 (which holds two slices of
> > collection1), all the documents show up in a single index shard (core).
> > Perhaps related, I have found it impossible to get Solr to recognize the
> > server names with anything but a literal host="servername" parameter in
> the
> > solr.xml.  Hostname parameters, host files, network, and DNS are all
> > configured correctly.
> >
> > I have a Solr 4.0 single collection set up similarly and it works just
> > fine.  I'm using the same schema.xml and solrconfig.xml files on the 4.1
> > implementation with only the luceneMatchVersion changed to LUCENE_41.
> >
> > sample solr.xml from server1
> >
> > <solr ...>
> >   <cores ... shareSchema="true" zkClientTimeout="6">
> >     <core instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
> >           dataDir="/solr/col201301/col201301s04sh01/data"/>
> >     <core instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
> >           dataDir="/solr/col201301/col201301s11sh01/data"/>
> >     <core instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01"
> >           dataDir="/solr/col201302/col201302s06sh01/data"/>
> >     <core instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01"
> >           dataDir="/solr/col201303/col201303s01sh01/data"/>
> >     <core instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01"
> >           dataDir="/solr/col201303/col201303s08sh01/data"/>
> >     <core instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01"
> >           dataDir="/solr/col201304/col201304s03sh01/data"/>
> >     <core instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01"
> >           dataDir="/solr/col201304/col201304s10sh01/data"/>
> >   </cores>
> > </solr>
> >
> > Thanks
> > Chris
>
>


Re: Scaling Solr on VMWare

2013-03-25 Thread Otis Gospodnetic
Hi Frank,

If your servlet container had a crazy low setting for the max number
of threads I think you would see the CPU underutilized.  But I think
you would also see errors on the client about connections being
requested.  Sounds like possibly a VM issue that's not
Solr-specific...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/
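For orientation, the max-threads knob Otis mentions lives in the servlet container's config, not in Solr; in the Jetty that ships with Solr it looks roughly like this (a sketch of etc/jetty.xml, values illustrative):

<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
  </New>
</Set>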





On Mon, Mar 25, 2013 at 1:18 PM, Frank Wennerdahl
 wrote:
> Hi.
>
>
>
> We are currently benchmarking our Solr setup and are having trouble with
> scaling hardware for a single Solr instance. We want to investigate how one
> instance scales with hardware to find the optimal ratio of hardware vs
> sharding when scaling. Our main problem is that we cannot identify any
> hardware limitations, CPU is far from maxed out, disk I/O is not an issue as
> far as we can see and there is plenty of RAM available.
>
>
>
> In short we have a couple of questions that we hope someone here could help
> us with. Detailed information about our setup, use case and things we've
> tried is provided below the questions.
>
>
>
> Questions:
>
> 1.   What could cause Solr to utilize only 2 CPU cores when sending
> multiple update requests in parallel in a VMWare environment?
>
> 2.   Is there a software limit on the number of CPU cores that Solr can
> utilize while indexing?
>
> 3.   Ruling out network and disk performance, what could cause a
> decrease in indexing speed when sending data over a network as opposed to
> sending it from the local machine?
>
>
>
> We are running on three cores per Solr instance, however only one core
> receives any non-trivial load. We are using VMWare (ESX 5.0) virtual
> machines for hosting Solr and a QNAP NAS containing 12 HDDs in a RAID5 setup
> for storage. Our data consists of a huge amount of small-sized documents.
> When indexing we are using Solr's javabin format (although not through
> Solrj, we have implemented the format in C#/.NET) and our batch size is
> currently 1000 documents. The actual size of the data varies, but the
> batches we have used range from approximately 450KB to 1050KB. We're sending
> these batches to Solr in parallel using a number of send threads.
>
>
>
> There are two issues that we've run into:
>
> 1.   When sending data from one VM to Solr on another VM we observed
> that Solr did not seem to utilize CPU cores properly. The Solr VM had 8
> vCPUs available and we were using 4 threads sending data in parallel. We saw
> a low (~29%)  CPU utilization on the Solr VM with 2 cores doing almost all
> the work while the remaining cores remained almost idle. Increasing the
> number of send threads to 8 yielded the same result, capping our indexing
> speed to about 4.88MB per second. The client VM had 4 vCPUs which were
> hardly utilized as we were reading data from pre-generated files.
>
> To rule out network limitations we sent the test data to a server on the
> Solr VM that simply accepted the request and returned an empty response. We
> were able to send data at 219MB per second, so the network did not seem to
> be the bottleneck. We also tested sending data to Solr locally from the Solr
> VM to see if disk I/O was the problem. Surprisingly we were able to index
> significantly faster at 7.34MB per second using 4 send threads (8.4MB with 6
> send threads) which indicated that the disk was not slowing us down when
> sending data over the network. Worth noting is that the CPU utilization was
> now higher (47.81% with 4 threads, 58.8% with 6) and the work was spread out
> over all cores. As before we used pre-generated files and the process
> sending the data used almost no CPU.
>
> 2.   We decided to investigate how Solr would scale with additional
> vCPUs when indexing locally. We increased the number of vCPUs to 16 and the
> number of send threads to 8. Sadly we now experienced a decrease in
> performance: 7MB/s with 8 threads, 6.4MB/s with 12 threads and 4.95/s with
> 16 threads. The CPU usage was in average 30%, regardless of the number of
> threads used. We know that additional vCPUs can cause decreased performance
> in VMWare virtual machines due to time waiting for CPUs to become available.
> We investigated this using esxtop which only showed a 1% CSTP. According to
> VMWare KB article 1005362, a CSTP above 3% could indicate that multiple
> vCPUs are causing performance issues.
>
> We noticed that the average disk write speed seemed to cap at around 11.5
> million bytes per second so we tested the same VM setup using a faster disk.
> This did not yield any increase in performance (it was actually somewhat
> slower), neither did using a RAM-mapped drive for Solr.
>
>
>
> Any help or ideas of what could be the bottleneck in our setup would be
> greatly appreciated!
>
>
>
> Best regards,
>
> Frank Wennerdahl
>
> Developer
>
> Arcadelia AB
>


RE: Slow queries for common terms

2013-03-25 Thread David Parks
"book" by itself returns in 4s (non-optimized disk IO), running it a second
time returned 0s, so I think I can presume that the query was not cached the
first time. This system has been up for a week, so it's warm.

I'm going to give your article a good long read, thanks for that.   

I guess good fast disks/SSDs and sharding should also improve on the base 4
sec query time. How _does_ Google get their query times down to 0.35s
anyway? I presume their indexes are larger than my 150G index. :)

I still am a bit worried about what will happen when my index is 500GB
(it'll happen soon enough), even with sharding... well... I'd just need a
lot of servers it seems, and my feeling is that if I need a lot of
servers for a few users, how will it scale to many users?

Thanks for the great discussion,
Dave


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, March 25, 2013 10:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Slow queries for common terms

take a look here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

looking at memory consumption can be a bit tricky to interpret with
MMapDirectory.

But you say "I see the CPU working very hard" which implies that your issue
is just scoring 90M documents. A way to test: try q=*:*&fq=field:book. My
bet is that that will be much faster, in which case scoring is your
choke-point and you'll need to spread that load across more servers, i.e.
shard.
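A sketch of the comparison (host, core and field names are placeholders):

# scores every matching doc:
http://localhost:8983/solr/select?q=field:book

# matches the same docs but skips relevance scoring:
http://localhost:8983/solr/select?q=*:*&fq=field:book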

When running the above, make sure of a couple of things:
1> you haven't run the fq query before (or you have filterCache turned
completely off).
2> you _have_ run a query or two that warms up your low-level caches.
Doesn't matter what, just as long as it doesn't have an fq clause.

Best
Erick



On Sat, Mar 23, 2013 at 3:10 AM, David Parks  wrote:

> I see the CPU working very hard, and at the same time I see 2 MB/sec 
> disk access for that 15 seconds. I am not running it this instant, but 
> it seems to me that there were more CPU cycles available, so unless
> it's an issue of not being able to multithread it any further I'd say
> it's more IO related.
>
> I'm going to set up solr cloud and shard across the 2 servers I have 
> available for now. It's not an optimal setup we have while we're in a 
> private beta period, but maybe it'll improve things (I've got 2 
> servers with 2x 4TB disks in raid-0 shared with the webservers).
>
> I'll work towards some improved IO performance and maybe more shards 
> and see how things go. I'll also be able to up the RAM in just a 
> couple of weeks.
>
> Are there any settings I should think of in terms of improving cache 
> performance when I can give it say 10GB of RAM?
>
> Thanks, this has been tremendously helpful.
>
> David
>
>
> -Original Message-
> From: Tom Burton-West [mailto:tburt...@umich.edu]
> Sent: Saturday, March 23, 2013 1:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Slow queries for common terms
>
> Hi David and Jan,
>
> I wrote the blog post, and David, you are right, the problem we had 
> was with phrase queries because our positions lists are so huge.  
> Boolean
> queries don't need to read the positions lists.   I think you need to
> determine whether you are CPU bound or I/O bound.It is possible that
> you are I/O bound and reading the term frequency postings for 90 
> million docs is taking a long time.  In that case, More memory in the 
> machine (but not dedicated to Solr) might help because Solr relies on 
> OS disk caching for caching the postings lists.  You would still need 
> to do some cache warming with your most common terms.
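If warming helps, it can be automated; a minimal solrconfig.xml sketch (the query is a placeholder) that runs warm-up queries whenever a new searcher opens:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">book</str><str name="rows">10</str></lst>
  </arr>
</listener>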
>
> On the other hand as Jan pointed out, you may be cpu bound because 
> Solr doesn't have early termination and has to rank all 90 million 
> docs in order to show the top 10 or 25.
>
> Did you try the OR search to see if your CPU is at 100%?
>
> Tom
>
> On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl 
> wrote:
>
> > Hi
> >
> > There might not be a final cure with more RAM if you are CPU bound.
> > Scoring 90M docs is some work. Can you check what's going on during 
> > those
> > 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz) search
> > which generates >100mill hits and see if that is slow too, even if 
> > you don't use frequent words.
> >
> > I'm sure you can find other frequent terms in your corpus which 
> > display similar behaviour, words which are even more frequent than 
> > "book". Are you using "AND" as default operator? You will benefit 
> > from limiting the number of results as much as possible.
> >
> > The real solution is to shard across N number of servers, until you 
> > reach the desired performance for the desired indexing/querying load.
> >
> > --
> > Jan Høydahl, search solution architect Cominvent AS - 
> > www.cominvent.com Solr Training - www.solrtraining.com
> >
> >
>
>