Re: dataimporthandler: nested query is called multiple times
alex, thank you for the link. i enabled the trace for 'org.apache.solr.handler.dataimport' and it seems as if the database is only called once: 2013-03-21T09:40:43 1363855243889 50 org.apache.solr.handler.dataimport.JdbcDataSource FINE org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator <init> 11 Executing SQL: select * from doc_properties where DOCID='0u3xouyscdhye61o' therefore i assume the output shown in the dataimporthandler UI is incorrect. i could double-check with the database logs. cheerio, patrick On 20.03.2013 12:07, Alexandre Rafalovitch wrote: There was something like this on Stack Overflow: http://stackoverflow.com/questions/15164166/solr-filelistentityprocessor-is-executing-sub-entities-multiple-times Upgrading Solr helped partially, but the conclusion was not fully satisfactory. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Mar 20, 2013 at 6:48 AM, patrick wrote: hi, the dataimport-config-file i'm using with solr3.6.2 uses a nested select statement. the first query retrieves the documents while the nested one retrieves the corresponding properties. when running the dataimporthandler with the verbose/debug flag turned on the output lists more than one query for 'entity:attributes' - this list is increased for each 'entity:item': select DOCID from documents 0:0:0.50 --- row #1- 000emnslnbh88hdd - select * from doc_properties where DOCID='000emnslnbh88hdd' select * from doc_properties where DOCID='000emnslnbh88hdd' 0:0:0.37 0:0:0.37 --- row #1- I message_direction - --- row #2- heb@test message_event_source --- row #1- 000hsjunnbh7weq8 - select * from doc_properties where DOCID='000hsjunnbh7weq8' select * from doc_properties where DOCID='000hsjunnbh7weq8' select * from doc_properties where DOCID='000hsjunnbh7weq8' select * from doc_properties where DOCID='000hsjunnbh7weq8' 0:0:0.1 0:0:0.1 0:0:0.1 0:0:0.1 --- row #1- I message_direction - --- row #2- heb@test message_event_source ... i was wondering if there's something wrong with my configuration - thank you for clarifying, patrick
dataimporthandler: nested query is called multiple times
hi, the dataimport-config-file i'm using with solr3.6.2 uses a nested select statement. the first query retrieves the documents while the nested one retrieves the corresponding properties. when running the dataimporthandler with the verbose/debug flag turned on the output lists more than one query for 'entity:attributes' - this list is increased for each 'entity:item': select DOCID from documents 0:0:0.50 --- row #1- 000emnslnbh88hdd - select * from doc_properties where DOCID='000emnslnbh88hdd' select * from doc_properties where DOCID='000emnslnbh88hdd' 0:0:0.37 0:0:0.37 --- row #1- I message_direction - --- row #2- heb@test message_event_source --- row #1- 000hsjunnbh7weq8 - select * from doc_properties where DOCID='000hsjunnbh7weq8' select * from doc_properties where DOCID='000hsjunnbh7weq8' select * from doc_properties where DOCID='000hsjunnbh7weq8' select * from doc_properties where DOCID='000hsjunnbh7weq8' 0:0:0.1 0:0:0.1 0:0:0.1 0:0:0.1 --- row #1- I message_direction - --- row #2- heb@test message_event_source ... i was wondering if there's something wrong with my configuration - thank you for clarifying, patrick
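For reference, a nested parent/child DIH setup of the kind described above typically looks like the following minimal sketch (driver, URL and column/field names here are illustrative placeholders, not the actual configuration):

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@//dbhost:1521/db" user="user" password="pass"/>
    <document>
      <entity name="item" query="select DOCID from documents">
        <field column="DOCID" name="id"/>
        <entity name="attributes"
                query="select * from doc_properties where DOCID='${item.DOCID}'">
          <field column="PROP_NAME" name="prop_name"/>
          <field column="PROP_VALUE" name="prop_value"/>
        </entity>
      </entity>
    </document>
  </dataConfig>

With this structure the 'attributes' query should run once per 'item' row, which is consistent with the single SQL statement seen in the JdbcDataSource trace above.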
how to recover when indexing with proxy & shards
hi, i'm considering using more than 3 solr shards and assigning a (separate) proxy to do the load balancing when indexing. i'm using SolrJ to do the indexing. the question is whether i get any information about which shard the document ends up being stored in. this information would be helpful in case a specific shard has to be re-indexed (no indexing downtime, isolated recovery). i assume the HTTP response only contains the IP address of the proxy. thank you for any hints! cheerio, patrick
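For reference, a minimal SolrJ sketch of the indexing path described above (the proxy URL and the shard_s field are illustrative assumptions, not an established convention). As far as I know the UpdateResponse does not carry any shard information when updates go through a proxy, so one workaround is to record the intended shard in the document itself:

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.UpdateResponse;
  import org.apache.solr.common.SolrInputDocument;

  public class ProxyIndexer {
      public static void main(String[] args) throws Exception {
          // All updates go through the load-balancing proxy, not to a shard directly.
          SolrServer solr = new HttpSolrServer("http://proxy.example.com:8983/solr/core");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-1");
          // Illustrative workaround: record the intended shard ourselves, since the
          // update response only reflects the proxy endpoint we talked to.
          doc.addField("shard_s", "shard-2");

          UpdateResponse rsp = solr.add(doc);
          System.out.println("status=" + rsp.getStatus() + " qtime=" + rsp.getQTime());
          solr.commit();
      }
  }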
numFound inconsistent for different rows-param
hi, i'm running two solr v3.6 instances: rdta01:9983/solr/msg-core : 8 documents rdta01:28983/solr/msg-core : 4 documents the following two queries with rows=10 resp rows=0 return different numFound results which confuses me. i hope someone can clarify this behaviour. URL with rows=10: - http://rdta01:9983/solr/msg-core/select?q=*:*&shards=rdta01%3A9983%2Fsolr%2Fmsg-core%2Crdta01%3A28983%2Fsolr%2Fmsg-core&indent=on&start=0&rows=10 numFound=8 (incorrect, second shard is missing) URL with rows=0: http://rdta01:9983/solr/msg-core/select?q=*:*&shards=rdta01%3A9983%2Fsolr%2Fmsg-core%2Crdta01%3A28983%2Fsolr%2Fmsg-core&indent=on&start=0&rows=0 numFound=12 (correct) cheerio, patrick
Re: numFound inconsistent for different rows-param
i resolved my confusion and discovered that the documents of the second shard contained the same 'unique' id. rows=0 displayed the 'correct' numFound since (as i understand) there was no merge of the results. cheerio, patrick On 25.07.2012 17:07, patrick wrote: hi, i'm running two solr v3.6 instances: rdta01:9983/solr/msg-core : 8 documents rdta01:28983/solr/msg-core : 4 documents the following two queries with rows=10 resp rows=0 return different numFound results which confuses me. i hope someone can clarify this behaviour. URL with rows=10: - http://rdta01:9983/solr/msg-core/select?q=*:*&shards=rdta01%3A9983%2Fsolr%2Fmsg-core%2Crdta01%3A28983%2Fsolr%2Fmsg-core&indent=on&start=0&rows=10 numFound=8 (incorrect, second shard is missing) URL with rows=0: http://rdta01:9983/solr/msg-core/select?q=*:*&shards=rdta01%3A9983%2Fsolr%2Fmsg-core%2Crdta01%3A28983%2Fsolr%2Fmsg-core&indent=on&start=0&rows=0 numFound=12 (correct) cheerio, patrick
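For anyone hitting the same confusion: with rows>0 the distributed merge collapses documents that share the same uniqueKey across shards, while rows=0 simply adds up the per-shard counts. A quick, hedged way to spot clashing ids (assuming 'id' is the uniqueKey field) is to facet on it across the shards and look for counts greater than 1:

  http://rdta01:9983/solr/msg-core/select?q=*:*&shards=rdta01:9983/solr/msg-core,rdta01:28983/solr/msg-core&rows=0&facet=true&facet.field=id&facet.mincount=2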
Re: High Cpu sys usage
Hi, >From the sar output you supplied, it looks like you might have a memory issue >on your hosts. The memory usage just before your crash seems to be *very* >close to 100%. Even the slightest increase (Solr itself, or possibly by a >system service) could caused the system crash. What are the specifications of >your hosts and how much memory are you allocating? Cheers, -patrick On 16/03/2016, 14:52, "YouPeng Yang" wrote: >Hi > It happened again,and worse thing is that my system went to crash.we can >even not connect to it with ssh. > I use the sar command to capture the statistics information about it.Here >are my details: > > >[1]cpu(by using sar -u),we have to restart our system just as the red font >LINUX RESTART in the logs. >-- >03:00:01 PM all 7.61 0.00 0.92 0.07 0.00 >91.40 >03:10:01 PM all 7.71 0.00 1.29 0.06 0.00 >90.94 >03:20:01 PM all 7.62 0.00 1.98 0.06 0.00 >90.34 >03:30:35 PM all 5.65 0.00 31.08 0.04 0.00 >63.23 >03:42:40 PM all 47.58 0.00 52.25 0.00 0.00 > 0.16 >Average:all 8.21 0.00 1.57 0.05 0.00 >90.17 > >04:42:04 PM LINUX RESTART > >04:50:01 PM CPU %user %nice %system %iowait%steal >%idle >05:00:01 PM all 3.49 0.00 0.62 0.15 0.00 >95.75 >05:10:01 PM all 9.03 0.00 0.92 0.28 0.00 >89.77 >05:20:01 PM all 7.06 0.00 0.78 0.05 0.00 >92.11 >05:30:01 PM all 6.67 0.00 0.79 0.06 0.00 >92.48 >05:40:01 PM all 6.26 0.00 0.76 0.05 0.00 >92.93 >05:50:01 PM all 5.49 0.00 0.71 0.05 0.00 >93.75 >-- > >[2]mem(by using sar -r) >-- >03:00:01 PM 1519272 196633272 99.23361112 76364340 143574212 >47.77 >03:10:01 PM 1451764 196700780 99.27361196 76336340 143581608 >47.77 >03:20:01 PM 1453400 196699144 99.27361448 76248584 143551128 >47.76 >03:30:35 PM 1513844 196638700 99.24361648 76022016 143828244 >47.85 >03:42:40 PM 1481108 196671436 99.25361676 75718320 144478784 >48.07 >Average: 5051607 193100937 97.45362421 81775777 142758861 >47.50 > >04:42:04 PM LINUX RESTART > >04:50:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit >%commit >05:00:01 PM 154357132 43795412 22.10 92012 18648644 134950460 >44.90 >05:10:01 PM 136468244 61684300 31.13219572 31709216 134966548 >44.91 >05:20:01 PM 135092452 63060092 31.82221488 32162324 134949788 >44.90 >05:30:01 PM 133410464 64742080 32.67233848 32793848 134976828 >44.91 >05:40:01 PM 132022052 66130492 33.37235812 33278908 135007268 >44.92 >05:50:01 PM 130630408 67522136 34.08237140 33900912 135099764 >44.95 >Average:136996792 61155752 30.86206645 30415642 134991776 >44.91 >-- > > >As the blue font parts show that my hardware crash from 03:30:35.It is hung >up until I restart it manually at 04:42:04 >ALl the above information just snapshot the performance when it crashed >while there is nothing cover the reason.I have also >check the /var/log/messages and find nothing useful. > >Note that I run the command- sar -v .It shows something abnormal: > >02:50:01 PM 11542262 9216 76446 258 >03:00:01 PM 11645526 9536 76421 258 >03:10:01 PM 11748690 9216 76451 258 >03:20:01 PM 11850191 9152 76331 258 >03:30:35 PM 11972313 10112132625 258 >03:42:40 PM 12177319 13760340227 258 >Average: 8293601 8950 68187 161 > >04:42:04 PM LINUX RESTART > >04:50:01 PM dentunusd file-nr inode-nrpty-nr >05:00:01 PM 35410 7616 35223 4 >05:10:01 PM137320 7296 42632 6 >05:20:01 PM247010 7296 42839 9 >05:30:01 PM358434 7360 42697 9 >05:40:01 PM471543 7040 4292910 >05:50:01 PM583787 7296 4283713 >
Re: High Cpu sys usage
Yeah, I did’t pay attention to the cached memory at all, my bad! I remember running into a similar situation a couple of years ago, one of the things to investigate our memory profile was to produce a full heap dump and manually analyse that using a tool like MAT. Cheers, -patrick On 17/03/2016, 21:58, "Otis Gospodnetić" wrote: >Hi, > >On Wed, Mar 16, 2016 at 10:59 AM, Patrick Plaatje >wrote: > >> Hi, >> >> From the sar output you supplied, it looks like you might have a memory >> issue on your hosts. The memory usage just before your crash seems to be >> *very* close to 100%. Even the slightest increase (Solr itself, or possibly >> by a system service) could caused the system crash. What are the >> specifications of your hosts and how much memory are you allocating? > > >That's normal actually - http://www.linuxatemyram.com/ > >You *want* Linux to be using all your memory - you paid for it :) > >Otis >-- >Monitoring - Log Management - Alerting - Anomaly Detection >Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > >> > > >> >> >> On 16/03/2016, 14:52, "YouPeng Yang" wrote: >> >> >Hi >> > It happened again,and worse thing is that my system went to crash.we can >> >even not connect to it with ssh. >> > I use the sar command to capture the statistics information about it.Here >> >are my details: >> > >> > >> >[1]cpu(by using sar -u),we have to restart our system just as the red font >> >LINUX RESTART in the logs. >> >> >-- >> >03:00:01 PM all 7.61 0.00 0.92 0.07 0.00 >> >91.40 >> >03:10:01 PM all 7.71 0.00 1.29 0.06 0.00 >> >90.94 >> >03:20:01 PM all 7.62 0.00 1.98 0.06 0.00 >> >90.34 >> >03:30:35 PM all 5.65 0.00 31.08 0.04 0.00 >> >63.23 >> >03:42:40 PM all 47.58 0.00 52.25 0.00 0.00 >> > 0.16 >> >Average:all 8.21 0.00 1.57 0.05 0.00 >> >90.17 >> > >> >04:42:04 PM LINUX RESTART >> > >> >04:50:01 PM CPU %user %nice %system %iowait%steal >> >%idle >> >05:00:01 PM all 3.49 0.00 0.62 0.15 0.00 >> >95.75 >> >05:10:01 PM all 9.03 0.00 0.92 0.28 0.00 >> >89.77 >> >05:20:01 PM all 7.06 0.00 0.78 0.05 0.00 >> >92.11 >> >05:30:01 PM all 6.67 0.00 0.79 0.06 0.00 >> >92.48 >> >05:40:01 PM all 6.26 0.00 0.76 0.05 0.00 >> >92.93 >> >05:50:01 PM all 5.49 0.00 0.71 0.05 0.00 >> >93.75 >> >> >-- >> > >> >[2]mem(by using sar -r) >> >> >-- >> >03:00:01 PM 1519272 196633272 99.23361112 76364340 143574212 >> >47.77 >> >03:10:01 PM 1451764 196700780 99.27361196 76336340 143581608 >> >47.77 >> >03:20:01 PM 1453400 196699144 99.27361448 76248584 143551128 >> >47.76 >> >03:30:35 PM 1513844 196638700 99.24361648 76022016 143828244 >> >47.85 >> >03:42:40 PM 1481108 196671436 99.25361676 75718320 144478784 >> >48.07 >> >Average: 5051607 193100937 97.45362421 81775777 142758861 >> >47.50 >> > >> >04:42:04 PM LINUX RESTART >> > >> >04:50:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit >> >%commit >> >05:00:01 PM 154357132 43795412 22.10 92012 18648644 134950460 >> >44.90 >> >05:10:01 PM 136468244 61684300 31.13219572 31709216 134966548 >> >44.91 >> >05:20:01 PM 135092452 63060092 31.82221488 32162324 134949788 >> >44.90 >> >05:30:01 PM 133410464 64742080 32.67233848 32793848 134976828 >> >44.91 >> >05:40:01 PM 132022052 66130492 33.37235812 33278908 135007268 >> >44.92 >> >05:50:01 PM 130630408 67522136 34.08237140 33900912 135099764 >> >44.95 >> >Average:136996792 6
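For what it's worth, a heap dump for MAT can usually be produced with the JDK's jmap (a sketch, assuming a HotSpot JVM, where <pid> is the Solr/Tomcat process id):

  jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <pid>

Adding -XX:+HeapDumpOnOutOfMemoryError to the JVM options will also capture one automatically if the process ever dies with an OutOfMemoryError.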
Querying Dynamic Fields
I have a simple Solr schema that uses dynamic fields to create most of my fields. This works great. Unfortunately, I now need to ask Solr to give me the names of the fields in the schema. I'm using: http://localhost:8983/solr/core/schema/fields This returns the statically defined fields, but does not return the ones that were created matching my dynamic definitions, such as *_s, *_i, *_txt, etc. I know Solr is aware of these fields, because I can query against them. What is the secret sauce to query their names and data types? Thanks, Patrick Hoeffel Senior Software Engineer Intelligent Software Solutions (www.issinc.com<http://www.issinc.com/>) (719) 452-7371 (direct) (719) 210-3706 (mobile) "Bringing Knowledge to Light"
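In case it's useful to later readers, two hedged pointers (not verified against this exact setup): /schema/dynamicfields lists the dynamic-field patterns themselves, while the Luke handler reports the fields that actually exist in the index, including the ones instantiated from patterns like *_s, together with their types:

  http://localhost:8983/solr/core/schema/dynamicfields
  http://localhost:8983/solr/core/admin/luke?numTerms=0&wt=json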
Apache Solr Reference Guide 5.0
Greetings, I was looking at the PDF version of the Apache Solr Reference Guide 5.0 and noticed that it has no TOC nor any section numbering. http://apache.claz.org/lucene/solr/ref-guide/apache-solr-ref-guide-5.0.pdf The lack of a TOC and section headings makes navigation difficult. I have just started making suggestions on the documentation and was wondering if there is a reason why the TOC and section headings are missing? (that isn't apparent from the document) Thanks! Hope everyone is near a great weekend! Patrick
Re: Apache Solr Reference Guide 5.0
Shawn, Thanks! I was using Document Viewer and not Adobe Acrobat so was unclear. The TOC I meant was as in a traditional print publication with section #s, etc. Not a navigation TOC sans numbering as in Adobe. The Confluence documentation (I can't see the actual stylesheet in use, I don't think) here: https://confluence.atlassian.com/display/DOC/Customising+Exports+to+PDF Says: * Disabling the Table of Contents To prevent the table of contents from being generated in your PDF document, add the div.toc-macro rule to the PDF Stylesheet and set its display property to none: * Which is why I was asking if there was a reason for the TOC and section numbering not appearing. They can be defeated but that doesn't appear to be the default setting. This came up because a section said it would cover topics N - S and I could not determine if all those topics fell in that section or not. Thanks! Hope you are having a great day! Patrick On 03/06/2015 12:28 PM, Shawn Heisey wrote: On 3/6/2015 10:20 AM, Patrick Durusau wrote: I was looking at the PDF version of the Apache Solr Reference Guide 5.0 and noticed that it has no TOC nor any section numbering. http://apache.claz.org/lucene/solr/ref-guide/apache-solr-ref-guide-5.0.pdf The lack of a TOC and section headings makes navigation difficult. I have just started making suggestions on the documentation and was wondering if there is a reason why the TOC and section headings are missing? (that isn't apparent from the document) The TOC is built into the PDF and it's up to the PDF viewer to display it. Here's a screenshot of the ref guide in Adobe Reader with a clickable TOC open. https://www.dropbox.com/s/3ajuri1emj61imu/refguide-5.0-TOC.png?dl=0 Section numbering might be a good idea, if it's not too intrusive or difficult. Thanks, Shawn
How do I tell Tika to not complement a field's value defined in my Solr schema when indexing a binary document?
I use Solr to index different kinds of database tables. I have a Solr index containing a field named category. I make sure that the category field in Solr is populated with the right value depending on the table. I use this to build facet queries, which works fine. The problem I have is with tables that contain records which represent binary documents like PDFs. I use the extract handler (Tika) to index the contents of the binary document along with the data from the database record. Tika sometimes finds metadata in the document with the same name as one of the index fields in my schema.xml, like category. I end up with the category field being a multi-valued field containing the category data from my database record AND the additional data from the category (meta)field extracted by Tika from the actual binary document. It seems that the extract handler adds every field it finds to my index if there is a corresponding field in my schema. How can I prevent this from happening? All I need is the textual representation of the binary document added as content, not the extra (meta) fields. I don't want the extra data Tika may find to be added to any field in my index. However, I do want to keep the data in the category field which comes from my database record. So adding fmap.category="ignored_" won't help me, because then the data from my database record will be ignored as well. Another reason for wanting to prevent this is that I cannot know in advance which other fields Tika might come up with when the document is extracted. In other words, choosing more elaborate names (like a namespace-style prefix) for my index fields will never prevent field name collisions 100%. So, how can I prevent the data the extraction comes up with from being added to my index fields, or am I missing a point here?
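For context, the pattern usually suggested for this situation (a sketch, not tested against this setup) is to route all unknown Tika fields to an ignored dynamic field via uprefix, and pass the trusted database value as a literal; if I remember correctly, literalsOverride defaults to true in the ExtractingRequestHandler, so the literal value wins over Tika metadata with the same name such as category. Field names and values below are placeholders; the "ignored" field type is the one shipped in the example schema:

  In schema.xml:
    <dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

  Extract request:
    curl "http://localhost:8983/solr/core/update/extract?literal.id=42&literal.category=invoices&uprefix=ignored_&fmap.content=text" -F "file=@document.pdf"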
can't seem to get delta imports to work.
Hi, I am having problems getting the delta import working. Full import works fine. I am using current version of solr (6.1). I have been looking at this pretty much all day and can't find what I am not doing correctly... I did try the Using query attribute for both full and delta import and that worked, but as soon I ran it for a full import via clean=true my queries performance went very bad (oracle execution plain must of went bonkers). Anyways, I would appreciate any help. Thanks Here is my dataimportHandler config: [hubadm@emcappd43:solr-6.1.0]$ cat ./server/solr/dmtec1/conf/db-data-config.xml Here is the log output: 2016-08-31 19:45:42.641 INFO (qtp403424356-68) [ x:dmtec1] o.a.s.h.d.DataImporter Loading DIH Configuration: db-data-config.xml 2016-08-31 19:45:42.648 INFO (qtp403424356-68) [ x:dmtec1] o.a.s.h.d.DataImporter Data Configuration loaded successfully 2016-08-31 19:45:42.649 INFO (qtp403424356-68) [ x:dmtec1] o.a.s.c.S.Request [dmtec1] webapp=/solr path=/dataimport params={indent=on&wt=json&command=reload-config&_=1472648332418} status=0 QTime=9 2016-08-31 19:45:42.680 INFO (qtp403424356-77) [ x:dmtec1] o.a.s.c.S.Request [dmtec1] webapp=/solr path=/admin/mbeans params={cat=QUERYHANDLER&wt=json&_=1472648332418} status=0 QTime=1 2016-08-31 19:45:42.695 INFO (qtp403424356-84) [ x:dmtec1] o.a.s.c.S.Request [dmtec1] webapp=/solr path=/dataimport params={indent=on&wt=json&command=show-config&_=1472648332418} status=0 QTime=1 2016-08-31 19:45:42.696 INFO (qtp403424356-49) [ x:dmtec1] o.a.s.c.S.Request [dmtec1] webapp=/solr path=/dataimport params={indent=on&wt=json&command=status&_=1472648332418} status=0 QTime=0 2016-08-31 19:45:48.550 INFO (qtp403424356-68) [ x:dmtec1] o.a.s.h.d.DataImporter Loading DIH Configuration: db-data-config.xml 2016-08-31 19:45:48.558 INFO (qtp403424356-68) [ x:dmtec1] o.a.s.h.d.DataImporter Data Configuration loaded successfully 2016-08-31 19:45:48.560 INFO (qtp403424356-68) [ x:dmtec1] o.a.s.c.S.Request [dmtec1] webapp=/solr path=/dataimport params={core=dmtec1&optimize=false&indent=on&commit=true&clean=false&wt=json&command=delta-import&_=1472648332418&verbose=false} status=0 QTime=10 2016-08-31 19:45:48.560 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DataImporter Starting Delta Import 2016-08-31 19:45:48.574 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.SimplePropertiesWriter Read dataimport.properties 2016-08-31 19:45:48.576 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Starting delta collection. 2016-08-31 19:45:48.577 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Running ModifiedRowKey() for Entity: viewables 2016-08-31 19:45:48.577 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Completed ModifiedRowKey for Entity: viewables rows obtained : 0 2016-08-31 19:45:48.577 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Completed DeletedRowKey for Entity: viewables rows obtained : 0 2016-08-31 19:45:48.577 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Completed parentDeltaQuery for Entity: viewables 2016-08-31 19:45:48.577 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Running ModifiedRowKey() for Entity: relParts 2016-08-31 19:45:48.578 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Completed ModifiedRowKey for Entity: relParts rows obtained : 0 2016-08-31 19:45:48.578 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Completed DeletedRowKey for Entity: relParts rows obtained : 0 2016-08-31 19:45:48.578 INFO (Thread-39) [ x:dmtec1] o.a.s.h.d.DocBuilder Completed parentDeltaQuery for Entity: relParts 2016-08-31 19:45:48.578 INFO (Threa
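For comparison, the delta-related attributes DIH expects on the root entity look roughly like this generic sketch (table and column names are made up, since the actual db-data-config.xml is not shown above). A common gotcha with Oracle is that column names come back in uppercase, so the key referenced in deltaImportQuery must match that case exactly:

  <entity name="item" pk="ID"
          query="select * from item"
          deltaQuery="select ID from item where LAST_MODIFIED &gt; to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
          deltaImportQuery="select * from item where ID='${dataimporter.delta.ID}'">
    ...
  </entity>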
RE: CDCR - how to deal with the transaction log files
I'm working on my first setup of CDCR, and I'm seeing the same "The log reader for target collection {collection name} is not initialised" as you saw. It looks like you're creating collections on a regular basis, but for me, I create it one time and never again. I've been creating the collection first from defaults and then applying the CDCR-aware solrconfig changes afterward. It sounds like maybe I need to create the configset in ZK first, then create the collections, first on the Target and then on the Source, and I should be good? Thanks, Patrick Hoeffel Senior Software Engineer (Direct) 719-452-7371 (Mobile) 719-210-3706 patrick.hoef...@polarisalpha.com PolarisAlpha.com -Original Message- From: jmyatt [mailto:jmy...@wayfair.com] Sent: Wednesday, July 12, 2017 4:49 PM To: solr-user@lucene.apache.org Subject: Re: CDCR - how to deal with the transaction log files glad to hear you found your solution! I have been combing over this post and others on this discussion board many times and have tried so many tweaks to configuration, order of steps, etc, all with absolutely no success in getting the Source cluster tlogs to delete. So incredibly frustrating. If anyone has other pearls of wisdom I'd love some advice. Quick hits on what I've tried: - solrconfig exactly like Sean's (target and source respectively) expect no autoSoftCommit - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on target) explicitly before starting since the config setting of defaultState=disabled doesn't seem to work - when I create the collection on source first, I get the warning "The log reader for target collection {collection name} is not initialised". When I reverse the order (create the collection on target first), no such warning - tlogs replicate as expected, hard commits on both target and source cause tlogs to rollover, etc - all of that works as expected - action=QUEUES on source reflects the queueSize accurately. Also *always* shows updateLogSynchronizer state as "stopped" - action=LASTPROCESSEDVERSION on both source and target always seems correct (I don't see the -1 that Sean mentioned). - I'm creating new collections every time and running full data imports that take 5-10 minutes. Again, all data replication, log rollover, and autocommit activity seems to work as expected, and logs on target are deleted. It's just those pesky source tlogs I can't get to delete. -- View this message in context: http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4345715.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: CDCR - how to deal with the transaction log files
Amrit, Problem solved! My biggest mistake was in my SOURCE-side configuration. The zkHost field needed the entire zkHost string, including the CHROOT indicator. I suppose that should have been obvious to me, but the examples only showed the IP Address of the target ZK, and I made a poor assumption. 10.161.0.7:2181,10.161.0.6:2181,10.161.0.5:2181/chroot/solr ks_v1 ks_v1 10.161.0.7:2181 <=== Problem was here. ks_v1 ks_v1 After that, I just made sure I did this: 1. Stop all Solr nodes at both SOURCE and TARGET. 2. $ rm -rf $SOLR_HOME/server/solr/collection_name/data/tlog/*.* 3. On the TARGET: a. $ collection/cdcr?action=DISABLEBUFFER b. $ collection/cdcr?action=START 4. On the Source: a. $ collection/cdcr?action=DISABLEBUFFER b. $ collection/cdcr?action=START At this point any existing data in the SOURCE collection started flowing into the TARGET collection, and it has remained congruent ever since. Thanks, Patrick Hoeffel Senior Software Engineer (Direct) 719-452-7371 (Mobile) 719-210-3706 patrick.hoef...@polarisalpha.com PolarisAlpha.com -Original Message- From: Amrit Sarkar [mailto:sarkaramr...@gmail.com] Sent: Friday, July 21, 2017 7:21 AM To: solr-user@lucene.apache.org Cc: jmy...@wayfair.com Subject: Re: CDCR - how to deal with the transaction log files Patrick, Yes! You created default UpdateLog which got written to a disk and then you changed it to CdcrUpdateLog in configs. I find no reason it would create a proper COLLECTIONCHECKPOINT on target tlog. One thing you can try before creating / starting from scratch is restarting source cluster nodes, the leaders of shard will try to create the same COLLECTIONCHECKPOINT, which may or may not be successful. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel < patrick.hoef...@polarisalpha.com> wrote: > I'm working on my first setup of CDCR, and I'm seeing the same "The > log reader for target collection {collection name} is not initialised" > as you saw. > > It looks like you're creating collections on a regular basis, but for > me, I create it one time and never again. I've been creating the > collection first from defaults and then applying the CDCR-aware > solrconfig changes afterward. It sounds like maybe I need to create > the configset in ZK first, then create the collections, first on the > Target and then on the Source, and I should be good? > > Thanks, > > Patrick Hoeffel > Senior Software Engineer > (Direct) 719-452-7371 > (Mobile) 719-210-3706 > patrick.hoef...@polarisalpha.com > PolarisAlpha.com > > > -Original Message- > From: jmyatt [mailto:jmy...@wayfair.com] > Sent: Wednesday, July 12, 2017 4:49 PM > To: solr-user@lucene.apache.org > Subject: Re: CDCR - how to deal with the transaction log files > > glad to hear you found your solution! I have been combing over this > post and others on this discussion board many times and have tried so > many tweaks to configuration, order of steps, etc, all with absolutely > no success in getting the Source cluster tlogs to delete. So > incredibly frustrating. If anyone has other pearls of wisdom I'd love some > advice. 
> Quick hits on what I've tried: > > - solrconfig exactly like Sean's (target and source respectively) > expect no autoSoftCommit > - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on > target) explicitly before starting since the config setting of > defaultState=disabled doesn't seem to work > - when I create the collection on source first, I get the warning "The > log reader for target collection {collection name} is not > initialised". When I reverse the order (create the collection on > target first), no such warning > - tlogs replicate as expected, hard commits on both target and source > cause tlogs to rollover, etc - all of that works as expected > - action=QUEUES on source reflects the queueSize accurately. Also > *always* shows updateLogSynchronizer state as "stopped" > - action=LASTPROCESSEDVERSION on both source and target always seems > correct (I don't see the -1 that Sean mentioned). > - I'm creating new collections every time and running full data > imports that take 5-10 minutes. Again, all data replication, log > rollover, and autocommit activity seems to work as expected, and logs > on target are deleted. It's just those pesky source tlogs I can't get to > delete. > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/CDCR-how-to-deal-with-the-transaction-log- > files-tp4345062p4345715.html > Sent from the Solr - User mailing list archive at Nabble.com. >
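For anyone following along (the XML markup in the config snippet above was stripped in transit), the source-side CDCR handler ends up looking roughly like this once the full ZooKeeper connect string, including the chroot, is used. The replicator and updateLogSynchronizer values below are illustrative rather than the exact ones used here:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">10.161.0.7:2181,10.161.0.6:2181,10.161.0.5:2181/chroot/solr</str>
      <str name="source">ks_v1</str>
      <str name="target">ks_v1</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">2</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">60000</str>
    </lst>
  </requestHandler>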
JSON facet SUM precision and accuracy is incorrect
I would appreciate it if anyone could help raise an issue for the JSON facet sum error my staff member Edwin raised earlier; we have not gotten any response from the Solr community and developers. Our production operation urgently needs this accuracy to proceed, as it impacts audit issues. Best regards, Dr. Patrick On Tue, Jul 25, 2017 at 6:27 PM, Zheng Lin Edwin Yeo wrote: > This is the way in which I put my JSON facet. > > totalAmount:"sum(sum(amount1_d,amount2_d))" > > amount1_d: 69446961.2 > amount2_d: 0 > > Result I get: 69446959.27 > > > Regards, > Edwin > > > On 25 July 2017 at 20:44, Zheng Lin Edwin Yeo > wrote: > > > Hi, > > > > I'm trying to do a sum of two double fields in JSON Facet. One of the > > fields has a value of 69446961.2, while the other is 0. However, when I > get > > the result, I'm getting a value of 69446959.27. This is 1.93 less than > > the original value. > > > > What could be the reason? > > > > I'm using Solr 6.5.1. > > > > Regards, > > Edwin > >
Solr Issue
Hey guys, I've got a problem with my Solr highlighter. When I search for a word, I get some results. For every result I want to display the highlighted text, and here is my problem: some of the returned documents have a highlighted text, the others don't. I don't know why that is, but I need to fix this problem. Below is the configuration of my managed-schema. The configuration of the highlighter in solrconfig.xml is the default. I hope someone can help me. If you need more details, you can ask me for sure. managed-schema: id Best regards, Patrick Fallert Rainer-Haungs-Straße 7, D-77933 Lahr Tel.: +49 7821 9509-0 Fax: +49 7821 9509-99 i...@schrempp-edv.de www.schrempp-edv.de
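In case it helps narrow things down, a hedged example of the highlighting parameters worth checking (field names are placeholders): the highlighted field must be stored, and by default only the first 51,200 characters of a field are analyzed for snippets, so long documents can come back without any highlight at all:

  http://localhost:8983/solr/core/select?q=word&hl=true&hl.fl=title,content&hl.requireFieldMatch=false&hl.maxAnalyzedChars=1000000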
TermVectors and ExactStatsCache
Hi! I have a SolrCloud 6.6 collection with 3 shards setup where I need the TermVectors TF and DF values for queries. I have configured the ExactStatsCache in the solrConfig: When I query "detector works", it returns different docfreq values based on the shard the document comes from: "termVectors":[ "27504103",[ "uniqueKey","27504103", "kc",[ "detector works",[ "tf",1, "df",3, "tf-idf",0.]]], "27507925",[ "uniqueKey","27507925", "kc",[ "detector works",[ "tf",1, "df",3, "tf-idf",0.]]], "27504105",[ "uniqueKey","27504105", "kc",[ "detector works",[ "tf",1, "df",2, "tf-idf",0.5]]], "27507927",[ "uniqueKey","27507927", "kc",[ "detector works",[ "tf",1, "df",2, "tf-idf",0.5]]], "27507929",[ "uniqueKey","27507929", "kc",[ "detector works",[ "tf",1, "df",1, "tf-idf",1.0]]], "27504107",[ "uniqueKey","27504107", "kc",[ "detector works",[ "tf",1, "df",3, "tf-idf",0.} I expect to see the DF values to be 6 and TF-IDF to be adjusted on that value. I can see in the debug logs that the cache was active. I have found a pending bug (since Solr 5.5: https://issues.apache.org/jira/browse/SOLR-8893) that explains that this ExactStatsCache is used to compute the correct TF-IDF for the query but not for the TermVectors component. Is there any way to get the correctly merged DF values (and TF-IDF) from multiple shards? Is there a way to get from which shard a document comes from so I could compute my own correct DF? Thank you, Patrick
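For reference, the ExactStatsCache is registered in solrconfig.xml with a single line of this form:

  <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>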
Unexpected query result
I'm using Solr 4.4.0 running on Tomcat 7.0.29. The solrconfig.xlm is as-delivered (excepted for the Solr home directory of course). I could pass on the schema.xml, though I doubt this would help much, as the following will show. If I select all documents containing "russia" in the text, which is the default field, ie if I execute the query "russia", I find only 1 document, which is correct. If I select all documents containing "web" in the text ("web"), the result is 29, which is also correct. If I search for all documents that do not contain "russia" ("NOT(russia)"), the result is still correct (202). If I search for all documents that contain "web" and do not contain "russia" ("web AND NOT(russia)"), the result is, once again, correct (28, because the document containing "russia" also contains "web"). But if I search for all documents that contain "web" or do not contain "russia" ("web OR NOT(russia)"), the result is still 28, though I should get 203 matches (the whole set). Has anyone got an explanation ?? For information, the AND and OR work correctly if I don't use a NOT somewhere in the query, i.e. : "web AND russia" --> OK "web OR russia" --> OK -- View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-query-result-tp416.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unexpected query result
Thank you for your very quick reply - and for your solution, which works perfectly well. Still, I wonder why this simple and straightforward syntax "web OR NOT(russia)" needs some translation to be processed correctly... From the many related posts I read before asking my question, I know that I'm not the first one to be puzzled by this behavior. Wouldn't it be a good idea to modify the (Lucene, I guess?) parser so that the subsequent processing would produce a correct result? Thanks again for your help! -- View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-query-result-tp416p4100015.html Sent from the Solr - User mailing list archive at Nabble.com.
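For readers finding this thread later: the usual workaround (and, assuming it matches the suggestion referred to above, the one that works here) is to make the negative clause a complete query by anchoring it to the set of all documents, since Lucene cannot evaluate a purely negative clause inside a boolean OR:

  web OR (*:* -russia)
  (equivalently: web OR (*:* AND NOT russia))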
Solr 3.6.1 stalling with high CPU and blocking on field cache
I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to try and thought I might get some insight from some others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go to as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking the source code in 3.6.1, that is a function call to syncronized() which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. We run multiple data centers using Solr and I was comparing garbage collection processes between and noted that the old generation is collected very differently on this data center versus others. The old generation is collected as a massive collect event (several gigabytes worth) - the other data center is more saw toothed and collects only in 500MB-1GB at a time. Here's my parameters to java (the same in all environments): /usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS Incremental mode as we have 8 cores and remarks on the internet suggest that it is only for smaller SMP setups. Removing CMS did not fix anything. I've considered that the heap is way too large (30GB from 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, seems like I might be able to reduce down to 22GB safely - but I'm not sure if that will help with the CPU issues. I think field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell, this doesn't seem to influence sorting at all - only facet queries. I've tried setting useFilterForSortQuery, and seems to require less field cache but doesn't address the stalling issues. Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered. -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830
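One low-risk way to confirm or rule out GC as the source of these pauses is to log every stop-the-world pause with timestamps and compare them against the stall times. A sketch, using standard HotSpot flags available on the Java 6/7 generation of JVMs in use here (the log path is illustrative):

  -Xloggc:/var/log/tomcat/gc.log \
  -XX:+PrintGCTimeStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -XX:+PrintGCApplicationConcurrentTime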
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worse possible time. > My gut instinct is that your heap size is way too high. Try decreasing it to > like 5-10G. I know you say it uses more than that, but that just seems > bizarre unless you're doing something like faceting and/or sorting on every > field. > > -Michael > > -Original Message- > From: Patrick O'Lone [mailto:pol...@townnews.com] > Sent: Tuesday, November 26, 2013 11:59 AM > To: solr-user@lucene.apache.org > Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache > > I've been tracking a problem in our Solr environment for awhile with periodic > stalls of Solr 3.6.1. I'm running up to a wall on ideas to try and thought I > might get some insight from some others on this list. > > The load on the server is normally anywhere between 1-3. It's an 8-core > machine with 40GB of RAM. I have about 25GB of index data that is replicated > to this server every 5 minutes. It's taking about 200 connections per second > and roughly every 5-10 minutes it will stall for about 30 seconds to a > minute. The stall causes the load to go to as high as 90. It is all CPU bound > in user space - all cores go to 99% utilization (spinlock?). When doing a > thread dump, the following line is blocked in all running Tomcat threads: > > org.apache.lucene.search.FieldCacheImpl$Cache.get ( > FieldCacheImpl.java:230 ) > > Looking the source code in 3.6.1, that is a function call to > syncronized() which blocks all threads and causes the backlog. I've tried to > correlate these events to the replication events - but even with replication > disabled - this still happens. We run multiple data centers using Solr and I > was comparing garbage collection processes between and noted that the old > generation is collected very differently on this data center versus others. > The old generation is collected as a massive collect event (several gigabytes > worth) - the other data center is more saw toothed and collects only in > 500MB-1GB at a time. Here's my parameters to java (the same in all > environments): > > /usr/java/jre/bin/java \ > -verbose:gc \ > -XX:+PrintGCDetails \ > -server \ > -Dcom.sun.management.jmxremote \ > -XX:+UseConcMarkSweepGC \ > -XX:+UseParNewGC \ > -XX:+CMSIncrementalMode \ > -XX:+CMSParallelRemarkEnabled \ > -XX:+CMSIncrementalPacing \ > -XX:NewRatio=3 \ > -Xms30720M \ > -Xmx30720M \ > -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath > /usr/local/share/apache-tomcat/bin/bootstrap.jar \ > -Dcatalina.base=/usr/local/share/apache-tomcat \ > -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ > org.apache.catalina.startup.Bootstrap start > > I've tried a few GC option changes from this (been running this way for a > couple of years now) - primarily removing CMS Incremental mode as we have 8 > cores and remarks on the internet suggest that it is only for smaller SMP > setups. Removing CMS did not fix anything. > > I've considered that the heap is way too large (30GB from 40GB) and may not > leave enough memory for mmap operations (MMap appears to be used in the field > cache). 
Based on active memory utilization in Java, seems like I might be > able to reduce down to 22GB safely - but I'm not sure if that will help with > the CPU issues. > > I think field cache is used for sorting and faceting. I've started to > investigate facet.method, but from what I can tell, this doesn't seem to > influence sorting at all - only facet queries. I've tried setting > useFilterForSortQuery, and seems to require less field cache but doesn't > address the stalling issues. > > Is there something I am overlooking? Perhaps the system is becoming > oversubscribed in terms of resources? Thanks for any help that is offered. > > -- > Patrick O'Lone > Director of Software Development > TownNews.com > > E-mail ... pol...@townnews.com > Phone 309-743-0809 > Fax .. 309-743-0830 > > -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830
facet.method=fcs vs facet.method=fc on solr slaves
Is there any advantage on a Solr slave to receive queries using facet.method=fcs instead of the default of facet.method=fc? Most of the segment files are unchanged between replication events - but I wasn't sure if replication would cause the unchanged segment field caches to be lost anyway. -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830
Re: facet.method=fcs vs facet.method=fc on solr slaves
So does it make the most sense then to force, by default, facet.method=fcs on slave nodes that receive updates every 5 minutes but with large segments that don't change every update? Right now, everything I have configured uses facet.method=fc since we don't declare it at all. Randomly, after replication, I have several threads that will hang on reading data from field cache and I'm trying to think of things I can do to mitigate that. Thanks for the info. > Hello Patrick, > > Replication flushes UnInvertedField cache that impacts fc, but doesn't > harm Lucene's FieldCache which is for fcs. You can check how much time > in millis is spend on UnInvertedField cache regeneration in INFO logs like > "UnInverted multi-valued field ,time=### ..." > > > On Thu, Dec 5, 2013 at 12:15 AM, Patrick O'Lone <mailto:pol...@townnews.com>> wrote: > > Is there any advantage on a Solr slave to receive queries using > facet.method=fcs instead of the default of facet.method=fc? Most of the > segment files are unchanged between replication events - but I wasn't > sure if replication would cause the unchanged segment field caches to be > lost anyway. > -- > Patrick O'Lone > Director of Software Development > TownNews.com > > E-mail ... pol...@townnews.com <mailto:pol...@townnews.com> > Phone 309-743-0809 > Fax .. 309-743-0830 > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mailto:mkhlud...@griddynamics.com> -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830
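For the record, the facet method can be forced per request or per field without touching the config, which makes it easy to A/B test on a slave right after replication (a hedged example; 'section' stands in for whatever field is being faceted):

  ...&facet=true&facet.field=section&facet.method=fcs
  ...&facet=true&facet.field=section&f.section.facet.method=fcs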
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I have a new question about this issue - I create a filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this, would this cause every other search thread to block until the filter query is re-cached every 5 minutes and if so, is there a better way to do it? Thanks for any continued help with this issue! > We have a webapp running with a very high HEAP size (24GB) and we have > no problems with it AFTER we enabled the new GC that is meant to replace > sometime in the future the CMS GC, but you have to have Java 6 update > "Some number I couldn't find but latest should cover" to be able to use: > > 1. Remove all GC options you have and... > 2. Replace them with /"-XX:+UseG1GC -XX:MaxGCPauseMillis=50"/ > > As a test of course, more information you can read on the following (and > interesting) article, we also have Solr running with these options, no > more pauses or HEAP size hitting the sky. > > Don't get bored reading the 1st (and small) introduction page of the > article, page 2 and 3 will make lot of sense: > http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 > > > HTH, > > Guido. > > On 26/11/13 21:59, Patrick O'Lone wrote: >> We do perform a lot of sorting - on multiple fields in fact. We have >> different kinds of Solr configurations - our news searches do little >> with regards to faceting, but heavily sort. We provide classified ad >> searches and that heavily uses faceting. I might try reducing the JVM >> memory some and amount of perm generation as suggested earlier. It feels >> like a GC issue and loading the cache just happens to be the victim of a >> stop-the-world event at the worse possible time. >> >>> My gut instinct is that your heap size is way too high. Try >>> decreasing it to like 5-10G. I know you say it uses more than that, >>> but that just seems bizarre unless you're doing something like >>> faceting and/or sorting on every field. >>> >>> -Michael >>> >>> -Original Message- >>> From: Patrick O'Lone [mailto:pol...@townnews.com] >>> Sent: Tuesday, November 26, 2013 11:59 AM >>> To: solr-user@lucene.apache.org >>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache >>> >>> I've been tracking a problem in our Solr environment for awhile with >>> periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to >>> try and thought I might get some insight from some others on this list. >>> >>> The load on the server is normally anywhere between 1-3. It's an >>> 8-core machine with 40GB of RAM. I have about 25GB of index data that >>> is replicated to this server every 5 minutes. It's taking about 200 >>> connections per second and roughly every 5-10 minutes it will stall >>> for about 30 seconds to a minute. The stall causes the load to go to >>> as high as 90. It is all CPU bound in user space - all cores go to >>> 99% utilization (spinlock?). When doing a thread dump, the following >>> line is blocked in all running Tomcat threads: >>> >>> org.apache.lucene.search.FieldCacheImpl$Cache.get ( >>> FieldCacheImpl.java:230 ) >>> >>> Looking the source code in 3.6.1, that is a function call to >>> syncronized() which blocks all threads and causes the backlog. 
I've >>> tried to correlate these events to the replication events - but even >>> with replication disabled - this still happens. We run multiple data >>> centers using Solr and I was comparing garbage collection processes >>> between and noted that the old generation is collected very >>> differently on this data center versus others. The old generation is >>> collected as a massive collect event (several gigabytes worth) - the >>> other data center is more saw toothed and collects only in 500MB-1GB >>> at a time. Here's my parameters to java (the same in all environments): >>> >>> /usr/java/jre/bin/java \ >>> -verbose:gc \ >>> -XX:+PrintGCDetails \ >>> -server \ >>> -Dcom.sun.management.jmxremote \ >>> -XX:+UseConcMarkSweepGC \ >>> -XX:+UseParNewGC \ >>> -XX:+CMSIncrementalMode \ >>> -XX:+CMSParallelRemarkEnabled \ >>> -XX:+CMSIncrementalPacing \ >>> -XX:NewRatio=3 \ >>> -Xm
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well. > I was trying to locate the release notes for 3.6.x it is too old, if I > were you I would update to 3.6.2 (from 3.6.1), it shouldn't affect you > since it is a minor release, locate the release notes and see if > something that is affecting you got fixed, also, I would be thinking on > moving on to 4.x which is quite stable and fast. > > Like anything with Java and concurrency, it will just get better (and > faster) with bigger numbers and concurrency frameworks becoming more and > more reliable, standard and stable. > > Regards, > > Guido. > > On 09/12/13 15:07, Patrick O'Lone wrote: >> I have a new question about this issue - I create a filter queries of >> the form: >> >> fq=start_time:[* TO NOW/5MINUTE] >> >> This is used to restrict the set of documents to only items that have a >> start time within the next 5 minutes. Most of my indexes have millions >> of documents with few documents that start sometime in the future. >> Nearly all of my queries include this, would this cause every other >> search thread to block until the filter query is re-cached every 5 >> minutes and if so, is there a better way to do it? Thanks for any >> continued help with this issue! >> >>> We have a webapp running with a very high HEAP size (24GB) and we have >>> no problems with it AFTER we enabled the new GC that is meant to replace >>> sometime in the future the CMS GC, but you have to have Java 6 update >>> "Some number I couldn't find but latest should cover" to be able to use: >>> >>> 1. Remove all GC options you have and... >>> 2. Replace them with /"-XX:+UseG1GC -XX:MaxGCPauseMillis=50"/ >>> >>> As a test of course, more information you can read on the following (and >>> interesting) article, we also have Solr running with these options, no >>> more pauses or HEAP size hitting the sky. >>> >>> Don't get bored reading the 1st (and small) introduction page of the >>> article, page 2 and 3 will make lot of sense: >>> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 >>> >>> >>> >>> HTH, >>> >>> Guido. >>> >>> On 26/11/13 21:59, Patrick O'Lone wrote: >>>> We do perform a lot of sorting - on multiple fields in fact. We have >>>> different kinds of Solr configurations - our news searches do little >>>> with regards to faceting, but heavily sort. We provide classified ad >>>> searches and that heavily uses faceting. I might try reducing the JVM >>>> memory some and amount of perm generation as suggested earlier. It >>>> feels >>>> like a GC issue and loading the cache just happens to be the victim >>>> of a >>>> stop-the-world event at the worse possible time. >>>> >>>>> My gut instinct is that your heap size is way too high. Try >>>>> decreasing it to like 5-10G. I know you say it uses more than that, >>>>> but that just seems bizarre unless you're doing something like >>>>> faceting and/or sorting on every field. >>>>> >>>>> -Michael >>>>> >>>>> -Original Message- >>>>> From: Patrick O'Lone [mailto:pol...@townnews.com] >>>>> Sent: Tuesday, November 26, 2013 11:59 AM >>>>> To: solr-user@lucene.apache.org >>>>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache >>>>> >>>>> I've been tracking a problem in our Solr environment for awhile with >>>>> periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to >>>>> try and thought I might get some insight from some others on this >>>>> list. >>>>> >>>>> The load on the server is normally anywhere between 1-3. 
It's an >>>>> 8-core machine with 40GB of RAM. I have about 25GB of index data that >>>>> is replicated to this server every 5 minutes. It's taking about 200 >>>>> connections per second and roughly every 5-10 minutes it will stall >>>>> for about 30 seconds to a minute. The stall causes the load to go to >>>>> as high as 90. It is all CPU bound in user space - all cores go to >>>>> 99% utilization (spinlock?). When doing a thread dump, the following >>>>> line is blocked in all running Tomcat threads: >>&
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Yeah, I tried G1, but it did not help - I don't think it is a garbage collection issue. I've made various changes to iCMS as well and the issue ALWAYS happens - no matter what I do. If I'm taking heavy traffic (200 requests per second) - as soon as I hit a 5 minute mark - the world stops - garbage collection would be less predictable. Nearly all of my requests have this 5 minute windowing behavior on time though, which is why I have it as a strong suspect now. If it blocks on that - even for a couple of seconds, my traffic backlog will be 600-800 requests. > Did you add the Garbage collection JVM options I suggested you? > > -XX:+UseG1GC -XX:MaxGCPauseMillis=50 > > Guido. > > On 09/12/13 16:33, Patrick O'Lone wrote: >> Unfortunately, in a test environment, this happens in version 4.4.0 of >> Solr as well. >> >>> I was trying to locate the release notes for 3.6.x it is too old, if I >>> were you I would update to 3.6.2 (from 3.6.1), it shouldn't affect you >>> since it is a minor release, locate the release notes and see if >>> something that is affecting you got fixed, also, I would be thinking on >>> moving on to 4.x which is quite stable and fast. >>> >>> Like anything with Java and concurrency, it will just get better (and >>> faster) with bigger numbers and concurrency frameworks becoming more and >>> more reliable, standard and stable. >>> >>> Regards, >>> >>> Guido. >>> >>> On 09/12/13 15:07, Patrick O'Lone wrote: >>>> I have a new question about this issue - I create a filter queries of >>>> the form: >>>> >>>> fq=start_time:[* TO NOW/5MINUTE] >>>> >>>> This is used to restrict the set of documents to only items that have a >>>> start time within the next 5 minutes. Most of my indexes have millions >>>> of documents with few documents that start sometime in the future. >>>> Nearly all of my queries include this, would this cause every other >>>> search thread to block until the filter query is re-cached every 5 >>>> minutes and if so, is there a better way to do it? Thanks for any >>>> continued help with this issue! >>>> >>>>> We have a webapp running with a very high HEAP size (24GB) and we have >>>>> no problems with it AFTER we enabled the new GC that is meant to >>>>> replace >>>>> sometime in the future the CMS GC, but you have to have Java 6 update >>>>> "Some number I couldn't find but latest should cover" to be able to >>>>> use: >>>>> >>>>> 1. Remove all GC options you have and... >>>>> 2. Replace them with /"-XX:+UseG1GC -XX:MaxGCPauseMillis=50"/ >>>>> >>>>> As a test of course, more information you can read on the following >>>>> (and >>>>> interesting) article, we also have Solr running with these options, no >>>>> more pauses or HEAP size hitting the sky. >>>>> >>>>> Don't get bored reading the 1st (and small) introduction page of the >>>>> article, page 2 and 3 will make lot of sense: >>>>> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 >>>>> >>>>> >>>>> >>>>> >>>>> HTH, >>>>> >>>>> Guido. >>>>> >>>>> On 26/11/13 21:59, Patrick O'Lone wrote: >>>>>> We do perform a lot of sorting - on multiple fields in fact. We have >>>>>> different kinds of Solr configurations - our news searches do little >>>>>> with regards to faceting, but heavily sort. We provide classified ad >>>>>> searches and that heavily uses faceting. I might try reducing the JVM >>>>>> memory some and amount of perm generation as suggested earlier. 
It >>>>>> feels >>>>>> like a GC issue and loading the cache just happens to be the victim >>>>>> of a >>>>>> stop-the-world event at the worse possible time. >>>>>> >>>>>>> My gut instinct is that your heap size is way too high. Try >>>>>>> decreasing it to like 5-10G. I know you say it uses more than that, >>>>>>> but that just seems bizarre unless you're doing something like >>>>>>> faceting and/or sorting on every field. >>>>>>> >>>>>>> -Michael &g
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Well, I want to include everything will start in the next 5 minute interval and everything that came before. The query is more like: fq=start_time:[* TO NOW+5MINUTE/5MINUTE] so that it rounds to the nearest 5 minute interval on the right-hand side. But, as soon as 1 second after that 5 minute window, everything pauses wanting for filter cache (at least that's my working theory based on observation). Is it possible to do something like: fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE] where it would use the filter cache to narrow down by day resolution and then filter as part of the standard query, or something like that? My thought is that this would still gain a benefit from a query cache, but somewhat slower since it must remove results for things appearing later in the day. > If you want a start time within the next 5 minutes, I think your filter > is not the good one. > * will be replaced by the first date in your field > > Try : > fq=start_time:[NOW TO NOW+5MINUTE] > > Franck Brisbart > > > Le lundi 09 décembre 2013 à 09:07 -0600, Patrick O'Lone a écrit : >> I have a new question about this issue - I create a filter queries of >> the form: >> >> fq=start_time:[* TO NOW/5MINUTE] >> >> This is used to restrict the set of documents to only items that have a >> start time within the next 5 minutes. Most of my indexes have millions >> of documents with few documents that start sometime in the future. >> Nearly all of my queries include this, would this cause every other >> search thread to block until the filter query is re-cached every 5 >> minutes and if so, is there a better way to do it? Thanks for any >> continued help with this issue! >> >>> We have a webapp running with a very high HEAP size (24GB) and we have >>> no problems with it AFTER we enabled the new GC that is meant to replace >>> sometime in the future the CMS GC, but you have to have Java 6 update >>> "Some number I couldn't find but latest should cover" to be able to use: >>> >>> 1. Remove all GC options you have and... >>> 2. Replace them with /"-XX:+UseG1GC -XX:MaxGCPauseMillis=50"/ >>> >>> As a test of course, more information you can read on the following (and >>> interesting) article, we also have Solr running with these options, no >>> more pauses or HEAP size hitting the sky. >>> >>> Don't get bored reading the 1st (and small) introduction page of the >>> article, page 2 and 3 will make lot of sense: >>> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 >>> >>> >>> HTH, >>> >>> Guido. >>> >>> On 26/11/13 21:59, Patrick O'Lone wrote: >>>> We do perform a lot of sorting - on multiple fields in fact. We have >>>> different kinds of Solr configurations - our news searches do little >>>> with regards to faceting, but heavily sort. We provide classified ad >>>> searches and that heavily uses faceting. I might try reducing the JVM >>>> memory some and amount of perm generation as suggested earlier. It feels >>>> like a GC issue and loading the cache just happens to be the victim of a >>>> stop-the-world event at the worse possible time. >>>> >>>>> My gut instinct is that your heap size is way too high. Try >>>>> decreasing it to like 5-10G. I know you say it uses more than that, >>>>> but that just seems bizarre unless you're doing something like >>>>> faceting and/or sorting on every field. 
>>>>> >>>>> -Michael >>>>> >>>>> -Original Message- >>>>> From: Patrick O'Lone [mailto:pol...@townnews.com] >>>>> Sent: Tuesday, November 26, 2013 11:59 AM >>>>> To: solr-user@lucene.apache.org >>>>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache >>>>> >>>>> I've been tracking a problem in our Solr environment for awhile with >>>>> periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to >>>>> try and thought I might get some insight from some others on this list. >>>>> >>>>> The load on the server is normally anywhere between 1-3. It's an >>>>> 8-core machine with 40GB of RAM. I have about 25GB of index data that >>>>> is replicated to this server every 5 minutes. It's taking about 200 >>>>> connections per second and roughly every 5-10 minutes it will stall >>&
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I initially thought this was the case as well. These are slave nodes that receive updates every 5-10 minutes. However, this issue happens even if replication is turned off and no update handler is provided at all. I have confirmed against my data that simply querying the fq for a start_time in a range takes 11-13 seconds to actually populate the cache. If I make the fq not cache at all, my QTime raises by about 100ms, but does not have the stalling effect. A purely negative query also seems to have this effect, that is: fq=-start_time:[NOW/MINUTE TO *] But, I'm not sure if that is because it actually caches the negative query or if it discards it entirely. > Patrick, > > Are you getting these stalls following a commit? If so then the issue is > most likely fieldCache warming pauses. To stop your users from seeing > this pause you'll need to add static warming queries to your > solrconfig.xml to warm the fieldCache before it's registered . > > > On Mon, Dec 9, 2013 at 12:33 PM, Patrick O'Lone <mailto:pol...@townnews.com>> wrote: > > Well, I want to include everything will start in the next 5 minute > interval and everything that came before. The query is more like: > > fq=start_time:[* TO NOW+5MINUTE/5MINUTE] > > so that it rounds to the nearest 5 minute interval on the right-hand > side. But, as soon as 1 second after that 5 minute window, everything > pauses wanting for filter cache (at least that's my working theory based > on observation). Is it possible to do something like: > > fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE] > > where it would use the filter cache to narrow down by day resolution and > then filter as part of the standard query, or something like that? > > My thought is that this would still gain a benefit from a query cache, > but somewhat slower since it must remove results for things appearing > later in the day. > > > If you want a start time within the next 5 minutes, I think your > filter > > is not the good one. > > * will be replaced by the first date in your field > > > > Try : > > fq=start_time:[NOW TO NOW+5MINUTE] > > > > Franck Brisbart > > > > > > Le lundi 09 d�cembre 2013 � 09:07 -0600, Patrick O'Lone a �crit : > >> I have a new question about this issue - I create a filter queries of > >> the form: > >> > >> fq=start_time:[* TO NOW/5MINUTE] > >> > >> This is used to restrict the set of documents to only items that > have a > >> start time within the next 5 minutes. Most of my indexes have > millions > >> of documents with few documents that start sometime in the future. > >> Nearly all of my queries include this, would this cause every other > >> search thread to block until the filter query is re-cached every 5 > >> minutes and if so, is there a better way to do it? Thanks for any > >> continued help with this issue! > >> > >>> We have a webapp running with a very high HEAP size (24GB) and > we have > >>> no problems with it AFTER we enabled the new GC that is meant to > replace > >>> sometime in the future the CMS GC, but you have to have Java 6 > update > >>> "Some number I couldn't find but latest should cover" to be able > to use: > >>> > >>> 1. Remove all GC options you have and... > >>> 2. Replace them with /"-XX:+UseG1GC -XX:MaxGCPauseMillis=50"/ > >>> > >>> As a test of course, more information you can read on the > following (and > >>> interesting) article, we also have Solr running with these > options, no > >>> more pauses or HEAP size hitting the sky. 
> >>> > >>> Don't get bored reading the 1st (and small) introduction page of the > >>> article, page 2 and 3 will make lot of sense: > >>> > > http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 > >>> > >>> > >>> HTH, > >>> > >>> Guido. > >>> > >>> On 26/11/13 21:59, Patrick O'Lone wrote: > >>>> We do perform a lot of sorting - on multiple fields in fact. We > have > >>>> different kinds of Solr configurations - our news searches do > little > >>>>
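For anyone following this thread: the static warming queries suggested above go into solrconfig.xml as a newSearcher listener. A minimal sketch, assuming the start_time field from this thread and otherwise default settings:

  <!-- pre-populate the filter cache with the rounded start_time filter after each commit -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="fq">start_time:[* TO NOW+5MINUTE/5MINUTE]</str>
        <str name="rows">0</str>
      </lst>
    </arr>
  </listener>

Note that this only fires when a new searcher is opened (after a commit or replication), so it covers the post-commit stall case, not the pure five-minute NOW rollover on an otherwise idle index. The non-caching variant mentioned above is just the local-param form of the same filter, fq={!cache=false}start_time:[* TO NOW+5MINUTE/5MINUTE].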
LFU cache and autowarming
If I were to use the LFU cache instead of FastLRU for the filter cache, and I enable auto-warming on that cache type, does it warm the most frequently used fq entries in the filter cache? Thanks for any info! -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830
Re: LFU cache and autowarming
Well, I haven't tested it - if it's not ready yet I will probably avoid it for now. > On 12/19/2013 1:46 PM, Patrick O'Lone wrote: >> If I were to use the LFU cache instead of FastLRU for the filter cache, and >> I enable auto-warming on that cache type, does it warm the most >> frequently used fq entries in the filter cache? Thanks for any info! > > I wrote that cache. It's a really really crappy implementation; I would > only expect it to work well if the cache is very very small. > > I do have a replacement implementation that's just about ready, but I've > not been able to find 'round tuits to work on getting it polished and > committed. > > https://issues.apache.org/jira/browse/SOLR-2906 > https://issues.apache.org/jira/browse/SOLR-3393 > > Thanks, > Shawn > > -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830
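For context, the cache being discussed is configured in solrconfig.xml. A minimal sketch with illustrative sizes (swap the class to solr.LFUCache to get the implementation Shawn wrote):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>

autowarmCount is how many entries get regenerated against the new searcher after a commit; which entries are picked is up to the cache implementation, which is exactly what the question above is asking about.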
Re: no servers hosting shard
After a full bounce of Tomcat, I'm now getting a new exception (below). I can browse the Zookeeper config in the Solr admin UI, and can confirm that there's a node for '/collections/customerOrderSearch/leaders/shard2', but no node for 'collections/customerOrderSearch/leaders/shard1'. Still, any ideas or guidance on how to recover would be appreciated. We've restarted all three zookeeper instances and both Solr instances, but that hasn't made any appreciable difference. --p. 2014-01-07 10:06:14,980 [coreLoadExecutor-4-thread-1] ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.cloud.ZooKeeperException: at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:309) at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:556) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:365) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1 at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864) at org.apache.solr.cloud.ZkController.register(ZkController.java:773) at org.apache.solr.cloud.ZkController.register(ZkController.java:723) at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:286) ... 11 more Caused by: org.apache.solr.common.SolrException: Could not get leader props at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:911) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:875) at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:839) ... 14 more Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/customerOrderSearch/leaders/shard1 at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:252) at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:249) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:249) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:889) ... 16 more On Tue, Jan 7, 2014 at 9:57 AM, patrick conant wrote: > In our Solr instance we have two shards each running on two servers. The > server that was the leader for one of the shards ran into a problem, and > when we restarted the service, Solar is no longer electing a leader for the > shard. > > The stack traces from the logs are below, and the 'Cloud Dump' from the > Solr console is attached. We're running Solr 4.4.0. Any guidance on how > to recover from this? Restarting or redeploying the service doesn't seem > to make any difference. > > Thanks, > Pat. 
> > > 2014-01-07 00:00:10,754 [http-8080-62] ERROR org.apache.solr.core.SolrCore > - org.apache.solr.common.SolrException: no servers hosting shard: > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:662) > > 2014-01-07 09:38:33,701 [http-8080-21] ERROR org.apache.solr.core.SolrCore > - org.apache.solr.common.SolrException: No registered leader was found, > collection:customerOrderSearch slice:shard1 > at > org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:487) > at > org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:470) > at > org.
Re: no servers hosting shard
We found a way to recover. This sequence allowed everything to start up successfully. - Stop all Solr instances - Stop all Zookeeper instances - Start all Zookeeper instances - Start Solr instances one at a time. Restarting the first Solr instance took several minutes, but the subsequent instances started up much more quickly. Cheers, Pat. On Tue, Jan 7, 2014 at 10:20 AM, patrick conant wrote: > After a full bounce of Tomcat, I'm now getting a new exception (below). I > can browse the Zookeeper config in the Solr admin UI, and can confirm that > there's a node for '/collections/customerOrderSearch/leaders/shard2', but > no node for 'collections/customerOrderSearch/leaders/shard1'. Still, any > ideas or guidance on how to recover would be appreciated. We've restarted > all three zookeeper instances and both Solr instances, but that hasn't made > any appreciable difference. > > --p. > > > > > 2014-01-07 10:06:14,980 [coreLoadExecutor-4-thread-1] ERROR > org.apache.solr.core.CoreContainer - > null:org.apache.solr.common.cloud.ZooKeeperException: > at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:309) > at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:556) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:365) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:662) > Caused by: org.apache.solr.common.SolrException: Error getting leader from > zk for shard shard1 > at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864) > at org.apache.solr.cloud.ZkController.register(ZkController.java:773) > at org.apache.solr.cloud.ZkController.register(ZkController.java:723) > at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:286) > ... 11 more > Caused by: org.apache.solr.common.SolrException: Could not get leader props > at > org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:911) > at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:875) > at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:839) > ... 14 more > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /collections/customerOrderSearch/leaders/shard1 > at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) > at > org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:252) > at > org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:249) > at > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) > at > org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:249) > at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:889) > ... 
16 more > > > > On Tue, Jan 7, 2014 at 9:57 AM, patrick conant > wrote: > >> In our Solr instance we have two shards each running on two servers. The >> server that was the leader for one of the shards ran into a problem, and >> when we restarted the service, Solar is no longer electing a leader for the >> shard. >> >> The stack traces from the logs are below, and the 'Cloud Dump' from the >> Solr console is attached. We're running Solr 4.4.0. Any guidance on how >> to recover from this? Restarting or redeploying the service doesn't seem >> to make any difference. >> >> Thanks, >> Pat. >> >> >> 2014-01-07 00:00:10,754 [http-8080-62] ERROR >> org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: no >> servers hosting shard: >> at >> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) >> at >> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> at java.util.concurrent.FutureTask.run(FutureTask.java:138) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(E
Best way to map holidays to corresponding date
Hey, maybe someone has already faced this situation and could give me a hint. Given a query that includes "Easter" or "Sylvester", I am looking for the best place to translate that string into the corresponding date. Is there any solr.Mapping*Factory for that, or do I need to implement it in a custom Solr query parser? Regards, Patrick
Handling growth
Hello everyone, I am working with a Solr collection that is several terabytes in size, spread over several hundred million documents. Each document is very rich, and over the past few years we have consistently quadrupled the size of our collection annually. Unfortunately, this sits on a single node with only a few hundred megabytes of memory - so our performance is less than ideal. I am looking into implementing a SolrCloud cluster. From reading a few books (e.g. Solr in Action), various internet blogs, and the reference guide, the advice is to build a cluster with room to grow. I can probably provision enough hardware for a year's worth of growth from today, however I would like to have a plan beyond that. Shard splitting seems pretty straightforward. We are continuously adding documents and never change existing ones. Based on that, one individual recommended that I implement custom hashing and route the latest documents to the shard with the fewest documents, and when that shard fills up, add a new shard and index on the new shard, rinse and repeat. The last approach makes sense. However, my concern with it is that I lose distributed indexing, plus the implementation and maintainability concerns. My question for the community is: what are your thoughts on this, and do you have any suggestions and/or recommendations on planning for future growth? Look forward to your responses, Patrick
RE: Handling growth
Good eye, that should have been gigabytes. When adding to the new shard, is the shard already part of the the collection? What mechanism have you found useful in accomplishing this (i.e. routing)? On Nov 14, 2014 7:07 AM, "Toke Eskildsen" wrote: > Patrick Henry [patricktheawesomeg...@gmail.com] wrote: > > >I am working with a Solr collection that is several terabytes in size over > > several hundred millions of documents. Each document is very rich, and > > over the past few years we have consistently quadrupled the size our > > collection annually. Unfortunately, this sits on a single node with > only a > > few hundred megabytes of memory - so our performance is less than ideal. > > I assume you mean gigabytes of memory. If you have not already done so, > switching to SSDs for storage should buy you some more time. > > > [Going for SolrCloud] We are in a continuous adding documents and never > change > > existing ones. Based on that, one individual recommended for me to > > implement custom hashing and route the latest documents to the shard with > > the least documents, and when that shard fills up add a new shard and > index > > on the new shard, rinse and repeat. > > We have quite a similar setup, where we produce a never-changing shard > once every 8 days and add it to our cloud. One could also combine this > setup with a single live shard, for keeping the full index constantly up to > date. The memory overhead of running an immutable shard is smaller than a > mutable one and easier to fine-tune. It also allows you to optimize the > index down to a single segment, which requires a bit less processing power > and saves memory when faceting. There's a description of our setup at > http://sbdevel.wordpress.com/net-archive-search/ > > From an administrative point of view, we like having complete control over > each shard. We keep track of what goes in it and in case of schema or > analyze chain changes, we can re-build each shard one at a time and deploy > them continuously, instead of having to re-build everything in one go on a > parallel setup. Of course, fundamental changes to the schema would require > a complete re-build before deploy, so we hope to avoid that. > > - Toke Eskildsen >
Re: Handling growth
Michael, Interesting, I'm still unfamiliar with limitations (if any) of aliasing. Does architecture utilize realtime get? On Nov 18, 2014 11:49 AM, "Michael Della Bitta" < michael.della.bi...@appinions.com> wrote: > We're achieving some success by treating aliases as collections and > collections as shards. > > More specifically, there's a read alias that spans all the collections, > and a write alias that points at the 'latest' collection. Every week, I > create a new collection, add it to the read alias, and point the write > alias at it. > > Michael > > On 11/14/14 07:06, Toke Eskildsen wrote: > >> Patrick Henry [patricktheawesomeg...@gmail.com] wrote: >> >> I am working with a Solr collection that is several terabytes in size >>> over >>> several hundred millions of documents. Each document is very rich, and >>> over the past few years we have consistently quadrupled the size our >>> collection annually. Unfortunately, this sits on a single node with >>> only a >>> few hundred megabytes of memory - so our performance is less than ideal. >>> >> I assume you mean gigabytes of memory. If you have not already done so, >> switching to SSDs for storage should buy you some more time. >> >> [Going for SolrCloud] We are in a continuous adding documents and never >>> change >>> existing ones. Based on that, one individual recommended for me to >>> implement custom hashing and route the latest documents to the shard with >>> the least documents, and when that shard fills up add a new shard and >>> index >>> on the new shard, rinse and repeat. >>> >> We have quite a similar setup, where we produce a never-changing shard >> once every 8 days and add it to our cloud. One could also combine this >> setup with a single live shard, for keeping the full index constantly up to >> date. The memory overhead of running an immutable shard is smaller than a >> mutable one and easier to fine-tune. It also allows you to optimize the >> index down to a single segment, which requires a bit less processing power >> and saves memory when faceting. There's a description of our setup at >> http://sbdevel.wordpress.com/net-archive-search/ >> >> From an administrative point of view, we like having complete control >> over each shard. We keep track of what goes in it and in case of schema or >> analyze chain changes, we can re-build each shard one at a time and deploy >> them continuously, instead of having to re-build everything in one go on a >> parallel setup. Of course, fundamental changes to the schema would require >> a complete re-build before deploy, so we hope to avoid that. >> >> - Toke Eskildsen >> > >
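For anyone wanting to try this, the alias juggling is done with the Collections API; a rough sketch (collection and alias names are made up for illustration, and CREATE will also need whatever configName/shard parameters your setup uses):

  http://host:8983/solr/admin/collections?action=CREATE&name=docs_2014_47&numShards=1&replicationFactor=2
  http://host:8983/solr/admin/collections?action=CREATEALIAS&name=docs_read&collections=docs_2014_45,docs_2014_46,docs_2014_47
  http://host:8983/solr/admin/collections?action=CREATEALIAS&name=docs_write&collections=docs_2014_47

Issuing CREATEALIAS again with an existing alias name simply repoints it, which is how the weekly rollover described above works: the read alias grows by one collection, and the write alias moves to the newest one.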
SolrCloud: Collection API question and problem with core loading
Hi there, I run 2 Solr instances (Tomcat 7, Solr 4.3.0, one shard), one external Zookeeper instance, and have lots of cores. I use the collection API to create each new core dynamically after the configuration for the core is uploaded to Zookeeper, and it all works fine. As there are so many cores, it takes a very long time to load them all at startup; I would like to start the server quickly and load the cores on demand. When a core is created via the collection API it is created with the default parameter loadOnStartup="true" (this can be seen in solr.xml). Question: is there a way to specify this parameter so it can be set to 'false' via the collection API? Problem: If I manually set loadOnStartup="false" for the core, I get the exception below when I use CloudSolrServer to query the core: Error: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request Seems to me that CloudSolrServer will not trigger the core to be loaded. Is it possible to get the core loaded using CloudSolrServer? Regards, Patrick
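For reference, the attributes in question sit on the core entry in (legacy-style) solr.xml; a hand-edited, lazily loaded core would look roughly like this (core and collection names are illustrative):

  <!-- load this core only when it is first requested; allow it to be unloaded again when idle -->
  <core name="core1" instanceDir="core1" collection="collection1"
        loadOnStartup="false" transient="true"/>

Whether the collection API in 4.3 can set loadOnStartup/transient at creation time is exactly the open question in this mail, so the snippet only shows the end state one would want in solr.xml.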
RE: SolrCloud with Zookeeper ensemble : fail to restart master server
After a number of testing I found that running embedded zookeeper isn't a good idea especially only run one Zookeeper instance. When the Solr instance with ZooKeeper embedded gets rebooted it got confused who should be the leader therefore it will not start while others(followers) are still running. I now use standalone Zookeeper instance and that works well. Thanks Erick for giving the right direction, much appreciated! Regards, Patrick -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, 20 March 2013 2:57 a.m. To: solr-user@lucene.apache.org Subject: Re: SolrCloud with Zookeeper ensemble : fail to restart master server First, the bootstrap_conf and numShards should only be specified the _first_ time you start up your leader. bootstrap_conf's purpose is to push the configuration files to Zookeeper. numShards is a one-time-only parameter that you shouldn't specify more than once, it is ignored afterwards I think. Once the conf files are up in zookeeper, then they don't need to be pushed again until they change, and you can use the command-line tools to do that Terminology: we're trying to get away from master/slave and use leader/replica in SolrCloud mode to distinguish it from the old replication process, so just checking to be sure that you probably really mean leader/replica, right? Watch your admin/SolrCloud link as you bring machines up and down. That page will show you the state of each of your machines. Normally there's no trouble bringing the leader up and down, _except_ it sounds like you have your zookeeper running embedded. A quorum of ZK nodes (in this case one) needs to be running for SolrCloud to operate. Still, that shouldn't prevent your machine running ZK from coming back up. So I'm a bit puzzled, but let's straighten out the startup stuff and watch your solr log on your leader when you bring it up, that should generate some more questions.. Best Erick On Mon, Mar 18, 2013 at 11:12 PM, Patrick Mi wrote: > Hi there, > > I have experienced some problems starting the master server. > > Solr4.2 under Tomcat 7 on Centos6. > > Configuration : > 3 solr instances running on different machines, one shard, 3 cores, 2 > replicas, using Zookeeper comes with Solr > > The master server A has the following run option: -Dbootstrap_conf=true > -DzkRun -DnumShards=1, > The slave servers B and C have : -DzkHost=masterServerIP:2181 > > It works well for add/update/delete etc after I start up master and slave > servers in order. > > When the master A is up stop/start slave B and C are OK. > > When slave B and C are running I couldn't restart master A. Only after I > shutdown B and C then I can start master A. > > Is this a feature or bug or something I haven't configure properly? > > Thanks advance for your help > > Regards, > Patrick > >
OPENNLP current patch compiling problem for 4.x branch
Hi, I checked out from here http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and downloaded the latest patch LUCENE-2899-current.patch. Applied the patch ok but when I did 'ant compile' I got the following error: == [javac] /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol [javac] super(Version.LUCENE_44, input); [javac] ^ [javac] symbol: variable LUCENE_44 [javac] location: class Version [javac] 1 error == Compiled it on trunk without problem. Is this patch supposed to work for 4.X? Regards, Patrick
RE: OPENNLP current patch compiling problem for 4.x branch
Thanks Steve, that worked for branch_4x -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Friday, 24 May 2013 3:19 a.m. To: solr-user@lucene.apache.org Subject: Re: OPENNLP current patch compiling problem for 4.x branch Hi Patrick, I think you should check out and apply the patch to branch_4x, rather than the lucene_solr_4_3_0 tag: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x Steve On May 23, 2013, at 2:08 AM, Patrick Mi wrote: > Hi, > > I checked out from here > http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and > downloaded the latest patch LUCENE-2899-current.patch. > > Applied the patch ok but when I did 'ant compile' I got the following error: > > > == >[javac] > /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/a > nalysis/opennlp/FilterPayloadsFilter.java:43: error > r: cannot find symbol >[javac] super(Version.LUCENE_44, input); >[javac] ^ >[javac] symbol: variable LUCENE_44 >[javac] location: class Version >[javac] 1 error > == > > Compiled it on trunk without problem. > > Is this patch supposed to work for 4.X? > > Regards, > Patrick >
OPENNLP problems
Hi there, Checked out branch_4x and applied the latest patch LUCENE-2899-current.patch however I ran into 2 problems Followed the wiki page instruction and set up a field with this type aiming to keep nouns and verbs and do a facet on the field == == Struggled to get that going until I put the extra parameter keepPayloads="true" in as below. Question: am I doing the right thing? Is this a mistake on wiki Second problem: Posted the document xml one by one to the solr and the result was what I expected. 1 check in the hotel However if I put multiple documents into the same xml file and post it in one go only the first document gets processed( only 'check' and 'hotel' were showing in the facet result.) 1 check in the hotel 2 removes the payloads 3 retains only nouns and verbs Same problem when updated the data using csv upload. Is that a bug or something I did wrong? Thanks in advance! Regards, Patrick
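For reference, the field type being described, reconstructed roughly from the attributes quoted later in this thread and the OpenNLP wiki example, would look something like the following (the exact factory class names and the sentenceModel attribute are assumptions and may differ in the patch you applied):

  <fieldType name="text_opennlp_nvf" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.OpenNLPTokenizerFactory"
                 sentenceModel="opennlp/en-sent.bin"
                 tokenizerModel="opennlp/en-token.bin"/>
      <filter class="solr.OpenNLPFilterFactory"
              posTaggerModel="opennlp/en-pos-maxent.bin"/>
      <!-- keep only tokens tagged as nouns and verbs -->
      <filter class="solr.FilterPayloadsFilterFactory"
              keepPayloads="true"
              payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
      <filter class="solr.StripPayloadsFilterFactory"/>
    </analyzer>
  </fieldType>

The keepPayloads="true" attribute is the fix described above: without it the filter removes the listed part-of-speech payloads instead of keeping them.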
RE: OPENNLP problems
Hi Lance, I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch uploaded on 6th June but still had the same problem. Regards, Patrick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 6 June 2013 5:16 p.m. To: solr-user@lucene.apache.org Subject: Re: OPENNLP problems Patrick- I found the problem with multiple documents. The problem was that the API for the life cycle of a Tokenizer changed, and I only noticed part of the change. You can now upload multiple documents in one post, and the OpenNLPTokenizer will process each document. You're right, the example on the wiki is wrong. The FilterPayloadsFilter default is to remove the given payloads, and needs keepPayloads="true" to retain them. The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it. Lance https://issues.apache.org/jira/browse/LUCENE-2899 On 05/28/2013 10:08 PM, Patrick Mi wrote: > Hi there, > > Checked out branch_4x and applied the latest patch > LUCENE-2899-current.patch however I ran into 2 problems > > Followed the wiki page instruction and set up a field with this type aiming > to keep nouns and verbs and do a facet on the field > == > positionIncrementGap="100"> > > tokenizerModel="opennlp/en-token.bin"/> > posTaggerModel="opennlp/en-pos-maxent.bin"/> > payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/> > > > > == > > Struggled to get that going until I put the extra parameter > keepPayloads="true" in as below. >payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/> > > Question: am I doing the right thing? Is this a mistake on wiki > > Second problem: > > Posted the document xml one by one to the solr and the result was what I > expected. > > > >1 >check in the hotel > > > However if I put multiple documents into the same xml file and post it in > one go only the first document gets processed( only 'check' and 'hotel' were > showing in the facet result.) > > > >1 >check in the hotel > > >2 >removes the payloads > > >3 >retains only nouns and verbs > > > > Same problem when updated the data using csv upload. > > Is that a bug or something I did wrong? > > Thanks in advance! > > Regards, > Patrick > >
Stemming and other tokenizers
Hello, I want to implement some kind of auto-stemming that will detect the language of a field based on a tag at the start of that field, like #en#. My field is stored on disk, but I don't want this tag to be stored. Is there a way to avoid having the tag stored? As I understand it, all the filters and tokenizers interact only with the indexed value and not the stored one. Am I wrong? Is it possible to write such a filter? Patrick.
Re: Master Slave Question
Use near-real-time indexing (Solr 4), or decrease the replication poll interval and the auto-commit time. 2011/9/10 Jamie Johnson > Is it appropriate to query the master servers when replicating? I ask > because there could be a case where we index say 50 documents to the > master, they have not yet been replicated and a user asks for page 2, > when they ask for page 2 the request could be sent to a slave and get > 0. Is there a way to avoid this? My thought was to not allow > querying of the master but I'm not sure that this could be configured > in solr >
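In config terms, the poll interval mentioned above is set on the slave side of the replication handler in solrconfig.xml; a sketch (master URL and interval are illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
      <!-- poll the master for new index versions every 20 seconds -->
      <str name="pollInterval">00:00:20</str>
    </lst>
  </requestHandler>

A shorter poll interval narrows the window in which a slave can serve results that lag behind the master, at the cost of more frequent replication checks.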
Re: Stemming and other tokenizers
I can't create one field per language, that is the problem but I'll dig into it following your indications. I let you know what I could come out with. Patrick. 2011/9/11 Jan Høydahl > Hi, > > You'll not be able to detect language and change stemmer on the same field > in one go. You need to create one fieldType in your schema per language you > want to use, and then use LanguageIdentification (SOLR-1979) to do the magic > of detecting language and renaming the field. If you set > langid.override=false, languid.map=true and populate your "language" field > with the known language, you will probably get the desired effect. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > Solr Training - www.solrtraining.com > > On 10. sep. 2011, at 03:24, Patrick Sauts wrote: > > > Hello, > > > > > > > > I want to implement some king of AutoStemming that will detect the > language > > of a field based on a tag at the start of this field like #en# my field > is > > stored on disc but I don't want this tag to be stored. Is there a way to > > avoid this field to be stored ? > > > > To me all the filters and the tokenizers interact only with the indexed > > field and not the stored one. > > > > Am I wrong ? > > > > Is it possible to you to do such a filter. > > > > > > > > Patrick. > > > >
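Once the SOLR-1979 language identification is available in your Solr version, Jan's suggestion translates to an update processor chain in solrconfig.xml; a rough sketch with illustrative field names (the chain still has to be referenced from your update handler via update.chain):

  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <str name="langid.fl">text</str>
      <str name="langid.langField">language</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.override">false</bool>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

With langid.map=true the text field is mapped to text_en, text_fr and so on, and each of those field types can carry its own stemmer, which is the part that cannot be done inside a single analyzer chain.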
RE: Weird behaviors with not operators.
Maybe this will answer your question http://wiki.apache.org/solr/FAQ Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't ? Boolean queries must have at least one "positive" expression (ie; MUST or SHOULD) in order to match. Solr tries to help with this, and if asked to execute a BooleanQuery that does contains only negatived clauses _at the topmost level_, it adds a match all docs query (ie: *:*) If the top level BoolenQuery contains somewhere inside of it a nested BooleanQuery which contains only negated clauses, that nested query will not be modified, and it (by definition) an't match any documents -- if it is required, that means the outer query will not match. More Detail: * https://issues.apache.org/jira/browse/SOLR-80 * https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3Cal pine.deb.1.10.1006011609080.29...@radix.cryptio.net%3E Patrick. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, September 12, 2011 3:04 PM To: solr-user@lucene.apache.org Subject: Re: Weird behaviors with not operators. : I'm crashing into a weird behavior with - operators. I went ahead and added a FAQ on this using some text from a previous nearly identical email ... https://wiki.apache.org/solr/FAQ#Why_does_.27foo_AND_-baz.27_match_docs.2C_b ut_.27foo_AND_.28-bar.29.27_doesn.27t_.3F please reply if you have followup questions. -Hoss
RE: Weird behaviors with not operators.
I mean it's a known bug. Hostetter AND (-chris *:*) Should do the trick. Depending on your request. NAME:(-chris *:*) -Original Message- From: Patrick Sauts [mailto:patrick.via...@gmail.com] Sent: Monday, September 12, 2011 3:57 PM To: solr-user@lucene.apache.org Subject: RE: Weird behaviors with not operators. Maybe this will answer your question http://wiki.apache.org/solr/FAQ Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't ? Boolean queries must have at least one "positive" expression (ie; MUST or SHOULD) in order to match. Solr tries to help with this, and if asked to execute a BooleanQuery that does contains only negatived clauses _at the topmost level_, it adds a match all docs query (ie: *:*) If the top level BoolenQuery contains somewhere inside of it a nested BooleanQuery which contains only negated clauses, that nested query will not be modified, and it (by definition) an't match any documents -- if it is required, that means the outer query will not match. More Detail: * https://issues.apache.org/jira/browse/SOLR-80 * https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3Cal pine.deb.1.10.1006011609080.29...@radix.cryptio.net%3E Patrick. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, September 12, 2011 3:04 PM To: solr-user@lucene.apache.org Subject: Re: Weird behaviors with not operators. : I'm crashing into a weird behavior with - operators. I went ahead and added a FAQ on this using some text from a previous nearly identical email ... https://wiki.apache.org/solr/FAQ#Why_does_.27foo_AND_-baz.27_match_docs.2C_b ut_.27foo_AND_.28-bar.29.27_doesn.27t_.3F please reply if you have followup questions. -Hoss
facet.method=fc
Is the parameter facet.method=fc still needed ? Thank you. Patrick.
Solr-3.5.0/Nutch-1.4 - SolrDeleteDuplicates fails
Greetings! This may be a Nutch question and if so, I will repost to the Nutch list. I can run the following commands with Solr-3.5.0/Nutch-1.4: bin/nutch crawl urls -dir crawl -depth 3 -topN 5 then: bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/* successfully. But, if I run: bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5 It fails with the following messages: SolrIndexer: starting at 2011-12-11 14:01:27 Adding 11 documents SolrIndexer: finished at 2011-12-11 14:01:28, elapsed: 00:00:01 SolrDeleteDuplicates: starting at 2011-12-11 14:01:28 SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/ Exception in thread "main" java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373) at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353) at org.apache.nutch.crawl.Crawl.run(Crawl.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.nutch.crawl.Crawl.main(Crawl.java:55) I am running on Ubuntu 10.10 with 12 GB of memory, Java version 1.6.0_26. I can delete the crawl directory and replicate this error consistently. Suggestions? Other than "...use the way that doesn't fail." ;-) I am concerned that a different invocation of Solr failing consistently represents something that may cause trouble elsewhere when least expected. (And hard to isolate as the problem.) Thanks! Hope everyone is having a great weekend! Patrick PS: From the hadoop log (when it fails) if that's helpful: 2011-12-11 15:21:51,436 INFO solr.SolrWriter - Adding 11 documents 2011-12-11 15:21:52,250 INFO solr.SolrIndexer - SolrIndexer: finished at 2011-12-11 15:21:52, elapsed: 00:00:01 2011-12-11 15:21:52,251 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-12-11 15:21:52 2011-12-11 15:21:52,251 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/ 2011-12-11 15:21:52,330 WARN mapred.LocalJobRunner - job_local_0020 java.lang.NullPointerException at org.apache.hadoop.io.Text.encode(Text.java:388) at org.apache.hadoop.io.Text.set(Text.java:178) at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270) at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) -- Patrick Durusau patr...@durusau.net Chair, V1 - US TAG to JTC 1/SC 34 Convener, JTC 1/SC 34/WG 3 (Topic Maps) Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300 Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps) OASIS Technical Advisory Board (TAB) - member Another Word For It (blog): http://tm.durusau.net Homepage: http://www.durusau.net Twitter: patrickDurusau
Re: How to get SolrServer within my own servlet
Have a look here first; you'll probably be using EmbeddedSolrServer. http://wiki.apache.org/solr/Solrj Patrick On 13 Dec 2011, at 20:38, Joey wrote: > Anybody could help? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to get SolrServer within my own servlet
Hey Joey, You should first configure your deployed Solr instance by adding/changing the schema.xml and solrconfig.xml. After that you can use SolrJ to connect to that Solr instance and add documents to it. At the link I posted earlier, you'll find a couple of examples of how to do that. - Patrick Sent from my iPhone On 13 Dec 2011, at 20:53, Joey wrote: > Thanks Patrick for the reply. > > What I did was un-jar solr.war and created my own web application. Now I > want to write my own servlet to index all files inside a folder. > > I suppose there is already solrserver instance initialized when my web app > started. > > How can I access that solr server instance in my servlet? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583416.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: blocking access by user-agent
Hi Roland, you can configure Jetty to use a simple .htaccess file to allow only specific IP addresses access to your webapp. Have a look here on how to do that: http://www.viaboxxsystems.de/how-to-configure-your-jetty-webapp-to-grant-access-for-dedicated-ip-addresses-only If you want more sophisticated access control, you need it to be included in an extra layer between Solr and the devices accessing your Solr instance. - Patrick 2011/12/21 RT > Hi, > > I would like to control what applications get access to the solr database. > I am using jetty as the appcontainer. > > Is this at all achievable? If yes, how? > > Internet search has not yielded anything I could use so far. > > Thanks in advance. > > Roland > -- Patrick Plaatje Senior Consultant <http://www.nmobile.nl/>
Re: Searching partial phone numbers
Hi Marotosg, you can index the phone number field with an n-gram based field type, which allows partial (substring) matches on this field without needing wildcards. Have a look here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory Cheers, Patrick 2012/1/19 marotosg > Hi. > I have phone numbers in my solr schema in a field. At the moment i have > this > field as string. > I would like to be able to make searches that find parts of a phone > number. > > For instance: > Number +35384589458 > > search by *+35384* or search by *84589*. > > Do you know if this is posible? > > Thanks a lot > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Searching-partial-phone-numbers-tp3671908p3671908.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Patrick Plaatje Senior Consultant <http://www.nmobile.nl/>
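A sketch of such a field type (field name and gram sizes are illustrative; the KeywordTokenizer keeps the '+' and the whole number together, and grams are only produced at index time):

  <fieldType name="phone_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- index every 3..15 character substring of the number -->
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>

With the number indexed in a field of this type, a plain query for 84589 or +35384 matches without any wildcards, as long as the query string length falls between minGramSize and maxGramSize.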
Re: How to accelerate your Solr-Lucene appication by 4x
Partially agree. If just the facts are given, and not a complete sales talk instead, it'll be fine. Don't overdo it like this though. Cheers, Patrick 2012/1/19 Darren Govoni > I think the occassional "Hey, we made something cool you might be > interested in!" notice, even if commercial, is ok > because it addresses numerous issues we struggle with on this list. > > Now, if it were something completely off-base or unrelated (e.g. male > enhancement pills), then yeah, I agree. > > On 01/18/2012 11:08 PM, Steven A Rowe wrote: > >> Hi Darren, >> >> I think it's rare because it's rare: if this were found to be a useful >> advertising space, rare would cease to be descriptive of it. But I could >> be wrong. >> >> Steve >> >> -Original Message- >>> From: Darren Govoni [mailto:dar...@ontrenet.com] >>> Sent: Wednesday, January 18, 2012 8:40 PM >>> To: solr-user@lucene.apache.org >>> Subject: Re: How to accelerate your Solr-Lucene appication by 4x >>> >>> And to be honest, many people on this list are professionals who not >>> only build their own solutions, but also buy tools and tech. >>> >>> I don't see what the big deal is if some clever company has something of >>> imminent value here to share it. Considering that its a rare event. >>> >>> On 01/18/2012 08:28 PM, Jason Rutherglen wrote: >>> >>>> Steven, >>>> >>>> If you are going to admonish people for advertising, it should be >>>> equally dished out or not at all. >>>> >>>> On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe wrote: >>>> >>>>> Hi Peter, >>>>> >>>>> Commercial solicitations are taboo here, except in the context of a >>>>> >>>> request for help that is directly relevant to a product or service. >>> >>>> Please don’t do this again. >>>>> >>>>> Steve Rowe >>>>> >>>>> From: Peter Velikin [mailto:pe...@velobit.com] >>>>> Sent: Wednesday, January 18, 2012 6:33 PM >>>>> To: solr-user@lucene.apache.org >>>>> Subject: How to accelerate your Solr-Lucene appication by 4x >>>>> >>>>> Hello Solr users, >>>>> >>>>> Did you know that you can boost the performance of your Solr >>>>> >>>> application using your existing servers? All you need is commodity SSD >>> and >>> plug-and-play software like VeloBit. >>> >>>> At ZoomInfo, a leading business information provider, VeloBit increased >>>>> >>>> the performance of the Solr-Lucene-powered application by 4x. >>> >>>> I would love to tell you more about VeloBit and find out if we can >>>>> >>>> deliver same business benefits at your company. Click >>> here<http://www.velobit.com/**15-minute-brief<http://www.velobit.com/15-minute-brief>> >>> for a 15-minute >>> briefing<http://www.velobit.**com/15-minute-brief<http://www.velobit.com/15-minute-brief>> >>> on the VeloBit >>> technology. 
>>> >>>> Here is more information on how VeloBit helped ZoomInfo: >>>>> >>>>> * Increased Solr-Lucene performance by 4x using existing servers >>>>> >>>> and commodity SSD >>> >>>> * Installed VeloBit plug-and-play SSD caching software in 5-minutes >>>>> >>>> transparent to running applications and storage infrastructure >>> >>>> * Reduced by 75% the hardware and monthly operating costs required >>>>> >>>> to support service level agreements >>> >>>> Technical Details: >>>>> >>>>> * Environment: Solr‐Lucene indexed directory search service fronted >>>>> >>>> by J2EE web application technology >>> >>>> * Index size: 600 GB >>>>> * Number of items indexed: 50 million >>>>> * Primary storage: 6 x SAS HDD >>>>> * SSD Cache: VeloBit software + OCZ Vertex 3 >>>>> >>>>> Click >>>>> here<http://www.velobit.com/**use-cases/enterprise-search/<http://www.velobit.com/use-cases/enterprise-search/>> >>>>> to >>>>> >>>> read more about the ZoomInfo Solr-Lucene case >>> study<http://www.velobit.com/**us
Re: Problem instantiating CommonsHttpSolrServer using solrj
I went through jar hell yesterday. I finally got Solrj working. http://jarfinder.com was a big help. Rock on, PLA Patrick L Archibald http://patrickarchibald.com On Fri, Aug 13, 2010 at 7:25 PM, Chris Hostetter wrote: > > : I get the following runtime error: > : > : Exception in thread "main" java.lang.NoClassDefFoundError: > : org/apache/solr/client/solrj/SolrServerException > : Caused by: java.lang.ClassNotFoundException: > : org.apache.solr.client.solrj.SolrServerException > ... > : I am following the this link : http://wiki.apache.org/solr/Solrj ,and > : have included all the jar files specified there, in the classpath. > > Are you certain? > > the class it can't find is > org.apache.solr.client.solrj.SolrServerException which is definitely in > the apache-solr-solrj-*.jar > > did you perchance copy the list of jars verbatim from that wiki? because > someone seems to have made a typo and called it "solr-solrj-1.4.0.jar" > instead of "apache-solr-solrj-1.4.0.jar" but if you actually *look* at the > jars available, it's pretty obvious. > > > -Hoss > >
Limitations of prohibited clauses in sub-expression - pure negative query
I can't find the answer: is this problem solved in Solr 1.4.1? Thx for your answers.
RE: Limitations of prohibited clauses in sub-expression - pure negative query
Maybe SOLR-80 jira issue ? As written in Solr 1.4 book; "pure negative query doesn't work correctly ." you have to add 'AND *:* ' thx From: Patrick Sauts [mailto:patrick.via...@gmail.com] Sent: mardi 28 septembre 2010 11:53 To: 'solr-user@lucene.apache.org' Subject: Limitations of prohibited clausses in sub-expression - pure negative query I can find the answer but is this problem solved in Solr 1.4.1 ? Thx for your answers.
DataDirectory: relative path doesn't work
I am running Solr 4.0/Tomcat 7 on CentOS 6. According to this page http://wiki.apache.org/solr/SolrConfigXml if dataDir is not absolute, then it is relative to the instanceDir of the SolrCore. However, the index directory is always created under the directory where I start Tomcat (startup.sh) rather than under the instanceDir of the SolrCore. Am I doing something wrong in the configuration? Regards, Patrick
RE: DataDirectory: relative path doesn't work
Thanks for fixing the wiki page http://wiki.apache.org/solr/SolrConfigXml now it says this: 'If this directory is not absolute, then it is relative to the directory you're in when you start SOLR.' It will be nice if you drop me a line here after you make the change on the document ... -Original Message- From: Patrick Mi [mailto:patrick...@touchpointgroup.com] Sent: Tuesday, 26 February 2013 5:49 p.m. To: solr-user@lucene.apache.org Subject: DataDirectory: relative path doesn't work I am running Solr4.0/Tomcat 7 on Centos6 According to this page http://wiki.apache.org/solr/SolrConfigXml if is not absolute, then it is relative to the instanceDir of the SolrCore. However the index directory is always created under the directory where I start the Tomcat (startup.sh) rather than under instanceDir of the SolrCore. Am I doing something wrong in configuration? Regards, Patrick
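Given that relative paths resolve against the startup directory, the usual workaround is an absolute or property-based dataDir in solrconfig.xml; for example (the path and property name are illustrative):

  <dataDir>${solr.data.dir:/var/solr/data/core0}</dataDir>

The ${property:default} form lets you override the location per deployment with a system property while still having a sane absolute default.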
SolrCloud with Zookeeper ensemble : fail to restart master server
Hi there, I have experienced some problems starting the master server. Solr 4.2 under Tomcat 7 on CentOS 6. Configuration: 3 Solr instances running on different machines, one shard, 3 cores, 2 replicas, using the Zookeeper that comes with Solr. The master server A has the following run options: -Dbootstrap_conf=true -DzkRun -DnumShards=1. The slave servers B and C have: -DzkHost=masterServerIP:2181. It works well for add/update/delete etc. after I start up the master and slave servers in order. When master A is up, stopping/starting slaves B and C is OK. When slaves B and C are running I couldn't restart master A. Only after I shut down B and C can I start master A. Is this a feature, a bug, or something I haven't configured properly? Thanks in advance for your help. Regards, Patrick
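For clarity, the startup options described above, e.g. passed to Tomcat via CATALINA_OPTS (how exactly the flags are set is an assumption; the flags themselves are from the post):

  # master A: runs embedded ZooKeeper, uploads the config, one shard
  CATALINA_OPTS="$CATALINA_OPTS -Dbootstrap_conf=true -DzkRun -DnumShards=1"

  # slaves B and C: point at the ZooKeeper embedded in A
  CATALINA_OPTS="$CATALINA_OPTS -DzkHost=masterServerIP:2181"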
Re: is there any practice to load index into RAM to accelerate solr performance?
A start may be to use a RAM disk for that. Mount it as a normal disk and have the index files stored there. Have a read here: http://en.wikipedia.org/wiki/RAM_disk Cheers, Patrick 2012/2/8 Ted Dunning > This is true with Lucene as it stands. It would be much faster if there > were a specialized in-memory index such as is typically used with high > performance search engines. > > On Tue, Feb 7, 2012 at 9:50 PM, Lance Norskog wrote: > > > Experience has shown that it is much faster to run Solr with a small > > amount of memory and let the rest of the ram be used by the operating > > system "disk cache". That is, the OS is very good at keeping the right > > disk blocks in memory, much better than Solr. > > > > How much RAM is in the server and how much RAM does the JVM get? How > > big are the documents, and how large is the term index for your > > searches? How many documents do you get with each search? And, do you > > use filter queries- these are very powerful at limiting searches. > > > > 2012/2/7 James : > > > Is there any practice to load index into RAM to accelerate solr > > performance? > > > The over all documents is about 100 million. The search time around > > 100ms. I am seeking some method to accelerate the respond time for solr. > > > Just check that there is some practice use SSD disk. And SSD is also > > cost much, just want to know is there some method like to load the index > > file in RAM and keep the RAM index and disk index synchronized. Then I can > > search on the RAM index. > > > > > > > > -- > > Lance Norskog > > goks...@gmail.com > > > -- Patrick Plaatje Senior Consultant <http://www.nmobile.nl/>
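A minimal sketch of the RAM-disk idea on Linux (mount point, size and index location are assumptions; tmpfs contents are lost on reboot, so a copy on real disk is still needed):

  mkdir -p /mnt/ramdisk
  mount -t tmpfs -o size=16g tmpfs /mnt/ramdisk
  cp -r /var/solr/data/index /mnt/ramdisk/index
  # then point the core's dataDir (or a symlink) at /mnt/ramdisk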
Show SQL-DIH datasource name in result list
Hey, does somebody know if there is a command option in Solr to show which datasource provided the result? Or in other words: is it possible to output in the result the name given in the <dataSource> or <entity> tag? Let me explain: I'm using the SQL-DIH with a lot of datasources and several entities. Every datasource has a name, and every entity, too. Now in the result list I would need to know which table/datasource a result came from. Thanks, Patrick
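For context, a stripped-down data-config.xml of the kind of setup being described (names, URLs and SQL are made up); the question is how to get a name like "ds1" or "products" back into each search result:

  <dataConfig>
    <dataSource name="ds1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host/db1" user="solr" password="secret"/>
    <dataSource name="ds2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host/db2" user="solr" password="secret"/>
    <document>
      <entity name="products" dataSource="ds1" query="select id, title from products"/>
      <entity name="articles" dataSource="ds2" query="select id, title from articles"/>
    </document>
  </dataConfig>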
Solr 1.4.1 and carrot2 clustering
Dear all, I really enjoy using Solr so far. During the last days I tried to activate the ClusteringComponent in Solr as indicated here http://wiki.apache.org/solr/ClusteringComponent and copied all the relevant java libraries in the WEB-INF/lib folder of my tomcat installation of Solr. But everytime I try to issue a request to my Solr server using http://localhost:9005/apache-solr-1.4.1/job0/select?q=*:*&fl=title,score,url&start=0&rows=100&indent=on&clustering=true I get the following error message: java.lang.NoClassDefFoundError: bak/pcj/set/IntSet at org.carrot2.text.preprocessing.PreprocessingPipeline.(PreprocessingPipeline.java:47) at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.(LingoClusteringAlgorithm.java:108) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at java.lang.Class.newInstance0(Class.java:355) at java.lang.Class.newInstance(Class.java:308) at org.carrot2.util.pool.SoftUnboundedPool.borrowObject(SoftUnboundedPool.java:114) at org.carrot2.core.CachingController.borrowProcessingComponent(CachingController.java:329) Hence I have downloaded the corresponding pcj-1.2.jar providing the interface "bak.pcj.set.IntSet" and I have also put it in the WEB-INF/lib folder But I still keep getting this error message though the corresponding interface MUST be on the classpath now. Can anyone help me out with this one? I'm really eager to give this clustering extension a try from within Solr using the 1.4.1 version that I have already running on my server. Thanks for a brief feedback. Best regards, Patrick
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
[X] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project)
solr 1.4.1 -> 3.6.1; SOLR-758
Regarding https://issues.apache.org/jira/browse/SOLR-758 (Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.) I'm updating from Solr 1.4.1 to 3.6.1 (I'm aware that it is not beautiful). After applying the attached patches to 3.6.1 I'm experiencing this problem: - SEVERE: org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.solr.search.AdvancedQParserPlugin is not a org.apache.solr.search.QParserPlugin at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:421) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:441) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1612) [...] These patches seem to be no longer valid. Which leads me to the more experienced users here: - Although not directly mentioned in https://issues.apache.org/jira/browse/SOLR-758, is there any other (new) QParser which obsoletes the DisMax? - Furthermore I tried to make the patches apply ("forward porting"), but always get the error "Error Instantiating QParserPlugin, org.apache.solr.search.AdvancedQParserPlugin is not a org.apache.solr.search.QParserPlugin", although the class dependency is linear: ./core/src/java/org/apache/solr/search/AdvancedQParserPlugin.java: [...] public class AdvancedQParserPlugin extends DisMaxQParserPlugin { [...] ./core/src/java/org/apache/solr/search/DisMaxQParserPlugin.java: [...] public class DisMaxQParserPlugin extends QParserPlugin { [...] Thanks, Patrick
Re: Solr3.6 DeleteByQuery not working with negated query
Hi Markus, Why do you think it's not deleting anything? Thanks, Patrick On 22 Oct 2012 08:36, "Markus.Mirsberger" wrote: > Hi, > > I am trying to delete some documents in my index by query. > When I just select them with this negated query, I get all the documents I > want to delete, but when I use this query in the DeleteByQuery it is not > working. > I'm trying to delete all elements whose value ends with 'somename/'. > When I use this for selection it works and I get exactly the right > documents (about 10.000, so too many to delete one by one :) ) > > curl http://:8080/solr/core/update/?commit=true -H > "Content-Type: text/xml" --data-binary '<delete><query>-field:*somename/</query></delete>'; > > And here the response: > > <int name="status">0</int><int name="QTime">11091</int> > > I tried to perform it in the browser too by using /update?stream.body ... > but the result is the same. > And no error in the Solr log. > > I hope someone can help me ... I don't want to do this manually :) > > Regards, > Markus >
Re: Solr3.6 DeleteByQuery not working with negated query
Did you make sure to commit after the delete? Patrick On 22 Oct 2012 08:43, "Markus.Mirsberger" wrote: > Hi, Patrick, > > Because I have the same amount of documents in my index as before I > performed the query. > And when I use the negated query just to select the documents I can see > they are still there (and of course all other documents too :) ) > > Regards, > Markus > > > > > On 22.10.2012 14:38, Patrick Plaatje wrote: > >> Hi Markus, >> >> Why do you think it's not deleting anything? >> >> Thanks, >> Patrick >> On 22 Oct 2012 08:36, "Markus.Mirsberger" < >> markus.mirsber...@gmx.de> >> wrote: >> >> Hi, >>> >>> I am trying to delete some documents in my index by query. >>> When I just select them with this negated query, I get all the documents >>> I >>> want to delete, but when I use this query in the DeleteByQuery it is not >>> working. >>> I'm trying to delete all elements whose value ends with 'somename/'. >>> When I use this for selection it works and I get exactly the right >>> documents (about 10.000, so too many to delete one by one :) ) >>> >>> curl http://:8080/solr/core/update/?commit=true -H >>> "Content-Type: text/xml" --data-binary '<delete><query>-field:*somename/</query></delete>'; >>> >>> And here the response: >>> >>> <int name="status">0</int><int name="QTime">11091</int> >>> >>> I tried to perform it in the browser too by using /update?stream.body >>> ... >>> but the result is the same. >>> And no error in the Solr log. >>> >>> I hope someone can help me ... I don't want to do this manually :) >>> >>> Regards, >>> Markus >>> >>> >
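For reference, the delete being discussed with the commit sent as a separate, explicit call (host, core and field are taken from the thread and partly placeholders):

  curl 'http://localhost:8080/solr/core/update' -H 'Content-Type: text/xml' \
       --data-binary '<delete><query>-field:*somename/</query></delete>'
  curl 'http://localhost:8080/solr/core/update' -H 'Content-Type: text/xml' \
       --data-binary '<commit/>'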
Re: Problems with WordDelimiterFilterFactory
Hi Bern, the problem is the character sequence "--". A query must not contain consecutive minus characters. Remove one minus character and the query will be parsed without problems. Because of this parsing problem, I'd recommend cleaning up the query before submitting it to the Solr server, replacing each sequence of minus characters with a single one (see the small cleanup sketch after this message). Regards, Patrick Bernadette Houghton wrote: > Sorry, the last line was truncated - > > HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7. Was expecting one of: "(" ... "*" ... ... ... > ... ... "[" ... "{" ... ... > > -Original Message- > From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] > Sent: Friday, 9 October 2009 8:22 AM > To: 'solr-user@lucene.apache.org' > Subject: RE: Problems with WordDelimiterFilterFactory > > Here's the query and the error - > > Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization > AND status_i:(2)) > Oct 09 08:20:17 [debug] [196] Solr sort by: score desc > Oct 09 08:20:17 [error] Error on searching: "400" Status: > org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- > Civilization AND status_i:(2)) ': Encount > > Bern > > -Original Message- > From: Christian Zambrano [mailto:czamb...@gmail.com] > Sent: Thursday, 8 October 2009 12:48 PM > To: solr-user@lucene.apache.org > Cc: solr-user@lucene.apache.org > Subject: Re: Problems with WordDelimiterFilterFactory > > Bern, > > I am interested on the solr query. In other words, the query that your > system sends to solr. > > Thanks, > > > Christian > > On Oct 7, 2009, at 5:56 PM, Bernadette Houghton > > wrote: > >> Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 >> >> Either scroll down and click one of the "television broadcasting -- >> asia" links, or type it in the Quick Search box. >> >> >> TIA >> >> bern >> >> -Original Message- >> From: Christian Zambrano [mailto:czamb...@gmail.com] >> Sent: Thursday, 8 October 2009 9:43 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Problems with WordDelimiterFilterFactory >> >> Could you please provide the exact URL of a query where you are >> experiencing this problem? >> eg(Not URL encoded): q=fieldName:"hot and cold: temperatures" >> >> On 10/07/2009 05:32 PM, Bernadette Houghton wrote: >>> We are having some issues with our solr parent application not >>> retrieving records as expected. >>> >>> For example, if the input query includes a colon (e.g. hot and >>> cold: temperatures), the relevant record (which contains a colon in >>> the same place) does not get retrieved; if the input query does not >>> include the colon, all is fine. Ditto if the user searches for a >>> query containing hyphens, e.g. "asia - civilization, although with >>> the qualifier that something like "asia-civilization" (no spaces >>> either side of the hyphen) works fine, whereas "asia - >>> civilization" (spaces either side of hyphen) doesn't work. 
>>> >>> Our schema.xml contains the following - >>> >>> >> positionIncrementGap="100"> >>> >>> >>> >>> >> class="solr.ISOLatin1AccentFilterFactory"/> >>> >> words="stopwords.txt"/> >>> >> generateWordParts="1" generateNumberParts="1" catenateWords="1" >>> catenateNumbers="1" catenateAll="0"/> >>> >>> >> protected="protwords.txt"/> >>> >>> >>> >>> >>> >> class="solr.ISOLatin1AccentFilterFactory"/> >>> >> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> >>> >> words="stopwords.txt"/> >>> >> generateWordParts="1" generateNumberParts="1" catenateWords="0" >>> catenateNumbers="0" catenateAll="0"/> >>> >>> >> protec
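A minimal sketch of the query cleanup Patrick suggests above, collapsing runs of minus characters before the query string is sent to Solr (plain Java; the method name is made up):

  // "Asia -- Civilization" trips up the query parser, "Asia - Civilization" does not
  public static String cleanQuery(String q) {
      if (q == null) return null;
      return q.replaceAll("-{2,}", "-");
  }

  // cleanQuery("(Asia -- Civilization AND status_i:(2))")
  //   -> "(Asia - Civilization AND status_i:(2))"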
multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Hi list, I worked on a field type and its analyzing chain, at which I want to use the SynonymFilter with entries similar to: foo bar=>foo_bar During the analysis phase, I used the /admin/analysis.jsp view to test the analyzing results produced by the created field type. The output shows that a query "foo bar" will first be separated by the WhitespaceTokenizer to the two tokens "foo" and "bar", and that the SynonymFilter will replace the both tokens with "foo_bar". But as I tried this at "real" query time with the request handler "standard" and also with "dismax", the tokens "foo" and "bar" were not replaced. The parsedQueryString was something similar to "field:foo field:bar". At index time, it works like expected. Has anybody experienced this and/or knows a workaround, a solution for it? Thanks, Patrick
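For context, a trimmed field type of the kind described, with the SynonymFilter sitting behind the whitespace tokenizer (the class names are the stock Solr ones; the real chain may differ):

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="false"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  # synonyms.txt
  foo bar=>foo_bar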
Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Hi Koji, using phrase queries is no alternative for us, because all query parts has to be optional parts. The phrase query workaround will work for a query "foo bar", but only for this exact query. If the user queries for "foo bar baz", it will be changed to "foo_bar baz", but it will not match the indexed documents that only contains "foo_bar". And this is, what we need here. The cause of my problem should be the query parsing, but I don't know, if there is any solution for it. I need a possibility that works like the analysis/query parsing within /admin/analysis.jsp view. Patrick Koji Sekiguchi schrieb: > Patrick, > >> parsedQueryString was something similar to "field:foo field:bar". At >> index time, it works like expected. > > I guess because you are searching q=foo bar, this causes OR query. > Use q="foo bar", instead. > > Koji > > > Patrick Jungermann wrote: >> Hi list, >> >> I worked on a field type and its analyzing chain, at which I want to use >> the SynonymFilter with entries similar to: >> >> foo bar=>foo_bar >> >> During the analysis phase, I used the /admin/analysis.jsp view to test >> the analyzing results produced by the created field type. The output >> shows that a query "foo bar" will first be separated by the >> WhitespaceTokenizer to the two tokens "foo" and "bar", and that the >> SynonymFilter will replace the both tokens with "foo_bar". But as I >> tried this at "real" query time with the request handler "standard" and >> also with "dismax", the tokens "foo" and "bar" were not replaced. The >> parsedQueryString was something similar to "field:foo field:bar". At >> index time, it works like expected. >> >> Has anybody experienced this and/or knows a workaround, a solution for >> it? >> >> >> Thanks, Patrick >> >> >> >> >> >> >> >
Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Hi Chantal, yes, I'm using the SynonymFilter at index and query chain. Using it only at query time or only at index time was part of former considerations, but both don't fit all of our requirements. But as I wrote in my first mail, it works only within the /admin/analysis.jsp view and not at "real" query time. Patrick Chantal Ackermann schrieb: > Hi Patrick, > > have you added that SynonymFilter to the index chain and the query > chain? You have to add it to both if you want to have it replaced at > index and query time. It might also be enough to add it to the query > chain only. Than your index still preserves the original data. > > Cheers, > Chantal > > Patrick Jungermann schrieb: >> Hi Koji, >> >> using phrase queries is no alternative for us, because all query parts >> has to be optional parts. The phrase query workaround will work for a >> query "foo bar", but only for this exact query. If the user queries for >> "foo bar baz", it will be changed to "foo_bar baz", but it will not >> match the indexed documents that only contains "foo_bar". And this is, >> what we need here. >> >> The cause of my problem should be the query parsing, but I don't know, >> if there is any solution for it. I need a possibility that works like >> the analysis/query parsing within /admin/analysis.jsp view. >> >> >> Patrick >> >> >> >> Koji Sekiguchi schrieb: >>> Patrick, >>> >>>> parsedQueryString was something similar to "field:foo field:bar". At >>>> index time, it works like expected. >>> I guess because you are searching q=foo bar, this causes OR query. >>> Use q="foo bar", instead. >>> >>> Koji >>> >>> >>> Patrick Jungermann wrote: >>>> Hi list, >>>> >>>> I worked on a field type and its analyzing chain, at which I want to >>>> use >>>> the SynonymFilter with entries similar to: >>>> >>>> foo bar=>foo_bar >>>> >>>> During the analysis phase, I used the /admin/analysis.jsp view to test >>>> the analyzing results produced by the created field type. The output >>>> shows that a query "foo bar" will first be separated by the >>>> WhitespaceTokenizer to the two tokens "foo" and "bar", and that the >>>> SynonymFilter will replace the both tokens with "foo_bar". But as I >>>> tried this at "real" query time with the request handler "standard" and >>>> also with "dismax", the tokens "foo" and "bar" were not replaced. The >>>> parsedQueryString was something similar to "field:foo field:bar". At >>>> index time, it works like expected. >>>> >>>> Has anybody experienced this and/or knows a workaround, a solution for >>>> it? >>>> >>>> >>>> Thanks, Patrick >>>> >>>> >>>> >>>> >>>> >>>> >>>> >
query highlighting
Hi list, is there any possibility to get highlighting also for the query string? Example: Query: fooo bar Tokens after query analysis: foo[0,4], bar[5,8] Token "foo" matches a token of one of the queried fields. -> Query highlighting: "fooo" Thanks, Patrick
Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Hi Koji, the problem is, that this doesn't fit all of our requirements. We have some Solr documents that must not be matched by "foo" or "bar" but by "foo bar" as part of the query. Also, we have some other documents that could be matched by "foo" and "foo bar" or "bar" and "foo bar". The best way to handle this, seems to be by using synonyms that allows the precise configuration of this and that could be managed by an editorial staff. Besides, foo bar=>foo_bar works at anything (index time, analysis.jsp) but query time. Patrick Koji Sekiguchi schrieb: > Hi Patrick, > > Why don't you define: > > foo bar, foo_bar (and expand="true") > > instead of: > > foo bar=>foo_bar > > in only indexing side? Doesn't it make a change for the better? > > Koji > > > Patrick Jungermann wrote: >> Hi Koji, >> >> using phrase queries is no alternative for us, because all query parts >> has to be optional parts. The phrase query workaround will work for a >> query "foo bar", but only for this exact query. If the user queries for >> "foo bar baz", it will be changed to "foo_bar baz", but it will not >> match the indexed documents that only contains "foo_bar". And this is, >> what we need here. >> >> The cause of my problem should be the query parsing, but I don't know, >> if there is any solution for it. I need a possibility that works like >> the analysis/query parsing within /admin/analysis.jsp view. >> >> >> Patrick >> >> >> >> Koji Sekiguchi schrieb: >> >>> Patrick, >>> >>> >>>> parsedQueryString was something similar to "field:foo field:bar". At >>>> index time, it works like expected. >>>> >>> I guess because you are searching q=foo bar, this causes OR query. >>> Use q="foo bar", instead. >>> >>> Koji >>> >>> >>> Patrick Jungermann wrote: >>> >>>> Hi list, >>>> >>>> I worked on a field type and its analyzing chain, at which I want to >>>> use >>>> the SynonymFilter with entries similar to: >>>> >>>> foo bar=>foo_bar >>>> >>>> During the analysis phase, I used the /admin/analysis.jsp view to test >>>> the analyzing results produced by the created field type. The output >>>> shows that a query "foo bar" will first be separated by the >>>> WhitespaceTokenizer to the two tokens "foo" and "bar", and that the >>>> SynonymFilter will replace the both tokens with "foo_bar". But as I >>>> tried this at "real" query time with the request handler "standard" and >>>> also with "dismax", the tokens "foo" and "bar" were not replaced. The >>>> parsedQueryString was something similar to "field:foo field:bar". At >>>> index time, it works like expected. >>>> >>>> Has anybody experienced this and/or knows a workaround, a solution for >>>> it? >>>> >>>> >>>> Thanks, Patrick >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> >> >> >
Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Thanks Hoss, after your hints, which partially confirmed my considerations, I made some tests with the FieldQParser. At the beginning I had some problems, but finally I was able to solve the problem of multi-word synonyms at query time in a way that is suitable for us - and possibly for others, too. For my solution, I re-used the FieldQParserPlugin. At first I ported it to the new API (incrementToken instead of next, etc.) and then I modified the code so that no PhraseQueries are created, but only BooleanQueries. Now, with my new QParserPlugin based on the FieldQParserPlugin, it's possible to search for things like "foo bar baz", where "foo bar" has to be changed to "foo_bar" and where at the end the tokens "foo_bar" and "baz" will be created, so that both can match independently. Patrick Chris Hostetter wrote: > : The cause of my problem should be the query parsing, but I don't know, > : if there is any solution for it. I need a possibility that works like > : the analysis/query parsing within /admin/analysis.jsp view. > > The behavior you are describing is very well documented on the wiki... > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory > > in general, QueryParsers parse input strings according to their > parsing rules, then send each component of the input string to the > analyzer. this is a fundamental behavior, w/o it the query parser would > have no way of knowing when to make a phrase query, or a term query, or > which field to use. > > You may find something like the FieldQParserPlugin helpful as it has *no* > markup of its own, it just hands the string off to an analyzer based on > the specified field ... but it will still generate a phrase query when a > single piece of input generates multiple tokens with non-zero offsets from > each other, which also confuses people sometimes (not sure if that's what > you'd want) > > : >> SynonymFilter will replace the both tokens with "foo_bar". But as I > : >> tried this at "real" query time with the request handler "standard" and > > you've used the phrase '"real" query time' (in contrast to analysis.jsp) a > few times in this thread ... to be clear about something: there is nothing > different between analysis.jsp and what happens when a query is executed, > the reason you see different behavior is because you are pasting what > you consider a "query string" into the analysis form, but that's not what > happens at query time, and it's not what that form expects -- that form is > designed for users to paste in the strings that the query parser would > extract from its query syntax. it's not surprising that you'll get > something different than if you just did a straight search on the same > input, any more than it would be surprising if pasting > "fieldname:value +otherfield:value" in analysis.jsp didn't produce the > same tokens as a query for that string. > > > -Hoss
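A rough sketch of that idea against the Solr 1.4/3.x-era API (class name and details are made up, error handling is minimal, and like the stock field parser it expects the field via local params, e.g. {!bool_field f=myfield}foo bar baz): the field's query analyzer is run over the raw input and every resulting token becomes an optional boolean clause instead of a position in a PhraseQuery.

  import java.io.IOException;
  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;
  import org.apache.solr.search.QueryParsing;

  public class BooleanFieldQParserPlugin extends QParserPlugin {
    public void init(NamedList args) {}

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        public Query parse() throws ParseException {
          String field = localParams.get(QueryParsing.F);        // field to search
          String text  = localParams.get(QueryParsing.V, qstr);  // raw query text
          Analyzer analyzer = req.getSchema().getFieldType(field).getQueryAnalyzer();
          BooleanQuery bq = new BooleanQuery();
          try {
            TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
            TermAttribute term = ts.addAttribute(TermAttribute.class);
            ts.reset();
            // one optional clause per token, so "foo_bar" and "baz" can match independently
            while (ts.incrementToken()) {
              bq.add(new TermQuery(new Term(field, term.term())), BooleanClause.Occur.SHOULD);
            }
            ts.end();
            ts.close();
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
          return bq;
        }
      };
    }
  }

The plugin would then be registered in solrconfig.xml under a name of your choosing, e.g. <queryParser name="bool_field" class="com.example.BooleanFieldQParserPlugin"/>.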
Re: Solrj Javabin and JSON
Hi Stefan, you don't need to convert the Java objects built from the result returned as Javabin. Instead of this, you could easily use the JSON return format by setting "wt=json". See also at [0] for more information about this. Patrick [0] http://wiki.apache.org/solr/SolJSON SGE0 schrieb: > Hi Paul, > > > fair enough. Is this included in the Solrj package ? Any examples how to do > this ? > > > Stefan > > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> There is no point converting javabin to json. javabin is in >> intermediate format it is converted to the java objects as soon as >> comes. You just need means to convert the java object to json. >> >> >> >> On Sat, Oct 24, 2009 at 12:10 PM, SGE0 wrote: >>> Hi, >>> >>> did anyone write a Javabin to JSON convertor and is willing to share this >>> ? >>> >>> In our servlet we use a CommonsHttpSolrServer instance to execute a >>> query. >>> >>> The problem is that is returns Javabin format and we need to send the >>> result >>> back to the browser using JSON format. >>> >>> And no, the browser is not allowed to directly query Lucene with the >>> wt=json >>> format. >>> >>> Regards, >>> >>> S. >>> -- >>> View this message in context: >>> http://www.nabble.com/Solrj-Javabin-and-JSON-tp26036551p26036551.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> >> >
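For example, the servlet could request the JSON representation directly over HTTP instead of converting SolrJ's Java objects (host, port and query are placeholders):

  http://localhost:8080/solr/select?q=foo&wt=json&json.nl=map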
Re: Multi-Term Synonyms
Hi Brad, I was trying this, too, and there is a way to get multi-term synonyms to work properly. I already wrote my solution on this list. My solution was as follows: [cite] after your hints, which partially confirmed my considerations, I made some tests with the FieldQParser. At the beginning I had some problems, but finally I was able to solve the problem of multi-word synonyms at query time in a way that is suitable for us - and possibly for others, too. For my solution, I re-used the FieldQParserPlugin. At first I ported it to the new API (incrementToken instead of next, etc.) and then I modified the code so that no PhraseQueries are created, but only BooleanQueries. Now, with my new QParserPlugin based on the FieldQParserPlugin, it's possible to search for things like "foo bar baz", where "foo bar" has to be changed to "foo_bar" and where at the end the tokens "foo_bar" and "baz" will be created, so that both can match independently. [/cite] Our current version has been reworked again so that multi-field queries are also possible. If you want to use such a solution, you probably have to go without complex query parsing et cetera. You also have to write your own modified QParser that fits your special needs. Some higher-level features, like those offered by other QParsers, could also be integrated. It's all up to you and your needs. Patrick brad anderson wrote: > Thanks for the help. Can't believe I missed that part in the wiki. > > 2009/11/24 Tom Hill > >> Hi Brad, >> >> >> I suspect that this section from the wiki for SynonymFilterFactory might be >> relevant: >> >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory >> >> "Keep in mind that while the SynonymFilter will happily work with synonyms >> containing multiple words (ie: "sea biscuit, sea biscit, seabiscuit") >> The recommended approach for dealing with synonyms like this, is to expand >> the synonym when indexing. This is because there are two potential issues >> that can arrise at query time: >> >> 1. >> >> The Lucene QueryParser tokenizes on white space before giving any text >> to the Analyzer, so if a person searches for the words sea biscit the >> analyzer will be given the words "sea" and "biscit" seperately, and will >> not >> know that they match a synonym." >> >> ... >> >> Tom >> >> On Tue, Nov 24, 2009 at 10:47 AM, brad anderson >> wrote: >>> Hi Folks, >>> >>> I was trying to get multi term synonyms to work. I'm experiencing some >>> strange behavior and would like some feedback. >>> >>> In the synonyms file I have the line: >>> >>> thomas, boll holly, thomas a, john q => tom >>> >>> And I have a document with the text field as; >>> >>> tom >>> >>> However, when I do a search on boll holly, it does not return the >> document >>> with tom. The same thing happens if I do a query on john q. But if I do a >>> query on thomas, it gives me the document. Also, if I quote "boll holly" >> or >>> "john q" it gives back the document. >>> >>> When I look at the analyzer page on the solr admin page, it is >> transforming >>> "boll holly" to "tom" when it isn't quoted. Why is it that it is not >>> returning the document? Is there some configuration I can make so it does >>> return the document if I do an unquoted search on "boll holly"? 
>>> >>> My synonym filter is defined as follows, and is only defined on the query >>> side: >>> >>> >> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> >>> >>> >>> I've also tried changing the synonym file to be >>> >>> tom, thomas, boll holly, thomas a, john q >>> >>> This produces the same results. >>> >>> Thanks, >>> Brad >>> >
Re: synonyms
Hello Peter, by using the existing SynonymFilterFactory, it is not possible to use a database instead of a text file. This file will be read at startup and the internal synonym catalogue (SynonymMap) will be created. You could create your own filter factory that could create the needed synonym catalogue by using a database. Look into the SynonymFilterFactory and the SynonymFilter and you could get this to work. As another possibility, you could create the needed synonym text file by a script or something else, before the startup of Solr server. This could probably be the easiest way. -Patrick Peter A. Kirk schrieb: > Hi > > > > It appears that Solr reads a synonym list at startup from a text file. > > Is it possible to alter this behaviour so that Solr obtains the synonym list > from a database instead? > > > > Thanks, > > Peter > >
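A small sketch of the second suggestion, dumping synonyms from a database into synonyms.txt before Solr is started (driver, connection details, table and column are made up):

  import java.io.FileWriter;
  import java.io.PrintWriter;
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class SynonymFileExporter {
    public static void main(String[] args) throws Exception {
      Connection con = DriverManager.getConnection(
          "jdbc:mysql://localhost/search", "user", "pass");
      Statement st = con.createStatement();
      // one synonym group per row, already comma separated, e.g. "tv, television"
      ResultSet rs = st.executeQuery("select synonym_line from synonyms");
      PrintWriter out = new PrintWriter(new FileWriter("synonyms.txt"));
      while (rs.next()) {
        out.println(rs.getString(1));
      }
      out.close();
      rs.close();
      st.close();
      con.close();
    }
  }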
Re: Huge load and long response times during search
Try solr.FastLRUCache instead of solr.LRUCache it's the new cache gesture for solr 1.4. And maybe true in main index section or diminish mergefactor see http://wiki.apache.org/lucene-java/ImproveSearchingSpeed Tomasz Kępski a écrit : Hi, I'm using SOLR(1.4) to search among about 3,500,000 documents. After the server kernel was updated to 64bit system has started to suffer. Our server has 8G of RAM and double Intel Core 2 DUO. We used to have average loads around 2-2,5. It was not as good as it should but as long HTTP response times was acceptable we do not care to much ;-) Since few days avg loads are usually around 6, sometimes goes even to 20. PHP, Mysql and Postgresql based application is rather fine, but when tries to access SOLR it takes ages to load page. In top java process (Jetty) takes 200-250% of CPU, iotop shows that most of the disk operations are done by SOLR threads as well. When we do shut down Jetty load goes down to 1,5 or even less than 1. My index has ~12G below is a part of my solrconf.xml: 1024 true true 40 200 solr 0 name="rows">10 solr price 0 10 solr name="sort">rekomendacja 0 name="rows">10 static newSearcher warming query from solrconfig.xml fast_warm 0 10 static firstSearcher warming query from solrconfig.xml false dismax explicit 0.01 name^90.0 scategory^450.0 brand^90.0 text^0.01 description^30 brand,description,id,name,price,score 4<100% 5<90% 100 *:* sample query parameters from log looks like this: 2009-11-20 21:07:15 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={spellcheck=true&wt=json&rows=20&json.nl=map&start=520&facet=true&spellcheck.collate=true&fl=id,name,description,preparation,url,shop_id&q=camera&qt=dismax&version=1.3&hl.fl=name,description,atributes,brand,url&facet.field=shop_id&facet.field=brand&hl.fragsize=200&spellcheck.count=5&hl.snippets=3&hl=true} hits=3784 status=0 QTime=83 2009-11-20 21:07:15 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/spellCheckCompRH params={spellcheck=true&wt=json&rows=20&json.nl=map&start=520&facet=true&spellcheck.collate=true&fl=id,name,description,preparation,url,shop_id&q=camera&qt=dismax&version=1.3&hl.fl=name,description,atributes,brand,url&facet.field=shop_id&facet.field=brand&hl.fragsize=200&spellcheck.count=5&hl.snippets=3&hl=true} hits=3784 status=0 QTime=16 And at last the question ;-) How to speed up the search? Which parameters should I check first to find out what is the bottleneck? Sorry for verbose entry but I would like to give as clear point of view as possible Thanks in advance, Tom
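The cache change suggested above would look roughly like this in solrconfig.xml (the sizes are just the stock example values, not tuned recommendations):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <documentCache class="solr.FastLRUCache" size="512" initialSize="512"/>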
Keyword extraction
Hi all, Struggling with a question I recently got from a colleague: is it possible to extract keywords from indexed content? In my opinion it should be possible to find out on which words the ranking of the indexed content is the highest (Lucene or Solr), but I have no clue where to begin. Does anyone have suggestions? Best, Patrick
RE: Keyword extraction
Hi All, as an addition to my previous post, no interestingTerms are returned when I execute the following URL: http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true I get a moreLikeThis list though, any thoughts? Best, Patrick
RE: Keyword extraction
Hi Aleksander, Thanx for clearing this up. I am confident that this is a way to explore for me as I'm just starting to grasp the matter. Do you know why I'm not getting any results with the query posted earlier then? It gives me the folowing only: Instead of delivering details of the interestingTerms. Thanks in advance Patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 13:03 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone. Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tends to be useless for querying, but taking the document frequency into account drastically increases the importance of the term! In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]> wrote: > Dear Partick, I had the same problem with MoreLikeThis function. > > After briefly reading and analyzing the source code of moreLikeThis > function in solr, I conducted: > > MoreLikeThis uses term vectors to ranks all the terms from a document > by its frequency. According to its ranking, it will start to generate > queries, artificially, and search for documents. > > So, moreLikeThis will retrieve related documents by artificially > generating queries based on most frequent terms. > > There's a big problem with "most frequent terms" from documents. Most > frequent words are usually meaningless, or so called function words, > or, people from Information Retrieval like to call them stopwords. > However, ignoring technical problems of implementation of > moreLikeThis function, this approach is very dangerous, since queries > are generated artificially based on a given document. > Writting queries for retrieving a document is a human task, and it > assumes some knowledge (user knows what document he wants). > > I advice to use others approaches, depending on your expectation. For > example, you can extract similar documents just by searching for > documents with similar title (more like this doesn't work in this case). > > I hope it helps, > Best Regards, > Vitalie Scurtu > --- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]> > wrote: > From: Plaatje, Patrick <[EMAIL PROTECTED]> > Subject: RE: Keyword extraction > To: solr-user@lucene.apache.org > Date: Wednesday, November 26, 2008, 10:52 AM > > Hi All, > as an addition to my previous post, no interestingTerms are returned > when i execute the folowing url: > http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.inter > es tingTerms=list&mlt=true&mlt.match.include=true > I get a moreLikeThis list though, any thoughts? 
> Best, > Patrick > > > > -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
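Putting the parameters from this thread together, a request along those lines might look like this (host, core, document id and the threshold values are placeholders):

  http://localhost:8080/solr/select/?q=id:18477975&mlt=true&mlt.fl=text&mlt.mintf=2&mlt.mindf=5&mlt.interestingTerms=list&mlt.match.include=true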
RE: Keyword extraction
Hi Aleksander, This was a typo on my end, the original query included a semicolon instead of an equal sign. But I think it has to do with my field not being stored and not being identified as termVectors="true". I'm recreating the index now, and see if this fixes the problem. Best, patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 14:37 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction Hi there! Well, first of all i think you have an error in your query, if I'm not mistaken. You say http://localhost:8080/solr/select/?q=id=18477975... but since you are referring to the field called "id", you must say: http://localhost:8080/solr/select/?q=id:18477975... (use colon instead of the equals sign). I think that will do the trick. If not, try adding the &debugQuery=on at the end of your request url, to see debug output on how the query is parsed and if/how any documents are matched against your query. Hope this helps. Cheers, Aleksander On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: > Hi Aleksander, > > Thanx for clearing this up. I am confident that this is a way to > explore for me as I'm just starting to grasp the matter. Do you know > why I'm not getting any results with the query posted earlier then? It > gives me the folowing only: > > > > > Instead of delivering details of the interestingTerms. > > Thanks in advance > > Patrick > > > -Original Message- > From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] > Sent: woensdag 26 november 2008 13:03 > To: solr-user@lucene.apache.org > Subject: Re: Keyword extraction > > I do not agree with you at all. The concept of MoreLikeThis is based > on the fundamental idea of TF-IDF weighting, and not term frequency alone. > Please take a look at: > http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/simil > ar/MoreLikeThis.html As you can see, it is possible to use cut-off > thresholds to significantly reduce the number of unimportant terms, > and generate highly suitable queries based on the tf-idf frequency of > the term, since as you point out, high frequency terms alone tends to > be useless for querying, but taking the document frequency into > account drastically increases the importance of the term! > > In solr, use parameters to manipulate your desired results: > http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e2 > 2ec5d1519c456b2c > For instance: > mlt.mintf - Minimum Term Frequency - the frequency below which terms > will be ignored in the source doc. > mlt.mindf - Minimum Document Frequency - the frequency at which words > will be ignored which do not occur in at least this many docs. > You can also set thresholds for term length etc. > > Hope this gives you a better idea of things. > - Aleks > > On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]> > wrote: > >> Dear Partick, I had the same problem with MoreLikeThis function. >> >> After briefly reading and analyzing the source code of moreLikeThis >> function in solr, I conducted: >> >> MoreLikeThis uses term vectors to ranks all the terms from a document >> by its frequency. According to its ranking, it will start to generate >> queries, artificially, and search for documents. >> >> So, moreLikeThis will retrieve related documents by artificially >> generating queries based on most frequent terms. >> >> There's a big problem with "most frequent terms" from documents. 
>> Most frequent words are usually meaningless, or so called function >> words, or, people from Information Retrieval like to call them stopwords. >> However, ignoring technical problems of implementation of >> moreLikeThis function, this approach is very dangerous, since queries >> are generated artificially based on a given document. >> Writting queries for retrieving a document is a human task, and it >> assumes some knowledge (user knows what document he wants). >> >> I advice to use others approaches, depending on your expectation. For >> example, you can extract similar documents just by searching for >> documents with similar title (more like this doesn't work in this case). >> >> I hope it helps, >> Best Regards, >> Vitalie Scurtu >> --- On Wed, 11/26/08, Plaatje, Patrick >> <[EMAIL PROTECTED]> >> wrote: >> From: Plaatje, Patrick <[EMAIL PROTECTED]> >> Subject: RE: Keyword extraction >> To: solr-user@lucene.apache.org >> Date: Wednesday, November 26, 2008, 10:52 AM >
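The schema change Patrick describes boils down to the extra attributes on the field definition in schema.xml (field name and type are examples):

  <field name="text" type="text" indexed="true" stored="true" termVectors="true"/>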
RE: Keyword extraction
Hi Aleksander, With all the help of you and the other comments, we're now at a point where a MoreLikeThis list is returned, and shows 10 related records. However on the query executed there are no keywords whatsoever being returned. Is the querystring still wrong or is something else required? The querystring we're currently executing is: http://suempnr3:8080/solr/select/?q=amsterdam&mlt.fl=text&mlt.displayTerms=list&mlt=true Best, Patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 15:07 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction Ah, yes, That is important. In lucene, the MLT will see if the term vector is stored, and if it is not it will still be able to perform the querying, but in a much much much less efficient way.. Lucene will analyze the document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED will be used to limit the number of tokens that will be parsed). (don't want to go into details on this since I haven't really dug through the code:p) But when the field isn't stored either, it is rather difficult to re-analyze the document;) On a general note, if you want to "really" understand how the MLT works, take a look at the wiki or read this thorough blog post: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ Regards, Aleksander On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote: > Hi Aleksander, > > This was a typo on my end, the original query included a semicolon > instead of an equal sign. But I think it has to do with my field not > being stored and not being identified as termVectors="true". I'm > recreating the index now, and see if this fixes the problem. > > Best, > > patrick > > -Original Message- > From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] > Sent: woensdag 26 november 2008 14:37 > To: solr-user@lucene.apache.org > Subject: Re: Keyword extraction > > Hi there! > Well, first of all i think you have an error in your query, if I'm not > mistaken. > You say http://localhost:8080/solr/select/?q=id=18477975... > but since you are referring to the field called "id", you must say: > http://localhost:8080/solr/select/?q=id:18477975... > (use colon instead of the equals sign). > I think that will do the trick. > If not, try adding the &debugQuery=on at the end of your request url, > to see debug output on how the query is parsed and if/how any > documents are matched against your query. > Hope this helps. > > Cheers, > Aleksander > > > > On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick > <[EMAIL PROTECTED]> wrote: > >> Hi Aleksander, >> >> Thanx for clearing this up. I am confident that this is a way to >> explore for me as I'm just starting to grasp the matter. Do you know >> why I'm not getting any results with the query posted earlier then? >> It gives me the folowing only: >> >> >> >> >> Instead of delivering details of the interestingTerms. >> >> Thanks in advance >> >> Patrick >> >> >> -Original Message- >> From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] >> Sent: woensdag 26 november 2008 13:03 >> To: solr-user@lucene.apache.org >> Subject: Re: Keyword extraction >> >> I do not agree with you at all. The concept of MoreLikeThis is based >> on the fundamental idea of TF-IDF weighting, and not term frequency >> alone. 
>> Please take a look at: >> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/simi >> l ar/MoreLikeThis.html As you can see, it is possible to use cut-off >> thresholds to significantly reduce the number of unimportant terms, >> and generate highly suitable queries based on the tf-idf frequency of >> the term, since as you point out, high frequency terms alone tends to >> be useless for querying, but taking the document frequency into >> account drastically increases the importance of the term! >> >> In solr, use parameters to manipulate your desired results: >> http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e >> 2 >> 2ec5d1519c456b2c >> For instance: >> mlt.mintf - Minimum Term Frequency - the frequency below which terms >> will be ignored in the source doc. >> mlt.mindf - Minimum Document Frequency - the frequency at which words >> will be ignored which do not occur in at least this many docs. >> You can also set thresholds for term length etc. >> >> Hope this gives you a better idea of things. >> - Aleks >>
RE: php client. json communication
Or have a look at the Wiki, probably a better way to start: http://wiki.apache.org/solr/SolPHP Best, Patrick -- Just trying to help http://www.ipros.nl/ -- -Original Message- From: KishoreVeleti CoreObjects [mailto:kisho...@coreobjects.com] Sent: dinsdag 16 december 2008 15:14 To: solr-user@lucene.apache.org Subject: Re: php client. json communication Check out this link http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html If anyone of you used it can you share your experiences. Thanks, Kishore Veleti A.V.K. Julian Davchev wrote: > > Hi, > I am about to integrate solr for index/search of my documents/data. > It's php application but I see it should be no problem as solr works > with xml by default. > Is there any read php lib that will ease/help whole communication with > solr and if possible to send/receive json data. > > I looked up archive list and seems not many discussions in php. Also > from manual it seems that it can only get json response but request > should always be xml. > Cheers, > > -- View this message in context: http://www.nabble.com/php-client.-json-communication-tp21033573p21033806 .html Sent from the Solr - User mailing list archive at Nabble.com.
Using DIH, getting exception
Hi All, I'm trying to use the DataImportHandler, with the data config below (snippet): The variables are all good (username+password, etc), but I'm getting the following exception, any thoughts? org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :null available for entity :item Processing Document # Best, Patrick
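Since the config snippet did not survive the mail, here is a generic data-config.xml skeleton that shows the relationship the error message complains about: the entity's dataSource attribute has to reference a dataSource that is defined by name (all names, the driver and the SQL are made up):

  <dataConfig>
    <dataSource name="ds" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/db" user="user" password="pass"/>
    <document>
      <entity name="item" dataSource="ds" query="select * from item"/>
    </document>
  </dataConfig>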
RE: checkout 1.4 snapshot
Hi, You can find the SVN repository here: http://www.apache.org/dev/version-control.html#anon-svn I'm not sure if this represent the 1.4 version, but as being the trunk it's the latest version. Best, Patrick -Original Message- From: roberto [mailto:miles.c...@gmail.com] Sent: dinsdag 16 december 2008 22:13 To: solr-user@lucene.apache.org Subject: checkout 1.4 snapshot Hello, Someone could tell me how can i checkout the 1.4 snapshot ? thanks, -- "Without love, we are birds with broken wings." Morrie
RE: checkout 1.4 snapshot
Sorry all, Wrong url in the post, right url should be: http://svn.apache.org/repos/asf/lucene/solr/ Best, Patrick -Original Message- From: Plaatje, Patrick [mailto:patrick.plaa...@getronics.com] Sent: dinsdag 16 december 2008 22:19 To: solr-user@lucene.apache.org Subject: RE: checkout 1.4 snapshot Hi, You can find the SVN repository here: http://www.apache.org/dev/version-control.html#anon-svn I'm not sure if this represent the 1.4 version, but as being the trunk it's the latest version. Best, Patrick -Original Message- From: roberto [mailto:miles.c...@gmail.com] Sent: dinsdag 16 december 2008 22:13 To: solr-user@lucene.apache.org Subject: checkout 1.4 snapshot Hello, Someone could tell me how can i checkout the 1.4 snapshot ? thanks, -- "Without love, we are birds with broken wings." Morrie
RE: php client. json communication
Glad that's sorted. On the other issue (directly accessing solr from any client) I think I saw a discussion on the list earlier, but I don't know what the result was, browse through the archives and look for something about security (I think). Best, patrick -Original Message- From: Julian Davchev [mailto:j...@drun.net] Sent: dinsdag 16 december 2008 23:02 To: solr-user@lucene.apache.org Subject: Re: php client. json communication I think I got it now. Search request is actually just simple url with few params...no json or xml or fancy stuff needed. I was concerned with this cause I need to use solr with javascript directly, bypassing application and directly searching stuff. Plaatje, Patrick wrote: > Hi Julian, > > I'm a bit confused. The indexing is indeed being done through XML, but > in searching it is possible to get JSON results by using the wt=json > parameter, have a look here: > > http://wiki.apache.org/solr/SolJSON > > Best, > > Patrick > > > -Original Message- > From: Julian Davchev [mailto:j...@drun.net] > Sent: dinsdag 16 december 2008 22:39 > To: solr-user@lucene.apache.org > Subject: Re: php client. json communication > > Hi, > 1. Thanks for links, I looked at both. Still I think that solr or > at least those php clients doesn't support jason as input. > It's clear that it's possible to get json response.but search is > only possible via xml queries. > > > Plaatje, Patrick wrote: > >> Or have a look at the Wiki, probably a better way to start: >> >> http://wiki.apache.org/solr/SolPHP >> >> Best, >> >> Patrick >> >> -- >> Just trying to help >> http://www.ipros.nl/ >> -- >> >> -Original Message- >> From: KishoreVeleti CoreObjects [mailto:kisho...@coreobjects.com] >> Sent: dinsdag 16 december 2008 15:14 >> To: solr-user@lucene.apache.org >> Subject: Re: php client. json communication >> >> >> Check out this link >> http://www.ibm.com/developerworks/library/os-php-apachesolr/index.htm >> l >> >> If anyone of you used it can you share your experiences. >> >> Thanks, >> Kishore Veleti A.V.K. >> >> >> Julian Davchev wrote: >> >> >>> Hi, >>> I am about to integrate solr for index/search of my documents/data. >>> It's php application but I see it should be no problem as solr works >>> with xml by default. >>> Is there any read php lib that will ease/help whole communication >>> with >>> >>> >> >> >>> solr and if possible to send/receive json data. >>> >>> I looked up archive list and seems not many discussions in php. Also >>> from manual it seems that it can only get json response but request >>> should always be xml. >>> Cheers, >>> >>> >>> >>> >> -- >> View this message in context: >> http://www.nabble.com/php-client.-json-communication-tp21033573p21033 >> 8 >> 06 >> .html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> > >
RE: php client. json communication
Hi Julian, I'm a bit confused. The indexing is indeed being done through XML, but in searching it is possible to get JSON results by using the wt=json parameter, have a look here: http://wiki.apache.org/solr/SolJSON Best, Patrick -Original Message- From: Julian Davchev [mailto:j...@drun.net] Sent: dinsdag 16 december 2008 22:39 To: solr-user@lucene.apache.org Subject: Re: php client. json communication Hi, 1. Thanks for links, I looked at both. Still I think that solr or at least those php clients doesn't support jason as input. It's clear that it's possible to get json response.but search is only possible via xml queries. Plaatje, Patrick wrote: > Or have a look at the Wiki, probably a better way to start: > > http://wiki.apache.org/solr/SolPHP > > Best, > > Patrick > > -- > Just trying to help > http://www.ipros.nl/ > -- > > -Original Message- > From: KishoreVeleti CoreObjects [mailto:kisho...@coreobjects.com] > Sent: dinsdag 16 december 2008 15:14 > To: solr-user@lucene.apache.org > Subject: Re: php client. json communication > > > Check out this link > http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html > > If anyone of you used it can you share your experiences. > > Thanks, > Kishore Veleti A.V.K. > > > Julian Davchev wrote: > >> Hi, >> I am about to integrate solr for index/search of my documents/data. >> It's php application but I see it should be no problem as solr works >> with xml by default. >> Is there any read php lib that will ease/help whole communication >> with >> > > >> solr and if possible to send/receive json data. >> >> I looked up archive list and seems not many discussions in php. Also >> from manual it seems that it can only get json response but request >> should always be xml. >> Cheers, >> >> >> > > -- > View this message in context: > http://www.nabble.com/php-client.-json-communication-tp21033573p210338 > 06 > .html > Sent from the Solr - User mailing list archive at Nabble.com. > >
RE: Change in config file (synonym.txt) requires container restart?
Hi , I'm wondering if you could not implement a custom filter which reads the file realtime (you might even keep the create synonym map in memory for a predefined time). This then doesn't need a restart of the container. Best, Patrick -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: vrijdag 19 december 2008 7:30 To: solr-user@lucene.apache.org Subject: Re: Change in config file (synonym.txt) requires container restart? Please note that a core reload will also stop Solr from serving any search requests in the time it reloads. On Fri, Dec 19, 2008 at 8:24 AM, Sagar Khetkade wrote: > > But i am using CommonsHttpSolrServer for Solr server configuation as > it is accepts the url. So here how can i reload the core. > > -Sagar> Date: Thu, 18 Dec 2008 07:55:02 -0500> From: > -Sagar> markrmil...@gmail.com> > To: solr-user@lucene.apache.org> Subject: Re: Change in config file > (synonym.txt) requires container restart?> > Sagar Khetkade wrote:> > > Hi,> > > > > I am using SolrJ client to connect to the Solr 1.3 server and the > > > whole > POC (doing a feasibility study ) reside in Tomcat web server. If any > change I am making in the synonym.txt file to add the synonym in the > file to make it reflect I have to restart the tomcat server. The > synonym filter factory that I am using are in both in analyzers for > type index and query in schema.xml. Please tell me whether this > approach is good or any other way to make the change reflect while > searching without restarting of tomcat server.> > > > Thanks and > Regards,> > Sagar Khetkade> > > _> > > Chose your Life Partner? Join MSN Matrimony FREE> > > http://in.msn.com/matrimony> > > > You can also reload the core.> > - Mark > _ > Chose your Life Partner? Join MSN Matrimony FREE > http://in.msn.com/matrimony > -- Regards, Shalin Shekhar Mangar.
Getting request object within search component
Hi All, I developed my own custom search component, in which I need to get the requestors ip-address. But I can't seem to find a request object from where I can get this string, ideas anyone? Best, Patrick
RE: Solr statistics of top searches and results returned
Hi, At the moment Solr does not have such functionality. I have written a plugin for Solr though which uses a second Solr core to store/index the searches. If you're interested, send me an email and I'll get you the source for the plugin. Regards, Patrick -Original Message- From: solrpowr [mailto:solrp...@hotmail.com] Sent: dinsdag 19 mei 2009 20:21 To: solr-user@lucene.apache.org Subject: Solr statistics of top searches and results returned Hi, Besides my own offline processing via logs, does solr have the functionality to give me statistics such as top searches, how many results were returned on these searches, and/or how long it took to get these results on average. Thanks, Bob -- View this message in context: http://www.nabble.com/Solr-statistics-of-top-searches-and-results-returned-tp23621779p23621779.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr statistics of top searches and results returned
Hi Shalin, Let me investigate. I think the challenge will be in storing/managing these statistics. I'll get back to the list when I have thought of something. Rgrds, Patrick -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: woensdag 20 mei 2009 10:33 To: solr-user@lucene.apache.org Subject: Re: Solr statistics of top searches and results returned On Wed, May 20, 2009 at 1:31 PM, Plaatje, Patrick < patrick.plaa...@getronics.com> wrote: > > At the moment Solr does not have such functionality. I have written a > plugin for Solr though which uses a second Solr core to store/index > the searches. If you're interested, send me an email and I'll get you > the source for the plugin. > > Patrick, this will be a useful addition. However instead of doing this with another core, we can keep running statistics which can be shown on the statistics page itself. What do you think? A related approach for showing slow queries was discussed recently. There's an issue open which has more details: https://issues.apache.org/jira/browse/SOLR-1101 -- Regards, Shalin Shekhar Mangar.
RE: Solr statistics of top searches and results returned
Hi all, I created a script that uses a Solr Search Component, which hooks into the main solr core and catches the searches being done. After this it tokenizes the search and sends both the tokenized as well as the original query to another Solr core. I have not written a factory for this, but if required, it shouldn't be so hard to modify the script and code database support into it. You can find the source here: http://www.ipros.nl/uploads/Stats-component.zip It includes a README, and a schema.xml that should be used. Please let me know your thoughts. Best, Patrick -Original Message- From: Umar Shah [mailto:u...@wisdomtap.com] Sent: vrijdag 22 mei 2009 10:03 To: solr-user@lucene.apache.org Subject: Re: Solr statistics of top searches and results returned Hi, good feature to have, maintaining top N would also require storing all the search queries done so far and keep updating (or atleast in some time window). having pluggable persistent storage for all time search queries would be great. tell me how can I help? -umar On Fri, May 22, 2009 at 12:21 PM, Shalin Shekhar Mangar wrote: > On Fri, May 22, 2009 at 3:22 AM, Grant Ingersoll wrote: > >> >> I think you will want some type of persistence mechanism otherwise >> you will end up consuming a lot of resources keeping track of all the >> query strings, unless I'm missing something. Either a Lucene index >> (Solr core) or the option of embedding a DB. Ideally, it would be >> pluggable such that people could choose their storage mechanism. >> Most people do this kind of thing offline via log analysis as logs can grow >> quite large quite quickly. >> > > For a general case, yes. But I was thinking more of a top 'n' queries > as a running statistic. > > -- > Regards, > Shalin Shekhar Mangar. >
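For readers who just want the shape of the approach, here is a hedged sketch, not the code in the zip above: a small helper that forwards the raw query plus a crude whitespace tokenization to a second core via SolrJ. The stats core URL and field names are illustrative assumptions; the schema.xml shipped with the plugin is authoritative.

import java.io.IOException;
import java.net.MalformedURLException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueryStatsLogger {
    private final CommonsHttpSolrServer statsCore;

    public QueryStatsLogger(String statsCoreUrl) throws MalformedURLException {
        // e.g. "http://localhost:8983/solr/stats" - illustrative URL
        this.statsCore = new CommonsHttpSolrServer(statsCoreUrl);
    }

    // Index the raw query plus a simple whitespace tokenization into the stats core.
    // Commits are left to autoCommit or a periodic commit elsewhere.
    public void log(String query) throws SolrServerException, IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", query + "-" + System.currentTimeMillis());
        doc.addField("query", query);
        for (String token : query.toLowerCase().split("\\s+")) {
            doc.addField("token", token);
        }
        statsCore.add(doc);
    }
}

A SearchComponent could call something like this from its process() hook with the incoming q parameter.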
RE: Solr statistics of top searches and results returned
Hi, In our specific implementation this is not really an issue, but I can imagine it could impact performance. I guess a new thread could be spawned, which takes care of any performance issues, thanks for pointing it out. I'll post a message once I've coded the change. Regards, Patrick -Original Message- From: rswart [mailto:rjsw...@gmail.com] Sent: dinsdag 26 mei 2009 16:42 To: solr-user@lucene.apache.org Subject: RE: Solr statistics of top searches and results returned If this is not done in an async way wouldn't this have a serious performance impact? Plaatje, Patrick wrote: > > Hi all, > > I created a script that uses a Solr Search Component, which hooks into > the main solr core and catches the searches being done. After this it > tokenizes the search and sends both the tokenized as well as the > original query to another Solr core. I have not written a factory for > this, but if required, it shouldn't be so hard to modify the script > and code database support into it. > > You can find the source here: > > http://www.ipros.nl/uploads/Stats-component.zip > > It includes a README, and a schema.xml that should be used. > > Please let me know your thoughts. > > Best, > > Patrick > > > > > > -Original Message- > From: Umar Shah [mailto:u...@wisdomtap.com] > Sent: vrijdag 22 mei 2009 10:03 > To: solr-user@lucene.apache.org > Subject: Re: Solr statistics of top searches and results returned > > Hi, > > good feature to have, > maintaining top N would also require storing all the search queries > done so far and keep updating (or atleast in some time window). > > having pluggable persistent storage for all time search queries would > be great. > > tell me how can I help? > > -umar > > On Fri, May 22, 2009 at 12:21 PM, Shalin Shekhar Mangar > wrote: >> On Fri, May 22, 2009 at 3:22 AM, Grant Ingersoll wrote: >> >>> >>> I think you will want some type of persistence mechanism otherwise >>> you will end up consuming a lot of resources keeping track of all >>> the query strings, unless I'm missing something. Either a Lucene >>> index (Solr core) or the option of embedding a DB. Ideally, it >>> would be pluggable such that people could choose their storage mechanism. >>> Most people do this kind of thing offline via log analysis as logs >>> can grow quite large quite quickly. >>> >> >> For a general case, yes. But I was thinking more of a top 'n' queries >> as a running statistic. >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> > > -- View this message in context: http://www.nabble.com/Solr-statistics-of-top-searches-and-results-returned-tp23621779p23724277.html Sent from the Solr - User mailing list archive at Nabble.com.
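A minimal sketch of the asynchronous variant discussed here: hand the stats write off to a small background executor so the user's search never waits on the stats core. It reuses the hypothetical QueryStatsLogger from the earlier sketch:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncQueryStatsLogger {
    private final QueryStatsLogger delegate;
    // Single background thread; queries queue up instead of blocking searches.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public AsyncQueryStatsLogger(QueryStatsLogger delegate) {
        this.delegate = delegate;
    }

    public void log(final String query) {
        executor.submit(new Runnable() {
            public void run() {
                try {
                    delegate.log(query);
                } catch (Exception e) {
                    // Never let stats logging break or slow down the search path.
                    e.printStackTrace();
                }
            }
        });
    }

    public void shutdown() {
        executor.shutdown();
    }
}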
Delete, Commit, Add Interaction
We're indexing a potentially large collection of documents into smaller subgroups we call "collections". Each document has a field that identifies the collection it belongs to, in addition to a unique document id field: foo-1 foo .. foo-2 foo . . etc. "collection" and "id" are defined in schema.xml as string fields. When a collection is being added to the index, it's possible that there is an existing "foo" collection in the index that needs to be replaced. The ids in the new collection will reuse many of the ids in the old collection, but the replacement is not a document-for-document replacement process -- there may be more or fewer documents in the new collection. So the replacement operation goes as follows: collection:foo . Each of these XML commands happens on a separate HTTP connection. If the collection doesn't already exist in the index, then the delete is essentially a noop. Finally, here's the behavior we're seeing. In some cases, usually when the index is starting to get larger (approaching 500,000 documents), the above procedure will fail to add anything to the index. That is, none of the commands return an error code, there is no indication of a problem in the log files and the process DOES take some amount of time to complete. But at the end of the process, there are no documents in the index whose collection is "foo". This can happen whether or not there is an existing "foo" collection already in the index -- in fact, the typical case is that there is not. So my question is: Is there any chance that the delete, commit, and add commands are interacting in such a way as to cause the add to happen before the delete so that the add is just replacing the existing "foo" documents and then the delete is coming along and deleting everything? My understanding is that the wait attributes on the commit command should flush the delete out to the index before the add can start but I have no knowledge of the true sequencing of events in either Solr or Lucene. If this is happening, how can I know when the delete has been processed before initiating the add process? Thanks, Patrick Johnstone
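The XML commands in the message above were stripped by the mail archive. Based on the description, the sequence presumably looks roughly like the following, with "id", "collection", and "foo" being the placeholders used in the message, and each block posted on its own HTTP connection:

<delete><query>collection:foo</query></delete>
<commit waitFlush="true" waitSearcher="true"/>

<add>
  <doc>
    <field name="id">foo-1</field>
    <field name="collection">foo</field>
  </doc>
  <doc>
    <field name="id">foo-2</field>
    <field name="collection">foo</field>
  </doc>
</add>
<commit waitFlush="true" waitSearcher="true"/>

The intermediate commit with waitFlush/waitSearcher is what the message expects to flush the delete out before the add starts.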
StreamingUpdateSolrServer
Hi All, I'm testing StreamingUpdateSolrServer for indexing, but I never see the final "finished: org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner@" line in my logs. Do I have to use a special function to wait until the update is effective? Another question (maybe an easy one for you): I'm running Solr on Tomcat 5.0.28 and sometimes, not at the time of an rsync, heavy traffic, or a commit, it stops responding and the load reported by uptime is very high. Thank you for your help. Patrick.
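If memory serves, SolrJ 1.4's StreamingUpdateSolrServer has a blockUntilFinished() method for exactly this. A hedged sketch follows; the URL, queue size, and thread count are illustrative values:

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingIndexExample {
    public static void main(String[] args) throws Exception {
        // queue size 20, 4 background threads - illustrative values
        StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8080/solr", 20, 4);

        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            server.add(doc);   // queued and sent by the background threads
        }

        // Wait until the background threads have drained the queue...
        server.blockUntilFinished();
        // ...and only then make the documents searchable.
        server.commit();
    }
}

Whether commit() alone drains the queue is not guaranteed here, so blocking explicitly before committing is the safer order.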
Invalid CRLF - StreamingUpdateSolrServer ?
I'm using Solr 1.4 on Tomcat 5.0.28, with the StreamingUpdateSolrServer client, 10 threads, and XML communication via the POST method. Is there a way to avoid this error (data is lost)? And is StreamingUpdateSolrServer reliable?
GRAVE: org.apache.solr.common.SolrException: Invalid CRLF
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)
Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF