Solr integrate with GAE

2014-02-24 Thread Quốc Nguyễn
Dear sir,

To Apache Solr support:
I wish you a good day!

I'm new to Solr. Please help me confirm the information below:

1. "The URL must use the standard ports for HTTP (80) and HTTPS (443).
The port is implied by the scheme, but may also be mentioned in the URL as
long as the port is standard for the scheme (https://...:443/). An app
cannot connect to an arbitrary port of a remote host, nor can it use a
non-standard port for a scheme." This is an annoyance for those running
Solr on non-80/443. To some, this may be a fatal limitation.
2. You cannot write an index to disk, but you can read files. So
theoretically, if the index is read-only and small, you can package it with
the war file.
3. If you need to update the index, you will have to store the index
in Google's data store, just like storing an index in a database. Sure,
it will work, but performance would suffer because the whole index must be
transferred into memory before searching can really start. On the other
hand, this could be a good solution for a small index with per-user data.
4. For large, changing indexes, you need to find other solutions to
maintain the Lucene index.
5. GAE does not support a Solr deployment: Solr requires access to
the server file system, which GAE forbids.



These are the restrictions on Solr when integrating with GAE. I found this
on the internet. Is it right? And are there any more restrictions?

-
Best Regards
Richard Nguyen
GMO Runsystem


start.jar config

2014-02-24 Thread Can Arel
Hi all,
I have a server which uses Solr, and for some reason Solr got
terminated. When I restart it with java -jar start.jar, it uses stdout as
its logger. Should I just redirect this with > to a file, or is there an
idiomatic Solr way this should be done?

Thanks,
Can


Re: start.jar config

2014-02-24 Thread manju16832003
Solr already writes the logs to a file, 'solr.log'. It's located in the same
folder as start.jar (logs/solr.log).

I'm not sure if that's what you're looking for :-).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/start-jar-config-tp4119201p4119203.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Converting solrdocument response into pojo

2014-02-24 Thread Navaa
Thank you Alexandre for your reply.

Here I am posting my schema definition:

<copyField source="doctor_id" dest="id"/>
...

But I am not able to resolve this issue. Please tell me where I am going
wrong.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-solrdocument-response-into-pojo-tp4118743p4119205.html
Sent from the Solr - User mailing list archive at Nabble.com.


URLDataSource : Issue assigning single xpath field name to two solr fields

2014-02-24 Thread manju16832003
I'm not sure if I'm missing any configuration params here, but I ran into a
problem when I tried to assign an xpath field from a URLDataSource (XML
endpoint) to two fields defined in schema.xml.

Here is my scenario,
I have two fields
*profile_display* and *profile_indexed*

My assignment in DataImportHandler looks like this

<entity name="profile"
        url="http://URLTOExternalSystem//ProfileService.svc/"
        processor="XPathEntityProcessor"
        forEach="/Profiles">
    <!-- both fields map to the same xpath; the actual xpath was stripped,
         so the value below is a placeholder -->
    <field column="profile_display" xpath="/Profiles/Profile"/>
    <field column="profile_indexed" xpath="/Profiles/Profile"/>
</entity>

My schema.xml config looks like this

<field name="profile_display" type="string" indexed="true" stored="true"
       multiValued="false" default=""/>
<field name="profile_indexed" type="text_general" indexed="true" stored="false"
       multiValued="false" default=""/>


*So the issue here is, the value is always assigned to profile_indexed,
and profile_display does not contain any value.*

Meaning, if we assign the same xpath field name to different Solr fields,
only the last field contains the data.

The reason I have two fields is that one stores the value as a String to
display to the user, while the other has filters and tokenizers applied to
do text transformation.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/URLDataSource-Issue-assigning-single-xpath-field-name-to-two-solr-fields-tp4119206.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Gora Mohanty
On 24 February 2014 12:51, Chandan khatua  wrote:
> Hi,
>
>
>
> We have raw binary data stored in database(not word,excel,xml etc files) in
> BLOB.
>
> We are trying to index using TikaEntityProcessor but nothing seems to get
> indexed.
>
> But the same configuration works when xml/word/excel files are stored in the
> BLOB field.

Please start by reviewing http://wiki.apache.org/solr/DataImportHandler as the
above seems quite confused. Why are you using TikaEntityProcessor if the data
in the DB are not richtext files?

What is the type of the column used to store the binary data in
Oracle? You might
be able to convert it with a ClobTransformer. Please see
http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5
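
For reference, a ClobTransformer setup looks roughly like this (the entity,
column, and field names below are only placeholders):

<entity name="docs" transformer="ClobTransformer"
        query="select id, data from my_table">
    <field column="data" name="content" clob="true"/>
</entity>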

Regards,
Gora


Re: Luke 4.6.1 released

2014-02-24 Thread Dmitry Kan
Yes, indeed. Every release of luke is tested on the corresponding solr
version's indexes. The indexes are created based on the exampledocs of the
solr package.

Dmitry


On Mon, Feb 17, 2014 at 12:41 AM, Bill Bell  wrote:

> Yes it works with Solr
>
> Bill Bell
> Sent from mobile
>
>
> > On Feb 16, 2014, at 3:38 PM, Alexandre Rafalovitch 
> wrote:
> >
> > Does it work with Solr? I couldn't tell what the description was from
> > this repo, or its Solr relevance.
> >
> > I am sure all the long timers know, but for more recent Solr people,
> > the additional information would be useful.
> >
> > Regards,
> >   Alex.
> > Personal website: http://www.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all
> > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > book)
> >
> >
> >> On Mon, Feb 17, 2014 at 3:02 AM, Dmitry Kan 
> wrote:
> >> Hello!
> >>
> >> Luke 4.6.1 has been just released. Grab it here:
> >>
> >> https://github.com/DmitryKey/luke/releases/tag/4.6.1
> >>
> >> fixes:
> >> loading the jar from command line is now working fine.
> >>
> >> --
> >> Dmitry Kan
> >> Blog: http://dmitrykan.blogspot.com
> >> Twitter: twitter.com/dmitrykan
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Solr integrate with GAE

2014-02-24 Thread Gora Mohanty
On 24 February 2014 12:39, Quốc Nguyễn  wrote:
> Dear sir,
>
> To Apache Solr support!
> wish you have a good day!
>
> I'm new in Solr, please help me to confirm bellow information :
>
> 1. "The URL must use the standard ports for HTTP (80) and HTTPS (443).
> The port is implied by the scheme, but may also be mentioned in the URL as
> long as the port is standard for the scheme (https://...:443/). An app
> cannot connect to an arbitrary port of a remote host, nor can it use a
> non-standard port for a scheme." This is an annoyance for those running
> Solr on non-80/443. To some, this may be a fatal limitation.
> 2. You can not write index on disk, but you can read files. So
> theoretically if the index is read-only and small, you can package it with
> the war file.
> 3. If you need to update the index, you will have to store the index
> with Google's data store, just like store an index into databases. Sure
> it'll work. But performance would suffer because of transferring the whole
> index into memory, then really start searching. On the other hand, this
> could be a good solution for small index with per-user data.
> 4. For large changing indexes, you need to find other solutions to
> maintain lucene index.
> 5. GAE does not support SOLR implementation : solr requires access to
> the server file system, which GAE forbids.
>
>
>
> this is the restriction of Solr when integrate with GAE. I found this in
> the internet . is it right? and any restriction more?

You would probably have better luck asking on a GAE forum. This
seems to have nothing to do with Solr per se.

Regards,
Gora


Re: URLDataSource : Issue assigning single xpath field name to two solr fields

2014-02-24 Thread Gora Mohanty
On 24 February 2014 14:45, manju16832003  wrote:
> I'm not sure if I would be missing any configuration params here, however
> when I tried to assign an xpath field from URLDataSource (XML end point) to
> two fields defined in schema.xml.
>
> Here is my scenario,
> I have two fields
> *profile_display* and *profile_indexed*
>
> My assignment in DataImpotHandler looks like this
>
> 
> url="http://URLTOExternalSystem//ProfileService.svc/";
> processor="XPathEntityProcessor"
> forEach="/Profiles">
> 
> 
> 
>
> My Scheama.xml config looks like this
>  multiValued="false" default=""/>
>  stored="false"  multiValued="false" default=""/>
>
> *So the issue here is, the value is value is always assigned to
> profile_indexed, and profile_display does not contain any value. *
>
> Meaning, if we were to assign xpath field name to different solr fields,
> only the last field contains the data.
>
> The reason I have two fields is that, One to store it as a String to display
> to user, another field where I apply Filter and Tokenizers to do text
> transformation.

Not sure what happens when the same Xpath is applied to two fields
(though I would have thought that this should work). If you need the
same data in two fields that are tokenised in different ways, you can
use Solr's CopyField: http://wiki.apache.org/solr/SchemaXml#Copy_Fields
This will be more efficient, too.
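
A minimal sketch of the copyField approach, assuming the field names from
your post (the types here are my guess):

<field name="profile_display" type="string" indexed="false" stored="true"/>
<field name="profile_indexed" type="text_general" indexed="true" stored="false"/>
<copyField source="profile_display" dest="profile_indexed"/>

Then map the xpath only once in the DIH config, to profile_display.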

Regards,
Gora


Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-24 Thread Shalin Shekhar Mangar
I opened SOLR-5768

https://issues.apache.org/jira/browse/SOLR-5768

On Mon, Feb 24, 2014 at 12:56 AM, Shalin Shekhar Mangar
 wrote:
> Yes that should be simple. But regardless of the parameter, the
> fl=id,score use-case should be optimized by default. I think I'll
> commit the patch as-is and open a new issue to add the
> distrib.singlePass parameter.
>
> On Sun, Feb 23, 2014 at 11:49 PM, Yonik Seeley  wrote:
>> On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
>>  wrote:
>>> I should clarify though that this optimization only works with fl=id,score.
>>
>> Although it seems like it should be relatively simple to make it work
>> with other fields as well, by passing down the complete "fl" requested
>> if some optional parameter is set (distrib.singlePass?)
>>
>> -Yonik
>> http://heliosearch.org - native off-heap filters and fieldcache for solr
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.



-- 
Regards,
Shalin Shekhar Mangar.


RE: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Chandan khatua
Hi Gora !

Your concern was "What is the type of the column used to store the binary
data in Oracle?"
The column type is BLOB in the DB. The column can also contain rich text files.

Regards,
Chandan


-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Monday, February 24, 2014 3:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB
format.

On 24 February 2014 12:51, Chandan khatua  wrote:
> Hi,
>
>
>
> We have raw binary data stored in database(not word,excel,xml etc 
> files) in BLOB.
>
> We are trying to index using TikaEntityProcessor but nothing seems to 
> get indexed.
>
> But the same configuration works when xml/word/excel files are stored 
> in the BLOB field.

Please start by reviewing http://wiki.apache.org/solr/DataImportHandler as
the above seems quite confused. Why are you using TikaEntityProcessor if the
data in the DB are not richtext files?

What is the type of the column used to store the binary data in Oracle? You
might be able to convert it with a ClobTransformer. Please see
http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are
_added_to_the_Solr_document_as_object_strings_like_B.401f23c5

Regards,
Gora



Re: Wikipedia Data Cleaning at Solr

2014-02-24 Thread Furkan KAMACI
My input is this:

{| style="text-align: left; width: 50%; table-layout: fixed;" border="0" |}

Analysis is as follows:

WT (WikipediaTokenizer)
text     raw_bytes             start  end  flags  position
style    [73 74 79 6c 65]      3      8    0      1
text     [74 65 78 74]         10     14   0      2
align    [61 6c 69 67 6e]      15     20   0      3
left     [6c 65 66 74]         22     26   0      4
width    [77 69 64 74 68]      28     33   0      5
50       [35 30]               35     37   0      6
table    [74 61 62 6c 65]      40     45   0      7
layout   [6c 61 79 6f 75 74]   46     52   0      8
fixed    [66 69 78 65 64]      54     59   0      9
border   [62 6f 72 64 65 72]   62     68   0      10
0        [30]                  70     71   0      11
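
The field type under test is wired up roughly like this (a sketch, not the
exact configuration):

<fieldType name="text_wiki" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.WikipediaTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>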



2014-02-24 0:28 GMT+02:00 Furkan KAMACI :

> I've compared the results when using WikipediaTokenizer as the index-time
> analyzer, but there is no difference.
>
>
> 2014-02-23 3:44 GMT+02:00 Ahmet Arslan :
>
> Hi Furkan,
>>
>> There is org.apache.lucene.analysis.wikipedia.WikipediaTokenizer
>>
>> Ahmet
>>
>>
>> On Sunday, February 23, 2014 2:22 AM, Furkan KAMACI <
>> furkankam...@gmail.com> wrote:
>> Hi;
>>
>> I want to run an NLP algorithm on Wikipedia data. I used the dataimport
>> handler for the dump data and everything is OK. However, there are some
>> texts like:
>>
>> == Altyapı bilgileri == Köyde, [[ilköğretim]] okulu yoktur fakat taşımalı
>> eğitimden yararlanılmaktadır.
>>
>> I think that it should be like that:
>>
>> Altyapı bilgileri Köyde, ilköğretim okulu yoktur fakat taşımalı eğitimden
>> yararlanılmaktadır.
>>
>> On the other hand this should be removed:
>>
>> {| border="0" cellpadding="5" cellspacing="5" |- bgcolor="#aa"
>> |'''Seçim Yılı''' |'''Muhtar''' |- bgcolor="#dd" |[[2009]] |kazım
>> güngör |- bgcolor="#dd" | |Ömer Gungor |- bgcolor="#dd" | |Fazlı
>> Uzun |- bgcolor="#dd" | |Cemal Özden |- bgcolor="#dd" | | |}
>>
>> Also, including titles like == Altyapı bilgileri == should be optional
>> (I think they can be removed for some purposes)
>>
>> My question is that. Is there any analyzer combination to clean up
>> Wikipedia data for Solr?
>>
>> Thanks;
>> Furkan KAMACI
>>
>
>


Re: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Raymond Wiker
I've done something like this; the key was to use a FieldStreamDataSource
to read from the BLOB field.

Something like

    <dataSource name="db" type="JdbcDataSource" driver="..." url="..."/>
    <dataSource name="fieldstream" type="FieldStreamDataSource"/>

then

    <entity name="main" dataSource="db" query="...">
        <entity name="tika" processor="TikaEntityProcessor"
                dataField="main.BLOB" dataSource="fieldstream" format="xml">
            <!-- field mappings go here; column/field names are placeholders -->
            <field column="text" name="content"/>
        </entity>
    </entity>

...




On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua wrote:

> Hi Gora !
>
> Your concern was "What is the type of the column used to store the binary
> data in Oracle?"
> The column type is BLOB in DB.  The column can also have rich text file.
>
> Regards,
> Chandan
>
>
> -Original Message-
> From: Gora Mohanty [mailto:g...@mimirtech.com]
> Sent: Monday, February 24, 2014 3:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB
> format.
>
> On 24 February 2014 12:51, Chandan khatua  wrote:
> > Hi,
> >
> >
> >
> > We have raw binary data stored in database(not word,excel,xml etc
> > files) in BLOB.
> >
> > We are trying to index using TikaEntityProcessor but nothing seems to
> > get indexed.
> >
> > But the same configuration works when xml/word/excel files are stored
> > in the BLOB field.
>
> Please start by reviewing http://wiki.apache.org/solr/DataImportHandler as
> the above seems quite confused. Why are you using TikaEntityProcessor if
> the
> data in the DB are not richtext files?
>
> What is the type of the column used to store the binary data in Oracle? You
> might be able to convert it with a ClobTransformer. Please see
> http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
>
> http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are
> _added_to_the_Solr_document_as_object_strings_like_B.401f23c5
>
> Regards,
> Gora
>
>


RE: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Chandan khatua
Hi Raymond !

I have a data-config.xml like below:

<dataConfig>
    <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
                url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
    <dataSource name="dastream" type="FieldStreamDataSource"/>
    <document>
        <entity name="messages" pk="PK" transformer="DateFormatTransformer"
                query="select * from table1"
                dataSource="db">
            <entity name="message"
                    dataSource="dastream"
                    processor="TikaEntityProcessor"
                    url="message"
                    dataField="db.MESSAGE"
                    format="text">
                <!-- field mapping (stripped in the archive; names approximate) -->
                <field column="text" name="content"/>
            </entity>
        </entity>
    </document>
</dataConfig>

This looks similar to your configuration. But when XML data are in the BLOB
in the database, indexing is done; when binary data are in the BLOB,
indexing is NOT done.
Please help.

Thanking you,
-Chandan


-Original Message-
From: Raymond Wiker [mailto:rwi...@gmail.com] 
Sent: Monday, February 24, 2014 4:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB
format.

I've done something like this; the key was to use a FieldStreamDataSource to
read from the BLOB field.

Something like




then

  







...




On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
wrote:

> Hi Gora !
>
> Your concern was "What is the type of the column used to store the 
> binary data in Oracle?"
> The column type is BLOB in DB.  The column can also have rich text file.
>
> Regards,
> Chandan
>
>
> -Original Message-
> From: Gora Mohanty [mailto:g...@mimirtech.com]
> Sent: Monday, February 24, 2014 3:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB 
> format.
>
> On 24 February 2014 12:51, Chandan khatua  wrote:
> > Hi,
> >
> >
> >
> > We have raw binary data stored in database(not word,excel,xml etc
> > files) in BLOB.
> >
> > We are trying to index using TikaEntityProcessor but nothing seems 
> > to get indexed.
> >
> > But the same configuration works when xml/word/excel files are 
> > stored in the BLOB field.
>
> Please start by reviewing 
> http://wiki.apache.org/solr/DataImportHandler as the above seems quite 
> confused. Why are you using TikaEntityProcessor if the data in the DB 
> are not richtext files?
>
> What is the type of the column used to store the binary data in 
> Oracle? You might be able to convert it with a ClobTransformer. Please 
> see http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
>
> http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_tab
> le_are
> _added_to_the_Solr_document_as_object_strings_like_B.401f23c5
>
> Regards,
> Gora
>
>



Re: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Raymond Wiker
Try replacing the inner entity with something like

    <entity name="message"
            dataSource="dastream"
            processor="TikaEntityProcessor"
            dataField="messages.MESSAGE"
            format="xml">
        <field column="text" name="content"/>
    </entity>

--- this assumes that you get the blob from a column named "MESSAGE" in the
outer entity ("messages").


On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua wrote:

> Hi Raymond !
>
> I've data-config.xml like bellow:
>
> 
> 
>  url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
>  
>  
>  name="messages" pk=" PK" transformer='DateFormatTransformer'
>   query="select * from table1"
>   dataSource="db">
>  
>  
>  name="message"
> dataSource="dastream"
> processor="TikaEntityProcessor"
> url="message"
> dataField="db.MESSAGE"
> format="text"
> >
>
> 
>   
> 
>
>
>  
> 
>
>
>
> This is looks like similar to your configuration. But when xml data are in
> BLOB in database, indexing is done. But, when binary data are in BLOB in
> database, indexing is NOT done.
> Please help.
>
> Thanking you,
> -Chandan
>
>
> -Original Message-
> From: Raymond Wiker [mailto:rwi...@gmail.com]
> Sent: Monday, February 24, 2014 4:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB
> format.
>
> I've done something like this; the key was to use a FieldStreamDataSource
> to
> read from the BLOB field.
>
> Something like
>
> 
> 
>
> then
>
>dataField="main.BLOB" dataSource="fieldstream" format="xml">
> 
> 
> 
> 
> 
> 
>
> ...
>
>
>
>
> On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
> wrote:
>
> > Hi Gora !
> >
> > Your concern was "What is the type of the column used to store the
> > binary data in Oracle?"
> > The column type is BLOB in DB.  The column can also have rich text file.
> >
> > Regards,
> > Chandan
> >
> >
> > -Original Message-
> > From: Gora Mohanty [mailto:g...@mimirtech.com]
> > Sent: Monday, February 24, 2014 3:02 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in BLOB
> > format.
> >
> > On 24 February 2014 12:51, Chandan khatua 
> wrote:
> > > Hi,
> > >
> > >
> > >
> > > We have raw binary data stored in database(not word,excel,xml etc
> > > files) in BLOB.
> > >
> > > We are trying to index using TikaEntityProcessor but nothing seems
> > > to get indexed.
> > >
> > > But the same configuration works when xml/word/excel files are
> > > stored in the BLOB field.
> >
> > Please start by reviewing
> > http://wiki.apache.org/solr/DataImportHandler as the above seems quite
> > confused. Why are you using TikaEntityProcessor if the data in the DB
> > are not richtext files?
> >
> > What is the type of the column used to store the binary data in
> > Oracle? You might be able to convert it with a ClobTransformer. Please
> > see http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> >
> > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_tab
> > le_are
> > _added_to_the_Solr_document_as_object_strings_like_B.401f23c5
> >
> > Regards,
> > Gora
> >
> >
>
>


RE: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Chandan khatua
I've tried as per your guidance, but no data are being indexed.
The output of the Query screen looks like:

[response XML stripped by the mail archive; the visible remnants are a
QTime of 2158, an http://www.w3.org/1999/xhtml namespace string, and a
_version_ of 1460918369230258176]

But the indexed data should be displayed within the tag. When XML messages
are stored in the DB in the BLOB column, indexing is done smoothly.
But I am trying to index binary data stored in the DB in the BLOB column.

Need help.

Thanking you,
Chandan



-Original Message-
From: Raymond Wiker [mailto:rwi...@gmail.com] 
Sent: Monday, February 24, 2014 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB
format.

Try replacing the inner entity with something like



  

--- this assumes that you get the blob from a column named "MESSAGE" in the
outer entity ("messages").


On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua
wrote:

> Hi Raymond !
>
> I've data-config.xml like bellow:
>
>name="db" driver="oracle.jdbc.driver.OracleDriver"
> url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>  
>   
> 
>  name="messages" pk=" PK" transformer='DateFormatTransformer'
>   query="select * from table1"
>   dataSource="db">
>  
>  
>  name="message"
> dataSource="dastream"
> processor="TikaEntityProcessor"
> url="message"
> dataField="db.MESSAGE"
> format="text"
> >
>
> 
>   
> 
>
>
>  
> 
>
>
>
> This is looks like similar to your configuration. But when xml data 
> are in BLOB in database, indexing is done. But, when binary data are 
> in BLOB in database, indexing is NOT done.
> Please help.
>
> Thanking you,
> -Chandan
>
>
> -Original Message-
> From: Raymond Wiker [mailto:rwi...@gmail.com]
> Sent: Monday, February 24, 2014 4:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB 
> format.
>
> I've done something like this; the key was to use a 
> FieldStreamDataSource to read from the BLOB field.
>
> Something like
>
> 
> 
>
> then
>
>dataField="main.BLOB" dataSource="fieldstream" format="xml">
> 
> 
> 
> 
> 
> 
>
> ...
>
>
>
>
> On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
> wrote:
>
> > Hi Gora !
> >
> > Your concern was "What is the type of the column used to store the 
> > binary data in Oracle?"
> > The column type is BLOB in DB.  The column can also have rich text file.
> >
> > Regards,
> > Chandan
> >
> >
> > -Original Message-
> > From: Gora Mohanty [mailto:g...@mimirtech.com]
> > Sent: Monday, February 24, 2014 3:02 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in 
> > BLOB format.
> >
> > On 24 February 2014 12:51, Chandan khatua 
> wrote:
> > > Hi,
> > >
> > >
> > >
> > > We have raw binary data stored in database(not word,excel,xml etc
> > > files) in BLOB.
> > >
> > > We are trying to index using TikaEntityProcessor but nothing seems 
> > > to get indexed.
> > >
> > > But the same configuration works when xml/word/excel files are 
> > > stored in the BLOB field.
> >
> > Please start by reviewing
> > http://wiki.apache.org/solr/DataImportHandler as the above seems 
> > quite confused. Why are you using TikaEntityProcessor if the data in 
> > the DB are not richtext files?
> >
> > What is the type of the column used to store the binary data in 
> > Oracle? You might be able to convert it with a ClobTransformer. 
> > Please see 
> > http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> >
> > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_t
> > ab
> > le_are
> > _added_to_the_Solr_document_as_object_strings_like_B.401f23c5
> >
> > Regards,
> > Gora
> >
> >
>
>



Re: URLDataSource : Issue assigning single xpath field name to two solr fields

2014-02-24 Thread Shalin Shekhar Mangar
The XPathEntityProcessor supports only one field mapping per xpath, so
using copyField is the only way.

On Mon, Feb 24, 2014 at 2:45 PM, manju16832003  wrote:
> I'm not sure if I would be missing any configuration params here, however
> when I tried to assign an xpath field from URLDataSource (XML end point) to
> two fields defined in schema.xml.
>
> Here is my scenario,
> I have two fields
> *profile_display* and *profile_indexed*
>
> My assignment in DataImpotHandler looks like this
>
> 
> url="http://URLTOExternalSystem//ProfileService.svc/";
> processor="XPathEntityProcessor"
> forEach="/Profiles">
> 
> 
> 
>
> My Scheama.xml config looks like this
>  multiValued="false" default=""/>
>  stored="false"  multiValued="false" default=""/>
>
> *So the issue here is, the value is value is always assigned to
> profile_indexed, and profile_display does not contain any value. *
>
> Meaning, if we were to assign xpath field name to different solr fields,
> only the last field contains the data.
>
> The reason I have two fields is that, One to store it as a String to display
> to user, another field where I apply Filter and Tokenizers to do text
> transformation.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/URLDataSource-Issue-assigning-single-xpath-field-name-to-two-solr-fields-tp4119206.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Inconsistency between Leader and replica in solr cloud

2014-02-24 Thread abhijit das

We are currently using Solr Cloud Version 4.3, with the following set-up, a 
core with 2 shards - Shard1 and Shard2, each shard has replication factor 1.

We have noticed that in one of the shards, a document differs between the
leader and the replica. Though the doc exists on both machines, the
properties of the doc are not the same.

This is causing inconsistent results in subsequent queries; our
understanding is that the docs would be replicated and be identical in both
leader and replica.

What could be causing this, and how can it be avoided?




Thanks in advance.




Regards,

Abhijit






Sent from Windows Mail

Re: Fault Tolerant Technique of Solr Cloud

2014-02-24 Thread Vineet Mishra
Can you explain how to make a direct call to Zookeeper instead of the cloud
collection (currently I am querying the cloud with something like
"http://192.168.2.183:8900/solr/collection1/select?q=*:*") from the UI; now,
if I assume the shard on 8900 is down, how can I still make the call?

I have followed the Apache tutorial (with a separate zookeeper running on
port 2181):

http://wiki.apache.org/solr/SolrCloud

Can you please be more specific with respect to zookeeper distributed calls?

Regards


On Wed, Feb 19, 2014 at 9:45 PM, Per Steffensen  wrote:

> On 19/02/14 07:57, Vineet Mishra wrote:
>
>> Thanks for all your response but my doubt is which *Server:Port* should
>> the
>>
>> query be made as we don't know the crashed server or which server might
>> crash in the future(as any server can go down).
>>
> That is what CloudSolrServer will deal with for you. It knows which
> servers are down and make sure not to send request to those servers.
>
>
>> The only intention for writing this doubt is to get an idea about how the
>> query format for distributed search might work if any of the shard or
>> replica goes down.
>>
>
> // Setting up your CloudSolrServer-client
> CloudSolrServer client = new CloudSolrServer(zkHost);  // zkHost
> being the same string as you provide in -DzkHost when
> starting your servers
> client.setDefaultCollection("collection1");
> client.connect();
>
> // Creating and firing queries (you can do it in different way, but at
> least this is an option)
> SolrQuery query = new SolrQuery("*:*");
> QueryResponse results = client.query(query);
>
>
> Because you are using CloudSolrServer you do not have to worry about not
> sending the request to a crashed server.
>
> In your example I believe the situation is as follows:
> * One collection called "collection1" with two shards "shard1" and
> "shard2" each having two replica "replica1" and "replica2" (a replica is an
> "instance" of a shard, and when you have one replica you are not having
> replication).
> * collection1.shard1.replica1 is running on localhost:8983 and
> collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
> * collection1.shard2.replica1 is running on localhost:7574 and
> collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)
> If localhost:8900 is the only server that is down, all data is still
> available for search because every shard has at least one replica running.
> In that case I believe setting "shards.tolerant" will not make a
> difference. You will get your response no matter what. But if
> localhost:8983 was also down there would be no live replica of shard1. In that
> case you will get an exception from your query, indicating that the query
> cannot be carried out over the complete data-set. In that case if you set
> "shards.tolerant" that behaviour will change, and you will not get an
> exception - you will get a real response, but it will just not include data
> from shard1, because it is not available at the moment. That is just the
> way I believe "shards.tolerant" works, but you might want to verify that.
>
> To set "shards.tolerant":
>
> SolrQuery query = new SolrQuery("*:*");
> query.set("shards.tolerant", true);
> QueryResponse results = client.query(query);
>
>
> I believe distributed search is the default, but you can explicitly require it by
>
> query.setDistrib(true);
>
> or
>
> query.set("distrib", true);
>
>
>> Thanks
>>
>
>


Re: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Raymond Wiker
Try running the query for the outer entity ("messages") in an SQL client,
and verify that your blob column is called MESSAGE.


On Mon, Feb 24, 2014 at 12:22 PM, Chandan khatua wrote:

> I've tried as per your guide. But, no data are indexing.
> The output of Query screen looks like :
>
> 
> 2158
> 
>xmlns="http://www.w3.org/1999/xhtml";>
> 
> 
> 
> 
> 
> 
> 1460918369230258176
>
>
>
> But, the indexed data should be displayed within   tag. When xml
> message are stored in DB in BLOB type, then indexing is done smoothly.
> But, I am trying to index binary data which are stored in DB in BLOB type.
>
> Need help.
>
> Thanking you,
> Chandan
>
>
>
> -Original Message-
> From: Raymond Wiker [mailto:rwi...@gmail.com]
> Sent: Monday, February 24, 2014 4:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB
> format.
>
> Try replacing the inner entity with something like
>
> dataSource="dastream"
>processor="TikaEntityProcessor"
>dataField="messages.MESSAGE"
>format="xml">
> 
>   
>
> --- this assumes that you get the blob from a column named "MESSAGE" in the
> outer entity ("messages").
>
>
> On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua
> wrote:
>
> > Hi Raymond !
> >
> > I've data-config.xml like bellow:
> >
> >> name="db" driver="oracle.jdbc.driver.OracleDriver"
> > url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
> > 
> > 
> >>   name="messages" pk=" PK" transformer='DateFormatTransformer'
> >   query="select * from table1"
> >   dataSource="db">
> >  
> >  
> >  > name="message"
> > dataSource="dastream"
> > processor="TikaEntityProcessor"
> > url="message"
> > dataField="db.MESSAGE"
> > format="text"
> > >
> >
> > 
> >   
> > 
> >
> >
> >  
> > 
> >
> >
> >
> > This is looks like similar to your configuration. But when xml data
> > are in BLOB in database, indexing is done. But, when binary data are
> > in BLOB in database, indexing is NOT done.
> > Please help.
> >
> > Thanking you,
> > -Chandan
> >
> >
> > -Original Message-
> > From: Raymond Wiker [mailto:rwi...@gmail.com]
> > Sent: Monday, February 24, 2014 4:06 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in BLOB
> > format.
> >
> > I've done something like this; the key was to use a
> > FieldStreamDataSource to read from the BLOB field.
> >
> > Something like
> >
> > 
> > 
> >
> > then
> >
> >> dataField="main.BLOB" dataSource="fieldstream" format="xml">
> > 
> > 
> > 
> > 
> > 
> > 
> >
> > ...
> >
> >
> >
> >
> > On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
> > wrote:
> >
> > > Hi Gora !
> > >
> > > Your concern was "What is the type of the column used to store the
> > > binary data in Oracle?"
> > > The column type is BLOB in DB.  The column can also have rich text
> file.
> > >
> > > Regards,
> > > Chandan
> > >
> > >
> > > -Original Message-
> > > From: Gora Mohanty [mailto:g...@mimirtech.com]
> > > Sent: Monday, February 24, 2014 3:02 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Can not index raw binary data stored in Database in
> > > BLOB format.
> > >
> > > On 24 February 2014 12:51, Chandan khatua 
> > wrote:
> > > > Hi,
> > > >
> > > >
> > > >
> > > > We have raw binary data stored in database(not word,excel,xml etc
> > > > files) in BLOB.
> > > >
> > > > We are trying to index using TikaEntityProcessor but nothing seems
> > > > to get indexed.
> > > >
> > > > But the same configuration works when xml/word/excel files are
> > > > stored in the BLOB field.
> > >
> > > Please start by reviewing
> > > http://wiki.apache.org/solr/DataImportHandler as the above seems
> > > quite confused. Why are you using TikaEntityProcessor if the data in
> > > the DB are not richtext files?
> > >
> > > What is the type of the column used to store the binary data in
> > > Oracle? You might be able to convert it with a ClobTransformer.
> > > Please see
> > > http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> > >
> > > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_t
> > > ab
> > > le_are
> > > _added_to_the_Solr_document_as_object_strings_like_B.401f23c5
> > >
> > > Regards,
> > > Gora
> > >
> > >
> >
> >
>
>


Re: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Gora Mohanty
On 24 February 2014 15:34, Chandan khatua  wrote:
> Hi Gora !
>
> Your concern was "What is the type of the column used to store the binary
> data in Oracle?"
> The column type is BLOB in DB.  The column can also have rich text file.

Um, your original message said that it does *not* contain richtext data. How
do you tell whether it has richtext data, or not? For just a binary blob, the
ClobTransformer should work, but you need the TikaEntityProcessor for richtext
data. If you do not know whether the data in the blob is richtext or
not, you will
need to roll your own solution to determine that.

Regards,
Gora


Re: Need feedback: Browsing and searching solr-user list emails

2014-02-24 Thread Dmitry Kan
Hello!

Just few random points:

1. Interesting site. I'd say there are similar sites, but this one has a
cleaner interface. How does your site compare to this one, for example, in
terms of feature set?

http://qnalist.com/questions/4640870/luke-4-6-0-released

At least the user ranking seems to be different, because on your site
yours truly is marked with 5800 points and on qnalist with 59.

2. Do you handle several users, like DmitryKan, DmitryKan-1, etc., as a single
user, i.e. if I post under different e-mail addresses?

3. It seems like your site is going to be mostly read-only, except for
question / user voting?

To me any such site, including yours, will make sense as long as I could
find stuff faster than with Google.

Dmitry Kan





On Tue, Feb 11, 2014 at 7:18 AM, Durgam Vahia  wrote:

> Hi Solr-users,
>
> I wanted to get your thoughts/feedback on a potentially useful way to
> browse and search prior email conversations on the solr-users@lucene
> distribution list.
>
> http://www.signaldump.org/solr/qpod/
>
> In a nutshell, this is a Q&A engine like StackExchange (SE) auto-populated
> with solr-users@lucene email threads of the past year. The engine auto-tags
> email threads and creates user profiles of participants with points, badges,
> etc. New emails also get processed automatically and will be placed under
> the relevant conversation.
>
> Here are some of the advantages that might be useful -
>
>- Like SE, users can "crowdsource" the quality of content by voting, and
>choosing best answers.
>- You can favorite posts/threads, users, tags to personalize search.
>- Email conversations and Q&A engine work seamlessly together. One can
>use any medium and conversations are still presented in a uniform way.
>- Web UI supports mobile device aspect ratios - just click on above link
>on your mobile device to get a feel.
>
> Do you think this would be useful for the solr-users community? To get a
> feel, try searching the archive before posting to the email list, to see if
> the UI makes finding things a little easier. As more people search/view/vote,
> search should become more relevant and personalized.
>
> I would be happy to maintain this for the benefit of the community.
> Currently I have only seeded the past year of email, but we could
> potentially go further back if people find this useful.
>
> Thanks and feedback welcome.
>
> And before someone asks - yes, our search engine is Solr ..
>
> Durgam.
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Reg queryResultCache...

2014-02-24 Thread Senthilnathan Vijayaraja
Below is the URL which hits the middle layer; the middle layer then forms
the Solr query and fires it.

*listing?offset=0&sortparam=0&limit=20&q=Chennai~Tambaram~1~2,3~45~2500~800~2000~~24*


Chennai-->city
Tambaram-->locality
1-->blah
2,3-->blah
45~2500-->price_min and max
800~2000-->area min and max
*24--lux_ amenities*

Here, other than lux_amenities, I am using fq for everything else, so the
problem here is sorting.

I am sorting the results using bscore and relscore like below:

*$bscore desc,$relscore desc*

first time it works fine.

The above bscore and relscore change based on lux_amenities, but
lux_amenities is part of neither fq nor q. So if, the second time, we
change lux_amenities alone and fire the query, it gives the results in the
same order as the first query even though the bscore and relscore are
different.

So I disabled the queryResultCache.

Now it is working fine, but I need a better solution than disabling the
cache for all queries. For example, I want to disable it for a few queries
alone, not for all.


Could someone help me please..


Thanks & Regards,
Senthilnathan V


Re: Fault Tolerant Technique of Solr Cloud

2014-02-24 Thread Shalin Shekhar Mangar
Vineet, I'm assuming that you are executing your search from a Java
Client. If so, just use CloudSolrServer present in the Solrj API and
save yourself from all these troubles. If you are not using a Java
client, then you need to put a few or all your servers behind a load
balancer and invoke requests against that.

On Mon, Feb 24, 2014 at 5:34 PM, Vineet Mishra  wrote:
> Can you brief as how to make a direct call to Zookeeper instead of Cloud
> Collection(as currently I was querying the Cloud something like
> *"http://192.168.2.183:8900/solr/collection1/select?q=*:*
> "* ) from UI, now
> if I assume shard 8900 is down then how can I still make the call.
>
> I have followed the Apache Tutorial(with separate zookeeper running on port
> 2181)
>
> http://wiki.apache.org/solr/SolrCloud
>
> Can you please be more specific in respect to zookeeper distributed calls.
>
> Regards
>
>
> On Wed, Feb 19, 2014 at 9:45 PM, Per Steffensen  wrote:
>
>> On 19/02/14 07:57, Vineet Mishra wrote:
>>
>>> Thanks for all your response but my doubt is which *Server:Port* should
>>> the
>>>
>>> query be made as we don't know the crashed server or which server might
>>> crash in the future(as any server can go down).
>>>
>> That is what CloudSolrServer will deal with for you. It knows which
>> servers are down and make sure not to send request to those servers.
>>
>>
>>> The only intention for writing this doubt is to get an idea about how the
>>> query format for distributed search might work if any of the shard or
>>> replica goes down.
>>>
>>
>> // Setting up your CloudSolrServer-client
>> CloudSolrServer client = new CloudSolrServer(zkHost);  // zkHost
>> being the same string as you provide in -DzkHost when
>> starting your servers
>> client.setDefaultCollection("collection1");
>> client.connect();
>>
>> // Creating and firing queries (you can do it in different way, but at
>> least this is an option)
>> SolrQuery query = new SolrQuery("*:*");
>> QueryResponse results = client.query(query);
>>
>>
>> Because you are using CloudSolrServer you do not have to worry about not
>> sending the request to a crashed server.
>>
>> In your example I believe the situation is as follows:
>> * One collection called "collection1" with two shards "shard1" and
>> "shard2" each having two replica "replica1" and "replica2" (a replica is an
>> "instance" of a shard, and when you have one replica you are not having
>> replication).
>> * collection1.shard1.replica1 is running on localhost:8983 and
>> collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
>> * collection1.shard2.replica1 is running on localhost:7574 and
>> collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)
>> If localhost:8900 is the only server that is down, all data is still
>> available for search because every shard has at least one replica running.
>> In that case I believe setting "shards.tolerant" will not make a
>> difference. You will get your response no matter what. But if
>> localhost:8983 was also down there would be no live replica of shard1. In that
>> case you will get an exception from your query, indicating that the query
>> cannot be carried out over the complete data-set. In that case if you set
>> "shards.tolerant" that behaviour will change, and you will not get an
>> exception - you will get a real response, but it will just not include data
>> from shard1, because it is not available at the moment. That is just the
>> way I believe "shards.tolerant" works, but you might want to verify that.
>>
>> To set "shards.tolerant":
>>
>> SolrQuery query = new SolrQuery("*:*");
>> query.set("shards.tolerant", true);
>> QueryResponse results = client.query(query);
>>
>>
>> I believe distributed search is the default, but you can explicitly require it by
>>
>> query.setDistrib(true);
>>
>> or
>>
>> query.set("distrib", true);
>>
>>
>>> Thanks
>>>
>>
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Reg queryResultCache...

2014-02-24 Thread Shalin Shekhar Mangar
Please provide the *Solr* queries that are being invoked by your
middle layer along with the results you expect and the results you
actually got from Solr with cache-enabled.

On Mon, Feb 24, 2014 at 6:23 PM, Senthilnathan Vijayaraja
 wrote:
> Below is the url which will hit the middle layer then middle layer will
> form the solr query and fire it.
>
> *listing?offset=0&sortparam=0&limit=20&q=Chennai~Tambaram~1~2,3~45~2500~800~2000~~24*
>
>
> Chennai-->city
> Tambaram-->locality
> 1-->blah
> 2,3-->blah
> 45~2500-->price_min and max
> 800~2000-->area min and max
> *24--lux_ amenities*
>
> here other than lux_amenities I am using fq for all other things,so the
> problem here is sorting.
>
> I am sorting the results using bscore and relscore likebelow,
>
> *$bscore desc,$relscore desc*
>
> first time it works fine.
>
> above bscore and relscore will change based on lux_amenities but
> lux_amenities is neither part of fq  nor q.So if second time we are
> changing the lux_amenities alone and firing the query means it is giving
> the result in same order as first query even the bscore and relscore are
> different.
>
> So I disabled the queryResultCache,
>
> 
>
> Now it is working fine. But I need a better solution than disabling this
> for all queries.For eg, I want to disable this for few queries alone not
> for all.
>
>
> Could someone help me please..
>
>
> Thanks & Regards,
> Senthilnathan V



-- 
Regards,
Shalin Shekhar Mangar.


Re: Inconsistency between Leader and replica in solr cloud

2014-02-24 Thread Yago Riveiro
This bug was fixed in Solr 4.6.1.

/Yago Riveiro

On Mon, Feb 24, 2014 at 11:56 AM, abhijit das 
wrote:

> We are currently using Solr Cloud Version 4.3, with the following set-up, a 
> core with 2 shards - Shard1 and Shard2, each shard has replication factor 1.
> We have noticed that in one of the shards, the document differs between the 
> leader and the replica. Though the doc exists in both the machines, the 
> properties of the doc are not same.
> This is causing inconsistent result in subsequent queries, our understanding 
> is that the docs would be replicated and be identical in both leader and 
> replica.
> What could be causing this and how can this be avoided.
> Thanks in advance.
> Regards,
> Abhijit
> Sent from Windows Mail

Re: Fault Tolerant Technique of Solr Cloud

2014-02-24 Thread Per Steffensen

On 24/02/14 13:04, Vineet Mishra wrote:

> Can you explain how to make a direct call to Zookeeper instead of the cloud
> collection (currently I am querying the cloud with something like
> "http://192.168.2.183:8900/solr/collection1/select?q=*:*") from the UI; now,
> if I assume the shard on 8900 is down, how can I still make the call?

It is obvious that you cannot make the call to localhost:8900 - the 
server listening to that port is down. You can make the call to any of 
the other servers, though. Information about which Solr-servers are 
running is available in ZooKeeper, CloudSolrServer reads that 
information in order to know which servers to route requests to. As long 
as localhost:8900 is down it will not route requests to that server.


You do not make a "direct call to ZooKeeper". ZooKeeper is not an HTTP 
server that will receive your calls. It just has information about which 
Solr-servers are up and running. CloudSolrServers takes advantage of 
that information. You really cannot do without CloudSolrServer (or at 
least LBHttpSolrServer), unless you write a component that can do the 
same thing in some other language (if the reason you do not want to use
CloudSolrServer is that your client is not Java). Else you need to do
other clever stuff, like e.g. what Shalin suggests.


Regards, Per Steffensen


RE: Inconsistency between Leader and replica in solr cloud

2014-02-24 Thread Markus Jelsma
Yes, that issue is fixed. We are on trunk and seeing it happen again. Kill some 
nodes when indexing, trigger OOM or reload the collection and you are in 
trouble again.
 
-Original message-
> From:Yago Riveiro 
> Sent: Monday 24th February 2014 14:54
> To: solr-user@lucene.apache.org
> Subject: Re: Inconsistency between Leader and replica in solr cloud
> 
> This bug was fixed on Solr 4.6.1—
> /Yago Riveiro
> 
> On Mon, Feb 24, 2014 at 11:56 AM, abhijit das 
> wrote:
> 
> > We are currently using Solr Cloud Version 4.3, with the following set-up, a 
> > core with 2 shards - Shard1 and Shard2, each shard has replication factor 1.
> > We have noticed that in one of the shards, the document differs between the 
> > leader and the replica. Though the doc exists in both the machines, the 
> > properties of the doc are not same.
> > This is causing inconsistent result in subsequent queries, our 
> > understanding is that the docs would be replicated and be identical in both 
> > leader and replica.
> > What could be causing this and how can this be avoided.
> > Thanks in advance.
> > Regards,
> > Abhijit
> > Sent from Windows Mail


Re: start.jar config

2014-02-24 Thread Erick Erickson
Probably when it was originally started, whoever did it piped the output to
/dev/null.

You can also change this permanently by altering the logging, see:
https://wiki.apache.org/solr/SolrLogging
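
If you just want stdout captured in the meantime, a plain shell redirect
such as "java -jar start.jar > logs/console.log 2>&1" (the path is only an
example) works as well.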

Best,
Erick


On Mon, Feb 24, 2014 at 12:56 AM, manju16832003 wrote:

> Solr already writes the logs to a file 'solr.log'. Its located in the same
> folder as start.jar (logs/solr.log).
>
> I'm not sure if thats what you looking for :-).
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/start-jar-config-tp4119201p4119203.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Slow query time on stemmed fields

2014-02-24 Thread Jens Meiners
Hi,

we've built an index (Solr 4.3), which contains approx. 1 Million docs and
its size is around 20 GB (optimized).

In our index we have one field which contains the tokenized words of
indexed documents and a second field with the stemmed contents
(SnowballFilter, German2).
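
The stemmed field type is wired up roughly like this (a sketch, not our
exact configuration):

<fieldType name="text_de_stemmed" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
    </analyzer>
</fieldType>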

During our tests we found that some keywords just take too long to
process. When we excluded the stemmed field from our edismax configuration
(qf), the query time was surprisingly quick (10,000x faster).

Has anyone had the same experience?

We are using the stemmed field only to increase the number of returned
documents, not for highlighting. We know that applying highlighting to
stemmed values is not good for query speed.

Best Regards,
Jens Meiners


Re: Slow query time on stemmed fields

2014-02-24 Thread Erick Erickson
This is really strange. You should have _fewer_ tokens in your stemmed
field.
Plus, the up-front processing to stem the field in the query shouldn't be
noticeable.

Let's see the query and results from &debug=all being added to the URL
because something is completely strange here.

Best,
Erick


On Mon, Feb 24, 2014 at 7:18 AM, Jens Meiners wrote:

> Hi,
>
> we've built an index (Solr 4.3), which contains approx. 1 Million docs and
> its size is around 20 GB (optimized).
>
> In our index we have one field which contains the tokenized words of
> indexed documents and a second field with the stemmed contents
> (SnowballFilter, German2).
>
> During our tests we've found out that some keywords are just taking too
> long to process. When we exclude the stemmed field from our edismax
> configuration (qf) the query time was surprisingly quick (10 000x faster).
>
> Had one of you the same experience ?
>
> We are using the stemmed field only to increase the returned documents and
> not for highlighting. We know that by applying highlighting on stemmed
> values is not good for query speed.
>
> Best Regards,
> Jens Meiners
>


Re: Slow query time on stemmed fields

2014-02-24 Thread Jack Krupansky
Maybe some heap/GC issue from using more of this 20 GB index. Maybe it was 
running at the edge and just one more field was too much for the heap.


The "timing" section of the debug query response should shed a little light.

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Monday, February 24, 2014 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Slow query time on stemmed fields

This is really strange. You should have _fewer_ tokens in your stemmed
field.
Plus, the up-front processing to stem the field in the query shouldn't be
noticeable.

Let's see the query and results from &debug=all being added to the URL
because something is completely strange here.

Best,
Erick


On Mon, Feb 24, 2014 at 7:18 AM, Jens Meiners wrote:


Hi,

we've built an index (Solr 4.3), which contains approx. 1 Million docs and
its size is around 20 GB (optimized).

In our index we have one field which contains the tokenized words of
indexed documents and a second field with the stemmed contents
(SnowballFilter, German2).

During our tests we've found out that some keywords are just taking too
long to process. When we exclude the stemmed field from our edismax
configuration (qf) the query time was surprisingly quick (10 000x faster).

Had one of you the same experience ?

We are using the stemmed field only to increase the returned documents and
not for highlighting. We know that by applying highlighting on stemmed
values is not good for query speed.

Best Regards,
Jens Meiners





Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-24 Thread Gregg Donovan
Thank you Shalin and Yonik! Both SOLR-1880 and SOLR-5768 will be
very helpful for our distributed search performance.



On Mon, Feb 24, 2014 at 5:02 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> I opened SOLR-5768
>
> https://issues.apache.org/jira/browse/SOLR-5768
>
> On Mon, Feb 24, 2014 at 12:56 AM, Shalin Shekhar Mangar
>  wrote:
> > Yes that should be simple. But regardless of the parameter, the
> > fl=id,score use-case should be optimized by default. I think I'll
> > commit the patch as-is and open a new issue to add the
> > distrib.singlePass parameter.
> >
> > On Sun, Feb 23, 2014 at 11:49 PM, Yonik Seeley 
> wrote:
> >> On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
> >>  wrote:
> >>> I should clarify though that this optimization only works with
> fl=id,score.
> >>
> >> Although it seems like it should be relatively simple to make it work
> >> with other fields as well, by passing down the complete "fl" requested
> >> if some optional parameter is set (distrib.singlePass?)
> >>
> >> -Yonik
> >> http://heliosearch.org - native off-heap filters and fieldcache for
> solr
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Converting solrdocument response into pojo

2014-02-24 Thread Alexandre Rafalovitch
On Mon, Feb 24, 2014 at 8:03 PM, Navaa
 wrote:
> <copyField source="doctor_id" dest="id"/>

So, you are probably supplying an id and then also merging the doctor_id
and id fields together, which gives you two values in the id field. I
would have expected Solr to complain about it, but either way you
have a design issue here.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


RE: Solr4 performance

2014-02-24 Thread Joshi, Shital
Thanks. 

We found some evidence that this could be the issue. We're monitoring closely 
to confirm this. 

One question though: none of our nodes show more than 50% of physical memory
used, so there is enough memory available for memory-mapped files. Can this
kind of pause still happen?


-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Friday, February 21, 2014 5:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

It could be that your query is churning the page cache on that node
sometimes, so Solr pauses so the OS can drag those pages off of disk. Have
you tried profiling your iowait in top or iostat during these pauses?
(assuming you're using linux).
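
For example, watching something like "iostat -x 2" (from the sysstat
package) during one of the pauses would show whether the disks are
saturated.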

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital  wrote:

> Thanks for your answer.
>
> We confirmed that it is not GC issue.
>
> The auto-warming query looks good too, and queries before and after the
> long-running query come back really quickly. The only thing that stands
> out is that the shard on which the query takes a long time has a couple
> million more documents than the other shards.
>
> -Original Message-
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Thursday, February 20, 2014 5:26 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr4 performance
>
> Hi,
>
> As for your first question, setting openSearcher to true means you will see
> the new docs after every hard commit. Soft and hard commits only become
> isolated from one another with that set to false.
>
> Your second problem might be explained by your large heap and garbage
> collection. Walking a heap that large can take an appreciable amount of
> time. You might consider turning on the JVM options for logging GC and
> seeing if you can correlate your slow responses to times when your JVM is
> garbage collecting.
>
> Hope that helps,
> On Feb 20, 2014 4:52 PM, "Joshi, Shital"  wrote:
>
> > Hi!
> >
> > I have few other questions regarding Solr4 performance issue we're
> facing.
> >
> > We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
> > commit=false in update URL. We have only hard commit setting in Solr4
> > config.
> >
> > <autoCommit>
> >    <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
> >    <maxDocs>10</maxDocs>
> >    <openSearcher>true</openSearcher>
> >  </autoCommit>
> >
> >
> > Since we're not using Soft commit at all (commit=false), the caches will
> > not get reloaded for every commit and recently added documents will not
> be
> > visible, correct?
> >
> > What we see is that queries which usually take a few milliseconds take
> > ~40 seconds once in a while. Can high IO during a hard commit cause
> > queries to slow down?
> >
> > For some shards we see 98% full physical memory. We have a 60GB machine
> > (30 GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high
> > physical memory usage would cause queries to slow down. We're in the
> > process of reducing the JVM size anyway.
> >
> > We have never run optimization until now. QA optimization didn't yield
> > a performance gain.
> >
> > Thanks much for all help.
> >
> > -Original Message-
> > From: Shawn Heisey [mailto:s...@elyograg.org]
> > Sent: Tuesday, February 18, 2014 4:55 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr4 performance
> >
> > On 2/18/2014 2:14 PM, Joshi, Shital wrote:
> > > Thanks much for all suggestions. We're looking into reducing allocated
> > heap size of Solr4 JVM.
> > >
> > > We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
> > internally? Can someone please confirm?
> >
> > In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
> > default delegate.  That's probably also the case with Lucene -- these
> > are Lucene classes, after all.
> >
> > MMapDirectory is almost always the most efficient way to handle on-disk
> > indexes.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-24 Thread Jeff Wartes

I'll second that thank-you, this is awesome.

I asked about this issue in 2010, but when I didn't hear anything (and
disappointingly didn't find SOLR-1880), we ended up rolling our own
version of this functionality. I've been laboriously migrating it every
time we bump our Solr version ever since. The performance difference is
quite noticeable.

One thing is that our version interferes pretty badly with various other
Components. It's been a while, but my recollection is that other
Components like Debug assumed some stuff happened in STAGE_GET_FIELDS.

I think I'll try to apply SOLR-1880 to 4.6.1 and see what happens.
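(For reference, my understanding is that the optimized case is a plain
distributed request that asks for nothing beyond the key and score,
something like

    /select?q=foo&fl=id,score

so the second, per-document retrieval phase has nothing left to fetch.)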




On 2/24/14, 11:07 AM, "Gregg Donovan"  wrote:

>Thank you Shalin and Yonik! Both SOLR-1880 and SOLR-5768 will be very
>helpful for our distributed search performance.
>
>
>
>On Mon, Feb 24, 2014 at 5:02 AM, Shalin Shekhar Mangar <
>shalinman...@gmail.com> wrote:
>
>> I opened SOLR-5768
>>
>> https://issues.apache.org/jira/browse/SOLR-5768
>>
>> On Mon, Feb 24, 2014 at 12:56 AM, Shalin Shekhar Mangar
>>  wrote:
>> > Yes that should be simple. But regardless of the parameter, the
>> > fl=id,score use-case should be optimized by default. I think I'll
>> > commit the patch as-is and open a new issue to add the
>> > distrib.singlePass parameter.
>> >
>> > On Sun, Feb 23, 2014 at 11:49 PM, Yonik Seeley 
>> wrote:
>> >> On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar
>> >>  wrote:
>> >>> I should clarify though that this optimization only works with
>> fl=id,score.
>> >>
>> >> Although it seems like it should be relatively simple to make it work
>> >> with other fields as well, by passing down the complete "fl"
>>requested
>> >> if some optional parameter is set (distrib.singlePass?)
>> >>
>> >> -Yonik
>> >> http://heliosearch.org - native off-heap filters and fieldcache for
>> solr
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>



SolrCloud Startup

2014-02-24 Thread KNitin
Hi

 I have a 4-node SolrCloud cluster with more than 50 collections of 4
shards each. Every time I want to make a schema change, I upload configs to
ZooKeeper and then restart all nodes. However, the restart of every node is
very slow and takes about 20-30 minutes per node.

Is it recommended to set loadOnStartup=false and allow SolrCloud to lazy
load? Is there a way to make schema changes without restarting SolrCloud?


Thanks


Re: SolrCloud Startup

2014-02-24 Thread Jeff Wartes

There is a RELOAD collection command you might try:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2


I think you'll find this a lot faster than restarting your whole JVM.
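For example (host and collection name are placeholders):

    curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1'

That reloads every core of the collection in place, picking up the
configs you uploaded to ZooKeeper, without a JVM restart.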


On 2/24/14, 4:12 PM, "KNitin"  wrote:

>Hi
>
> I have a 4 node solrcloud cluster with more than 50 collections with 4
>shards each. Everytime I want to make a schema change, I upload configs to
>zookeeper and then restart all nodes. However the restart of every node is
>very slow and takes about 20-30 minutes per node.
>
>Is it recommended to make loadOnStartup=false and allow solrcloud to lazy
>load? Is there a way to make schema changes without restarting solrcloud?
>
>
>Thanks



Re: Solr4 performance

2014-02-24 Thread Michael Della Bitta
I'm not sure how you're measuring free RAM. Maybe this will help:

http://www.linuxatemyram.com/play.html
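The short version: tools like free count the page cache as "used"
memory, so the line with buffers/cache subtracted is the one that tells
you how much room the OS really has for memory-mapped index files.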

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Feb 24, 2014 at 5:35 PM, Joshi, Shital  wrote:

> Thanks.
>
> We found some evidence that this could be the issue. We're monitoring
> closely to confirm this.
>
> One question though: none of our nodes show more that 50% of physical
> memory used. So there is enough memory available for memory mapped files.
> Can this kind of pause still happen?
>
>
> -Original Message-
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Friday, February 21, 2014 5:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr4 performance
>
> It could be that your query is churning the page cache on that node
> sometimes, so Solr pauses so the OS can drag those pages off of disk. Have
> you tried profiling your iowait in top or iostat during these pauses?
> (assuming you're using linux).
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062
>
> appinions inc.
>
> "The Science of Influence Marketing"
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions<
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> >
> w: appinions.com 
>
>
> On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital 
> wrote:
>
> > Thanks for your answer.
> >
> > We confirmed that it is not GC issue.
> >
> > The auto warming query looks good too and queries before and after the
> > long running query comes back really quick. The only thing stands out is
> > shard on which query takes long time has couple million more documents
> than
> > other shards.
> >
> > -Original Message-
> > From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> > Sent: Thursday, February 20, 2014 5:26 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Solr4 performance
> >
> > Hi,
> >
> > As for your first question, setting openSearcher to true means you will
> see
> > the new docs after every hard commit. Soft and hard commits only become
> > isolated from one another with that set to false.
> >
> > Your second problem might be explained by your large heap and garbage
> > collection. Walking a heap that large can take an appreciable amount of
> > time. You might consider turning on the JVM options for logging GC and
> > seeing if you can correlate your slow responses to times when your JVM is
> > garbage collecting.
> >
> > Hope that helps,
> > On Feb 20, 2014 4:52 PM, "Joshi, Shital"  wrote:
> >
> > > Hi!
> > >
> > > I have few other questions regarding Solr4 performance issue we're
> > facing.
> > >
> > > We're committing data to Solr4 every ~30 seconds (up to 20K rows). We
> use
> > > commit=false in update URL. We have only hard commit setting in Solr4
> > > config.
> > >
> > > <autoCommit>
> > >    <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
> > >    <maxDocs>10</maxDocs>
> > >    <openSearcher>true</openSearcher>
> > >  </autoCommit>
> > >
> > >
> > > Since we're not using Soft commit at all (commit=false), the caches
> will
> > > not get reloaded for every commit and recently added documents will not
> > be
> > > visible, correct?
> > >
> > > What we see is queries which usually take few milli seconds, takes ~40
> > > seconds once in a while. Can high IO during hard commit cause queries
> to
> > > slow down?
> > >
> > > For some shards we see 98% full physical memory. We have 60GB machine
> (30
> > > GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high
> > > physical memory would cause queries to slow down. We're in process of
> > > reducing JVM size anyways.
> > >
> > > We have never run optimization till now. QA optimization didn't yield
> in
> > > performance gain.
> > >
> > > Thanks much for all help.
> > >
> > > -Original Message-
> > > From: Shawn Heisey [mailto:s...@elyograg.org]
> > > Sent: Tuesday, February 18, 2014 4:55 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Solr4 performance
> > >
> > > On 2/18/2014 2:14 PM, Joshi, Shital wrote:
> > > > Thanks much for all suggestions. We're looking into reducing
> allocated
> > > heap size of Solr4 JVM.
> > > >
> > > > We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
> > > internally? Can someone please confirm?
> > >
> > > In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
> > > default delegate.  That's probably also the case with Lucene -- these
> > > are Lucene classes, after all.
> > >
> > > MMapDirectory is almost always the most efficient way to handle on-disk
> > > indexes.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Fetching uniqueKey and other int quickly from documentCache?

2014-02-24 Thread Gregg Donovan
We fetch a large number of documents -- 1000+ -- for each search. Each
request fetches only the uniqueKey or the uniqueKey plus one secondary
integer key. Despite this, we find that we spend a sizable amount of time
in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent
fetching the two stored fields, LZ4 decoding, etc.

I would love to be able to tell Solr to always fetch these two fields from
memory. We have them both in the fieldCache so we're already spending the
RAM. I've seen this asked previously [1], so it seems like a fairly common
need, especially for distributed search. Any ideas?

A few possible ideas I had:

--Check FieldCache#getCacheEntries() before going to stored fields.
--Give the documentCache config a list of fields it should load from the
fieldCache


Having an in-memory mapping from docId->uniqueKey has come up for us
before. We've used a custom SolrCache maintaining that mapping to quickly
filter over personalized collections. Maybe the uniqueKey should be more
optimized out of the box? Perhaps a custom "uniqueKey" codec that also
maintained the docId->uniqueKey mapping in memory?
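For illustration, a rough sketch of the first idea against the Lucene
4.x FieldCache API (searcher is the SolrIndexSearcher, docId an internal
document id, and the uniqueKey is assumed to be a single-valued indexed
"id" field):

    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.index.BinaryDocValues;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.util.BytesRef;

    // Top-level (composite) view of the index, so internal docIds
    // from the search can be used directly.
    AtomicReader reader = searcher.getAtomicReader();

    // Pull the key from the field cache instead of going through
    // SolrIndexSearcher#doc(): no stored-field read, no LZ4 decode.
    BinaryDocValues ids = FieldCache.DEFAULT.getTerms(reader, "id", false);
    BytesRef key = new BytesRef();
    ids.get(docId, key); // key now holds the uniqueKey bytes for docId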

--Gregg

[1] http://search-lucene.com/m/oCUKJ1heHUU1


Re: SolrCloud Startup

2014-02-24 Thread Shawn Heisey
> Hi
>
>  I have a 4 node solrcloud cluster with more than 50 collections with 4
> shards each. Everytime I want to make a schema change, I upload configs to
> zookeeper and then restart all nodes. However the restart of every node is
> very slow and takes about 20-30 minutes per node.
>
> Is it recommended to make loadOnStartup=false and allow solrcloud to lazy
> load? Is there a way to make schema changes without restarting solrcloud?

I'm on my phone so getting a URL for you is hard. Search the wiki for
SolrPerformanceProblems. There's a section there on slow startup.

If that's not it, it's probably not enough RAM for the OS disk cache. That
is also discussed on that wiki page.

Thanks,
Shawn





Re: SolrCloud Startup

2014-02-24 Thread Otis Gospodnetic
Hi,

Slow startup: could it be that your transaction logs are being replayed?
Are they very big? Do you see lots of disk reading during those 20-30
minutes?
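(The transaction log lives next to the index, e.g. in
<core>/data/tlog -- multi-gigabyte files there would be a strong hint.)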

Shawn was referring to http://wiki.apache.org/solr/SolrPerformanceProblems

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Feb 24, 2014 at 10:41 PM, Shawn Heisey  wrote:

> > Hi
> >
> >  I have a 4 node solrcloud cluster with more than 50 collections with 4
> > shards each. Everytime I want to make a schema change, I upload configs
> to
> > zookeeper and then restart all nodes. However the restart of every node
> is
> > very slow and takes about 20-30 minutes per node.
> >
> > Is it recommended to make loadOnStartup=false and allow solrcloud to lazy
> > load? Is there a way to make schema changes without restarting solrcloud?
>
> I'm on my phone so getting a Url for you is hard. Search the wiki for
> SolrPerformanceProblems. There's a section there on slow startup.
>
> If that's not it, it's probably not enough RAM for the OS disk cache. That
> is also discussed on that wiki page.
>
> Thanks,
> Shawn
>
>
>
>


Re: SolrCloud Startup

2014-02-24 Thread Erick Erickson
What is your firstSearcher set to in solrconfig.xml? If you're
doing something really crazy there that might be an issue.
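For reference, a typical firstSearcher section in solrconfig.xml looks
something like this (the query is a placeholder; anything expensive here
runs on every core load):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">static warming query</str></lst>
      </arr>
    </listener>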

But I think Otis' suggestion is a lot more probable. What
are your autocommits configured to?

Best,
Erick


On Mon, Feb 24, 2014 at 7:41 PM, Shawn Heisey  wrote:

> > Hi
> >
> >  I have a 4 node solrcloud cluster with more than 50 collections with 4
> > shards each. Everytime I want to make a schema change, I upload configs
> to
> > zookeeper and then restart all nodes. However the restart of every node
> is
> > very slow and takes about 20-30 minutes per node.
> >
> > Is it recommended to make loadOnStartup=false and allow solrcloud to lazy
> > load? Is there a way to make schema changes without restarting solrcloud?
>
> I'm on my phone so getting a Url for you is hard. Search the wiki for
> SolrPerformanceProblems. There's a section there on slow startup.
>
> If that's not it, it's probably not enough RAM for the OS disk cache. That
> is also discussed on that wiki page.
>
> Thanks,
> Shawn
>
>
>
>


Can not index document in solar

2014-02-24 Thread rachun
Dear all,

Could you guys please help me?

I'm trying to index a document into Solr. It doesn't give me any error,
but it doesn't index the document either. It used to work, but not now.
Please see:

#Solr Log


WARN  - 2014-02-25 11:30:35.675; org.apache.solr.handler.loader.XMLLoader;
Unknown attribute id in add:allowDups
INFO  - 2014-02-25 11:30:35.677;
org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
webapp=/solr path=/update/ params={indent=on&wt=xml&version=2.2}
{add=[WT334455 (1460983704505548800)]} 0 3
INFO  - 2014-02-25 11:30:49.560;
org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2014-02-25 11:30:49.566; org.apache.solr.core.SolrDeletionPolicy;
SolrDeletionPolicy.onCommit: commits: num=2

commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Downloads/solr-4.6.0/example/solr/collection1/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@2a44b7f7;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_g17,generation=20779}

commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Downloads/solr-4.6.0/example/solr/collection1/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@2a44b7f7;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_g18,generation=20780}
INFO  - 2014-02-25 11:30:49.567; org.apache.solr.core.SolrDeletionPolicy;
newest commit generation = 20780
INFO  - 2014-02-25 11:30:49.568; org.apache.solr.search.SolrIndexSearcher;
Opening Searcher@21ff9c03 main
INFO  - 2014-02-25 11:30:49.569; org.apache.solr.core.QuerySenderListener;
QuerySenderListener sending requests to Searcher@21ff9c03
main{StandardDirectoryReader(segments_g18:63044:nrt
_hop(4.6):C3444/1:delGen=1 _hp3(4.6):C1 _hp4(4.6):C1)}
INFO  - 2014-02-25 11:30:49.569; org.apache.solr.core.QuerySenderListener;
QuerySenderListener done.
INFO  - 2014-02-25 11:30:49.569;
org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2014-02-25 11:30:49.569; org.apache.solr.core.SolrCore;
[collection1] Registered new searcher Searcher@21ff9c03
main{StandardDirectoryReader(segments_g18:63044:nrt
_hop(4.6):C3444/1:delGen=1 _hp3(4.6):C1 _hp4(4.6):C1)}
INFO  - 2014-02-25 11:30:49.570;
org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
webapp=/solr path=/update params={commit=true} {commit=} 0 10


PHP Code

$options = array
(
'hostname'  => \Config::get('database.connections.solr.host'),
'port'  => \Config::get('database.connections.solr.port'),
'path'  => '/solr'
);

$client = new SolrClient($options);

$doc = new SolrInputDocument();

$doc->addField('product_id', 'WT334455');
$doc->addField('product_name_en', 'Software');

$updateResponse = $client->addDocument($doc);

print_r($updateResponse->getResponse());
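(As far as I know, with the PHP Solr extension an add is not visible
until a commit, so calling $client->commit(); after addDocument should
make it searchable.)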

I also do manual commit by this >> 
http://localhost:8983/solr/update?commit=true

Any idea please..
Thank you very much,
Chun





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-not-index-document-in-solar-tp4119461.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can not index document in solar

2014-02-24 Thread Erick Erickson
Well, what was the last thing you changed?

There's really not much here to go on; you
have to provide more details about what
you've tried, what evidence you have that
the doc isn't indexed, etc.

Have you looked at your Solr admin screen to
see if maxDoc has increased? Have you
committed your changes before looking?
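(One quick check that needs no UI, assuming the stock core name:

    http://localhost:8983/solr/collection1/admin/luke?numTerms=0

returns numDocs and maxDoc directly.)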

You might review:

http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick


On Mon, Feb 24, 2014 at 8:37 PM, rachun  wrote:

> Dear all,
>
> Could you guys please help me?
>
> I just try to index document into solar it doesn't give me any error but it
> doesn't index document too but it used to work but not now please see..
>
> #Solr Log
>
>
> WARN  - 2014-02-25 11:30:35.675; org.apache.solr.handler.loader.XMLLoader;
> Unknown attribute id in add:allowDups
> INFO  - 2014-02-25 11:30:35.677;
> org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
> webapp=/solr path=/update/ params={indent=on&wt=xml&version=2.2}
> {add=[WT334455 (1460983704505548800)]} 0 3
> INFO  - 2014-02-25 11:30:49.560;
> org.apache.solr.update.DirectUpdateHandler2; start
>
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> INFO  - 2014-02-25 11:30:49.566; org.apache.solr.core.SolrDeletionPolicy;
> SolrDeletionPolicy.onCommit: commits: num=2
>
> commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@
> /Downloads/solr-4.6.0/example/solr/collection1/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@2a44b7f7;
> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_g17,generation=20779}
>
> commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@
> /Downloads/solr-4.6.0/example/solr/collection1/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@2a44b7f7;
> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_g18,generation=20780}
> INFO  - 2014-02-25 11:30:49.567; org.apache.solr.core.SolrDeletionPolicy;
> newest commit generation = 20780
> INFO  - 2014-02-25 11:30:49.568; org.apache.solr.search.SolrIndexSearcher;
> Opening Searcher@21ff9c03 main
> INFO  - 2014-02-25 11:30:49.569; org.apache.solr.core.QuerySenderListener;
> QuerySenderListener sending requests to Searcher@21ff9c03
> main{StandardDirectoryReader(segments_g18:63044:nrt
> _hop(4.6):C3444/1:delGen=1 _hp3(4.6):C1 _hp4(4.6):C1)}
> INFO  - 2014-02-25 11:30:49.569; org.apache.solr.core.QuerySenderListener;
> QuerySenderListener done.
> INFO  - 2014-02-25 11:30:49.569;
> org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
> INFO  - 2014-02-25 11:30:49.569; org.apache.solr.core.SolrCore;
> [collection1] Registered new searcher Searcher@21ff9c03
> main{StandardDirectoryReader(segments_g18:63044:nrt
> _hop(4.6):C3444/1:delGen=1 _hp3(4.6):C1 _hp4(4.6):C1)}
> INFO  - 2014-02-25 11:30:49.570;
> org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
> webapp=/solr path=/update params={commit=true} {commit=} 0 10
>
>
> PHP Code
>
> $options = array
> (
> 'hostname'  => \Config::get('database.connections.solr.host'),
> 'port'  => \Config::get('database.connections.solr.port'),
> 'path'  => '/solr'
> );
>
> $client = new SolrClient($options);
>
> $doc = new SolrInputDocument();
>
> $doc->addField('product_id', 'WT334455');
> $doc->addField('product_name_en', 'Software');
>
> $updateResponse = $client->addDocument($doc);
>
> print_r($updateResponse->getResponse());
>
> I also do manual commit by this >>
> http://localhost:8983/solr/update?commit=true
>
> Any idea please..
> Thank you very much,
> Chun
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-not-index-document-in-solar-tp4119461.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Can not index document in solar

2014-02-24 Thread rachun
Thank you Erick,
I figured out that the document is actually indexed, but it doesn't show
up in my API result because it is missing some fields.

So I would like to delete this post; how can I?

Thank you very  much,
Chun.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-not-index-document-in-solar-tp4119461p4119465.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can not index document in solar

2014-02-24 Thread Erick Erickson
No, can't delete posts. Having them around keeps a history for
others as well, so that's an added benefit.

Glad you figured it out!

Erick


On Mon, Feb 24, 2014 at 9:20 PM, rachun  wrote:

> Thank you Eric,
> I figured out something that actually the document is indexed but it
> doesn't
> show on my api result because it missed some field.
>
> So I would like to delete this post how can I?
>
> Thank you very  much,
> Chun.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-not-index-document-in-solar-tp4119461p4119465.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Can not index raw binary data stored in Database in BLOB format.

2014-02-24 Thread Chandan khatua
I have verified that the blob column is called MESSAGE.
In my data-config file the field column named 'id' is indexed in Solr, but
the data (field column name="mxMsg") is not indexed; it comes back empty,
within quotes.

The same configuration works on XML data (stored as BLOB type in the DB),
but not on binary data (stored as BLOB type in the DB).

Please help.

Thanking you,

- Chandan

-Original Message-
From: Raymond Wiker [mailto:rwi...@gmail.com] 
Sent: Monday, February 24, 2014 5:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB
format.

Try running the query for the outer entity ("messages") in an sql client,
and verify that your blob column is called MESSAGE.


On Mon, Feb 24, 2014 at 12:22 PM, Chandan khatua
wrote:

> I've tried as per your guidance, but no data are being indexed.
> The output of the Query screen looks like:
>
> [XML response; every field tag came back empty except for the values
> 2158 and 1460918369230258176]
>
>
>
> But the indexed data should be displayed within its field tag. When
> XML messages are stored in the DB as BLOB type, indexing works smoothly;
> but I am trying to index binary data stored in the DB as BLOB type.
>
> Need help.
>
> Thanking you,
> Chandan
>
>
>
> -Original Message-
> From: Raymond Wiker [mailto:rwi...@gmail.com]
> Sent: Monday, February 24, 2014 4:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB 
> format.
>
> Try replacing the inner entity with something like
>
> <entity name="message"
>dataSource="dastream"
>processor="TikaEntityProcessor"
>dataField="messages.MESSAGE"
>format="xml">
>   <field column="text" name="mxMsg"/>
> </entity>
>
> --- this assumes that you get the blob from a column named "MESSAGE" 
> in the outer entity ("messages").
>
>
> On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua
> wrote:
>
> > Hi Raymond !
> >
> > I've data-config.xml like below:
> >
> > <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
> > url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
> > <dataSource name="dastream" type="FieldStreamDataSource"/>
> > <document>
> > <entity name="messages" pk=" PK" transformer='DateFormatTransformer'
> >   query="select * from table1"
> >   dataSource="db">
> >
> > <entity name="message"
> > dataSource="dastream"
> > processor="TikaEntityProcessor"
> > url="message"
> > dataField="db.MESSAGE"
> > format="text"
> > >
> >
> > <field column="text" name="mxMsg"/>
> > </entity>
> > </entity>
> > </document>
> >
> >
> >
> > This looks similar to your configuration. When XML data are stored
> > in the BLOB column, indexing works; but when binary data are stored
> > in the BLOB column, indexing is NOT done.
> > Please help.
> >
> > Thanking you,
> > -Chandan
> >
> >
> > -Original Message-
> > From: Raymond Wiker [mailto:rwi...@gmail.com]
> > Sent: Monday, February 24, 2014 4:06 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in 
> > BLOB format.
> >
> > I've done something like this; the key was to use a 
> > FieldStreamDataSource to read from the BLOB field.
> >
> > Something like
> >
> > <dataSource name="db" ... />
> > <dataSource name="fieldstream" type="FieldStreamDataSource"/>
> >
> > then
> >
> > <entity name="..." processor="TikaEntityProcessor"
> > dataField="main.BLOB" dataSource="fieldstream" format="xml">
> >   <field column="text" name="..."/>
> > </entity>
> >
> > ...
> >
> >
> >
> >
> > On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
> > wrote:
> >
> > > Hi Gora !
> > >
> > > Your concern was "What is the type of the column used to store the 
> > > binary data in Oracle?"
> > > The column type is BLOB in DB.  The column can also have rich text
> file.
> > >
> > > Regards,
> > > Chandan
> > >
> > >
> > > -Original Message-
> > > From: Gora Mohanty [mailto:g...@mimirtech.com]
> > > Sent: Monday, February 24, 2014 3:02 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Can not index raw binary data stored in Database in 
> > > BLOB format.
> > >
> > > On 24 February 2014 12:51, Chandan khatua 
> > > 
> > wrote:
> > > > Hi,
> > > >
> > > >
> > > >
> > > > We have raw binary data stored in database(not word,excel,xml 
> > > > etc
> > > > files) in BLOB.
> > > >
> > > > We are trying to index using TikaEntityProcessor but nothing 
> > > > seems to get indexed.
> > > >
> > > > But the same configuration works when xml/word/excel files are 
> > > > stored in the BLOB field.
> > >
> > > Please start by reviewing
> > > http://wiki.apache.org/solr/DataImportHandler as the above seems 
> > > quite confused. Why are you using TikaEntityProcessor if the data 
> > > in the DB are not richtext files?
> > >
> > > What is the type of the column used to store the binary data in 
> > > Oracle? You might be able to convert it with a ClobTransformer.
> > > Please see
> > > http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> > >
> > > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_tab
> >

Re: Reg queryResultCache...

2014-02-24 Thread Senthilnathan Vijayaraja
Hi,


select?br="2"+"3"&version=2&fl=id,level,name,city,amenities,$lanorm,$relscore,$bscore&q=*:*&fq={!lucene
q.op=OR df=property_type v=$ptype}&ptype=1&fq={!lucene q.op=OR df=city
v=$cit}&sort=$bscore desc,$relscore
desc&cit=Chennai&relscore=product($banorm,15)&bscore=banorm($la,amenities,10)&la=8

this is the sample query.

banorm($la,amenities,10): banorm is the custom function, la is the user
input, and 10 is some constant value.

If la=8 then the results should be as below, and it is working fine:

name:ABC
city: "Chennai",
propertyType: [1],
baNorm: 0.4,
relScore: 6,
bScore: 3001

name:XYZ
city: "Chennai",
propertyType: [1],
baNorm: 0.4,
relScore: 6,
bScore: 3001

name:PQR
city: "Chennai",
propertyType: [1],
baNorm: 0,
relScore: 0,
bScore: 2001

If we change la to 24 (i.e. la=24), the results are displayed in the same
order as for the first query, but the bscore and relscore values are
different, which means the result is not re-sorted.

select?br="2"+"3"&version=2&fl=id,level,name,city,amenities,$lanorm,$relscore,$bscore&q=*:*&fq={!lucene
q.op=OR df=property_type v=$ptype}&ptype=1&fq={!lucene q.op=OR df=city
v=$cit}&sort=$bscore desc,$relscore
desc&cit=Chennai&relscore=product($banorm,15)&bscore=banorm($la,amenities,10)&la=24

name:ABC
city: "Chennai",
propertyType: [1],
baNorm: 0,
relScore: 0,
bScore: 2001

name:XYZ
city: "Chennai",
propertyType: [1],
baNorm: 0.6,
relScore: 9,
bScore: 4001

name:PQR
city: "Chennai",
propertyType: [1],
baNorm: 0.4,
relScore: 6,
bScore: 3001


Thanks & Regards,
Senthilnathan V


On Mon, Feb 24, 2014 at 7:04 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Please provide the *Solr* queries that are being invoked by your
> middle layer along with the results you expect and the results you
> actually got from Solr with cache-enabled.
>
> On Mon, Feb 24, 2014 at 6:23 PM, Senthilnathan Vijayaraja
>  wrote:
> > Below is the url which will hit the middle layer then middle layer will
> > form the solr query and fire it.
> >
> >
> *listing?offset=0&sortparam=0&limit=20&q=Chennai~Tambaram~1~2,3~45~2500~800~2000~~24*
> >
> >
> > Chennai-->city
> > Tambaram-->locality
> > 1-->blah
> > 2,3-->blah
> > 45~2500-->price_min and max
> > 800~2000-->area min and max
> > *24--lux_ amenities*
> >
> > Here, other than lux_amenities, I am using fq for all other things, so
> > the problem here is sorting.
> >
> > I am sorting the results using bscore and relscore like below:
> >
> > *$bscore desc,$relscore desc*
> >
> > The first time, it works fine.
> >
> > The bscore and relscore above will change based on lux_amenities, but
> > lux_amenities is part of neither fq nor q. So if, the second time, we
> > change only lux_amenities and fire the query, it gives the results in
> > the same order as the first query even though the bscore and relscore
> > are different.
> >
> > So I disabled the queryResultCache,
> >
> > 
> >
> > Now it is working fine. But I need a better solution than disabling it
> > for all queries. E.g., I want to disable it for a few queries alone,
> > not for all.
> >
> >
> > Could someone help me please..
> >
> >
> > Thanks & Regards,
> > Senthilnathan V
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Apache Solr Beginner's Guide

2014-02-24 Thread leon715
For more info : http://www.packtpub.com/apache-solr-beginners-guide/book



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apache-Solr-Beginner-s-Guide-tp4119486.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Apache Solr Beginner's Guide

2014-02-24 Thread Alexandre Rafalovitch
Yes, and? There is a bunch of books on Solr, including a couple for
the beginners. Packt in particular has obviously gone for the volume
approach :-)

If you have a question about the book, you may want to send it to Packt
or the book author or other forums. Or rephrase it as a specific Solr
question that we can help with on the forum without needing a copy of
the book.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Feb 25, 2014 at 5:44 PM, leon715  wrote:
> For more info : http://www.packtpub.com/apache-solr-beginners-guide/book
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Apache-Solr-Beginner-s-Guide-tp4119486.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Sort order in field collapsing (group.field=xxxx)

2014-02-24 Thread Sesha Sendhil Subramanian
Hey,

I have a SolrCloud setup with 2 shards. I am trying to use Solr's field
grouping feature.

My query looks like q=*:*&fq=field:value&group=true&group.field=otherValue

The *ordering* of the groups *differs based on which shard the query is
fired from* and it seems that docs located in that same shard (as the one
from which query is fired) come up first.

If I fire the query without grouping, i.e. q=*:*&fq=field:value, the
ordering is consistent when the query is fired from either shard.

The documents to be grouped are spread across the two shards. However, I
have made sure documents within a group will be on the same shard by
setting my id as fieldA!fieldB where I use fieldA to group the docs by.

Thanks
Sesha