How to sort by the function: relevance_score*numeric_field/(relevance_score + numeric_field)

2013-08-20 Thread Liu
Hi:

  I want to rank the search results by the function:
relevance_score*numeric_field/(relevance_score + numeric_field), which is
equivalent to

1/((1/relevance_score) + (1/numeric_field))

As far as I know, I could use a function query for the sort:

sort=div(1,sum(div(1,field(numeric_field)),div(1,query({!edismax v='somewords'}))))

There is a subquery in this function, query({!edismax v='somewords'}), which
returns the relevance score. But I can't figure out its query efficiency.
After tracking the source code, I think the efficiency is OK, but I can't be
sure.

 

Do we have other approaches to sort docs by:
relevance_score*numeric_field/(relevance_score + numeric_field)?
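
[Illustrative only, not part of the original mail: a minimal SolrJ sketch of setting
such a function-query sort from Java, assuming a 6.x SolrJ client; the field name
numeric_field, the collection URL and the query text are placeholders.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HarmonicSortExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
    SolrQuery q = new SolrQuery("somewords");
    q.set("defType", "edismax");
    // 1/(1/score + 1/numeric_field) is the same as score*numeric_field/(score + numeric_field)
    q.set("sort",
        "div(1,sum(div(1,field(numeric_field)),div(1,query({!edismax v='somewords'})))) desc");
    QueryResponse rsp = client.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
    client.close();
  }
}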

 

Thank you

Leo



Reply: removing duplicates

2013-08-21 Thread Liu
This picture is extracted from apache-solr-ref-guide-4.4.pdf; maybe it will
help you.
You could download the document from
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

-----Original Message-----
From: Ali, Saqib [mailto:docbook@gmail.com] 
Sent: August 22, 2013 5:15
To: solr-user@lucene.apache.org
Subject: removing duplicates

hello,

We have documents that are duplicates, i.e. the ID is different but the rest of
the fields are the same. Is there a query that can remove the duplicates and
leave just one copy of each document in Solr? There is one numeric field that we
can key off to find duplicates.
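
[Illustrative only, not from the thread: one client-side way to do this with SolrJ is to
facet on the dedup key to find values that occur more than once, then delete every copy
but the first. The collection URL and the field names dedup_key and id are placeholders,
and rows/limits are left at sketch level.]

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.common.SolrDocument;

public class DedupSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");

    // Facet on the dedup key; mincount=2 returns only values that have duplicates.
    SolrQuery facetQuery = new SolrQuery("*:*");
    facetQuery.setRows(0);
    facetQuery.addFacetField("dedup_key");
    facetQuery.setFacetMinCount(2);
    FacetField ff = client.query(facetQuery).getFacetField("dedup_key");

    for (FacetField.Count c : ff.getValues()) {
      // Fetch the documents sharing this key and delete every copy except the first.
      SolrQuery dupQuery = new SolrQuery("dedup_key:" + c.getName());
      dupQuery.setFields("id");
      List<SolrDocument> dups = client.query(dupQuery).getResults();
      for (int i = 1; i < dups.size(); i++) {
        client.deleteById((String) dups.get(i).getFieldValue("id"));
      }
    }
    client.commit();
    client.close();
  }
}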

Please advise.

Thanks


Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud

2015-03-19 Thread Liu Bo
Hi Edvin

Please review your commit/soft-commit configuration.
"Soft commits are about visibility, hard commits are about durability,"
as a wise man once said. :)

If you are doing NRT indexing and searching, you probably need a short soft
commit interval or to commit explicitly in your request handler. Be advised
that these strategies and configurations need to be tested and adjusted
according to your data size and your search and index-update frequency.

You should be able to find the answer yourself here:
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
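
[An editorial aside, not from the original reply: a hedged SolrJ sketch of an explicit
soft commit after an update, assuming a 6.x SolrJ client, a ZooKeeper at 127.0.0.1:2181
and a collection named "mycollection".]

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitExample {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("127.0.0.1:2181");
    client.setDefaultCollection("mycollection");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title", "hello");
    client.add(doc);

    // waitFlush=true, waitSearcher=true, softCommit=true:
    // the document becomes visible to searches without forcing a hard commit.
    client.commit("mycollection", true, true, true);
    client.close();
  }
}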

All the best

Liu Bo

On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and
> when I try to index rich-text documents using the REST API or the default
> Documents module in the Solr Admin UI, the documents that are indexed do not
> appear immediately when I do a search. They only appear after I restart
> the Solr services (both shard1 and shard2).
>
> However, the same issue does not happen when I index the same documents using
> post.jar, and I can search for the indexed documents immediately.
>
> Here's my ExtractingRequestHandler in solrconfig.xml:
>
> <requestHandler name="/update/extract" startup="lazy"
>                 class="solr.extraction.ExtractingRequestHandler" >
>   <lst name="defaults">
>     <str name="lowernames">true</str>
>     <str name="uprefix">ignored_</str>
>
>     <!-- capture link hrefs, but ignore div attributes -->
>     <str name="captureAttr">true</str>
>     <str name="fmap.a">links</str>
>     <str name="fmap.div">ignored_</str>
>   </lst>
> </requestHandler>
>
> What could be the reason why this is happening, and any solutions to solve
> it?
>
> Regards,
> Edwin
>


Help needed on Solr Streaming Expressions

2016-06-05 Thread Hui Liu
Hi,

  I have Solr 6.0.0 installed on my PC (Windows 7). I was 
experimenting with the 'Streaming Expressions' feature by following the steps 
from this link: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions, 
but cannot get it to work. Attached are my solrconfig.xml and schema.xml; note I 
do have the 'export' handler defined in my 'solrconfig.xml' and have enabled all 
fields as 'docValues' in 'schema.xml'. I am using Solr Cloud and an external 
ZooKeeper (also installed on my PC). Here are the commands to start this 2-node 
Solr Cloud instance and to create the collection 'document3':

-- start 2-node solr cloud instances:
solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4

-- create the collection:
solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2

  after creating the collection I loaded a few documents in 
CSV format and was able to query them using a curl command from my PC:

-- this works on my PC:
curl 
http://localhost:8988/solr/document3/select?q=*:*&sort=document_id+desc,sender_msg_dest+desc&fl=document_id,sender_msg_dest,recip_msg_dest

  but when trying a streaming 'search' using curl, it does not work. 
I tried 3 different options: with zkHost, using 'export', and using 
'select', all getting the same error:

curl: (6) Could not resolve host: sort=document_id asc,qt=
{"result-set":{"docs":[
{"EXCEPTION":null,"EOF":true}]}}

-- different curl commands tried, all getting the same error above:
curl --data-urlencode 
'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/export")' 
"http://localhost:8988/solr/document2/stream";

curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/export")' 
"http://localhost:8988/solr/document2/stream";

curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/select",rows=10)' 
"http://localhost:8988/solr/document2/stream";

  what am I doing wrong? Thanks for any help!
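
[Editorial illustration, not part of the original mail: the same request can be issued from
plain Java, which makes the URL-encoding of the expression explicit. A minimal sketch using
only JDK classes, against the collection and port used above.]

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class StreamExprPost {
  public static void main(String[] args) throws Exception {
    String expr = "search(document3,q=\"*:*\",fl=\"document_id,sender_msg_dest\","
        + "sort=\"document_id asc\",qt=\"/export\")";
    // URL-encode the whole expression so spaces, quotes and commas survive the POST body.
    String body = "expr=" + URLEncoder.encode(expr, "UTF-8");

    HttpURLConnection conn = (HttpURLConnection)
        new URL("http://localhost:8988/solr/document3/stream").openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // streaming JSON tuples, ending with {"EOF":true,...}
      }
    }
  }
}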

Regards,
Hui Liu





  

  
[Attachment: solrconfig.xml and schema.xml for the 'document3' collection. The XML
markup was stripped by the mail archive; the surviving values indicate
luceneMatchVersion 6.0.0, an updateLog, autoCommit maxTime 15000 with
openSearcher=false, autoSoftCommit maxTime -1, an /export handler configured with
the {!xport} response writer and xsort, and document_id as the uniqueKey field.]



RE: Help needed on Solr Streaming Expressions

2016-06-06 Thread Hui Liu
Joel,

Thank you very much for your help. I tried the http command below with my 
existing 2-shard collection 'document3' (sorry, I have a typo below: it should 
be document3 instead of document2), and this time I got a more informative error:

{"result-set":{"docs":[
{"EXCEPTION":"Unable to construct instance of 
org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}}

I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here 
in file 'solr_error.txt'.

However, I continued and tried creating another identical collection 'document5' 
with 2 shards and 2 replicas using the same schema, and this time the http URL 
worked! Maybe my previous collection 'document3' has some corruption?

-- command to create collection 'document5':
solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2

-- command for stream expression:
http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

-- result from browser:
{"result-set":{"docs":[
{"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"},
{"EOF":true,"RESPONSE_TIME":10}]}}

Do you think I can try the same in http using other 'Stream Decorators' such as 
'complement' and 'innerJoin'?

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, June 06, 2016 9:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Help needed on Solr Streaming Expressions

Hi,

To eliminate any issues that might be happening due to curl, try running the 
command from your browser.

http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost="
127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id 
asc",qt="/export")



I think most browsers will URL-encode the expression automatically, but you can 
also URL-encode it using an online tool. Also, you can remove the zkHost param 
and it should default to the zkHost your Solr is connected to.


If you still get an error take a look at the logs and post the full stack trace 
to this thread, which will help determine where the problem is.



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu  wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ feature by following steps 
> from this link:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> , but cannot get it to work, attached is my solrconfig.xml and 
> schema.xml, note I do have ‘export’ handler defined in my 
> ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in 
> ‘schema.xml’; I am using solr cloud and external zookeeper (also 
> installed on m PC), here is the command to start this 2-node Solr 
> cloud instance and to create the collection ‘document3’:
>
>
>
> -- start 2-node solr cloud instances:
>
> solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
>
> solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4
>
>
>
> -- create the collection:
>
> solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2
>
>
>
>   after creating the collection I loaded a few documents 
> using ‘csv’ format and I was able to query it using ‘curl’ command from my PC:
>
>
>
> -- this works on my PC:
>
> curl
> http://localhost:8988/solr/document3/select?q=*:*&sort=document_id+des
> c,sender_msg_dest+desc&fl=document_id,sender_msg_dest,recip_msg_dest
>
>
>
>   but when trying Streaming ‘search’ using curl, it does 
> not work, I tried with 3 different options: with zkHost, using 
> ‘export’, or using ‘select’, all getting the same error:
>
>
> curl: (6) Could not resolve host: sort=document_id asc,qt=
>
> {"result-set":{"docs":[
>
> {"EXCEPTION":null,"EOF":true}]}}
>
> -- different curl commands tried, all getting the same error above:
>
> curl --data-urlencode 
> 'expr=search(document3,zkHost="127.0.0.1:2181

RE: Help needed on Solr Streaming Expressions

2016-06-06 Thread Hui Liu
The only difference between document3 and document5 is that document3 had no data 
in 'shard2'. After loading some data into shard2, the http command also worked:

http://localhost:8988/solr/document3/stream?expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

My guess is that the 'null pointer' error in the stack trace was caused by there 
being no data in 'shard2'.

Regards,
Hui

-Original Message-
From: Hui Liu 
Sent: Monday, June 06, 2016 1:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Help needed on Solr Streaming Expressions

Joel,

Thank you very much for your help, I tried the http command below with my 
existing 2 shards collection 'document3' (sorry I have a typo below should be 
document3 instead of document2), this time I got much better error:

{"result-set":{"docs":[
{"EXCEPTION":"Unable to construct instance of 
org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}}

I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here 
in file 'solr_error.txt'.

However I continued and tried create another identical collection 'document5' 
with 2 shards and 2 replica using the same schema, this time the http URL 
worked!!! Maybe my previous collection 'document3' has some corruption? 

-- command to create collection 'document5':
solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2

-- command for stream expression:
http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

-- result from browser:
{"result-set":{"docs":[
{"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"},
{"EOF":true,"RESPONSE_TIME":10}]}}

Do you think I can try the same in http using other 'Stream Decorators' such as 
'complement' and 'innerJoin'?

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com]
Sent: Monday, June 06, 2016 9:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Help needed on Solr Streaming Expressions

Hi,

To eliminate any issues that might be happening due to curl, try running the 
command from your browser.

http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost="
127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id 
asc",qt="/export")



I think most browsers will url encode the expression automatically, but you can 
url encode also using an online tool. Also you can remove the zkHost param and 
it should default to zkHost your solr is connected to.


If you still get an error take a look at the logs and post the full stack trace 
to this thread, which will help determine where the problem is.



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu  wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ feature by following steps 
> from this link:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> , but cannot get it to work, attached is my solrconfig.xml and 
> schema.xml, note I do have ‘export’ handler defined in my 
> ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in 
> ‘schema.xml’; I am using solr cloud and external zookeeper (also 
> installed on m PC), here is the command to start this 2-node Solr 
> cloud instance and to create the collection ‘document3’:
>
>
>
> -- start 2-node solr cloud instances:
>
> solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
>
> solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4
>
>
>
> -- create the collection:
>
> solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2
>
>
>
>   after creating the collection I loaded a few documents 
> using ‘csv’ format and I was able to query it using ‘curl’ command from my PC:
>
>
>
> -- this works on my PC:
>
> curl
> http://localhost:8988/solr/document3/select?q=*:*&sort=do

Questions regarding re-index when using Solr as a data source

2016-06-09 Thread Hui Liu
Hi,

  We are porting an application currently hosted in Oracle 11g to 
Solr Cloud 6.x, i.e. we plan to migrate all tables in Oracle as collections in 
Solr, index them, and build search tools on top of this; the goal is that we won't 
be using Oracle at all after this has been implemented. Every field in Solr 
will have 'stored=true', and selectively a subset of searchable fields will have 
'indexed=true'. The question is what steps we should follow if we need to 
re-index a collection after making some schema changes; mostly we only add new 
fields to store, or make a non-indexed field indexed, and we normally do not 
delete or rename any existing fields. According to this url: 
https://wiki.apache.org/solr/HowToReindex it seems we need to set up an 
'intermediate' Solr1 to only store the data themselves without any indexing, 
then have another Solr2 set up to store the indexed data, and in case of 
re-indexing, just delete all the documents in Solr2 for the collection and 
re-import the data from Solr1 into Solr2 using SolrEntityProcessor (from the 
dataimport handler). Is this still the recommended approach? I can see the 
downside of this approach: if we have a tremendous amount of data for a 
collection (some of our collections could have several billion documents), 
re-importing it from Solr1 to Solr2 may take a few hours or even days, and 
during this time users cannot query the data. Is there any better way to do 
this and avoid this type of downtime? Any feedback is appreciated!
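
[Editorial illustration, not from the original mail: if the re-import route is taken, the
Solr1-to-Solr2 copy can also be done with a small SolrJ client using cursorMark deep paging
instead of the DataImportHandler. A hedged sketch; the hosts, collection name and batch size
are placeholders, and it relies on every field being stored, as described above.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CopyCollection {
  public static void main(String[] args) throws Exception {
    HttpSolrClient source = new HttpSolrClient("http://solr1:8983/solr/mycollection");
    HttpSolrClient target = new HttpSolrClient("http://solr2:8983/solr/mycollection");

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(1000);
    q.setSort("document_id", SolrQuery.ORDER.asc); // cursorMark requires a sort on the uniqueKey
    String cursor = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = source.query(q);
      for (SolrDocument doc : rsp.getResults()) {
        SolrInputDocument in = new SolrInputDocument();
        for (String field : doc.getFieldNames()) {
          if (!"_version_".equals(field)) {      // let the target assign its own version
            in.addField(field, doc.getFieldValue(field));
          }
        }
        target.add(in);
      }
      String next = rsp.getNextCursorMark();
      if (next.equals(cursor)) {
        break;                                   // no more pages
      }
      cursor = next;
    }
    target.commit();
    source.close();
    target.close();
  }
}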

Regards,
Hui Liu
Opentext, Inc.


RE: Questions regarding re-index when using Solr as a data source

2016-06-09 Thread Hui Liu
Hi Walter,

Thank you for the reply; sorry, I need to clarify what I mean by 'migrate 
tables' from Oracle to Solr. We are not literally moving existing records from 
Oracle to Solr; instead, we are building a new application that feeds data 
directly into Solr as documents and fields, in parallel with another existing 
application which feeds the same data into Oracle tables/columns. Of course, 
the Solr schema will be somewhat different than the Oracle one. Also, we only 
keep the data for 90 days for users to search on; we hope that once we run both 
systems in parallel for some time (> 90 days), we will have built up enough new 
data in Solr and will no longer need any old data in Oracle, and by then we 
will be able to use Solr as our only data store.

It sounds to me that we may need to consider saving the data into either a file 
system or another database, in case we need to rebuild the indexes. The reason 
I mentioned saving the data into another Solr system is from reading the info 
below from https://wiki.apache.org/solr/HowToReindex ; so I am just trying to 
get feedback on whether there is any update on this approach, and whether there 
is any better way to minimize the downtime caused by a schema change and 
re-index. For example, in Oracle we are able to add a new column or a new index 
online without any impact on existing queries, as existing indexes stay intact.

Alternatives when a traditional reindex isn't possible

Sometimes the option of "do your indexing again" is difficult. Perhaps the 
original data is very slow to access, or it may be difficult to get in the 
first place.

Here's where we go against our own advice that we just gave you. Above we said 
"don't use Solr itself as a datasource" ... but one way to deal with data 
availability problems is to set up a completely separate Solr instance (not 
distributed, which for SolrCloud means numShards=1) whose only job is to store 
the data, then use the SolrEntityProcessor in the DataImportHandler to index 
from that instance to your real Solr install. If you need to reindex, just run 
the import again on your real installation. Your schema for the intermediate 
Solr install would have stored="true" and indexed="false" for all fields, and 
would only use basic types like int, long, and string. It would not have any 
copyFields.

This is the approach used by the Smithsonian for their Solr installation, 
because getting access to the source databases for the individual entities 
within the organization is very difficult. This way they can reindex the online 
Solr at any time without having to get special permission from all those 
entities. When they index new content, it goes into a copy of Solr configured 
for storage only, not in-depth searching. Their main Solr instance uses 
SolrEntityProcessor to import from the intermediate Solr servers, so they can 
always reindex.

Regards,
Hui

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Thursday, June 09, 2016 12:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

First, using Solr as a repository is pretty risky. I would keep the official 
copy of the data in a database, not in Solr.

Second, you can’t “migrate tables” because Solr doesn’t have tables. You need 
to turn the tables into documents, then index the documents. It can take a lot 
of joins to flatten a relational schema into Solr documents.

Solr does not support schema migration, so yes, you will need to save off all 
the documents, then reload them. I would save them to files. It makes no sense 
to put them in another copy of Solr.

Changing the schema will be difficult and time-consuming, but you’ll probably 
run into much worse problems trying to use Solr as a repository.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 9, 2016, at 8:50 AM, Hui Liu  wrote:
>
> Hi,
>
>  We are porting an application currently hosted in Oracle 11g to 
> Solr Cloud 6.x, i.e we plan to migrate all tables in Oracle as collections in 
> Solr, index them, and build search tools on top of this; the goal is we won't 
> be using Oracle at all after this has been implemented; every fields in Solr 
> will have 'stored=true' and selectively a subset of searchable fields will 
> have 'indexed=true'; the question is what steps we should follow if we need 
> to re-index a collection after making some schema changes - mostly we only 
> add new fields to store, or make a non-indexed field as indexed, we normally 
> do not delete or rename any existing fields; according to this url: 
> https://wiki.apache.org/solr/HowToReindex it seems we need to setup a 
> 'intermediate' Solr1 to only store the data themselves without any indexing,

RE: Questions regarding re-index when using Solr as a data source

2016-06-10 Thread Hui Liu
Walter,

	Thank you for your advice. We are new to Solr and have been using 
Oracle for the past 10+ years, so we are used to the idea of having a tool that 
can be used both as a data store and for search, by having indexes on top of it. 
I guess the reason we are considering Solr as a data store is that it has some 
features of a database that our application requires, such as: 1) being able to 
detect duplicate records by having a unique field; 2) allowing us to do 
concurrent updates by using the optimistic concurrency control feature; 3) its 
'replication' feature allowing us to store multiple copies of the data. If we 
were to use a file system, we would not have the above features (at least not 1 
and 2) and would have to implement them ourselves. The other option is to pick 
another database tool such as MySQL or Cassandra, but then we would need to 
learn and support an additional tool besides Solr. You brought up several very 
good points about operational factors we should consider if we pick Solr as a 
data store. Also, our application is more OLTP than OLAP. I will update our 
colleagues and stakeholders about these concerns. Thanks again!

Regards,
Hui
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Thursday, June 09, 2016 1:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

In the HowToReindex page, under “Using Solr as a Data Store”, it says this: 
"Don't do this unless you have no other option. Solr is not really designed for 
this role.” So don’t start by planning to do this.

Using a second copy of Solr is still using Solr as a repository. That doesn’t 
satisfy any sort of requirements for disaster recovery. How do you know that 
data is good? How do you make a third copy? How do you roll back to a previous 
version? How do you deal with a security breach that affects all your systems? 
Are the systems in the same data center? How do you deal with ransomware (U. of 
Calgary paid $20K yesterday)?

If a consultant suggested this to me, I’d probably just give up and get a 
different consultant.

Here is what we do for batch loading.

1. For each Solr collection, we define a JSONL feed format, with a JSON Schema.
2. The owners of the data write an extractor to pull the data out of wherever 
it is, then generate the JSON feed.
3. We validate the JSON feed against the JSON schema.
4. If the feed is valid, we save it to Amazon S3 along with a manifest which 
lists the version of the JSON Schema.
5. Then a multi-threaded loader reads the feed and sends it to Solr.

Reloading is safe and easy, because all the feeds in S3 are valid.

Storing backups in S3 instead of running a second Solr is massively cheaper, 
easier, and safer.

We also have a clear contract between the content owners and the search team. 
That contract is enforced by the JSON Schema on every single batch.
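
[Editorial illustration only: a minimal sketch of what step 5, the multi-threaded loader,
could look like in plain SolrJ. The JSONL parsing via noggit, the file name, thread count
and collection URL are placeholder assumptions, not Walter's actual implementation.]

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.noggit.ObjectBuilder;

public class JsonlLoader {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
    ExecutorService pool = Executors.newFixedThreadPool(8);

    List<String> lines = Files.readAllLines(Paths.get("feed.jsonl"));
    for (String line : lines) {
      pool.submit(() -> {
        try {
          // Each JSONL line is one flat document: field name -> value.
          Map<String, Object> fields = (Map<String, Object>) ObjectBuilder.fromJSON(line);
          SolrInputDocument doc = new SolrInputDocument();
          fields.forEach(doc::addField);
          client.add(doc);
        } catch (Exception e) {
          e.printStackTrace();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    client.commit();
    client.close();
  }
}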

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 9, 2016, at 9:51 AM, Hui Liu  wrote:
> 
> Hi Walter,
> 
> Thank you for the reply, sorry I need to clarify what I mean by 'migrate 
> tables' from Oracle to Solr, we are not literally move existing records from 
> Oracle to Solr, instead, we are building a new application directly feed data 
> into Solr as document and fields, in parallel of another existing application 
> which feeds the same data into Oracle tables/columns, of course, the Solr 
> schema will be somewhat different than Oracle; also we only keep those data 
> for 90 days for user to search on, we hope once we run both system in 
> parallel for some time (> 90 days), we will build up enough new data in Solr 
> and we no longer need any old data in Oracle, by then we will be able to use 
> Solr as our only data store.
> 
> It sounds to me that we may need to consider save the data into either file 
> system, or another database, in case we need to rebuild the indexes; and the 
> reason I mentioned to save data into another Solr system is by reading this 
> info from https://wiki.apache.org/solr/HowToReindex : so just trying to get a 
> feedback on if there is any update on this approach? And any better way to do 
> this to minimize the downtime caused by the schema change and re-index? For 
> example, in Oracle, we are able to add a new column or new index online 
> without any impact of existing queries as existing indexes are intact.
> 
> Alternatives when a traditional reindex isn't possible
> 
> Sometimes the option of "do your indexing again" is difficult. Perhaps the 
> original data is very slow to access, or it may be difficult to get in the 
> first place.
> 
> Here's where we go against our own advice that we just gave you. Above we 
> said "don't use Solr itself as a datasource" ... but one way to de

RE: Questions regarding re-index when using Solr as a data source

2016-06-10 Thread Hui Liu
What if we plan to use Solr version 6.x? This url says it supports 2 different 
update modes, atomic update and optimistic concurrency:

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

I tested 'optimistic concurrency' and it appears to be working, i.e. if a 
document I am updating got changed by someone else, I get an error if I 
supply a _version_ value. So maybe you are referring to an older version of 
Solr?
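
[For reference, an illustrative SolrJ sketch of the optimistic-concurrency check being
described; the collection URL and field names are placeholders. Sending a document whose
_version_ no longer matches the indexed version makes Solr reject the update with a
version-conflict (HTTP 409) error.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticConcurrencyExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");

    // Read the document and remember the _version_ we saw.
    SolrDocument current = client.query(new SolrQuery("id:doc-1")).getResults().get(0);
    long version = (Long) current.getFieldValue("_version_");

    // Write it back with that _version_; if someone else changed the document in
    // between, Solr rejects this add with a version-conflict error.
    SolrInputDocument update = new SolrInputDocument();
    update.addField("id", "doc-1");
    update.addField("status", "processed");
    update.addField("_version_", version);
    client.add(update);
    client.commit();
    client.close();
  }
}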

Regards,
Hui

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Friday, June 10, 2016 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

Solr does not have transactions at all. The “commit” is really “submit batch”.

Solr does not have update. You can add, delete, or replace an entire document.

There is no optimistic concurrency control because there is no concurrency 
control. Clients can concurrently add documents to a batch, then any client can 
submit the entire batch.

Replication is not transactional. Replication is a file copy of the underlying 
indexes (classic) or copying the documents in a batch (Solr Cloud).

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 10, 2016, at 7:41 AM, Hui Liu  wrote:
> 
> Walter,
> 
>   Thank you for your advice. We are new to Solr and have been using 
> Oracle for past 10+ years, so we are used to the idea of having a tool that 
> can be used as both data store and also searchable by having indexes on top 
> of it. I guess the reason we are considering Solr as data store is due to it 
> has some features of a database that our application requires, such as 1) be 
> able to detect duplicate record by having a unique field; 2) allow us to do 
> concurrent update by using Optimistic concurrency control feature; 3) its 
> 'replication' feature allowing us to store multiple copies of data; so if we 
> were to use a file system, we will not have the above features (at least not 
> 1 and 2) and have to implement those ourselves. The other option is to pick 
> another database tool such as Mysql or Cassandra, then we will need to learn 
> and support an additional tool besides Solr; but you brought up several very 
> good points about operational factors we should consider if we pick Solr as a 
> data store. Also our application is more of a OLTP than OLAP. I will update 
> our colleagues and stakeholders about these concerns. Thanks again!
> 
> Regards,
> Hui
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: Thursday, June 09, 2016 1:24 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Questions regarding re-index when using Solr as a data source
> 
> In the HowToReindex page, under “Using Solr as a Data Store”, it says this: 
> "Don't do this unless you have no other option. Solr is not really designed 
> for this role.” So don’t start by planning to do this.
> 
> Using a second copy of Solr is still using Solr as a repository. That doesn’t 
> satisfy any sort of requirements for disaster recovery. How do you know that 
> data is good? How do you make a third copy? How do you roll back to a 
> previous version? How do you deal with a security breach that affects all 
> your systems? Are the systems in the same data center? How do you deal with 
> ransomware (U. of Calgary paid $20K yesterday)?
> 
> If a consultant suggested this to me, I’d probably just give up and get a 
> different consultant.
> 
> Here is what we do for batch loading.
> 
> 1. For each Solr collection, we define a JSONL feed format, with a JSON 
> Schema.
> 2. The owners of the data write an extractor to pull the data out of wherever 
> it is, then generate the JSON feed.
> 3. We validate the JSON feed against the JSON schema.
> 4. If the feed is valid, we save it to Amazon S3 along with a manifest which 
> lists the version of the JSON Schema.
> 5. Then a multi-threaded loader reads the feed and sends it to Solr.
> 
> Reloading is safe and easy, because all the feeds in S3 are valid.
> 
> Storing backups in S3 instead of running a second Solr is massively cheaper, 
> easier, and safer.
> 
> We also have a clear contract between the content owners and the search team. 
> That contract is enforced by the JSON Schema on every single batch.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Jun 9, 2016, at 9:51 AM, Hui Liu  wrote:
>> 
>> Hi Walter,
>> 
>> Thank you for the reply, sorry I need to clarify what I mean by 'migrate 
>> tables' from Oracle to Solr, we are not literally move existing records from 
>> Oracle to Sol

RE: Questions regarding re-index when using Solr as a data source

2016-06-10 Thread Hui Liu
Thank you Walter.

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Friday, June 10, 2016 3:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

Those are brand new features that I have not used, so I can’t comment on them.

But I know they do not make Solr into a database.

If you need a transactional database that can support search, you probably want 
MarkLogic. I worked at MarkLogic for a couple of years. In some ways, MarkLogic 
is like Solr, but the support for transactions goes very deep. It is not 
something you can put on top of a search engine.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 10, 2016, at 12:39 PM, Hui Liu  wrote:
> 
> What if we plan to use Solr version 6.x? this url says it support 2 different 
> update modes: atomic update and optimistic concurrency:
> 
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
> 
> I tested 'optimistic concurrency' and it appears to be working, i.e if a 
> document I am updating got changed by another person I will get error if I 
> supply a _version_ value, So maybe you are referring to an older version of 
> Solr?
> 
> Regards,
> Hui
> 
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: Friday, June 10, 2016 11:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Questions regarding re-index when using Solr as a data source
> 
> Solr does not have transactions at all. The “commit” is really “submit batch”.
> 
> Solr does not have update. You can add, delete, or replace an entire document.
> 
> There is no optimistic concurrency control because there is no concurrency 
> control. Clients can concurrently add documents to a batch, then any client 
> can submit the entire batch.
> 
> Replication is not transactional. Replication is a file copy of the 
> underlying indexes (classic) or copying the documents in a batch (Solr Cloud).
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Jun 10, 2016, at 7:41 AM, Hui Liu  wrote:
>> 
>> Walter,
>> 
>>  Thank you for your advice. We are new to Solr and have been using 
>> Oracle for past 10+ years, so we are used to the idea of having a tool that 
>> can be used as both data store and also searchable by having indexes on top 
>> of it. I guess the reason we are considering Solr as data store is due to it 
>> has some features of a database that our application requires, such as 1) be 
>> able to detect duplicate record by having a unique field; 2) allow us to do 
>> concurrent update by using Optimistic concurrency control feature; 3) its 
>> 'replication' feature allowing us to store multiple copies of data; so if we 
>> were to use a file system, we will not have the above features (at least not 
>> 1 and 2) and have to implement those ourselves. The other option is to pick 
>> another database tool such as Mysql or Cassandra, then we will need to learn 
>> and support an additional tool besides Solr; but you brought up several very 
>> good points about operational factors we should consider if we pick Solr as 
>> a data store. Also our application is more of a OLTP than OLAP. I will 
>> update our colleagues and stakeholders about these concerns. Thanks again!
>> 
>> Regards,
>> Hui
>> -Original Message-
>> From: Walter Underwood [mailto:wun...@wunderwood.org] 
>> Sent: Thursday, June 09, 2016 1:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Questions regarding re-index when using Solr as a data source
>> 
>> In the HowToReindex page, under “Using Solr as a Data Store”, it says this: 
>> "Don't do this unless you have no other option. Solr is not really designed 
>> for this role.” So don’t start by planning to do this.
>> 
>> Using a second copy of Solr is still using Solr as a repository. That 
>> doesn’t satisfy any sort of requirements for disaster recovery. How do you 
>> know that data is good? How do you make a third copy? How do you roll back 
>> to a previous version? How do you deal with a security breach that affects 
>> all your systems? Are the systems in the same data center? How do you deal 
>> with ransomware (U. of Calgary paid $20K yesterday)?
>> 
>> If a consultant suggested this to me, I’d probably just give up and get a 
>> different consultant.
>> 
>> Here is what we do for batch loading.
>> 
>> 1. For each Solr collection, we define a JSONL feed format, with a JSON 

Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-22 Thread Hui Liu
Hi,

  I have Solr 6.0.0 installed on my PC (Windows 7). I was 
experimenting with 'Streaming Expressions' using an Oracle JDBC connection as 
the stream source; the following is the http command I am using:

http://localhost:8988/solr/document5/stream?expr=jdbc(connection="jdbc:oracle:thin:qa_docrep/abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT
 
document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,sender_bu_id,recip_bu_id,date_created
 FROM tg_document WHERE rownum < 5",sort="document_id 
asc",driver="oracle.jdbc.driver.OracleDriver")

  I can access this Oracle db from my PC via a regular JDBC 
connection. I did put the Oracle JDBC driver jar 'ojdbc14.jar' (the same jar used 
in my regular JDBC code) under the Solr/server/lib dir and restarted Solr Cloud. 
Below is the error from solr.log (a null pointer error). I am merely trying to 
get the data returned from the Oracle table; I have not tried to index it in 
Solr yet. Attached are the schema.xml and solrconfig.xml for this collection 
'document5'. Does anyone know what I am missing? Thanks for any help!

Regards,
Hui Liu

Error from Solr.log:
=
2016-06-23 03:17:34.413 INFO  (qtp1389647288-19) [c:document5 s:shard2 
r:core_node2 x:document5_shard2_replica1] o.a.s.c.S.Request 
[document5_shard2_replica1]  webapp=/solr path=/stream 
params={expr=jdbc(connection%3D"jdbc:oracle:thin:qa_docrep/abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql%3D"SELECT+document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,sender_bu_id,recip_bu_id+FROM+tg_document+WHERE+rownum+<+5",sort%3D"document_id+asc",driver%3D"oracle.jdbc.OracleDriver")}
 status=0 QTime=0
2016-06-23 03:17:37.588 ERROR (qtp1389647288-19) [c:document5 s:shard2 
r:core_node2 x:document5_shard2_replica1] o.a.s.c.s.i.s.ExceptionStream 
java.lang.NullPointerException
  at 
org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java:305)
  at 
org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:64)
  at 
org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:374)
  at 
org.apache.solr.response.TextResponseWriter.writeTupleStream(TextResponseWriter.java:305)
  at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:167)
  at 
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
  at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
  at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
  at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
  at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
  at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:725)
  at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
  at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
  at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
  at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
  at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
  at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
  at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
  at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
  at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
  at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
  at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
  at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
  at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
  at org.eclipse.jetty.server.Server.handle(Server.java:518)
  at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
  at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
  at 
org.eclipse.jetty.io.AbstractConnect

RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
g.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:725)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
... 26 more

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 7:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

I'm wondering if you're selecting an unsupported data type. The exception being 
thrown looks like it could happen if that were the case. The supported types 
are in the Java doc.
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.0/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.java
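
[Editorial aside: to see which Java classes the Oracle driver actually reports for each
selected column, which is what JDBCStream keys on, one could run a small standalone JDBC
check. The connection string below is a placeholder.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class ColumnClassCheck {
  public static void main(String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:user/password@host:1521/service");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT document_id, sender_msg_dest FROM tg_document WHERE rownum < 5")) {
      ResultSetMetaData md = rs.getMetaData();
      for (int i = 1; i <= md.getColumnCount(); i++) {
        // JDBCStream decides how to read each column from this reported class name.
        System.out.println(md.getColumnName(i) + " -> " + md.getColumnClassName(i));
      }
    }
  }
}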

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 22, 2016 at 11:46 PM, Hui Liu  wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ by using Oracle jdbc as the 
> stream source, following is the http command I am using:
>
>
>
> http://localhost:8988/solr/document5/stream?expr=jdbc(connection=
> "jdbc:oracle:thin:qa_docrep/
> abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT
> document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,
> sender_bu_id,recip_bu_id,date_created
> FROM tg_document WHERE rownum < 5",sort="document_id
> asc",driver="oracle.jdbc.driver.OracleDriver")
>
>
>
>   I can access this Oracle db from my PC via regular JDBC 
> connection. I did put Oracle jdbc driver jar ‘ojdbc14.jar’ (same jar 
> used in my regular jdbc code) under Solr/server/lib dir and restarted 
> Solr cloud. Below is the error from solr.log (got a null pointer 
> error); I am merely trying to get the data returned from Oracle table, 
> I have not tried to index them in the Solr yet, attached is the 
> shema.xml and solrconfig.xml for this collection ‘document5’; does 
> anyone know what am I missing? thanks for any help!
>
>
>
> Regards,
>
> Hui Liu
>
>
>
> Error from Solr.log:
>
> =
>
> 2016-06-23 03:17:34.413 INFO  (qtp1389647288-19) [c:document5 s:shard2
> r:core_node2 x:document5_shard2_replica1] o.a.s.c.S.Request 
> [document5_shard2_replica1]  webapp=/solr path=/stream 
> params={expr=jdbc(connection%3D"jdbc:oracle:thin:qa_docrep/
> abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql%3D"SELECT+docu
> ment_id,sender_msg_dest,recip_msg_dest,document_type,document_key,send
> er_bu_id,recip_bu_id+FROM+tg_document+WHERE+rownum+<+5",sort%3D"docume
> nt_id+asc",driver%3D"oracle.jdbc.OracleDriver")}
> status=0 QTime=0
>
> 2016-06-23 03:17:37.588 ERROR (qtp1389647288-19) [c:document5 s:shard2
> r:core_node2 x:document5_shard2_replica1] 
> o.a.s.c.s.i.s.ExceptionStream java.lang.NullPointerException
>
>   at
> org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java
> :305)
>
>   at
> org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionS
> tream.java:64)
>
>   at
> org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.j
> ava:374)
>
>   at
> org.apache.solr.response.TextResponseWriter.writeTupleStream(TextRespo
> nseWriter.java:305)
>
>   at
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWrite
> r.java:167)
>
>   at
> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONRe
> sponseWriter.java:183)
>
>   at
> org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.
> java:299)
>
>   at
> org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.j
> ava:95)
>
>   at
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.j
> ava:60)
>
>   at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(Qu
> eryResponseWriterUtil.java:65)
>
>   at
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:7
> 25)
>
>   at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)

RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
Thanks Joel, I have never opened a ticket with Solr before; do you know the 
steps (url etc.) I should follow? I will be glad to do so.
In the meantime, I guess the workaround is to use the 'data import handler' to 
get the data from Oracle into Solr?

Regards,
Hui
-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 10:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

Let's open a ticket for this issue specific to Oracle.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein  wrote:

> I think we're going to have to add some debugging into the code to 
> find out what's going on. On line 225 in JDBCStream it's getting the class 
> name for each column. It would be good to know what the class names are 
> that the Oracle driver is returning.
>
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.0/
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.
> java
>
> We probably need to throw an exception that includes the class name to 
> help users report what the different drivers use for the classes.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu  wrote:
>
>> Joel - thanks for the quick response, in my previous test, the 
>> collection 'document5' does have a field called 'date_created' which 
>> is type 'date', even though my SQL SELECT below did not select any 
>> un-supported data type (all columns are either long or String in jdbc 
>> type); but to totally rule out this issue, I created a new collection 
>> 'document6' which only contain long and string data type, and a new 
>> Oracle table 'document6' that only contain columns whose jdbc type is 
>> long and string, see below for schema.xml and table definition:
>>
>> schema.xml for Solr collection 'document6': (newly created empty 
>> collections with 2 shards)
>>
>> =
>> = 
>>   
>>  
>>  
>>  > sortMissingLast="true" docValues="true" />
>>  > precisionStep="0" positionIncrementGap="0"/>
>>  
>> 
>>
>> 
>>   
>>   > sortMissingLast="true" omitNorms="true"/>
>>
>>
>>  > multiValued="false"/>
>>  > docValues="true"/>
>>  > stored="true" docValues="true"/>
>>  > stored="true" docValues="true"/>
>>  > stored="true" docValues="true"/>
>>  > stored="true" docValues="true"/>
>>
>>   document_id
>>   document_id
>> 
>>
>> Oracle table 'document6': (newly created Oracle table with 9 records) 
>> ==
>> QA_DOCREP@qlgdb1 > desc document6
>>  Name  Null?Type
>>  - 
>> 
>>  DOCUMENT_ID   NOT NULL NUMBER(12)
>>  SENDER_MSG_DESTVARCHAR2(256)
>>  RECIP_MSG_DEST VARCHAR2(256)
>>  DOCUMENT_TYPE  VARCHAR2(20)
>>  DOCUMENT_KEY   VARCHAR2(100)
>>
>> Then I tried this jdbc streaming expression in my browser, 
>> still getting the same error stack (see below); By looking at the 
>> source code you have provided below, it seems Solr is able to connect 
>> to this Oracle db, but just cannot read the resultset for some 
>> reason? Do you think it has something to do with the jdbc driver version?
>>
>> http://localhost:8988/solr/document6/stream?expr=jdbc(connection=
>> "jdbc:oracle:thin:qa_docrep/
>> abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT
>> document_id,sender_msg_dest,recip_msg_dest,document_type,document_key 
>> FROM document6",sort="document_id 
>> asc",driver="oracle.jdbc.driver.OracleDriver")
>>
>> errors in solr.log
>> ==
>> 2016-06-23 14:07:02.833 INFO  (qtp1389647288-139) [c:document6 
>> s:shard2
>> r:core_node1 x:document6_shard2_replica1] o.a.s.c.S.Requ

RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
Joel, I just opened an account for this, my user name is h...@opentext.com; let 
me know when I can open the ticket.

And thanks for the info, I will be glad to do any collaboration needed as a 
reporter on this issue, so feel free to let me know what I need to do.

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 11:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

Sure. You can create a ticket from here
https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

After you've created an account I'll need to add your username to the 
contributors group. If you post your username back to this thread I'll do that.

Then you can open a ticket.

This particular issue will require access to an Oracle database so it will 
likely be handled as a collaboration between the reporter and a committer, 
because not all committers are going to have access to Oracle.

DIH will accomplish the data load for you.

The JDBCStream can be used to do things like joins involving RDMBS and Solr.









Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 23, 2016 at 11:06 AM, Hui Liu  wrote:

> Thanks Joel, I have never opened a ticket before with Solr, do you 
> know the steps (url etc) I should follow? I will be glad to do so...
> At the meantime, I guess the workaround is to use 'data import 
> handler' to get the data from Oracle into Solr?
>
> Regards,
> Hui
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Thursday, June 23, 2016 10:55 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) 
> stream source
>
> Let's open a ticket for this issue specific to Oracle.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein 
> wrote:
>
> > I think we're going to have to add some debugging into the code to 
> > find what's going on. On line 225 in JDBCStream it's getting the 
> > class name for each column. It would be good know what the class 
> > names are that the Oracles driver is returning.
> >
> >
> > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.
> > 0/ 
> > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.
> > java
> >
> > We probably need to throw an exception that includes the class name 
> > to help users report what different drivers using for the classes.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu  wrote:
> >
> >> Joel - thanks for the quick response, in my previous test, the 
> >> collection 'document5' does have a field called 'date_created' 
> >> which is type 'date', even though my SQL SELECT below did not 
> >> select any un-supported data type (all columns are either long or 
> >> String in jdbc type); but to totally rule out this issue, I created 
> >> a new collection 'document6' which only contain long and string 
> >> data type, and a new Oracle table 'document6' that only contain 
> >> columns whose jdbc type is long and string, see below for schema.xml and 
> >> table definition:
> >>
> >> schema.xml for Solr collection 'document6': (newly created empty 
> >> collections with 2 shards)
> >>
> >> ===
> >> == = 
> >>   
> >>  
> >>  
> >>   >> sortMissingLast="true" docValues="true" />
> >>   >> precisionStep="0" positionIncrementGap="0"/>
> >>  
> >> 
> >>
> >> 
> >>   
> >>>> sortMissingLast="true" omitNorms="true"/>
> >>
> >>
> >>   >> multiValued="false"/>
> >>   >> docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>
> >>   document_id
> >>   document_id
> >> 
> >>
> >> Or

RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
Opened ticket: Issue SOLR-9246 - Errors for Streaming Expressions using JDBC 
(Oracle) stream source

Regards,
Hui
-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

Ok you should be able to create the jira.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 23, 2016 at 11:52 AM, Hui Liu  wrote:

> Joel, I just opened an account for this, my user name is 
> h...@opentext.com; let me know when I can open the ticket.
>
> And thanks for the info, I will be glad to do any collaboration needed 
> as a reporter on this issue, so feel free to let me know what I need to do.
>
> Regards,
> Hui
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Thursday, June 23, 2016 11:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) 
> stream source
>
> Sure. You can create a ticket from here
>
> https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian.
> jira.jira-projects-plugin:summary-panel
>
> After you've created an account I'll need to add your username to the 
> contributors group. If you post your username back to this thread I'll 
> do that.
>
> Then you can open a ticket.
>
> This particular issue will require access to an Oracle database so it 
> will likely be handled as a collaboration between the reporter and a 
> committer, because not all committers are going to have access to Oracle.
>
> DIH will accomplish the data load for you.
>
> The JDBCStream can be used to do things like joins involving RDMBS and 
> Solr.
>
>
>
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 23, 2016 at 11:06 AM, Hui Liu  wrote:
>
> > Thanks Joel, I have never opened a ticket before with Solr, do you 
> > know the steps (url etc) I should follow? I will be glad to do so...
> > At the meantime, I guess the workaround is to use 'data import 
> > handler' to get the data from Oracle into Solr?
> >
> > Regards,
> > Hui
> > -Original Message-
> > From: Joel Bernstein [mailto:joels...@gmail.com]
> > Sent: Thursday, June 23, 2016 10:55 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) 
> > stream source
> >
> > Let's open a ticket for this issue specific to Oracle.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein 
> > 
> > wrote:
> >
> > > I think we're going to have to add some debugging into the code to 
> > > find what's going on. On line 225 in JDBCStream it's getting the 
> > > class name for each column. It would be good know what the class 
> > > names are that the Oracles driver is returning.
> > >
> > >
> > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.
> > > 0/
> > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.
> > > java
> > >
> > > We probably need to throw an exception that includes the class 
> > > name to help users report what different drivers using for the classes.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu  wrote:
> > >
> > >> Joel - thanks for the quick response, in my previous test, the 
> > >> collection 'document5' does have a field called 'date_created'
> > >> which is type 'date', even though my SQL SELECT below did not 
> > >> select any un-supported data type (all columns are either long or 
> > >> String in jdbc type); but to totally rule out this issue, I 
> > >> created a new collection 'document6' which only contain long and 
> > >> string data type, and a new Oracle table 'document6' that only 
> > >> contain columns whose jdbc type is long and string, see below for 
> > >> schema.xml
> and table definition:
> > >>
> > >> schema.xml for Solr collection 'document6': (newly created empty 
> > >> collections with 2 shards)
> > >>
> > >> =
> > >> == == = 
> > >>   

RE: remove user defined duplicate from search result

2016-09-26 Thread Yongtao Liu
Sorry, the table was missing.
Below is the previous email again, updated with the table.

-Original Message-
From: Yongtao Liu [mailto:y...@commvault.com] 
Sent: Monday, September 26, 2016 10:47 AM
To: 'solr-user@lucene.apache.org'
Subject: remove user defined duplicate from search result

Hi,

I am trying to remove user-defined duplicates from the search result.

For example, the documents below match the query.
When the query returns, I try to remove doc3 from the result since it has a 
duplicate guid with doc1.

id (uniqueKey)   guid
doc1             G1
doc2             G2
doc3             G1

To do this, I generate exclude list based guid field terms.
For each term, we add from the second document to exclude list.
And add these docs to QueryCommand filter.

If there any better approach to handler this requirement?


Below is the code change in SolrIndexSearcher.java:

  private TreeMap<String, BitDocSet> dupDocs = null;

  public QueryResult search(QueryResult qr, QueryCommand cmd) throws 
IOException {
if (cmd.getUniqueField() != null)
{
  DocSet filter = getDuplicateByField(cmd.getUniqueField());
  if (cmd.getFilter() != null) cmd.getFilter().addAllTo(filter);
  cmd.setFilter(filter);
}

getDocListC(qr,cmd);

return qr;
  }

  private synchronized BitDocSet getDuplicateByField(String field) throws 
IOException
  {
if (dupDocs != null && dupDocs.containsKey(field)) {
  return dupDocs.get(field);
}

if (dupDocs == null)
{
  dupDocs = new TreeMap<String, BitDocSet>();
}

LeafReader reader = getLeafReader();

BitDocSet res = new BitDocSet(new FixedBitSet(maxDoc()));

Terms terms = reader.terms(field);

if (terms == null)
{
  dupDocs.put(field, res);
  return res;
}

TermsEnum termEnum = terms.iterator();
PostingsEnum docs = null;
BytesRef term = null;
while ((term = termEnum.next()) != null) {
  docs = termEnum.postings(docs, PostingsEnum.NONE);

  // skip the first document for each term; the rest are duplicates
  docs.nextDoc();

  int docID = 0;
  while ((docID = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS)
  {
res.add(docID);
  }
}

dupDocs.put(field, res);
return res;
  }

Thanks,
Yongtao


remove user defined duplicate from search result

2016-09-26 Thread Yongtao Liu
Hi,

I am trying to remove user-defined duplicates from the search result.

For example, the documents below match the query.
When the query returns, I want to remove doc3 from the result since it has a
duplicate guid with doc1.

id (uniqueKey)   guid
doc1             G1
doc2             G2
doc3             G1

To do this, I generate an exclude list based on the guid field terms.
For each term, we add every document after the first to the exclude list.
Then we add these docs to the QueryCommand filter.

Is there a better approach to handle this requirement?


Below is the code change in SolrIndexSearcher.java:

  private TreeMap<String, BitDocSet> dupDocs = null;

  public QueryResult search(QueryResult qr, QueryCommand cmd) throws 
IOException {
if (cmd.getUniqueField() != null)
{
  DocSet filter = getDuplicateByField(cmd.getUniqueField());
  if (cmd.getFilter() != null) cmd.getFilter().addAllTo(filter);
  cmd.setFilter(filter);
}

getDocListC(qr,cmd);

return qr;
  }

  private synchronized BitDocSet getDuplicateByField(String field) throws 
IOException
  {
if (dupDocs != null && dupDocs.containsKey(field)) {
  return dupDocs.get(field);
}

if (dupDocs == null)
{
  dupDocs = new TreeMap<String, BitDocSet>();
}

LeafReader reader = getLeafReader();

BitDocSet res = new BitDocSet(new FixedBitSet(maxDoc()));

Terms terms = reader.terms(field);

if (terms == null)
{
  dupDocs.put(field, res);
  return res;
}

TermsEnum termEnum = terms.iterator();
PostingsEnum docs = null;
BytesRef term = null;
while ((term = termEnum.next()) != null) {
  docs = termEnum.postings(docs, PostingsEnum.NONE);

  // skip the first document for each term; the rest are duplicates
  docs.nextDoc();

  int docID = 0;
  while ((docID = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS)
  {
res.add(docID);
  }
}

dupDocs.put(field, res);
return res;
  }

Thanks,
Yongtao


RE: how to sampling search result

2016-09-27 Thread Yongtao Liu
Mikhail,

Thanks for your reply.

A random field is assigned at index time.
We want to sample based on the search result.

For example, if the random field has values 1 - 100, the documents touched by
the query may all fall in the range 90 - 100, so a random field will not help.

Is it possible to sample based on the search result?

Thanks,
Yongtao
-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: Tuesday, September 27, 2016 11:16 AM
To: solr-user
Subject: Re: how to sampling search result

Perhaps, you can apply a filter on random field.

On Tue, Sep 27, 2016 at 5:57 PM, googoo  wrote:

> Hi,
>
> Is it possible I can sampling based on  "search result"?
> Like run query first, and search result return 1 million documents.
> With random sampling, 50% (500K) documents return for facet, and stats.
>
> The sampling need based on "search result".
>
> Thanks,
> Yongtao
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/how-to-sampling-search-result-tp4298269.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


RE: how to remove duplicate from search result

2016-09-27 Thread Yongtao Liu
David,

Thanks for your reply.

Grouping cannot solve the issue.
We also need to run facets and stats based on the search result.
With grouping, the facet and stats results still count duplicates.

Thanks,
Yongtao
-Original Message-
From: David Santamauro [mailto:david.santama...@gmail.com] 
Sent: Tuesday, September 27, 2016 11:35 AM
To: solr-user@lucene.apache.org
Cc: david.santama...@gmail.com
Subject: Re: how to remove duplicate from search result

Have a look at

https://cwiki.apache.org/confluence/display/solr/Result+Grouping


On 09/27/2016 11:03 AM, googoo wrote:
> hi,
>
> We want to provide remove duplicate from search result function.
>
> like we have below documents.
> id(uniqueKey) guid
> doc1  G1
> doc2  G2
> doc3  G3
> doc4  G1
>
> user run one query and hit doc1, doc2 and doc4.
> user want to remove duplicate from search result based on guid field.
> since doc1 and doc4 has same guid, one of them should be drop from 
> search result.
>
> how we can address this requirement?
>
> Thanks,
> Yongtao
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search
> -result-tp4298272.html Sent from the Solr - User mailing list archive 
> at Nabble.com.
>


RE: how to remove duplicate from search result

2016-09-27 Thread Yongtao Liu
Shamik,

Thanks a lot.
The collapsing query parser solved the issue.

Thanks,
Yongtao
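
For reference, a minimal sketch of the collapsing filter we ended up using
(the field name matches the example above; the main query is a placeholder):

    q=<your query>&fq={!collapse field=guid}

With collapse/expand, the result set is de-duplicated before it reaches the
downstream components, so facets and stats are computed on the collapsed set,
which is why this works here where grouping did not.
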
-Original Message-
From: shamik [mailto:sham...@gmail.com] 
Sent: Tuesday, September 27, 2016 3:09 PM
To: solr-user@lucene.apache.org
Subject: RE: how to remove duplicate from search result

Did you take a look at Collapsin Query Parser ?

https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search-result-tp4298272p4298305.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: how to sampling search result

2016-09-28 Thread Yongtao Liu
Alexandre,

Thanks for the reply.
The use case is that a customer wants to review documents based on a search
result, but they do not want to review all of them, since that is costly.
So they want to pick a portion (from 1% to 100%) of the documents to review.
Users also ask for this for statistics.
It is a fairly common requirement.
Do you know of any plan to implement this feature in the future?

A post filter should work, much like the collapsing query parser.

Thanks,
Yongtao
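
For what it's worth, a rough, untested sketch of such a sampling post filter
(the class name and the "rate" parameter are made up for illustration; it would
still need a small QParserPlugin so it could be used as e.g. fq={!sample rate=0.5}):

    import java.io.IOException;
    import java.util.Random;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.solr.search.DelegatingCollector;
    import org.apache.solr.search.ExtendedQueryBase;
    import org.apache.solr.search.PostFilter;

    // Keeps roughly "rate" of the documents matched by the main query, so the
    // downstream facet/stats components only see the sampled subset.
    public class SamplingQuery extends ExtendedQueryBase implements PostFilter {

      private final double rate; // fraction of matching docs to keep, 0.0 - 1.0

      public SamplingQuery(double rate) {
        this.rate = rate;
        setCache(false); // post filters must not be cached
        setCost(100);    // cost >= 100 marks this as a post filter
      }

      @Override
      public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
          private final Random rnd = new Random();

          @Override
          public void collect(int doc) throws IOException {
            if (rnd.nextDouble() < rate) {
              super.collect(doc); // forward the doc to the rest of the chain
            }
          }
        };
      }

      @Override
      public boolean equals(Object other) {
        return other instanceof SamplingQuery && ((SamplingQuery) other).rate == rate;
      }

      @Override
      public int hashCode() {
        return Double.hashCode(rate);
      }
    }
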
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Tuesday, September 27, 2016 9:25 PM
To: solr-user
Subject: Re: how to sampling search result

I am not sure I understand what the business case is. However, you might be 
able to do something with a custom post-filter.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 27 September 2016 at 22:29, Yongtao Liu  wrote:
> Mikhail,
>
> Thanks for your reply.
>
> Random field is based on index time.
> We want to do sampling based on search result.
>
> Like if the random field has value 1 - 100.
> And the query touched documents may all in range 90 - 100.
> So random field will not help.
>
> Is it possible we can sampling based on search result?
>
> Thanks,
> Yongtao
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Tuesday, September 27, 2016 11:16 AM
> To: solr-user
> Subject: Re: how to sampling search result
>
> Perhaps, you can apply a filter on random field.
>
> On Tue, Sep 27, 2016 at 5:57 PM, googoo  wrote:
>
>> Hi,
>>
>> Is it possible I can sampling based on  "search result"?
>> Like run query first, and search result return 1 million documents.
>> With random sampling, 50% (500K) documents return for facet, and stats.
>>
>> The sampling need based on "search result".
>>
>> Thanks,
>> Yongtao
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/how-to-sampling-search-result-tp4298269.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Solrcloud updating issue.

2017-06-29 Thread Wudong Liu
Hi All:
We are trying to index a large number of documents in SolrCloud and keep
seeing the following error: org.apache.solr.common.SolrException: Service
Unavailable

always with a similar stack trace:

request: http://wp-np2-c0:8983/solr/uniprot/update?wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$57/936653983.run(Unknown
Source)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


The setup is:
5 nodes in the cluster, each with 16 GB of memory. The collection is defined
with 5 shards and a replication factor of 2. The total number of documents is
about 90 million, and each document is quite large as well.
We also run a 5-instance ZooKeeper ensemble, one instance per node.

On the Solr side, we can see errors like:
solr.log.3-Error from server at
http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1: Server Error
solr.log.3-request:
http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fwp-np2-c0.ebi.ac.uk%3A8983%2Fsolr%2Funiprot_shard2_replica1%2F&wt=javabin&version=2
solr.log.3-Remote error message: Async exception during distributed update:
Connect to wp-np2-c2.ebi.ac.uk:8983 timed out
solr.log.3- at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:948)
solr.log.3- at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1679)
solr.log.3- at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
--
solr.log.3- at
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
solr.log.3- at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
solr.log.3- at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
solr.log.3- at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
solr.log.3- at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
solr.log.3- at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
solr.log.3- at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
solr.log.3- at java.lang.Thread.run(Thread.java:745)


The strange bit is that this exception doesn't seem to be captured by the
try/catch block in our main thread, and the cluster seems to be in good
health (all nodes up) after the job is done; we are just missing lots of
documents!
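
One likely reason the client-side try/catch never fires: ConcurrentUpdateSolrClient
streams updates from background threads, and failures are reported to its
handleError() callback rather than thrown to the caller. A minimal sketch
(the URL, queue size and thread count are placeholders):

    ConcurrentUpdateSolrClient client =
        new ConcurrentUpdateSolrClient("http://wp-np2-c0:8983/solr/uniprot", 1000, 4) {
          @Override
          public void handleError(Throwable ex) {
            // called from the streaming threads; errors never reach the thread
            // that called add(), so record/log failed batches here
            ex.printStackTrace();
          }
        };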

any suggestion where we should look to resolve this problem?

Best Regards,
Wudong


Solr deltaImportQuery ID configuration

2017-08-23 Thread Liu, Daphne
Hello,
   I am using Solr 6.3.0. Does anyone know, when referencing the id in a
deltaImportQuery, whether I should use '${dih.delta.id}' or '${dataimporter.delta.id}'?
   Both were mentioned in the Delta-Import wiki. I am confused. Thank you.
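
For what it's worth, a typical delta entity looks like the sketch below (the
table and column names are placeholders); as far as I know, recent DIH versions
treat 'dih' as an alias for 'dataimporter', so both prefixes resolve to the
same value:

    <entity name="item" pk="ID"
            query="SELECT * FROM item"
            deltaQuery="SELECT ID FROM item
                        WHERE last_modified &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT * FROM item WHERE ID = '${dih.delta.ID}'"/>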

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com<http://www.cevalogistics.com/> T 904.564.1192 / F 
904.928.1448 / daphne@cevalogistics.com<mailto:daphne@cevalogistics.com>





Query/Field Index Analysis corrected but return no docs in search

2017-02-04 Thread Peter Liu
hi all:
   I am using Solr 3.6 and tried to solve a recall problem today, but
encountered something weird.

   There's a doc with the field value 均匀肤色 (just treat that word as a symbol
if you don't know it; I just want to describe the problem as exactly as
possible).


   Below is the analysis (tokenization) result as text (the original mail had a
screenshot of the Analysis page):

Index Analyzer
均匀肤色 均匀 匀肤 肤色
均匀肤色 均匀 匀肤 肤色
均匀肤色 均匀 匀肤 肤色
Query Analyzer
均匀肤色
均匀肤色
均匀肤色
均匀肤色


The tokenization result indicates the query should undoubtedly recall/hit the
doc. But the doc does not appear in the result when I search with "均匀肤色". I
tried to simplify the qf/bf/fq/q and tested with a single field and a single
document, to make sure it was not caused by other problems, but the search
still failed.


It's knotty to debug because it only reproduces in the production environment;
I tried the same config/index/query but could not reproduce it in the dev
environment. I'm asking for help in case you have met a similar problem; any
clues or debugging methods would be really appreciated. 😶
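
One way to narrow down an environment-specific mismatch like this is to compare
the parsed query and the score explanation in both environments, e.g.:

    http://host:port/solr/core/select?q=均匀肤色&qf=...&debugQuery=true

debugQuery=true returns the parsed/rewritten query and an "explain" section for
each hit, which usually shows whether the analysis chain or the query parser
behaves differently between the two setups. (Host, core and the other
parameters are placeholders.)
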


Delta Import JDBC connection frame size larger than max length

2017-03-01 Thread Liu, Daphne
Hello Solr experts,
   Is there a place in Solr (the Data Import dataSource?) where I can adjust the
JDBC connection frame size to 256 MB? I have adjusted the settings in
Cassandra but I'm still getting this error:
   NonTransientConnectionException:
org.apache.thrift.transport.TTransportException: Frame size (17676563) larger
than max length (16384000)
   Thank you.
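
In case it helps: DIH's JdbcDataSource passes any extra attributes on the
<dataSource> element straight through to the driver as connection properties,
so if the Cassandra JDBC driver exposes a frame-size setting it can be set
there. A sketch (the driver class, URL and the property name are placeholders;
check your driver's documentation for the real names):

    <dataSource type="JdbcDataSource"
                driver="org.apache.cassandra.cql.jdbc.CassandraDriver"
                url="jdbc:cassandra://host:9160/keyspace"
                user="..." password="..."
                thriftFramedTransportSizeInMb="256"/>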

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com<http://www.cevalogistics.com/> T 904.564.1192 / F 
904.928.1448 / daphne@cevalogistics.com<mailto:daphne@cevalogistics.com>




RE: Data Import Handler on 6.4.1

2017-03-15 Thread Liu, Daphne
For Solr 6.3, I had to move mine to
../solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib, if you are using the bundled Jetty.
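
An alternative to copying jars into the webapp is to load them with <lib>
directives in solrconfig.xml; a minimal sketch assuming the stock 6.x directory
layout (the driver directory is a placeholder):

    <lib dir="${solr.install.dir:../../../..}/dist/"
         regex="solr-dataimporthandler-.*\.jar"/>
    <lib dir="/path/to/jdbc-drivers/" regex="mysql-connector-java-.*\.jar"/>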

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com


-Original Message-
From: Michael Tobias [mailto:mtob...@btinternet.com]
Sent: Wednesday, March 15, 2017 2:36 PM
To: solr-user@lucene.apache.org
Subject: Data Import Handler on 6.4.1

I am sure I am missing something simple but

I am running Solr 4.8.1 and trialling 6.4.1 on another computer.

I have had to manually modify the automatic 6.4.1 schema config as we use a set
of specialised field types. They work fine.

I am now trying to populate my core with data and having problems.

Exactly what names/paths should I be using in the solrconfig.xml file to get 
this working - I don’t recall doing ANYTHING for 4.8.1


   ?

And where do I put the mysql-connector-java-5.1.29-bin.jar file and how do I 
reference it to get it loaded?


??

And then later in the solrconfig.xml I have:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>


Any help much appreciated.

Regards

Michael


-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: 15 March 2017 17:47
To: solr-user@lucene.apache.org
Subject: Re: Get handler not working

from your previous email:
"There is no "id"
field defined in the schema."

you need an id field to use the get handler

On Wed, Mar 15, 2017 at 1:45 PM, Chris Ulicny  wrote:

> I thought that "id" and "ids" were fixed parameters for the get
> handler, but I never remember, so I've already tried both. Each time
> it comes back with the same response of no document.
>
> On Wed, Mar 15, 2017 at 1:31 PM Alexandre Rafalovitch
> 
> wrote:
>
> > Actually.
> >
> > I think Real Time Get handler has "id" as a magical parameter, not
> > as a field name. It maps to the real id field via the uniqueKey
> > definition:
> > https://cwiki.apache.org/confluence/display/solr/RealTime+Get
> >
> > So, if you have not, could you try the way you originally wrote it.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 15 March 2017 at 13:22, Chris Ulicny  wrote:
> > > Sorry, that is a typo. The get is using the iqdocid field. There
> > > is no
> > "id"
> > > field defined in the schema.
> > >
> > > solr/TestCollection/get?iqdocid=2957-TV-201604141900
> > >
> > > solr/TestCollection/select?q=*:*&fq=iqdocid:2957-TV-201604141900
> > >
> > > On Wed, Mar 15, 2017 at 1:15 PM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Is this a typo or are you trying to use get with an "id" field
> > >> and your filter query uses "iqdocid"?
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Wed, Mar 15, 2017 at 8:31 AM, Chris Ulicny 
> wrote:
> > >> > Yes, we're using a fixed schema with the iqdocid field set as
> > >> > the
> > >> uniqueKey.
> > >> >
> > >> > On Wed, Mar 15, 2017 at 11:28 AM Alexandre Rafalovitch <
> > >> arafa...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> What is your uniqueKey? Is it iqdocid?
> > >> >>
> > >> >> Regards,
> > >> >>Alex.
> > >> >> 
> > >> >> http://www.solr-start.com/ - Resources for Solr users, new and
> > >> experienced
> > >> >>
> > >> >>
> > >> >> On 15 March 2017 at 11:24, Chris Ulicny  wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I've been trying to use the get handler for a new solr cloud
> > >> collection
> > >> >> we
> > >> >> > are using, and something seems to be amiss.
> > >> >> >
> > >> >> > We are running 6.3.0, so we did not explicitly define the
> > >> >> > request
> > >> handler
> > >> >> > in the solrconfig since it's supposed to be implicitly defined.
> We
> > >> also
> > >> >> > have the update log enabled with the default configuration.
> > >> >> >
> > >> >> > Whene

RE: Data Import

2017-03-17 Thread Liu, Daphne
I just want to share my recent project. I have successfully loaded all our EDI
documents into Cassandra 3.7 clusters and index them with Solr 6.3's Data
Import Handler through a JDBC Cassandra connector.
Cassandra is very fast for writing, the compression rate is around 13%, and all
my documents can be kept in my Cassandra clusters' memory, so we are very happy
with the result.


Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, March 17, 2017 9:54 AM
To: solr-user 
Subject: Re: Data Import

I feel DIH is much better for prototyping, even though people do use it in 
production. If you do want to use DIH, you may benefit from reviewing the 
DIH-DB example I am currently rewriting in
https://issues.apache.org/jira/browse/SOLR-10312 (may need to change 
luceneMatchVersion in solrconfig.xml first).

CSV, etc, could be useful if you want to keep history of past imports, again 
useful during development, as you evolve schema.

SolrJ may actually be easiest/best for production since you already have Java 
stack.

The choice is yours in the end.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> On 3/17/2017 3:04 AM, vishal jain wrote:
>> I am new to Solr and am trying to move data from my RDBMS to Solr. I know 
>> the available options are:
>> 1) Post Tool
>> 2) DIH
>> 3) SolrJ (as ours is a J2EE application).
>>
>> I want to know what is the recommended way for Data import in
>> production environment. Will sending data via SolrJ in batches be faster 
>> than posting a csv using POST tool?
>
> I've heard that CSV import runs EXTREMELY fast, but I have never
> tested it.  The same threading problem that I discuss below would
> apply to indexing this way.
>
> DIH is extremely powerful, but it has one glaring problem:  It's
> single-threaded, which means that only one stream of data is going
> into Solr, and each batch of documents to be inserted must wait for
> the previous one to finish inserting before it can start.  I do not
> know if DIH batches documents or sends them in one at a time.  If you
> have a manually sharded index, you can run DIH on each shard in
> parallel, but each one will be single-threaded.  That single thread is
> pretty efficient, but it's still only one thread.
>
> Sending multiple index updates to Solr in parallel (multi-threading)
> is how you radically speed up the Solr part of indexing.  This is
> usually done with a custom indexing program, which might be written
> with SolrJ or even in a completely different language.
>
> One thing to keep in mind with ANY indexing method:  Once the
> situation is examined closely, most people find that it's not Solr
> that makes their indexing slow.  The bottleneck is usually the source
> system -- how quickly the data can be retrieved.  It usually takes a
> lot longer to obtain the data than it does for Solr to index it.
>
> Thanks,
> Shawn
>
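
A bare-bones sketch of the multi-threaded, batched SolrJ indexing Shawn
describes above (the ZooKeeper hosts, collection name, batch size and thread
count are placeholders, and the loop stands in for reading from the RDBMS):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {
      public static void main(String[] args) throws Exception {
        CloudSolrClient solr = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181,zk2:2181,zk3:2181").build();
        solr.setDefaultCollection("mycollection");

        ExecutorService pool = Executors.newFixedThreadPool(8); // parallel update streams
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {      // stand-in for your RDBMS reader
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-" + i);
          doc.addField("title_s", "document " + i);
          batch.add(doc);
          if (batch.size() == 1000) {              // send batches, not single documents
            final List<SolrInputDocument> toSend = batch;
            batch = new ArrayList<>();
            pool.submit(() -> {
              try {
                solr.add(toSend);                  // each batch is sent on its own thread
              } catch (Exception e) {
                e.printStackTrace();               // handle/log failed batches
              }
            });
          }
        }
        if (!batch.isEmpty()) {
          solr.add(batch);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        solr.commit();
        solr.close();
      }
    }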


RE: Data Import

2017-03-17 Thread Liu, Daphne
No, I use the free version. I got the driver from someone else; I can share it
if you want to use Cassandra.
They modified it for me, since the free JDBC driver I found would time out
when a document is greater than 16 MB.

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: vishal jain [mailto:jain02...@gmail.com]
Sent: Friday, March 17, 2017 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Data Import

Hi Daphne,

Are you using DSE?


Thanks & Regards,
Vishal

On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne 
wrote:

> I just want to share my recent project. I have successfully sent all
> our EDI documents to Cassandra 3.7 clusters using Solr 6.3 Data Import
> JDBC Cassandra connector indexing our documents.
> Since Cassandra is so fast for writing, compression rate is around 13%
> and all my documents can be keep in my Cassandra clusters' memory, we
> are very happy with the result.
>
>
> Kind regards,
>
> Daphne Liu
> BI Architect - Matrix SCM
>
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL
> 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 /
> daphne@cevalogistics.com
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, March 17, 2017 9:54 AM
> To: solr-user 
> Subject: Re: Data Import
>
> I feel DIH is much better for prototyping, even though people do use
> it in production. If you do want to use DIH, you may benefit from
> reviewing the DIH-DB example I am currently rewriting in
> https://issues.apache.org/jira/browse/SOLR-10312 (may need to change
> luceneMatchVersion in solrconfig.xml first).
>
> CSV, etc, could be useful if you want to keep history of past imports,
> again useful during development, as you evolve schema.
>
> SolrJ may actually be easiest/best for production since you already
> have Java stack.
>
> The choice is yours in the end.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
>
>
> On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> > On 3/17/2017 3:04 AM, vishal jain wrote:
> >> I am new to Solr and am trying to move data from my RDBMS to Solr.
> >> I
> know the available options are:
> >> 1) Post Tool
> >> 2) DIH
> >> 3) SolrJ (as ours is a J2EE application).
> >>
> >> I want to know what is the recommended way for Data import in
> >> production environment. Will sending data via SolrJ in batches be
> faster than posting a csv using POST tool?
> >
> > I've heard that CSV import runs EXTREMELY fast, but I have never
> > tested it.  The same threading problem that I discuss below would
> > apply to indexing this way.
> >
> > DIH is extremely powerful, but it has one glaring problem:  It's
> > single-threaded, which means that only one stream of data is going
> > into Solr, and each batch of documents to be inserted must wait for
> > the previous one to finish inserting before it can start.  I do not
> > know if DIH batches documents or sends them in one at a time.  If
> > you have a manually sharded index, you can run DIH on each shard in
> > parallel, but each one will be single-threaded.  That single thread
> > is pretty efficient, but it's still only one thread.
> >
> > Sending multiple index updates to Solr in parallel (multi-threading)
> > is how you radically speed up the Solr part of indexing.  This is
> > usually done with a custom indexing program, which might be written
> > with SolrJ or even in a completely different language.
> >
> > One thing to keep in mind with ANY indexing method:  Once the
> > situation is examined closely, most people find that it's not Solr
> > that makes their indexing slow.  The bottleneck is usually the
> > source system -- how quickly the data can be retrieved.  It usually
> > takes a lot longer to obtain the data than it does for Solr to index it.
> >
> > Thanks,
> > Shawn
> >

Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Wudong Liu
Hi All:

We have a normal build/stage -> prod setup for our production pipeline.
We build the Solr index in the build environment and then copy the index to
the prod environment.

The SolrCloud in prod seems to work fine when the file system backing it is
writable. However, we see many errors when the file system is read-only.
Many exceptions are thrown because the tlog files cannot be opened for writing
when the Solr nodes are restarted with the new data; some of the nodes
eventually get stuck in the recovering phase and are never able to come back
online in the cloud.

Just wondering if anyone has experience with SolrCloud running on a read-only
file system? Is it possible at all?

Regards,
Wudong


Re: Multiple Languages in Same Core

2014-03-26 Thread Liu Bo
Hi Jeremy

There have been a lot of multi-language discussions; there are two main approaches:
 1. like yours, one core per language
 2. all in one core, where each language has its own field.

We have multi-language support in a single core; each multilingual field
has its own suffix, such as name_en_US. We customized the query handler to hide
the query details from the client.
The main reason we do this is NRT indexing and search.
Take products for example:

a product has price and quantity, which are common fields used for filtering
and sorting, while name and description are multi-language fields;
if we split products into different cores, updating a common field
may end up requiring an update in all of the language cores.

As to scalability, we don't change Solr cores/collections when a new
language is added, but we probably need to update our customized index process
and run a full re-index.

This approach suits our requirements for now, but you may have your own
concerns.

We have a similar "suggest filter" problem to yours: we want to return
suggest results filtered by store. I can't find a way to build the dictionary
with a query in my version, Solr 4.6.

What I do is run a query on an N-gram analyzed field with filter queries
on the store_id field. The "suggest" is actually a query. It may not perform as
well as the suggester, but it does the trick.

You could build an additional N-gram field for suggestions only and
search on it with an fq on your "Locale" field, as in the sketch below.
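
A rough sketch of what that could look like (the field and type names are made
up, and the gram sizes would need tuning):

    <fieldType name="suggest_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="suggest_text" type="suggest_ngram" indexed="true" stored="true"/>

and then query it with the locale filter applied:

    /select?q=suggest_text:chocola&fq=locale:en&fl=suggest_text&rows=5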

All the best

Liu Bo




On 25 March 2014 09:15, Alexandre Rafalovitch  wrote:

> Solr In Action has a significant discussion on the multi-lingual
> approach. They also have some code samples out there. Might be worth a
> look
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson
>  wrote:
> > I recently deployed Solr to back the site search feature of a site I work
> > on. The site itself is available in hundreds of languages. With the
> initial
> > release of site search we have enabled the feature for ten of those
> > languages. This is distributed across eight cores, with two Chinese
> > languages plus Korean combined into one CJK core and each of the other
> > seven languages in their own individual cores. The reason for splitting
> > these into separate cores was so that we could have the same field names
> > across all cores but have different configuration for analyzers, etc, per
> > core.
> >
> > Now I have some questions on this approach.
> >
> > 1) Scalability: Considering I need to scale this to many dozens more
> > languages, perhaps hundreds more, is there a better way so that I don't
> end
> > up needing dozens or hundreds of cores? My initial plan was that many
> > languages that didn't have special support within Solr would simply get
> > lumped into a single "default" core that has some default analyzers that
> > are applicable to the majority of languages.
> >
> > 1b) Related to this: is there a practical limit to the number of cores
> that
> > can be run on one instance of Lucene?
> >
> > 2) Auto Suggest: In phase two I intend to add auto-suggestions as a user
> > types a query. In reviewing how this is implemented and how the
> suggestion
> > dictionary is built I have concerns. If I have more than one language in
> a
> > single core (and I keep the same field name for suggestions on all
> > languages within a core) then it seems that I could get suggestions from
> > another language returned with a suggest query. Is there a way to build a
> > separate dictionary for each language, but keep these languages within
> the
> > same core?
> >
> > If it's helpful to know: I have a field in every core for "Locale".
> Values
> > will be the locale of the language of that document, i.e. "en", "es",
> > "zh_hans", etc. I'd like to be able to: 1) when building a suggestion
> > dictionary, divide it into multiple dictionaries, grouping them by
> locale,
> > and 2) supply a parameter to the suggest query that allows the suggest
> > component to only return suggestions from the appropriate dictionary for
> > that locale.
> >
> > If the answer to #1 is "keep splitting groups of languages that have
> > different analyzers into their own cores" and the answer to #2 is "that's
> > not supported", then I'd be curious: where would I start to write my own
> > extension that supported #2? I looked last night at the suggest lookup
> > classes, dictionary classes, etc. But I didn't see a clear point where it
> > would be clean to implement something like I'm suggesting above.
> >
> > Best Regards,
> > Jeremy Thomerson
>



-- 
All the best

Liu Bo


Re: Where to specify numShards when startup up a cloud setup

2014-04-18 Thread Liu Bo
Hi zzT

Putting numShards in core.properties also works.

I struggled a little bit while figuring out this "configuration approach".
I knew I was not alone! ;-)
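
For reference, a sketch of such a core.properties (the names and counts are
illustrative):

    name=mycollection_shard1_replica1
    collection=mycollection
    collection.configName=myconfig
    numShards=2
    shard=shard1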


On 2 April 2014 18:06, zzT  wrote:

> It seems that I've figured out a "configuration approach" to this issue.
>
> I'm having the exact same issue and the only viable solutions found on the
> net till now are
> 1) Pass -DnumShards=x when starting up Solr server
> 2) Use the Collections API as indicated by Shawn.
>
> What I've noticed though - after making the call to /collections to create
> a
> node solr.xml - is that a new  entry is added inside solr.xml with
> the
> attribute "numShards".
>
> So, right now I'm configuring solr.xml with numShards attribute inside my
>  nodes. This way I don't have to worry with annoying stuff you've
> already mentioned e.g. waiting for Solr to start up etc.
>
> Of course same logic applies here, numShards param is meanigful only the
> first time. Even if you change it at a later point the # of shards stays
> the
> same.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Where-to-specify-numShards-when-startup-up-a-cloud-setup-tp4078473p4128566.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
All the best

Liu Bo


solr always loading and not any response

2014-07-24 Thread zhijun liu
Hi all, the Solr admin page is always "loading", and when I send a query
request I also cannot get any response. The TCP connection is always
"ESTABLISHED". Only restarting the Solr service fixes it. How can I find out
what the problem is?

solr:4.6
jetty:8

thanks so much.


Re: SolrDocumentList - bitwise operation

2013-10-13 Thread Liu Bo
A join query might be helpful: http://wiki.apache.org/solr/Join

Join can work across indexes (cores), but it probably won't work in SolrCloud.

Be aware that only the "to" documents are retrievable; if you want content from
both documents, a join query won't work. Also, in Lucene the join query doesn't
quite work with multiple join conditions; I haven't tested that in Solr yet.

I had a similar join case to yours; eventually I chose to denormalize our
data into one set of documents.
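
For reference, the cross-core join form looks roughly like this (core and field
names are placeholders); the embedded query runs against coreA and only coreB
documents come back:

    /solr/coreB/select?q={!join from=docId to=id fromIndex=coreA}title:foo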


On 13 October 2013 22:34, Michael Tyler  wrote:

> Hello,
>
> I have 2 different solr indexes returning 2 different sets of
> SolrDocumentList. Doc Id is the foreign key relation.
>
> After obtaining them, I want to perform "AND" operation between them and
> then return results to user. Can you tell me how do I get this? I am using
> solr 4.3
>
>  SolrDocumentList results1 = responseA.getResults();
>  SolrDocumentList results2 = responseB.getResults();
>
> results1  : d1, d2, d3
> results2  :  d1,d2, d4
>
> Return : d1, d2
>
> Regards,
> Michael
>



-- 
All the best

Liu Bo


how does solr load plugins?

2013-10-16 Thread Liu Bo
Hi

I wrote a plugin to index content, reusing our DAO layer, which is developed
using Spring.

What I am doing now is putting the plugin jar and all the other jars the DAO
layer depends on into the shared lib folder under the Solr home.

In the log, I can see all the jars are loaded through SolrResourceLoader
like:

INFO  - 2013-10-16 16:25:30.611; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/apache-tomcat-7.0.42/solr/lib/spring-tx-3.1.0.RELEASE.jar'
to classloader


Then initialize the Spring context using:

ApplicationContext context = new
FileSystemXmlApplicationContext("/solr/spring/solr-plugin-bean-test.xml");


Then Spring will complain:

INFO  - 2013-10-16 16:33:57.432;
org.springframework.context.support.AbstractApplicationContext; Refreshing
org.springframework.context.support.FileSystemXmlApplicationContext@e582a85:
startup date [Wed Oct 16 16:33:57 CST 2013]; root of context hierarchy
INFO  - 2013-10-16 16:33:57.491;
org.springframework.beans.factory.xml.XmlBeanDefinitionReader; Loading XML
bean definitions from file
[D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]
ERROR - 2013-10-16 16:33:59.944;
com.test.search.solr.spring.AppicationContextWrapper; Configuration
problem: Unable to locate Spring NamespaceHandler for XML schema namespace [
http://www.springframework.org/schema/context]
Offending resource: file
[D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]

The Spring context requires spring-tx-3.1.xsd, which does exist
in spring-tx-3.1.0.RELEASE.jar under the
"org/springframework/transaction/config/" package, but the program can't
find it even though it loads the Spring classes successfully.

The following won't work either.

ApplicationContext context = new
ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml");
//the solr-plugin-bean-test.xml is packaged in plugin.jar as well.

But when I put all the jars under TOMCAT_HOME/webapps/solr/WEB-INF/lib and
use

ApplicationContext context = new
ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml");

everything works fine: I can initialize the Spring context and load DAO beans
to read data and then write them to the Solr index. But isn't modifying
solr.war a bad practice?

It seems SolrResourceLoader only loads classes from the plugin jars, but these
jars are NOT on the classpath. Please correct me if I am wrong.

Is there any way to use resources in plugin jars, such as configuration
files?

BTW, is there any difference between the SolrResourceLoader and the Tomcat
webapp classloader?
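
One thing that may be worth trying (an untested sketch): as far as I know,
Spring resolves namespace handlers and XSDs through the thread context
classloader, so temporarily pointing it at the classloader that loaded the
plugin (the one SolrResourceLoader built from the shared lib folder) before
creating the context might help:

    ClassLoader pluginLoader = getClass().getClassLoader(); // loader that saw the shared-lib jars
    ClassLoader previous = Thread.currentThread().getContextClassLoader();
    Thread.currentThread().setContextClassLoader(pluginLoader);
    try {
        ApplicationContext context = new ClassPathXmlApplicationContext(
                "classpath:spring/solr-plugin-bean-test.xml");
        // ... look up the DAO beans from the context here
    } finally {
        Thread.currentThread().setContextClassLoader(previous);
    }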

-- 
All the best

Liu Bo


Re: eDisMax, multiple language support and stopwords

2013-11-11 Thread Liu Bo
Happy to see someone with a similar solution to ours.

We have a similar multi-language search feature, and we index the different
language content into _fr, _en fields like you've done.

At search time, though, we require a language code parameter to specify the
language the client wants to search in, which is normally decided by the
website visited, such as: qf=name description&language=en

In our search components we then resolve the right fields to search on:
name_en and description_en.

We used to support searching across all languages and removed that later,
since each site tells the customer which languages are supported; we also
don't think many users of our web sites know more than two languages and need
to search them at the same time.


On 7 November 2013 23:01, Tom Mortimer  wrote:

> Ah, thanks Markus. I think I'll just add the Boolean operators to the
> stopwords list in that case.
>
> Tom
>
>
>
> On 7 November 2013 12:01, Markus Jelsma 
> wrote:
>
> > This is an ancient problem. The issue here is your mm-parameter, it gets
> > confused because for separate fields different amount of tokens are
> > filtered/emitted so it is never going to work just like this. The easiest
> > option is not to use the stopfilter.
> >
> >
> >
> http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
> > https://issues.apache.org/jira/browse/SOLR-3085
> >
> > -Original message-
> > > From:Tom Mortimer 
> > > Sent: Thursday 7th November 2013 12:50
> > > To: solr-user@lucene.apache.org
> > > Subject: eDisMax, multiple language support and stopwords
> > >
> > > Hi all,
> > >
> > > Thanks for the help and advice I've got here so far!
> > >
> > > Another question - I want to support stopwords at search time, so that
> > e.g.
> > > the query "oscar and wilde" is equivalent to "oscar wilde" (this is
> with
> > > lowercaseOperators=false). Fair enough, I have stopword "and" in the
> > query
> > > analyser chain.
> > >
> > > However, I also need to support French as well as English, so I've got
> > _en
> > > and _fr versions of the text fields, with appropriate stemming and
> > > stopwords. I index French content into the _fr fields and English into
> > the
> > > _en fields. I'm searching with eDisMax over both versions, e.g.:
> > >
> > > headline_en headline_fr
> > >
> > > However, this means I get no results for "oscar and wilde". The parsed
> > > query is:
> > >
> > > (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar))
> > > DisjunctionMaxQuery((headline_fr:and))
> > > DisjunctionMaxQuery((headline_fr:wild |
> headline_en:wild)))~3))/no_coord
> > >
> > > If I add "and" to the French stopwords list, I *do* get results, and
> the
> > > parsed query is:
> > >
> > > (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar))
> > > DisjunctionMaxQuery((headline_fr:wild |
> headline_en:wild)))~2))/no_coord
> > >
> > > This implies that the only solution is to have a minimal, shared
> > stopwords
> > > list for all languages I want to support. Is this correct, or is there
> a
> > > way of supporting this kind of searching with per-language stopword
> > lists?
> > >
> > > Thanks for any ideas!
> > >
> > > Tom
> > >
> >
>



-- 
All the best

Liu Bo


Re: Multi-core support for indexing multiple servers

2013-11-11 Thread Liu Bo
Like Erick said, merging data from different data sources can be very
difficult. SolrJ is much easier to use, but it may need another application to
handle the indexing process if you don't want to extend Solr much.

I eventually ended up with a customized request handler which uses SolrWriter
from the DIH package to index data,

so that I can fully control the index process. Much like with SolrJ, you
write code to convert your data into SolrInputDocuments and then post them
to the SolrWriter, which handles the rest.


On 8 November 2013 21:46, Erick Erickson  wrote:

> Yep, you can define multiple data sources for use with DIH.
>
> Combining data from those multiple sources into a single
> index can be a bit tricky with DIH, personally I tend to prefer
> SolrJ, but that's mostly personal preference, especially if
> I want to get some parallelism going on.
>
> But whatever works
>
> Erick
>
>
> On Thu, Nov 7, 2013 at 11:17 PM, manju16832003  >wrote:
>
> > Eric,
> > Just a question :-), wouldn't it be easy to use DIH to pull data from
> > multiple data sources.
> >
> > I do use DIH to do that comfortably. I have three data sources
> >  - MySQL
> >  - URLDataSource that returns XML from an .NET application
> >  - URLDataSource that connects to an API and return XML
> >
> > Here is part of data-config data source settings
> >  > driver="com.mysql.jdbc.Driver"
> > url="jdbc:mysql://localhost/employeeDB" batchSize="-1" user="root"
> > password="root"/>
> > > connectionTimeout="5000" readTimeout="1"/>
> > encoding="UTF-8"
> > connectionTimeout="5000" readTimeout="1"/>
> >
> >
> > Of course, in application I do the same.
> > To construct my results, I do connect to MySQL and those two data
> sources.
> >
> > Basically we have two point of indexing
> >  - Using DIH at one time indexing
> >  - At application whenever there is transaction to the details that we
> are
> > storing in Solr.
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
All the best

Liu Bo


Re: Multi-core support for indexing multiple servers

2013-11-12 Thread Liu Bo
As far as I know about Magento, its DB schema is designed for extensible
property storage, and the relationships between DB tables are kind of complex.

A product has attribute sets and properties which are stored in different
tables. A configurable product may have different attribute values for each
of its sub simple products.

Handling relationships like this in DIH won't be easy, especially when you
want to group the attributes of a configurable product into one document.

But if you just need to search on name and description and not other
attributes, you can try writing a DIH config against the catalog_product_flat_x
tables; Magento may have several of them.

We used to use Lucene core to provide search on Magento products: we used the
SOAP service provided by Magento to get products and then converted them to
Lucene documents. Indexes were updated daily. This hides lots of Magento
implementation details, but it's kind of slow.




On 12 November 2013 22:41, Robert Veliz  wrote:

> I have two sources/servers--one of them is Magento. Since Magento has a
> more or less out of the box integration with Solr, my thought was to run
> Solr server from the Magento instance and then use DIH to get/merge content
> from the other source/server. Seem feasible/appropriate?  I spec'd it out
> and it seems to make sense...
>
> R
>
> > On Nov 11, 2013, at 11:25 PM, Liu Bo  wrote:
> >
> > like Erick said, merge data from different datasource could be very
> > difficult, SolrJ is much easier to use but may need another application
> to
> > do handle index process if you don't want to extends solr much.
> >
> > I eventually end up with a customized request handler which use
> SolrWriter
> > from DIH package to index data,
> >
> > So that I can fully control the index process, quite like SolrJ, you can
> > write code to convert your data into SolrInputDocument, and then post
> them
> > to SolrWriter, SolrWriter will handles the rest stuff.
> >
> >
> >> On 8 November 2013 21:46, Erick Erickson 
> wrote:
> >>
> >> Yep, you can define multiple data sources for use with DIH.
> >>
> >> Combining data from those multiple sources into a single
> >> index can be a bit tricky with DIH, personally I tend to prefer
> >> SolrJ, but that's mostly personal preference, especially if
> >> I want to get some parallelism going on.
> >>
> >> But whatever works
> >>
> >> Erick
> >>
> >>
> >> On Thu, Nov 7, 2013 at 11:17 PM, manju16832003  >>> wrote:
> >>
> >>> Eric,
> >>> Just a question :-), wouldn't it be easy to use DIH to pull data from
> >>> multiple data sources.
> >>>
> >>> I do use DIH to do that comfortably. I have three data sources
> >>> - MySQL
> >>> - URLDataSource that returns XML from an .NET application
> >>> - URLDataSource that connects to an API and return XML
> >>>
> >>> Here is part of data-config data source settings
> >>>  >>> driver="com.mysql.jdbc.Driver"
> >>> url="jdbc:mysql://localhost/employeeDB" batchSize="-1" user="root"
> >>> password="root"/>
> >>>encoding="UTF-8"
> >>> connectionTimeout="5000" readTimeout="1"/>
> >>>>> encoding="UTF-8"
> >>> connectionTimeout="5000" readTimeout="1"/>
> >>>
> >>>
> >>> Of course, in application I do the same.
> >>> To construct my results, I do connect to MySQL and those two data
> >> sources.
> >>>
> >>> Basically we have two point of indexing
> >>> - Using DIH at one time indexing
> >>> - At application whenever there is transaction to the details that we
> >> are
> >>> storing in Solr.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> >
> > --
> > All the best
> >
> > Liu Bo
>



-- 
All the best

Liu Bo


Re: deleting a doc inside a custom UpdateRequestProcessor

2013-11-18 Thread Liu Bo
hi,

You can try this in your checkIfIsDuplicate(): build a query based on
your title, and set it on a delete command:

// build your query accordingly; this depends on how your title is indexed
// (analyzed or not), so be careful with it and do some tests
DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.commitWithin = commitWithin;
cmd.setQuery(query);
processDelete(cmd);

Processors are normally chained; you should make sure that your
processor comes first so that it can control what happens next based
on your logic.

You can also try to write your own update request handler instead of a
customized processor.

You can do a set of operations in your function
@Override
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception {}

Get your processor chain in this function and pass a delete command
to it, such as:

SolrParams params = req.getParams();
checkParameter(params);
UpdateRequestProcessorChain processorChain =
    req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));
UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);

DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.commitWithin = commitWithin;
cmd.setQuery(query);
processor.processDelete(cmd);

This is what I do when customizing an update request handler: I try
not to touch the original processor chain but tell Solr what to do through
commands.


On 19 November 2013 10:01, Peyman Faratin  wrote:

> Hi
>
> I am building a custom UpdateRequestProcessor to intercept any doc heading
> to the index. Basically what I want to do is to check if the current index
> has a doc with the same title (i am using IDs as the uniques so I can't use
> that, and besides the logic of checking is a little more complicated). If
> the incoming doc has a duplicate and some other conditions hold then one of
> 2 things can happen:
>
> 1- we don't index the incoming document
> 2- we index the incoming and delete the duplicate currently in the
> index
>
> I think (1) can be done by simple not passing the call up the chain (not
> calling super.processAdd(cmd)). However, I don't know how to implement the
> second condition, deleting the duplicate document, inside a custom
> UpdateRequestProcessor. This thread is the closest to my goal
>
> http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html
>
> however i am not clear how to proceed. Code snippets below.
>
> thank you in advance for your help
>
> class isDuplicate extends UpdateRequestProcessor
> {
> public isDuplicate( UpdateRequestProcessor next) {
>   super( next );
> }
> @Override
> public void processAdd(AddUpdateCommand cmd) throws
> IOException {
> try
> {
> boolean indexIncomingDoc =
> checkIfIsDuplicate(cmd);
> if(indexIncomingDoc)
> super.processAdd(cmd);
> } catch (SolrServerException e)
> {e.printStackTrace();}
> catch (ParseException e) {e.printStackTrace();}
> }
> public boolean checkIfIsDuplicate(AddUpdateCommand cmd)
> ...{
>
> SolrInputDocument incomingDoc =
> cmd.getSolrInputDocument();
> if(incomingDoc == null) return false;
> String title = (String) incomingDoc.getFieldValue(
> "title" );
> SolrIndexSearcher searcher =
> cmd.getReq().getSearcher();
> boolean addIncomingDoc = true;
> Integer idOfDuplicate = searcher.getFirstMatch(new
> Term("title",title));
> if(idOfDuplicate != -1)
> {
> addIncomingDoc =
> compareDocs(searcher,incomingDoc,idOfDuplicate,title,addIncomingDoc);
> }
> return addIncomingDoc;
> }
> private boolean compareDocs(.){
> 
> if( condition 1 )
> {
> --> DELETE DUPLICATE DOC in INDEX <--
> addIncomingDoc = true;
> }
> 
> return addIncomingDoc;
> }




-- 
All the best

Liu Bo


an "array" liked string is treated as multivalued when adding doc to solr

2013-12-05 Thread Liu Bo
Dear solr users:

I've run into this kind of error several times.

When adding an "array"-like string such as [Get 20% Off Official Barça Kits,
coupon] to a multiValued="false" field, Solr will complain:

org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692]
multiple values encountered for non multiValued field name_en_US: [Get 20%
Off Official Barca Kits, coupon]

My schema definition:


This field is stored because the search result needs this field and its value
in the original format, and it is indexed to give it a boost while searching.

What I do is add the name (a java.lang.String) to the SolrInputDocument via
the addField("name_en_US", product.getName()) method, and then add it to Solr
using an AddUpdateCommand.

It seems Solr treats this kind of string data as multivalued, even though I
add this field to Solr only once.

Is this a bug or expected behavior?

Is there any way to tell Solr this is not a multivalued value and not to
break it up?

Your help and suggestions will be much appreciated.

-- 
All the best

Liu Bo


Re: an "array" liked string is treated as multivalued when adding doc to solr

2013-12-17 Thread Liu Bo
Hey Furkan and solr users

This was a misreported problem. It's not a Solr problem but a data issue on
our side. Sorry for this.

A coupon happened to have two pieces of English description, which is not
allowed by our business logic, but it happened, and we added name_en_US twice
to the Solr document.

I've done a set of tests and some deep debugging into the Solr source code,
and found out that an array-like string such as [Get 20% Off Official Barca
Kits, coupon] won't be treated as a multivalued field.

Sorry again for not digging more before sending out the question email. I
trusted our business logic and data integrity more than Solr; I will
definitely not do that again. ;-)

All the best

Liu Bo



On 11 December 2013 07:21, Furkan KAMACI  wrote:

> Hi Liu;
>
> Yes. it is an expected behavior. If you send data within square brackets
> Solr will behave it as a multivalued field. You can test it with this way:
> if you use Solrj and use a List for a field it will be considered as
> multivalued too because when you call toString() method of your List you
> can see that elements are printed within square brackets. This is the
> reason that a List can be used for a multivalued field.
>
> If you explain your situation I can offer a way how to do it.
>
> Thanks;
> Furkan KAMACI
>
>
> 2013/12/6 Liu Bo 
>
> > Dear solr users:
> >
> > I've met this kind of error several times,
> >
> > when add a "array" liked string such as:[Get 20% Off Official Barça Kits,
> > coupon] to a  multiValued="false" field, solr will complain:
> >
> > org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692]
> > multiple values encountered for non multiValued field name_en_US: [Get
> 20%
> > Off Official Barca Kits, coupon]
> >
> > my schema defination:
> >  > multiValued="false" />
> >
> > This field is stored as the search result needs this field and it's value
> > in original format, and indexed to give it a boost while searching .
> >
> > What I do is adding name (java.lang.String) to SolrInputDocument by
> > addField("name_en_US", product.getName()) method, and then add this to
> solr
> > using an AddUpdateCommand
> >
> > It seems solr treats this kind of string data as multivalued, even I add
> > this field to solr only once.
> >
> > Is this a bug or a supposed behavior?
> >
> > Is there any way to tell solr this is not a "multivalued value" add don't
> > break it?
> >
> > Your help and suggestion will be much of my appreciation.
> >
> > --
> > All the best
> >
> > Liu Bo
> >
>



-- 
All the best

Liu Bo


Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
hi Josip

For the first question we've done similar things: copying the search fields to
a text field. Highlighting is normally done on specific fields such as title,
depending on how the search content is displayed on the front end; you can
search on text and highlight the fields you want by specifying hl.fl.

ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl
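For example (field names are just placeholders for whatever is in your schema), a
request along these lines searches the copy field but asks for highlights on the
original fields:

  q=searchable_text:labore&hl=true&hl.fl=title,text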


On 17 December 2013 02:29, Josip Delic  wrote:

> Hi @all,
>
> i am playing with the "PostingsSolrHighlighter". I'm running solr 4.6.0
> and my configuration is from here:
>
> https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/
> PostingsSolrHighlighter.html
>
> Search query and result (not working):
>
> http://pastebin.com/13Uan0ZF
>
> Schema (not complete):
>
> http://pastebin.com/JGa38UDT
>
> Search query and result (working):
>
> http://pastebin.com/4CP8XKnr
>
> Solr config:
>
> 
>   
>
>
> 
>
> So this is working just fine, but now i have some questions:
>
> 1.) With the old default highlighter component it was possible to search
> in "searchable_text" and to retrive highlighted "text". This is essential,
> because we use copyfield to put almost everything to searchable_text
> (title, subtitle, description, ...)
>
> 2.) I can't get ellipsis working i tried hl.tag.ellipsis=...,
> f.text.hl.tag.ellipsis=..., configuring it in RequestHandler noting seems
> to work, maxAnalyzedChars is just cutting the sentence?
>
> Kind Regards
>
> Josip Delic
>
>


-- 
All the best

Liu Bo


Re: an "array" liked string is treated as multivalued when adding doc to solr

2013-12-18 Thread Liu Bo
Hi Alexandre

It's quite a rare case, just one out of tens of thousands.

I'm planning to have every multilingual field as multivalued and just get
the first one while formatting the response to our business object.

The first-value update processor seems very helpful, thank you.
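For the record, a minimal sketch of the kind of update chain I have in mind,
assuming the factory is wired in before the run processor (the chain name is made
up, and name_en_US is just my field):

  <updateRequestProcessorChain name="first-value-only">
    <processor class="solr.FirstFieldValueUpdateProcessorFactory">
      <str name="fieldName">name_en_US</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>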

All the best

Liu Bo


On 18 December 2013 15:26, Alexandre Rafalovitch  wrote:

> If this happens rarely and you want to deal with in on the way into Solr,
> you could just keep one of the values, using URP:
>
> http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html
>
> Regards,
>Alex
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Wed, Dec 18, 2013 at 2:20 PM, Liu Bo  wrote:
>
> > Hey Furkan and solr users
> >
> > This is a miss reported problem. It's not solr problem but our data
> issue.
> > Sorry for this.
> >
> > It's a data issue of our side, a coupon happened to have two piece
> English
> > description, which is not allowed in our business logic, but it happened
> >  and we added twice of the name_en_US to solr document.
> >
> > I've done a set of test and deep debugging to solr source code, and found
> > out that a array like string such as  [Get 20% Off Official Barca Kits,
> > coupon] won't be treated as multivalued field.
> >
> > Sorry again for not digging more before sent out question email. I trust
> > our business logic and data integrity more than solr, I will definitely
> not
> > do this again. ;-)
> >
> > All the best
> >
> > Liu Bo
> >
> >
> >
> > On 11 December 2013 07:21, Furkan KAMACI  wrote:
> >
> > > Hi Liu;
> > >
> > > Yes. it is an expected behavior. If you send data within square
> brackets
> > > Solr will behave it as a multivalued field. You can test it with this
> > way:
> > > if you use Solrj and use a List for a field it will be considered as
> > > multivalued too because when you call toString() method of your List
> you
> > > can see that elements are printed within square brackets. This is the
> > > reason that a List can be used for a multivalued field.
> > >
> > > If you explain your situation I can offer a way how to do it.
> > >
> > > Thanks;
> > > Furkan KAMACI
> > >
> > >
> > > 2013/12/6 Liu Bo 
> > >
> > > > Dear solr users:
> > > >
> > > > I've met this kind of error several times,
> > > >
> > > > when add a "array" liked string such as:[Get 20% Off Official Barça
> > Kits,
> > > > coupon] to a  multiValued="false" field, solr will complain:
> > > >
> > > > org.apache.solr.common.SolrException: ERROR:
> [doc=7781396456243918692]
> > > > multiple values encountered for non multiValued field name_en_US:
> [Get
> > > 20%
> > > > Off Official Barca Kits, coupon]
> > > >
> > > > my schema defination:
> > > >  > > > multiValued="false" />
> > > >
> > > > This field is stored as the search result needs this field and it's
> > value
> > > > in original format, and indexed to give it a boost while searching .
> > > >
> > > > What I do is adding name (java.lang.String) to SolrInputDocument by
> > > > addField("name_en_US", product.getName()) method, and then add this
> to
> > > solr
> > > > using an AddUpdateCommand
> > > >
> > > > It seems solr treats this kind of string data as multivalued, even I
> > add
> > > > this field to solr only once.
> > > >
> > > > Is this a bug or a supposed behavior?
> > > >
> > > > Is there any way to tell solr this is not a "multivalued value" add
> > don't
> > > > break it?
> > > >
> > > > Your help and suggestion will be much of my appreciation.
> > > >
> > > > --
> > > > All the best
> > > >
> > > > Liu Bo
> > > >
> > >
> >
> >
> >
> > --
> > All the best
> >
> > Liu Bo
> >
>



-- 
All the best

Liu Bo


Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
Hi Josip

that's quite weird; in my experience highlighting is strict on string fields,
which need an exact match, but text fields should be fine.

I copied your schema definition and did a quick test in a new core; everything
is default from the tutorial, and the search component is
solr.HighlightComponent.

A search on searchable_text can highlight text. I copied your search url and
just changed the host part; the input parameters are exactly the same.

result is attached.

Can you upload your complete solrconfig.xml and schema.xml?


On 18 December 2013 19:02, Josip Delic  wrote:

> Am 18.12.2013 09:55, schrieb Liu Bo:
>
>> hi Josip
>>
>
> hi liu,
>
>
>  for the 1 question we've done similar things: copying search field to a
>> text field. But highlighting is normally on specific fields such as tittle
>> depending on how the search content is displayed to the front end, you can
>> search on text and highlight on the field you wanted by specify hl.fl
>>
>> ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl
>>
>
> thats exactly what i'm doing in that pastebin:
>
> http://pastebin.com/13Uan0ZF
>
> I'm searing there for 'q=searchable_text:labore' this is present in 'text'
> and in the copyfield 'searchable_text' but it is not highlighted in 'text'
> (hl.fl=text)
>
> The same query is working if set 'q=text:labore' as you can see in
>
> http://pastebin.com/4CP8XKnr
>
> For 2 question i figured out that the PostingsSolrHighlighter "ellipsis"
> is not like i thought for adding "ellipsis" to start or/and end in
> highlighted text. It is instead used to combine multiple snippets together
> if snippets is > 1.
>
> cheers
>
> josip
>
>
>
>>
>> On 17 December 2013 02:29, Josip Delic  wrote:
>>
>>  Hi @all,
>>>
>>> i am playing with the "PostingsSolrHighlighter". I'm running solr 4.6.0
>>> and my configuration is from here:
>>>
>>> https://lucene.apache.org/solr/4_6_0/solr-core/org/
>>> apache/solr/highlight/
>>> PostingsSolrHighlighter.html
>>>
>>> Search query and result (not working):
>>>
>>> http://pastebin.com/13Uan0ZF
>>>
>>> Schema (not complete):
>>>
>>> http://pastebin.com/JGa38UDT
>>>
>>> Search query and result (working):
>>>
>>> http://pastebin.com/4CP8XKnr
>>>
>>> Solr config:
>>>
>>> 
>>>
>>>
>>>
>>> 
>>>
>>> So this is working just fine, but now i have some questions:
>>>
>>> 1.) With the old default highlighter component it was possible to search
>>> in "searchable_text" and to retrive highlighted "text". This is
>>> essential,
>>> because we use copyfield to put almost everything to searchable_text
>>> (title, subtitle, description, ...)
>>>
>>> 2.) I can't get ellipsis working i tried hl.tag.ellipsis=...,
>>> f.text.hl.tag.ellipsis=..., configuring it in RequestHandler noting seems
>>> to work, maxAnalyzedChars is just cutting the sentence?
>>>
>>> Kind Regards
>>>
>>> Josip Delic
>>>
>>>
>>>
>>
>>
>
>


-- 
All the best

Liu Bo
http://localhost:8080/solr/try/select?wt=json&fl=text&%2Cscore=&hl=true&hl.fl=text&q=%28searchable_text%3Alabore%29&rows=10&sort=score+desc&start=0

{
"responseHeader": {
"status": 0,
"QTime": 36,
"params": {
"sort": "score desc",
"fl": "text",
"start": "0",
",score": "",
"q": "(searchable_text:labore)",
"hl.fl": "text",
"wt": "json",
"hl": "true",
"rows": "10"
}
},
"response": {
"numFound": 3,
"start": 0,
"docs": [
{
"text": "Lorem ipsum dolor sit amet, consetetur sadipscing 
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna 
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores 
et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum 
dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed 
diam nonumy eirmod tempor invidunt ut labore et dolore magna aliq

Re: Chaining plugins

2013-12-31 Thread Liu Bo
Hi

I've done similar things as paul.

What I do is extend the default QueryComponent and override the
prepare method;

then I just change the SolrParams according to our logic and call
super.prepare(). Then I replace the default QueryComponent with mine in my
search/query handler.

In this way, none of solr's default behavior is touched. I think you can
do your logic in the prepare method, and then let solr proceed with the search.

I've tested it along with other components on both a single solr node and
solrcloud. It works fine.
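Roughly like this (the class name and the extra fq are invented, just to show the
mechanics of "our logic"):

import java.io.IOException;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

public class MyQueryComponent extends QueryComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // copy the incoming params so they can be modified
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    // apply whatever custom logic is needed, e.g. force an extra filter query
    params.add("fq", "status:active");
    rb.req.setParams(params);
    // let the stock QueryComponent do the real preparation
    super.prepare(rb);
  }
}

Then register the class in solrconfig.xml in place of the standard "query"
component of your handler.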

Hope it helps

Cheers

Bold



On 31 December 2013 06:03, Chris Hostetter  wrote:

>
> You don't need to write your own handler.
>
> See the previpous comment about implementing a SearchComponent -- you can
> check for the params in your prepare() method and do whatever side effects
> you want, then register your custom component and hook it into the
> component chain of whatever handler configuration you want (either using
> the "components"  or by specifying it as a "first-components"...
>
>
> https://cwiki.apache.org/confluence/display/solr/RequestHandlers+and+SearchComponents+in+SolrConfig
>
> : I want to save the query into a file when a user is changing a parameter
> in
> : the query, lets say he adds "logTofile=1" then the searchHandler will
> : provide the same result as without this parameter, but in the background
> it
> : will do some logic(ex. save the query to file) .
> : But I dont want to touch solr source code, all I want is to add code(like
> : plugin). if i understand it right I want to write my own search handler
> , do
> : some logic , then pass the data to solr default search handler.
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>



-- 
All the best

Liu Bo


Re: Grouping results with group.limit return wrong numFound ?

2013-12-31 Thread Liu Bo
Hi

I've met the same problem, and I've googled around but haven't found a direct
solution.

But there's a workaround: do a facet on your group field, with parameters
like

   facet=true
   facet.field=your_field
   facet.limit=-1
   facet.mincount=1

and then count how many faceted pairs are in the response. This should be the
same as the number of documents after grouping.
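For example, using the publisher case from the original question, the whole request
could look roughly like this (the facet parameters mirror the ones above):

  q=book_title:solr&group=true&group.field=publisher&group.limit=3&group.main=true&facet=true&facet.field=publisher&facet.limit=-1&facet.mincount=1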

Cheers

Bold




On 31 December 2013 06:40, Furkan KAMACI  wrote:

> Hi;
>
> group.limit is: the number of results (documents) to return for each group.
> Defaults to 1. Did you check the page here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604232
>
> Thanks;
> Furkan KAMACI
>
>
> 25 Aralık 2013 Çarşamba tarihinde tasmaniski  adlı
> kullanıcı şöyle yazdı:
> > Hi All, When I perform a search with grouping result in a groups and do
> limit
> > results in one group I got that *numFound* is the same as I didn't use
> > limit.looks like SOLR first perform search and calculate numFound and
> that
> > group and limit the results.I do not know if this is a bug or a feature
> > :)But I cannot use pagination and other stuff.Is there any workaround or
> I
> > missed something ?Example:I want to search book title and limit the
> search
> > to 3 results per one publisher.q=book_title: solr
> > php&group=true&group.field=publisher&group.limit=3&group.main=trueI have
> for
> > apress publisher 20 results but I show only 3 that works OKBut in
> numFound I
> > still have 20 for apress publisher...
> >
> >
> >
> > --
> > View this message in context:
>
> http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
All the best

Liu Bo


Re: Grouping results with group.limit return wrong numFound ?

2014-01-01 Thread Liu Bo
hi @Ahmet

I've thought about using group.ngroups=true , but when you use
group.main=true, there's no "ngroups" field in the response.

and according to http://wiki.apache.org/solr/FieldCollapsing, the result
might not be correct in solrcloud.

I don't like using facet for this, but it seems I have to...


On 1 January 2014 00:35, Ahmet Arslan  wrote:

> Hi Tasmaniski,
>
> I don't follow. How come Liu's faceting workaround and n.groups=true
> produce different results?
>
>
>
>
>
>
> On Tuesday, December 31, 2013 6:08 PM, tasmaniski 
> wrote:
> @kamaci
> Ofcourse. That is the problem.
>
> "group.limit is: the number of results (documents) to return for each
> group."
> NumFound is number of total found, but *not* sum number of *return for each
> group.*
>
> @Liu Bo
> seems to be the is only workaround for problem but
> it's to much expensive to go through all the groups and calculate total
> number of found/returned (I use PHP for client:) ).
>
> @iorixxx
> Yes, I consider that (group.ngroups=true)
> but in some group I have number of found result  lesser than limit.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174p4108906.html
>
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
All the best

Liu Bo


how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Liu Bo
Hi all

Our system has distributed MySQL databases: we create a database for every
customer who signs up and distribute it to one of our MySQL hosts.

We currently use lucene core to perform search on these databases, and we
write java code to loop through these databases and convert the data to
lucene index.

Right now we are planning to move to Solr for distribution, and I am doing
investigation on it.

I tried to use the DataImportHandler (http://wiki.apache.org/solr/DataImportHandler)
described in the wiki page, but I can't figure out a way to use multiple datasources
with the same schema.

The other question is: we have the database connection data in one table;
can I create the datasource connection info from it, and loop through the
databases using DataImporter?
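To illustrate what I mean, the static form would be something like the following
(hosts and credentials are placeholders) -- but since the set of databases isn't
fixed, I'd rather not hard-code every one:

  <dataConfig>
    <dataSource name="customer1" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host1/customer1" user="user" password="pass"/>
    <dataSource name="customer2" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host2/customer2" user="user" password="pass"/>
    <document>
      <entity name="orders1" dataSource="customer1" query="select id, name from orders"/>
      <entity name="orders2" dataSource="customer2" query="select id, name from orders"/>
    </document>
  </dataConfig>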

If DataImporter isn't working, is there a way to feed data to solr using
customized SolrRequestHandler without using SolrJ?

If neither of these two ways is working, I think I am going to reuse the
DAO of the old project and feed the data to solr using SolrJ, probably
using embedded Solr server.

Your help will be much appreciated.

http://wiki.apache.org/solr/DataImportHandlerFaq

--
All the best

Liu Bo


documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-09-30 Thread Liu Bo
mmit=false}
{commit=} 0 42

6) later I found the range is null in clusterstate.json which might have
caused the document isn't committed distributively

{"content_collection":{

"shards":{

  "shard1":{

   * "range":null,*

"state":"active",

"replicas":{"core_node1":{

"state":"active",

"core":"content",

"node_name":"10.199.46.176:8080_solr",

"base_url":"http://10.199.46.176:8080/solr";,

"leader":"true"}}},

  "shard3":{

   * "range":null,*

"state":"active",

"replicas":{"core_node2":{

"state":"active",

"core":"content",

"node_name":"10.199.46.202:8080_solr",

"base_url":"http://10.199.46.202:8080/solr";,

"leader":"true"}}},

  "shard2":{

   * "range":null,*

"state":"active",

"replicas":{"core_node3":{

"state":"active",

"core":"content",

"node_name":"10.199.46.165:8080_solr",

"base_url":"http://10.199.46.165:8080/solr";,

"leader":"true",

*"router":"implicit"*}}



-- 
All the best

Liu Bo


documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-10-06 Thread Liu Bo
teProcessor; [content] webapp=/solr
path=/update
params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
{commit=} 0 42

6) later I found the range is null in clusterstate.json which might have
caused the document isn't committed distributively

{"content_collection":{

"shards":{

  "shard1":{

   * "range":null,*

"state":"active",

"replicas":{"core_node1":{

"state":"active",

"core":"content",

"node_name":"10.199.46.176:8080_solr",

"base_url":"http://10.199.46.176:8080/solr";,

"leader":"true"}}},

  "shard3":{

   * "range":null,*

"state":"active",

"replicas":{"core_node2":{

"state":"active",

"core":"content",

"node_name":"10.199.46.202:8080_solr",

"base_url":"http://10.199.46.202:8080/solr";,

"leader":"true"}}},

  "shard2":{

   * "range":null,*

"state":"active",

"replicas":{"core_node3":{

"state":"active",

"core":"content",

"node_name":"10.199.46.165:8080_solr",

"base_url":"http://10.199.46.165:8080/solr";,

"leader":"true",

*"router":"implicit"*}}



-- 
All the best

Liu Bo


Re: documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-10-08 Thread Liu Bo
I've solved this problem myself.

If you use core discovery, you must specify the "numShards" parameter in
core.properties,
or else solr won't allocate a hash range for each shard and documents
won't be distributed properly.
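For example, the core file from my earlier mail just needs the extra numShards line
(3 shards in my case):

  name=content
  collection=content_collection
  shard=shard1
  loadOnStartup=true
  transient=false
  numShards=3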

Using core discovery to set up solr cloud in tomcat is much easier and
cleaner than the coreAdmin approach described in the wiki:
http://wiki.apache.org/solr/SolrCloudTomcat.

It cost me some time to move from jetty to tomcat, but I think our IT team
will like it this way. :)




On 6 October 2013 23:53, Liu Bo  wrote:

> Hi all
>
> I've sent out this mail before, but I only subscribed to lucene-user but
> not solr-user at that time. Sorry for repeating if any and your help will
> be much of my appreciation.
>
> I'm trying out the tutorial about solrcloud, and then I manage to write my
> own plugin to import data from our set of databases, I use SolrWriter from
> DataImporter package and the docs could be distributed commit to shards.
>
> Every thing works fine using jetty from the solr example, but when I move
> to tomcat, solrcloud seems not been configured right. As the documents are
> just committed to the shard where update requested goes to.
>
> The cause probably is the range is null for shards in clusterstate.json.
> The router is "implicit" instead of "compositeId" as well.
>
> Is there anything missed or configured wrong in the following steps? How
> could I fix it. Your help will be much of my appreciation.
>
> PS, solr cloud tomcat wiki page isn't up to 4.4 with core discovery, I'm
> trying out after reading SoclrCloud, SolrCloudJboss, and CoreAdmin wiki
> pages.
>
> Here's what I've done and some useful logs:
>
> 1. start three zookeeper server.
> 2. upload configuration files to zookeeper, the collection name is
> "content_collection"
> 3. start three tomcat instants on three server with core discovery
>
> a) core file:
>  name=content
>  loadOnStartup=true
>  transient=false
>  shard=shard1   (differrent on servers)
>  collection=content_collection
> b) solr.xml
>
>  
>
>   
>
> ${host:}
>
> ${hostContext:solr}
>
> 8080
>
> ${zkClientTimeout:15000}
>
> 10.199.46.176:2181,10.199.46.165:2181,
> 10.199.46.158:2181
>
> ${genericCoreNodeNames:true}
>
>   
>
>
>   
> class="HttpShardHandlerFactory">
>
> ${socketTimeout:0}
>
> ${connTimeout:0}
>
>   
>
> 
>
> 4. In the solr.log, I see the three shards are recognized, and the
> solrcloud can see the content_collection has three shards as well.
> 5. write documents to content_collection using my update request, the
> documents only commits to the shard the request goes to, in the log I can
> see the DistributedUpdateProcessorFactory is in the processorChain and
> disribute commit is triggered:
>
> INFO  - 2013-09-30 16:31:43.205;
> com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
> updata request processor factories:
>
> INFO  - 2013-09-30 16:31:43.206;
> com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
> org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77
>
> INFO  - 2013-09-30 16:31:43.207;
> com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
> org.apache.solr.update.processor.*DistributedUpdateProcessorFactory*
> @5b2bc407
>
> INFO  - 2013-09-30 16:31:43.207;
> com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
> org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654
>
> INFO  - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy;
> SolrDeletionPolicy.onInit: commits: num=1
>
>
> commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}
>
> INFO  - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy;
> newest commit generation = 1
>
> INFO  - 2013-09-30 16:31:43.440; *org.apache.solr.update.SolrCmdDistributor;
> Distrib commit to*:[StdNode: http://10.199.46.176:8080/solr/content/,
> StdNode: http://10.199.46.165:8080/solr/content/]
> params:commit_end_point=true&commit=true&softCommit=false&waitSearcher=true&expungeDeletes=false
>
> but the documents won't go to other shards, the other shards only has a
> request with not documents:
>
> INFO  - 2013-09-30 16:31:43.841;
> org.apache.solr.update.DirectUpdateHandler2; start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>
> INFO  -

Re: Multiple schemas in the same SolrCloud ?

2013-10-10 Thread Liu Bo
you can try this way:

start zookeeper server first.

upload your configurations to zookeeper and link them to your collection
using zkcli just like shawn said

let's say you have conf1 and conf2, you can link them to collection1 and
collection2

remove the bootstrap stuff and start solr server.

after you have solr running, create collection1 and collection2 via core
admin; you don't need a conf directory because all your core-specific configurations
are in zookeeper

or you could use core discovery and have collection name specified in
core.properties, see :
http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29
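the zkcli calls for the upload/link steps look roughly like this (the zkhost and
paths are placeholders):

  zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/conf1 -confname conf1
  zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection collection1 -confname conf1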



On 10 October 2013 23:57, maephisto  wrote:

> On this topic, once you've uploaded you collection's configuration in ZK,
> how
> can you update it?
> Upload the new one with the same config name ?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094729.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
All the best

Liu Bo


Re: SolrCore 'collection1' is not available due to init failure

2013-10-10 Thread Liu Bo
org.apache.solr.core.SolrCore.(SolrCore.java:821) ... 13 more Caused
by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out:
NativeFSLock@/usr/share/solr-4.5.0/example/solr/
collection1/data/index/write.lock:
java.io.FileNotFoundException:
/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock
(Permission denied) at org.apache.lucene.store.Lock.obtain(Lock.java:84) at

It seems to be a permission problem: the user that starts tomcat doesn't have
permission to access your index folder.

Try granting read and write permission on your solr data folder to that user,
and restart tomcat to see what happens.
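Something along these lines, substituting whatever user tomcat actually runs as
(tomcat6 here is just an example):

  chown -R tomcat6:tomcat6 /usr/share/solr-4.5.0/example/solr/collection1/data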


-- 
All the best

Liu Bo


One case for shingle and synonym filter

2013-04-11 Thread Xiang Liu
Hi,
Here is the case: given a doc named "sport center", we hope a query like
"sportctr" (the user ignores the space) can recall it. Can shingle and synonym filters be
combined in some smart way to produce that term?
Thanks,
Xiang
  

Re: FW: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Yongtao Liu
I hit a similar issue recently.
I'm not sure MMapDirectory is the right way to go.

When an index file is mapped to RAM, the JVM calls the OS file mapping function.
The memory usage shows up as shared memory, so it may not be counted against the
JVM process space.

One problem I saw is that if the index file is bigger than physical RAM, and there
are a lot of queries which cause wide index file access,
then the machine has no available memory
and the system becomes very slow.

What I did was change the lucene code to disable MMapDirectory.

On Wed, Sep 21, 2011 at 1:26 PM, Yongtao Liu  wrote:

>
>
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, September 20, 2011 3:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MMapDirectory failed to map a 23G compound index segment
>
> Since you hit OOME during mmap, I think this is an OS issue not a JVM
> issue.  Ie, the JVM isn't running out of memory.
>
> How many segments were in the unoptimized index?  It's possible the OS
> rejected the mmap because of process limits.  Run "cat
> /proc/sys/vm/max_map_count" to see how many mmaps are allowed.
>
> Or: is it possible you reopened the reader several times against the index
> (ie, after committing from Solr)?  If so, I think 2.9.x never unmaps the
> mapped areas, and so this would "accumulate" against the system limit.
>
> > My memory of this is a little rusty but isn't mmap also limited by mem +
> swap on the box? What does 'free -g' report?
>
> I don't think this should be the case; you are using a 64 bit OS/JVM so in
> theory (except for OS system wide / per-process limits imposed) you should
> be able to mmap up to the full 64 bit address space.
>
> Your virtual memory is unlimited (from "ulimit" output), so that's good.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Sep 7, 2011 at 12:25 PM, Rich Cariens 
> wrote:
> > Ahoy ahoy!
> >
> > I've run into the dreaded OOM error with MMapDirectory on a 23G cfs
> > compound index segment file. The stack trace looks pretty much like
> > every other trace I've found when searching for OOM & "map failed"[1].
> > My configuration
> > follows:
> >
> > Solr 1.4.1/Lucene 2.9.3 (plus
> > SOLR-1969<https://issues.apache.org/jira/browse/SOLR-1969>
> > )
> > CentOS 4.9 (Final)
> > Linux 2.6.9-100.ELsmp x86_64 yada yada yada Java SE (build
> > 1.6.0_21-b06) Hotspot 64-bit Server VM (build 17.0-b16, mixed mode)
> > ulimits:
> >core file size (blocks, -c) 0
> >data seg size(kbytes, -d) unlimited
> >file size (blocks, -f) unlimited
> >pending signals(-i) 1024
> >max locked memory (kbytes, -l) 32
> >max memory size (kbytes, -m) unlimited
> >open files(-n) 256000
> >pipe size (512 bytes, -p) 8
> >POSIX message queues (bytes, -q) 819200
> >stack size(kbytes, -s) 10240
> >cpu time(seconds, -t) unlimited
> >max user processes (-u) 1064959
> >virtual memory(kbytes, -v) unlimited
> >file locks(-x) unlimited
> >
> > Any suggestions?
> >
> > Thanks in advance,
> > Rich
> >
> > [1]
> > ...
> > java.io.IOException: Map failed
> >  at sun.nio.ch.FileChannelImpl.map(Unknown Source)
> >  at
> > org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown
> > Source)
> >  at
> > org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown
> > Source)
> >  at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
> >  at org.apache.lucene.index.SegmentReader$CoreReaders.(Unknown
> > Source)
> >
> >  at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> >  at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> >  at org.apache.lucene.index.DirectoryReader.(Unknown Source)
> >  at org.apache.lucene.index.ReadOnlyDirectoryReader.(Unknown
> > Source)
> >  at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source)
> >  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown
> > Source)
> >  at org.apache.lucene.index.DirectoryReader.open(Unknown Source)
> >  at org.apache.lucene.index.IndexReader.open(Unknown Source) ...
> > Caused by: java.lang.OutOfMemoryError: Map failed
> >  at sun.nio.ch.FileChannelImpl.map0(Native Method) ...
> >
> **Legal Disclaimer***
> "This communication may contain confidential and privileged
> material for the sole use of the intended recipient. Any
> unauthorized review, use or distribution by others is strictly
> prohibited. If you have received the message in error, please
> advise the sender by reply email and delete the message. Thank
> you."
> *
>


memory usage keep increase

2011-11-14 Thread Yongtao Liu
Hi all,

I saw an issue where RAM usage keeps increasing when we run queries.
After looking at the code, it looks like Lucene uses MMapDirectory to map index files
to RAM.

According to the comments at
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html
it will use a lot of memory:
NOTE: memory mapping uses up a portion of the virtual memory address space in 
your process equal to the size of the file being mapped. Before using this 
class, be sure your have plenty of virtual address space, e.g. by using a 64 
bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the 
address space.

So, my understanding is that solr requires physical RAM >= the index file size; is that
right?

Yongtao



RE: memory usage keep increase

2011-11-17 Thread Yongtao Liu
Erick,

Thanks for your reply.

Yes, "virtual memory" does not mean physical memory.
But when "virtual memory" >> physical memory, the system becomes very
slow, since lots of paging requests happen.

Yongtao
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 15, 2011 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: memory usage keep increase

I'm pretty sure not. The words "virtual memory address space" is important 
here, that's not physical memory...

Best
Erick

On Mon, Nov 14, 2011 at 11:55 AM, Yongtao Liu  wrote:
> Hi all,
>
> I saw one issue is ram usage keep increase when we run query.
> After look in the code, looks like Lucene use MMapDirectory to map index file 
> to ram.
>
> According to 
> http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html
>  comments, it will use lot of memory.
> NOTE: memory mapping uses up a portion of the virtual memory address space in 
> your process equal to the size of the file being mapped. Before using this 
> class, be sure your have plenty of virtual address space, e.g. by using a 64 
> bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the 
> address space.
>
> So, my understanding is solr request physical RAM >= index file size, is it 
> right?
>
> Yongtao
>
>
> **Legal Disclaimer***
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message in error, please advise the sender by reply email 
> and delete the message. Thank you."
> *


Re: How to delete documents from a SOLR cloud / balance the shards in the cloud?

2010-09-10 Thread James Liu
Stephan and all,

I am evaluating this like you are. You may want to check
http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/.
I would appreciate it if others could shed some light on this, too.

Bests,
James
On Fri, Sep 10, 2010 at 6:07 AM, Stephan Raemy wrote:

> Hi solr-cloud users,
>
> I'm currently setting up a solr-cloud/zookeeper instance and so far,
> everything works out fine. I downloaded the source from the cloud branch
> yesterday and build it from source.
>
> I've got 10 shards distributed across 4 servers and a zookeeper instance.
> Searching documents with the flag "distrib=true" works out and it returns
> the expected result.
>
> But here comes the tricky question. I will add new documents every day and
> therefore, I'd like to balance my shards to keep the system speedy. The
> Wiki says that one can calculate the hash of a document id and then
> determine the corresponding shard. But IMHO, this does not take into
> account
> that the cloud may become bigger or shrink over time by adding or removing
> shards. Obviously adding has a higher priority since one wants to reduce
> the shard size to improve the response time of distributed searches.
>
> When reading through the Wikis and existing documentation, it is still
> unclear to me how to do the following operations:
> - Modify/Delete a document stored in the cloud without having to store the
>  document:shard mapping information outside of the cloud. I would expect
>  something like shard attribute on each doc in the SOLR query result
>  (activated/deactivated by a flag), so that i can query the SOLR cloud for
> a
>  doc and then delete it on the specific shard.
> - Balance a cloud when adding/removing new shards or just balance them
> after
>  many deletions.
>
> Of course there are solutions to this, but at the end, I'd love to have a
> true cloud where i do not have to worry about shard performance
> optimization.
> Hints are greatly appreciated.
>
> Cheers,
> Stephan
>


Need help for solr searching case insensative item

2010-10-25 Thread wu liu
Hi all,

I just noticed a weird thing happening with my solr search results.
If I do a search for "ecommons", it does not return the results for "eCommons";
instead,
if I do a search for "eCommons", I only get the matches for "eCommons",
but not "ecommons".

I cannot figure out why.

please help me

Thanks very much in advance


HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Hi,
we have an index with 2 million documents in it. From time to time we rewrite
about 1/10 of the documents (just under 200k). No autocommit. At the end we issue
a single commit, and it timed out after 60 sec. My questions are:
1. is it normal for a commit of this size to take more than 1 min? I
know it probably depends on the server ...
2. I know there are a few parameters I can set on the CommonsHttpSolrServer
class: setConnectionManagerTimeout(), setConnectionTimeout(),
setSoTimeout(). Which should I use?
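For context, this is roughly where those setters would go (the values are made up);
I'm just not sure which of the timeouts actually governs how long commit() will wait:

  CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  server.setConnectionTimeout(5000);          // time to establish the TCP connection (ms)
  server.setSoTimeout(300000);                // socket read timeout while waiting for a response (ms)
  server.setConnectionManagerTimeout(5000);   // time to wait for a free connection from the pool (ms)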

TIA


Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Thanks for the quick response. It's Solr 3.4. I'm pretty sure we have plenty of
memory.



On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch
wrote:

> Which version of Solr?
> Are you sure you did not run out of memory half way through import?
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Tue, Feb 19, 2013 at 7:44 PM, Siping Liu  wrote:
>
> > Hi,
> > we have an index with 2mil documents in it. From time to time we rewrite
> > about 1/10 of the documents (just under 200k). No autocommit. At the end
> we
> > a single commit and got time out after 60 sec. My questions are:
> > 1. is it normal to have the commit of this size takes more than 1min? I
> > know it's probably depend on the server ...
> > 2. I know there're a few parameters I can set in CommonsHttpSolrServer
> > class: setConnectionManagerTimeout(), setConnectionTimeout(),
> > setSoTimeout(). Which should I use?
> >
> > TIA
> >
>


Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Solrj.


On Tue, Feb 19, 2013 at 9:08 PM, Erick Erickson wrote:

> Well, your commits may have to wait until any merges are done, which _may_
> be merging your entire index into a single segment. Possibly this could
> take more than 60 seconds.
>
> _How_ are you doing this? DIH? SolrJ? post.jar?
>
> Best
> Erick
>
>
> On Tue, Feb 19, 2013 at 8:00 PM, Siping Liu  wrote:
>
> > Thanks for the quick response. It's Solr 3.4. I'm pretty sure we get
> plenty
> > memory.
> >
> >
> >
> > On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch
> > wrote:
> >
> > > Which version of Solr?
> > > Are you sure you did not run out of memory half way through import?
> > >
> > > Regards,
> > >Alex.
> > >
> > > Personal blog: http://blog.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at
> > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> > >
> > >
> > > On Tue, Feb 19, 2013 at 7:44 PM, Siping Liu 
> wrote:
> > >
> > > > Hi,
> > > > we have an index with 2mil documents in it. From time to time we
> > rewrite
> > > > about 1/10 of the documents (just under 200k). No autocommit. At the
> > end
> > > we
> > > > a single commit and got time out after 60 sec. My questions are:
> > > > 1. is it normal to have the commit of this size takes more than
> 1min? I
> > > > know it's probably depend on the server ...
> > > > 2. I know there're a few parameters I can set in
> CommonsHttpSolrServer
> > > > class: setConnectionManagerTimeout(), setConnectionTimeout(),
> > > > setSoTimeout(). Which should I use?
> > > >
> > > > TIA
> > > >
> > >
> >
>


How to index MS SQL Server column with image type

2011-04-07 Thread Roy Liu
Hi all,

When I index a column (image type) of a table via
http://localhost:8080/solr/dataimport?command=full-import
there is an error like this: String length must be a multiple of four.

Any help?
Thank you very much.

PS. the attachment includes Chinese character.


*1. data-config.xml*

 

 
   
*   *
   
 


*2. schema.xml*


*3. Database*
*attachment *is a column of table attachment. it's type is IMAGE.


Best Regards,
Roy Liu


How to index PDF file stored in SQL Server 2008

2011-04-07 Thread Roy Liu
Hi,

I have a table named *attachment *in MS SQL Server 2008.

COLUMN        TYPE
------        ------------
id            int
title         varchar(200)
attachment    image

I need to index the attachment column (which stores pdf files) from the database via
DIH.

After accessing this URL, it returns "Indexing completed. Added/Updated: 5
documents. Deleted 0 documents."
http://localhost:8080/solr/dataimport?command=full-import

However, I can not search anything.

Anyone can help me ?

Thanks.



*data-config-sql.xml*

  
  


  


*schema.xml*




Best Regards,
Roy Liu


Re: How to index PDF file stored in SQL Server 2008

2011-04-07 Thread Roy Liu
Thanks Lance,

I'm using Solr 1.4.
If I want to use TikaEP, do I need to upgrade to Solr 3.1, or can I just import the jar files?

Best Regards,
Roy Liu


On Fri, Apr 8, 2011 at 10:22 AM, Lance Norskog  wrote:

> You need the TikaEntityProcessor to unpack the PDF image. You are
> sticking binary blobs into the index. Tika unpacks the text out of the
> file.
>
> TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.
>
> On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu  wrote:
> > Hi,
> >
> > I have a table named *attachment *in MS SQL Server 2008.
> >
> > COLUMNTYPE
> > - 
> > id   int
> > titlevarchar(200)
> > attachment image
> >
> > I need to index the attachment(store pdf files) column from database via
> > DIH.
> >
> > After access this URL, it returns "Indexing completed. Added/Updated: 5
> > documents. Deleted 0 documents."
> > http://localhost:8080/solr/dataimport?command=full-import
> >
> > However, I can not search anything.
> >
> > Anyone can help me ?
> >
> > Thanks.
> >
> >
> > 
> > *data-config-sql.xml*
> > 
> >   >  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> >  url="jdbc:sqlserver://localhost:1433;databaseName=master"
> >  user="user"
> >  password="pw"/>
> >  
> > >query="select id,title,attachment from attachment">
> >
> >  
> > 
> >
> > *schema.xml*
> > 
> >
> >
> >
> > Best Regards,
> > Roy Liu
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Roy Liu
Hi, all
Thank YOU very much for your kind help.

*1. I have upgraded from Solr 1.4 to Solr 3.1*
*2. Change data-config-sql.xml *


  
  

  


***

*


  


*3. solrconfig.xml and schema.xml are NOT changed.*

However, when I access
*http://localhost:8080/solr/dataimport?command=full-import*

It still has errors:
Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query:[B@ae1393 Processing Document # 1

Could you give me some advice? This problem is really bothering me.
Thanks.

-- 
Best Regards,
Roy Liu


On Mon, Apr 11, 2011 at 5:16 AM, Lance Norskog  wrote:

> You have to upgrade completely to the Apache Solr 3.1 release. It is
> worth the effort. You cannot copy any jars between Solr releases.
> Also, you cannot copy over jars from newer Tika releases.
>
> On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman  wrote:
> > Hi again
> > what you are missing is field mapping
> > 
> > 
> >
> >
> > no need for TikaEntityProcessor  since you are not accessing pdf files
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Roy Liu
Hi,

I have copied
\apache-solr-3.1.0\dist\apache-solr-dataimporthandler-extras-3.1.0.jar

into \apache-tomcat-6.0.32\webapps\solr\WEB-INF\lib\

Other Errors:
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Unclosed
quotation mark after the character string 'B@3e574'.

-- 
Best Regards,
Roy Liu


On Mon, Apr 11, 2011 at 2:12 PM, Darx Oman  wrote:

> Hi there
>
> Error is not clear...
>
> but did you copy "apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar"
> to your solr\lib ?
>


Re: How to index PDF file stored in SQL Server 2008

2011-04-11 Thread Roy Liu
I changed data-config-sql.xml to

  

  





  



There are no errors, but the indexed pdf is converted to numbers:
200 1 202 1 203 1 212 1 222 1 236 1 242 1 244 1 254 1 255
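For reference, the blob-extraction pattern I'm trying to follow looks roughly like
this (the solr field names assume my schema, and whether FieldStreamDataSource is
available depends on the exact Solr version):

  <dataConfig>
    <dataSource name="db" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
                user="username" password="pw"/>
    <dataSource name="blob" type="FieldStreamDataSource"/>
    <document>
      <entity name="attachment" dataSource="db"
              query="select id, attachment, filename from attachment where ext='pdf'">
        <field column="id" name="id"/>
        <field column="filename" name="filename"/>
        <entity name="pdf" dataSource="blob" processor="TikaEntityProcessor"
                dataField="attachment.attachment" format="text">
          <field column="text" name="text"/>
        </entity>
      </entity>
    </document>
  </dataConfig>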
-- 
Best Regards,
Roy Liu


On Mon, Apr 11, 2011 at 2:02 PM, Roy Liu  wrote:

> Hi, all
> Thank YOU very much for your kindly help.
>
> *1. I have upgrade from Solr 1.4 to Solr 3.1*
> *2. Change data-config-sql.xml *
>
> 
>  name="*bsds*"
>   driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>
> url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
>   user="username"
>   password="pw"/>
>   
>
>   
>  query="select id,attachment,filename from attachment where
> ext='pdf' and id>30001030" >
>
> 
> * url="${doc.attachment}" format="text" >**
> 
> *
> 
> 
>   
> 
>
> *3. solrconfig.xml and schema.xml are NOT changed.*
>
> However, when I access
>
> *http://localhost:8080/solr/dataimport?command=full-import*
>
> It still has errors:
> Full Import
> failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query:[B@ae1393 Processing Document # 1
>
> Could you give me some advices. This problem is so boring me.
> Thanks.
>
> --
> Best Regards,
> Roy Liu
>
>
>
> On Mon, Apr 11, 2011 at 5:16 AM, Lance Norskog  wrote:
>
>> You have to upgrade completely to the Apache Solr 3.1 release. It is
>> worth the effort. You cannot copy any jars between Solr releases.
>> Also, you cannot copy over jars from newer Tika releases.
>>
>> On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman  wrote:
>> > Hi again
>> > what you are missing is field mapping
>> > 
>> > 
>> >
>> >
>> > no need for TikaEntityProcessor  since you are not accessing pdf files
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>


Re: Tika, Solr running under Tomcat 6 on Debian

2011-04-11 Thread Roy Liu
\apache-solr-3.1.0\contrib\extraction\lib\tika*.jar

-- 
Best Regards,
Roy Liu


On Mon, Apr 11, 2011 at 3:10 PM, Mike  wrote:

> Hi All,
>
> I have the same issue. I have installed solr instance on tomcat6. When try
> to index pdf I am running into the below exception:
>
> 11 Apr, 2011 12:11:55 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NoClassDefFoundError:
> org/apache/tika/exception/TikaException
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:247)
>at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
>at
> org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
>at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
>at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>at
>
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>at
>
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>at
>
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>at
>
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>at
>
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>at
>
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.tika.exception.TikaException
>at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>... 22 more
>
> I could not found any tika jar file.
> Could you please help me out in fixing the above issue.
>
> Thanks,
> Mike
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tika-Solr-running-under-Tomcat-6-on-Debian-tp993295p2805615.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


custom sorter

2012-07-19 Thread Siping Liu
Hi,
I have a requirement to place a document at a pre-determined position for
special filter query values; for instance, when the filter query is
fq=(field1:"xyz"), place document abc as the first result (the rest of the
result set will be ordered by sort=field2). I guess I have to plug in my own
Java code as a custom sorter. I'd appreciate it if someone could shed light
on this (how to add a custom sorter, etc.).
TIA.


Re: help: I always get NULL with row.get(columnName)

2012-07-19 Thread Roy Liu
anyone knows?

On Thu, Jul 19, 2012 at 5:48 PM, Roy Liu  wrote:

> Hi,
>
> When I use Transformer to handle files, I always get NULL with
> row.get(columnName).
> anyone knows?
>
> --
> The following file is *data-config.xml*
>
> 
>  name="ds"
>   driver="oracle.jdbc.driver.OracleDriver"
>   url="jdbc:oracle:thin:@10.1.1.1:1521:sid"
>   user="username"
>   password="pwd"
>   />
>   
>
> query="select a.objid as ID from DOCGENERAL a where
> a.objid=14154965">
>
> 
>
> * *query="select docid as ID, name as filename,
> storepath as filepath from attachment where docid=${report.ID}" *
> * transformer="com.bs.solr.BSFileTransformer" >*
> * *
> * *
> * *
> * *
>
> 
>
>   
> 
>
>
> public class *BSFileTransformer *extends Transformer {
>  private static Log LOGGER = LogFactory.getLog(BSFileTransformer.class);
>  @Override
>  public Object transformRow(Map row, Context context) {
> // row.get("filename") is always null,but row.get("id") is
> OK.
>  S*ystem.out.println("==filename:"+row.get("filename"));*
>
> List> fields = context.getAllEntityFields();
>
> String id = null; // Entity ID
> String fileName = "NONAME";
>  for (Map field : fields) {
> String name = field.get("name");
>  System.out.println("name:" + name);
> if ("bs_attachment_id".equals(name)) {
>  String columnName = field.get("column");
> id = String.valueOf(row.get(columnName));
>  }
> if ("bs_attachment_name".equals(name)) {
> String columnName = field.get("column");
>  fileName = (String) row.get(columnName);
> }
>  String isFile = field.get("isfile");
> if ("true".equals(isFile)) {
>  String columnName = field.get("column");
> String filePath = (String) row.get(columnName);
>
> try {
> System.out.println("fileName:"+ fileName+",filePath: " + filePath);
>  if(filePath != null){
> File file = new File(filePath);
>  InputStream inputStream = new FileInputStream(file);
> Tika tika = new Tika();
>  String text = tika.parseToString(inputStream);
>  row.put(columnName, text);
>  }
> LOGGER.info("Processed File OK! Entity: " + fileName + ", ID: " +id);
>  } catch (IOException ioe) {
> LOGGER.error(ioe.getMessage());
> row.put(columnName, "");
>  } catch (TikaException e) {
> LOGGER.error("Parse File Error:" + id + ", Error:"
>  + e.getMessage());
> row.put(columnName, "");
> }
>  }
> }
> return row;
>  }
> }
>


Re: custom sorter

2012-07-22 Thread Siping Liu
Hi -- thanks for the response. It's the right direction. However, on closer
look I don't think I can use it directly. The reason is that in my case
the query string is always "*:*"; we use filter queries to get different
results. When fq=(field1:"xyz") we want to boost one document and let sort=
take care of the rest of the results, and when field1 has any other value, sort=
takes care of all results.

Maybe I can define my own SearchComponent class, and specify it in

  my_search_component

I have to try and see if that'd work.
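Roughly what I have in mind -- extending the stock QueryComponent and registering it
in place of the standard "query" component (the class name, field values and pinned
id are placeholders, and this assumes sorting by a function query, available since
Solr 3.1, is acceptable):

import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

public class PinDocumentQueryComponent extends QueryComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    String[] fqs = params.getParams(CommonParams.FQ);
    if (fqs != null) {
      for (String fq : fqs) {
        if (fq.contains("field1:\"xyz\"")) {
          // pin document abc to the top; everything else falls back to field2
          params.set("pinq", "id:abc");
          params.set(CommonParams.SORT, "query($pinq,0) desc, field2 asc");
          break;
        }
      }
    }
    rb.req.setParams(params);
    // the stock QueryComponent does the real preparation with the adjusted params
    super.prepare(rb);
  }
}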

thanks.


On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll
wrote:

> take a look at
> http://wiki.apache.org/solr/QueryElevationComponent
>
> On 20 July 2012 03:48, Siping Liu  wrote:
>
> > Hi,
> > I have requirements to place a document to a pre-determined  position for
> > special filter query values, for instance when filter query is
> > fq=(field1:"xyz") place document abc as first result (the rest of the
> > result set will be ordered by sort=field2). I guess I have to plug in my
> > Java code as a custom sorter. I'd appreciate it if someone can shed light
> > on this (how to add custom sorter, etc.)
> > TIA.
> >
>


anyway to get Document update time stamp

2009-09-17 Thread siping liu

I understand there's no "update" in Solr/lucene; it's really delete+insert. Is
there any way to get a Document's insert time stamp, without explicitly creating
such a data field in the document? If so, how can I query it, for instance "get
all documents that are older than 24 hours"? Thanks.
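For comparison, the explicit-field approach I'm hoping to avoid would be the one from
the example schema.xml, something like:

  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>

and then a query such as timestamp:[* TO NOW-1DAY] for everything older than 24 hours.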

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu

Hi,

I read pretty much all posts on this thread (before and after this one). Looks
like the main suggestion from you and others is to keep the max heap size (-Xmx) as
small as possible (as long as you don't see an OOM exception). This brings more
questions than answers (for me at least; I'm new to Solr).

 

First, our environment and the problem encountered: Solr 1.4 (nightly build,
downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on
Solaris (multiple cpus/cores). The cache settings are from the default solrconfig.xml
(they look very small). At first we used minimal JAVA_OPTS and quickly ran into a
problem similar to the one the original poster reported -- long pauses (seconds to
minutes) under load test. jconsole showed that it pauses on GC. So more
JAVA_OPTS were added: "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m
-XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200"; the thinking is that with
multiple cpus/cores we can get GC over with as quickly as possible. With the new
setup, it works fine until Tomcat reaches the heap size limit, then it blocks and takes
minutes on a "full GC" to get more space from the "tenured generation". We tried
different Xmx values (from very small to large) with no difference in the long GC time. We
never ran into OOM.

 

Questions:

* In general various caches are good for performance; we have more RAM to use
and want to use more caching to boost performance. Isn't your suggestion (of
lowering the heap limit) going against that?

* It looks like Solr caching made its way into the tenured generation on the heap, that's
good. But why does it get GC'ed eventually?? I did a quick check of the Solr code
(Solr 1.3, not 1.4), and see a single instance of using WeakReference. Is that
what is causing all this? This seems to suggest a design flaw in Solr's memory
management strategy (or just my ignorance about Solr?). I mean, wouldn't this
be the "right" way of doing it -- you allow the user to specify the cache size in
solrconfig.xml, then the user can set up the heap limit in JAVA_OPTS accordingly, and
there is no need to use WeakReference (BTW, why not SoftReference)??

* Right now I have a single Tomcat hosting Solr and other applications. I guess 
now it's better to have Solr on its own Tomcat, given that it's tricky to 
adjust the java options.

 

thanks.


 
> From: wun...@wunderwood.org
> To: solr-user@lucene.apache.org
> Subject: RE: Solr and Garbage Collection
> Date: Fri, 25 Sep 2009 09:51:29 -0700
> 
> 30ms is not better or worse than 1s until you look at the service
> requirements. For many applications, it is worth dedicating 10% of your
> processing time to GC if that makes the worst-case pause short.
> 
> On the other hand, my experience with the IBM JVM was that the maximum query
> rate was 2-3X better with the concurrent generational GC compared to any of
> their other GC algorithms, so we got the best throughput along with the
> shortest pauses.
> 
> Solr garbage generation (for queries) seems to have two major components:
> per-request garbage and cache evictions. With a generational collector,
> these two are handled by separate parts of the collector. Per-request
> garbage should completely fit in the short-term heap (nursery), so that it
> can be collected rapidly and returned to use for further requests. If the
> nursery is too small, the per-request allocations will be made in tenured
> space and sit there until the next major GC. Cache evictions are almost
> always in long-term storage (tenured space) because an LRU algorithm
> guarantees that the garbage will be old.
> 
> Check the growth rate of tenured space (under constant load, of course)
> while increasing the size of the nursery. That rate should drop when the
> nursery gets big enough, then not drop much further as it is increased more.
> 
> After that, reduce the size of tenured space until major GCs start happening
> "too often" (a judgment call). A bigger tenured space means longer major GCs
> and thus longer pauses, so you don't want it oversized by too much.
> 
> Also check the hit rates of your caches. If the hit rate is low, say 20% or
> less, make that cache much bigger or set it to zero. Either one will reduce
> the number of cache evictions. If you have an HTTP cache in front of Solr,
> zero may be the right choice, since the HTTP cache is cherry-picking the
> easily cacheable requests.
> 
> Note that a commit nearly doubles the memory required, because you have two
> live Searcher objects with all their caches. Make sure you have headroom for
> a commit.
> 
> If you want to test the tenured space usage, you must test with real world
> queries. Those are the only way to get accurate cache eviction rates.
> 
> wunder
  

Re: response status: error 400

2009-10-22 Thread James liu
Are you sure the url is correct?


-- 
regards
j.L ( I live in Shanghai, China)


weird problem with solr.DateField

2009-11-11 Thread siping liu

Hi,

I'm using Solr 1.4 (from nightly build about 2 months ago) and have this 
defined in solrconfig:





 

and following code that get executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());

solrServer.add(documents);
solrServer.commit();

UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO 
NOW-2HOUR]");
solrServer.commit();

 

The purpose is to refresh the index with the latest data (in "documents").

This works fine, except that after a few days I start to see a few documents 
with no "lastUpdate" field (query "-lastUpdate:[* TO *]") -- how can that be 
possible?

 

thanks in advance.

 
  
_
Windows 7: Unclutter your desktop.
http://go.microsoft.com/?linkid=9690331&ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009

Re: Illegal character in xml file

2008-09-19 Thread James liu
first, u should escape some strings, like this (PHP code):

function escapeChars($string) {
    $string = str_replace("&", "&amp;", $string);
    $string = str_replace("<", "&lt;", $string);
    $string = str_replace(">", "&gt;", $string);
    $string = str_replace("'", "&#039;", $string);
    $string = str_replace('"', "&quot;", $string);
    return $string;
}



second, make sure the xml u produce is encoded in utf-8

third, post it as utf-8 (with the header "Content-Type: text/xml;charset=utf-8")


if u don't know how, maybe u can check a solr client (u can find them on
solr's wiki)

Good Luck~
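
For example, a minimal sketch of the third step with curl, assuming the standard
/update handler on localhost and a UTF-8 encoded file docs.xml:

curl 'http://localhost:8983/solr/update?commit=true' \
     -H 'Content-Type: text/xml;charset=utf-8' \
     --data-binary @docs.xml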


On Fri, Sep 19, 2008 at 4:33 PM, 李学健 <[EMAIL PROTECTED]> wrote:

> hi, all
>
> when i post xml files to solr, it's interrupted by this: Illegal character
>
> how can i deal with it ?
>
> is there any solution to ignore Illegal character in documents feeded ?
>
>
> thanks
>



-- 
regards
j.L


Re: solr 1.3: bug in phps response writer

2008-11-17 Thread James liu
i find url not same as the others
-- 
regards
j.L


Re: Newbe! Trying to run solr-1.3.0 under tomcat. Please help

2008-11-19 Thread James liu
check procedure:
1: rm -r $tomcat/webapps/*
2: rm -r $solr/data ,,,ur index data directory
3: check xml(any xml u modified)
4: start tomcat

i had the same error, but i forgot how i fixed it...so u can use my check procedure,
i think it will help you


i use tomcat+solr in win2003, freebsd, mac osx 10.5.5, they all work well

-- 
regards
j.L


Re: posting error in solr

2008-11-19 Thread James liu
first, make sure the xml is utf-8,, and the field values are utf-8,,
second, u should post the xml as utf-8


my advice : use utf-8 for all encoding...

it makes my solr work well,,, i use chinese

-- 
regards
j.L


Re: Query for Distributed search -

2008-11-24 Thread James liu
Up to your solr client.
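
With most clients it comes down to adding the shards parameter to an ordinary select
request; a sketch, with placeholder host names:

http://box1:8983/solr/select?q=some+text&shards=box1:8983/solr,box2:8983/solr,box3:8983/solr

The shard sub-requests are issued in parallel and the responses merged, so a match in
the Sep-Dec index does not wait for the Jan-April box.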

On Mon, Nov 24, 2008 at 1:24 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Looking for some insight on distributed search.
>
> Say I have an index distributed in 3 boxes and the index contains time and
> text data (typical log file). Each box has index for different timeline -
> say Box 1 for all Jan to April, Box 2 for May to August and Box 3 for Sep to
> Dec.
>
> Now if I try to search for a text string, will the search would happen in
> parallel in all 3 boxes or sequentially?
>
> Regards,
> Sourav
>
>



-- 
regards
j.L


adding plug-in after search is done

2009-04-27 Thread siping liu

trying to manipulate search results (like further filtering out unwanted ones), and 
ordering the results differently. Where is the suitable place to do this? 
I've been using QueryResponseWriter but that doesn't seem to be the right place.

thanks.

_
Rediscover Hotmail®: Get quick friend updates right in your inbox. 
http://windowslive.com/RediscoverHotmail?ocid=TXT_TAGLM_WL_HM_Rediscover_Updates2_042009

RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu

I was looking for an answer to the same question, and have a similar concern. It looks 
like any serious customization work requires developing a custom SearchComponent, 
but it's not clear to me how the Solr designers intended this to be done. I'd have more 
confidence either doing it at the Lucene level, or staying on the client side and using 
something like multi-core (as discussed here 
http://wiki.apache.org/solr/MultipleIndexes).


 
> Date: Wed, 20 May 2009 13:47:20 -0400
> Subject: RE: Creating a distributed search in a searchComponent
> From: nicholas.bai...@rackspace.com
> To: solr-user@lucene.apache.org
> 
> It seems I sent this out a bit too soon. After looking at the source it seems 
> there are two seperate paths for distributed and regular queries, however the 
> prepare method for for all components is run before the shards parameter is 
> checked. So I can build the shards portion by using the prepare method of the 
> my own search component. 
> 
> However I'm not sure if this is the greatest idea in case solr changes at 
> some point.
> 
> -Nick
> 
> -Original Message-
> From: "Nick Bailey" 
> Sent: Wednesday, May 20, 2009 1:29pm
> To: solr-user@lucene.apache.org
> Subject: Creating a distributed search in a searchComponent
> 
> Hi,
> 
> I am wondering if it is possible to basically add the distributed portion of 
> a search query inside of a searchComponent.
> 
> I am hoping to build my own component and add it as a first-component to the 
> StandardRequestHandler. Then hopefully I will be able to use this component 
> to build the "shards" parameter of the query and have the Handler then treat 
> the query as a distributed search. Anyone have any experience or know if this 
> is possible?
> 
> Thanks,
> Nick
> 
> 
> 
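
For what it's worth, a minimal sketch of that approach against the Solr 1.4
SearchComponent API (the class name and shard list below are made-up placeholders):

import java.io.IOException;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ShardListComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    SolrParams params = rb.req.getParams();
    // only fill in shards if the caller did not supply them
    if (params.get(ShardParams.SHARDS) == null) {
      ModifiableSolrParams modified = new ModifiableSolrParams(params);
      modified.set(ShardParams.SHARDS, "host1:8983/solr,host2:8983/solr");
      rb.req.setParams(modified);
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do at process time; the shards parameter does the work
  }

  @Override
  public String getDescription() { return "fills in the shards parameter"; }

  @Override
  public String getSource() { return "n/a"; }

  @Override
  public String getSourceId() { return "n/a"; }

  @Override
  public String getVersion() { return "1.0"; }
}

Registered with <searchComponent name="shardList" class="ShardListComponent"/> and
listed under first-components on the standard handler, prepare() runs before the
shards parameter is inspected, which is exactly what the workaround above relies on --
and also why it is fragile if that ordering ever changes.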

_
Hotmail® has ever-growing storage! Don’t worry about storage limits.
http://windowslive.com/Tutorial/Hotmail/Storage?ocid=TXT_TAGLM_WL_HM_Tutorial_Storage1_052009

Re: Using Chinese / How to ?

2009-06-02 Thread James liu
u mean how to config solr to support chinese?

Update problem?

On Tuesday, June 2, 2009, Fer-Bj  wrote:
>
> I'm sending 3 files:
> - schema.xml
> - solrconfig.xml
> - error.txt (with the error description)
>
> I can confirm by now that this error is due to invalid characters for the
> XML format (ASCII 0 or 11).
> However, this problem now is taking a different direction: how to start
> using the CJK instead of the english!
> http://www.nabble.com/file/p23825881/error.txt error.txt
> http://www.nabble.com/file/p23825881/solrconfig.xml solrconfig.xml
> http://www.nabble.com/file/p23825881/schema.xml schema.xml
>
>
> Grant Ingersoll-6 wrote:
>>
>> Can you provide details on the errors?  I don't think we have a
>> specific how to, but I wouldn't think it would be much different from
>> 1.2
>>
>> -Grant
>> On May 31, 2009, at 10:31 PM, Fer-Bj wrote:
>>
>>>
>>> Hello,
>>>
>>> is there any "how to" already created to get me up using SOLR 1.3
>>> running
>>> for a chinese based website?
>>> Currently our site is using SOLR 1.2, and we tried to move into 1.3
>>> but we
>>> couldn't complete our reindex as it seems like 1.3 is more strict
>>> when it
>>> comes to special chars.
>>>
>>> I would appreciate any help anyone may provide on this.
>>>
>>> Thanks!!
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Using-Chinese---How-to---tp23810129p23810129.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Using-Chinese---How-to---tp23810129p23825881.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

-- 
regards
j.L ( I live in Shanghai, China)


Re: Solr multiple keyword search as google

2009-06-02 Thread James liu
U can find the answer in the tutorial or the examples

On Tuesday, June 2, 2009, The Spider  wrote:
>
> Hi,
>    I am using solr nightly bind for my search.
> I have to search in the location field of the table which is not my default
> search field.
> I will briefly explain my requirement below:
> I want to get the same/similar result when I give location multiple
> keywords, say  "San jose ca USA"
> or "USA ca san jose" or "CA San jose USA" (like that of google search). That
> means even if I rearranged the keywords of location I want to get proper
> results. Is there any way to do that?
> Thanks in advance
> --
> View this message in context: 
> http://www.nabble.com/Solr-multiple-keyword-search-as-google-tp23826278p23826278.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

-- 
regards
j.L ( I live in Shanghai, China)


Re: Using Chinese / How to ?

2009-06-02 Thread James liu
1: modify ur schema.xml:
like




2: add your field:


3: add your analyzer to {solr_dir}\lib\

4: rebuild solr and u will find the new build in {solr_dir}\dist

5: follow tutorial to setup solr

6: open the solr admin page in your browser and use the analysis page to check the analyzer; it
will tell u how each word is analyzed and which analyzer is used
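
A sketch of what steps 1 and 2 look like, assuming Lucene's bundled CJKAnalyzer
(substitute whatever Chinese analyzer class you dropped into lib, e.g. Paoding):

<!-- schema.xml, step 1: a field type backed by the Chinese analyzer -->
<fieldType name="text_cn" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldType>

<!-- schema.xml, step 2: a field that uses it -->
<field name="content" type="text_cn" indexed="true" stored="true"/>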


-- 
regards
j.L ( I live in Shanghai, China)


Re: indexing Chinese language

2009-06-04 Thread James liu
first: u don't have to restart solr,,, u can use new data to replace the old data
and tell solr to open a new searcher.. u can find something in the shell scripts
that ship with solr

second: u don't have to restart solr,,, just keep the id the same.. example: old
id:1,title:hi, new id:1,title:welcome,, just index the new data,, it will delete the
old doc and insert the new one,,, like a replace,, but it will use more time and
resource.

u can find the indexed doc count on the solr admin page.
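
For example, re-posting the doc from the example above with the same id overwrites
the old version:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">welcome</field>
  </doc>
</add>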


On Fri, Jun 5, 2009 at 7:42 AM, Fer-Bj  wrote:

>
> What we usually do to reindex is:
>
> 1. stop solr
> 2. rmdir -r data  (that is to remove everything in  /opt/solr/data/
> 3. mkdir data
> 4. start solr
> 5. start reindex.   with this we're sure about not having old copies or
> index..
>
> To check the index size we do:
> cd data
> du -sh
>
>
>
> Otis Gospodnetic wrote:
> >
> >
> > I can't tell what that analyzer does, but I'm guessing it uses n-grams?
> > Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629
> > instead?
> >
> >  Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: Fer-Bj 
> >> To: solr-user@lucene.apache.org
> >> Sent: Thursday, June 4, 2009 2:20:03 AM
> >> Subject: Re: indexing Chienese langage
> >>
> >>
> >> We are trying SOLR 1.3 with Paoding Chinese Analyzer , and after
> >> reindexing
> >> the index size went from 1.5 Gb to 2.7 Gb.
> >>
> >> Is that some expected behavior ?
> >>
> >> Is there any switch or trick to avoid having a double + index file size?
> >>
> >> Koji Sekiguchi-2 wrote:
> >> >
> >> > CharFilter can normalize (convert) traditional chinese to simplified
> >> > chinese or vice versa,
> >> > if you define mapping.txt. Here is the sample of Chinese character
> >> > normalization:
> >> >
> >> >
> >>
> https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> >> >
> >> > See SOLR-822 for the detail:
> >> >
> >> > https://issues.apache.org/jira/browse/SOLR-822
> >> >
> >> > Koji
> >> >
> >> >
> >> > revathy arun wrote:
> >> >> Hi,
> >> >>
> >> >> When I index chinese content using chinese tokenizer and analyzer in
> >> solr
> >> >> 1.3 ,some of the chinese text files are getting indexed but others
> are
> >> >> not.
> >> >>
> >> >> Since chinese has got many different language subtypes as in standard
> >> >> chinese,simplified chinese etc which of these does the chinese
> >> tokenizer
> >> >> support and is there any method to find the type of  chiense language
> >> >> from
> >> >> the file?
> >> >>
> >> >> Rgds
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/indexing-Chienese-langage-tp22033302p23879730.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
regards
j.L ( I live in Shanghai, China)


Re: indexing Chinese language

2009-06-04 Thread James liu
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun  wrote:

> Hi,
>
> When I index chinese content using chinese tokenizer and analyzer in solr
> 1.3 ,some of the chinese text files are getting indexed but others are not.
>

are u sure ur analyzer handles it well?

if not sure, u can use the analysis link on the solr admin page to check it


>
> Since chinese has got many different language subtypes as in standard
> chinese,simplified chinese etc which of these does the chinese tokenizer
> support and is there any method to find the type of  chiense language  from
> the file?
>
> Rgds
>



-- 
regards
j.L ( I live in Shanghai, China)


Re: timeouts

2009-06-04 Thread James liu
Collins:

i don't know what u wanna say?

-- 
regards
j.L ( I live in Shanghai, China)


Query faceting

2009-06-08 Thread siping liu

Hi,

I have a field called "service" with following values:

- Shuttle Services
- Senior Discounts
- Laundry Rooms

- ...

 

When I conduct query with "facet=true&facet.field=service&facet.limit=-1", I 
get something like this back:

- shuttle 2

- service 3

- senior 0

- laundry 0

- room 3

- ...

 

Questions:

- How do I keep field values from being broken up into words, so I can get something like 
"Shuttle Services 2" back?

- How do I tell Solr not to return facets with a 0 count? The query takes a long time 
to finish, seemingly because of the long list of items with 0 count.
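
A sketch of one way to get both behaviours, assuming a separate untokenized copy of
the field (the field name service_facet is made up):

<!-- schema.xml: facet on a string copy so values stay whole -->
<field name="service_facet" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="service" dest="service_facet"/>

facet=true&facet.field=service_facet&facet.limit=-1&facet.mincount=1

facet.mincount=1 drops the zero-count entries from the response.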

 

thanks for any advice.

_
Insert movie times and more without leaving Hotmail®. 
http://windowslive.com/Tutorial/Hotmail/QuickAdd?ocid=TXT_TAGLM_WL_HM_Tutorial_QuickAdd_062009

does solr support summary

2009-06-10 Thread James liu
if a user searches with a keyword and gets back a summary (auto-generated
around the keyword)...like this

doc fields: id, text

id: 001
text:

> Open source is a development method for software that harnesses the power
> of distributed peer review and transparency of process. The promise of open
> source is better quality, higher reliability, more flexibility, lower cost,
> and an end to predatory vendor lock-in.
>
if keyword is "source",,summary is:

Open source is a development...The promise of open source is better quality
if keyword is "power ",,,summary is:
Open...harnesses the power of distributed peer review and transparency of
process...

just like google search results...

and any advice will be appreciated.
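
For reference, Solr's highlighting parameters produce this kind of keyword-centred
snippet; a sketch, assuming the "text" field above is stored:

q=text:source&hl=true&hl.fl=text&hl.snippets=2&hl.fragsize=100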

-- 
regards
j.L ( I live in Shanghai, China)


DisMaxRequestHandler usage

2009-06-16 Thread siping liu

Hi,

I have this standard query:

q=(field1:hello OR field2:hello) AND (field3:world)

 

Can I use the dismax handler for this (applying the same search term to field1 and 
field2, but keeping field3 as a separate condition)? If it can be done, what's the 
advantage of doing it this way over using the standard query?
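
A sketch of the dismax version, assuming the dismax handler from the example
solrconfig.xml is registered:

q=hello&qt=dismax&qf=field1+field2&fq=field3:world

The dismax handler takes the raw user text in q, spreads it across the fields listed
in qf (with optional per-field boosts like field1^2), and scores each document on its
best-matching field rather than the sum; the field3 restriction moves into an fq,
which is cached separately and does not affect scoring. The main gain over the
standard query is that end-user input no longer has to be turned into a hand-built
boolean expression.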

 

thanks.

_
Microsoft brings you a new way to search the web.  Try  Bing™ now
http://www.bing.com?form=MFEHPG&publ=WLHMTAG&crea=TEXT_MFEHPG_Core_tagline_try 
bing_1x1

IndexMerge not found

2009-07-01 Thread James liu
i am trying http://wiki.apache.org/solr/MergingSolrIndexes

system: win2003, jdk 1.6

Error information:

> Caused by: java.lang.ClassNotFoundException: org.apache.lucene.misc.IndexMergeTool
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClassInternal(Unknown Source)
> Could not find the main class: org/apache/lucene/misc/IndexMergeTool. Program will exit.
>


-- 
regards
j.L ( I live in Shanghai, China)


Re: IndexMerge not found

2009-07-01 Thread James liu
i use lucene-core-2.9-dev.jar, lucene-misc-2.9-dev.jar
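
The ClassNotFoundException usually just means lucene-misc is not on the classpath of
the java command itself; a sketch of the invocation (Windows-style ; separator and
made-up index paths):

java -cp lucene-core-2.9-dev.jar;lucene-misc-2.9-dev.jar org.apache.lucene.misc.IndexMergeTool C:\merged\index C:\core1\data\index C:\core2\data\index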

On Thu, Jul 2, 2009 at 2:02 PM, James liu  wrote:

> i try http://wiki.apache.org/solr/MergingSolrIndexes
>
> system: win2003, jdk 1.6
>
> Error information:
>
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.lucene.misc.IndexMergeTo
>> ol
>> at java.net.URLClassLoader$1.run(Unknown Source)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(Unknown Source)
>> at java.lang.ClassLoader.loadClass(Unknown Source)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>> at java.lang.ClassLoader.loadClass(Unknown Source)
>> at java.lang.ClassLoader.loadClassInternal(Unknown Source)
>> Could not find the main class: org/apache/lucene/misc/IndexMergeTool.
>> Program w
>> ill exit.
>>
>
>
> --
> regards
> j.L ( I live in Shanghai, China)
>



-- 
regards
j.L ( I live in Shanghai, China)


Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))

2009-07-02 Thread James liu
I use solr to search, and the index is made by lucene (not
EmbeddedSolrServer; the wiki is old).

Is it a problem when i use solr to search?

what is the difference between an index made by lucene and one made by solr?


thks

-- 
regards
j.L ( I live in Shanghai, China)


Re: Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))

2009-07-02 Thread James liu
solr has many fieldtypes, like: integer, long, double, sint, sfloat,
tint, tfloat,, and more.

but lucene has no fieldtypes,, just name and value, and the value is only a string.

so i am not sure whether it is a problem when i use solr to search (index made by
lucene).



-- 
regards
j.L ( I live in Shanghai, China)


how to stress test solr

2010-02-03 Thread James liu
before the stress test, should i disable the SolrCache?

which tool u use?

How to do stress test correctly?

Any pointers?
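
One simple option is ApacheBench against a canned query URL (a sketch; the URL,
request count and concurrency are placeholders, and replaying real query logs gives
more honest numbers than one repeated query, which will just sit in the caches):

ab -n 10000 -c 20 "http://localhost:8983/solr/select?q=foo&rows=10"

For the cache question: setting the cache sizes to 0 in solrconfig.xml (or commenting
the cache entries out) for the test run takes them out of the picture, but the usual
advice is to test with the caches configured the way production will run them.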

-- 
regards
j.L ( I live in Shanghai, China)


match to non tokenizable word ("helloworld")

2010-05-16 Thread siping liu

I get no match when searching for "helloworld", even though I have "hello 
world" in my index. How do people usually deal with this? Write a custom 
analyzer, with help from a collection of all dictionary words?
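
Short of a dictionary-based decompounder, one common trick is an n-gram field; a
hedged sketch with Solr's NGramFilterFactory (the field type name and gram sizes are
made up, and it trades precision for recall):

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="5"/>
  </analyzer>
</fieldType>

"helloworld" and "hello world" then share enough 3-5 character grams to match each other.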

 

thanks for suggestions/comments.
  
_
Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_1

Re: multiple slaves on the same box

2007-07-17 Thread James liu

2007/7/18, Ryan McKinley <[EMAIL PROTECTED]>:


Xuesong Luo wrote:
> Hi, there,
> We have one master server and multiple slave servers. The multiple slave
> servers can be run either on the same box or different boxes.  For
> slaves on the same box, is there any best practice that they should use
> the same index or each should have separate indexes?
>

I'm not sure about 'best' practices, but I can tell you my experience...

We have a master and single slave on the same server using the same
index.  Since it is the same index, there really is no 'distribution'
scripts, only something that periodically calls 'commit' on the slave
index.  This is working great.



I don't know why "We have a master and single slave on the same server using
the same
index."

master which do index? or do other thing? search will use slave or master,
or first master and second to slave?


My experience: every paritition have their index, and index not same.

Master Index do backup and it in other server.




I can't think of any reason to have more then one slave server on the

same machine.  What are you trying to do?

ryan





--
regards
jl


solr index problem

2007-07-17 Thread James liu

when i index 1.7m docs at 4k-5k per doc,

OutOfMemory happens when it has indexed ~1.13m docs.

I just restart tomcat, delete all locks and restart the indexing.

No error or warning info until it finishes.


anyone know why? or has anyone had the same error?

--
regards
jl

