Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6

2015-04-07 Thread Swaraj Kumar
As per http://stackoverflow.com/questions/11765015/zookeeper-not-starting

Running zkServer without the "start" argument will fix this.

One more change you need to make: Solr runs on port 8983 by default, and you have
used 8983 for ZooKeeper, so start Solr on a different port.
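
For illustration only (the port below is an assumption, not taken from your setup),
something along these lines should work from the respective bin directories on Windows:

  zkServer.cmd
  solr.cmd start -c -p 8984 -z localhost:8983

i.e. run ZooKeeper in the foreground without the "start" argument, and start Solr on
a port (here 8984) other than the one ZooKeeper is listening on.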

Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 9:42 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi Erick,
>
> I think I'll just setup the ZooKeeper server in standalone mode first,
> before I get more confused as I'm quite new to both Solr and ZooKeeper too.
> Better not to jump the gun.
>
> However, I face this error when I try to start it in standalone mode.
>
> 2015-04-07 11:59:51,789 [myid:] - ERROR [main:ZooKeeperServerMain@54] -
> Invalid arguments, exiting abnormally
> java.lang.NumberFormatException: For input string:
> "C:\Users\edwin\zookeeper-3.4.6\bin\..\conf\zoo.cfg"
> at java.lang.NumberFormatException.forInputString(Unknown Source)
> at java.lang.Integer.parseInt(Unknown Source)
> at java.lang.Integer.parseInt(Unknown Source)
> at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:60)
> at
>
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:83)
> at
>
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
>
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
>
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> 2015-04-07 11:59:51,796 [myid:] - INFO  [main:ZooKeeperServerMain@55] -
> Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]
>
>
> I have the following information in my zoo.cfg:
>
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
> clientPort=8983
>
>
> I got the same error even if I set the clientPort=2888.
>
>
> Regards,
> Edwin
>
>
>
> On 7 April 2015 at 11:26, Erick Erickson  wrote:
>
> > Believe me, I'm no Zookeeper expert, but it looks to me like you're
> > mixing Solr ports and Zookeeper ports. AFAIK, the two ports in
> > the zoo.cfg file are exclusively for the Zookeeper instances to talk
> > to each other. Zookeeper isn't aware that the listening nodes are
> > Solr noodes, so putting Solr ports in there is confusing Zookeeper
> > I'd guess.
> >
> > Assuming you're starting your three ZK instances on ports 2888, 2889 and
> > 2890,
> > I'd expect the proper ports are
> > 2888:3888
> > 2889:3889
> > 2890:3890
> >
> > But as I said I'm not a Zookeeper expert so beware..
> >
> >
> > Best,
> > Erick
> >
> > On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
> >  wrote:
> > > Hi,
> > >
> > > I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a
> > ZooKeeper
> > > with simulation of 3 servers, but they are all located on the same
> > machine
> > > for testing purpose.
> > >
> > > In my zoo.cfg file, I have listed down the 3 servers to be as follows:
> > > server.1=localhost:8983:3888
> > > server.2=localhost:8984:3889
> > > server.3=localhost:8985:3890
> > >
> > > Then I try to start Solr using the following command:
> > > bin/solr start -e cloud -z localhost:8983-noprompt
> > >
> > > However, I'm unable to establish a connection from my Solr to the
> > > ZooKeeper. Is this configuration possible, or is there anything which I
> > > missed out?
> > >
> > > Thank you in advance for your help.
> > >
> > > Regards,
> > > Edwin
> >
>


Re: Solr 4.2.0 index corruption issue

2015-04-07 Thread Puneet Jain
Hi Guys,

Can someone please help pinpoint the issue here?

Thanks & Regards,
Puneet

On Mon, Apr 6, 2015 at 1:27 PM, Puneet Jain  wrote:

> Hi Guys,
>
> I am using 4.2.0 since more than a year and since last October 2014 facing
> index corruption issue. However, now it is happening everyday and have to
> built a fresh index for the temporary fix. Please find the logs below where
> i can see an error while replicating data from master to slave and notice
> the index corruption issue at slave nodes:
>
> 2015-04-05 00:00:37,671 ERROR snapPuller-15-thread-1 [handler.SnapPuller]
> - Error closing the file stream: _1re_Lucene41_0.tim
> java.io.IOException: Input/output error
> at java.io.RandomAccessFile.close0(Native Method)
> at java.io.RandomAccessFile.close(RandomAccessFile.java:543)
> at
> org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:494)
> at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1223)
> at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1117)
> at
> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:744)
> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:398)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:281)
> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
>
> Not getting exact solution for the same was thinking to upgrade to SOLR
> 4.7.0 as it uses new versions of httpcomponents and i thought that older
> version have some issues. Please can someone recommend what can be done to
> avoid the index corruption issue in SOLR 4.2.0.
>
> Thanks in advance..!
>
> Thanks & Regards,
> Puneet
>


Re: Collapse and Expand behaviour on result with 1 document.

2015-04-07 Thread Derek Poh

Hi Joel

Is the number of documents info available when using collapse and expand 
parameters?


I can't seem to find it in the returned XML.
I know the numFound in the main result set (<result maxScore="6.470696"
name="response" numFound="27" start="0">) refers to
the number of collapse groups.


Do I need to issue another query without the collapse and expand parameters
to get the total number of documents?
Or is there any field or parameter that indicates the number of documents,
which can be returned through the 'fl' parameter?


I am trying to display such info on the front-end,

571 "led" results from 240 suppliers.


On 4/1/2015 7:05 PM, Joel Bernstein wrote:

Exactly correct.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh  wrote:


Hi Joel

Correct me if my understanding is wrong.
Using supplier id as the field to collapse on.

- If the collapse group heads in the main result set have only 1 document in
each group, the expanded section will be empty since there are no documents
to expand for each collapse group.
- To render the page, I need to iterate the main result set. For each
document I have to check if there is an expanded group with the same
supplier id.
- The facet counts are based on the number of collapse groups in the main
result set ()

-Derek


On 3/31/2015 7:43 PM, Joel Bernstein wrote:


The way that collapse/expand is designed to be used is as follows:

The main result set will contain the collapsed group heads.

The expanded section will contain the expanded groups for the page of
results.

To render the page you iterate the main result set. For each document
check
to see if there is an expanded group.




Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein 
wrote:

  You should be able to use collapse/expand with one result.

Does the document in the main result set have group members that aren't
being expanded?



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh 
wrote:

  If I want to group the results (by a certain field) even if there is

only
1 document, I should use the group parameter instead?
The requirement is to group the result of product documents by their
supplier id.
"&group=true&group.field=P_SupplierId&group.limit=5"

Is it true that the performance of collapse is better than group
parameter on large data set, say 10-20 million documents?

-Derek


On 3/31/2015 10:03 AM, Joel Bernstein wrote:

  The expanded section will only include groups that have expanded

documents.

So, if the document that in the main result set has no documents to
expand,
then this is working as expected.



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh 
wrote:

   Hi


I have a query which return 1 document.
When I add the collapse and expand parameters to it,
"&expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}", the
expanded section is empty ().

Is this the behaviour of collapse and expand parameters on result
which
contain only 1 document?

-Derek









Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6

2015-04-07 Thread Zheng Lin Edwin Yeo
Thanks Swaraj.

It is working now, after I ran it without "start" and changed the ZooKeeper
port to 2888 instead.

Regards,
Edwin


On 7 April 2015 at 14:59, Swaraj Kumar  wrote:

> As per http://stackoverflow.com/questions/11765015/zookeeper-not-starting
> 
> Running without start will fix this.
>
> One more change you need to do is Solr default runs on 8983 and you have
> used 8983 in zookeeper so start solr on different port.
>
> Regards,
>
>
> Swaraj Kumar
> Senior Software Engineer I
> MakeMyTrip.com
> Mob No- 9811774497
>
> On Tue, Apr 7, 2015 at 9:42 AM, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Erick,
> >
> > I think I'll just setup the ZooKeeper server in standalone mode first,
> > before I get more confused as I'm quite new to both Solr and ZooKeeper
> too.
> > Better not to jump the gun.
> >
> > However, I face this error when I try to start it in standalone mode.
> >
> > 2015-04-07 11:59:51,789 [myid:] - ERROR [main:ZooKeeperServerMain@54] -
> > Invalid arguments, exiting abnormally
> > java.lang.NumberFormatException: For input string:
> > "C:\Users\edwin\zookeeper-3.4.6\bin\..\conf\zoo.cfg"
> > at java.lang.NumberFormatException.forInputString(Unknown Source)
> > at java.lang.Integer.parseInt(Unknown Source)
> > at java.lang.Integer.parseInt(Unknown Source)
> > at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:60)
> > at
> >
> >
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:83)
> > at
> >
> >
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> > at
> >
> >
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> > at
> >
> >
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> > 2015-04-07 11:59:51,796 [myid:] - INFO  [main:ZooKeeperServerMain@55] -
> > Usage: ZooKeeperServerMain configfile | port datadir [ticktime]
> [maxcnxns]
> >
> >
> > I have the following information in my zoo.cfg:
> >
> > tickTime=2000
> > initLimit=10
> > syncLimit=5
> > dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
> > clientPort=8983
> >
> >
> > I got the same error even if I set the clientPort=2888.
> >
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On 7 April 2015 at 11:26, Erick Erickson 
> wrote:
> >
> > > Believe me, I'm no Zookeeper expert, but it looks to me like you're
> > > mixing Solr ports and Zookeeper ports. AFAIK, the two ports in
> > > the zoo.cfg file are exclusively for the Zookeeper instances to talk
> > > to each other. Zookeeper isn't aware that the listening nodes are
> > > Solr noodes, so putting Solr ports in there is confusing Zookeeper
> > > I'd guess.
> > >
> > > Assuming you're starting your three ZK instances on ports 2888, 2889
> and
> > > 2890,
> > > I'd expect the proper ports are
> > > 2888:3888
> > > 2889:3889
> > > 2890:3890
> > >
> > > But as I said I'm not a Zookeeper expert so beware..
> > >
> > >
> > > Best,
> > > Erick
> > >
> > > On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
> > >  wrote:
> > > > Hi,
> > > >
> > > > I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a
> > > ZooKeeper
> > > > with simulation of 3 servers, but they are all located on the same
> > > machine
> > > > for testing purpose.
> > > >
> > > > In my zoo.cfg file, I have listed down the 3 servers to be as
> follows:
> > > > server.1=localhost:8983:3888
> > > > server.2=localhost:8984:3889
> > > > server.3=localhost:8985:3890
> > > >
> > > > Then I try to start Solr using the following command:
> > > > bin/solr start -e cloud -z localhost:8983-noprompt
> > > >
> > > > However, I'm unable to establish a connection from my Solr to the
> > > > ZooKeeper. Is this configuration possible, or is there anything
> which I
> > > > missed out?
> > > >
> > > > Thank you in advance for your help.
> > > >
> > > > Regards,
> > > > Edwin
> > >
> >
>


RE: How do I use CachedSqlEntityProcessor?

2015-04-07 Thread chuotlac
The conversation helped me understand the cached processor a lot. I'm working on
a DIH cache using MapDB as the backing engine instead of the default
CachedSqlEntityProcessor.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4198037.html
Sent from the Solr - User mailing list archive at Nabble.com.


What is the best way of Indexing different formats of documents?

2015-04-07 Thread sangeetha.subraman...@gtnexus.com
Hi,

I am a newbie to SOLR and basically from database background. We have a 
requirement of indexing files of different formats (x12,edifact, csv,xml).
The files which are inputted can be of any format and we need to do a content 
based search on it.

From the web I understand we can use the TIKA processor to extract the content and
store it in SOLR. What I want to know is, is there any better approach for
indexing files in SOLR? Can we index the document through streaming directly
from the application? If so, what is the disadvantage of using it (against DIH,
which fetches from the database)? Could someone share some insight on this?
Are there any web links which I can refer to, to get some idea on it? Please do
help.

Thanks
Sangeetha



Re: Collapse and Expand behaviour on result with 1 document.

2015-04-07 Thread Joel Bernstein
I believe currently issuing another query will be necessary to get the
count of the expanded result set.

I think it does make sense to include this information as part of the
ExpandComponent output. So feel free to create a jira ticket for this and
we should be able to get this into a future release.
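
As an interim illustration (the collection query and field below are hypothetical,
reusing the example from earlier in this thread), the two numbers can currently be
obtained with two requests:

/select?q=led&fq={!collapse%20field=P_SupplierId}&rows=0
  -> numFound = number of suppliers (collapsed groups)

/select?q=led&rows=0
  -> numFound = total number of matching documents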

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 7, 2015 at 3:27 AM, Derek Poh  wrote:

> Hi Joel
>
> Is the number of documents info available when using collapse and expand
> parameters?
>
> I can't seem to find it in the return xml.
> I know the numFound in the the main result set ( maxScore="6.470696" name="response" numFound="27" start="0">) refer to the
> number of collapse groups.
>
> I need to issue another query without the collapse and expand parameters
> to get the total number of documents?
> Or is there any fieldor parameter that indicate the number of documents
> that can be return through 'fl' parameter?
>
> I am trying to display such info on the front-end,
>
> 571 "led" results from 240 suppliers.
>
>
>
> On 4/1/2015 7:05 PM, Joel Bernstein wrote:
>
>> Exactly correct.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh  wrote:
>>
>>  Hi Joel
>>>
>>> Correct me if my understanding is wrong.
>>> Using supplier id as the field to collapse on.
>>>
>>> - If thecollapse group heads inthe main result set has only 1document in
>>> each group, the expanded section will be empty since there are no
>>> documents
>>> to expandfor each collapse group.
>>> - To render the page, I need to iterate the main result set. For each
>>> document I have to check if there is an expanded group with the same
>>> supplier id.
>>> - The facets counts is based on the number of collapse groupsin the main
>>> result set (>> start="0">)
>>>
>>> -Derek
>>>
>>>
>>> On 3/31/2015 7:43 PM, Joel Bernstein wrote:
>>>
>>>  The way that collapse/expand is designed to be used is as follows:

 The main result set will contain the collapsed group heads.

 The expanded section will contain the expanded groups for the page of
 results.

 To render the page you iterate the main result set. For each document
 check
 to see if there is an expanded group.




 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein 
 wrote:

   You should be able to use collapse/expand with one result.

> Does the document in the main result set have group members that aren't
> being expanded?
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh 
> wrote:
>
>   If I want to group the results (by a certain field) even if there is
>
>> only
>> 1 document, I should use the group parameter instead?
>> The requirement is to group the result of product documents by their
>> supplier id.
>> "&group=true&group.field=P_SupplierId&group.limit=5"
>>
>> Is it true that the performance of collapse is better than group
>> parameter on large data set, say 10-20 million documents?
>>
>> -Derek
>>
>>
>> On 3/31/2015 10:03 AM, Joel Bernstein wrote:
>>
>>   The expanded section will only include groups that have expanded
>>
>>> documents.
>>>
>>> So, if the document that in the main result set has no documents to
>>> expand,
>>> then this is working as expected.
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh 
>>> wrote:
>>>
>>>Hi
>>>
>>>  I have a query which return 1 document.
 When I add the collapse and expand parameters to it,
 "&expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}",
 the
 expanded section is empty ().

 Is this the behaviour of collapse and expand parameters on result
 which
 contain only 1 document?

 -Derek






>


Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Swaraj Kumar
You can always choose either DIH or /update/extract to index docs in Solr.
Now there are multiple benefits of DIH, which I am listing below:

1. Clean and update using a single command.
2. DIH also optimizes indexing using optimize=true.
3. You can do delta-import based on the last index time, whereas with
/update/extract you have to handle delta imports manually (see the sketch after this list).
4. You can use multiple entity processors and transformers with DIH,
which is very useful for indexing exactly the data you want.
5. The query parameter "rows" limits the number of records.
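
To illustrate point 3, here is a minimal delta-import sketch for data-config.xml.
The table, column and connection details are made up; the parts that matter are the
query, deltaQuery and deltaImportQuery attributes and the
${dataimporter.last_index_time} and ${dih.delta.id} variables:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="..." url="..." user="..." password="..."/>
  <document>
    <entity name="item" pk="ID"
            query="SELECT ID, TITLE FROM ITEM"
            deltaQuery="SELECT ID FROM ITEM WHERE LAST_MODIFIED &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT ID, TITLE FROM ITEM WHERE ID = '${dih.delta.id}'">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>
    </entity>
  </document>
</dataConfig>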

Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com <
sangeetha.subraman...@gtnexus.com> wrote:

> Hi,
>
> I am a newbie to SOLR and basically from database background. We have a
> requirement of indexing files of different formats (x12,edifact, csv,xml).
> The files which are inputted can be of any format and we need to do a
> content based search on it.
>
> From the web I understand we can use TIKA processor to extract the content
> and store it in SOLR. What I want to know is, is there any better approach
> for indexing files in SOLR ? Can we index the document through streaming
> directly from the Application ? If so what is the disadvantage of using it
> (against DIH which fetches from the database)? Could someone share me some
> insight on this ? ls there any web links which I can refer to get some idea
> on it ? Please do help.
>
> Thanks
> Sangeetha
>
>


Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Ali Nazemian
I implemented some small code for the purpose of extracting keywords out of a
Lucene index. I implemented it as a search component. My problem is that
when I try to update the Lucene IndexWriter, the Solr index which sits on
top of it is not affected. As you can see, I did do the commit part.

BooleanQuery query = new BooleanQuery();
for (String fieldName : keywordSourceFields) {
  TermQuery termQuery = new TermQuery(new Term(fieldName,"N/A"));
  query.add(termQuery, Occur.MUST_NOT);
}
TermQuery termQuery=new TermQuery(new Term(keywordField, "N/A"));
query.add(termQuery, Occur.MUST);
try {
  //Query q= new QueryParser(keywordField, new
StandardAnalyzer()).parse(query.toString());
  TopDocs results = searcher.search(query,
  maxNumDocs);
  ScoreDoc[] hits = results.scoreDocs;
  IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
  for (int i = 0; i < hits.length; i++) {
Document document = searcher.doc(hits[i].doc);
List<String> keywords = keyword.getKeywords(hits[i].doc);
if(keywords.size()>0) document.removeFields(keywordField);
for (String word : keywords) {
  document.add(new StringField(keywordField, word,
Field.Store.YES));
}
String uniqueKey =
searcher.getSchema().getUniqueKeyField().getName();
writer.updateDocument(new Term(uniqueKey,
document.get(uniqueKey)),
document);
  }
  writer.commit();
  writer.forceMerge(1);
  writer.close();
} catch (IOException | SyntaxError e) {
  throw new RuntimeException();
}

Please help me through solving this problem.

-- 
A.Nazemian


Re: Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Upayavira
What are you trying to do? A search component is not intended for
updating the index, so it really doesn’t surprise me that you aren’t
seeing updates.

I’d suggest you describe the problem you are trying to solve before
proposing solutions.

Upayavira


On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:
> I implement a small code for the purpose of extracting some keywords out
> of
> Lucene index. I did implement that using search component. My problem is
> when I tried to update Lucene IndexWriter, Solr index which is placed on
> top of that, does not affect. As you can see I did the commit part.
> 
> BooleanQuery query = new BooleanQuery();
> for (String fieldName : keywordSourceFields) {
>   TermQuery termQuery = new TermQuery(new Term(fieldName,"N/A"));
>   query.add(termQuery, Occur.MUST_NOT);
> }
> TermQuery termQuery=new TermQuery(new Term(keywordField, "N/A"));
> query.add(termQuery, Occur.MUST);
> try {
>   //Query q= new QueryParser(keywordField, new
> StandardAnalyzer()).parse(query.toString());
>   TopDocs results = searcher.search(query,
>   maxNumDocs);
>   ScoreDoc[] hits = results.scoreDocs;
>   IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
>   for (int i = 0; i < hits.length; i++) {
> Document document = searcher.doc(hits[i].doc);
> List keywords = keyword.getKeywords(hits[i].doc);
> if(keywords.size()>0) document.removeFields(keywordField);
> for (String word : keywords) {
>   document.add(new StringField(keywordField, word,
> Field.Store.YES));
> }
> String uniqueKey =
> searcher.getSchema().getUniqueKeyField().getName();
> writer.updateDocument(new Term(uniqueKey,
> document.get(uniqueKey)),
> document);
>   }
>   writer.commit();
>   writer.forceMerge(1);
>   writer.close();
> } catch (IOException | SyntaxError e) {
>   throw new RuntimeException();
> }
> 
> Please help me through solving this problem.
> 
> -- 
> A.Nazemian


Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Upayavira


On Tue, Apr 7, 2015, at 11:48 AM, sangeetha.subraman...@gtnexus.com
wrote:
> Hi,
> 
> I am a newbie to SOLR and basically from database background. We have a
> requirement of indexing files of different formats (x12,edifact,
> csv,xml).
> The files which are inputted can be of any format and we need to do a
> content based search on it.
> 
> From the web I understand we can use TIKA processor to extract the
> content and store it in SOLR. What I want to know is, is there any better
> approach for indexing files in SOLR ? Can we index the document through
> streaming directly from the Application ? If so what is the disadvantage
> of using it (against DIH which fetches from the database)? Could someone
> share me some insight on this ? ls there any web links which I can refer
> to get some idea on it ? Please do help.

You can have Solr do the TIKA work for you, by posting to
update/extract. See here:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

You can only post one document at a time, and you will have to provide
extra metadata fields in the URL you post to (e.g. the document ID).

If the extracting update handler can handle what you need, then you are
good. Otherwise, you will want to write your own code to call Tika, then
push the extracted content as a plain document.

Solr is just an HTTP server, so your application can post binary files
for Solr to ingest with Tika, or otherwise.
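
For example (the core name, id value and file name below are illustrative), a single
PDF could be posted with curl like this, passing the document ID as a literal field:

curl "http://localhost:8983/solr/collection1/update/extract?literal.id=doc1&commit=true" -F "myfile=@invoice.pdf"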

Upayavira


Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Yavar Husain
Well, I have indexed heterogeneous sources including a variety of NoSQL stores,
RDBMSs and rich documents (PDF, Word etc.) using SolrJ. The only prerequisite
for using SolrJ is that you should have an API to fetch data from your data
source (say JDBC for an RDBMS, Tika for extracting text content from rich
documents, etc.); beyond that, SolrJ is so damn great and simple. It's as simple as
downloading the jar and writing a few lines of code to send data to your Solr server
after pre-processing your data. More details here:

http://lucidworks.com/blog/indexing-with-solrj/

https://wiki.apache.org/solr/Solrj

http://www.solrtutorial.com/solrj-tutorial.html
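
A minimal SolrJ sketch (assuming SolrJ 4.x; the URL, field names and values are
only examples) looks roughly like this:

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import java.io.IOException;

public class SimpleIndexer {
    public static void main(String[] args) throws IOException, SolrServerException {
        // point at the target core/collection
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("content", "text extracted from the source document, e.g. via Tika");
        server.add(doc);    // send the document
        server.commit();    // make it searchable
        server.shutdown();  // release the underlying HTTP client
    }
}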

Cheers,
Yavar



On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com <
sangeetha.subraman...@gtnexus.com> wrote:

> Hi,
>
> I am a newbie to SOLR and basically from database background. We have a
> requirement of indexing files of different formats (x12,edifact, csv,xml).
> The files which are inputted can be of any format and we need to do a
> content based search on it.
>
> From the web I understand we can use TIKA processor to extract the content
> and store it in SOLR. What I want to know is, is there any better approach
> for indexing files in SOLR ? Can we index the document through streaming
> directly from the Application ? If so what is the disadvantage of using it
> (against DIH which fetches from the database)? Could someone share me some
> insight on this ? ls there any web links which I can refer to get some idea
> on it ? Please do help.
>
> Thanks
> Sangeetha
>
>


Re: Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Ali Nazemian
Dear Upayavira,
Hi,
That is just the part of my code which caused the problem. I know a
searchComponent is not meant for changing the index, but for the purpose of
extracting document keywords I was forced to hack a searchComponent to
extract keywords and put them into the index.
For more information about why I chose a searchComponent in the first place,
please follow this link:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201503.mbox/browser

Best regards.


On Tue, Apr 7, 2015 at 5:30 PM, Upayavira  wrote:

> What are you trying to do? A search component is not intended for
> updating the index, so it really doesn’t surprise me that you aren’t
> seeing updates.
>
> I’d suggest you describe the problem you are trying to solve before
> proposing solutions.
>
> Upayavira
>
>
> On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:
> > I implement a small code for the purpose of extracting some keywords out
> > of
> > Lucene index. I did implement that using search component. My problem is
> > when I tried to update Lucene IndexWriter, Solr index which is placed on
> > top of that, does not affect. As you can see I did the commit part.
> >
> > BooleanQuery query = new BooleanQuery();
> > for (String fieldName : keywordSourceFields) {
> >   TermQuery termQuery = new TermQuery(new Term(fieldName,"N/A"));
> >   query.add(termQuery, Occur.MUST_NOT);
> > }
> > TermQuery termQuery=new TermQuery(new Term(keywordField, "N/A"));
> > query.add(termQuery, Occur.MUST);
> > try {
> >   //Query q= new QueryParser(keywordField, new
> > StandardAnalyzer()).parse(query.toString());
> >   TopDocs results = searcher.search(query,
> >   maxNumDocs);
> >   ScoreDoc[] hits = results.scoreDocs;
> >   IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
> >   for (int i = 0; i < hits.length; i++) {
> > Document document = searcher.doc(hits[i].doc);
> > List keywords = keyword.getKeywords(hits[i].doc);
> > if(keywords.size()>0) document.removeFields(keywordField);
> > for (String word : keywords) {
> >   document.add(new StringField(keywordField, word,
> > Field.Store.YES));
> > }
> > String uniqueKey =
> > searcher.getSchema().getUniqueKeyField().getName();
> > writer.updateDocument(new Term(uniqueKey,
> > document.get(uniqueKey)),
> > document);
> >   }
> >   writer.commit();
> >   writer.forceMerge(1);
> >   writer.close();
> > } catch (IOException | SyntaxError e) {
> >   throw new RuntimeException();
> > }
> >
> > Please help me through solving this problem.
> >
> > --
> > A.Nazemian
>



-- 
A.Nazemian


DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File

2015-04-07 Thread Mike L.

Solr User Group -

   I have a case where I need to be able to search against compound words, even
when the user delimits with a space (e.g. baseball => base ball). I think
I've solved this by creating a compound-words dictionary file containing the
split words that I would want DictionaryCompoundWordTokenFilterFactory to split:

 base
 ball

I also applied the following rule in the synonym file: baseball => base ball
(to allow baseball to also get a hit).

Two questions - If I could figure out in advance all the compound words I would
want to split, would it be better (more reliable results) for me to maintain
this compound-words file, or would it be better to throw one of those open
office dictionaries at the filter?
Also - Any better suggestions for dealing with this problem vs. the one I
described using both the dictionary filter and the synonym rule?
Thanks in advance!
Mike
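
For reference, a minimal analyzer sketch using that filter might look like the
following (the fieldType name, dictionary file name and the min/max size values are
assumptions to adjust for your data); compound-words.txt would contain the subwords,
one per line (base, ball):

<fieldType name="text_compound" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compound-words.txt"
            minWordSize="5" minSubwordSize="3" maxSubwordSize="15"
            onlyLongestMatch="true"/>
  </analyzer>
</fieldType>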



Re: DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File

2015-04-07 Thread Mike L.

Typo:   *even when the user delimits with a space. (e.g. base ball should find 
baseball). 

Thanks,
  From: Mike L. 
 To: "solr-user@lucene.apache.org"  
 Sent: Tuesday, April 7, 2015 9:05 AM
 Subject: DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words 
File
   

Solr User Group -

   I have a case where I need to be able to search against compound words, even 
when the user delimits with a space. (e.g. baseball => base ball).  I think 
I've solved this by creating a compound-words dictionary file containing the 
split words that I would want DictionaryCompoundWordTokenFilterFactory to split.
 base \n  
ball
I also applied in the synonym file the following rule: baseball => base ball  ( 
to allow baseball to also get a hit)
      
  
Two questions - If I could in advance figure out all the compound words I would 
want to split, would it be better (more reliable results) for me to maintain 
this compount-words file or would it be better to throw one of those open 
office dictionaries at it the filter?
Also - Any better suggestions to dealing with this problem vs the one I 
described using both the dictionary filter and the synonym rule?
Thanks in advance!
Mike



  

Re: Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Ali Nazemian
I did some investigation and found out that retrieving documents works fine
while Solr has not been restarted, but searching for documents does not work.
After I restarted Solr, it seems that the core was corrupted and failed to
start! Here is the corresponding log:

org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:896)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:662)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:278)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:272)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1604)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1716)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
... 9 more
Caused by: org.apache.lucene.index.IndexNotFoundException: no
segments* file found in
NRTCachingDirectory(MMapDirectory@C:\Users\Ali\workspace\lucene_solr_5_0_0\solr\server\solr\document\data\index
lockFactory=org.apache.lucene.store.SimpleFSLockFactory@3bf76891;
maxCacheMB=48.0 maxMergeSizeMB=4.0): files: [_2_Lucene50_0.doc,
write.lock, _2_Lucene50_0.pos, _2.nvd, _2.fdt, _2_Lucene50_0.tim]
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:821)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:78)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:65)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:272)
at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:115)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1573)
... 11 more

4/7/2015, 6:53:26 PM
ERROR
SolrIndexWriter
SolrIndexWriter was not closed prior to finalize(), indicates a bug
-- POSSIBLE RESOURCE LEAK!!!
4/7/2015, 6:53:26 PM
ERROR
SolrIndexWriter
Error closing IndexWriter
java.lang.NullPointerException
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2959)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2927)
at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:965)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1010)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:130)
at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:183)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:101)
at java.lang.ref.Finalizer.access$100(Finalizer.java:32)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:190)

Therefore my guess would be a problem with indexing the keywordField and also
a problem related to closing the IndexWriter.

On Tue, Apr 7, 2015 at 6:13 PM, Ali Nazemian  wrote:

> Dear Upayavira,
> Hi,
> It is just the part of my code in which caused the problem. I know
> searchComponent is not for changing the index, but for the purpose of
> extracting document keywords I was forced to hack searchComponent for
> extracting keywords and putting them into index.
> For more information about why I chose searchComponent at the first place
> please follow this link:
>
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201503.mbox/browser
>
> Best regards.
>
>
> On Tue, Apr 7, 2015 at 5:30 PM, Upayavira  wrote:
>
>> What are you trying to do? A search component is not intended for
>> updating the index, so it really doesn’t surprise me that you aren’t
>> seeing updates.
>>
>> I’d suggest you describe the problem you are trying to solve before
>> proposing solutions.
>>
>> Upayavira
>>
>>
>> On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:
>> > I implement a small code for the purpose of extracting some keywords out
>> > of
>> > Lucene index. I did implement that using search component. My problem is
>> > when I tried to update Lucene IndexWriter, Solr index which is placed on
>> > top of that, does not affect. As you can see I did the commit part.
>> >
>> > BooleanQuery query = new BooleanQuery();
>> > for (String fieldName : keywordSourceFields) {
>> >   TermQuery termQuery = new TermQuery(new
>> Term(fieldName,"N/A"));
>> >   query.add(termQuery, Occur.MUST_NOT);
>> > }
>> > TermQuery termQuery=new TermQuery(new Term(keywordField,
>> "N/A"));
>> > query.add(termQuery, Occur.MUST);
>> > try {
>> >   //Query q= 

Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Dan Davis
Sangeetha,

You can also run Tika directly from data import handler, and Data Import
Handler can be made to run several threads if you can partition the input
documents by directory or database id.   I've done 4 "threads" by having a
base configuration that does an Oracle query like this:

  SELECT * FROM (SELECT id, url, ..., MOD(rownum, 4) AS threadid FROM ...
WHERE ...) WHERE threadid = %d

A bash/sed script writes several data import handler XML files.
I can then index several threads at a time.
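
As a rough sketch of that bash/sed step (the file names and the __THREADID__
placeholder here are hypothetical):

for i in 0 1 2 3; do
  sed "s/__THREADID__/$i/g" dih-base.xml > dih-thread$i.xml
done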

Each of these threads can then use all the transformers, e.g.
templateTransformer, etc.
XML can be transformed via XSLT.

The Data Import Handler has other entities that go out to the web and then
index the document via Tika.

If you are indexing generic HTML, you may want to figure out an approach to
SOLR-3808 and SOLR-2250 - this can be resolved by recompiling Solr and Tika
locally, because Boilerpipe has a bug that has been fixed, but not pushed
to Maven Central.   Without that, the ASF cannot include the fix, but
distributions such as LucidWorks Solr Enterprise can.

I can drop some configs into github.com if I clean them up to obfuscate
host names, passwords, and such.


On Tue, Apr 7, 2015 at 9:14 AM, Yavar Husain  wrote:

> Well have indexed heterogeneous sources including a variety of NoSQL's,
> RDBMs and Rich Documents (PDF Word etc.) using SolrJ. The only prerequisite
> of using SolrJ is that you should have an API to fetch data from your data
> source (Say JDBC for RDBMS, Tika for extracting text content from rich
> documents etc.) than SolrJ is so damn great and simple. Its as simple as
> downloading the jar and few lines of code to send data to your solr server
> after pre-processing your data. More details here:
>
> http://lucidworks.com/blog/indexing-with-solrj/
>
> https://wiki.apache.org/solr/Solrj
>
> http://www.solrtutorial.com/solrj-tutorial.html
>
> Cheers,
> Yavar
>
>
>
> On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com <
> sangeetha.subraman...@gtnexus.com> wrote:
>
> > Hi,
> >
> > I am a newbie to SOLR and basically from database background. We have a
> > requirement of indexing files of different formats (x12,edifact,
> csv,xml).
> > The files which are inputted can be of any format and we need to do a
> > content based search on it.
> >
> > From the web I understand we can use TIKA processor to extract the
> content
> > and store it in SOLR. What I want to know is, is there any better
> approach
> > for indexing files in SOLR ? Can we index the document through streaming
> > directly from the Application ? If so what is the disadvantage of using
> it
> > (against DIH which fetches from the database)? Could someone share me
> some
> > insight on this ? ls there any web links which I can refer to get some
> idea
> > on it ? Please do help.
> >
> > Thanks
> > Sangeetha
> >
> >
>


Re: Problem with new solr.xml format and core swaps

2015-04-07 Thread Erick Erickson
Shawn:

I'm pretty clueless why you would be seeing this, and slammed with
other stuff so I can't dig into this right now.

What do the "core.properties" files look like when you see this? They
should be re-written when you swap cores. Hmmm, I wonder if there's
some condition where the files are already open and the persistence
fails? If so we should be logging that error, I have no proof either
way whether we are or not though.

Guessing that your log files in the problem case weren't all that
helpful, but let's have a look at them if this occurs again?

Sorry I can't be more help
Erick

On Mon, Apr 6, 2015 at 8:38 PM, Shawn Heisey  wrote:
> On 4/6/2015 6:40 PM, Erick Erickson wrote:
>> What version are you migrating _from_? 4.9.0? There were some
>> persistence issues at one point, but AFAIK they were fixed by 4.9, I
>> can check if you're on an earlier version...
>
> Effectively there is no previous version.  Whenever I upgrade, I delete
> all the data directories and completely reindex.  When I converted from
> the old solr.xml to core discovery, the server was already on 4.9.1.
>
> Thanks,
> Shawn
>


Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Erick Erickson
The disadvantages of DIH are
1> it's a black box, debugging it isn't easy
2> it puts all the work on the Solr node. Parsing documents in various
forms can be pretty heavy-weight and steal cycles from indexing and
searching.
2a> the extracting request handler also puts all the load on Solr FWIW.


Personally I prefer an external program (and I was gratified to see
Yavar's reference to the indexing with SolrJ article...). But then I'm
a Java programmer by training, so that seems easy...

Best,
Erick

On Tue, Apr 7, 2015 at 7:41 AM, Dan Davis  wrote:
> Sangeetha,
>
> You can also run Tika directly from data import handler, and Data Import
> Handler can be made to run several threads if you can partition the input
> documents by directory or database id.   I've done 4 "threads" by having a
> base configuration that does an Oracle query like this:
>
>   SELECT * (SELECT id, url, ..., Modulo(rowNum, 4) as threadid FROM ...
> WHERE ...) WHERE threadid = %d
>
> A bash/sed script writes several data import handler XML files.
> I can then index several threads at a time.
>
> Each of these threads can then use all the transformers, e.g.
> templateTransformer, etc.
> XML can be transformed via XSLT.
>
> The Data Import Handler has other entities that go out to the web and then
> index the document via Tika.
>
> If you are indexing generic HTML, you may want to figure out an approach to
> SOLR-3808 and SOLR-2250 - this can be resolved by recompiling Solr and Tika
> locally, because Boilerpipe has a bug that has been fixed, but not pushed
> to Maven Central.   Without that, the ASF cannot include the fix, but
> distributions such as LucidWorks Solr Enterprise can.
>
> I can drop some configs into github.com if I clean them up to obfuscate
> host names, passwords, and such.
>
>
> On Tue, Apr 7, 2015 at 9:14 AM, Yavar Husain  wrote:
>
>> Well have indexed heterogeneous sources including a variety of NoSQL's,
>> RDBMs and Rich Documents (PDF Word etc.) using SolrJ. The only prerequisite
>> of using SolrJ is that you should have an API to fetch data from your data
>> source (Say JDBC for RDBMS, Tika for extracting text content from rich
>> documents etc.) than SolrJ is so damn great and simple. Its as simple as
>> downloading the jar and few lines of code to send data to your solr server
>> after pre-processing your data. More details here:
>>
>> http://lucidworks.com/blog/indexing-with-solrj/
>>
>> https://wiki.apache.org/solr/Solrj
>>
>> http://www.solrtutorial.com/solrj-tutorial.html
>>
>> Cheers,
>> Yavar
>>
>>
>>
>> On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com <
>> sangeetha.subraman...@gtnexus.com> wrote:
>>
>> > Hi,
>> >
>> > I am a newbie to SOLR and basically from database background. We have a
>> > requirement of indexing files of different formats (x12,edifact,
>> csv,xml).
>> > The files which are inputted can be of any format and we need to do a
>> > content based search on it.
>> >
>> > From the web I understand we can use TIKA processor to extract the
>> content
>> > and store it in SOLR. What I want to know is, is there any better
>> approach
>> > for indexing files in SOLR ? Can we index the document through streaming
>> > directly from the Application ? If so what is the disadvantage of using
>> it
>> > (against DIH which fetches from the database)? Could someone share me
>> some
>> > insight on this ? ls there any web links which I can refer to get some
>> idea
>> > on it ? Please do help.
>> >
>> > Thanks
>> > Sangeetha
>> >
>> >
>>


Merge Two Fields in SOLR

2015-04-07 Thread EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS)
Hi Group,

I am not sure if we have any easy way to merge two fields' data into one field;
copyField doesn’t work for me as it stores the values as multiValued.

Can someone suggest any workaround to achieve this use case?

FirstName:ABC
SurName:XYZ

I need another field with Name:ABCXYZ, and I have to do this at the SOLR end, as
the source data is read-only and I have no control to combine the fields there.


Thanks

Ravi


Re: Merge Two Fields in SOLR

2015-04-07 Thread Erick Erickson
I don't understand why copyField doesn't work. Admittedly the
firstName and SurName would be separate tokens, but isn't that what
you want? The fact that it's multiValued isn't really a problem,
multiValued fields are really functionally identical to single valued
fields if you set positionIncrementGap to... hmmm.. 1 or 0 I'm not
quite sure which.

Of course if you're sorting by the field, that's a different story.

Here's a discussion with several options, but I really wonder what
your specific objection to copyField is, it's the simplest and on the
surface it seems like it would work.

http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-td4086786.html
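
For illustration (the destination field name and type here are assumptions, not from
your schema), the copyField route is just:

<field name="fullName" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="FirstName" dest="fullName"/>
<copyField source="SurName" dest="fullName"/>

If you really need a single concatenated value (ABCXYZ) in one stored field, an
update processor chain that clones both source fields into the target and then
concatenates the values (CloneFieldUpdateProcessorFactory followed by
ConcatFieldUpdateProcessorFactory) is another option to look at, though I haven't
verified that exact configuration here.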

Best,
Erick

On Tue, Apr 7, 2015 at 10:08 AM, EXTERNAL Taminidi Ravi (ETI,
AA-AS/PAS-PTS)  wrote:
> Hi Group,
>
> I am not sure if we have any easy way to merge two  fields data in One Field, 
> the Copy field doesn’t works as it stores as Multivalued.
>
> Can someone suggest any workaround to achieve this Use Case?
>
> FirstName:ABC
> SurName:XYZ
>
> I need an Another Field with Name:ABCXYZ where I have to do at SOLR END.. as 
> the Source Data is read only and no control to comibine.
>
>
> Thanks
>
> Ravi


Re: Trouble GetSpans lucene 4

2015-04-07 Thread Compte Poubelle
Up.
Anyone?

Best regards.

> On 6 avr. 2015, at 21:32, Test Test  wrote:
> 
> Hi,
> I'm working on TamingText's book. I am trying to upgrade the code from solr 3.6 to
> solr 4.10.2. At the moment, I have a problem with the method
> "getSpans": "spans.next()" always returns "false". Can anyone help?
>
> SpanNearQuery sQuery = (SpanNearQuery) origQuery;
> SolrIndexSearcher searcher = rb.req.getSearcher();
> IndexReader reader = searcher.getIndexReader();
> // ...
> AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
> Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
> // ...
> Spans spans = sQuery.getSpans(wrapper.getContext(),
>     new Bits.MatchAllBits(reader.numDocs()), termContexts);
> while (spans.next() == true) {
>     // ...
> }
>
> Thanks. Regards.
> 


RE: Trouble GetSpans lucene 4

2015-04-07 Thread Allison, Timothy B.
What class is origQuery?

You will have to do more rewriting/calculation if you're trying to convert a 
PhraseQuery to a SpanNearQuery.

If you dig around in 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene 
highlighter package, you might get some inspiration.

I have a hack for converting regular queries to SpanQueries here (this is 
largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java
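
For the simplest case, a rough sketch of converting a PhraseQuery (the usual result of
parsing a quoted phrase) into a SpanNearQuery looks like this in Lucene 4.x -- note it
ignores multi-term positions and boosts, which is what the classes above handle properly:

// rough sketch; assumes Lucene 4.x and the org.apache.lucene.search.spans imports
PhraseQuery pq = (PhraseQuery) origQuery;
Term[] terms = pq.getTerms();
SpanQuery[] clauses = new SpanQuery[terms.length];
for (int i = 0; i < terms.length; i++) {
  clauses[i] = new SpanTermQuery(terms[i]);
}
SpanNearQuery spanNear = new SpanNearQuery(clauses, pq.getSlop(), true); // inOrder = true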
 

-Original Message-
From: Compte Poubelle [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up.
Anyone?

Best regards.

> On 6 avr. 2015, at 21:32, Test Test  wrote:
> 
> Hi, 
> I'm working on TamingText's book.I try to upgrade the code from solr 3.6 to 
> solr 4.10.2.At the moment, i have a problem about the method 
> "getSpans"."spans.next()" returns always "false".Anyone can helps?
> SpanNearQuery sQuery = (SpanNearQuery) origQuery;SolrIndexSearcher searcher = 
> rb.req.getSearcher();IndexReader reader = 
> searcher.getIndexReader();//AtomicReader wrapper = 
> SlowCompositeReaderWrapper.wrap(reader);Map termContexts = 
> new HashMap();//Spans spans = 
> sQuery.getSpans(wrapper.getContext(), new 
> Bits.MatchAllBits(reader.numDocs()), termContexts);while (spans.next() == 
> true) {//}
> 
> Thanks.Regards.
> 


Re: Merge Two Fields in SOLR

2015-04-07 Thread Damien Dykman
Ravi, what about using field aliasing at search time? Would that do the
trick for your use case?

http://localhost:8983/solr/mycollection/select?defType=edismax&q=name:"john
doe"&f.name.qf=firstname surname

For more details:
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

Damien

On 04/07/2015 10:21 AM, Erick Erickson wrote:
> I don't understand why copyField doesn't work. Admittedly the
> firstName and SurName would be separate tokens, but isn't that what
> you want? The fact that it's multiValued isn't really a problem,
> multiValued fields are really functionally identical to single valued
> fields if you set positionIncrementGap to... hmmm.. 1 or 0 I'm not
> quite sure which.
>
> Of course if your'e sorting by the field, that's a different story.
>
> Here's a discussion with several options, but I really wonder what
> your specific objection to copyField is, it's the simplest and on the
> surface it seems like it would work.
>
> http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-td4086786.html
>
> Best,
> Erick
>
> On Tue, Apr 7, 2015 at 10:08 AM, EXTERNAL Taminidi Ravi (ETI,
> AA-AS/PAS-PTS)  wrote:
>> Hi Group,
>>
>> I am not sure if we have any easy way to merge two  fields data in One 
>> Field, the Copy field doesn’t works as it stores as Multivalued.
>>
>> Can someone suggest any workaround to achieve this Use Case?
>>
>> FirstName:ABC
>> SurName:XYZ
>>
>> I need an Another Field with Name:ABCXYZ where I have to do at SOLR END.. as 
>> the Source Data is read only and no control to comibine.
>>
>>
>> Thanks
>>
>> Ravi



Re: Problem with new solr.xml format and core swaps

2015-04-07 Thread Shawn Heisey
On 4/7/2015 10:54 AM, Erick Erickson wrote:
> I'm pretty clueless why you would be seeing this, and slammed with
> other stuff so I can't dig into this right now.
>
> What do the "core.properties" files look like when you see this? They
> should be re-written when you swap cores. Hmmm, I wonder if there's
> some condition where the files are already open and the persistence
> fails? If so we should be logging that error, I have no proof either
> way whether we are or not though.
>
> Guessing that your log files in the problem case weren't all that
> helpful, but let's have a look at them if this occurs again?

I hadn't had a chance to review the logs, but when I did just now, I
found this:

ERROR - 2015-04-07 11:56:15.568;
org.apache.solr.core.CorePropertiesLocator; Couldn't persist core
properties to /index/solr4/cores/sparkinc_0/core.properties:
java.io.FileNotFoundException:
/index/solr4/cores/sparkinc_0/core.properties (Permission denied)

That's fairly clear.  I guess my permissions were wrong.  My best guess
as to why -- things owned by root from when I created the
core.properties files.  Solr does not run as root.  I didn't think to
actually look at the permissions before I ran a script that I maintain
which fixes all the ownership on my various directories involved in my
full search installation.

I don't think this explains the not-deleted segment files problem. 
Those segment files were written by solr running as the regular user, so
there couldn't have been a permission problem.

Thanks,
Shawn



Re: Trouble GetSpans lucene 4

2015-04-07 Thread Test Test
Re,
origQuery is a Query object; I got it from a ResponseBuilder object via
the method getQuery:

ResponseBuilder rb; // it's a method parameter
Query origQuery = rb.getQuery();

Thanks for the link, I'll keep you informed.
Regards,
Andy


 On Tuesday, April 7, 2015 at 8:26 PM, "Allison, Timothy B." wrote:
   

 What class is origQuery?

You will have to do more rewriting/calculation if you're trying to convert a 
PhraseQuery to a SpanNearQuery.

If you dig around in 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene 
highlighter package, you might get some inspiration.

I have a hack for converting regular queries to SpanQueries here (this is 
largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java
 

-Original Message-
From: Compte Poubelle [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up.
Anyone?

Best regards.

> On 6 avr. 2015, at 21:32, Test Test  wrote:
> 
> Hi, 
> I'm working on TamingText's book.I try to upgrade the code from solr 3.6 to 
> solr 4.10.2.At the moment, i have a problem about the method 
> "getSpans"."spans.next()" returns always "false".Anyone can helps?
> SpanNearQuery sQuery = (SpanNearQuery) origQuery;SolrIndexSearcher searcher = 
> rb.req.getSearcher();IndexReader reader = 
> searcher.getIndexReader();//AtomicReader wrapper = 
> SlowCompositeReaderWrapper.wrap(reader);Map termContexts = 
> new HashMap();//Spans spans = 
> sQuery.getSpans(wrapper.getContext(), new 
> Bits.MatchAllBits(reader.numDocs()), termContexts);while (spans.next() == 
> true) {//}
> 
> Thanks.Regards.
> 

  

Re: Config join parse in solrconfig.xml

2015-04-07 Thread Frank li
Cool. It actually works after I removed those extra columns. Thanks for
your help.

On Mon, Apr 6, 2015 at 8:19 PM, Erick Erickson 
wrote:

> df does not allow multiple fields, it stands for "default field", not
> "default fields". To get what you're looking for, you need to use
> edismax or explicitly create the multiple clauses.
>
> I'm not quite sure what the join parser is doing with the df
> parameter. So my first question is "what happens if you just use a
> single field for df?".
>
> Best,
> Erick
>
> On Mon, Apr 6, 2015 at 11:51 AM, Frank li  wrote:
> > The error message was from the query with "debug=query".
> >
> > On Mon, Apr 6, 2015 at 11:49 AM, Frank li  wrote:
> >
> >> Hi Erick,
> >>
> >>
> >> Thanks for your response.
> >>
> >> Here is the query I am sending:
> >>
> >>
> http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0
> >> <
> http://dev-solr:8080/solr/collection1/select?q=%7B!join+from=litigation_id_ls+to=lit_id_lms%7Dall_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0
> >
> >>
> >> You can see it has "all_text:apple". I added field name "all_text",
> >> because it gives error without it.
> >>
> >> Errors:
> >>
> >> undefined field all_text number party
> >> name all_code ent_name400
> >>
> >>
> >> These fields are defined as the default search fields in our
> >> solr_config.xml file:
> >>
> >> all_text number party name all_code ent_name
> >>
> >>
> >> Thanks,
> >>
> >> Fudong
> >>
> >> On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson  >
> >> wrote:
> >>
> >>> You have to show us several more things:
> >>>
> >>> 1> what exactly does the query look like?
> >>> 2> what do you expect?
> >>> 3> output when you specify &debug=query
> >>> 4> anything else that would help. You might review:
> >>>
> >>> http://wiki.apache.org/solr/UsingMailingLists
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Fri, Apr 3, 2015 at 10:58 AM, Frank li  wrote:
> >>> > Hi,
> >>> >
> >>> > I am starting using join parser with our solr. We have some default
> >>> fields.
> >>> > They are defined in solrconfig.xml:
> >>> >
> >>> >   
> >>> >edismax
> >>> >explicit
> >>> >10
> >>> >all_text number party name all_code
> ent_name
> >>> >all_text number^3 name^5 party^3 all_code^2
> >>> > ent_name^7
> >>> >id description market_sector_type parent
> >>> ult_parent
> >>> > ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s
> *_ss
> >>> *_ds
> >>> > *_sms *_ss *_bs
> >>> >AND
> >>> >  
> >>> >
> >>> >
> >>> > I found out once I use join parser, it does not recognize the default
> >>> > fields any more. How do I modify the configuration for this?
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Fred
> >>>
> >>
> >>
>


RE: Trouble GetSpans lucene 4

2015-04-07 Thread Allison, Timothy B.
Oh, ok, if that's just a regular query, you will need to convert it to a 
SpanQuery, and you may need to rewrite the SpanQuery after conversion.

If you're trying to do a concordance or trying to retrieve windows around the 
hits, take a look at ConcordanceSearcher within: 
https://github.com/tballison/lucene-addons/tree/master/lucene-5317 .

With any luck, I should find the time to get back to the Solr wrapper under 
solr-5411 that Jason Robinson initially developed.  
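
For illustration, a minimal, untested sketch (against the Lucene 4.x API) of converting a plain PhraseQuery into a SpanNearQuery before asking for spans; the class name here is invented for the example, and a real converter such as the SimpleSpanQueryConverter linked in the quoted message below handles many more query types:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public final class NaiveSpanQueryConverter {

    // Convert a plain query to a SpanQuery; only SpanQuery and PhraseQuery are handled here.
    public static SpanQuery toSpanQuery(Query query) {
        if (query instanceof SpanQuery) {
            return (SpanQuery) query;
        }
        if (query instanceof PhraseQuery) {
            PhraseQuery pq = (PhraseQuery) query;
            Term[] terms = pq.getTerms();
            SpanQuery[] clauses = new SpanQuery[terms.length];
            for (int i = 0; i < terms.length; i++) {
                clauses[i] = new SpanTermQuery(terms[i]);
            }
            // Mirror the phrase semantics: same slop, terms required in order.
            return new SpanNearQuery(clauses, pq.getSlop(), true);
        }
        throw new IllegalArgumentException(
                "Unsupported query type: " + query.getClass().getName());
    }
}

If the converted query still contains multi-term pieces (wildcards, ranges and so on), it may also need a rewrite against the reader before spans can be pulled, as noted above.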

-Original Message-
From: Test Test [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 3:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Re,

origQuery is a Query object; I got it from a ResponseBuilder object via the method getQuery():

ResponseBuilder rb  // it's a method parameter
Query origQuery = rb.getQuery();

Thanks for the link, I'll keep you informed.

Regards,
Andy


 On Tuesday, 7 April 2015 at 20:26, "Allison, Timothy B."  wrote:

 What class is origQuery?

You will have to do more rewriting/calculation if you're trying to convert a 
PhraseQuery to a SpanNearQuery.

If you dig around in 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene 
highlighter package, you might get some inspiration.

I have a hack for converting regular queries to SpanQueries here (this is 
largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java
 

-Original Message-
From: Compte Poubelle [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up.
Anyone?

Best regards.

> On 6 Apr 2015, at 21:32, Test Test  wrote:
> 
> Hi, 
> I'm working on the Taming Text book. I'm trying to upgrade the code from Solr 3.6
> to Solr 4.10.2. At the moment, I have a problem with the method "getSpans":
> "spans.next()" always returns "false". Can anyone help?
> SpanNearQuery sQuery = (SpanNearQuery) origQuery;
> SolrIndexSearcher searcher = rb.req.getSearcher();
> IndexReader reader = searcher.getIndexReader();
> //AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
> Map termContexts = new HashMap();
> //Spans spans = sQuery.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
> while (spans.next() == true) {
>     // ...
> }
> 
> Thanks. Regards.
> 



How to trace error records during POST?

2015-04-07 Thread Simon Cheng
Good morning,

I used Solr 4.7 to post 186,745 XML files, and 186,622 files have been
indexed. That means there are 123 XML files with errors. How can I trace
which files these are?

Thank you in advance,
Simon Cheng.


Deploying multiple ZooKeeper ensemble on a single machine

2015-04-07 Thread Zheng Lin Edwin Yeo
Hi,

I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and now
I'm trying to deploy a multiple ZooKeeper ensemble (3 servers) on a single
machine. These are the settings which I have configured, according to the
Solr Reference Guide.

These files are under \conf\ directory
(C:\Users\edwin\zookeeper-3.4.6\conf)

*zoo.cfg*
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890


*zoo2.cfg*
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\2
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890


*zoo3.cfg*
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\3
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890


I have also created the myid file at the respective dataDir location for
each of the 3 servers.

- At C:\Users\edwin\zookeeper-3.4.6\1, the myid file contains just the
number 1
- At C:\Users\edwin\zookeeper-3.4.6\2, the myid file contains just the
number 2
- At C:\Users\edwin\zookeeper-3.4.6\3, the myid file contains just the
number 3


However, I'm getting the following error when I run zkServer.cmd

2015-04-08 10:54:17,097 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@412] - Queue
size: 1
2015-04-08 10:54:17,097 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@412] - Queue
size: 1
2015-04-08 10:54:17,097 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@364] - Opening
channel to server 2
2015-04-08 10:54:18,097 [myid:1] - WARN
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot
open channel to 2 at election address localhost/127.0.0.1:3889
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-04-08 10:54:18,099 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@364] - Opening
channel to server 3
2015-04-08 10:54:19,099 [myid:1] - WARN
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot
open channel to 3 at election address localhost/127.0.0.1:3890
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-04-08 10:54:19,101 [myid:1] - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
Notification time out: 3200



Is there anything which I could have set wrongly?


Regards,
Edwin


Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-07 Thread nutchsolruser
You have to choose unique client port numbers for each instance. Here I can see that you
have the same client port for all 3 servers.

You can refer to this link.





Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-07 Thread Shawn Heisey
On 4/7/2015 9:16 PM, Zheng Lin Edwin Yeo wrote:
> I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and now
> I'm trying to deploy a multiple ZooKeeper ensemble (3 servers) on a single
> machine. These are the settings which I have configured, according to the
> Solr Reference Guide.
> 
> These files are under \conf\ directory
> (C:\Users\edwin\zookeeper-3.4.6\conf)
> 
> *zoo.cfg*
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
> clientPort=2181
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890



>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot
> open channel to 2 at election address localhost/127.0.0.1:3889
> java.net.ConnectException: Connection refused: connect

The first thing I would suspect when running any network program on a
Windows machine that won't communicate is the Windows firewall, unless
you have either turned off the firewall or you have explicitly
configured an exception in the firewall for the relevant ports.

Your other reply that you got from nutchsolruser does point out that all
three zookeeper configs are using 2181 as the clientPort.  Because these
are all running on the same machine, you must use a different port for
each one.  I'm not sure what happens to subsequent processes after the
first one starts, but they won't work even if they do manage to start.

Thanks,
Shawn
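
For completeness, a sketch of what the three configs could look like with distinct client ports (directories as in the original post; 2181-2183 are just example ports, and each ZooKeeper instance still has to be started against its own config file):

# zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

# zoo2.cfg (same tickTime/initLimit/syncLimit/server.* lines)
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\2
clientPort=2182

# zoo3.cfg (same tickTime/initLimit/syncLimit/server.* lines)
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\3
clientPort=2183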