MappingCharFilterFactory and start and end offsets

2015-06-18 Thread Dmitry Kan
Hi,

It looks like MappingCharFilter sets the start and end offsets to the same
value. Can this be affected by some setting?

For a string: test $ test2 and mapping "$" => " dollarsign " (we insert
extra space to separate $ into its own token)

we get: http://snag.gy/eJT1H.jpg

Ideally, we would like the start and end offsets to respect the remapped
token. Can this be achieved with settings?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Extended Dismax Query Parser with AND as default operator

2015-06-18 Thread Dirk Buchhorn
Hello,

I have a question about the extended dismax query parser. If the default operator 
is changed to AND (q.op=AND), the search results seem to be incorrect. I 
will explain with some examples. For this test I use Solr v5.1 and the tika 
core from the example directory.
== Preparation ==
Add the following lines to the schema.xml file
  
  id
Change the field "text" to stored="true"
Remove the multiValued attribute from the title and text fields (we don't need 
multi-valued fields in our test)

Add test data (use curl or fiddler)
Url:http://localhost:8983/solr/tika/update/json?commit=true
Header: Content-type: application/json
[
  {"id":"1", "title":"green", "author":"Jon", "text":"blue"},
  {"id":"2", "title":"green", "author":"Jon Jessie", "text":"red"},
  {"id":"3", "title":"yellow", "author":"Jessie", "text":"blue"},
  {"id":"4", "title":"green", "author":"Jessie", "text":"blue"},
  {"id":"5", "title":"blue", "author":"Jon", "text":"yellow"},
  {"id":"6", "title":"red", "author":"Jon", "text":"green"}
]

== Test ==
The following parameters are always set.
default operator is AND: q.op=AND
use the extended dismax query parser: defType=edismax
set the default query fields to title and text: qf=title text
sort: id asc
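
For reference, a minimal SolrJ sketch of a request carrying these parameters
(the core name comes from the preparation step above; the client URL is an assumption):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery query = new SolrQuery("red green");
    query.set("defType", "edismax");
    query.set("q.op", "AND");
    query.set("qf", "title text");
    query.set("sort", "id asc");
    QueryResponse rsp =
        new HttpSolrClient("http://localhost:8983/solr/tika").query(query);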

=== #1 test ===
q=red green
response:
{ "numFound":2,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"6","title":"red","author":"Jon","text":"green"}]
}
parsedquery_toString: "+(((text:green | title:green) (text:red | title:red))~2)"

This test works as expected.

=== #2 test ===
We use a group
q=(red green)
Same response as test one.
parsedquery_toString: "+(((text:green | title:green) (text:red | title:red))~2)"

This test works as expected.

=== #3 test ===
q=green red author:Jessie
response:
{ "numFound":1,"start":0,
  "docs":[{"id":"2","title":"green","author":"Jon Jessie","text":"red"}]
}
parsedquery_toString: "+(((text:green | title:green) (text:red | title:red) 
author:jessie)~3)"

This test works as expected.

=== #4 test ===
q=(green red) author:Jessie
response:
{ "numFound":2,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"4","title":"green","author":"Jessie","text":"blue"}]
}
parsedquery_toString: "+text:green | title:green) (text:red | title:red)) 
author:jessie)~2)"

The same result as in the third test was expected. Why is AND not applied to the 
query group?

=== #5 test ===
q=(+green +red) author:Jessie
response:
{ "numFound":4,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"3","title":"yellow","author":"Jessie","text":"blue"},
{"id":"4","title":"green","author":"Jessie","text":"blue"},
{"id":"6","title":"red","author":"Jon","text":"green"}]
}
parsedquery_toString: "+((+(text:green | title:green) +(text:red | title:red)) 
author:jessie)"

Now AND is used within the group, but the author clause is combined with OR. Why?

=== #6 test ===
q=(+green +red) +author:Jessie
response:
{ "numFound":3,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"3","title":"yellow","author":"Jessie","text":"blue"},
{"id":"4","title":"green","author":"Jessie","text":"blue"}]
}
parsedquery_toString: "+((+(text:green | title:green) +(text:red | title:red)) 
+author:jessie)"

Still not the expected result.

=== #7 test ===
q=+(+green +red) +author:Jessie
response:
{ "numFound":1,"start":0,
  "docs":[{"id":"2","title":"green","author":"Jon Jessie","text":"red"}]
}
parsedquery_toString: "+(+(+(text:green | title:green) +(text:red | title:red)) 
+author:jessie)"

Now the result is ok. But if all operators must be given explicitly, then q.op=AND 
is useless.

=== #8 test ===
q=green author:(Jon Jessie)
Four results are found; one was expected. The query must be changed to '+green 
+author:(+Jon +Jessie)' to get the expected result.

Is this a bug in the extended dismax parser, or what is the reason for not 
consistently applying q.op=AND to the query expression?

Kind regards

Dirk Buchhorn


Re: Contribute the Customized Phonetic Filter to Apache Solr

2015-06-18 Thread davidphilip cherian
Hi Aman,

https://wiki.apache.org/solr/HowToContribute

HTH

On Thu, Jun 18, 2015 at 12:11 PM, Aman Tandon 
wrote:

> Hi,
>
> We created a new phonetic filter, and it is working great on our products.
> Most of our suppliers are Indian, and the filter is quite helpful for providing
> the exact result, e.g.
>
> 1) rikshaw still finds the suppliers of rickshaw
> 2) telefone still finds the suppliers of telephone
>
> We also analyzed our search satisfaction feedback; it improved by 13% (54%
> -> 67%) just after implementing this.
>
> And we want to contribute it to Solr, so how can I do that?
>
> With Regards
> Aman Tandon
>


facet query is not working

2015-06-18 Thread Midas A
http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&&wt=json


I am not getting facet results .

schema:
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>


Re: facet query is not working

2015-06-18 Thread Mikhail Khludnev
isn't facet=true necessary?

On Thu, Jun 18, 2015 at 12:03 PM, Midas A  wrote:

>
> http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&&wt=json
>
>
> I am not getting facet results .
>
> schema:
>  <
> dynamicField name="*_coordinate" type="tdouble" indexed="true" stored=
> "false"/>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Duplicate suggestions

2015-06-18 Thread jon kerling
Hi,
I am using Solr 5.1. I'm getting duplicate suggestions when using my 
Solr suggester. I'm using AnalyzingInfixLookupFactory & 
DocumentDictionaryFactory. Can I configure it to return only distinct 
suggestions?

here are details about my configuration:

from schema.xml:
   
  mySuggester1a
  AnalyzingInfixLookupFactory  
  suggester_infix_dir1a
  true
  DocumentDictionaryFactory 
  f1
  weightField
  text_general
  false
    

  
  mySuggester2a
  AnalyzingInfixLookupFactory  
  suggester_infix_dir2a
  true
  DocumentDictionaryFactory 
  f2
  weightField
  text_general
  false
    
  

  
    
  true
  6
  mySuggester1a
  mySuggester2a
    
    
  suggest
    
   

from schema.xml:

** weightField is ignored by me, I'm not adding any values in it at all.

document example:    2015-04-01    12:06:00    BOOO        
7.52.11.212    7.52.11.213    52358424
After I build the suggester, I try to get suggestions like this:
http://localhost/solr/core1/suggest?/suggest=true&suggest.q=12



   
  0
  62
   
   
  
 
6

   
  18:34:12
  0
  
   
   
  18:34:12
  0
  
   
   
  18:35:12
  0
  
   
   
  18:35:12
  0
  
   
   
  18:35:12
  0
  
   
   
  12:06:02
  0
  
   

 
  
  
 
0

 
  
   


I would like to get this kind of suggester response ( no duplicates ):



   
  0
  62
   
   
  
 
3

   
  18:34:12
  0
  
   
   
  18:35:12
  0
  
   
   
  12:06:02
  0
  
   

 
  
  
 
0

 
  
   
Thank you.


Re: facet query is not working

2015-06-18 Thread Alessandro Benedetti
If he has not put any appends or invariants in the request handler,
facet=true is mandatory to activate the facets.

I haven't tried those specific facet queries.

I hope the problem was not simply that he didn't activate faceting ...
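
A hedged SolrJ sketch of the same request with faceting switched on explicitly
(core name and spatial parameters are taken from the URL in the original mail):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery q = new SolrQuery("*:*");
    q.set("sfield", "geolocation");
    q.set("pt", "26.697,83.1876");
    q.setFacet(true);                                   // without facet=true, facet.query is ignored
    q.addFacetQuery("{!frange l=0 u=50}geodist()");
    q.addFacetQuery("{!frange l=50.001 u=100}geodist()");
    QueryResponse rsp = new HttpSolrClient("http://localhost:8983/solr/col").query(q);
    System.out.println(rsp.getFacetQuery());            // facet.query -> count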

2015-06-18 10:35 GMT+01:00 Mikhail Khludnev :

> isn't facet=true necessary?
>
> On Thu, Jun 18, 2015 at 12:03 PM, Midas A  wrote:
>
> >
> >
> http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&&wt=json
> >
> >
> > I am not getting facet results .
> >
> > schema:
> > 
> <
> > dynamicField name="*_coordinate" type="tdouble" indexed="true" stored=
> > "false"/>
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Help: Problem in customized token filter

2015-06-18 Thread Aman Tandon
Hi,

I created a *token concat filter* to concatenate all the tokens from the token
stream. It creates the concatenated token as expected.

But when I post an XML file containing more than 30,000 documents, only the
first document gets the data for that field.

*Schema:*

* required="false" omitNorms="false" multiValued="false" />*






> **
> *  *
> **
> **
> * generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
> **
> * outputUnigrams="true" tokenSeparator=""/>*
> * language="English" protected="protwords.txt"/>*
> * class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> * synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
> expand="true"/>*
> *  *
> *  *
> **
> * ignoreCase="true" expand="true"/>*
> * words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
> * generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
> **
> * language="English" protected="protwords.txt"/>*
> * class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> *  ***


Please help me. The code for the filter is as follows; please take a look.

Here is the picture of what the filter is doing


The code of the concat filter is:

package com.xyz.analysis.concat;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class ConcatenateWordsFilter extends TokenFilter {

  private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
  private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
  PositionIncrementAttribute posIncr = addAttribute(PositionIncrementAttribute.class);
  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);

  private StringBuilder stringBuilder = new StringBuilder();
  private boolean exhausted = false;

  /**
   * Creates a new ConcatenateWordsFilter
   * @param input TokenStream that will be filtered
   */
  public ConcatenateWordsFilter(TokenStream input) {
    super(input);
  }

  /**
   * {@inheritDoc}
   */
  @Override
  public final boolean incrementToken() throws IOException {
    while (!exhausted && input.incrementToken()) {
      char terms[] = charTermAttribute.buffer();
      int termLength = charTermAttribute.length();
      if (typeAtrr.type().equals("")) {  // the token type string was lost in the archived mail
        stringBuilder.append(terms, 0, termLength);
      }
      charTermAttribute.copyBuffer(terms, 0, termLength);
      return true;
    }

    if (!exhausted) {
      exhausted = true;
      String sb = stringBuilder.toString();
      System.err.println("The Data got is " + sb);
      int sbLength = sb.length();
      //posIncr.setPositionIncrement(0);
      charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);
      offsetAttribute.setOffset(offsetAttribute.startOffset(),
          offsetAttribute.startOffset() + sbLength);
      stringBuilder.setLength(0);
      //typeAtrr.setType("CONCATENATED");
      return true;
    }
    return false;
  }
}

With Regards
Aman Tandon


RE: XPathentity processor on CLOB field

2015-06-18 Thread Pattabiraman, Meenakshisundaram

This is the error cause reported.  I also see that it has been reported earlier 
(http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201103.mbox/%3cd0f0d26c-3ac0-4982-9e2b-09dc96937...@535consulting.com%3E)
 but could not find a solution.


I am nesting the FieldReaderDataSource within the entity definition that has a 
CLOB field. With this, it fails only after transforming the CLOB. 
If I do not nest, I get this error when the FieldReaderDataSource is 
initialized, thus failing even before the SQL is executed.
In either case, the error happens at the same place. 


Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or 
null
at 
oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at 
oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
at 
oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
at 
oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:229)
at 
oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:403)
at oracle.jdbc.driver.OracleSql.initialize(OracleSql.java:110)
at 
oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1761)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1739)
at 
oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:298)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:314)
... 14 more






Pattabi Meenakshisundaram



-Original Message-
From: Pattabiraman, Meenakshisundaram 
[mailto:pattabiraman.meenakshisunda...@aig.com] 
Sent: Wednesday, June 17, 2015 9:33 PM
To: 'solr-user@lucene.apache.org'
Subject: XPathentity processor on CLOB field

My requirement is to read the XML from a CLOB field and parse it to get the 
entity.

The data config is as shown below. I am trying to map two fields 'event' and 
'policyNumber' for the entity 'catreport'.


 




 






I am getting this error


Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to execute query: null Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:321)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:278)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:53)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)

I see that the Clob is getting converted to String correctly and the log has 
this entry where xml is printed Exception while processing: input document : 
SolrInputDocument(fields: [RESPONSE_XML=

Solr 4.10.4: Could not create instance of 'SolrInputDocument'

2015-06-18 Thread Paul Revere
Our web site is created using PaperThin's CommonSpot CMS in a ColdFusion 10 and 
Windows Server 2008 R2 environment, using Apache Solr 4.10.4 instead of CF 
Solr. We create collections through the CMS interface and they do appear in 
both the CMS and the Solr dashboard when created. However, when we try indexing 
our collections through the CMS interface, our CMS error logs show the entry 
'Could not create instance of 'SolrInputDocument'' for each member of the 
collection. This is not a fatal error, as the indexing appears to cycle through 
all members, but each member "errors out" with log entries for each member.  
I've Googled this error message without success. What might this error message 
indicate please??
Paul



Suggester for text array

2015-06-18 Thread Advait Suhas Pandit
Hi,

We run an ecommerce company and would like to use SOLR for our product database 
searches.

We have products along with the categories that they belong to. In case the 
product belongs to more than 1 category, we have a comma separated field of 
categories. 

How do we do auto complete on -
1. Multiple fields - product name, category
2. On categories which are not first in the list in the case of the comma 
separated values
E.g. If a product belongs to Hair Care Products, Personal Care Products how do 
we ensure that the suggester will even suggest if someone starts typing in 
Personal Care. Also, how do we show only Personal Care in the auto complete and 
not as Hair Care Products, Personal Care Products.

Thanks,
Advait



Re: Suggester for text array

2015-06-18 Thread Alessandro Benedetti
Hi Advait,
First of all, I suggest you study Solr a little bit [1], because your
requirements are actually really simple:

1) You can simply use more than one suggest dictionary if you care to keep
the suggestions separated (keeping track of whether a term comes from the name
or from the category).

If you don't care to keep them separated, simply use a copyField to copy
both fields into one.

2) Solr has supported multi-valued fields since the beginning.
I really suggest you split on commas in your indexing application,
providing the values to Solr already separated, because they are multiple
values of the category field (so it is not the analysis chain's responsibility
to split them); see the sketch below.
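
A minimal SolrJ sketch of point 2, splitting the comma-separated categories on
the client side (field names here are hypothetical, not from the thread):

    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "prod-1");
    doc.addField("name", "Herbal Shampoo");
    for (String category : "Hair Care Products, Personal Care Products".split("\\s*,\\s*")) {
      doc.addField("category", category);   // one value per category in a multiValued field
    }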

Cheers

[1]
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

2015-06-18 13:43 GMT+01:00 Advait Suhas Pandit :

> Hi,
>
> We run an ecommerce company and would like to use SOLR for our product
> database searches.
>
> We have products along with the categories that they belong to. In case
> the product belongs to more than 1 category, we have a comma separated
> field of categories.
>
> How do we do auto complete on -
> 1. Multiple fields - product name, category
> 2. On categories which are not first in the list in the case of the comma
> separated values
> E.g. If a product belongs to Hair Care Products, Personal Care Products
> how do we ensure that the suggester will even suggest if someone starts
> typing in Personal Care. Also, how do we show only Personal Care in the
> auto complete and not as Hair Care Products, Personal Care Products.
>
> Thanks,
> Advait
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: XPathentity processor on CLOB field

2015-06-18 Thread Pattabiraman, Meenakshisundaram

I got this working - the errors were due to a mistake in letter case - was 
using 'datasource' instead of 'dataSource'  in the entity that was using 
XpathEntityProcessor. Hence this was being ignored and was inheriting the JDBC 
Datasource of the parent entity.

I am pasting the complete data-config for anyone encountering the same problem.







 
  




   






-Original Message-
From: Pattabiraman, Meenakshisundaram 
[mailto:pattabiraman.meenakshisunda...@aig.com] 
Sent: Thursday, June 18, 2015 7:51 AM
To: solr-user@lucene.apache.org
Subject: RE: XPathentity processor on CLOB field


This is the error cause reported.  I also see that it has been reported earlier 
(http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201103.mbox/%3cd0f0d26c-3ac0-4982-9e2b-09dc96937...@535consulting.com%3E)
 but could not find a solution.


I am nesting the FieldReaderDataSource within the Entity definition that has a 
CLOB field. With this it fails only after transforming the clob. 
If I do not nest, I get this error when the FieldReaderDataSource is 
initialized thus failing even before the SQL is executed.
Either case, the error is happening at the same place. 


Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or 
null
at 
oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at 
oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
at 
oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
at 
oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:229)
at 
oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:403)
at oracle.jdbc.driver.OracleSql.initialize(OracleSql.java:110)
at 
oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1761)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1739)
at 
oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:298)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:314)
... 14 more






Pattabi Meenakshisundaram



-Original Message-
From: Pattabiraman, Meenakshisundaram 
[mailto:pattabiraman.meenakshisunda...@aig.com] 
Sent: Wednesday, June 17, 2015 9:33 PM
To: 'solr-user@lucene.apache.org'
Subject: XPathentity processor on CLOB field

My requirement is to read the XML from a CLOB field and parse it to get the 
entity.

The data config is as shown below. I am trying to map two fields 'event' and 
'policyNumber' for the entity 'catreport'.


 




 






I am getting this error


Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to execute query: null Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:321)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:278)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:53)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)

I see that the Clob is getting converted to String correctly and the log has 
this entry where xml is printed Exception while processing: input document : 
SolrInputDocument(fields: [RESPONSE_XML=

Re: Duplicate suggestions

2015-06-18 Thread Alessandro Benedetti
I had the very same issue,
because I had some documents with a redundant field, and I was using the
Infix Suggester as well.

Because the Infix Suggester returns the whole field content, if you have
duplicated field values across your docs, you will see duplicate suggestions.

Do you have any intermediate API in your application? In that case you can
modify the API to collect and return the suggestions in a Collection that
prevents duplicates.

If you want it directly from Solr, I assume it is a "bug".
I think the suggestions should contain no duplicates by default, because
the only information returned is the field value and not the document id.
Anyway, it could be a nice parameter for getting better suggestions (sending an
avoidDuplicate parameter to the suggester).
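
A minimal client-side sketch of that workaround (nothing Solr-specific, just a
Collection that drops duplicates while preserving order):

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;

    // Collapse duplicate suggestion terms returned by the suggester before
    // passing them on to the UI.
    List<String> dedupe(List<String> suggestionTerms) {
      return new ArrayList<>(new LinkedHashSet<>(suggestionTerms));
    }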

Cheers

2015-06-18 10:48 GMT+01:00 jon kerling :

> Hi,
> I am using solr 5.1. I'm getting duplicate suggestions when using my
> solrsuggester. I'm using AnalyzingInfixLookupFactory &
> DocumentDictionaryFactory. can i configure it to suggest me only different
> suggestions?
>
> here are details about my configuration:
>
> from schema.xml: class="solr.SuggestComponent">
>
>   mySuggester1a
>   AnalyzingInfixLookupFactory
>   suggester_infix_dir1a
>   true
>   DocumentDictionaryFactory
>   f1
>   weightField
>   text_general
>   false
> 
>
>   
>   mySuggester2a
>   AnalyzingInfixLookupFactory
>   suggester_infix_dir2a
>   true
>   DocumentDictionaryFactory
>   f2
>   weightField
>   text_general
>   false
> 
>   
>
>startup="lazy">
> 
>   true
>   6
>   mySuggester1a
>   mySuggester2a
> 
> 
>   suggest
> 
>   
>
> from schema.xml: stored="true" required="false" multiValued="false" />
>  required="false" multiValued="false" /> type="float"  indexed="true"  stored="true"/>
> ** weightField is ignored by me, I'm not adding any values in it at all.
>
> document example:2015-04-01 name="f2">12:06:00BOOO name="f4"/>7.52.11.212 name="f6">7.52.11.21352358424
> After i build the suggester I'm trying to get suggests like here:
> http://localhost/solr/core1/suggest?/suggest=true&suggest.q=12
>
> 
> 
>
>   0
>   62
>
>
>   
>  
> 6
> 
>
>   18:34:12
>   0
>   
>
>
>   18:34:12
>   0
>   
>
>
>   18:35:12
>   0
>   
>
>
>   18:35:12
>   0
>   
>
>
>   18:35:12
>   0
>   
>
>
>   12:06:02
>   0
>   
>
> 
>  
>   
>   
>  
> 0
> 
>  
>   
>
> 
>
> I would like to get this kind of suggester response ( no duplicates ):
>
> 
> 
>
>   0
>   62
>
>
>   
>  
> 3
> 
>
>   18:34:12
>   0
>   
>
>
>   18:35:12
>   0
>   
>
>
>   12:06:02
>   0
>   
>
> 
>  
>   
>   
>  
> 0
> 
>  
>   
>
> Thank you.
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Managed schema and schema.xml file

2015-06-18 Thread Steven White
Hi everyone,

I just upgraded from 5.1.0 to 5.2.1 and noticed a behavior change which I
consider a bug.

In my solrconfig.xml, I have the following:

   <schemaFactory class="ManagedIndexSchemaFactory">
     <bool name="mutable">true</bool>
     <str name="managedSchemaResourceName">my-schema.xml</str>
   </schemaFactory>

In 5.1.0 (and maybe prior ver.?) when I enable managed schema per the
above, the existing schema.xml file is left as-is, a copy of it is created
as schema.xml.bak and a new one is created based on the name I gave it
"my-schema.xml".

With 5.2.1, schema.xml is renamed to schema.xml.bak and my-schema.xml is
created (i.e.: schema.xml is deleted).

Is this an expected behavior or is this a bug?  I see it as a bug because
if I revert the change I made in my solrconfig.xml back to the following (i.e.: not
managed schema any more):

  <schemaFactory class="ClassicIndexSchemaFactory"/>

Solr will not restart because it cannot find schema.xml

Thanks

Steve


Solr 5.2.1 on Solaris

2015-06-18 Thread Bence Vass
Hello,

Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris
10)? The script (solr start) doesn't work out of the box. Is anyone running
Solr 5.x on Solaris?

- Thanks


Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden
Hello, 

I'm using Solr to pull information from a Database and a file system
simultaneously. The database houses the file path of the file in the file
system. It pulls all of those just fine. In fact, it combines the metadata
from the database and the metadata from the file system great. The problem
occurs when I try to index the text. The error does not occur at the point
when it tries to add the field "text" to the document. The error occurs when
I try to submit that document to Solr. It gives me this error, 


org.apache.solr.common.SolrException: Exception writing document id
/some/filepath to the index; possible analysis error. 


This is how the field is defined in schema:

 

and this is the code I use to add it to the document:

File file = new File(filepath); 

ContentHandler textHandler = new BodyContentHandler(); 

Metadata metadata = new Metadata();

ParseContext context = new ParseContext();

InputStream input = new FileInputStream(file); 

try{

 autoParser.parse(input, textHandler, metadata, context); 

} catch (Exception e) { 

  //prints out error message

 continue;

} 

if(textHandler != null){

  doc.addField("text",textHandler.toString()); 

} 

try{
 
server.add(doc); 

} catch (Exception ex){ 

 //logmessage

 continue; 

} 

I think it has something to do with how the field is defined in schema but I
don't know. All the files that get error messages are PDF's if that helps.
There are .doc s in the file system but they don't error out. 






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 5.2.1 on Solaris

2015-06-18 Thread Shawn Heisey
On 6/18/2015 8:05 AM, Bence Vass wrote:
> Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris
> 10)? The script (solr start) doesn't work out of the box, is anyone running
> Solaris 5.x on Solaris?

I think the biggest problem on Solaris will be the options used on the
ps command.  The ps usage in the solr script appears to be formulated
for the version of ps found on Linux and other free UNIX-like operating
systems, and I know from experience that those options don't work on
Solaris.

The solr script also uses lsof, which I don't think is normally
installed on Solaris.  I'm not sure whether lsof is actually required,
or if the script will work without it.

I won't have time right away, but I will be able to look into this at
some point in the next few days and come up with a patch to make the
script work on Solaris.  If anybody else has the time and skill to do so
immediately, feel free to step in.

Thanks,
Shawn



Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Alessandro Benedetti
We would like more information, but the first thing I notice is that it would
hardly make any sense to use a "string" type for file content.

Can you give more details about the exception?
Have you debugged a little bit?
How does the Solr input document look before it is sent to Solr?

Furthermore, please give us the whole stack trace. The message you posted is
almost useless without all the details ...

2015-06-18 15:39 GMT+01:00 Paden :

> Hello,
>
> I'm using Solr to pull information from a Database and a file system
> simultaneously. The database houses the file path of the file in the file
> system. It pulls all of those just fine. In fact, it combines the metadata
> from the database and the metadata from the file system great. The problem
> occurs when I try to index the text. The error does not occur at the point
> when it tries to add the field "text" to the document. The error occurs
> when
> I try to submit that document to Solr. It gives me this error,
>
>
> org.apache.solr.common.SolrException: Exception writing document id
> /some/filepath to the index; possible analysis error.
>
>
> This is how the field is defined in schema:
>
>  required="false" multiValued="true" />
>
> and this is the code I use to add it to the document:
>
> File file = new File(filepath);
>
> ContentHandler textHandler = new BodyContentHandler();
>
> Metadata metadata = new Metadata();
>
> ParseContext context = new ParseContext();
>
> Input Stream = new FileInputStream(file);
>
> try{
>
>  autoParser.parse(input, textHandler, metadata, context);
>
> } catch (Exception e) {
>
>   //prints out error message
>
>  continue;
>
> }
>
> if(textHandler != null){
>
>   doc.addField("text",textHandler.toString());
>
> }
>
> try{
>
> server.add(doc);
>
> } catch (Exception ex){
>
>  //logmessage
>
>  continue;
>
> }
>
> I think it has something to do with how the field is defined in schema but
> I
> don't know. All the files that get error messages are PDF's if that helps.
> There are .doc s in the file system but they don't error out.
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Managed schema and schema.xml file

2015-06-18 Thread Shawn Heisey
On 6/18/2015 8:10 AM, Steven White wrote:
> In 5.1.0 (and maybe prior ver.?) when I enable managed schema per the
> above, the existing schema.xml file is left as-is, a copy of it is created
> as schema.xml.bak and a new one is created based on the name I gave it
> "my-schema.xml".
> 
> With 5.2.1 schema.xml is renamed to schema.xml.bak and my-schema.xml is
> created (e.g.: schema.xml is deleted).
> 
> Is this an expected behavior or is this a bug?  I see it as a bug because
> if I revert the change I made in my solrconfig.xml back to (i.e.: not
> managed schema any more):
> 
>   
> 
> Solr will not restart because it cannot find schema.xml

As I understand it, the managed schema system will complain if it sees a
file named schema.xml -- having both the managed schema file and
schema.xml is confusing, so if the classic file exists, it's an error.

Because of that, if you switch your config from managed to classic
schema, you must also create the schema.xml file (or rename the managed
version).  Neither factory is aware of the other, so there's no
automated way to handle that.

Thanks,
Shawn



Re: Help: Problem in customized token filter

2015-06-18 Thread Aman Tandon
Please help - what am I doing wrong here? Please guide me.

With Regards
Aman Tandon

On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
wrote:

> Hi,
>
> I created a *token concat filter* to concat all the tokens from token
> stream. It creates the concatenated token as expected.
>
> But when I am posting the xml containing more than 30,000 documents, then
> only first document is having the data of that field.
>
> *Schema:*
>
> *> required="false" omitNorms="false" multiValued="false" />*
>
>
>
>
>
>
>> *> positionIncrementGap="100">*
>> *  *
>> **
>> **
>> *> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>> **
>> *> outputUnigrams="true" tokenSeparator=""/>*
>> *> language="English" protected="protwords.txt"/>*
>> *> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>> *> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>> expand="true"/>*
>> *  *
>> *  *
>> **
>> *> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>> *> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
>> *> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>> **
>> *> language="English" protected="protwords.txt"/>*
>> *> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>> *  ***
>
>
> Please help me, The code for the filter is as follows, please take a look.
>
> Here is the picture of what filter is doing
> 
>
> The code of concat filter is :
>
> *package com.xyz.analysis.concat;*
>>
>> *import java.io.IOException;*
>>
>>
>>> *import org.apache.lucene.analysis.TokenFilter;*
>>
>> *import org.apache.lucene.analysis.TokenStream;*
>>
>> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
>>
>> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
>>
>> *import
>>> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
>>
>> *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
>>
>>
>>> *public class ConcatenateWordsFilter extends TokenFilter {*
>>
>>
>>> *  private CharTermAttribute charTermAttribute =
>>> addAttribute(CharTermAttribute.class);*
>>
>> *  private OffsetAttribute offsetAttribute =
>>> addAttribute(OffsetAttribute.class);*
>>
>> *  PositionIncrementAttribute posIncr =
>>> addAttribute(PositionIncrementAttribute.class);*
>>
>> *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
>>
>>
>>> *  private StringBuilder stringBuilder = new StringBuilder();*
>>
>> *  private boolean exhausted = false;*
>>
>>
>>> *  /***
>>
>> *   * Creates a new ConcatenateWordsFilter*
>>
>> *   * @param input TokenStream that will be filtered*
>>
>> *   */*
>>
>> *  public ConcatenateWordsFilter(TokenStream input) {*
>>
>> *super(input);*
>>
>> *  }*
>>
>>
>>> *  /***
>>
>> *   * {@inheritDoc}*
>>
>> *   */*
>>
>> *  @Override*
>>
>> *  public final boolean incrementToken() throws IOException {*
>>
>> *while (!exhausted && input.incrementToken()) {*
>>
>> *  char terms[] = charTermAttribute.buffer();*
>>
>> *  int termLength = charTermAttribute.length();*
>>
>> *  if(typeAtrr.type().equals("")){*
>>
>> * stringBuilder.append(terms, 0, termLength);*
>>
>> *  }*
>>
>> *  charTermAttribute.copyBuffer(terms, 0, termLength);*
>>
>> *  return true;*
>>
>> *}*
>>
>>
>>> *if (!exhausted) {*
>>
>> *  exhausted = true;*
>>
>> *  String sb = stringBuilder.toString();*
>>
>> *  System.err.println("The Data got is "+sb);*
>>
>> *  int sbLength = sb.length();*
>>
>> *  //posIncr.setPositionIncrement(0);*
>>
>> *  charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);*
>>
>> *  offsetAttribute.setOffset(offsetAttribute.startOffset(),
>>> offsetAttribute.startOffset()+sbLength);*
>>
>> *  stringBuilder.setLength(0);*
>>
>> *  //typeAtrr.setType("CONCATENATED");*
>>
>> *  return true;*
>>
>> *}*
>>
>> *return false;*
>>
>> *  }*
>>
>> *}*
>>
>>
>
> With Regards
> Aman Tandon
>


Solr Logging

2015-06-18 Thread rbkumar88
Hi,

I want to log Solr search queries/response time and Solr indexing log
separately in different set of log files.
Is there any convenient framework/way to do it.

Thanks
Bharath



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Logging-tp4212730.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden
USING Solr 5.1.0

This is the schema file, but its contents were stripped from the archived
message; the only value that survives is "filepath", which appears to be the
uniqueKey.

ENTIRE STACK TRACE

/home/paden/Documents/LWP_Files/BIGDATA/5974412.pdf
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/Testcore3: Exception writing
document id /home/paden/Documents/LWP_Files/BIGDATA/5974412.pdf to the
index; possible analysis error.
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:556)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225)
at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:174)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:139)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:153)
at TikaSqlIndexer.Index(TikaSqlIndexer.java:238)
at TikaSqlIndexer.main(TikaSqlIndexer.java:85)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212736.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dedupe in a SolrCloud

2015-06-18 Thread Markus Mirsberger
Thanks :) 
exactly what I was looking for...as I only need to create the signature once 
this works perfect for me:)

Cheers,
Markus 


Sent from my iPhone

> On 17.06.2015, at 20:32, Shalin Shekhar Mangar  wrote:
> 
> Comments inline:
> 
> On Wed, Jun 17, 2015 at 3:18 PM, Markus.Mirsberger
>  wrote:
>> Hi,
>> 
>> I am trying to use the dedupe feature to detect and mark near duplicate
>> content in my collections.
>> I dont want to prevent duplicate content. I woud like to detect it and keep
>> it for further processing. Thats why Im using an extra field and not the
>> documents unique field.
>> 
>> Here is how I added it to the solrConfig.xml :
>> 
>> 
>>   
>> fill_signature
>>   
>> 
>> 
>> > processor="signature">
>>
>> 
>> 
>> > name="signature">
>> true
>> signature
>> false
>> content
>> > name="signatureClass">solr.processor.TextProfileSignature
>> .2
>> 3
>> 
>> 
>> When I initially add the documents to the cloud everything works as expected
>> . the documents are added and the signature will be created and
>> added.perfect:)
>> The problem occours when I want to update an exisiting document. In that
>> case the update.chain=fill_signature parameter will of course be set too and
>> I get a bad request error.
>> 
>> I found this solr issue: https://issues.apache.org/jira/browse/SOLR-3473
>> 
>> Is it that problem I am running into?
> 
> You haven't pasted the complete error response so I am guessing a bit
> here. It is possible that you are running into the same problem i.e.
> the "signature" is being calculated again and the signature field not
> multi-valued, causes an error.
> 
>> Is it somehow possible to add parameters or set a specific update Handler
>> when Im adding documents to the cloud using solrJ?
> 
> Yes, any custom parameter can be added to a SolrJ request. There is a
> setParam(String param, String value) method available in
> AbstractUpdateRequest which can be used to set a custom update.chain
> for each SolrJ request.
> 
>> In that case I could ether set the update.chain manually and remove it from
>> the request handler or write a second request Handler which I only use if I
>> want set the signature field.
>> I know I can do that manually when Im using eg curl but is it also possible
>> with SolrJ? :)
>> 
>> 
>> Thanks,
>> Markus
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
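
A minimal SolrJ sketch of the setParam() approach Shalin describes (collection
name, ZooKeeper hosts, and field values here are hypothetical):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
    client.setDefaultCollection("mycollection");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("content", "some near-duplicate text");

    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setParam("update.chain", "fill_signature");  // run this update through the signature chain
    req.process(client);
    client.commit();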


[ANN] Solr in Action book release (Solr 4.7)

2015-06-18 Thread Roy Silva



Sent from my iPhone


Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden
Just rolling out a little bit more information as it is coming. I changed the
field type in the schema to text_general and that didn't change a thing. 

Another thing is that it's consistently submitting/not submitting the same
documents. I will run over it one time and it won't index a set of
documents. When I clear the index and run the program again it
submits/doesn't submit the same documents. 

And it will index certain PDF's it just won't index others. Which is weird
because I printed the strings that are submitted to Solr and the ones that
get submitted are really similar to the ones that aren't submitted. 

I can't post the actual strings for sensitivity reasons. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212757.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
Hi,

We would probably like to shard the data, since the response time for
demanding queries at > 10M records is getting > 1 second in a single-request
scenario.

I have not done any data sharding before. What are some recommended ways to
do data sharding? For example, maybe by a criterion with a list of specific
values?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MappingCharFilterFactory and start and end offsets

2015-06-18 Thread Steve Rowe
Hi Dmitry,

It’s weird that start and end offsets are the same - what do you see for the 
start/end of ‘$’, i.e. if you take out MCFF?  (I think it should be start:5, 
end:6.)

As far as offsets “respecting the remapped token”, are you asking for offsets 
to be set as if ‘dollarsign' were part of the original text?  If so, there is 
no setting that would do that - the intent is for offsets to map to the 
*original* text.  You can work around this by performing the substitution prior 
to Solr analysis, e.g. in an update processor like RegexReplaceProcessorFactory.
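
For reference, a small Lucene-level sketch that prints the offsets
MappingCharFilter reports for the example in the original mail (this is just an
illustration of how to inspect the offsets, not a fix):

    import java.io.StringReader;
    import org.apache.lucene.analysis.charfilter.MappingCharFilter;
    import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("$", " dollarsign ");
    MappingCharFilter mapped =
        new MappingCharFilter(builder.build(), new StringReader("test $ test2"));

    WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(mapped);
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    OffsetAttribute offsets = tokenizer.addAttribute(OffsetAttribute.class);

    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      // offsets are corrected back into the *original* text
      System.out.println(term + " " + offsets.startOffset() + "-" + offsets.endOffset());
    }
    tokenizer.end();
    tokenizer.close();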

Steve
www.lucidworks.com

> On Jun 18, 2015, at 3:07 AM, Dmitry Kan  wrote:
> 
> Hi,
> 
> It looks like MappingCharFilter sets start and end offset to the same
> value. Can this be affected on by some setting?
> 
> For a string: test $ test2 and mapping "$" => " dollarsign " (we insert
> extra space to separate $ into its own token)
> 
> we get: http://snag.gy/eJT1H.jpg
> 
> Ideally, we would like to have start and end offset respecting the remapped
> token. Can this be achieved with settings?
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info



Re: How to do a Data sharding for data in a database table

2015-06-18 Thread Jack Krupansky
10M doesn't sound too demanding.

How complex are your queries?

How complex is your data - like number of fields and size, like very large
documents?

Are you sure you have enough RAM to fully cache your index?

Are your queries compute-bound or I/O bound? If I/O-bound, get more RAM. If
compute-bound, sharding may help, but have to examine query complexity
first.


-- Jack Krupansky

On Thu, Jun 18, 2015 at 2:05 PM, wwang525  wrote:

> Hi,
>
> We probably would like to shard the data since the response time for
> demanding queries at > 10M records is getting > 1 second in a single
> request
> scenario.
>
> I have not done any data sharding before. What are some recommended way to
> do data sharding. For example, may be by a criteria with a list of specific
> values?
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Collections API and adding new boxes

2015-06-18 Thread Jim . Musil
Hi,

Let's say I have a zookeeper ensemble with several Solr nodes connected to it. 
I've created a collection successfully and all is well.

What happens when I want to add another solr node?

I've tried spinning one up and connecting it to zookeeper, but the new node 
doesn't "join" the collection.  What's the expected next step?

This is Solr 5.1.

Thanks!
Jim Musil


Re: Collections API and adding new boxes

2015-06-18 Thread Shawn Heisey
On 6/18/2015 3:23 PM, Jim.Musil wrote:
> Let's say I have a zookeeper ensemble with several Solr nodes connected to 
> it. I've created a collection successfully and all is well.
>
> What happens when I want to add another solr node?
>
> I've tried spinning one up and connecting it to zookeeper, but the new node 
> doesn't "join" the collection.  What's the expected next step?
>
> This is Solr 5.1.

The new node will be part of the cloud as soon as it starts, but until
you take action with the Collections API, it will not have any indexes
on it.  SolrCloud does not automatically create replicas except in a
very specific set of circumstances that I do not think are very common.

You'll need to either create a new collection or take steps to modify
your current collection(s) so that one or more shard replicas are
located on the new node.

https://cwiki.apache.org/confluence/display/solr/Collections+API

Thanks,
Shawn



Re: How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
The query without load is still under 1 second. But under load, response times
can be much longer due to queued-up queries.

We would like to shard the data to something like 6M documents per shard, which
should still give an under-1-second response time under load.

What are some best practices for sharding the data? For example, we could shard
by date range, but that is pretty dynamic; we could shard by some other
property, but if the data is not evenly distributed, you may not be able to
shard it any further.
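
For reference, if sharding does turn out to be necessary, SolrCloud's default
compositeId router lets you choose the partitioning key by prefixing the
document id, instead of hand-partitioning by date range. A hedged SolrJ sketch
(field names and the prefix scheme are hypothetical, not from this thread):

    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "2015-06!booking-12345");   // "2015-06" is the routing prefix
    doc.addField("departure_date", "2015-06-18");
    client.add(doc);                               // client: an existing CloudSolrClient

Documents that share a prefix hash to the same shard, so the prefix is
effectively the sharding criterion.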



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.10.4: Could not create instance of 'SolrInputDocument'

2015-06-18 Thread Erick Erickson
No clue whatsoever, you haven't provided near enough details. I rather
doubt that many people
on this list really understand the interactions of that technology
stack, I certainly don't.

I'd ask on the ColdFusion list, as they're (apparently) the ones
who've integrated a Solr
connector of sorts. What evidence do you have that using a stock Solr
is even possible? For
all I know, the Solr provided with CF has some kind of customizations
(maybe a plugin?) that is
required.

Best,
Erick

On Thu, Jun 18, 2015 at 5:22 AM, Paul Revere  wrote:
> Our web site is created using PaperThin's CommonSpot CMS in a ColdFusion 10 
> and Windows Server 2008 R2 environment, using Apache Solr 4.10.4 instead of 
> CF Solr. We create collections through the CMS interface and they do appear 
> in both the CMS and the Solr dashboard when created. However, when we try 
> indexing our collections through the CMS interface, our CMS error logs show 
> the entry 'Could not create instance of 'SolrInputDocument'' for each member 
> of the collection. This is not a fatal error, as the indexing appears to 
> cycle through all members, but each member "errors out" with log entries for 
> each member.  I've Googled this error message without success. What might 
> this error message indicate please??
> Paul
>


Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Erick Erickson
The stack trace is what gets returned to the client, right? It's often
much more informative to see the Solr log output, the error message
is often much more helpful there. By the time the exception bubbles
up through the various layers vital information is sometimes not returned
to the client in the error message.

One precaution I would take since you've changed the schema is to
_completely_ remove the index.
1> shut down Solr
2> rm -rf coreX/data
3> restart Solr.
4> try it again.

Lucene doesn't really care at all whether a field gets indexed one way in
one document and another way in the next document and occasionally
having fields indexed different ways (string and text) in different documents
at the same time confuses things.

Best,
Erick

On Thu, Jun 18, 2015 at 10:31 AM, Paden  wrote:
> Just rolling out a little bit more information as it is coming. I changed the
> field type in the schema to text_general and that didn't change a thing.
>
> Another thing is that it's consistently submitting/not submitting the same
> documents. I will run over it one time and it won't index a set of
> documents. When I clear the index and run the program again it
> submits/doesn't submit the same documents.
>
> And it will index certain PDF's it just won't index others. Which is weird
> because I printed the strings that are submitted to Solr and the ones that
> get submitted are really similar to the ones that aren't submitted.
>
> I can't post the actual strings for sensitivity reasons.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212757.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collections API and adding new boxes

2015-06-18 Thread Erick Erickson
See particularly the ADDREPLICA command and the
"node" parameter. You might not even need the "node"
parameter since when you add a replica Solr does its
best to put the new replica on an underutilized node.
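
A hedged SolrJ sketch of that ADDREPLICA call (collection, shard, and node names
are made up for illustration):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "ADDREPLICA");
    params.set("collection", "mycollection");
    params.set("shard", "shard1");
    params.set("node", "newhost:8983_solr");   // optional - omit to let Solr pick a node

    QueryRequest request = new QueryRequest(params);
    request.setPath("/admin/collections");

    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
    client.request(request);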

Best,
Erick

On Thu, Jun 18, 2015 at 2:58 PM, Shawn Heisey  wrote:
> On 6/18/2015 3:23 PM, Jim.Musil wrote:
>> Let's say I have a zookeeper ensemble with several Solr nodes connected to 
>> it. I've created a collection successfully and all is well.
>>
>> What happens when I want to add another solr node?
>>
>> I've tried spinning one up and connecting it to zookeeper, but the new node 
>> doesn't "join" the collection.  What's the expected next step?
>>
>> This is Solr 5.1.
>
> The new node will be part of the cloud as soon as it starts, but until
> you take action with the Collections API, it will not have any indexes
> on it.  SolrCloud does not automatically create replicas except in a
> very specific set of circumstances that I do not think are very common.
>
> You'll need to either create a new collection or take steps to modify
> your current collection(s) so that one or more shard replicas are
> located on the new node.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> Thanks,
> Shawn
>


Re: How to do a Data sharding for data in a database table

2015-06-18 Thread Erick Erickson
You've repeated your original statement. Shawn's
observation is that 10M docs is a very small corpus
by Solr standards. You either have very demanding
document/search combinations or you have a poorly
tuned Solr installation.

On reasonable hardware I expect 25-50M documents to have
sub-second response time.

So what we're trying to do is be sure this isn't
an "XY" problem, from Hossman's apache page:

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

So again, how would you characterize your documents? How many
fields? What do queries look like? How much physical memory on the
machine? How much memory have you allocated to the JVM?

You might review:
http://wiki.apache.org/solr/UsingMailingLists


Best,
Erick

On Thu, Jun 18, 2015 at 3:23 PM, wwang525  wrote:
> The query without load is still under 1 second. But under load, response time
> can be much longer due to the queued up query.
>
> We would like to shard the data to something like 6 M / shard, which will
> still give a under 1 second response time under load.
>
> What are some best practice to shard the data? for example, we could shard
> the data by date range, but that is pretty dynamic, and we could shard data
> by some other properties, but if the data is not evenly distributed, you may
> not be able shard it anymore.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html
> Sent from the Solr - User mailing list archive at Nabble.com.


How to append new data to index i solr?

2015-06-18 Thread ??????
Hello,
 I'm a Solr user with a question. I want to append new data to the 
existing index. Does Solr support appending new data to an index?
 Thanks for any reply.
Best wishes.
Jason

Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Hi Aman,

The admin UI screenshot you linked to is from an older version of Solr - what 
version are you using?

Lots of extraneous angle brackets and asterisks got into your email and made 
for a bunch of cleanup work before I could read or edit it.  In the future, 
please put your code somewhere people can easily read it and copy/paste it into 
an editor: into a github gist or on a paste service, etc.

Looks to me like your use of “exhausted” is unnecessary, and is likely the 
cause of the problem you saw (only one document getting processed): you never 
set exhausted to false, and when the filter got reused, it incorrectly carried 
state from the previous document.
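
A minimal sketch of the kind of per-stream cleanup that avoids the carried-over
state (this is not the gist, just an illustration of the idea):

    @Override
    public void reset() throws IOException {
      super.reset();
      stringBuilder.setLength(0);   // drop anything collected for the previous document
      exhausted = false;            // allow the reused filter to emit again
    }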

Here’s a simpler version that’s hopefully more correct and more efficient (2 
fewer copies from the StringBuilder to the final token).  Note: I didn’t test 
it:

https://gist.github.com/sarowe/9b9a52b683869ced3a17

Steve
www.lucidworks.com

> On Jun 18, 2015, at 11:33 AM, Aman Tandon  wrote:
> 
> Please help, what wrong I am doing here. please guide me.
> 
> With Regards
> Aman Tandon
> 
> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
> wrote:
> 
>> Hi,
>> 
>> I created a *token concat filter* to concat all the tokens from token
>> stream. It creates the concatenated token as expected.
>> 
>> But when I am posting the xml containing more than 30,000 documents, then
>> only first document is having the data of that field.
>> 
>> *Schema:*
>> 
>> *>> required="false" omitNorms="false" multiValued="false" />*
>> 
>> 
>> 
>> 
>> 
>> 
>>> *>> positionIncrementGap="100">*
>>> *  *
>>> **
>>> **
>>> *>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>>> **
>>> *>> outputUnigrams="true" tokenSeparator=""/>*
>>> *>> language="English" protected="protwords.txt"/>*
>>> *>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>> *>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>>> expand="true"/>*
>>> *  *
>>> *  *
>>> **
>>> *>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>>> *>> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
>>> *>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>>> **
>>> *>> language="English" protected="protwords.txt"/>*
>>> *>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>>> *  ***
>> 
>> 
>> Please help me, The code for the filter is as follows, please take a look.
>> 
>> Here is the picture of what filter is doing
>> 
>> 
>> The code of concat filter is :
>> 
>> *package com.xyz.analysis.concat;*
>>> 
>>> *import java.io.IOException;*
>>> 
>>> 
 *import org.apache.lucene.analysis.TokenFilter;*
>>> 
>>> *import org.apache.lucene.analysis.TokenStream;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
>>> 
>>> *import
 org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
>>> 
>>> *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
>>> 
>>> 
 *public class ConcatenateWordsFilter extends TokenFilter {*
>>> 
>>> 
 *  private CharTermAttribute charTermAttribute =
 addAttribute(CharTermAttribute.class);*
>>> 
>>> *  private OffsetAttribute offsetAttribute =
 addAttribute(OffsetAttribute.class);*
>>> 
>>> *  PositionIncrementAttribute posIncr =
 addAttribute(PositionIncrementAttribute.class);*
>>> 
>>> *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
>>> 
>>> 
 *  private StringBuilder stringBuilder = new StringBuilder();*
>>> 
>>> *  private boolean exhausted = false;*
>>> 
>>> 
 *  /***
>>> 
>>> *   * Creates a new ConcatenateWordsFilter*
>>> 
>>> *   * @param input TokenStream that will be filtered*
>>> 
>>> *   */*
>>> 
>>> *  public ConcatenateWordsFilter(TokenStream input) {*
>>> 
>>> *super(input);*
>>> 
>>> *  }*
>>> 
>>> 
 *  /***
>>> 
>>> *   * {@inheritDoc}*
>>> 
>>> *   */*
>>> 
>>> *  @Override*
>>> 
>>> *  public final boolean incrementToken() throws IOException {*
>>> 
>>> *while (!exhausted && input.incrementToken()) {*
>>> 
>>> *  char terms[] = charTermAttribute.buffer();*
>>> 
>>> *  int termLength = charTermAttribute.length();*
>>> 
>>> *  if(typeAtrr.type().equals("")){*
>>> 
>>> * stringBuilder.append(terms, 0, termLength);*
>>> 
>>> *  }*
>>> 
>>> *  charTermAttribute.copyBuffer(terms, 0, termLength);*
>>> 
>>> *  return true;*
>>> 
>>> *}*
>>> 
>>> 
 *if (!exhausted) {*
>>> 
>>> *  exhausted = true;*
>>> 
>>> *  String sb = stringBuilder.toString();*
>>> 
>>> *  System.err.println("The Data got is "+sb);*
>>> 
>>> *  int

Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Aman,

My version won’t produce anything at all, since incrementToken() always returns 
false…

I updated the gist (at the same URL) to fix the problem by returning true from 
incrementToken() once and then false until reset() is called.  It also handles 
the case when the concatenated token is zero length by not emitting a token.

Steve
www.lucidworks.com
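
For illustration, a hedged sketch of the behaviour described above (not the updated
gist itself; the class name is hypothetical): emit the concatenated text once per
reset() cycle, and emit nothing when it is empty:

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hedged sketch of the described behaviour, not the gist's exact code:
// consume the whole input stream, emit the concatenation as one token,
// return true only once per reset() cycle, and skip zero-length output.
public final class ConcatOnceFilterSketch extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final StringBuilder buffer = new StringBuilder();
  private boolean done = false;

  public ConcatOnceFilterSketch(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (done) {
      return false;                        // already emitted for this document
    }
    while (input.incrementToken()) {       // drain the upstream tokens
      buffer.append(termAtt.buffer(), 0, termAtt.length());
    }
    done = true;
    if (buffer.length() == 0) {
      return false;                        // zero-length concatenation: no token
    }
    clearAttributes();                     // start from a clean attribute state
    termAtt.append(buffer);                // the single concatenated token
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    buffer.setLength(0);
    done = false;
  }
}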

> On Jun 19, 2015, at 12:55 AM, Steve Rowe  wrote:
> 
> Hi Aman,
> 
> The admin UI screenshot you linked to is from an older version of Solr - what 
> version are you using?
> 
> Lots of extraneous angle brackets and asterisks got into your email and made 
> for a bunch of cleanup work before I could read or edit it.  In the future, 
> please put your code somewhere people can easily read it and copy/paste it 
> into an editor: into a github gist or on a paste service, etc.
> 
> Looks to me like your use of “exhausted” is unnecessary, and is likely the 
> cause of the problem you saw (only one document getting processed): you never 
> set exhausted to false, and when the filter got reused, it incorrectly 
> carried state from the previous document.
> 
> Here’s a simpler version that’s hopefully more correct and more efficient (2 
> fewer copies from the StringBuilder to the final token).  Note: I didn’t test 
> it:
> 
>https://gist.github.com/sarowe/9b9a52b683869ced3a17
> 
> Steve
> www.lucidworks.com
> 
>> On Jun 18, 2015, at 11:33 AM, Aman Tandon  wrote:
>> 
>> Please help, what wrong I am doing here. please guide me.
>> 
>> With Regards
>> Aman Tandon
>> 
>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I created a *token concat filter* to concat all the tokens from token
>>> stream. It creates the concatenated token as expected.
>>> 
>>> But when I am posting the xml containing more than 30,000 documents, then
>>> only first document is having the data of that field.
>>> 
>>> *Schema:*
>>> 
>>> *>>> required="false" omitNorms="false" multiValued="false" />*
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
 *>>> positionIncrementGap="100">*
 *  *
 **
 **
 *>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
 **
 *>>> outputUnigrams="true" tokenSeparator=""/>*
 *>>> language="English" protected="protwords.txt"/>*
 *>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
 *>>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
 expand="true"/>*
 *  *
 *  *
 **
 *>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
 *>>> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />*
 *>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
 catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
 **
 *>>> language="English" protected="protwords.txt"/>*
 *>>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
 *  ***
>>> 
>>> 
>>> Please help me, The code for the filter is as follows, please take a look.
>>> 
>>> Here is the picture of what filter is doing
>>> 
>>> 
>>> The code of concat filter is :
>>> 
>>> *package com.xyz.analysis.concat;*
 
 *import java.io.IOException;*
 
 
> *import org.apache.lucene.analysis.TokenFilter;*
 
 *import org.apache.lucene.analysis.TokenStream;*
 
 *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
 
 *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
 
 *import
> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
 
 *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
 
 
> *public class ConcatenateWordsFilter extends TokenFilter {*
 
 
> *  private CharTermAttribute charTermAttribute =
> addAttribute(CharTermAttribute.class);*
 
 *  private OffsetAttribute offsetAttribute =
> addAttribute(OffsetAttribute.class);*
 
 *  PositionIncrementAttribute posIncr =
> addAttribute(PositionIncrementAttribute.class);*
 
 *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
 
 
> *  private StringBuilder stringBuilder = new StringBuilder();*
 
 *  private boolean exhausted = false;*
 
 
> *  /***
 
 *   * Creates a new ConcatenateWordsFilter*
 
 *   * @param input TokenStream that will be filtered*
 
 *   */*
 
 *  public ConcatenateWordsFilter(TokenStream input) {*
 
 *super(input);*
 
 *  }*
 
 
> *  /***
 
 *   * {@inheritDoc}*
 
 *   */*
 
 *  @Override*
 
 *  public final boolean incrementToken() throws IOException {*
 
 *while (!exhausted && inpu

Re: Help: Problem in customized token filter

2015-06-18 Thread Aman Tandon
Hi Steve,


>  you never set exhausted to false, and when the filter got reused, *it
> incorrectly carried state from the previous document.*


Thanks for replying, but I am not able to understand this.

With Regards
Aman Tandon

On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe  wrote:

> Hi Aman,
>
> The admin UI screenshot you linked to is from an older version of Solr -
> what version are you using?
>
> Lots of extraneous angle brackets and asterisks got into your email and
> made for a bunch of cleanup work before I could read or edit it.  In the
> future, please put your code somewhere people can easily read it and
> copy/paste it into an editor: into a github gist or on a paste service, etc.
>
> Looks to me like your use of “exhausted” is unnecessary, and is likely the
> cause of the problem you saw (only one document getting processed): you
> never set exhausted to false, and when the filter got reused, it
> incorrectly carried state from the previous document.
>
> Here’s a simpler version that’s hopefully more correct and more efficient
> (2 fewer copies from the StringBuilder to the final token).  Note: I didn’t
> test it:
>
> https://gist.github.com/sarowe/9b9a52b683869ced3a17
>
> Steve
> www.lucidworks.com
>
> > On Jun 18, 2015, at 11:33 AM, Aman Tandon 
> wrote:
> >
> > Please help, what wrong I am doing here. please guide me.
> >
> > With Regards
> > Aman Tandon
> >
> > On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
> > wrote:
> >
> >> Hi,
> >>
> >> I created a *token concat filter* to concat all the tokens from token
> >> stream. It creates the concatenated token as expected.
> >>
> >> But when I am posting the xml containing more than 30,000 documents,
> then
> >> only first document is having the data of that field.
> >>
> >> *Schema:*
> >>
> >> * >>> required="false" omitNorms="false" multiValued="false" />*
> >>
> >>
> >>
> >>
> >>
> >>
> >>> * >>> positionIncrementGap="100">*
> >>> *  *
> >>> **
> >>> **
> >>> * >>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
> >>> **
> >>> * >>> outputUnigrams="true" tokenSeparator=""/>*
> >>> * >>> language="English" protected="protwords.txt"/>*
> >>> * >>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> >>> * >>> synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
> >>> expand="true"/>*
> >>> *  *
> >>> *  *
> >>> **
> >>> * >>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
> >>> * >>> words="stopwords_text_prime_search.txt"
> enablePositionIncrements="true" />*
> >>> * >>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
> >>> **
> >>> * >>> language="English" protected="protwords.txt"/>*
> >>> * >>> class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> >>> *  ***
> >>
> >>
> >> Please help me, The code for the filter is as follows, please take a
> look.
> >>
> >> Here is the picture of what filter is doing
> >> 
> >>
> >> The code of concat filter is :
> >>
> >> *package com.xyz.analysis.concat;*
> >>>
> >>> *import java.io.IOException;*
> >>>
> >>>
>  *import org.apache.lucene.analysis.TokenFilter;*
> >>>
> >>> *import org.apache.lucene.analysis.TokenStream;*
> >>>
> >>> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
> >>>
> >>> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
> >>>
> >>> *import
> 
> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
> >>>
> >>> *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
> >>>
> >>>
>  *public class ConcatenateWordsFilter extends TokenFilter {*
> >>>
> >>>
>  *  private CharTermAttribute charTermAttribute =
>  addAttribute(CharTermAttribute.class);*
> >>>
> >>> *  private OffsetAttribute offsetAttribute =
>  addAttribute(OffsetAttribute.class);*
> >>>
> >>> *  PositionIncrementAttribute posIncr =
>  addAttribute(PositionIncrementAttribute.class);*
> >>>
> >>> *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
> >>>
> >>>
>  *  private StringBuilder stringBuilder = new StringBuilder();*
> >>>
> >>> *  private boolean exhausted = false;*
> >>>
> >>>
>  *  /***
> >>>
> >>> *   * Creates a new ConcatenateWordsFilter*
> >>>
> >>> *   * @param input TokenStream that will be filtered*
> >>>
> >>> *   */*
> >>>
> >>> *  public ConcatenateWordsFilter(TokenStream input) {*
> >>>
> >>> *super(input);*
> >>>
> >>> *  }*
> >>>
> >>>
>  *  /***
> >>>
> >>> *   * {@inheritDoc}*
> >>>
> >>> *   */*
> >>>
> >>> *  @Override*
> >>>
> >>> *  public final boolean incrementToken() throws IOException {*
> >>>
> >>> *while (!exhausted && input.incrementToken()) {*
> >>>
> >>> *  char term

Re: Help: Problem in customized token filter

2015-06-18 Thread Aman Tandon
Yes I just saw.

With Regards
Aman Tandon

On Fri, Jun 19, 2015 at 10:39 AM, Steve Rowe  wrote:

> Aman,
>
> My version won’t produce anything at all, since incrementToken() always
> returns false…
>
> I updated the gist (at the same URL) to fix the problem by returning true
> from incrementToken() once and then false until reset() is called.  It also
> handles the case when the concatenated token is zero length by not emitting
> a token.
>
> Steve
> www.lucidworks.com
>
> > On Jun 19, 2015, at 12:55 AM, Steve Rowe  wrote:
> >
> > Hi Aman,
> >
> > The admin UI screenshot you linked to is from an older version of Solr -
> what version are you using?
> >
> > Lots of extraneous angle brackets and asterisks got into your email and
> made for a bunch of cleanup work before I could read or edit it.  In the
> future, please put your code somewhere people can easily read it and
> copy/paste it into an editor: into a github gist or on a paste service, etc.
> >
> > Looks to me like your use of “exhausted” is unnecessary, and is likely
> the cause of the problem you saw (only one document getting processed): you
> never set exhausted to false, and when the filter got reused, it
> incorrectly carried state from the previous document.
> >
> > Here’s a simpler version that’s hopefully more correct and more
> efficient (2 fewer copies from the StringBuilder to the final token).
> Note: I didn’t test it:
> >
> >https://gist.github.com/sarowe/9b9a52b683869ced3a17
> >
> > Steve
> > www.lucidworks.com
> >
> >> On Jun 18, 2015, at 11:33 AM, Aman Tandon 
> wrote:
> >>
> >> Please help, what wrong I am doing here. please guide me.
> >>
> >> With Regards
> >> Aman Tandon
> >>
> >> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I created a *token concat filter* to concat all the tokens from token
> >>> stream. It creates the concatenated token as expected.
> >>>
> >>> But when I am posting the xml containing more than 30,000 documents,
> then
> >>> only first document is having the data of that field.
> >>>
> >>> *Schema:*
> >>>
> >>> *  required="false" omitNorms="false" multiValued="false" />*
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
>  *  positionIncrementGap="100">*
>  *  *
>  **
>  **
>  *  generateWordParts="1" generateNumberParts="1" catenateWords="0"
>  catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
>  **
>  *  outputUnigrams="true" tokenSeparator=""/>*
>  *  language="English" protected="protwords.txt"/>*
>  *  class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>  *  synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
>  expand="true"/>*
>  *  *
>  *  *
>  **
>  *  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
>  *  words="stopwords_text_prime_search.txt"
> enablePositionIncrements="true" />*
>  *  generateWordParts="1" generateNumberParts="1" catenateWords="0"
>  catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
>  **
>  *  language="English" protected="protwords.txt"/>*
>  *  class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
>  *  ***
> >>>
> >>>
> >>> Please help me, The code for the filter is as follows, please take a
> look.
> >>>
> >>> Here is the picture of what filter is doing
> >>> 
> >>>
> >>> The code of concat filter is :
> >>>
> >>> *package com.xyz.analysis.concat;*
> 
>  *import java.io.IOException;*
> 
> 
> > *import org.apache.lucene.analysis.TokenFilter;*
> 
>  *import org.apache.lucene.analysis.TokenStream;*
> 
>  *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
> 
>  *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
> 
>  *import
> >
> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
> 
>  *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
> 
> 
> > *public class ConcatenateWordsFilter extends TokenFilter {*
> 
> 
> > *  private CharTermAttribute charTermAttribute =
> > addAttribute(CharTermAttribute.class);*
> 
>  *  private OffsetAttribute offsetAttribute =
> > addAttribute(OffsetAttribute.class);*
> 
>  *  PositionIncrementAttribute posIncr =
> > addAttribute(PositionIncrementAttribute.class);*
> 
>  *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
> 
> 
> > *  private StringBuilder stringBuilder = new StringBuilder();*
> 
>  *  private boolean exhausted = false;*
> 
> 
> > *  /***
> 
>  *   * Creates a new ConcatenateWordsFilter*
> 
>  *   * @param input TokenStream that will be filtered*
> 
>  *   *

Re: Help: Problem in customized token filter

2015-06-18 Thread Steve Rowe
Aman,

Solr uses the same Token filter instances over and over, calling reset() before 
sending each document through.  Your code sets “exhausted” to true and then 
never sets it back to false, so the next time the token filter instance is 
used, its “exhausted” value is still true, so no input stream tokens are 
concatenated ever again.

Does that make sense?

Steve
www.lucidworks.com

> On Jun 19, 2015, at 1:10 AM, Aman Tandon  wrote:
> 
> Hi Steve,
> 
> 
>> you never set exhausted to false, and when the filter got reused, *it
>> incorrectly carried state from the previous document.*
> 
> 
> Thanks for replying, but I am not able to understand this.
> 
> With Regards
> Aman Tandon
> 
> On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe  wrote:
> 
>> Hi Aman,
>> 
>> The admin UI screenshot you linked to is from an older version of Solr -
>> what version are you using?
>> 
>> Lots of extraneous angle brackets and asterisks got into your email and
>> made for a bunch of cleanup work before I could read or edit it.  In the
>> future, please put your code somewhere people can easily read it and
>> copy/paste it into an editor: into a github gist or on a paste service, etc.
>> 
>> Looks to me like your use of “exhausted” is unnecessary, and is likely the
>> cause of the problem you saw (only one document getting processed): you
>> never set exhausted to false, and when the filter got reused, it
>> incorrectly carried state from the previous document.
>> 
>> Here’s a simpler version that’s hopefully more correct and more efficient
>> (2 fewer copies from the StringBuilder to the final token).  Note: I didn’t
>> test it:
>> 
>>https://gist.github.com/sarowe/9b9a52b683869ced3a17
>> 
>> Steve
>> www.lucidworks.com
>> 
>>> On Jun 18, 2015, at 11:33 AM, Aman Tandon 
>> wrote:
>>> 
>>> Please help, what wrong I am doing here. please guide me.
>>> 
>>> With Regards
>>> Aman Tandon
>>> 
>>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
>>> wrote:
>>> 
 Hi,
 
 I created a *token concat filter* to concat all the tokens from token
 stream. It creates the concatenated token as expected.
 
 But when I am posting the xml containing more than 30,000 documents,
>> then
 only first document is having the data of that field.
 
 *Schema:*
 
 * required="false" omitNorms="false" multiValued="false" />*
 
 
 
 
 
 
> * positionIncrementGap="100">*
> *  *
> **
> **
> * generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
> **
> * outputUnigrams="true" tokenSeparator=""/>*
> * language="English" protected="protwords.txt"/>*
> * class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> * synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
> expand="true"/>*
> *  *
> *  *
> **
> * synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
> * words="stopwords_text_prime_search.txt"
>> enablePositionIncrements="true" />*
> * generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
> **
> * language="English" protected="protwords.txt"/>*
> * class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> *  ***
 
 
 Please help me, The code for the filter is as follows, please take a
>> look.
 
 Here is the picture of what filter is doing
 
 
 The code of concat filter is :
 
 *package com.xyz.analysis.concat;*
> 
> *import java.io.IOException;*
> 
> 
>> *import org.apache.lucene.analysis.TokenFilter;*
> 
> *import org.apache.lucene.analysis.TokenStream;*
> 
> *import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
> 
> *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
> 
> *import
>> 
>> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
> 
> *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
> 
> 
>> *public class ConcatenateWordsFilter extends TokenFilter {*
> 
> 
>> *  private CharTermAttribute charTermAttribute =
>> addAttribute(CharTermAttribute.class);*
> 
> *  private OffsetAttribute offsetAttribute =
>> addAttribute(OffsetAttribute.class);*
> 
> *  PositionIncrementAttribute posIncr =
>> addAttribute(PositionIncrementAttribute.class);*
> 
> *  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);*
> 
> 
>> *  private StringBuilder stringBuilder = new StringBuilder();*
> 
> *  private boolean exhausted = false;*
> 
> 

Auto-suggest in Solr

2015-06-18 Thread Zheng Lin Edwin Yeo
I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
the following:

For example, if the user enters "mp3", Solr might suggest "mp3 player",
"mp3 nano" and "mp3 music".
When the user enters "mp3 p", the suggestion should narrow down to "mp3
player".

Currently, when I type "mp3 p", the suggester is returning words that
start with the letter "p" only, and I'm getting results like "plan",
"production", etc., and it does not take the "mp3" token into consideration.

I'm using Solr 5.1 and below is my configuration:

In solrconfig.xml:


  

 FreeTextLookupFactory
 suggester_freetext_dir

DocumentDictionaryFactory
Suggestion
Project
suggestType
5
false
false
  



In schema.xml















Is there anything that I configured wrongly?


Regards,
Edwin
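
As an aside, a hedged SolrJ sketch for inspecting what the suggester actually
returns for "mp3 p"; the handler path, core URL and dictionary name below are
assumptions, not taken from the configuration above:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// Hedged debugging sketch: issue a suggest request and print the raw response.
public class SuggestProbe {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore")) { // assumed core URL
      SolrQuery q = new SolrQuery();
      q.setRequestHandler("/suggest");              // assumed handler path
      q.set("suggest", true);
      q.set("suggest.dictionary", "mySuggester");   // assumed dictionary name
      q.set("suggest.q", "mp3 p");
      QueryResponse rsp = client.query(q);
      System.out.println(rsp.getResponse());        // raw suggestions for inspection
    }
  }
}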


Re: Help: Problem in customized token filter

2015-06-18 Thread Aman Tandon
Steve,

Thank you thank you so much. You guys are awesome.

Steve, how can I learn about the Lucene indexing process in more
detail? E.g., after we send documents for indexing, which functions are called
until the doc is actually stored in the index files?

I will be thankful to you if you guide me here.

With Regards
Aman Tandon

On Fri, Jun 19, 2015 at 10:48 AM, Steve Rowe  wrote:

> Aman,
>
> Solr uses the same Token filter instances over and over, calling reset()
> before sending each document through.  Your code sets “exhausted" to true
> and then never sets it back to false, so the next time the token filter
> instance is used, its “exhausted" value is still true, so no input stream
> tokens are concatenated ever again.
>
> Does that make sense?
>
> Steve
> www.lucidworks.com
>
> > On Jun 19, 2015, at 1:10 AM, Aman Tandon 
> wrote:
> >
> > Hi Steve,
> >
> >
> >> you never set exhausted to false, and when the filter got reused, *it
> >> incorrectly carried state from the previous document.*
> >
> >
> > Thanks for replying, but I am not able to understand this.
> >
> > With Regards
> > Aman Tandon
> >
> > On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe  wrote:
> >
> >> Hi Aman,
> >>
> >> The admin UI screenshot you linked to is from an older version of Solr -
> >> what version are you using?
> >>
> >> Lots of extraneous angle brackets and asterisks got into your email and
> >> made for a bunch of cleanup work before I could read or edit it.  In the
> >> future, please put your code somewhere people can easily read it and
> >> copy/paste it into an editor: into a github gist or on a paste service,
> etc.
> >>
> >> Looks to me like your use of “exhausted” is unnecessary, and is likely
> the
> >> cause of the problem you saw (only one document getting processed): you
> >> never set exhausted to false, and when the filter got reused, it
> >> incorrectly carried state from the previous document.
> >>
> >> Here’s a simpler version that’s hopefully more correct and more
> efficient
> >> (2 fewer copies from the StringBuilder to the final token).  Note: I
> didn’t
> >> test it:
> >>
> >>https://gist.github.com/sarowe/9b9a52b683869ced3a17
> >>
> >> Steve
> >> www.lucidworks.com
> >>
> >>> On Jun 18, 2015, at 11:33 AM, Aman Tandon 
> >> wrote:
> >>>
> >>> Please help, what wrong I am doing here. please guide me.
> >>>
> >>> With Regards
> >>> Aman Tandon
> >>>
> >>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon 
> >>> wrote:
> >>>
>  Hi,
> 
>  I created a *token concat filter* to concat all the tokens from token
>  stream. It creates the concatenated token as expected.
> 
>  But when I am posting the xml containing more than 30,000 documents,
> >> then
>  only first document is having the data of that field.
> 
>  *Schema:*
> 
>  * > required="false" omitNorms="false" multiValued="false" />*
> 
> 
> 
> 
> 
> 
> > * > positionIncrementGap="100">*
> > *  *
> > **
> > **
> > * > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>*
> > **
> > * > outputUnigrams="true" tokenSeparator=""/>*
> > * > language="English" protected="protwords.txt"/>*
> > * > class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> > * > synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
> > expand="true"/>*
> > *  *
> > *  *
> > **
> > * > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>*
> > * > words="stopwords_text_prime_search.txt"
> >> enablePositionIncrements="true" />*
> > * > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>*
> > **
> > * > language="English" protected="protwords.txt"/>*
> > * > class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>*
> > *  ***
> 
> 
>  Please help me, The code for the filter is as follows, please take a
> >> look.
> 
>  Here is the picture of what filter is doing
>  
> 
>  The code of concat filter is :
> 
>  *package com.xyz.analysis.concat;*
> >
> > *import java.io.IOException;*
> >
> >
> >> *import org.apache.lucene.analysis.TokenFilter;*
> >
> > *import org.apache.lucene.analysis.TokenStream;*
> >
> > *import
> org.apache.lucene.analysis.tokenattributes.CharTermAttribute;*
> >
> > *import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;*
> >
> > *import
> >>
> >> org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;*
> >
> > *import org.apache.lucene.analysis.tokenattributes.TypeAttribute;*
> >
> >>>

Re: How to append new data to index in Solr?

2015-06-18 Thread Mikhail Khludnev
It does. Absolutely. But it depends on what you put in it. Start from
http://wiki.apache.org/solr/UpdateXmlMessages#add.2Freplace_documents
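
A hedged SolrJ illustration of that; the core URL and field values are assumed for
the example, not taken from the thread:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Hedged illustration only: adding a document to an existing index appends to
// it (or replaces a document whose uniqueKey already exists).
public class AppendExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore")) { // assumed core URL
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "7");          // new id => a new document is appended
      doc.addField("title", "purple");
      doc.addField("text", "orange");
      client.add(doc);
      client.commit();                  // make the new document searchable
    }
  }
}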

On Fri, Jun 19, 2015 at 7:54 AM, 步青云  wrote:

> Hello,
>  I'm a Solr user with a question. I want to append new data to the
> existing index. Does Solr support appending new data to an existing index?
>  Thanks for any reply.
> Best wishes.
> Jason




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics