Solr - Best way to handle Solr connection

2012-11-08 Thread uygunes
Hi all,

On our servers we have been dealing with the CLOSE_WAIT problem. The
Solr documentation says to create a single static client object and use
it for all connections. We are on Solr 3.3, and this approach seems to
be creating hanging queries for us and slowing down the website.

But when we create a new object for each request, we run into the
CLOSE_WAIT issue. I am looking for recommendations: would it solve my
issue to simply upgrade to 3.6 and then use a static object, or should I
keep creating new objects each time but call shutdown() afterwards?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Best-way-to-handle-Solr-connection-tp4018968.html
Sent from the Solr - User mailing list archive at Nabble.com.
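
A minimal sketch of the shared-client approach, assuming SolrJ 3.x
(CommonsHttpSolrServer, renamed HttpSolrServer in 3.6) and an illustrative
URL. The client is thread-safe and pools its HTTP connections, which is
what avoids the per-request sockets that pile up in CLOSE_WAIT:

import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrClientHolder {

    // One client for the whole application; SolrJ reuses pooled HTTP
    // connections internally, so per-request instances are unnecessary.
    private static final SolrServer SERVER;

    static {
        try {
            SERVER = new CommonsHttpSolrServer("http://localhost:8983/solr");
        } catch (MalformedURLException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static SolrServer get() {
        return SERVER;
    }

    public static void main(String[] args) throws SolrServerException {
        // Example query through the shared client.
        long hits = get().query(new SolrQuery("*:*")).getResults().getNumFound();
        System.out.println(hits);
    }
}

If a new client per request is unavoidable, calling shutdown() on it when
done (available on HttpSolrServer in 3.6+) releases its pooled connections;
leaving them open is what accumulates CLOSE_WAIT sockets.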


Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
Yes, I did this, and the words with the Umlaute went through the StopFilter.
The ones without Umlaute were correctly removed.

On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog  wrote:

> You can debug this with the 'Analysis' page in the Solr UI. You pick
> 'text_general' and then give words with umlauts in the text box for
> indexing and queries.
>
> Lance
>
> - Original Message -
> | From: "Daniel Brügge" 
> | To: solr-user@lucene.apache.org
> | Sent: Wednesday, November 7, 2012 8:45:45 AM
> | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other
> special characters
> |
> | Hi,
> |
> | I am running a SolrCloud cluster with the 4.0.0 version. I have a
> | stopwords file which is in the correct encoding. It contains German
> | Umlaute like e.g. 'ü'. I am also running a standalone Zookeeper which
> | contains this stopwords file. In my schema I am using the stopwords
> | file in the standard way:
> |
> | > <fieldType ... positionIncrementGap="100">
> | >   <analyzer>
> | >     <tokenizer .../>
> | >     <filter class="solr.StopFilterFactory"
> | >             ignoreCase="true"
> | >             words="my_stopwords.txt"
> | >             enablePositionIncrements="true" />
> |
> |
> | When I am indexing I recognized that all stopwords without Umlaute
> | are correctly removed, but the ones with Umlaute still exist.
> |
> | Is this a problem with ZK or Solr?
> |
> | Thanks & regards
> |
> | Daniel
> |
>


Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
When I look at the text_de fieldType provided in the example schema I can
see:

>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="lang/stopwords_de.txt" format="snowball"
>         enablePositionIncrements="true"/>
>


I have tried this, and it removed the words with Umlaute. It seems that
is because of format="snowball". I hadn't used this because I thought I
had one word per line. But maybe some invisible characters got into my
stopwords file and broke it.

Thanks.

Daniel

On Thu, Nov 8, 2012 at 10:36 AM, Daniel Brügge <
daniel.brue...@googlemail.com> wrote:

> Yes, I did this, and the words with the Umlaute went through the
> StopFilter. The ones without Umlaute were correctly removed.
>
> On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog  wrote:
>
>> You can debug this with the 'Analysis' page in the Solr UI. You pick
>> 'text_general' and then give words with umlauts in the text box for
>> indexing and queries.
>>
>> Lance
>>
>> - Original Message -
>> | From: "Daniel Brügge" 
>> | To: solr-user@lucene.apache.org
>> | Sent: Wednesday, November 7, 2012 8:45:45 AM
>> | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other
>> special characters
>> |
>> | Hi,
>> |
>> | I am running a SolrCloud cluster with the 4.0.0 version. I have a
>> | stopwords file which is in the correct encoding. It contains German
>> | Umlaute like e.g. 'ü'. I am also running a standalone Zookeeper which
>> | contains this stopwords file. In my schema I am using the stopwords
>> | file in the standard way:
>> |
>> | > <fieldType ... positionIncrementGap="100">
>> | >   <analyzer>
>> | >     <tokenizer .../>
>> | >     <filter class="solr.StopFilterFactory"
>> | >             ignoreCase="true"
>> | >             words="my_stopwords.txt"
>> | >             enablePositionIncrements="true" />
>> |
>> |
>> | When I am indexing I recognized that all stopwords without Umlaute
>> | are correctly removed, but the ones with Umlaute still exist.
>> |
>> | Is this a problem with ZK or Solr?
>> |
>> | Thanks & regards
>> |
>> | Daniel
>> |
>>
>
>


Re: Retrieve unique documents on a non id field

2012-11-08 Thread Indika Tantrigoda
Hi,

Thanks for the reply. Yes, I grouped the documents based on the
restaurant_id and got 1 result per group. Setting the group.format to
simple helped with the formatting.

Thanks,
Indika

On 8 November 2012 12:10, Rafał Kuć  wrote:

> Hello!
>
> Look at the field collapsing functionality -
> http://wiki.apache.org/solr/FieldCollapsing
>
> It allows you to group documents based on field value, query or
> function query.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > Hi All,
>
> > Currently I am using Solr for searching and filtering restaurants based
> on
> > certain criteria. For example I use Solr to obtain the list of
> restaurants
> > open in the day.
>
> > A restaurant can have sessions when its open, e.g. Breakfast, Lunch and
> > Dinner, and the time information related to these sessions are stored in
> > three different documents with a field to identify the restaurant
> > (restaurant_id).
>
> > When I query for restaurants that are open at 11:00 AM using
> (start_time:[*
> > TO 1100] AND end_time:[1100 TO *]) and if sessions overlap (say Breakfast
> > and Lunch) I would get both the Breakfast document and the Lunch
> document.
> > These are in fact two different documents and would have the same
> > restaurant_id.
>
> > My question is: is there a way to retrieve only one document if the
> > restaurant_id is repeated in the response?
>
> > Thanks,
> > Indika
>
>
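
A sketch of the resulting query via SolrJ (Solr 4.x API; the URL is
illustrative, the field names follow the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class OpenRestaurants {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("start_time:[* TO 1100] AND end_time:[1100 TO *]");
        q.set("group", true);                 // collapse on the restaurant
        q.set("group.field", "restaurant_id");
        q.set("group.limit", 1);              // one session document per restaurant
        q.set("group.format", "simple");      // flat list instead of nested groups
        q.set("group.main", true);            // return it as the main result list

        QueryResponse rsp = solr.query(q);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("restaurant_id"));
        }
    }
}

With group.main=true the grouped result replaces the normal document list,
so existing response parsing keeps working.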


Replicated zookeeper

2012-11-08 Thread ku3ia
Hi!

I'm trying to setup SolrCloud with replicated zookeeper, but have a problem.

I'm using Jetty 8 (not embedded), Zookeeper 3.3.6, SolrCloud 4.0 from
branch, Ubuntu 12.04 LTS.
My configs are:

Four Jetty instances running on ports 8080, 8081, 8082 and 8083

Jetty1.sh:
JAVA_OPTIONS="$JAVA_OPTIONS
-Djava.util.logging.config.file=$JETTY_HOME/etc/logging.properties
-XX:+DisableExplicitGC \
-XX:PermSize=96M -XX:MaxPermSize=96M -Xmx512M -Xms512M -XX:NewSize=96M
-XX:MaxNewSize=96M \
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
-XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9
-XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 \
-verbose:gc -XX:+PrintGCTimeStamps -Xloggc:$JETTY_HOME/logs/gc.log
-Dsolr.solr.home=/opt/search4/solr/1 \
-Dbootstrap_confdir=/opt/search4/solr/1/collection1/conf
-Dcollection.configName=sm -DnumShards=2
-DzkHost=10.112.1.2:2181,10.112.1.2:2182,10.112.1.2:2183"

Jetty2.sh (3 and 4 are the same except solr.home var):
JAVA_OPTIONS="$JAVA_OPTIONS
-Djava.util.logging.config.file=$JETTY_HOME/etc/logging.properties
-XX:+DisableExplicitGC \
-XX:PermSize=96M -XX:MaxPermSize=96M -Xmx512M -Xms512M -XX:NewSize=96M
-XX:MaxNewSize=96M \
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
-XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9
-XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 \
-verbose:gc -XX:+PrintGCTimeStamps -Xloggc:$JETTY_HOME/logs/gc.log
-Dsolr.solr.home=/opt/search4/solr/2 \
-DzkHost=10.112.1.2:2181,10.112.1.2:2182,10.112.1.2:2183"

My solr.xml files are the same on all four instances except for the Jetty
port (8080, 8081, 8082 and 8083); roughly:

<solr ...>
  <cores ... hostPort="...">
    <core name="collection1" ... />
  </cores>
</solr>

My zookeeper configs (are the same, except dataDir and clientPort):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/search4/zookeeper/1/data
clientPort=2181

# zookeeper ensemble
server.1=10.112.1.2:2888:3888
server.2=10.112.1.2:2889:3889
server.3=10.112.1.2:2890:3890

I put a myid file into each Zookeeper's dataDir and started them, and after
that I started Jetty.

Everything looks fine, SolrCloud is running normally, I have two leaders on
ports 8080 (shard1) and 8081 (shard2). But when I shut down the first JVM
(port 8080), the Solr on the third JVM (port 8082) does not become leader,
and I see these errors in its logs:

Nov 08, 2012 11:00:40 AM org.apache.solr.cloud.ShardLeaderElectionContext
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1
timeoutin=118104
Nov 08, 2012 11:00:41 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=collection1
Nov 08, 2012 11:00:41 AM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Nov 08, 2012 11:00:41 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover.
core=collection1:org.apache.solr.client.solrj.SolrServerException: Server
refused connection at: http://10.112.1.2:8080/solr
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to
http://10.112.1.2:8080 refused
at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
at
org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
at
org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
... 4 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(S

Re: is it possible to save the search query?

2012-11-08 Thread Amit Nithian
Are you trying to do this in real time or offline? Wouldn't mining your
access logs help? It may help to have your front end application pass in
some extra parameters that are not interpreted by Solr but are there for
"stamping" purposes for log analysis. One example could be a user id or
user cookie or something in case you have to construct sessions.


On Wed, Nov 7, 2012 at 10:01 PM, Romita Saha
wrote:

> Hi,
>
> The following is an example.
> 1st query:
>
> http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data^2 id&start=0&rows=11&fl=data,id
>
> Next query:
>
> http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data id^2&start=0&rows=11&fl=data,id
>
> In the 1st query the field 'data' is boosted by 2. However, maybe the
> user was not satisfied with the response. Thus in the next query he
> boosted the field 'id' by 2.
>
> I want to record both the queries and compare between the two, meaning,
> what are the changes implemented on the 2nd query which are not present in
> the previous one.
>
> Thanks and regards,
> Romita Saha
>
>
>
> From:   Otis Gospodnetic 
> To: solr-user@lucene.apache.org,
> Date:   11/08/2012 01:35 PM
> Subject:Re: is it possible to save the search query?
>
>
>
> Hi,
>
> Compare in what sense?  An example will help.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Nov 7, 2012 8:45 PM, "Romita Saha" 
> wrote:
>
> > Hi All,
> >
> > Is it possible to record a search query in solr and then compare it with
> > the previous search query?
> >
> > Thanks and regards,
> > Romita Saha
> >
>
>
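
One way to implement Amit's suggestion, sketched with SolrJ (parameter
names like userId/sessionId are hypothetical; Solr ignores parameters it
does not recognize, but they show up in the request log for later mining):

import org.apache.solr.client.solrj.SolrQuery;

public class StampedQuery {
    public static SolrQuery build(String user, String session) {
        SolrQuery q = new SolrQuery("cashier2");
        q.set("defType", "dismax");
        q.set("qf", "data^2 id");
        // Pass-through stamping parameters: not interpreted by Solr, but
        // logged with the request, so consecutive queries from the same
        // user/session can be reconstructed and compared offline.
        q.set("userId", user);
        q.set("sessionId", session);
        return q;
    }
}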


Re: Searching for Partial Words

2012-11-08 Thread Amit Nithian
Look at the normal ngram tokenizer. "Engine" with ngram size 3 would yield
"eng" "ngi" "gin" "ine" so a search for engi should match. You can play
around with the min/max values. Edge ngram is useful for prefix matching
but it sounds like you want intra-word matching too? ("eng" should match
"ResidentEngineer")


On Tue, Nov 6, 2012 at 7:35 AM, Sohail Aboobaker wrote:

> Thanks Jack.
> In the configuration below:
>
> <fieldType ... positionIncrementGap="100">
>   <analyzer>
>     <filter class="solr.EdgeNGramFilterFactory" side="front"
>             minGramSize="1" maxGramSize="1"/>
>   </analyzer>
> </fieldType>
>
> What are the possible values for "side"?
>
> If I understand it correctly, minGramSize=3 and side=front will
> include eng* but not en*. Is this correct? So the minGramSize is the
> number of characters allowed on the specified side.
>
> Does it allow side=both :) or something similar?
>
> Regards,
> Sohail
>
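
For intra-word matching, a hedged schema.xml sketch (type name and gram
sizes are illustrative; Solr 3.x/4.x syntax). Grams are generated only at
index time, so any query term between minGramSize and maxGramSize
characters, e.g. "eng" or "engi", matches anywhere inside
"ResidentEngineer":

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index every 3- to 5-character substring of each token -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="5"/>
  </analyzer>
  <analyzer type="query">
    <!-- leave the query term whole so it is looked up as one gram -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The trade-off is index size and noisier matches, which is worth weighing
against the business need, as this thread eventually did.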


Re: Exponential omitNorms

2012-11-08 Thread Dotan Cohen
On Wed, Nov 7, 2012 at 5:16 PM, Walter Underwood  wrote:
> You are probably thinking of SweetSpotSimilarity. You might also want to look 
> at pivoted document normalization.
>

Thanks, I'll take a look at that.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re-index stored values

2012-11-08 Thread Andreas Niekler

Hello,

how can I re-index the stored values of my Solr index if I change the
schema? Is there a gentle way to do this with stored values within Solr
itself? Normally I have to grab the stored values of a field and feed
them back to Solr in an update request.

What does that do to copied fields? Can I just delete the contents of a
copied field as well?


Thank you for any proposals

Andreas
--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: aniek...@informatik.uni-leipzig.deg.de


Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Robert Muir
On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
 wrote:
> Hi,
>
> i am running a SolrCloud cluster with the 4.0.0 version. I have a stopwords
> file
> which is in the correct encoding.

What makes you think that?

Note: "Because I can read it" is not the correct answer.

Ensure any of your stopwords files etc are in UTF-8. This is often
different from the encoding your computer uses by default if you open
a file, start typing in it, and press save.


Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
I trust the 'file' command output, and if it reports "UTF-8 Unicode"
I believe that this is correct. Don't know if this is the 'correct answer'
for you ;)

BTW: It works locally, but not with ZK. So maybe it's more a ZK issue which
somehow destroys my file. Will check.

On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir  wrote:

> On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
>  wrote:
> > Hi,
> >
> > I am running a SolrCloud cluster with the 4.0.0 version. I have a
> > stopwords file which is in the correct encoding.
>
> What makes you think that?
>
> Note: "Because I can read it" is not the correct answer.
>
> Ensure any of your stopwords files etc are in UTF-8. This is often
> different from the encoding your computer uses by default if you open
> a file, start typing in it, and press save.
>


Re: Searching for Partial Words

2012-11-08 Thread Sohail Aboobaker
Yes, that is true. We are looking for partial word matches. It seems like
we can achieve this by using edge ngrams for prefixes and adding a wildcard
at the end to ignore the suffix. If we set the edge ngram size to 3, "eng"
will match ResidentEng but not ResidentEngineer. But a search for "eng*"
will match ResidentEngineer, ResidentEngine, Engine, etc.

Thank you for your responses. We were able to convince the business users
that partial word searching is not expected behavior and would generate more
results than needed, so the partial word search requirement was dropped :)

Thanks again.


Re: how to scale from 1 server to 3 servers with 3 shards

2012-11-08 Thread Jeff Rhines
It's my understanding that your strategy is correct, although I expect that 
zookeeper would need to be updated somehow with the new second and third 
shards, no?

On Nov 8, 2012, at 2:36 AM, SuoNayi  wrote:

> Hi all,
>   Because it's not possible to add or remove a shard after a solrcloud
> cluster is initialized, we have to predict a precise shard count up front,
> say we need 3 shards (0 replicas).
> But right now we don't have enough servers to set up the cluster with one
> shard per server.
> In this situation I have to set up the cluster on a single server; does
> this mean I have to deploy three solr instances (shards) on the same
> server in different directories?
> When more servers are available, how do I move the other two shards to
> the new servers?
> Just copying the files to the new server does not seem to work.
> Hope someone can give me a hint, thanks a lot.
> 
> 
> Regards
> SuoNayi


Re: Questions about schema.xml

2012-11-08 Thread johnmunir
Thanks Prithu.


But why would I use different settings for index and query? I would think
that if the settings are not the same for both, then search results would be
confusing for end users, no? To illustrate my point (this may be drastic): if
I don't use solr.LowerCaseFilterFactory in one of them, then many searches
(mixed-case, for example) won't give me any hits. A more realistic example:
if I don't match the rules for solr.WordDelimiterFilterFactory, again, I
could miss hits. If my understanding is correct and there is value in using
different rules for "query" and "index", I would like to see a concrete
example, a use-case I can apply.


-- MJ



-Original Message-
From: Prithu Banerjee 
To: solr-user 
Sent: Thu, Nov 8, 2012 12:34 am
Subject: Re: Questions about schema.xml


Those two values specify which analyzer is used where. There are two kinds:
one for the indexer, where the analyzer you specify processes the input
documents to build the index. The other one is for queries: it analyzes
your query text. Typically the analyzers for index and query are the same,
so that you search over exactly the tokens you created while indexing. But
you are free to provide any customized analyzer according to your need.

-- 
best regards,
Prithu

On Thu, Nov 8, 2012 at 8:43 AM,  wrote:

>
> HI,
>
>
> Can someone help me understand the meaning of <analyzer type="index"> and
> <analyzer type="query"> in schema.xml, how they are used, and what do I get
> back when the values are not the same?
>
>
> For example, given:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>     autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>         catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>         protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>         ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>         catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>         protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
>
> If I make the entire content of "index" the same as "query" (or the other
> way around) how will that impact my search?  And why would I want to not
> make those two blocks the same?
>
>
> Thanks!!!
>
>
> -MJ
>

 


Re: [External] Re: how to scale from 1 server to 3 servers with 3 shards

2012-11-08 Thread Greene, Daniel [USA]
I think that your approach would work, but you would want to have your
"master" server up when you bring the new server online. The new server will
become a follower of one of the shards and get the shard content via
replication. Once complete, you could shut down the shard on the "master",
and the new server would be promoted automatically to shard leader.



Sent from my Verizon Wireless 4GLTE smartphone

- Reply message -
From: "Jeff Rhines" 
To: "solr-user@lucene.apache.org" 
Subject: [External] Re: how to scale from 1 server to 3 servers with 3 shards
Date: Thu, Nov 8, 2012 7:30 am



It's my understanding that your strategy is correct, although I expect that 
zookeeper would need to be updated somehow with the new second and third 
shards, no?

On Nov 8, 2012, at 2:36 AM, SuoNayi  wrote:

> Hi all,
>   Because it's not possible to add or remove a shard after a solrcloud
> cluster is initialized, we have to predict a precise shard count up front,
> say we need 3 shards (0 replicas).
> But right now we don't have enough servers to set up the cluster with one
> shard per server.
> In this situation I have to set up the cluster on a single server; does
> this mean I have to deploy three solr instances (shards) on the same
> server in different directories?
> When more servers are available, how do I move the other two shards to
> the new servers?
> Just copying the files to the new server does not seem to work.
> Hope someone can give me a hint, thanks a lot.
>
>
> Regards
> SuoNayi
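
A sketch of the expansion step described above, in the style of the startup
commands elsewhere in this digest (paths, ports and the ZK address are
illustrative, and the exact behavior should be verified on your version):

# On a new server: start a Solr 4.0 node against the same ZooKeeper
# ensemble, with no bootstrap/numShards flags. It joins an existing
# shard as a replica and pulls the index over replication; once it is
# current, the old core on the original server can be unloaded and
# leader election promotes the new node.
java -Dsolr.solr.home=/opt/search4/solr/2 \
     -DzkHost=10.112.1.2:2181,10.112.1.2:2182,10.112.1.2:2183 \
     -jar start.jar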


positions and qf parameter in (e)dismax

2012-11-08 Thread Markus Jelsma
Hi,

We want to omit positions for some fields, and term frequencies and
positions (or just tf) for other fields. Obviously we don't need/want
explicit phrase matching on the fields we configure without positions, but
(e)dismax doesn't let us: all text fields configured in the qf parameter are
eligible for explicit phrase matching and so need to have positions. We're
looking for a way to disable what we don't need and prevent Solr from
searching fields for phrases that we don't want to be searched on.

Essentially we'd want to limit explicit phrase matching to the same fields 
configured in pf or have Lucene ignore explicit phrase searching on fields that 
have no positions loaded.

Any ideas to share?

Thanks,
Markus
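
For reference, the schema side of this (dropping postings data per field)
looks roughly like the following sketch; field names are illustrative, and
omitPositions requires Solr 3.4+:

<field name="title"   type="text_general" indexed="true" stored="true"/>
<!-- docs only: no tf, no positions, phrase queries impossible -->
<field name="keyword" type="text_general" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
<!-- tf kept, positions dropped -->
<field name="tag"     type="text_general" indexed="true" stored="true"
       omitPositions="true"/>

The open question in the thread is only how (e)dismax should behave when
such a field appears in qf and the query contains an explicit phrase.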


Questions about schema.xml

2012-11-08 Thread johnmunir
HI,


Can someone help me understand the meaning of <analyzer type="index"> and
<analyzer type="query"> in schema.xml, how they are used, and what do I get
back when the values are not the same?


For example, given:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
    autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
</fieldType>




If I make the entire content of "index" the same as "query" (or the other way 
around) how will that impact my search?  And why would I want to not make those 
two blocks the same?


Thanks!!!


-MJ


Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Antony Steiner
Hello, my name is Antony and I'm new to Apache Nutch and Solr.

I want to crawl my website, and therefore I downloaded Nutch to do this.
This works fine. But now I would like to integrate Nutch with Solr. I'm
running this on my Unix system.
I'm trying to follow this tutorial:
http://wiki.apache.org/nutch/NutchTutorial
But it won't work for me. Running Solr without Nutch is no problem: I can
post documents to Solr with post.jar. But what I want to do is post my
Nutch crawl to Solr.
Now, if I copy the schema.xml from Nutch to the
apache-solr-4.0.0/example/solr/collection1/conf directory and restart Solr
(java -jar start.jar), I get errors but Solr will still start. (Is this
the correct directory to copy my schema to?)

Nov 8, 2012 9:40:33 AM org.apache.solr.schema.IndexSchema readSchema
INFO: Schema name=nutch
Nov 8, 2012 9:40:33 AM org.apache.solr.core.CoreContainer create
SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
...

Nov 8, 2012 9:40:33 AM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Schema Parsing Failed:
multiple points
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
...

Now if I don't copy the schema and push my Nutch crawl to Solr, I get the
following error:

SolrIndexer: starting at 2012-11-08 10:49:02
Indexing 5 documents
java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2012-11-08 10:49:47
SolrDeleteDuplicates: Solr url: http://photon:8983/solr/

And this is taken from the logging:
org.apache.solr.common.SolrException: ERROR: [doc=
http://e-docs/infrastructure/cpuload_monitor.html] unknown field 'host'

What should I do or what am I missing?

I hope you can help me
Best Regards
Antony


Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
Hello again,

With the following config:

- a 2-node zookeeper ensemble
- 2 shards
- 2 main solr instances for the 2 shards
- I added 2 or 3 replicas for fun.

While everything is running and I stop one replica, I see the admin UI
graph update (replica shown disabled/inactive)... normal.

But if I stop all solr instances and restart only the first main instance
on :8983, it always sits waiting for some replicas... is that useful? Why
are replicas needed for it to run? I cannot access the admin UI anymore.

The only solution is to erase the zookeeper data and start again; do you
have any solution to avoid this?

What if my replicas are really down in production and I restart
everything?

Another question: do 2 shards mean a 2-node zookeeper ensemble, and 3
shards a 3-node ensemble?

Thanks,

Jul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019028.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
Weird: if I fetch the file contents in ZK with 'get', it returns

w??rde  |  would
w??rden |  would

for example. So the Umlaute are not shown. Does anyone have an idea whether
this is because of Zookeeper's CLI or of the file contents itself?

Thanks & regards.

On Thu, Nov 8, 2012 at 12:24 PM, Daniel Brügge <
daniel.brue...@googlemail.com> wrote:

> I trust the 'file' command output. And if i can read there "UTF-8 Unicode"
> I believe that this is correct. Don't know if this is the 'correct answer'
> for you ;)
>
> BTW: It works locally, but not with ZK. So it's maybe more a ZK issue,
> which
> somehow destroys my file. Will check.
>
>
> On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir  wrote:
>
>> On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
>>  wrote:
>> > Hi,
>> >
>> > I am running a SolrCloud cluster with the 4.0.0 version. I have a
>> > stopwords file which is in the correct encoding.
>>
>> What makes you think that?
>>
>> Note: "Because I can read it" is not the correct answer.
>>
>> Ensure any of your stopwords files etc are in UTF-8. This is often
>> different from the encoding your computer uses by default if you open
>> a file, start typing in it, and press save.
>>
>
>


RE: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Markus Jelsma
Hi, 

Your Nutch schema likely points to the old EnglishPorterFilter, which doesn't
exist anymore. You can change that occurrence to PorterStemFilterFactory;
that should fix the issue.
 
-Original message-
> From:Antony Steiner 
> Sent: Thu 08-Nov-2012 14:05
> To: solr-user@lucene.apache.org
> Subject: Apache Nutch 1.5.1 + Apache Solr 4.0
> 
> Hello my name is Antony and I'm new to apache nutch and solr.
> 
> I want to crawl my website and therefore I downloaded nutch to do this.
> This works fine. But no I would like to integrate nutch with solr. Im
> running this on my unix system.
> Im trying to follow this tutorial:
> http://wiki.apache.org/nutch/NutchTutorial
> But it wont for me. Running Solr without nutch is no problem. I can post
> documents to solr with post.jar. But what I want to do is post my nutch
> crawl to solr.
> Now if I copy the schema.xml from nutch to
> apache-solr-4.0.0/example/solr/collection1/conf directory aned restart solr
> (java -jar start.jar), I get compiling errors but Solr will start. (Is this
> the correct directory to copy my schema?)
> 
> Nov 8, 2012 9:40:33 AM org.apache.solr.schema.IndexSchema readSchema
> INFO: Schema name=nutch
> Nov 8, 2012 9:40:33 AM org.apache.solr.core.CoreContainer create
> SEVERE: Unable to create core: collection1
> org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
> at
> org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
> at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
> ...
> 
> Nov 8, 2012 9:40:33 AM org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Schema Parsing Failed:
> multiple points
> at
> org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
> at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
> ...
> 
> Now if I don't copy the schema and push my nutch crawl to solr I get
> following error:
> 
> SolrIndexer: starting at 2012-11-08 10:49:02
> Indexing 5 documents
> java.io.IOException: Job failed!
> SolrDeleteDuplicates: starting at 2012-11-08 10:49:47
> SolrDeleteDuplicates: Solr url: http://photon:8983/solr/
> 
> And this is taken from the logging:
> org.apache.solr.common.SolrException: ERROR: [doc=
> http://e-docs/infrastructure/cpuload_monitor.html] unknown field 'host'
> 
> What should I do or what am I missing?
> 
> I hope you can help me
> Best Regards
> Antony
> 


Solr SpellCheck on Query Field

2012-11-08 Thread SolrCarinthia
Is it possible to run a spellcheck on multiple fields. I am aware of using a
multivalued field for this
(http://lucene.472066.n3.nabble.com/spellcheck-on-multiple-fields-td1587327.html)

However, what I want is to return spellcheck alternatives based on the field
against which the query ran. So if I run a query against a field like
'FirstName', I want to be able to retrieve alternate query terms from the
values indexed in the 'FirstName' field only. Similarly, a search against a
field 'LastName' should return alternatives from the values indexed for that
field only. I don't think the multivalued-field approach would work for me,
since it is actually an aggregation of indexed values from multiple fields.
When searching for a first name, I don't want to put forward suggestions
that actually come from tokens indexed from last names, address cities, etc.

To summarize my problem: I want to be able to choose, at query time, the
field against which spellcheck alternatives are provided. Is this possible?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-SpellCheck-on-Query-Field-tp4019036.html
Sent from the Solr - User mailing list archive at Nabble.com.
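
This can be done with one spellchecker per source field; a hedged
solrconfig.xml sketch (dictionary names and index paths are illustrative):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">firstname</str>
    <str name="field">FirstName</str>
    <str name="spellcheckIndexDir">./spellchecker_firstname</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">lastname</str>
    <str name="field">LastName</str>
    <str name="spellcheckIndexDir">./spellchecker_lastname</str>
  </lst>
</searchComponent>

At query time, pick the dictionary that matches the field being searched,
e.g. &spellcheck=true&spellcheck.dictionary=firstname.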


Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
Ah, I have fixed it. It was necessary to import the files into Zookeeper
with the file.encoding system property set to UTF-8. Then it worked.
Hooray. :)

e.g.

java -Dfile.encoding=UTF-8 -Dbootstrap_confdir=/home/me/myconfdir
-Dcollection.configName=config1 -DzkHost="zkhost:2181" -DnumShards=2
-Dsolr.solr.home=/home/me/solr -jar start.jar



On Thu, Nov 8, 2012 at 2:09 PM, Daniel Brügge  wrote:

> Weird, if i return the file contents in ZK with 'get' it returns me
>
> w??rde  |  would
> w??rden |  would
>
> for example. So the Umlaute are not shown. Does anyone have an idea if
> this is because of Zookeepers cli or of the file contents itself?
>
> Thanks & regards.
>
> On Thu, Nov 8, 2012 at 12:24 PM, Daniel Brügge <
> daniel.brue...@googlemail.com> wrote:
>
>> I trust the 'file' command output. And if i can read there "UTF-8 Unicode"
>> I believe that this is correct. Don't know if this is the 'correct
>> answer' for you ;)
>>
>> BTW: It works locally, but not with ZK. So it's maybe more a ZK issue,
>> which
>> somehow destroys my file. Will check.
>>
>>
>> On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir  wrote:
>>
>>> On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
>>>  wrote:
>>> > Hi,
>>> >
>>> > I am running a SolrCloud cluster with the 4.0.0 version. I have a
>>> > stopwords file which is in the correct encoding.
>>>
>>> What makes you think that?
>>>
>>> Note: "Because I can read it" is not the correct answer.
>>>
>>> Ensure any of your stopwords files etc are in UTF-8. This is often
>>> different from the encoding your computer uses by default if you open
>>> a file, start typing in it, and press save.
>>>
>>
>>
>

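A quick way to rule the file itself out is to read it with an explicit
charset, so the platform default (the thing -Dfile.encoding=UTF-8 overrides
above) plays no role. A minimal sketch; the file name is taken from the
thread:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class StopwordCheck {
    public static void main(String[] args) throws Exception {
        // Decode explicitly as UTF-8 instead of the JVM default charset.
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("my_stopwords.txt"), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // umlauts should print intact
        }
        in.close();
    }
}
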

Re: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Antony Steiner
Hi,

Thank you for your suggestion. Nope, it didn't change anything. Should I
post the full stack trace?

Regards
Antony


2012/11/8 Markus Jelsma 

> Hi,
>
> Your Nutch schema likely points to the old EnglishPorterFilter that
> doesn't exist anymore. You can change that occurance to
> PorterStemFilterFactory, that should fix the issue.
>
> -Original message-
> > From:Antony Steiner 
> > Sent: Thu 08-Nov-2012 14:05
> > To: solr-user@lucene.apache.org
> > Subject: Apache Nutch 1.5.1 + Apache Solr 4.0
> >
> > Hello my name is Antony and I'm new to apache nutch and solr.
> >
> > I want to crawl my website and therefore I downloaded nutch to do this.
> > This works fine. But no I would like to integrate nutch with solr. Im
> > running this on my unix system.
> > Im trying to follow this tutorial:
> > http://wiki.apache.org/nutch/NutchTutorial
> > But it wont for me. Running Solr without nutch is no problem. I can post
> > documents to solr with post.jar. But what I want to do is post my nutch
> > crawl to solr.
> > Now if I copy the schema.xml from nutch to
> > apache-solr-4.0.0/example/solr/collection1/conf directory aned restart
> solr
> > (java -jar start.jar), I get compiling errors but Solr will start. (Is
> this
> > the correct directory to copy my schema?)
> >
> > Nov 8, 2012 9:40:33 AM org.apache.solr.schema.IndexSchema readSchema
> > INFO: Schema name=nutch
> > Nov 8, 2012 9:40:33 AM org.apache.solr.core.CoreContainer create
> > SEVERE: Unable to create core: collection1
> > org.apache.solr.common.SolrException: Schema Parsing Failed: multiple
> points
> > at
> > org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
> > at
> org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
> > ...
> >
> > Nov 8, 2012 9:40:33 AM org.apache.solr.common.SolrException log
> > SEVERE: null:org.apache.solr.common.SolrException: Schema Parsing Failed:
> > multiple points
> > at
> > org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
> > at
> org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
> > at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
> > ...
> >
> > Now if I don't copy the schema and push my nutch crawl to solr I get
> > following error:
> >
> > SolrIndexer: starting at 2012-11-08 10:49:02
> > Indexing 5 documents
> > java.io.IOException: Job failed!
> > SolrDeleteDuplicates: starting at 2012-11-08 10:49:47
> > SolrDeleteDuplicates: Solr url: http://photon:8983/solr/
> >
> > And this is taken from the logging:
> > org.apache.solr.common.SolrException: ERROR: [doc=
> > http://e-docs/infrastructure/cpuload_monitor.html] unknown field 'host'
> >
> > What should I do or what am I missing?
> >
> > I hope you can help me
> > Best Regards
> > Antony
> >
>


RE: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Markus Jelsma
Hi - it fixes it here. Please post the full stack trace.
 
-Original message-
> From:Antony Steiner 
> Sent: Thu 08-Nov-2012 15:16
> To: solr-user@lucene.apache.org
> Subject: Re: Apache Nutch 1.5.1 + Apache Solr 4.0
> 
> Hi,
> 
> Thank you for your sugestion. Nope, it didn't change anything. Should I
> post the full stacktrace?
> 
> Regards
> Antony
> 
> 
> 2012/11/8 Markus Jelsma 
> 
> > Hi,
> >
> > Your Nutch schema likely points to the old EnglishPorterFilter that
> > doesn't exist anymore. You can change that occurance to
> > PorterStemFilterFactory, that should fix the issue.
> >
> > -Original message-
> > > From:Antony Steiner 
> > > Sent: Thu 08-Nov-2012 14:05
> > > To: solr-user@lucene.apache.org
> > > Subject: Apache Nutch 1.5.1 + Apache Solr 4.0
> > >
> > > Hello my name is Antony and I'm new to apache nutch and solr.
> > >
> > > I want to crawl my website and therefore I downloaded nutch to do this.
> > > This works fine. But no I would like to integrate nutch with solr. Im
> > > running this on my unix system.
> > > Im trying to follow this tutorial:
> > > http://wiki.apache.org/nutch/NutchTutorial
> > > But it wont for me. Running Solr without nutch is no problem. I can post
> > > documents to solr with post.jar. But what I want to do is post my nutch
> > > crawl to solr.
> > > Now if I copy the schema.xml from nutch to
> > > apache-solr-4.0.0/example/solr/collection1/conf directory aned restart
> > solr
> > > (java -jar start.jar), I get compiling errors but Solr will start. (Is
> > this
> > > the correct directory to copy my schema?)
> > >
> > > Nov 8, 2012 9:40:33 AM org.apache.solr.schema.IndexSchema readSchema
> > > INFO: Schema name=nutch
> > > Nov 8, 2012 9:40:33 AM org.apache.solr.core.CoreContainer create
> > > SEVERE: Unable to create core: collection1
> > > org.apache.solr.common.SolrException: Schema Parsing Failed: multiple
> > points
> > > at
> > > org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
> > > at
> > org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
> > > ...
> > >
> > > Nov 8, 2012 9:40:33 AM org.apache.solr.common.SolrException log
> > > SEVERE: null:org.apache.solr.common.SolrException: Schema Parsing Failed:
> > > multiple points
> > > at
> > > org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
> > > at
> > org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
> > > at
> > org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
> > > ...
> > >
> > > Now if I don't copy the schema and push my nutch crawl to solr I get
> > > following error:
> > >
> > > SolrIndexer: starting at 2012-11-08 10:49:02
> > > Indexing 5 documents
> > > java.io.IOException: Job failed!
> > > SolrDeleteDuplicates: starting at 2012-11-08 10:49:47
> > > SolrDeleteDuplicates: Solr url: http://photon:8983/solr/
> > >
> > > And this is taken from the logging:
> > > org.apache.solr.common.SolrException: ERROR: [doc=
> > > http://e-docs/infrastructure/cpuload_monitor.html] unknown field 'host'
> > >
> > > What should I do or what am I missing?
> > >
> > > I hope you can help me
> > > Best Regards
> > > Antony
> > >
> >
> 


Re: Indexing text files in Solr

2012-11-08 Thread Erick Erickson
You should probably start here:
http://lucene.apache.org/solr/4_0_0/tutorial.html

For indexing and analysis, i.e. how the text is
transformed for indexing and searching (things like
stemming, lowercasing etc.) that's all configured
in schema.xml (you'll find that file in

...solr/example/solr/collection1/conf

Another key file in that directory is solrconfig.xml, which
defines how Solr does things like query parsing etc.

So those are places to start. I'd recommend working
through the tutorial and then coming back with specific
questions.

Also, the two Manning books "Solr in Action" and "Lucene in Action"
will give you a lot of background.

Best
Erick


On Wed, Nov 7, 2012 at 3:01 AM, sharat89  wrote:

> Hey,
>
>  I am new to solr and java. I am working on a project that involves the use
> of both. I want to learn the complete process of indexing a text file using
> solr. Also, I'd like to learn how to interpret the results.
>
> I am looking for outputs like "number of occurrences of a particular
> word".
>
> Please keep in mind that I am a total newbie. So, instructions that require
> sufficient knowledge of solr and java wont be of any help to me. In case
> there is a forum for beginners in Solr, I'd be grateful if you could
> redirect me to such a forum.
>
>  I read somewhere that indexing text files is not a straightforward
> process, as it involves changing some entries in some key working files
> prior to starting Solr.
>
> Any help in this regard would be much appreciated.
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-text-files-in-Solr-tp4018674.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: scale with 3 shards on single server

2012-11-08 Thread Erick Erickson
Yep, that's the usual process for growth planning.

Best
Erick


On Wed, Nov 7, 2012 at 4:01 AM, SuoNayi  wrote:

> Hi all,
>    Because we cannot add or remove a shard once a solrcloud cluster has
> been set up, we have to predict a precise shard count at first, say we
> need 3 shards. But right now we don't have enough servers to set up the
> cluster with one shard per server.
> In this situation I have to set up the cluster on a single server; does
> this mean I have to deploy three shards on the same server, differing
> only by directory?
> When more servers are available, how do I move two shards to the new
> servers?
>
>
> Thanks && regards.
>
>
> SuoNayi
>
>


Re: Re-index stored values

2012-11-08 Thread Gora Mohanty
On 8 November 2012 16:10, Andreas Niekler
 wrote:
>
> Hello,
>
> how could i re-index stored values of my solr index if i change the
> schema. Is there a gentle way to do this with stored values within solr
> itself? Normally i have to grab the stored values of a field and put it
> again to an update query for solr.
>
> What does that to to copied fields? Can i just delete the contents of a
> copied field a well?

There are various ways one could do this, but one is to retrieve the
values, put them into Solr XML format, and post the XML to a *new* Solr
index with the new schema. Do not retrieve the copy fields, and the
re-indexing should recopy these as per the directives in the new schema.xml.

Regards,
Gora
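
A minimal sketch of that loop via SolrJ (4.x API; the URLs are illustrative,
it assumes every source field is stored, and "text" stands in for a
copyField target to skip so the new schema repopulates it itself):

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class Reindex {
    public static void main(String[] args) throws Exception {
        HttpSolrServer oldIdx = new HttpSolrServer("http://localhost:8983/solr/old");
        HttpSolrServer newIdx = new HttpSolrServer("http://localhost:8983/solr/new");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(500);
        int start = 0;
        while (true) {
            q.setStart(start);
            List<SolrDocument> docs = oldIdx.query(q).getResults();
            if (docs.isEmpty()) break;
            for (SolrDocument doc : docs) {
                SolrInputDocument in = new SolrInputDocument();
                for (String f : doc.getFieldNames()) {
                    if (!"text".equals(f)) {       // skip the copyField target
                        in.addField(f, doc.getFieldValue(f));
                    }
                }
                newIdx.add(in);
            }
            start += docs.size();
        }
        newIdx.commit();
    }
}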


Limit the SolR acces from the web for one user-agent?

2012-11-08 Thread Bruno Mannina

Dear All,

I'm using an external program (my own client) to access my Apache Solr
database.
I would like to restrict Solr access to a specific User-Agent (defined in
my program).

I would like to know if it's possible to do that directly in the Solr
config, or whether I must handle it in the Apache server?

My program only makes requests like this (i.e.):
http://xxx.xxx.xxx.xxx:pp/solr/select/?q=ap%3Afuelcell&version=2.2&start=0&rows=10&indent=on

I can add a User-Agent, login, password, etc. to my HTTP component
properties, like a standard HTTP connection.

To complete the picture: my software is distributed to several users and I
would like to limit Solr access to these users and to my program.
Firefox, Chrome, I.E. will be unauthorized.

thanks for your comment or help,
Bruno

Ubuntu 12.04LTS
SolR 3.6


Re: Questions about schema.xml

2012-11-08 Thread Jack Krupansky
Many token filters will be used 100% identically for both "index" and 
"query" analysis, but WordDelimiterFilter is a rare exception. The issue is 
that at index time it has the ability to generate multiple tokens at the 
same position (the "catenate" options), any of which can be queried, but at 
query time it can be problematic to have these "extra" terms (except in some 
conditions), so the WDF settings suppress generation of the extra terms.


Another example is synonyms - generate extra terms at index time for greater 
precision of searches, but limit the query terms to exclude the "extra" 
terms.


That's the reason for the occasional asymmetry between index-time and
query-time analyzers.

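To make the synonym case concrete, a hedged sketch (type and file names
are illustrative, not from this message):

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand synonyms only while indexing: a document containing
         "tv" is indexed under both "tv" and "television" -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- no synonym filter here: the query stays as typed, yet matches
         either variant because the index already holds both -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>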

-- Jack Krupansky

-Original Message- 
From: johnmu...@aol.com

Sent: Wednesday, November 07, 2012 7:13 PM
To: solr-user@lucene.apache.org
Subject: Questions about schema.xml


HI,


Can someone help me understand the meaning of <analyzer type="index"> and
<analyzer type="query"> in schema.xml, how they are used, and what do I get
back when the values are not the same?



For example, given:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
    autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
</fieldType>



If I make the entire content of "index" the same as "query" (or the other 
way around) how will that impact my search?  And why would I want to not 
make those two blocks the same?



Thanks!!!


-MJ 



Re: New solrcloud deployment, no results

2012-11-08 Thread Erick Erickson
Hmmm, I tried this with a 2 shard cluster and it works just fine, using
your schema, solrconfig and query so I'm puzzled. What happens when you
look at your cluster with the admin page? When you dive into collection1,
does it show any documents?

Also, look at admin/schema-browser and look at the actual fields, to see if
there's any data indexed.


One thing though, I'd _seriously_ consider making the id a simple "string"
type. It's possible that you're having some sort of wonkiness as a result
of tokenizing your <uniqueKey> field. I know of no _specific_ issues here,
but it makes me really uneasy to see that your id field is tokenized in
your schema, given that Solr pretty much assumes that the <uniqueKey> is a
single token per document. There is some slight evidence for this in that
your numFound is 6 but the data isn't being echoed (although it is for me),
but that's just guessing.

Best
Erick

P.S. If you're still stumped, can you also post the docs you're indexing?
Or at least their IDs so I can see what happens then?



On Wed, Nov 7, 2012 at 4:20 PM, Jeff Rhines  wrote:

> I have a cluster of 6 shards of Solr 4.0.0 deployed, one machine each,
> with no replicas, and another single machine running a zookeeper ensemble
> of 5. Using python sunburnt, I submit six documents with separate ids and
> populated text fields and commit them. No errors are reported. When I
> search ( /solr/collection1/select?q=*%3A*&fl=id&wt=json&indent=true ), I
> see no results, but numFound 6. I'm sure I've misconfigured something, and
> I'm hoping more experienced folk can see what it is. If you have any
> troubleshooting tips, I'll try anything at this point.
>
> Thanks,
> Jeff
>
> Results:
> {
>   "responseHeader":{
> "status":0,
> "QTime":52},
>   "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
>   }}
>
>
> My schema.xml is very simple:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <schema name="..." version="...">
>   <types>
>     <fieldType name="..." class="..." />
>     <fieldType name="..." class="solr.TextField"
>         positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="..."/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="..."/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>         <filter class="solr.SynonymFilterFactory" synonyms="..."
>             ignoreCase="true" expand="true"/>
>       </analyzer>
>     </fieldType>
>     <fieldType name="..." class="solr.TextField">
>       <analyzer>
>         <tokenizer class="..." pattern="[^a-zA-Z0-9]"/>
>       </analyzer>
>     </fieldType>
>     <fieldType name="..." class="..." positionIncrementGap="0"/>
>   </types>
>   <fields>
>     <field name="..." type="..." ... required="true"/>
>     <field name="..." type="..." ... required="true"/>
>   </fields>
>   <uniqueKey>id</uniqueKey>
> </schema>
>
> As is my solrconfig.xml:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <config>
>   <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
>   ...
>   <dataDir>${solr.data.dir:}</dataDir>
>   <directoryFactory name="DirectoryFactory"
>       class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>   <updateLog>
>     <str name="dir">${solr.data.dir:}</str>
>   </updateLog>
>   ...
> </config>
>
>

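Erick's string-id suggestion, as a sketch against the reconstructed schema
above (illustrative, but it is the standard untokenized-key setup):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
...
<field name="id" type="string" indexed="true" stored="true" required="true"/>
...
<uniqueKey>id</uniqueKey>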

Re: Limit the SolR acces from the web for one user-agent?

2012-11-08 Thread Alexandre Rafalovitch
It is very easy to do this on Apache, but you need to be aware that
User-Agent is extremely easy to both sniff and spoof.

Have you thought of perhaps using Client and Server Certificates to protect
the connection and embedding those certificates into clients?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Nov 8, 2012 at 9:39 AM, Bruno Mannina  wrote:

> Dear All,
>
> I'm using an external program (my own client) to access to my Apache-SolR
> database.
> I would like to restrict the SOLR access to a specific User-Agent (defined
> in my program).
>
> I would like to know if it's possible to do that directly in SolR config
> or I must
> process that in the Apache server?
>
> My program do only requests like this (i.e.):
> http://xxx.xxx.xxx.xxx:pp/solr/select/?q=ap%3Afuelcell&version=2.2&start=0&rows=10&indent=on
>
> I can add on my HTTP component properties an User-Agent, Log, Pass, etc...
> like a standard Http connection.
>
> To complete: my soft is distribued to several users and I would like to
> limit the SOLR access to these users and with my program.
> FireFox, Chrome, I.E. will be unauthorized.
>
> thanks for your comment or help,
> Bruno
>
> Ubuntu 12.04LTS
> SolR 3.6
>

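For the Apache route, a minimal sketch in Apache 2.2 syntax (the "MyClient"
User-Agent prefix and the /solr path are assumptions, and, per Alexandre's
caveat, this is a spoofable check, not real security):

# httpd.conf / vhost: only requests whose User-Agent starts with
# "MyClient" may reach Solr; browsers get 403 Forbidden.
SetEnvIf User-Agent "^MyClient" trusted_client
<Location /solr>
    Order Deny,Allow
    Deny from all
    Allow from env=trusted_client
</Location>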

Re: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Antony Steiner
Hi,

I just saw there is a schema-solr4.xml and a schema.xml in the Nutch conf
directory. But with both schemas I get the same errors when starting up
Solr.
Here's the stack trace:

Nov 8, 2012 3:32:14 PM org.apache.solr.core.SolrConfig <init>
INFO: Loaded SolrConfig: solrconfig.xml
Nov 8, 2012 3:32:14 PM org.apache.solr.schema.IndexSchema readSchema
INFO: Reading Solr Schema
Nov 8, 2012 3:32:14 PM org.apache.solr.schema.IndexSchema readSchema
INFO: Schema name=nutch
Nov 8, 2012 3:32:14 PM org.apache.solr.core.CoreContainer create
SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at
org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
at
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
at
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
at
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
at
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
at
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
at
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
at
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
at
org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
at
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
at
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
at
org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
at org.eclipse.jetty.server.Server.doStart(Server.java:263)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at
org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
at org.eclipse.jetty.start.Main.start(Main.java:602)
at org.eclipse.jetty.start.Main.main(Main.java:82)
Caused by: java.lang.NumberFormatException: multiple points
at
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)
at java.lang.Float.parseFloat(Float.java:422)
at org.apache.solr.core.Config.getFloat(Config.java:284)
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:358)
... 45 more
Nov 8, 2012 3:32:14 PM org.apache.solr.common.SolrException log
SEVERE: n

Re: positions and qf parameter in (e)dismax

2012-11-08 Thread Jack Krupansky

Sounds like a reasonable request for a new feature to add to Solr.

Question: Would you want the query to SKIP fields that don't have positions
enabled, or to treat a phrase as discrete terms? Or is that another option
you might need to control for each field?


-- Jack Krupansky

-Original Message- 
From: Markus Jelsma

Sent: Thursday, November 08, 2012 5:01 AM
To: solr-user@lucene.apache.org
Subject: positions and qf parameter in (e)dismax

Hi,

We want to omit positions for some fields, and term frequencies and
positions (or just tf) for other fields. Obviously we don't need/want
explicit phrase matching on the fields we configure without positions, but
(e)dismax doesn't let us: all text fields configured in the qf parameter are
eligible for explicit phrase matching and so need to have positions. We're
looking for a way to disable what we don't need and prevent Solr from
searching fields for phrases that we don't want to be searched on.


Essentially we'd want to limit explicit phrase matching to the same fields 
configured in pf or have Lucene ignore explicit phrase searching on fields 
that have no positions loaded.


Any ideas to share?

Thanks,
Markus 



RE: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Markus Jelsma
Hm, I copied the schema from Nutch's trunk verbatim and only had to change
the stemmer. It seems like you have, for some reason, a float with an extra
point dangling around somewhere. Can you check?
 
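A guess worth checking (not confirmed in this thread): the
NumberFormatException in Antony's trace comes from Config.getFloat while
reading the schema, which parses the schema's version attribute as a float,
so a three-part version string would produce exactly "multiple points":

<!-- fails to parse as a float: two decimal points -->
<schema name="nutch" version="1.5.1">

<!-- parses fine -->
<schema name="nutch" version="1.5">
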
-Original message-
> From:Antony Steiner 
> Sent: Thu 08-Nov-2012 15:54
> To: Markus Jelsma ; solr-user@lucene.apache.org
> Subject: Re: Apache Nutch 1.5.1 + Apache Solr 4.0
> 
> Hi,
> 
> I just saw there is a schema-solr4.xml and a schema.xml in the nutch conf
> directory. But with both schemas I get the same errors when starting up
> solr.
> Heres the stacktrace:
> 
> Nov 8, 2012 3:32:14 PM org.apache.solr.core.SolrConfig <init>
> INFO: Loaded SolrConfig: solrconfig.xml
> Nov 8, 2012 3:32:14 PM org.apache.solr.schema.IndexSchema readSchema
> INFO: Reading Solr Schema
> Nov 8, 2012 3:32:14 PM org.apache.solr.schema.IndexSchema readSchema
> INFO: Schema name=nutch
> Nov 8, 2012 3:32:14 PM org.apache.solr.core.CoreContainer create
> SEVERE: Unable to create core: collection1
> org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
> at
> org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
> at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> at
> org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
> at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> at
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
> at
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
> at
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
> at
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
> at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> at
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
> at
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
> at
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
> at
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
> at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
> at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
> at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
> at
> org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
> at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
> at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
> at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
> at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> at
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
> at
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
> at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> at
> org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
> at
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
> at org.eclipse.jetty.server.Server.doStart(Server.java:263)
> at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> at
> org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.eclipse.jetty.start.Main.invokeMai

RE: positions and qf parameter in (e)dismax

2012-11-08 Thread Markus Jelsma

 
-Original message-
> From:Jack Krupansky 
> Sent: Thu 08-Nov-2012 15:56
> To: solr-user@lucene.apache.org
> Subject: Re: positions and qf parameter in (e)dismax
> 
> Sounds like a reasonable request for a new feature to add to Solr.
> 
> Question: Would you want the query to SKIP fields that don't have positions 
> enabled, or to treat a phrase as discrete terms? Or, is that another option 
> you might need to control  for each field?

In the past it was silently ignored. But another parameter or relying on pf 
seems more appropriate to me.

> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: Markus Jelsma
> Sent: Thursday, November 08, 2012 5:01 AM
> To: solr-user@lucene.apache.org
> Subject: positions and qf parameter in (e)dismax
> 
> Hi,
> 
> We do not want to store positions for some fields or omit term and positions 
> (or just tf) for other fields. Obviously we don't need/want explicit phrase 
> matching on the fields we want to configure without positions, but (e)dismax 
> doesn't let us. All text fields configured in the QF parameter are eligible 
> for explicit phrase matching and need to have positions. We're looking for a 
> way to disable what we don't need and prevent Solr from searching fields for 
> phrases that we don't want to be searched on.
> 
> Essentially we'd want to limit explicit phrase matching to the same fields 
> configured in pf or have Lucene ignore explicit phrase searching on fields 
> that have no positions loaded.
> 
> Any ideas to share?
> 
> Thanks,
> Markus 
> 
> 


Re: Re-index stored values

2012-11-08 Thread Andreas Niekler
Thank you for your answer. Since you speak of various ways, could you also 
comment on some of the other approaches?


Am 08.11.2012 15:37, schrieb Gora Mohanty:

On 8 November 2012 16:10, Andreas Niekler
 wrote:


Hello,

how could I re-index the stored values of my Solr index if I change the
schema? Is there a gentle way to do this with stored values within Solr
itself? Normally I have to grab the stored values of a field and feed them
back to Solr in an update query.

What does that do to copied fields? Can I just delete the contents of a
copied field as well?


There are various ways one could do this, but one way is to retrieve
the values, put them into Solr XML format, and post the XML to a *new*
Solr index with the new schema. Do not retrieve the copy fields;
re-indexing should recopy these as per the directives in the new schema.xml.

Regards,
Gora



--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: aniek...@informatik.uni-leipzig.de


Re: searching camel cased terms with phrase queries

2012-11-08 Thread Jack Krupansky
I forgot to mention DictionaryCompoundWordTokenFilterFactory. It does 
require you to create a dictionary of terms, as opposed to using the terms 
that have been encountered in the index.


-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Wednesday, November 07, 2012 8:14 AM
To: solr-user@lucene.apache.org
Subject: Re: searching camel cased terms with phrase queries

This is one of those areas of Solr where you can refine and make
improvements, as you have done, but never actually reach 100% satisfaction.
And, in some cases, as here, you have a choice of settings and no single
combination covers all cases.

In this case, you really need compound-term recognition - detecting that two
or more terms have been juxtaposed with no lexical boundary. Google has it,
and I'm sure some Solr users have implemented it on their own, but it isn't
in Solr proper, yet.

WDF provides a partial approximation, by generating extra, compound terms at
index time. That works well when ALL of the terms are written together, but
not when only a subset are written together without lexical boundaries, as
in your final example.

So, you COULD go the full Google route with a lot of additional effort, or
accept that you offer only a reasonable approximation. Your choice.

So, pick the approximation which seems "best" and accept that it doesn't
handle the other cases.

BTW, the proper name is "PricewaterhouseCoopers".

-- Jack Krupansky

-Original Message- 
From: Dmitry Kan

Sent: Wednesday, November 07, 2012 1:58 AM
To: solr-user@lucene.apache.org
Subject: searching camel cased terms with phrase queries

Hello list,

There have been a number of threads about handling camel-cased words in the
past (
http://search-lucene.com/?q=camel+case&fc_project=Lucene&fc_project=Solr).
Our case is somewhat different from them.

===
Configuration & example
===

To illustrate the issue, let me give you a real example from our data.
Suppose there is a term in the original text: SmartTV.

If a user types either "SmartTV" or "smart tv", we want both to hit the
original term SmartTV. In order to achieve this, the following filter is
used in our solr 3.4 schema:

index side:

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
        spiltOnCaseChange="1"
/>

query side:

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
        spiltOnCaseChange="1"
/>

(no differences)

Copying from the analysis page, the index will contain the following terms
and their positions:

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1,
spiltOnCaseChange=1, generateNumberParts=0, catenateWords=0,
luceneMatchVersion=LUCENE_34, generateWordParts=1, catenateAll=0,
catenateNumbers=0}

  position  term     startOffset  endOffset
  1         SmartTV  0            7
  1         Smart    0            5
  2         TV       5            7

(There are a StandardTokenizerFactory tokenizer and a StandardFilterFactory
preceding this filter, but as they didn't affect anything in this case, their
output is skipped.)

On the query side, the query "smart tv" gets processed like:

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1,
spiltOnCaseChange=1, generateNumberParts=0, catenateWords=0,
luceneMatchVersion=LUCENE_34, generateWordParts=1, catenateAll=0,
catenateNumbers=0}

  position  term   startOffset  endOffset
  1         smart  0            5
  2         tv     6            8

So there is a match (of course the LowerCaseFilterFactory is configured to
follow the WordDelimiterFilterFactory to unify the cases for matching), and
the user can happily issue the queries 'smart tv', 'smarttv' and 'SmartTV'.

===
More complex example that doesn't work with the above configuration
===

Problems start to occur if a user types "smarttv for me" against the text
"SmartTV for me". Here are the index and query analysis excerpts:

index:

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1,
spiltOnCaseChange=1, generateNumberParts=0, catenateWords=0,
luceneMatchVersion=LUCENE_34, generateWordParts=1, catenateAll=0,
catenateNumbers=0}

  position  term     startOffset  endOffset
  1         SmartTV  0            7
  1         Smart    0            5
  2         TV       5            7
  3         for      8            11
  4         me       12           14


query:

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1,
spiltOnCaseChange=1, generateNumberParts=0, catenateWords=0,
luceneMatchVersion=LUCENE_34, generateWordParts=1, catenateAll=0,
catenateNumbers=0}

  position  term     startOffset  endOffset
  1         smarttv  0            7
  2         for      8            11
  3         me       12           14

Since in the user query smarttv was written in lower case, no split on case
is triggered, and we believe there is no match due to a mismatch of the term
positions ('for' is at the 3rd position in the index but the 2nd position in
the query, and 'smarttv' and 'for' are not adjacent, so the phrase query is
not satisfied).


=
Config change to fix the problem
=


But here catenateWords=1 on the indexing side comes to the rescue, which
changes things to:

index:

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1,
spiltOnCaseChange=1, generateNumberParts=0, catenateWords=1,
luceneMatchVer

RE: Solr SpellCheck on Query Field

2012-11-08 Thread Dyer, James
This would be an awesome feature to have, wouldn't it?

For now, the best you can do is to create a master dictionary that contains all 
of the "FirstName"s and "LastName"s and use that as your dictionary's 
spellcheck field. This is the technique you refer to in the linked post. Alone, 
this won't work because it might correct a misspelled "FirstName" with 
someone's "LastName" or vice versa, giving you absurd query corrections.

The workaround for this is to use "spellcheck.collate=true" and set 
"spellcheck.maxCollationTries" to a number greater than zero.  This will cause 
SpellCheckComponent to verify that the particular suggestions will actually 
return some hits before sending them back.  So every collation returned will 
represent a valid set of spelling corrections for the user's terms.

Another drawback to having a master dictionary is that by default, 
SpellCheckComponent will never suggest for words included in the dictionary.  
So if somebody's misspelt FirstName happens to be in the dictionary because it 
is a valid LastName, SpellCheckComponent's default settings assume that this is 
indeed correctly-spelled.  The way around this is to specify 
"spellcheck.alternativeTermCount" to a non-zero value.  This is the number of 
suggestions to return for terms that are in the dictionary (you can use the 
same value as for "spellcheck.count", or a lower value if you want to try and 
tune this behavior).  You should also set "spellcheck.maxResultsForSuggest" to 
zero. (Use a higher value if you also want "did-you-mean"-style suggestions for 
low-hitcount queries.)

I think these combinations will probably give you exactly what you want, at the 
expense of some overhead and configuration complexity.
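
For reference, a minimal SolrJ sketch of this parameter combination; the
"/spell" handler name, field name and query text are made-up examples, not
from the posts above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("FirstName:jhon");   // hypothetical misspelling
    q.set("qt", "/spell");                           // handler wired to the master dictionary
    q.set("spellcheck", true);
    q.set("spellcheck.count", 10);
    q.set("spellcheck.collate", true);
    q.set("spellcheck.maxCollationTries", 5);        // only return collations that actually hit
    q.set("spellcheck.alternativeTermCount", 5);     // suggest even for in-dictionary terms
    q.set("spellcheck.maxResultsForSuggest", 0);     // suggest regardless of hit count
    SpellCheckResponse spell = solr.query(q).getSpellCheckResponse();
    System.out.println(spell.getCollatedResult());
  }
}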

For more information, see the wiki section beginning here:  
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count 

For an example, see the "/spell" request handler in the Solr Example:  
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/example/solr/collection1/conf/solrconfig.xml

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: SolrCarinthia [mailto:tandon.ha...@ymail.com] 
Sent: Thursday, November 08, 2012 7:18 AM
To: solr-user@lucene.apache.org
Subject: Solr SpellCheck on Query Field

Is it possible to run a spellcheck on multiple fields. I am aware of using a
multivalued field for this
(http://lucene.472066.n3.nabble.com/spellcheck-on-multiple-fields-td1587327.html)

However, what I want is to return spellcheck alternatives based on the field
against which the query ran. So if I run a query against a field like
'FirstName', I want to be able to retrieve alternate query terms from the
values indexed in 'FirstName' field only. Similarly a search against a field
'LastName' should return alternatives from the values indexed for this field
only. I don't think a multivalued field approach would work for me, since it
is actually an aggregation of indexed values from multiple fields. When
searching for First Name, I don't want to put forward suggestions that are
actually coming from tokens indexed from Last Name, Address, City, etc.

To summarize my problem: I want to be able to choose the field against which
spellcheck alternatives should be provided at query time. Is this possible?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-SpellCheck-on-Query-Field-tp4019036.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Re-index stored values

2012-11-08 Thread Gora Mohanty
On 8 November 2012 20:31, Andreas Niekler <
aniek...@informatik.uni-leipzig.de> wrote:

> Thank you for your answer. If you talk of various ways can you also
> comment on some other aproaches?
>

I am not that familiar with SolrJ, but I think it should
also be possible to use it to read the stored values and
re-index them to a different index. Similarly, you should
be able to use almost any programming language to retrieve
values by querying the Solr query URL, and then re-index.
In most cases, that is probably most easily done by
generating Solr XML.

Maybe someone else can suggest a better way to do this.

Regards,
Gora
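
For what it's worth, a minimal SolrJ sketch of that approach, assuming all
source fields are stored; the core URLs and field names are made up, and
copy-field targets are deliberately not copied so the new schema re-derives
them:

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class Reindex {
  public static void main(String[] args) throws Exception {
    SolrServer src = new HttpSolrServer("http://localhost:8983/solr/old");
    SolrServer dst = new HttpSolrServer("http://localhost:8983/solr/new");
    final int rows = 500;
    for (int start = 0; ; start += rows) {
      SolrQuery q = new SolrQuery("*:*");
      q.setStart(start);                     // naive paging; fine for a small index
      q.setRows(rows);
      q.setFields("id", "title", "body");    // stored source fields only, no copy-field targets
      List<SolrDocument> page = src.query(q).getResults();
      if (page.isEmpty()) break;
      for (SolrDocument d : page) {
        SolrInputDocument in = new SolrInputDocument();
        for (String f : d.getFieldNames()) {
          in.addField(f, d.getFieldValue(f));
        }
        dst.add(in);
      }
    }
    dst.commit();
  }
}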


Re: is it possible to save the search query?

2012-11-08 Thread Jack Krupansky
You can certainly save the results, as well as the explanations for scoring, 
and then compare them yourself. Add &debugQuery=true to your query and there 
will be an "explain" section that gives all the values used in computing the 
scores of the top documents.
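
A minimal SolrJ sketch of saving those explanations programmatically; the
core URL and parameters just mirror the example queries below:

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ExplainDump {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/db");
    SolrQuery q = new SolrQuery("cashier2");
    q.set("defType", "dismax");
    q.set("qf", "data^2 id");
    q.set("debugQuery", true);   // adds the "explain" section to the response
    QueryResponse rsp = solr.query(q);
    // One textual score explanation per returned document id; persist these
    // alongside the query parameters to compare runs later.
    for (Map.Entry<String, String> e : rsp.getExplainMap().entrySet()) {
      System.out.println(e.getKey() + " => " + e.getValue());
    }
  }
}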


-- Jack Krupansky

-Original Message- 
From: Romita Saha

Sent: Wednesday, November 07, 2012 10:01 PM
To: solr-user@lucene.apache.org
Subject: Re: is it possible to save the search query?

Hi,

The following is the example;
1st query:

http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data^2
id&start=0&rows=11&fl=data,id

Next query:

http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data
id^2&start=0&rows=11&fl=data,id

In the 1st query the field 'data' is boosted by 2. However, maybe the
user was not satisfied with the response, so in the next query he
boosted the field 'id' by 2.

I want to record both queries and compare the two, i.e., find the changes
made in the 2nd query that are not present in the previous one.

Thanks and regards,
Romita Saha



From:   Otis Gospodnetic 
To: solr-user@lucene.apache.org,
Date:   11/08/2012 01:35 PM
Subject:Re: is it possible to save the search query?



Hi,

Compare in what sense?  An example will help.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 7, 2012 8:45 PM, "Romita Saha" 
wrote:


Hi All,

Is it possible to record a search query in solr and then compare it with
the previous search query?

Thanks and regards,
Romita Saha





Re: Searching for Partial Words

2012-11-08 Thread Jack Krupansky
The "side" attribute must be "front" or "back". Sorry, no "both", although 
that sounds like a reasonable feature request.


"front" is the default side.

-- Jack Krupansky

-Original Message- 
From: Sohail Aboobaker

Sent: Tuesday, November 06, 2012 7:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching for Partial Words

Thanks Jack.
In the configuration below:


  

  


What are the possible values for "side"?

If I understand it correctly, minGramSize=3 and side=front will
include eng* but not en*. Is this correct? So minGramSize is the minimum
number of characters generated from the specified side.

Does it allow side=both :) or something similar?

Regards,
Sohail 
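
A small Lucene-level sketch of what the filter emits, assuming the Lucene 4.0
analysis API; the both-sides effect can be approximated by indexing the same
text into two fields, one with side="front" and one with side="back", and
querying both:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class EdgeGramDemo {
  public static void main(String[] args) throws Exception {
    // side=front, minGram=3, maxGram=7: "english" -> eng, engl, engli, englis, english
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_40, new StringReader("english"));
    ts = new EdgeNGramTokenFilter(ts, EdgeNGramTokenFilter.Side.FRONT, 3, 7);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.toString());
    }
    ts.end();
    ts.close();
  }
}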



Skewed IDF in multi lingual index

2012-11-08 Thread Markus Jelsma
Hi,

We're testing a large multilingual index with _LANG fields for each language, 
using dismax to query them all. Users provide, explicitly or implicitly, 
language preferences that we use for either additive or multiplicative boosting 
on the language of the document. However, additive boosting is not adequate 
because it cannot overcome the extremely high IDF values for the same word in 
another language, so regardless of the preference, foreign documents are 
returned. Multiplicative boosting solves this problem but has the downside that 
it doesn't allow us, with standard qf=field^boost, to prefer documents in 
another language over the preferred language, because the multiplicative boost 
is so strong. We do use the def function (boost=def(query($qq),.3)) to prevent 
one boost query from returning 0, and thus a product of 0 for all boost 
queries, but it doesn't help that much.

This all comes down to IDF differences between the languages: even common words 
such as country names like `india` show large differences in IDF. Is there 
anyone with some hints or experiences to share about skewed IDF in such an 
index?

Thanks,
Markus


Re: is it possible to save the search query?

2012-11-08 Thread Otis Gospodnetic
Hi,

Aha, I think I understand.  Yes, you could collect all doc IDs from each
query and find the differences.  There is nothing in Solr that can find
those differences or that would store doc IDs of returned hits in the first
place, so you would have to implement this yourself.  Sematext's Search
Analytics service may be of help here in the sense that all the data you
need (queries, doc IDs, etc.) is collected, so it would be a matter of
providing an API to get the data for off-line analysis.  But this data
collection+diffing is also something you could implement yourself.  One
thing to think about: what do you do when a query returns a large
number of hits?  Do you really want/need to get IDs for all of them, or
only a page at a time?
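
A minimal sketch of that collection+diffing with SolrJ, reusing the example
queries from this thread; the core URL and row count are assumptions:

import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class QueryDiff {
  static Set<Object> ids(SolrServer solr, String qf) throws Exception {
    SolrQuery q = new SolrQuery("cashier2");
    q.set("defType", "dismax");
    q.set("qf", qf);
    q.setRows(11);
    q.setFields("id");
    Set<Object> out = new LinkedHashSet<Object>();
    for (SolrDocument d : solr.query(q).getResults()) {
      out.add(d.getFieldValue("id"));
    }
    return out;
  }

  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/db");
    Set<Object> first = ids(solr, "data^2 id");
    Set<Object> second = ids(solr, "data id^2");
    Set<Object> gained = new LinkedHashSet<Object>(second);
    gained.removeAll(first);    // docs the boost change brought in
    Set<Object> lost = new LinkedHashSet<Object>(first);
    lost.removeAll(second);     // docs it pushed out
    System.out.println("gained=" + gained + " lost=" + lost);
  }
}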

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha wrote:

> Hi,
>
> The following is the example;
> 1st query:
>
>
> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data
> ^2
> id&start=0&rows=11&fl=data,id
>
> Next query:
>
>
> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data
> id^2&start=0&rows=11&fl=data,id
>
> In the 1st query the the field 'data' is boosted by 2. However may be the
> user was not satisfied with the response. Thus in the next query he
> boosted the field 'id' by 2.
>
> I want to record both the queries and compare between the two, meaning,
> what are the changes implemented on the 2nd query which are not present in
> the previous one.
>
> Thanks and regards,
> Romita Saha
>
>
>
> From:   Otis Gospodnetic 
> To: solr-user@lucene.apache.org,
> Date:   11/08/2012 01:35 PM
> Subject:Re: is it possible to save the search query?
>
>
>
> Hi,
>
> Compare in what sense?  An example will help.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Nov 7, 2012 8:45 PM, "Romita Saha" 
> wrote:
>
> > Hi All,
> >
> > Is it possible to record a search query in solr and then compare it with
> > the previous search query?
> >
> > Thanks and regards,
> > Romita Saha
> >
>
>


Re: Skewed IDF in multi lingual index

2012-11-08 Thread Robert Muir
Hi Markus: how are the languages distributed across documents?

Imagine I have a text_en field and a text_fr field. Let's say I have
100 documents: 95 are English and only 5 are French.
So the text_en field is populated 95% of the time, and the text_fr 5%
of the time.

But the default IDF computation doesn't look at things this way: it
always uses '100' as maxDoc. So in such a situation, any terms against
text_fr are "rare" :)

The first thing I would look at is treating this situation as merging
results from an English index with 95 docs and a French index with 5
docs.
So I would consider overriding the two idfExplain methods (term and
phrase) to use CollectionStatistics.docCount() instead of
CollectionStatistics.maxDoc().
The former would be 95 for the English field (instead of 100), and 5
for the French field (instead of 100).

I don't think this will solve all your problems, but it might help.

Note: you must ensure your index is fully upgraded to 4.0 to try this
statistic, otherwise it will return -1 if you have any 3.x segments in
your index.
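
A minimal sketch of that override against the Lucene 4.0 similarity API; the
phrase variant taking a TermStatistics array needs the same change, and you
would expose this through a custom SimilarityFactory referenced from
schema.xml:

import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class PerFieldDocCountSimilarity extends DefaultSimilarity {
  @Override
  public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) {
    long docCount = collectionStats.docCount();  // -1 if any pre-4.0 segments remain
    long numDocs = docCount == -1 ? collectionStats.maxDoc() : docCount;
    float idf = idf(termStats.docFreq(), numDocs);
    // Scores each field as if it were its own index of docCount() documents.
    return new Explanation(idf,
        "idf(docFreq=" + termStats.docFreq() + ", docCount=" + numDocs + ")");
  }
}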

On Thu, Nov 8, 2012 at 11:13 AM, Markus Jelsma
 wrote:
> Hi,
>
> We're testing a large multi lingual index with _LANG fields for each language 
> and using dismax to query them all. Users provide, explicit or implicit, 
> language preferences that we use for either additive or multiplicative 
> boosting on the language of the document. However, additive boosting is not 
> adequate because it cannot overcome the extremely high IDF values for the 
> same word in another language so regardless of the the preference, foreign 
> documents are returned. Multiplicative boosting solves this problem but has 
> the other downside as it doesn't allow us with standard qf=field^boost to 
> prefer documents in another language above the preferred language because the 
> multiplicative is so strong. We do use the def function 
> (boost=def(query($qq),.3)) to prevent one boost query to return 0 and thus a 
> product of 0 for all boost queries. But it doesn't help that much
>
> This all comes down to IDF differences between the languages, even common 
> words such as country names like `india` show large differences in IDF. Is 
> here anyone with some hints or experiences to share about skewed IDF in such 
> an index?
>
> Thanks,
> Markus


Re: NullPointerException when debugQuery=true

2012-11-08 Thread Otis Gospodnetic
Looks like a bug.  If this is Solr 4.0, maybe it needs to go into JIRA along
with some sample data you indexed plus your schema, so one can reproduce it.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Nov 8, 2012 at 9:04 AM, Jeff Rhines  wrote:

> Any help would be greatly appreciated
>
> Query:
> /solr/collection1/select?q=*%3A*&fl=id&wt=json&indent=true&debugQuery=true<
> http://solr-zk1/solr/collection1/select?q=*%3A*&fl=id&wt=json&indent=true&debugQuery=true
> >
> Result:
>
> {
>   "responseHeader":{
> "status":500,
> "QTime":92},
>   "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
>   },
>   "error":{
> "trace":"java.lang.NullPointerException\n\tat
>
> org.apache.solr.common.util.NamedList.nameValueMapToList(NamedList.java:109)\n\tat
> org.apache.solr.common.util.NamedList.<init>(NamedList.java:75)\n\tat
>
> org.apache.solr.common.util.SimpleOrderedMap.<init>(SimpleOrderedMap.java:58)\n\tat
>
> org.apache.solr.handler.component.DebugComponent.finishStage(DebugComponent.java:144)\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:315)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)\n\tat
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:351)\n\tat
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)\n\tat
>
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)\n\tat
>
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)\n\tat
>
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)\n\tat
> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)\n\tat
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)\n\tat
>
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)\n\tat
>
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)\n\tat
> java.lang.Thread.run(Thread.java:722)\n",
> "code":500}}
>


Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
Thanks Otis,

Indeed, here too the ZooKeeper docs advise choosing an odd number of ZK
nodes: "To create a deployment that can tolerate the failure of F machines,
you should count on deploying 2xF+1 machines"...

Well, I just do not yet understand why, after using replication, I am not able
to restart Solr instances if the replicas are not running. (When I start them,
it is OK.)

Do I need to erase all the ZooKeeper config every time the Solr servers are
restarted? I mean, send the conf again with bootstrap? It looks like I am not
doing it the right way ;)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019102.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: New solrcloud deployment, no results

2012-11-08 Thread Timothy Potter
I've seen the same exact behavior when using analyzed key fields, switching
to string as Erick recommends should solve your problem.

Cheers,
Tim

On Thu, Nov 8, 2012 at 7:45 AM, Erick Erickson wrote:

> Hmmm, I tried this with a 2 shard cluster and it works just fine, using
> your schema, solrconfig and query so I'm puzzled. What happens when you
> look at your cluster with the admin page? When you dive into collection1,
> does it show any documents?
>
> Also, look at admin/schema-browser and look at the actual fields, to see if
> there's any data indexed.
>
>
> One thing though, I'd _seriously_ consider making the id a simple "string"
> type. It's possible that you're having some sort of wonkiness as a result
> of tokenizing your <uniqueKey>. I know of no _specific_ issues here, but it
> makes me really uneasy to see that your id field is tokenized in your
> schema given that Solr pretty much assumes that <uniqueKey> is a single
> token/document. There is some slight evidence for this in that your
> numfound is 6 but the data isn't being echoed (although it is for me), but
> that's just guessing.
>
> Best
> Erick
>
> P.S. If you're still stumped, can you also post the docs you're indexing?
> Or at least their IDs so I can see what happens then?
>
>
>
> On Wed, Nov 7, 2012 at 4:20 PM, Jeff Rhines  wrote:
>
> > I have a cluster of 6 shards of Solr 4.0.0 deployed, one machine each,
> > with no replicas, and another single machine running a zookeeper ensemble
> > of 5. Using python sunburnt, I submit six documents with separate ids and
> > populated text fields and commit them. No errors are reported. When I
> > search ( /solr/collection1/select?q=*%3A*&fl=id&wt=json&indent=true ), I
> > see no results, but numFound 6. I'm sure I've misconfigured something,
> and
> > I'm hoping more experienced folk can see what it is. If you have any
> > troubleshooting tips, I'll try anything at this point.
> >
> > Thanks,
> > Jeff
> >
> > Results:
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":52},
> >   "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
> >   }}
> >
> >
> > My schema.xml is very simple:
> >
> > 
> > 
> >   
> >  > />
> >  > positionIncrementGap="100">
> >   
> > 
> >  > words="stopwords.txt" enablePositionIncrements="true" />
> > 
> >   
> >   
> > 
> >  > words="stopwords.txt" enablePositionIncrements="true" />
> >  > ignoreCase="true" expand="true"/>
> > 
> >   
> > 
> > 
> >   
> >  > pattern="[^a-zA-Z0-9]"/>
> > 
> >   
> > 
> >  > positionIncrementGap="0"/>
> >  
> >  
> > > required="true"/>
> > > required="true"/>
> >
> >  
> >  id
> > 
> >
> > As is my solrconfig.xml:
> >
> > 
> > 
> >   LUCENE_40
> >   
> >   
> >   
> >   
> >   
> >   
> >   
> >   
> >   ${solr.data.dir:}
> >> class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >   
> > 
> >   ${solr.data.dir:}
> > 
> >   
> >   
> >   
> >   
> >   
> >   
> > 
> >   true
> >
> >   
> >   
> > 
> >
> >
>


Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
To illustrate:

[cluster diagram stripped by the list archive]

Taking this example, 8983 and 8984 are the shard "owners"; 7501/7502 are just
replicas.

If I stop all instances and then restart 8983 or 8984 first, they won't run
and ask for the replicas to be started...




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019103.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query Time Field Boost with MoreLikeThis on solr 3.6

2012-11-08 Thread Eva Lacy
I've been trying to boost fields using MoreLikeThis, but I haven't been able
to make the order of the products or their scores change by changing the
field boosts.

I've tried using mlt.ql=field1,field2&mlt.qf=field1^2+field2^1 and several
other configurations of the URL to try and boost fields; it doesn't seem to
make a difference. Any idea how I can boost a field this way?

Eva
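
One thing worth double-checking: the field-list parameter is mlt.fl (mlt.ql
is not a MoreLikeThis parameter), fields boosted in mlt.qf must also appear
in mlt.fl, and boosting may additionally require mlt.boost=true. A minimal
SolrJ sketch against the dedicated /mlt handler, with made-up field names (on
3.6 the client class is CommonsHttpSolrServer rather than HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MltBoostDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("id:1234");        // seed document (hypothetical)
    q.set("qt", "/mlt");                           // MoreLikeThisHandler
    q.set("mlt.fl", "field1,field2");              // fields mined for interesting terms
    q.set("mlt.qf", "field1^2 field2^1");          // per-field boosts; fields must be in mlt.fl
    q.set("mlt.boost", true);                      // boost query terms by their score
    q.set("mlt.mintf", 1);                         // relax defaults so short fields contribute
    q.set("mlt.mindf", 1);
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getResults());
  }
}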


Splitting data into an array / lookup

2012-11-08 Thread poir...@googlemail.com
Hey all,

I have a query based on a value I'm getting:

 

where all the fields I want are populated correctly, including the multivalued
one, which has the format:



But I then want to take the values from 'findID', which are in an array at
the moment:

1012

and submit them to another table to look up the values, which I do with
something like:

 

   

This kinda works: I get the first value back, but not the second, from
findID.

The field in the schema is set to multiValued, but I'm not sure what else I
need to do to get all the values back into the field.

Any help is appreciated :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Splitting-data-into-an-array-lookup-tp4019105.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is it possible to save the search query?

2012-11-08 Thread Jorge Luis Betancourt Gonzalez
I think that Solr by itself doesn't store the queries (correct me if I'm 
wrong about this), but you can accomplish what you want by processing the Solr 
log (it's the only way, I think). From the Solr log you can get the queries, 
process them according to your needs, and change the boost parameters in your 
app or Solr config. 
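
A minimal sketch of that log scraping; the file name and the params={...}
line layout are assumptions based on Solr's default request logging, so
adjust the pattern to whatever your container's log format actually is:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryLogScraper {
  public static void main(String[] args) throws Exception {
    // Default request log lines look roughly like:
    //   ... path=/select params={q=cashier2&qf=data^2+id&...} hits=6 status=0 QTime=4
    Pattern p = Pattern.compile("params=\\{([^}]*)\\}");
    BufferedReader in = new BufferedReader(new FileReader("solr.log"));
    String line;
    while ((line = in.readLine()) != null) {
      Matcher m = p.matcher(line);
      if (m.find()) {
        for (String kv : m.group(1).split("&")) {
          if (kv.startsWith("q=") || kv.startsWith("qf=")) {
            System.out.println(kv);   // collect these per request to diff later
          }
        }
      }
    }
    in.close();
  }
}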

On Nov 8, 2012, at 11:32 AM, Otis Gospodnetic  
wrote:

> Hi,
> 
> Aha, I think I understand.  Yes, you could collect all doc IDs from each
> query and find the differences.  There is nothing in Solr that can find
> those differences or that would store doc IDs of returned hits in the first
> place, so you would have to implement this yourself.  Sematext's Search
> Analytics service my be of help here in the sense that all data you
> need (queries, doc IDs, etc.) are collected, so it would be a matter of
> providing an API to get the data for off-line analysis.  But this data
> collection+diffing is also something you could implement yourself.  One
> thing to think about - what do you do when a query returns a lrge
> number of hits.  Do you really want/need to get IDs for all of them, or
> only a page at a time.
> 
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
> 
> 
> On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha 
> wrote:
> 
>> Hi,
>> 
>> The following is the example;
>> 1st query:
>> 
>> 
>> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data
>> ^2
>> id&start=0&rows=11&fl=data,id
>> 
>> Next query:
>> 
>> 
>> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data
>> id^2&start=0&rows=11&fl=data,id
>> 
>> In the 1st query the the field 'data' is boosted by 2. However may be the
>> user was not satisfied with the response. Thus in the next query he
>> boosted the field 'id' by 2.
>> 
>> I want to record both the queries and compare between the two, meaning,
>> what are the changes implemented on the 2nd query which are not present in
>> the previous one.
>> 
>> Thanks and regards,
>> Romita Saha
>> 
>> 
>> 
>> From:   Otis Gospodnetic 
>> To: solr-user@lucene.apache.org,
>> Date:   11/08/2012 01:35 PM
>> Subject:Re: is it possible to save the search query?
>> 
>> 
>> 
>> Hi,
>> 
>> Compare in what sense?  An example will help.
>> 
>> Otis
>> --
>> Performance Monitoring - http://sematext.com/spm
>> On Nov 7, 2012 8:45 PM, "Romita Saha" 
>> wrote:
>> 
>>> Hi All,
>>> 
>>> Is it possible to record a search query in solr and then compare it with
>>> the previous search query?
>>> 
>>> Thanks and regards,
>>> Romita Saha
>>> 
>> 
>> 
> 
> 
> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
> INFORMATICAS...
> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
> 
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci


10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci


Pay for support?

2012-11-08 Thread Jeff Rhines
Is there a service that I can pay to answer questions while I'm configuring and 
troubleshooting a Solr deployment?

Re: SolrCloud with JBoss

2012-11-08 Thread Otis Gospodnetic
Solr is a webapp packaged as a WAR, so you can deploy it in JBoss as such.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 8, 2012 1:15 PM, "Carlos Alexandro Becker" 
wrote:

> How can I made SolrCloud work with JBoss? I only find examples with Jetty,
> running the start.jar with a lot of params..
> Is possible to reproduce it with JBoss AS 7.1?
>
> Thanks in advance.
>
> --
> Atenciosamente,
> *Carlos Alexandro Becker*
> http://caarlos0.github.com/about
>


Re: Pay for support?

2012-11-08 Thread Gora Mohanty
On 9 November 2012 00:13, Jeff Rhines  wrote:
> Is there a service that I can pay to answer questions while I'm configuring 
> and troubleshooting a Solr deployment?

http://wiki.apache.org/solr/Support

Regards,
Gora


Re: New solrcloud deployment, no results

2012-11-08 Thread Jeff Rhines
Thanks for looking at it.

Id is usually going to be as follows:

"some.domain.name_SOMELONGSHA1HASH:/FileName.ext/somechars/1"

I indexed it so I could search for the domain name or the hash without storing 
it a second time. I'll convert to a string and see if this fixes the problem.

On Nov 8, 2012, at 8:45 AM, Erick Erickson wrote:

> Hmmm, I tried this with a 2 shard cluster and it works just fine, using
> your schema, solrconfig and query so I'm puzzled. What happens when you
> look at your cluster with the admin page? When you dive into collection1,
> does it show any documents?
> 
> Also, look at admin/schema-browser and look at the actual fields, to see if
> there's any data indexed.
> 
> 
> One thing though, I'd _seriously_ consider making the id a simple "string"
> type. It's possible that you're having some sort of wonkiness as a result
> of tokenizing your <uniqueKey>. I know of no _specific_ issues here, but it
> makes me really uneasy to see that your id field is tokenized in your
> schema given that Solr pretty much assumes that <uniqueKey> is a single
> token/document. There is some slight evidence for this in that your
> numfound is 6 but the data isn't being echoed (although it is for me), but
> that's just guessing.
> 
> Best
> Erick
> 
> P.S. If you're still stumped, can you also post the docs you're indexing?
> Or at least their IDs so I can see what happens then?
> 
> 
> 
> On Wed, Nov 7, 2012 at 4:20 PM, Jeff Rhines  wrote:
> 
>> I have a cluster of 6 shards of Solr 4.0.0 deployed, one machine each,
>> with no replicas, and another single machine running a zookeeper ensemble
>> of 5. Using python sunburnt, I submit six documents with separate ids and
>> populated text fields and commit them. No errors are reported. When I
>> search ( /solr/collection1/select?q=*%3A*&fl=id&wt=json&indent=true ), I
>> see no results, but numFound 6. I'm sure I've misconfigured something, and
>> I'm hoping more experienced folk can see what it is. If you have any
>> troubleshooting tips, I'll try anything at this point.
>> 
>> Thanks,
>> Jeff
>> 
>> Results:
>> {
>>  "responseHeader":{
>>"status":0,
>>"QTime":52},
>>  "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
>>  }}
>> 
>> 
>> My schema.xml is very simple:
>> 
>> 
>> 
>>  
>>> />
>>> positionIncrementGap="100">
>>  
>>
>>> words="stopwords.txt" enablePositionIncrements="true" />
>>
>>  
>>  
>>
>>> words="stopwords.txt" enablePositionIncrements="true" />
>>> ignoreCase="true" expand="true"/>
>>
>>  
>>
>>
>>  
>>> pattern="[^a-zA-Z0-9]"/>
>>
>>  
>>
>>> positionIncrementGap="0"/>
>> 
>> 
>>   > required="true"/>
>>   > required="true"/>
>>   
>> 
>> id
>> 
>> 
>> As is my solrconfig.xml:
>> 
>> 
>> 
>>  LUCENE_40
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  ${solr.data.dir:}
>>  > class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>>  
>>
>>  ${solr.data.dir:}
>>
>>  
>>  
>>  
>>  
>>  
>>  
>>
>>  true
>>   
>>  
>>  
>> 
>> 
>> 



Re: SolrCloud with JBoss

2012-11-08 Thread Carlos Alexandro Becker
Hm, but how do I configure ZooKeeper?

Do I have to do any custom setup?

PS: I'm using the Solr Maven repository, because I have some custom classes.

Thanks in advance.


On Thu, Nov 8, 2012 at 4:45 PM, Otis Gospodnetic  wrote:

> Solr is a webapp in a war, so you can deploy it in jboss as such.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Nov 8, 2012 1:15 PM, "Carlos Alexandro Becker" 
> wrote:
>
> > How can I made SolrCloud work with JBoss? I only find examples with
> Jetty,
> > running the start.jar with a lot of params..
> > Is possible to reproduce it with JBoss AS 7.1?
> >
> > Thanks in advance.
> >
> > --
> > Atenciosamente,
> > *Carlos Alexandro Becker*
> > http://caarlos0.github.com/about
> >
>



-- 
Atenciosamente,
*Carlos Alexandro Becker*
http://caarlos0.github.com/about


Re: Skewed IDF in multi lingual index

2012-11-08 Thread Tom Burton-West
Hi Markus,

No answers, but I am very interested in what you find out.  We currently
index all languages in one index, which presents different IDF issues, but
we are interested in exploring alternatives such as the one you describe.

Tom Burton-West

http://www.hathitrust.org/blogs/large-scale-search

On Thu, Nov 8, 2012 at 11:13 AM, Markus Jelsma
wrote:

> Hi,
>
> We're testing a large multi lingual index with _LANG fields for each
> language and using dismax to query them all. Users provide, explicit or
> implicit, language preferences that we use for either additive or
> multiplicative boosting on the language of the document. However, additive
> boosting is not adequate because it cannot overcome the extremely high IDF
> values for the same word in another language so regardless of the the
> preference, foreign documents are returned. Multiplicative boosting solves
> this problem but has the other downside as it doesn't allow us with
> standard qf=field^boost to prefer documents in another language above the
> preferred language because the multiplicative is so strong. We do use the
> def function (boost=def(query($qq),.3)) to prevent one boost query to
> return 0 and thus a product of 0 for all boost queries. But it doesn't help
> that much
>
> This all comes down to IDF differences between the languages, even common
> words such as country names like `india` show large differences in IDF. Is
> here anyone with some hints or experiences to share about skewed IDF in
> such an index?
>
> Thanks,
> Markus
>


Re: Pay for support?

2012-11-08 Thread Gora Mohanty
On 9 November 2012 00:17, Gora Mohanty  wrote:
> On 9 November 2012 00:13, Jeff Rhines  wrote:
>> Is there a service that I can pay to answer questions while I'm configuring 
>> and troubleshooting a Solr deployment?
>
> http://wiki.apache.org/solr/Support

This reminds me of a question that I had: what is the etiquette
for getting one's own company listed on that page? Is it kosher to
edit it oneself?

Regards,
Gora


Re: Pay for support?

2012-11-08 Thread Otis Gospodnetic
Yes. :)

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 8, 2012 1:53 PM, "Gora Mohanty"  wrote:

> On 9 November 2012 00:17, Gora Mohanty  wrote:
> > On 9 November 2012 00:13, Jeff Rhines  wrote:
> >> Is there a service that I can pay to answer questions while I'm
> configuring and troubleshooting a Solr deployment?
> >
> > http://wiki.apache.org/solr/Support
>
> This reminds me of a question that I had: What is the etiquette
> to getting one's own company listed on that page. Is it kosher to
> edit it oneself?
>
> Regards,
> Gora
>


Re: searching camel cased terms with phrase queries

2012-11-08 Thread Dmitry Kan
Thanks, Jack. This filter should help with user input that lacks clear
lexical boundaries, i.e., breaking would-be compound words into sub-words
on the query side. It does still require mining the dictionary, but that is
doable with some "simple" camel-case term-frequency analysis.

But would it help really to match with the indexed data?

Tried with solr 4.0.0-BETA (hopefully not too different from stable 4.0
release on this side):

text field in schema (slightly modified "text_general" type by adding WDF
and DCWTF + placing LCF in between them; english-common-nouns.txt is from
http://www.typo3-media.com/fileadmin/files/wordlists/english-common-nouns.txt
with the word 'rice' removed to make the example below make more sense):



  








  
  








  




index:

product for PricewaterhouseCoopers company is this!

query:

"product for Pricewaterhousecoopers company is this!"

I believe there is no match here, according to the terms and their positions
on the analysis page. Some misconfiguration? I included DCWTF on the query
side as well, as opposed to, e.g., the approach here
http://www.typo3-media.com/blog/solr-noun-expansion.html, so as to account
for user input with no lexical boundaries in compound words.

-- Dmitry


On Thu, Nov 8, 2012 at 5:04 PM, Jack Krupansky wrote:

> I forgot to mention DictionaryCompoundWordTokenFilterFactory. It does
> require you to create a dictionary of terms, as opposed to using the terms
> that have been encountered in the index.
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Krupansky
> Sent: Wednesday, November 07, 2012 8:14 AM
> To: solr-user@lucene.apache.org
> Subject: Re: searching camel cased terms with phrase queries
>
>
> This is one of those areas of Solr where you can refine and make
> improvements, as you have done, but never actually reach 100% satisfaction.
> And, in some cases, as here, you have a choice of settings and no single
> combination covers all cases.
>
> In this case, you really need compound-term recognition - detecting that
> two
> or more terms have been juxtaposed with no lexical boundary. Google has it,
> and I 'm sure some Solr users have implemented it on their own, but it
> isn't
> in Solr proper, yet.
>
> WDF provides a partial approximation, by generating extra, compound terms
> at
> index time. That works well when ALL of the terms are written together, but
> not when only a subset are written together without lexical boundaries, as
> in your final example.
>
> So, you COULD go the full Google route with a lot of additional effort, or
> accept that you offer only a reasonable approximation. Your choice.
>
> So, pick the approximation which seems "best" and accept that it doesn't
> handle the other cases.
>
> BTW, the proper name is "PricewaterhouseCoopers".
>
> -- Jack Krupansky
>
> -Original Message- From: Dmitry Kan
> Sent: Wednesday, November 07, 2012 1:58 AM
> To: solr-user@lucene.apache.org
> Subject: searching camel cased terms with phrase queries
>
> Hello list,
>
> There was a number of threads about handling camel cased words apparently
> in the past (
> http://search-lucene.com/?q=camel+case&fc_project=Lucene&fc_project=Solr
> ).
> Our case is somewhat different from them.
>
> ===
> Configuration & example
> ===
>
> To illustrate the issue, let me give you a real example from our data.
> Suppose there is a term in the original text: SmartTV.
>
> If a user wants to type "SmartTV" and "smart tv", we want both to hit the
> original term SmartTV. In order to achieve this, the following filter is
> used in our solr 3.4 schema:
>
> index side:
>
>  generateWordParts="1"
>generateNumberParts="0"
>catenateWords="0"
>catenateNumbers="0"
>catenateAll="0"
>preserveOriginal="1"
>spiltOnCaseChange="1"
>  />
>
> query side:
>
>  generateWordParts="1"
>generateNumberParts="0"
>catenateWords="0"
>catenateNumbers="0"
>catenateAll="0"
>preserveOriginal="1"
>spiltOnCaseChange="1"
>  />
>
> (no differences)
>
> Copying from the analysis page, the index will contain the following terms
> and their positions:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1,
> spiltOnCaseChange=1, generateNumberParts=0, catenateWords=0,
> luceneMatchVersion=LUCENE_34, generateWordParts=1, catenateAll=0,
> catenateNumbers=0}
>
>   position  term     startOffset  endOffset
>   1         SmartTV  0            7
>   1         Smart    0            5
>   2         TV       5            7
>
> (there are tokenizer StandardTokenizerFactory and StandardFilterFactory
> preceeding this filter, but as t

Re: New solrcloud deployment, no results

2012-11-08 Thread Jeff Rhines
That did it, good sirs. Additionally, debugQuery=true no longer gives me an NPE.

Best Regards,
Jeff

On Nov 8, 2012, at 11:17 AM, Timothy Potter wrote:

> I've seen the same exact behavior when using analyzed key fields, switching
> to string as Erick recommends should solve your problem.
> 
> Cheers,
> Tim
> 
> On Thu, Nov 8, 2012 at 7:45 AM, Erick Erickson wrote:
> 
>> Hmmm, I tried this with a 2 shard cluster and it works just fine, using
>> your schema, solrconfig and query so I'm puzzled. What happens when you
>> look at your cluster with the admin page? When you dive into collection1,
>> does it show any documents?
>> 
>> Also, look at admin/schema-browser and look at the actual fields, to see if
>> there's any data indexed.
>> 
>> 
>> One thing though, I'd _seriously_ consider making the id a simple "string"
>> type. It's possible that you're having some sort of wonkiness as a result
>> of tokenizing your <uniqueKey>. I know of no _specific_ issues here, but it
>> makes me really uneasy to see that your id field is tokenized in your
>> schema given that Solr pretty much assumes that <uniqueKey> is a single
>> token/document. There is some slight evidence for this in that your
>> numfound is 6 but the data isn't being echoed (although it is for me), but
>> that's just guessing.
>> 
>> Best
>> Erick
>> 
>> P.S. If you're still stumped, can you also post the docs you're indexing?
>> Or at least their IDs so I can see what happens then?
>> 
>> 
>> 
>> On Wed, Nov 7, 2012 at 4:20 PM, Jeff Rhines  wrote:
>> 
>>> I have a cluster of 6 shards of Solr 4.0.0 deployed, one machine each,
>>> with no replicas, and another single machine running a zookeeper ensemble
>>> of 5. Using python sunburnt, I submit six documents with separate ids and
>>> populated text fields and commit them. No errors are reported. When I
>>> search ( /solr/collection1/select?q=*%3A*&fl=id&wt=json&indent=true ), I
>>> see no results, but numFound 6. I'm sure I've misconfigured something,
>> and
>>> I'm hoping more experienced folk can see what it is. If you have any
>>> troubleshooting tips, I'll try anything at this point.
>>> 
>>> Thanks,
>>> Jeff
>>> 
>>> Results:
>>> {
>>>  "responseHeader":{
>>>"status":0,
>>>"QTime":52},
>>>  "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
>>>  }}
>>> 
>>> 
>>> My schema.xml is very simple:
>>> 
>>> 
>>> 
>>>  
>>>>> />
>>>>> positionIncrementGap="100">
>>>  
>>>
>>>>> words="stopwords.txt" enablePositionIncrements="true" />
>>>
>>>  
>>>  
>>>
>>>>> words="stopwords.txt" enablePositionIncrements="true" />
>>>>> ignoreCase="true" expand="true"/>
>>>
>>>  
>>>
>>>
>>>  
>>>>> pattern="[^a-zA-Z0-9]"/>
>>>
>>>  
>>>
>>>>> positionIncrementGap="0"/>
>>> 
>>> 
>>>   >> required="true"/>
>>>   >> required="true"/>
>>>   
>>> 
>>> id
>>> 
>>> 
>>> As is my solrconfig.xml:
>>> 
>>> 
>>> 
>>>  LUCENE_40
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  ${solr.data.dir:}
>>>  >> class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>>>  
>>>
>>>  ${solr.data.dir:}
>>>
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>
>>>  true
>>>   
>>>  
>>>  
>>> 
>>> 
>>> 
>> 



Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Aaron Daubman
Greetings,

I have several custom QueryComponents that have high one-time startup costs
(hashing things in the index, caching things from a RDBMS, etc...)

Is there a way to prevent solr from accepting connections before all
QueryComponents are "ready"?

Especially, since many of our instance are load-balanced (and
added-in/removed automatically based on admin/ping responses) preventing
ping from answering prior to all custom QueryComponents being ready would
be ideal...

Thanks,
 Aaron
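
One possible approach, sketched against the Solr 4.0 plugin API (untested;
the init work is a stand-in for the hashing/caching described above): do the
one-time work in inform(SolrCore), which runs while the core is loading, and
fail fast with a 503 until it completes so a ping-based health check keeps
the node out of the load balancer's pool:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.solr.common.SolrException;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.util.plugin.SolrCoreAware;

public class WarmableQueryComponent extends QueryComponent implements SolrCoreAware {
  private final AtomicBoolean ready = new AtomicBoolean(false);

  @Override
  public void inform(SolrCore core) {
    buildCaches();     // expensive one-time setup: hash the index, cache RDBMS rows, ...
    ready.set(true);
  }

  private void buildCaches() {
    // stand-in for the real startup work
  }

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    if (!ready.get()) {
      // Non-200 response so the load balancer keeps this node out of rotation.
      throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,
          "query component still warming");
    }
    super.prepare(rb);
  }
}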


RE: best practice for restarting the entire SolrCloud cluster

2012-11-08 Thread Markus Jelsma
Hi - I think you're seeing:
https://issues.apache.org/jira/browse/SOLR-3993
 
 
-Original message-
> From:Bill Au 
> Sent: Thu 08-Nov-2012 21:16
> To: solr-user@lucene.apache.org
> Subject: best practice for restarting the entire SolrCloud cluster
> 
> I have a simple SolrCloud cluster with 4 Solr instances and 1 shard.  I can
> start and stop individual Solr instances without any problem.  But not when
> I have to shutdown all the Solr instances at the same time.
> 
> After shutting down all the Solr instances, the first instance that starts
> up wait for all the replicas:
> 
> INFO: Waiting until we see more replicas up: total=4 found=3
> timeoutin=169243
> 
> In the meantime, any additional Solr instances that start up while the
> first one is waiting can't get the leader from zookeeper:
> 
> SEVERE: Error getting leader from zk
> org.apache.solr.common.SolrException: Could not get leader props
> 
> When the first Solr instance see all the replicas, it becomes the leader:
> 
> INFO: Enough replicas found to continue.
> INFO: I may be the new leader - try and sync
> 
> But it fails to sync with the instances that had failed to get the leader
> before:
> 
> WARNING: PeerSync: core=collection1 url=http://host2:8983/solr  exception
> talking to http://host2:8983/solr/collection1/, failed
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://host2:8983/solr/collection1
> 
> So I ended up with one for more replicas down after the restart.  I had to
> figure out which replica is down and restart them.
> 
> What I also discovered is that if I start the first Solr instance and wait
> until it returns after the leaderVoteWait of 3 minutes, the rest of the
> Solr instance can be started without any problem since by then they can get
> the leader from zookeeper.
> 
> Is there a better way to restart an entire SolrCloud cluster?
> 
> Bill
> 


Re: best practice for restarting the entire SolrCloud cluster

2012-11-08 Thread Bill Au
My replicas are actually on different machines, so they do come up.  The
problem I found is that since they can't get the leader, they come up
but are not part of the cluster.  I can still do a local search with
distrib=false.  They do not retry to get the leader, so I have to restart
them after the leader has started in order to get them back into the
cluster.

Bill


On Thu, Nov 8, 2012 at 4:02 PM, Markus Jelsma wrote:

> Hi - i think you're seeing:
> https://issues.apache.org/jira/browse/SOLR-3993
>
>
> -Original message-
> > From:Bill Au 
> > Sent: Thu 08-Nov-2012 21:16
> > To: solr-user@lucene.apache.org
> > Subject: best practice for restarting the entire SolrCloud cluster
> >
> > I have a simple SolrCloud cluster with 4 Solr instances and 1 shard.  I
> can
> > start and stop individual Solr instances without any problem.  But not
> when
> > I have to shutdown all the Solr instances at the same time.
> >
> > After shutting down all the Solr instances, the first instance that
> starts
> > up wait for all the replicas:
> >
> > INFO: Waiting until we see more replicas up: total=4 found=3
> > timeoutin=169243
> >
> > In the meantime, any additional Solr instances that start up while the
> > first one is waiting can't get the leader from zookeeper:
> >
> > SEVERE: Error getting leader from zk
> > org.apache.solr.common.SolrException: Could not get leader props
> >
> > When the first Solr instance see all the replicas, it becomes the leader:
> >
> > INFO: Enough replicas found to continue.
> > INFO: I may be the new leader - try and sync
> >
> > But it fails to sync with the instances that had failed to get the leader
> > before:
> >
> > WARNING: PeerSync: core=collection1 url=http://host2:8983/solr exception
> > talking to http://host2:8983/solr/collection1/, failed
> > org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> > waiting response from server at: http://host2:8983/solr/collection1
> >
> > So I ended up with one for more replicas down after the restart.  I had
> to
> > figure out which replica is down and restart them.
> >
> > What I also discovered is that if I start the first Solr instance and
> wait
> > until it returns after the leaderVoteWait of 3 minutes, the rest of the
> > Solr instance can be started without any problem since by then they can
> get
> > the leader from zookeeper.
> >
> > Is there a better way to restart an entire SolrCloud cluster?
> >
> > Bill
> >
>


Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Amit Nithian
I think Solr does this by default. Are you executing warming queries in
the firstSearcher so that these actions are done before Solr is ready to
accept real queries?


On Thu, Nov 8, 2012 at 11:54 AM, Aaron Daubman  wrote:

> Greetings,
>
> I have several custom QueryComponents that have high one-time startup costs
> (hashing things in the index, caching things from a RDBMS, etc...)
>
> Is there a way to prevent solr from accepting connections before all
> QueryComponents are "ready"?
>
> Especially, since many of our instance are load-balanced (and
> added-in/removed automatically based on admin/ping responses) preventing
> ping from answering prior to all custom QueryComponents being ready would
> be ideal...
>
> Thanks,
>  Aaron
>


Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Aaron Daubman
Amit,

I am using warming/firstSearcher queries to ensure this happens before any
external queries are received. However, unless I am misinterpreting the
logs, Solr starts responding to admin/ping requests before firstSearcher
completes; the LB then puts the Solr instance back in the pool, and it
starts accepting connections...


On Thu, Nov 8, 2012 at 4:24 PM, Amit Nithian  wrote:

> I think Solr does this by default. Are you executing warming queries in
> the firstSearcher listener so that these actions complete before Solr is
> ready to accept real queries?
>
>
> On Thu, Nov 8, 2012 at 11:54 AM, Aaron Daubman  wrote:
>
> > Greetings,
> >
> > I have several custom QueryComponents that have high one-time startup costs
> > (hashing things in the index, caching things from an RDBMS, etc...)
> >
> > Is there a way to prevent solr from accepting connections before all
> > QueryComponents are "ready"?
> >
> > Especially, since many of our instances are load-balanced (and
> > added-in/removed automatically based on admin/ping responses) preventing
> > ping from answering prior to all custom QueryComponents being ready would
> > be ideal...
> >
> > Thanks,
> >  Aaron
> >
>


Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Amit Nithian
Sorry, I misunderstood. I am having difficulty finding documentation on
this, and the exact load order is never clear. It seems odd that you'd be
getting requests while the filter (SolrDispatchFilter) hasn't fully loaded
yet.

I didn't think the admin handler would allow requests while the dispatch
filter is still initializing, but it sounds like it does? I'll have to play
with this to see. I'm curious what the problem is, since we have a similar
setup but not as bad of an init problem. (Plus, to avoid this problem when I
deploy, my deploy script runs some actual simple test queries to ensure they
return before enabling the ping handler to return 200s.)
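For what it's worth, the smoke test itself can be tiny. A sketch in SolrJ
(the core URL and probe query are placeholders, and it assumes
useColdSearcher=false so the first request blocks until warming finishes):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SmokeTest {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL; adjust for the real deployment.
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Trivial probe; with useColdSearcher=false this request waits
        // until the warming searcher has been registered.
        SolrQuery probe = new SolrQuery("*:*");
        probe.setRows(0);

        QueryResponse rsp = server.query(probe);
        if (rsp.getStatus() != 0) {
            throw new IllegalStateException("smoke test failed: " + rsp.getStatus());
        }
        System.out.println("searcher is warm, QTime=" + rsp.getQTime());
    }
}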

Cheers
Amit


On Thu, Nov 8, 2012 at 1:33 PM, Aaron Daubman  wrote:

> Amit,
>
> I am using warming/firstSearcher queries to ensure this happens before any
> external queries are received. However, unless I am misinterpreting the
> logs, Solr starts responding to admin/ping requests before firstSearcher
> completes; the LB then puts the Solr instance back in the pool, and it
> starts accepting connections...
>
>
> On Thu, Nov 8, 2012 at 4:24 PM, Amit Nithian  wrote:
>
> > I think Solr does this by default. Are you executing warming queries in
> > the firstSearcher listener so that these actions complete before Solr is
> > ready to accept real queries?
> >
> >
> > On Thu, Nov 8, 2012 at 11:54 AM, Aaron Daubman 
> wrote:
> >
> > > Greetings,
> > >
> > > I have several custom QueryComponents that have high one-time startup costs
> > > (hashing things in the index, caching things from an RDBMS, etc...)
> > >
> > > Is there a way to prevent solr from accepting connections before all
> > > QueryComponents are "ready"?
> > >
> > > Especially, since many of our instances are load-balanced (and
> > > added-in/removed automatically based on admin/ping responses), preventing
> > > ping from answering prior to all custom QueryComponents being ready would
> > > be ideal...
> > >
> > > Thanks,
> > >  Aaron
> > >
> >
>


Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Aaron Daubman
>  (plus when I deploy, my deploy script
> runs some actual simple test queries to ensure they return before enabling
> the ping handler to return 200s) to avoid this problem.
>

What are you doing to programmatically disable/enable the ping handler?
This sounds like exactly what I should be doing as well...


Re: SolrCloud with JBoss

2012-11-08 Thread Otis Gospodnetic
Hi,

You should just set up ZK independently of JBoss/Solr and then point Solr
to it (e.g. via the zkHost system property on the JVM running Solr).

Check this: http://search-lucene.com/?q=solr+jboss

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Nov 8, 2012 at 1:50 PM, Carlos Alexandro Becker
wrote:

> Hm, but how do I configure ZooKeeper?
>
> Do I have to do any custom setup?
>
> PS: I'm using the Solr Maven artifacts, because I have some custom classes...
>
> Thanks in advance.
>
>
> On Thu, Nov 8, 2012 at 4:45 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com
> > wrote:
>
> > Solr is a webapp in a war, so you can deploy it in JBoss as such.
> >
> > Otis
> > --
> > Performance Monitoring - http://sematext.com/spm
> > On Nov 8, 2012 1:15 PM, "Carlos Alexandro Becker" 
> > wrote:
> >
> > > How can I make SolrCloud work with JBoss? I can only find examples with
> > > Jetty, running start.jar with a lot of params...
> > > Is it possible to reproduce this with JBoss AS 7.1?
> > >
> > > Thanks in advance.
> > >
> > > --
> > > Atenciosamente,
> > > *Carlos Alexandro Becker*
> > > http://caarlos0.github.com/about
> > >
> >
>
>
>
> --
> Atenciosamente,
> *Carlos Alexandro Becker*
> http://caarlos0.github.com/about
>


Re: Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0

2012-11-08 Thread Shawn Heisey

On 11/6/2012 12:25 AM, Shawn Heisey wrote:
If I use this exact same code to talk to a Solr 3.5.0 server (older 
version of the SOLR-1972 patch applied) with the ping handler in the 
"enabled" state, I get the following exception.  The /admin/ping 
handler works in a browser on both Solr versions:


Caused by: org.apache.solr.client.solrj.SolrServerException: Error 
executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:98)

at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at com.newscom.common.solr.Core.ping(Core.java:396)
... 4 more
Caused by: java.lang.RuntimeException: Invalid version (expected 2, 
but 60) or the data in not in 'javabin' format
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
at 
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:384)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)

... 6 more


No response in two days.  See original message for full details.

I grabbed the javabin response on 4.1-SNAPSHOT and on 3.5.0. Aside from 
the 4.1 version having a lot more data, I couldn't see a lot of 
difference between the two (examining in notepad, not a binary editor), 
but as I am unfamiliar with the javabin format, I cannot say.


My best guess at this point is that something in the 3.5.0 response, or
the http headers sent with the response, makes SolrJ barf.  (The "60" in
the exception is the ASCII code for '<', which suggests SolrJ is being
handed an XML or HTML body where it expects javabin.)


Right now I am treating an exception that contains the "Invalid version" 
message as a "no info" status.  With the 4.1-SNAPSHOT server, I can 
reliably say "OK" instead.  With both versions, I can detect the 
"Disabled" state.


Unless someone objects, I will be filing a bug against 3.5, though a 
workaround in the newer SolrJ would be welcome as well.
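For reference, the classification I described looks roughly like this (an
illustrative sketch rather than my exact code; it assumes a disabled handler
surfaces as a SolrException carrying the 503 code):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrException;

public class PingCheck {
    public enum Status { OK, DISABLED, NO_INFO, DOWN }

    public static Status ping(SolrServer server) {
        SolrQuery query = new SolrQuery();
        query.setRequestHandler("/admin/ping");
        try {
            server.query(query);
            return Status.OK;           // 4.1-SNAPSHOT: the query returns normally
        } catch (SolrException e) {
            // A disabled ping handler reports its state as a 503.
            return (e.code() == 503) ? Status.DISABLED : Status.DOWN;
        } catch (SolrServerException e) {
            Throwable cause = e.getRootCause();
            if (cause != null && cause.getMessage() != null
                    && cause.getMessage().contains("Invalid version")) {
                return Status.NO_INFO;  // the 3.5.0 javabin quirk described above
            }
            return Status.DOWN;
        }
    }
}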


Thanks,
Shawn



RE: Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0

2012-11-08 Thread Dyer, James
Shawn,

Could this be a side-effect of SOLR-4019?  In branch_4.0 this was commit
r1405894.  Prior to this commit, PingRequestHandler would throw a
SolrException for a 503/Service Unavailable.  The change is that the
exception isn't actually thrown but rather sent in place of the response.
This prevents the container from logging huge stack traces just because
PingRequestHandler is in a "disabled" state.  Prior to this, SolrException
had logging disabled for 503's with hardcoding, but this broke other uses of
503 SolrExceptions.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Tuesday, November 06, 2012 1:25 AM
To: solr-user@lucene.apache.org
Subject: Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0

If I send a ping request to Solr 4.1 from SolrJ 4.1, it works.  I don't
have an exact revision number from branch_4x; I don't know how to get it
from SolrJ.  The 4.1 server is running solr-impl 4.1-SNAPSHOT 1401798M
with the patch from SOLR-1972 applied, and it's somewhat newer than SolrJ.


Java code snippets:

private static final String PING_HANDLER = "/admin/ping";

// Point the query at the ping handler and execute it; any problem
// surfaces as an exception from query().
query.setRequestHandler(PING_HANDLER);
response = _querySolr.query(query);


If I use this exact same code to talk to a Solr 3.5.0 server (older 
version of the SOLR-1972 patch applied) with the ping handler in the 
"enabled" state, I get the following exception.  The /admin/ping handler 
works in a browser on both Solr versions:

Caused by: org.apache.solr.client.solrj.SolrServerException: Error 
executing query
 at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:98)
 at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
 at com.newscom.common.solr.Core.ping(Core.java:396)
 ... 4 more
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 
60) or the data in not in 'javabin' format
 at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
 at 
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:384)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 ... 6 more

If I use the XMLResponseParser instead, then I get a different exception:

Caused by: org.apache.solr.common.SolrException: parsing error
 at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:143)
 at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:104)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:384)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
 at com.newscom.common.solr.Core.ping(Core.java:398)
 ... 4 more
Caused by: java.lang.Exception: really needs to be response or result.  
not:html
 at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:134)
 ... 10 more

I'm already dealing with the expected "Service Unavailable" exception, 
this is something different.  Is it going to be possible to make this 
work?  Should I file an issue in Jira?  Is the problem in the newer 
SolrJ or in the older Solr?  At this time I do not really need the 
information in the response, I just need to be able to judge success 
(Solr is up and working) by nothing being thrown, and be able to look 
into any exception thrown to see whether I've got a disabled handler or 
an error condition.

Thanks,
Shawn





Re: Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0

2012-11-08 Thread Shawn Heisey

On 11/8/2012 3:25 PM, Dyer, James wrote:

Shawn,

Could this be a side-effect of SOLR-4019? In branch_4.0 this was commit
r1405894. Prior to this commit, PingRequestHandler would throw a SolrException
for a 503/Service Unavailable. The change is that the exception isn't actually
thrown but rather sent in place of the response. This prevents the container
from logging huge stack traces just because PingRequestHandler is in a
"disabled" state. Prior to this, SolrException had logging disabled for 503's
with hardcoding, but this broke other uses of 503 SolrExceptions.
My checkout of branch_4x is prior to the change from SOLR-4019. SolrJ is 
considerably older, Solr is somewhat newer.  I'm having a problem with 
3.5.0 servers, not 4.x.


I'm already handling the exception when the ping handler is disabled and 
returns a 503 error.  The exception message is different between the two 
versions, but I'm dealing with that.  My problem happens when the ping 
handler is enabled.  Against a 4.1-SNAPSHOT server, I successfully get a 
QueryResponse object from the query.  Against a 3.5.0 server, I get an 
exception, with this as the relevant piece:


Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 
60) or the data in not in 'javabin' format
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)


In a browser, using /solr/CORE/admin/ping gets me an XML response.  For 
both versions, in the main response section is a status field saying OK, 
and in the responseHeader section is a status field with a value of zero.


I have not tried SolrJ 3.5 against the 3.5 server.  This program is
generating a status page for my entire Solr environment, both 3.5
production and 4.1 development, so I can't run two SolrJ versions.


Thanks,
Shawn



Re: Questions about schema.xml

2012-11-08 Thread Erick Erickson
And, in fact, you do NOT need to have two. If they are both identical, just
specify one analysis chain with no qualifier, i.e.

<analyzer>

On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky wrote:

> Many token filters will be used 100% identically for both "index" and
> "query" analysis, but WordDelimiterFilter is a rare exception. The issue is
> that at index time it has the ability to generate multiple tokens at the
> same position (the "catenate" options), any of which can be queried, but at
> query time it can be problematic to have these "extra" terms (except in
> some conditions), so the WDF settings suppress generation of the extra
> terms.
>
> Another example is synonyms - generate extra terms at index time for
> greater precision of searches, but limit the query terms to exclude the
> "extra" terms.
>
> That's the reason for the occasional asymmetry between index-time and
> query-time analyzers.
>
> -- Jack Krupansky
>
> -Original Message- From: johnmu...@aol.com
> Sent: Wednesday, November 07, 2012 7:13 PM
> To: solr-user@lucene.apache.org
> Subject: Questions about schema.xml
>
>
>
> Hi,
>
>
> Can someone help me understand the meaning of <analyzer type="index"> and
> <analyzer type="query"> in schema.xml, how they are used and what do I get
> back when the values are not the same?
>
>
> For example, given:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> If I make the entire content of "index" the same as "query" (or the other
> way around) how will that impact my search?  And why would I want to not
> make those two blocks the same?
>
>
> Thanks!!!
>
>
> -MJ
>


Using AnalyzingQueryParser - Solr 4.0

2012-11-08 Thread balaji.gandhi
Hi Team,

Just trying to find out how to configure AnalyzingQueryParser in Solr 4.0.
Please let me know.

Thanks,
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-AnalyzingQueryParser-Solr-4-0-tp4019193.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using AnalyzingQueryParser - Solr 4.0

2012-11-08 Thread Jack Krupansky
There isn't a QParserPlugin for that query parser in Solr. You would have
to develop one yourself.


But, why do you think you need that query parser? I mean, the standard query 
parsers/analyzers for Solr are now "multi-term aware" to permit some 
combinations of case filtering and wildcards, for example.
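If you do decide to roll your own, a bare-bones plugin might look roughly
like this. This is an untested sketch against the 4.0 APIs: the class name
is made up, it assumes the df parameter is set, and you would still have to
register the class in solrconfig.xml with a <queryParser> element and select
it via defType or {!...} local-params syntax.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.analyzing.AnalyzingQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class AnalyzingQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {
    }

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws ParseException {
                // Default field comes from the df parameter (assumed to be set).
                String df = getParam(CommonParams.DF);
                Analyzer analyzer = getReq().getSchema().getQueryAnalyzer();
                AnalyzingQueryParser parser =
                    new AnalyzingQueryParser(Version.LUCENE_40, df, analyzer);
                return parser.parse(getString());
            }
        };
    }
}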


-- Jack Krupansky

-Original Message- 
From: balaji.gandhi

Sent: Thursday, November 08, 2012 4:13 PM
To: solr-user@lucene.apache.org
Subject: Using AnalyzingQueryParser - Solr 4.0

Hi Team,

Just trying to find out how to configure AnalyzingQueryParser in Solr 4.0.
Please let me know.

Thanks,
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-AnalyzingQueryParser-Solr-4-0-tp4019193.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Questions about schema.xml

2012-11-08 Thread johnmunir

Thank you everyone for your explanation.  So for WordDelimiterFilter, let me 
see if I got it right.


Given that the out-of-the-box setting for catenateWords is "0" for query but
"1" for index, I don't see how this will give me any hits.  That is, if my
document has "wi-fi", at index time it will be stored as "wifi".  Then at
query time, if I type "wi-fi" (without quotes) I will be searching for
"wi fi" and thus won't get a hit, no?


What about when I *do* quote my search, i.e. I search for "wi-fi" with quotes;
now what am I sending to the searcher: "wi-fi", "wi fi" or "wifi"?  Again, this
is using the default out-of-the-box settings per the above.


The same applies for catenateNumbers.


Btw, I'm looking at this link for the above values: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


--MJ





-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Thu, Nov 8, 2012 6:57 pm
Subject: Re: Questions about schema.xml


And, in fact, you do NOT need to have two. If they are both identical, just
specify one analysis chain with no qualifier, i.e.

<analyzer>

On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky wrote:

> Many token filters will be used 100% identically for both "index" and
> "query" analysis, but WordDelimiterFilter is a rare exception. The issue is
> that at index time it has the ability to generate multiple tokens at the
> same position (the "catenate" options), any of which can be queried, but at
> query time it can be problematic to have these "extra" terms (except in
> some conditions), so the WDF settings suppress generation of the extra
> terms.
>
> Another example is synonyms - generate extra terms at index time for
> greater precision of searches, but limit the query terms to exclude the
> "extra" terms.
>
> That's the reason for the occasional asymmetry between index-time and
> query-time analyzers.
>
> -- Jack Krupansky
>
> -Original Message- From: johnmu...@aol.com
> Sent: Wednesday, November 07, 2012 7:13 PM
> To: solr-user@lucene.apache.org
> Subject: Questions about schema.xml
>
>
>
> Hi,
>
>
> Can someone help me understand the meaning of <analyzer type="index"> and
> <analyzer type="query"> in schema.xml, how they are used and what do I get
> back when the values are not the same?
>
>
> For example, given:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> If I make the entire content of "index" the same as "query" (or the other
> way around) how will that impact my search?  And why would I want to not
> make those two blocks the same?
>
>
> Thanks!!!
>
>
> -MJ
>

 


Re: Questions about schema.xml

2012-11-08 Thread Jack Krupansky
The default settings index BOTH the split parts ("wi" and "fi" at consecutive
positions) and the catenated term "wifi". A query for "wi-fi", either with or
without quotes, becomes the phrase query "wi fi", which matches the indexed
"wi"/"fi" sequence. Incidentally, turning a split term into a phrase query
like that is what "autoGeneratePhraseQueries" controls.


-- Jack Krupansky

-Original Message- 
From: johnmu...@aol.com

Sent: Thursday, November 08, 2012 6:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions about schema.xml


Thank you everyone for your explanation.  So for WordDelimiterFilter, let me 
see if I got it right.



Given that the out-of-the-box setting for catenateWords is "0" for query but
"1" for index, I don't see how this will give me any hits.  That is, if my
document has "wi-fi", at index time it will be stored as "wifi".  Then at
query time, if I type "wi-fi" (without quotes) I will be searching for
"wi fi" and thus won't get a hit, no?



What about when I *do* quote my search, i.e. I search for "wi-fi" with quotes;
now what am I sending to the searcher: "wi-fi", "wi fi" or "wifi"?  Again,
this is using the default out-of-the-box settings per the above.



The same applies for catenateNumbers.


Btw, I'm looking at this link for the above values: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters



--MJ





-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Thu, Nov 8, 2012 6:57 pm
Subject: Re: Questions about schema.xml


And, in fact, you do NOT need to have two. If they are both identical, just
specify one analysis chain with no qualifier, i.e.

<analyzer>

On Thu, Nov 8, 2012 at 9:44 AM, Jack Krupansky wrote:



Many token filters will be used 100% identically for both "index" and
"query" analysis, but WordDelimiterFilter is a rare exception. The issue is
that at index time it has the ability to generate multiple tokens at the
same position (the "catenate" options), any of which can be queried, but at
query time it can be problematic to have these "extra" terms (except in
some conditions), so the WDF settings suppress generation of the extra
terms.

Another example is synonyms - generate extra terms at index time for
greater precision of searches, but limit the query terms to exclude the
"extra" terms.

That's the reason for the occasional asymmetry between index-time and
query-time analyzers.

-- Jack Krupansky

-Original Message- From: johnmu...@aol.com
Sent: Wednesday, November 07, 2012 7:13 PM
To: solr-user@lucene.apache.org
Subject: Questions about schema.xml



Hi,


Can someone help me understand the meaning of <analyzer type="index"> and
<analyzer type="query"> in schema.xml, how they are used and what do I get
back when the values are not the same?


For example, given:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


If I make the entire content of "index" the same as "query" (or the other
way around) how will that impact my search?  And why would I want to not
make those two blocks the same?


Thanks!!!


-MJ






Re: Limit the SolR acces from the web for one user-agent?

2012-11-08 Thread Floyd Wu
Hi Alex, I'd like to know how to "use Client and Server Certificates to
protect the connection and embed those certificates into clients."

Please kindly share your experience.

Floyd


2012/11/8 Alexandre Rafalovitch 

> It is very easy to do this on Apache, but you need to be aware that
> User-Agent is extremely easy to both sniff and spoof.
>
> Have you thought of perhaps using Client and Server Certificates to protect
> the connection and embedding those certificates into clients?
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Thu, Nov 8, 2012 at 9:39 AM, Bruno Mannina  wrote:
>
> > Dear All,
> >
> > I'm using an external program (my own client) to access my Apache-Solr
> > database.
> > I would like to restrict Solr access to a specific User-Agent (defined
> > in my program).
> >
> > I would like to know if it's possible to do that directly in the Solr
> > config, or whether I must handle that in the Apache server?
> >
> > My program only makes requests like this (i.e.):
> > http://xxx.xxx.xxx.xxx:pp/solr/select/?q=ap%3Afuelcell&version=2.2&start=0&rows=10&indent=on
> >
> > I can add to my HTTP component properties a User-Agent, login, password,
> > etc., like a standard HTTP connection.
> >
> > To complete: my software is distributed to several users, and I would
> > like to limit Solr access to these users via my program only.
> > Firefox, Chrome, and IE will be unauthorized.
> >
> > thanks for your comment or help,
> > Bruno
> >
> > Ubuntu 12.04LTS
> > SolR 3.6
> >
>


Re: Limit the SolR acces from the web for one user-agent?

2012-11-08 Thread Alexandre Rafalovitch
I haven't _done_ this myself, but I believe it is a well supported
scenario. See, for example,
http://httpd.apache.org/docs/2.4/ssl/ssl_howto.html#accesscontrol
and
http://stackoverflow.com/questions/1666052/java-https-client-certificate-authentication

Basically, you create a set of self-signed certificates and then your
client has to encrypt the connection and provide the certificate. Somebody
with access to the client can probably still break it and get the
certificates out, but it is quite a bit harder than just running
Wireshark on the same (or even another) machine and checking what custom
header is being used.
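On the Java side, presenting the client certificate might look roughly like
this (a sketch only; the keystore paths, passwords and URL are placeholders,
and the keystores would be built from the self-signed certificates mentioned
above):

import java.io.FileInputStream;
import java.net.URL;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class ClientCertExample {
    public static void main(String[] args) throws Exception {
        // Keystore holding the client certificate + private key (placeholder path).
        KeyStore clientStore = KeyStore.getInstance("JKS");
        clientStore.load(new FileInputStream("client.jks"), "changeit".toCharArray());
        KeyManagerFactory kmf =
            KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientStore, "changeit".toCharArray());

        // Truststore holding the self-signed server certificate.
        KeyStore trustStore = KeyStore.getInstance("JKS");
        trustStore.load(new FileInputStream("truststore.jks"), "changeit".toCharArray());
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);

        // Placeholder URL; the servlet container must be set up to require client auth.
        HttpsURLConnection conn = (HttpsURLConnection)
            new URL("https://solr.example.com:8443/solr/select?q=*:*").openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}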

This is no longer a SOLR question, but I am sure StackOverflow can help
with more specific issues, if needed.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Nov 8, 2012 at 10:08 PM, Floyd Wu  wrote:

> Hi Alex, I'd like to know how to "use Client and Server Certificates to
> protect the connection and embed those certificates into clients."
>
> Please kindly share your experience.
>
> Floyd
>
>
> 2012/11/8 Alexandre Rafalovitch 
>
> > It is very easy to do this on Apache, but you need to be aware that
> > User-Agent is extremely easy to both sniff and spoof.
> >
> > Have you thought of perhaps using Client and Server Certificates to
> protect
> > the connection and embedding those certificates into clients?
> >
> > Regards,
> >Alex.
> >
> > Personal blog: http://blog.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >
> >
> > On Thu, Nov 8, 2012 at 9:39 AM, Bruno Mannina  wrote:
> >
> > > Dear All,
> > >
> > > I'm using an external program (my own client) to access my Apache-Solr
> > > database.
> > > I would like to restrict Solr access to a specific User-Agent (defined
> > > in my program).
> > >
> > > I would like to know if it's possible to do that directly in the Solr
> > > config, or whether I must handle that in the Apache server?
> > >
> > > My program only makes requests like this (i.e.):
> > > http://xxx.xxx.xxx.xxx:pp/solr/select/?q=ap%3Afuelcell&version=2.2&start=0&rows=10&indent=on
> > >
> > > I can add to my HTTP component properties a User-Agent, login, password,
> > > etc., like a standard HTTP connection.
> > >
> > > To complete: my software is distributed to several users, and I would
> > > like to limit Solr access to these users via my program only.
> > > Firefox, Chrome, and IE will be unauthorized.
> > >
> > > thanks for your comment or help,
> > > Bruno
> > >
> > > Ubuntu 12.04LTS
> > > SolR 3.6
> > >
> >
>


Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Amit Nithian
Hi Aaron,

Check out
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/handler/PingRequestHandler.html
You'll see the ?action=enable/disable options. I have our load balancers
remove the server from rotation when the response code is != 200 some number
of times in a row, which I suspect you are doing too. When I do a rolling
release of our search code to production, the ping handler gets disabled;
I sleep for some known number of seconds for the LB to yank the search
server out of rotation, push the code, execute some queries using curl to
ensure a response (the warming process should block the request until done),
and then enable it again.
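Boiled down, the toggle part of that script is just two HTTP GETs. A
stripped-down sketch (the URL is a placeholder, and if I remember right,
action=enable/disable only works when the PingRequestHandler is configured
with a healthcheck file):

import java.net.HttpURLConnection;
import java.net.URL;

public class PingToggle {
    // Placeholder URL; point this at your core's ping handler.
    static final String PING = "http://localhost:8983/solr/collection1/admin/ping";

    static int status(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        status(PING + "?action=disable"); // LB now sees non-200 from the ping handler
        Thread.sleep(30 * 1000L);         // give the LB time to pull us out of rotation
        // ... push code, restart, run smoke-test queries here ...
        status(PING + "?action=enable");  // healthy again; LB puts us back in
    }
}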

HTH!
Amit


On Thu, Nov 8, 2012 at 2:01 PM, Aaron Daubman  wrote:

> >  (plus when I deploy, my deploy script
> > runs some actual simple test queries to ensure they return before enabling
> > the ping handler to return 200s) to avoid this problem.
> >
>
> What are you doing to programmatically disable/enable the ping handler?
> This sounds like exactly what I should be doing as well...
>


RE: DIH nested entities don't work

2012-11-08 Thread mroosendaal
Hi James,

What I did:
* built a jar from the patch
* downloaded the BDB library
* added them to my classpath
* downloaded a nightly 4.1 Solr build
* created a db config according to:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestEphemeralCache.java

Although I got things working, after 2 hours of indexing I stopped the
process. For that amount of data, Endeca took 1h15. After looking at some
of the tests in the patch, I configured the data-config.xml as follows:






The behaviour was different (a snapshot from the indexing after 8 minutes:
Requests: 2899, Fetched: 28974398, Skipped: 0, Processed: 2258), but it was
still slow, and the parameter 'persistCacheBaseDir' had no effect. The
difference from the previous run is that that one issued only 2 requests and
hadn't processed anything after 2 hours.

Hope you can help me.

Thanks,
Maarten




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4019223.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to use tokenizer in solr under jetty?

2012-11-08 Thread Dmitry Kan
By specifying the tokenizer in question in the analyzer chain of your text
field type in schema.xml. If it is your own custom tokenizer, it must adhere
to the Lucene/Solr analysis API to emit tokens properly down the processing
stream, like the components listed here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
This is web-server agnostic, to the best of my knowledge.
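To illustrate the contract, here is a minimal custom filter against the
Lucene 4.0 analysis API (an invented example; a matching TokenFilterFactory
would also be needed so it can be referenced from schema.xml):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Invented example: lower-cases each token received from the upstream tokenizer.
public final class LowerCasingFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public LowerCasingFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // upstream is exhausted; end of the token stream
        }
        char[] buffer = termAtt.buffer();
        for (int i = 0; i < termAtt.length(); i++) {
            buffer[i] = Character.toLowerCase(buffer[i]);
        }
        return true; // the token and its attributes continue down the chain
    }
}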

- Dmitry

On Thu, Nov 8, 2012 at 1:02 PM, FANGXiaoshan  wrote:

>
> Thanks.
>


RE: DIH nested entities don't work

2012-11-08 Thread mroosendaal
Additional information: I just finished a test with 10,000 records (the db
contains 600K products); it took 25 minutes and all the parent records had
the same 'feature'.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4019227.html
Sent from the Solr - User mailing list archive at Nabble.com.