date:20121203

Re: behavior of solr.KeepWordFilterFactory

2012-12-03 Thread Xi Shen

Solr's index is case-sensitive by default, unless you use the lowercase
filter. I remember seeing this topic before, and the solution is simple:

- copy the field;
- process the copy with a new analyzer/tokenizer that does not use the
  lowercase filter;
- at query time, make sure both fields are included.


On Mon, Dec 3, 2012 at 3:04 PM, Joe Zhang  wrote:

> In other words, what I wanted to achieve is case-sensitive indexing on a
> small set of words. Can anybody help?
>
> On Sun, Dec 2, 2012 at 11:56 PM, Joe Zhang  wrote:
>
> > To be more specific, this is the data type I was using:
> >
> > > positionIncrementGap="100">
> > 
> > 
> >  > words="tickers.txt" ignoreCase="false"/>
> >  > ignoreCase="true" words="stopwords.txt"/>
> >  > generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="0"
> > splitOnCaseChange="1"/>
> > 
> >  > protected="protwords.txt"/>
> > 
> > 
> > 
> >
> >
> > On Sun, Dec 2, 2012 at 11:51 PM, Joe Zhang  wrote:
> >
> >> Yes, that is the correct behavior. But how do I achieve my goal, i.e.,
> >> special treatment for a list of uppercase/special words and normal
> >> treatment for everything else?
> >>
> >>
> >> On Sun, Dec 2, 2012 at 11:46 PM, Xi Shen  wrote:
> >>
> >>> By the definition on
> >>> https://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/KeepWordFilter.html,
> >>> I am pretty sure it is the correct behavior of this filter :)
> >>>
> >>> I guess you are trying to use this filter to index some special words in
> >>> Chinese?
> >>>
> >>>
> >>> On Mon, Dec 3, 2012 at 1:54 PM, Joe Zhang 
> wrote:
> >>>
> >>> > I defined the following data type in my solr schema.xml
> >>> >
> >>> > 
> >>> >
> >>> >   >>> > ignoreCase="false"/>
> >>> >
> >>> > 
> >>> >
> >>> > When I use the type "testkeep" to index a test field, my real
> >>> > expectation was that Solr would index the uppercase form of a small
> >>> > list of words in the file AND TREAT EVERY OTHER WORD AS USUAL. The
> >>> > goal of securing the closed list is achieved, but NO OTHER WORD
> >>> > outside the list is indexed!
> >>> >
> >>> > Can anybody help? Thanks in advance!
> >>> >
> >>> > Joe
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Regards，
> >>> David Shen
> >>>
> >>> http://about.me/davidshen
> >>> https://twitter.com/#!/davidshen84
> >>>
> >>
> >>
> >
>



-- 
Regards，
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
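A minimal Python simulation of the two approaches discussed in this thread. It is not Solr code: a whitespace split stands in for the tokenizer, and the keep-word list (IBM, GE), the field names content/content_exact and the scoring weights are all hypothetical. It shows why a single chain containing KeepWordFilter drops every unlisted word, and how a copied field with its own case-preserving chain keeps both behaviors.

```python
# Hypothetical keep-word list, standing in for tickers.txt / keepwords.txt.
KEEP_WORDS = {"IBM", "GE"}


def keep_word_filter(tokens, words):
    """Like KeepWordFilter with ignoreCase="false": emit ONLY listed tokens."""
    return [t for t in tokens if t in words]


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def analyze_single_chain(text):
    # One field type with KeepWordFilter in its chain: every other word is dropped.
    return keep_word_filter(text.split(), KEEP_WORDS)


def analyze_two_fields(text):
    # copyField approach: the same text analyzed by two different chains.
    tokens = text.split()
    return {
        "content": lowercase_filter(tokens),                    # normal, case-insensitive
        "content_exact": keep_word_filter(tokens, KEEP_WORDS),  # case-sensitive, list only
    }


def score(fields, term):
    # Query both fields; an exact-case hit gets extra weight (illustrative only).
    s = 0
    if term.lower() in fields["content"]:
        s += 1
    if term in fields["content_exact"]:
        s += 2
    return s


fields = analyze_two_fields("IBM shares rose")
print(analyze_single_chain("IBM shares rose"))     # ['IBM']
print(fields["content"], fields["content_exact"])  # ['ibm', 'shares', 'rose'] ['IBM']
print(score(fields, "IBM"), score(fields, "ibm"))  # 3 1
```

In a real schema the equivalent would be a copyField into a second field whose type omits the lowercase filter, with both fields listed at query time.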

Re: duplicated URL sent from Nutch to solr index

2012-12-03 Thread Xi Shen

Then the "URL" must be the same.


On Mon, Dec 3, 2012 at 2:34 PM, Joe Zhang  wrote:

> Sorry I didn't make it perfectly clear. The "id" field is URL.
>
> On Sun, Dec 2, 2012 at 11:33 PM, Joe Zhang  wrote:
>
> > Thanks!
> >
> >
> > On Sun, Dec 2, 2012 at 11:20 PM, Xi Shen  wrote:
> >
> >> If the value of the "id" field is the same, the old entry will be
> >> updated; if it is new, a new entry will be created and indexed.
> >>
> >> This is my experience. :)
> >>
> >>
> >> On Mon, Dec 3, 2012 at 1:45 PM, Joe Zhang  wrote:
> >>
> >> > Dear list,
> >> >
> >> > I just want to confirm an expected behavior of solr:
> >> >
> >> > Assuming we have " id" in schema.xml for Solr, when
> >> > we send the same URL from Nutch to Solr multiple times, would there be
> >> > ONLY ONE entry for that URL, with the content (if changed) and the
> >> > timestamp updated?
> >> >
> >> >
> >> > Thanks!
> >> >
> >> > Joe
> >> >
> >>
> >>
> >>
> >> --
> >> Regards，
> >> David Shen
> >>
> >> http://about.me/davidshen
> >> https://twitter.com/#!/davidshen84
> >>
> >
> >
>



-- 
Regards，
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
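A toy Python model of the behavior described in this thread, assuming "id" is the schema's uniqueKey and holds the URL. It is a dictionary keyed by id, not Solr code; the URL and the content/tstamp values are made up. Adding a document whose key already exists replaces the stored document as a whole.

```python
class TinyIndex:
    """Toy model of uniqueKey semantics; not Solr code."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # A doc whose key already exists replaces the old doc entirely
        # (all fields, not just the changed ones).
        self.docs[doc[self.unique_key]] = dict(doc)

    def __len__(self):
        return len(self.docs)


index = TinyIndex()
url = "http://example.com/news/1"  # hypothetical URL used as the id
index.add({"id": url, "content": "first crawl", "tstamp": "2012-12-02"})
index.add({"id": url, "content": "second crawl", "tstamp": "2012-12-03"})
print(len(index))                  # 1
print(index.docs[url]["content"])  # second crawl
```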

Re: Solr 4: Join Query

2012-12-03 Thread Vikash Sharma

Hi Erick,
One more thing: is there any other way to get this result?
I need to get both the parent and the child documents, in nested or non-nested format.


Regards,
Vikash Sharma
vikash0...@gmail.com


On Sat, Dec 1, 2012 at 10:29 PM, Erick Erickson wrote:

> That's the way joins work, and why they're called "pseudo joins": they don't
> work like DB joins and return data from both records.
>
> Joins were put in for a specific use case. When you try to treat Solr like
> a DB, you're bound to be disappointed. I'd think about reworking the
> solution to de-normalize the data so you don't have to do joins.
>
> Best
> Erick
>
>
> On Fri, Nov 30, 2012 at 10:38 AM, Vikash Sharma wrote:
>
> > Hi All,
> > I have my field definition in schema.xml like below
> >
> > 
> > 
> > 
> > 
> >
> >
> > I need to create a separate record in Solr for each parent-child
> > relationship, such that if a child is the same across different
> > parents, it gets stored only once.
> >
> > For e.g.
> >  ---_Record 1
> > ABC
> > EMP001
> > DOC001
> > My Parent Doc
> >
> >  ---_Record 2
> > DOC001
> > 
> > 
> > My Document Data
> >
> >
> > This ensures that if any doc_id content is duplicated, the record is
> > inserted into Solr only once.
> >
> > Lastly, I want the result as a join: if emp_id=EMP001, then both records
> > should be returned, as the two records are related via
> > doc_id = id.
> >
> > If I query:
> >
> >
> http://localhost:8983/solr/select?q={!join%20from=doc_id%20to=id}emp_id:EMP001&wt=json
> >
> > I expect both records to be returned, either one after another or
> > nested, but I only get the child records...
> >
> >
> > Please help.
> >
> >
> >
> > Regards,
> > Vikash Sharma
> > vikash0...@gmail.com
> >
>
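A small Python model of what Erick describes, using two records loosely modeled on Vikash's example (title and content are hypothetical field names, since the originals were lost from the message). The join returns only documents on the "to" side. One workaround, sketched here, is to issue two requests, q=emp_id:EMP001 for the parent and the join query for the child, and merge the results on the client.

```python
# Two records modeled on Vikash's example; "title" and "content" are
# hypothetical field names.
DOCS = [
    {"id": "ABC", "emp_id": "EMP001", "doc_id": "DOC001", "title": "My Parent Doc"},
    {"id": "DOC001", "content": "My Document Data"},
]


def select(docs, field, value):
    return [d for d in docs if d.get(field) == value]


def pseudo_join(docs, from_field, to_field, field, value):
    """Mimics {!join from=F to=T}field:value.

    Collect the F values of the documents matching field:value, then return
    the documents whose T value is in that set. Only the "to" side comes back.
    """
    keys = {d[from_field] for d in select(docs, field, value) if from_field in d}
    return [d for d in docs if d.get(to_field) in keys]


def parents_and_children(docs, emp_id):
    # Workaround: one request for the parents, one join request for the children.
    parents = select(docs, "emp_id", emp_id)
    children = pseudo_join(docs, "doc_id", "id", "emp_id", emp_id)
    return parents + children


print([d["id"] for d in pseudo_join(DOCS, "doc_id", "id", "emp_id", "EMP001")])  # ['DOC001']
print([d["id"] for d in parents_and_children(DOCS, "EMP001")])  # ['ABC', 'DOC001']
```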

How to change Solr UI

2012-12-03 Thread Romita Saha

Hi,

I want to change the Solr UI. As far as I understand, Solritas is just for
prototyping: I can change the UI according to a predefined template
(Velocity), but I cannot add any additional functionality to that page.
How else can I change the Solr UI? Any guidance would be appreciated.

Thanks and regards,
Romita

AW: Edismax query parser and phrase queries

2012-12-03 Thread Tantius, Richard

Hi,
The use case we have in mind is achieving exact matches for explicit phrases.
Our users expect an explicit phrase to respect not only the order of the terms
but also their exact wording. Therefore, if we search fields whose data type is
not meant for exact matching, we need to change that for explicit phrases. In a
normal query, our qf default fields use advanced tokenization (at both query and
index time), for example stemming via SnowballPorterFilterFactory. So our idea
was to switch the default search fields for explicit phrases to fields with a
simple data type, for example "string" (StrField, without advanced options), to
get exact matches.

Extending our example from the last mail: 

qf="title text"

Datatype of title, text, something like “text_advanced”:

 
  
  
  
...

Data type of the additional fields titleExact, textExact:


q="ran away from home" Cat Dog 

-transformTo->

q=( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog.

Regards,
Richard.

BINSERV
Gesellschaft für interaktive Konzepte und neue Medien mbH
Software Engineer

Gotenstr. 7-9
53175 Bonn
Tel.: +49 (0)228 / 4 22 86 - 38 
Fax.: +49 (0)228 / 4 22 86 - 538
E-Mail:   r.tant...@binserv.de  
Web:  www.binserv.de
  www.binforcepro.de

Managing Director: Rüdiger Jakob
Local court (Amtsgericht): Siegburg HRB 6765
Company headquarters: Pfarrer-Wichert-Str. 35, 53639 Königswinter
This e-mail, including any attached files, contains confidential and/or legally
protected information. If you are not the intended recipient and have received
this e-mail in error, you may neither use its contents nor open any attached
files, nor copy or forward/distribute anything. Please notify the sender and
delete this e-mail and any attached files immediately. Thank you!


- Original message -
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, November 30, 2012 23:04
To: solr-user@lucene.apache.org
Subject: Re: Edismax query parser and phrase queries

I don’t have a simple answer for your stated issue, but maybe part of that is
because I’m not so sure what the exact problem/goal is. I mean, what’s so
special about phrase queries in your app that they need processing distinct
from individual terms?

And, ultimately, what goal are you trying to achieve? For example, how will the
outcome of the query affect what users see and do?

-- Jack Krupansky

From: Tantius, Richard
Sent: Friday, November 30, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Edismax query parser and phrase queries

Hi,

We are using the edismax query parser and execute queries on specific fields
using the qf option. Like others, we are facing the problem that we do not want
explicit phrase queries to be performed on some of the qf fields, and we also
require additional search fields for those kinds of queries.

We tried to expand explicit phrases in a query by implementing some
pre-processing logic, which did not seem very convenient.

So, for example (let's assume qf="title text", and we want phrase queries to be
performed on the additional fields "titleAlt textAlt"): q="ran away from home"
Cat Dog -transformTo-> q=( titleAlt:"ran away from home" OR textAlt:"ran away
from home" ) Cat Dog. Unfortunately, this gets rather complicated if logical
operators are involved in the query. Is there some kind of best practice?
Should we, for example, extend the query parser, or stick to our pre-processing
approach?


Regards,
Richard.
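A naive Python sketch of the application-level pre-processing discussed in this thread: each quoted phrase is rewritten into an OR over the exact-match fields. The field names titleExact/textExact come from the example; the function name and everything else are assumptions. It deliberately ignores escaped quotes, field-prefixed phrases such as title:"..." and any interaction with boolean operators, which is exactly the part reported as getting complicated.

```python
import re

# A double-quoted phrase with no embedded quotes.
PHRASE = re.compile(r'"([^"]+)"')


def expand_phrases(q, exact_fields=("titleExact", "textExact")):
    """Rewrite each explicit phrase into ( f1:"..." OR f2:"..." ).

    Naive: does not handle escaped quotes, field prefixes or operators.
    """
    def repl(match):
        phrase = match.group(1)
        clauses = " OR ".join(f'{field}:"{phrase}"' for field in exact_fields)
        return f"( {clauses} )"

    return PHRASE.sub(repl, q)


print(expand_phrases('"ran away from home" Cat Dog'))
# ( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog
```

Terms outside quotes pass through unchanged, so edismax still searches them against the normal qf fields.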

Re: Replication in SolrCloud

2012-12-03 Thread Arkadi Colson


  
  
Thanks for the explanation. It's clear now...

I expanded the setup to 4 hosts with 2 shards and 1 replica for each shard.
When I shut down Tomcat on solr01-dcg, which is the master of shard 1 for both
collections, the replica (solr01-gs) does NOT seem to take over.
See the logs below.
  
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: Running the leader process.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: Checking if I should try and be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: I may be the new leader - try and sync
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.SyncStrategy sync
INFO: Sync replicas to http://solr01-gs:8983/solr/intradesk/
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr DONE.  We have no versions.  sync failed.
Dec 3, 2012 9:55:34 AM org.apache.solr.common.SolrException log
SEVERE: Sync Failed
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection
INFO: There is a better leader candidate than us - going back into recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.update.DefaultSolrCoreState doRecovery
INFO: Running recovery - first canceling any ongoing recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=intradesk recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Attempting to PeerSync from http://solr01-dcg:8983/solr/intradesk/ core=intradesk - recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell of we've missed updates
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:35 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://solr01-dcg:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://solr01-dcg:8983 refused
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tr

Re: Replication in SolrCloud

2012-12-03 Thread Arkadi Colson


Never mind, I think I found it.

There must be some documents in each shard so that they have a version
number. Then everything seems to work...


On 11/30/2012 04:57 PM, Mark Miller wrote:

Thanks for all the detailed info!

Yes, that is confusing. It's one of the sore points of supporting both
standard Solr and SolrCloud mode.

In SolrCloud, every node is a Master when thinking about standard Solr
replication. However, as you see on the cloud page, only one of them is a
*leader*. A leader is different from a master.

Being a Master when it comes to the replication handler simply means you can 
replicate the index to other nodes - in SolrCloud we need every node to be 
capable of doing that. Each shard only has one leader, but every node in your 
cluster will be a replication master.

- Mark


On Nov 30, 2012, at 10:32 AM, Arkadi Colson  wrote:


This is my setup for SolrCloud 4.0 on Tomcat 7.0.33 and ZooKeeper 3.4.5

hosts:
- solr01-dcg (first started)
- solr01-gs (started second, so it becomes the replica)

collections:
- smsc

shards:
- mydoc

zookeeper:
- on solr01-dcg
- on solr01-gs

SOLR_OPTS="-Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc 
-DzkClientTimeout=2 -DzkHost=solr01-dcg:2181,solr01-gs:2181"

solr.xml:


   
 
   


I upload the config to zookeeper:
java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* 
org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 
solr01-dcg:2181,solr01-gs:2181 -confdir /opt/solr/conf -confname smsc

Linking the config to the collection:
java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* 
org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mydoc -zkhost 
solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181 -confname 
smsc

cloud on both hosts:



solr01-dcg



solr01-gs:


Any idea?

Thanks!

On 11/30/2012 03:15 PM, Mark Miller wrote:

On Nov 30, 2012, at 5:08 AM, Arkadi Colson 
  wrote:



Hi

I've set up a simple 2-machine cloud with 1 shard, one replica and 2
collections. Everything went fine. However, when I look at the interface,
http://localhost:8983/solr/#/coll1/replication
reports that both machines are master. Did I do something wrong in my
config, or is it a report for manual replication configuration? Can someone
else check this?


How? You don't really give anything to look at :)



Is it possible to link 2 collections to the same conf in ZooKeeper?



Yes, that is no problem.

- Mark









--
Kind regards

Arkadi Colson

Smartbit bvba . Hoogstraat 13 . 3670 Meeuwen
T +32 11 64 08 80 . F +32 11 64 08 81

Re: News clustering

2012-12-03 Thread Stanislaw Osinski

One of our clients uses Solr's search results clustering for grouping news.
Instead of the default Carrot2 algorithm that ships with Solr they use a
commercial one, but Carrot2 should give you decent clusters too. Here's an
example clustering result:

http://imagebin.org/238001

Staszek

--
Stanislaw Osinski
http://carrotsearch.com

On Fri, Nov 30, 2012 at 4:44 PM, Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:

> Hi all:
>
> I'm thinking on using nutch combined with solr to index some news sites in
> an intranet. And I was wondering how effective could be using the
> clustering component to cluster the search results? Any success history on
> using solr clustering component for news clustering? Any existing solution
> for clustering/classification on index time?
>
> Greetings!
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci
>
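For trying Solr's clustering component on news results, a Python sketch that only builds a request URL. It assumes a /clustering request handler like the one in the Solr example solrconfig.xml (disabled there by default) and documents with title and content fields, as a Nutch-fed index typically has; the parameter names clustering, clustering.results, carrot.title and carrot.snippet are meant to be the clustering component's, but check them against your Solr version. The query text is made up.

```python
from urllib.parse import urlencode


def clustering_url(base, query, rows=100, title_field="title", snippet_field="content"):
    # Assumed handler path and field names; adjust to your solrconfig.xml/schema.
    params = [
        ("q", query),
        ("rows", rows),                      # clustering works on the returned page
        ("clustering", "true"),
        ("clustering.results", "true"),      # cluster the search results
        ("carrot.title", title_field),       # field used as the document title
        ("carrot.snippet", snippet_field),   # field used as the document body
        ("wt", "json"),
    ]
    return f"{base}/clustering?{urlencode(params)}"


print(clustering_url("http://localhost:8983/solr", "election news"))
```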

Re: behavior of solr.KeepWordFilterFactory

2012-12-03 Thread Joe Zhang

Across-the-board case-sensitive indexing is not what I want...

Let me make sure I understand your suggestion:

   








   







And define content1 as text1, content2 as text2?
On Mon, Dec 3, 2012 at 1:09 AM, Xi Shen  wrote:

> Solr index is case-sensitive by default, unless you used the lower case
> filter. I remember I saw this topic on Solr, and the solution is simple:
>
> copy the field;
> use a new analyzer/tokenizer to process this field, and do not use lower
> case filter
>
> when query, make sure both fields are included.
>
>
> On Mon, Dec 3, 2012 at 3:04 PM, Joe Zhang  wrote:
>
> > In other words, what I wanted to achieve is case-sensitive indexing on a
> > small set of words. Can anybody help?
> >
> > On Sun, Dec 2, 2012 at 11:56 PM, Joe Zhang  wrote:
> >
> > > To be more specific, this is the data type I was using:
> > >
> > > > > positionIncrementGap="100">
> > > 
> > > 
> > >  > > words="tickers.txt" ignoreCase="false"/>
> > >  > > ignoreCase="true" words="stopwords.txt"/>
> > >  > > generateWordParts="1" generateNumberParts="1"
> > > catenateWords="1" catenateNumbers="1"
> catenateAll="0"
> > > splitOnCaseChange="1"/>
> > > 
> > >  > > protected="protwords.txt"/>
> > >  class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > > 
> > > 
> > >
> > >
> > > On Sun, Dec 2, 2012 at 11:51 PM, Joe Zhang 
> wrote:
> > >
> > >> Yes, that is the correct behavior. But how do I achieve my goal, i.e.,
> > >> special treatment for a list of uppercase/special words and normal
> > >> treatment for everything else?
> > >>
> > >>
> > >> On Sun, Dec 2, 2012 at 11:46 PM, Xi Shen 
> wrote:
> > >>
> > >>> By the definition on
> > >>> https://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/KeepWordFilter.html,
> > >>> I am pretty sure it is the correct behavior of this filter :)
> > >>>
> > >>> I guess you are trying to this filter to index some special words in
> > >>> Chinese?
> > >>>
> > >>>
> > >>> On Mon, Dec 3, 2012 at 1:54 PM, Joe Zhang 
> > wrote:
> > >>>
> > >>> > I defined the following data type in my solr schema.xml
> > >>> >
> > >>> > 
> > >>> >
> > >>> >   words="keepwords.txt"
> > >>> > ignoreCase="false"/>
> > >>> >
> > >>> > 
> > >>> >
> > >>> > When I use the type "testkeep" to index a test field, my real
> > >>> > expectation was that Solr would index the uppercase form of a small
> > >>> > list of words in the file AND TREAT EVERY OTHER WORD AS USUAL. The
> > >>> > goal of securing the closed list is achieved, but NO OTHER WORD
> > >>> > outside the list is indexed!
> > >>> >
> > >>> > Can anybody help? Thanks in advance!
> > >>> >
> > >>> > Joe
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Regards，
> > >>> David Shen
> > >>>
> > >>> http://about.me/davidshen
> > >>> https://twitter.com/#!/davidshen84
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
> --
> Regards，
> David Shen
>
> http://about.me/davidshen
> https://twitter.com/#!/davidshen84
>

Re: News clustering

2012-12-03 Thread Iwan Hanjoyo

Hi Stanislaw Osinski,


On Mon, Dec 3, 2012 at 6:13 PM, Stanislaw Osinski wrote:

> One of our clients uses Solr's search results clustering for grouping news.
> Instead of the default Carrot2 algorithm that ships with Solr they use a
> commercial one, but Carrot2 should give you decent clusters too. Here's an
> example clustering result:
>
> http://imagebin.org/238001
>
> Staszek
>
> --
> Stanislaw Osinski
> http://carrotsearch.com
>
> On Fri, Nov 30, 2012 at 4:44 PM, Jorge Luis Betancourt Gonzalez <
> jlbetanco...@uci.cu> wrote:
>
> > Hi all:
> >
> > I'm thinking of using Nutch combined with Solr to index some news sites
> > on an intranet, and I was wondering how effective the clustering
> > component would be for clustering the search results. Any success
> > stories using the Solr clustering component for news clustering? Any
> > existing solution for clustering/classification at index time?
> >
> > Greetings!
> >
>

Re: News clustering

2012-12-03 Thread Iwan Hanjoyo

Hi Stanislaw Osinski,

Was the picture generated using the Lingo 3G algorithm?
I saw some sub-clusters inside it.
Nice pic :)

I am interested in learning it.
How long is the Lingo 3G trial period?

Is there any way to programmatically measure the performance of the Carrot2
clustering algorithm?
Thanks

cheers

Hanjoyo

On Mon, Dec 3, 2012 at 6:13 PM, Stanislaw Osinski wrote:

> One of our clients uses Solr's search results clustering for grouping news.
> Instead of the default Carrot2 algorithm that ships with Solr they use a
> commercial one, but Carrot2 should give you decent clusters too. Here's an
> example clustering result:
>
> http://imagebin.org/238001
>
> Staszek
>
> --
> Stanislaw Osinski
> http://carrotsearch.com
>
> On Fri, Nov 30, 2012 at 4:44 PM, Jorge Luis Betancourt Gonzalez <
> jlbetanco...@uci.cu> wrote:
>
> > Hi all:
> >
> > I'm thinking of using Nutch combined with Solr to index some news sites
> > on an intranet, and I was wondering how effective the clustering
> > component would be for clustering the search results. Any success
> > stories using the Solr clustering component for news clustering? Any
> > existing solution for clustering/classification at index time?
> >
> > Greetings!
> >
>

Re: How to change Solr UI

2012-12-03 Thread Iwan Hanjoyo

Hi Romita,

In my opinion, if you are new to Solr, you can start learning from Solritas.
Solritas uses Apache Velocity (a templating language), CSS and jQuery to
manage its look and behavior.
Besides that, you can write a custom SearchComponent and plug it into the
/browse SearchHandler to add more functionality to your search application.

Kind regards,

Hanjoyo

On Mon, Dec 3, 2012 at 4:35 PM, Romita Saha wrote:

> Hi,
>
> I want to change the Solr UI. As far as I understand, Solritas is just for
> prototyping: I can change the UI according to a predefined template
> (Velocity), but I cannot add any additional functionality to that page.
> How else can I change the Solr UI? Any guidance would be appreciated.
>
> Thanks and regards,
> Romita
>

Re: News clustering

2012-12-03 Thread Stanislaw Osinski

> Was the picture generated using the Lingo 3G algorithm?
> I saw some sub-clusters inside it.
> Nice pic :)
>

That is correct.


> I am interested in learning it.
> How long is the Lingo 3G trial period?
>

I'll send you the details in a private e-mail in a second.



> Is there any way to programmatically measure the performance of Carrot2
> clustering algorithm?
>

I'm not sure what you mean by performance. Measuring clustering time is
pretty straightforward; measuring the quality of clusters is not, as a lot
depends on your specific data and application.

Staszek

Whole Phrase search in Solr

2012-12-03 Thread NickA

Hello,

I am trying to search for a whole phrase in Solr. Specifically, I have
the following field in my schema:

   


  



  
  



  


Also (as a second similar problem) in the “synonyms.txt” I have values like
these:

aword => a whole phrase

and I even tried:

aword => "a whole phrase"

Now I tried searching for “check this” in several ways:

fq=search_field:check this
fq=search_field:check+this
fq=search_field:"check this"
fq=search_field:'check this'

But in all cases the search seems to run for “check OR this”!

Similarly, if I search for “aword”, which matches the synonyms file, the
search also looks for “a OR whole OR phrase”.

What am I doing wrong? Is there any way to force the query for the whole
phrase and not for each word separately?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931.html
Sent from the Solr - User mailing list archive at Nabble.com.
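A Python sketch of sending the quoted form correctly: the double quotes have to reach Solr URL-encoded as part of the fq value, otherwise "check this" is parsed as two separate clauses. The helper name phrase_clause is hypothetical. This covers only the request side; if the field type itself splits the phrase, the schema is the place to look, and for the multi-word synonym case the fieldType attribute autoGeneratePhraseQueries may be worth checking (not verified against this schema).

```python
from urllib.parse import urlencode, parse_qs


def phrase_clause(field, phrase):
    # Escape backslashes and double quotes so the phrase stays one quoted clause.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


params = urlencode({"q": "*:*", "fq": phrase_clause("search_field", "check this")})
print(params)                  # q=%2A%3A%2A&fq=search_field%3A%22check+this%22
print(parse_qs(params)["fq"])  # ['search_field:"check this"']
```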

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

Thanks for help.

I removed Solr's data folder and indexed this sample doc from scratch, so
there is only one document in Solr.

When I run the analysis, I can see that stemming is correct, and I can see
the words "bul", "baş", "gör" and "umut" in the SF row.
I attached screenshots of the analysis.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky wrote:

> Have you tried using the Solr Admin Analysis page, using the word and a
> few words of context for index analysis and the word alone for query
> analysis?
>
> And be sure to fully reindex if you change ANYTHING in the schema fields
> or field types.
>
> -- Jack Krupansky
>
> From: Erol Akarsu
> Sent: Sunday, December 02, 2012 10:38 PM
> To: solr-user@lucene.apache.org
> Subject: Luke and SOLR search giving different results
>
> Hi,
>
> I am trying to apply Solr to the Turkish language for my research.
>
> Instead of using language identification, I manually assigned the Turkish
> language to a sample test document. I configured Solr's schema.xml and
> activated the part below. I added the attached document testTurkishDoc.xml
> to the Solr index.
>
> But searching the raw Lucene index through Luke and searching through the
> Solr 4.0 GUI give different results. In picture Selection_006.png, the word
> "baş" is listed as a top term. When I search for the word "baş" in Luke, I
> get the only document as the result, shown in Selection_004.png.
>
> But in the Solr GUI, I get an empty result for the word "baş" (picture
> Selection_002.png).
>
> In the text we have a features field that contains the word "baştan", which
> is derived from the root word "baş" in Turkish grammar. Somehow the Solr GUI
> searches differently from Luke, and I could not figure out why I cannot find
> the word in Solr while I can find it in Luke. The same thing happens for the
> words "umut", "bul" and "gör".
>
> I would appreciate it if you could help me get the same results from the
> Solr UI.
>
>
> 
>Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
> dedirterek.
>   
>
>
>
> Added to schema.xml for SOLR:
>
>  multiValued="true"/>
>  positionIncrementGap="100">
>   
> 
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>  language="Turkish"/>
>   
>   
> 
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>  language="Turkish"/>
>   
> 
>
>
>
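A Python sketch of the two checks suggested in this thread, expressed as URLs: one for the field analysis handler and one for an explicitly fielded query. It assumes the handler is registered at /analysis/field as in the example solrconfig.xml and uses the features field named in Erol's message; the sample text comes from the test document. A bare baş typed into the GUI is searched against the default search field, so naming features explicitly is one thing worth ruling out; that is a guess, not a diagnosis.

```python
from urllib.parse import urlencode


def field_analysis_url(base, field, value, query=None):
    # Field analysis: index-time value plus an optional query-time value.
    params = {"analysis.fieldname": field, "analysis.fieldvalue": value, "wt": "json"}
    if query is not None:
        params["analysis.query"] = query
    return f"{base}/analysis/field?{urlencode(params)}"


def fielded_query_url(base, field, term):
    # Name the field explicitly instead of relying on the default search field.
    params = {"q": f"{field}:{term}", "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


base = "http://localhost:8983/solr"
print(field_analysis_url(base, "features", "baştan savma reklamlarla", query="baş"))
print(fielded_query_url(base, "features", "baş"))
```

Non-ASCII characters are percent-encoded as UTF-8 (ş becomes %C5%9F), which is what Solr expects by default.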

Backing up SolR 4.0

2012-12-03 Thread Andy D'Arcy Jewell


Hi all.

I'm new to Solr, and I recently had to set up a Solr server running 4.0.

I've been searching for info on backing it up, but all I've managed to
come up with is "it'll be different", "you'll be able to do push
replication", or using HTTP and the command=backup parameter, which
doesn't sound like it will be effective for a production setup (unless
I've got that wrong)...

I was wondering if I can just stop or suspend the Solr server, take an
LVM snapshot of the data store, and then bring it back online, but I'm
not sure if that will cut it. I gather merely rsyncing the data files
won't do...

Can anyone give me a pointer to that "easy-to-find" document I have so
far failed to find? Or, failing that, maybe some sound advice on how to
proceed?


Regards,
-Andy




--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk
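A Python sketch of the HTTP route mentioned above. It assumes the ReplicationHandler is registered at /replication in solrconfig.xml and uses collection1 (the Solr example's default core name) as a placeholder. As far as I know, command=backup asks that core to write a snapshot of its current index (by default under its data directory) without taking it offline, and command=details can be polled afterwards for status; whether that satisfies a production backup policy is a separate question.

```python
import urllib.request
from urllib.parse import urlencode


def replication_url(base, command, core=None):
    # Assumed handler path: /replication, per core when a core name is given.
    path = f"{base}/{core}/replication" if core else f"{base}/replication"
    query = urlencode({"command": command})
    return f"{path}?{query}"


def call(url, timeout=30):
    # Plain HTTP GET; returns the response body as text.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


backup = replication_url("http://localhost:8983/solr", "backup", core="collection1")
details = replication_url("http://localhost:8983/solr", "details", core="collection1")
print(backup)  # http://localhost:8983/solr/collection1/replication?command=backup
# Against a running server: call(backup), then poll call(details).
```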

Re: News clustering

2012-12-03 Thread Iwan Hanjoyo

Hi Stanislaw,

I mean measuring the similarity between the documents within each cluster,
and also the difference between documents in one cluster and those in
another cluster.

I saw the sample code ClusteringQualityBencmark.java.
However, I do not know how to use it to assess my Solr clustering
performance.

Kind regards,

Hanjoyo

On Mon, Dec 3, 2012 at 8:11 PM, Stanislaw Osinski wrote:

> > Was the picture generated using the Lingo 3G algorithm?
> > I saw some sub-clusters inside it.
> > Nice pic :)
> >
>
> That is correct.
>
>
> I am interested in learning it.
> > How long is the Lingo 3G trial period?
> >
>
> I'll send you the details in a private e-mail in a second.
>
>
>
> > Is there any way to programmatically measure the performance of Carrot2
> > clustering algorithm?
> >
>
> I'm not sure what you mean by performance. Measuring clustering time is
> pretty straightforward, measuring the quality of clusters is not, a lot
> depends on your specific data and application.
>
> Staszek
>
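One way to put numbers on what Hanjoyo describes is average pairwise cosine similarity: high within a cluster, low across clusters. The Python sketch below uses that generic measure with plain term-frequency vectors; it is not Carrot2's benchmark code, and the sample clusters are invented. In practice you would take the document ids from Solr's clustering response and look up their text.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    # Bag-of-words term frequencies (lowercased whitespace tokens).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def intra_similarity(cluster):
    # Average similarity of document pairs inside one cluster (higher = tighter).
    vecs = [vectorize(d) for d in cluster]
    return mean(cosine(a, b) for a, b in combinations(vecs, 2))


def inter_similarity(cluster_a, cluster_b):
    # Average similarity across two clusters (lower = better separated).
    va = [vectorize(d) for d in cluster_a]
    vb = [vectorize(d) for d in cluster_b]
    return mean(cosine(a, b) for a in va for b in vb)


sports = ["football match result", "football match report"]
politics = ["election result", "election vote count"]
print(round(intra_similarity(sports), 3))            # 0.667
print(round(inter_similarity(sports, politics), 3))  # 0.102
```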

PHP client

2012-12-03 Thread Arkadi Colson


Hi

Has anyone tested the PECL Solr client in combination with SolrCloud? It
seems to be broken since 4.0.


Best regards
Arkadi

Re: PHP client

2012-12-03 Thread Bill Au

https://bugs.php.net/bug.php?id=62332

There is a fork with patches applied.


On Mon, Dec 3, 2012 at 9:38 AM, Arkadi Colson  wrote:

> Hi
>
> Has anyone tested the PECL Solr client in combination with SolrCloud? It seems
> to be broken since 4.0.
>
> Best regards
> Arkadi
>
>

Re: AW: Edismax query parser and phrase queries

2012-12-03 Thread Jack Krupansky

Okay, so the bottom line here is that you wish to change the semantics of 
quoted phrases. Fine, that's your prerogative, but a change in semantics 
would require a change to the query parser, or as you originally indicated, 
a pre-processor. It does sound as if a pre-processor is the way to go here.


You still have a choice: An application-level preprocessor that generates an 
edismax query, or implement a Solr SearchComponent that pre-processes the 
query after Solr receives it but before edismax sees it. The former is 
probably easier. The only question is whether there might be multiple 
applications that access the same Solr node, so that maybe centralizing the 
pre-processing in Solr might be warranted.


-- Jack Krupansky

-Original Message- 
From: Tantius, Richard

Sent: Monday, December 03, 2012 5:03 AM
To: solr-user@lucene.apache.org
Subject: AW: Edismax query parser and phrase queries

Hi,
The use case we have in mind is that we would like to achieve exact matches 
for explicit phrases. Our users expect that an explicit phrase not only 
considers the order of terms, but also the exact wording. Therefore, if we 
search on fields using a data type that is not meant for exact matching, we 
need to change that for explicit phrases. In a normal query, our qf default 
fields use advanced tokenization (at both query and index time), for example 
stemming via SnowballPorterFilterFactory. So our idea was to switch the 
default search fields for explicit phrases to fields with a simple data type, 
for example "string" (StrField, without advanced options), to achieve exact 
matches.


Extending our example from the last mail:

qf="title text"

Datatype of title, text, something like “text_advanced”:

 
 
 
 
...

Data type of the additional fields titleExact, textExact:
omitNorms="true"/>


q="ran away from home" Cat Dog

-transformTo->

q=( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat 
Dog.
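A minimal application-level preprocessor for this transformation might look like the Python sketch below. It is only an illustration under stated assumptions: it handles plain double-quoted phrases (no escaped quotes), does not try to respect boolean operators around the phrase, and the function name expand_phrases is hypothetical. The field names titleExact and textExact are taken from the example above.

```python
import re

# Field names from the example above; adjust them to your schema.
EXACT_FIELDS = ("titleExact", "textExact")

# Matches a double-quoted phrase; escaped quotes inside phrases are not handled.
PHRASE_RE = re.compile(r'"([^"]+)"')


def expand_phrases(user_query, fields=EXACT_FIELDS):
    """Rewrite each explicit "quoted phrase" into an OR over the exact fields."""
    def to_clause(match):
        phrase = match.group(1)
        alternatives = " OR ".join(f'{field}:"{phrase}"' for field in fields)
        return f"( {alternatives} )"

    # Unquoted terms (Cat, Dog) are left untouched for edismax and qf to handle.
    return PHRASE_RE.sub(to_clause, user_query)


print(expand_phrases('"ran away from home" Cat Dog'))
# ( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog
```

A regex is enough for this flat case. Once users combine phrases with AND, OR, NOT or parentheses, a real parser becomes necessary, and that is the complication raised in the original question.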


Regards,
Richard.

BINSERV
Gesellschaft für interaktive Konzepte und neue Medien mbH
Software Engineer

Gotenstr. 7-9
53175 Bonn
Tel.: +49 (0)228 / 4 22 86 - 38
Fax.: +49 (0)228 / 4 22 86 - 538
E-Mail:   r.tant...@binserv.de
Web:  www.binserv.de
   www.binforcepro.de

Managing Director: Rüdiger Jakob
Register court: Siegburg HRB 6765
Registered office: Pfarrer-Wichert-Str. 35, 53639 Königswinter
This e-mail, including any attached files, contains confidential and/or 
legally protected information. If you are not the intended recipient and 
have received this e-mail in error, you may neither use the contents of this 
e-mail nor open any attached files, nor copy or forward/distribute anything. 
Please notify the sender and delete this e-mail and any attached files 
immediately. Thank you!



- Original message -
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, November 30, 2012 23:04
To: solr-user@lucene.apache.org
Subject: Re: Edismax query parser and phrase queries

I don’t have a simple answer for your stated issue, but maybe part of that 
is because I’m not so sure what the exact problem/goal is. I mean, what’s so 
special about phrase queries in your app that they need distinct processing 
from individual terms?


And, ultimately, what goal are you trying to achieve? For example, how will 
the outcome of the query affect what users see and do?


-- Jack Krupansky

From: Tantius, Richard
Sent: Friday, November 30, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Edismax query parser and phrase queries

Hi,

We are using the edismax query parser and execute queries on specific fields 
by using the qf option. Like others, we are facing the problem that we do not 
want explicit phrase queries to be performed on some of the qf fields, and we 
also require additional search fields for those kinds of queries.


We tried to expand explicit phrases in a query by implementing some 
pre-processing logic, which did not seem very convenient.


So, for example (let's assume qf="title text", and we want phrase queries to 
be performed on the additional fields "titleAlt textAlt"): q="ran away from 
home" Cat Dog -transformTo-> q=( titleAlt:"ran away from home" OR 
textAlt:"ran away from home" ) Cat Dog. Unfortunately, this gets rather 
complicated if logical operators are involved within the query. Is there some 
kind of best practice? Should we, for example, extend the query parser, or 
stick to our pre-processing approach?



Regards,
Richard.

Re: Whole Phrase search in Solr

2012-12-03 Thread Jack Krupansky

The OR behavior is because the default operator is OR. You can change that 
by setting q.op=AND.


Try the quoted phrases again, but with &debugQuery=true to see what query is 
actually generated.


Finally, if you remove stop words at index time, then you must remove them 
at query time as well.
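To see what Solr actually does with a quoted phrase, it helps to build the request so that the quotes and the space survive URL encoding. Below is a small Python sketch. The base URL assumes the default example setup on localhost:8983, and build_debug_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust host, port and core.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_debug_url(query, default_op="AND"):
    """Build a /select URL that sets q.op and asks Solr to explain the parsed query."""
    params = {
        "q": query,
        "q.op": default_op,     # default operator between bare terms
        "debugQuery": "true",   # response then includes the parsed query
        "wt": "json",
    }
    # urlencode percent-encodes the quotes and the space, so the phrase
    # reaches Solr intact instead of being split by the client or shell.
    return SOLR_SELECT + "?" + urlencode(params)


print(build_debug_url('search_field:"check this"'))
# http://localhost:8983/solr/select?q=search_field%3A%22check+this%22&q.op=AND&debugQuery=true&wt=json
```

If the debug output shows a phrase query for search_field, the phrase is being honored. If it shows two separate term clauses, the quotes were lost before reaching the parser.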


-- Jack Krupansky

-Original Message- 
From: NickA

Sent: Monday, December 03, 2012 6:28 AM
To: solr-user@lucene.apache.org
Subject: Whole Phrase search in Solr

Hello,

I am trying to search for a whole phrase in SOLR. Specifically, I have
the following field in my schema:

  

   
 
   
   
   
 
 
   
   
   
 
   

Also (as a second similar problem) in the “synonyms.txt” I have values like
these:

aword => a whole phrase

and I even tried:

aword => "a whole phrase"

Now I have tried searching for “check this” in several ways:

fq=search_field:check this
fq=search_field:check+this
fq=search_field:"check this"
fq=search_field:'check this'

But in all cases the search seems to run for “check OR this”!

Similarly, if I search for “aword”, which matches the synonyms file, the
search also looks for “a OR whole OR phrase”.

What am I doing wrong? Is there any way to force the query for the whole
phrase and not for each word separately?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky

So, does that highlight the problem for you or not? Is the term analyzed as you 
expected?

-- Jack Krupansky

From: Erol Akarsu 
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org 
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for help.

I removed the Solr data folder and indexed this sample document from scratch, 
so there is only one document in Solr. 

When I run the analysis, I can see that stemming is correct, and I can see the 
words "bul", "baş", "gör" and "umut" in the SF row.
I attached the analysis screens.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky  wrote:

  Have you tried using the Solr Admin Analysis page, using the word and a few 
words of context for index analysis and the word alone for query analysis?

  And be sure to fully reindex if you change ANYTHING in the schema fields or 
field types.

  -- Jack Krupansky

  From: Erol Akarsu
  Sent: Sunday, December 02, 2012 10:38 PM
  To: solr-user@lucene.apache.org
  Subject: Luke and SOLR search giving different results

  Hi,

  I am trying to apply Solr to the Turkish language for my research.

  Instead of using language identification, I manually assigned the Turkish 
language to a sample test document. I configured Solr's schema.xml by 
activating the part below, and I added the attached document 
testTurkishDoc.xml to the Solr index.

  But searching the raw Lucene index through Luke and searching through the 
Solr 4.0 GUI give different results. In picture Selection_006.png, the word 
"baş" is listed as a top term. When I search for "baş" in Luke, I get the only 
document as the result, shown in Selection_004.png.

  But in the Solr GUI, I get an empty result for the word "baş", shown in 
picture Selection_002.png.

  In the text we have a features field containing the word "baştan", which is 
derived from the root word "baş" in Turkish grammar. Somehow, the Solr GUI 
searches differently from Luke. I could not figure out why I cannot find the 
document in Solr while I can find it in Luke. The same thing happens for the 
words "umut", "bul" and "gör".

  I would appreciate it if you could help me get the same results from the 
Solr UI.

 Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” 
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve 
büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması 
reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam 
Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda 
Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in 
ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda 
bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.

  Added to schema.xml for SOLR:

Re: Solr 4: Join Query

2012-12-03 Thread Erick Erickson

Not that I know of. Also, your performance will be much better if you can
denormalize the data.
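Denormalizing here would mean indexing one flat Solr document per parent that already carries the child's fields, so a single query on emp_id returns everything. The Python sketch below assumes a hypothetical layout: parents are dicts with "id" and "doc_id", and children are keyed by their id. The field names title, content and the child_ prefix are illustrative only, because the original schema fields were lost from the message.

```python
def denormalize(parents, children):
    """Copy each child's fields into every parent that references it via doc_id.

    parents:  list of dicts with at least "id" and "doc_id" (hypothetical layout)
    children: dict mapping child id -> dict of child fields
    """
    flattened = []
    for parent in parents:
        child = children.get(parent["doc_id"], {})
        doc = dict(parent)  # copy so the input records are not modified
        # Prefix child fields so they cannot clash with parent field names.
        for key, value in child.items():
            doc["child_" + key] = value
        flattened.append(doc)
    return flattened


parents = [{"id": "ABC", "emp_id": "EMP001", "doc_id": "DOC001", "title": "My Parent Doc"}]
children = {"DOC001": {"id": "DOC001", "content": "My Document Data"}}
print(denormalize(parents, children))
```

The trade-off is the opposite of the "store each child only once" goal: a child shared by several parents is copied into each of them. That costs index size but avoids the pseudo-join.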


On Mon, Dec 3, 2012 at 12:44 AM, Vikash Sharma  wrote:

> Hi Erick,
> One more thing: is there any other way to get the result?
> I mean, I need to get both the parent and child documents, nested or not.
>
> Regards,
> Vikash
>
> Regards,
> Vikash Sharma
> vikash0...@gmail.com
>
>
> On Sat, Dec 1, 2012 at 10:29 PM, Erick Erickson  >wrote:
>
> > That's the way joins work, and why they're called "pseudo joins": they
> > don't work like DB joins, which return data from both records.
> >
> > Joins were put in for a specific use case; when you try to treat Solr
> > like a DB, you're bound to be disappointed. I'd think about reworking the
> > solution to de-normalize the data so you don't have to do joins.
> >
> > Best
> > Erick
> >
> >
> > On Fri, Nov 30, 2012 at 10:38 AM, Vikash Sharma  > >wrote:
> >
> > > Hi All,
> > > I have my field definitions in schema.xml like below:
> > >
> > > 
> > > 
> > > 
> > > 
> > >
> > >
> > > I need to create a separate record in Solr for each parent-child
> > > relationship, such that if a child is the same across different
> > > parents, it gets stored only once.
> > >
> > > For e.g.
> > >  ---_Record 1
> > > ABC
> > > EMP001
> > > DOC001
> > > My Parent Doc
> > >
> > >  ---_Record 2
> > > DOC001
> > > 
> > > 
> > > My Document Data
> > >
> > >
> > > This will ensure that if any doc_id content is duplicated, the record
> > > is inserted into Solr only once.
> > >
> > > Lastly, I want the result as a join: if emp_id=EMP001, then both
> > > records should be returned, as there is a relationship between the two
> > > records via doc_id = id.
> > >
> > > If I query:
> > >
> > >
> >
> http://localhost:8983/solr/select?q={!join%20from=doc_id%20to=id}emp_id:EMP001&wt=json
> > >
> > > I expect both records to be returned, either one after another or
> > > nested, but I only get the child records...
> > >
> > >
> > > Please help..
> > >
> > >
> > >
> > > Regards,
> > > Vikash Sharma
> > > vikash0...@gmail.com
> > >
> >
>

Re: How to change Solr UI

2012-12-03 Thread Erick Erickson

Adding to what Iwan said, I want to be sure you're not confusing
prototyping with a full-fledged application. The included Velocity code is
mostly intended as a rapid-prototyping vehicle. There are significant
security issues if you try to use it as your user-facing application, so be
sure you trust your users if you go down this route.

But to change it, see the Apache Velocity project, and the code in /conf/velocity.

Note that Velocity _can_ be used for user-facing code, but be very sure you
secure your Solr. If you allow direct access, a user can easily enter
something like
http:///update?commit=true&stream.body=<delete><query>*:*</query></delete>.
And all your documents will be gone.

Most installations use a middle layer between Solr and the user that
controls access.
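One simple form of that middle layer is a whitelist. It forwards only known search parameters to a fixed /select URL and drops everything else. The Python sketch below is illustrative. The parameter list, the rows cap and the internal Solr URL are all assumptions, and a real deployment would also need authentication and network-level restrictions.

```python
from urllib.parse import urlencode

# Hypothetical policy: only these request parameters are forwarded to Solr.
ALLOWED_PARAMS = {"q", "fq", "start", "rows", "sort", "fl"}
# Internal Solr URL that end users never reach directly (assumed).
SOLR_SELECT = "http://localhost:8983/solr/select"
MAX_ROWS = 100


def build_safe_solr_url(user_params):
    """Forward only whitelisted search parameters to the fixed /select path.

    Anything else (for example stream.body or qt) is dropped, and the path is
    fixed, so requests through this layer cannot target /update.
    """
    safe = {k: v for k, v in user_params.items() if k in ALLOWED_PARAMS}
    try:
        rows = int(safe.get("rows", 10))
    except ValueError:
        rows = 10
    safe["rows"] = str(min(max(rows, 0), MAX_ROWS))  # cap the page size
    return SOLR_SELECT + "?" + urlencode(sorted(safe.items()))


print(build_safe_solr_url({"q": "solr", "stream.body": "<delete/>", "qt": "/update"}))
# http://localhost:8983/solr/select?q=solr&rows=10
```

A whitelist is safer than a blacklist here because new or obscure Solr parameters are rejected by default.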

Best
Erick

On Mon, Dec 3, 2012 at 5:01 AM, Iwan Hanjoyo  wrote:

> Hi Romita,
>
> In my opinion, if you are new to Solr, you can start learning from
> Solritas.
> Solritas uses Apache Velocity, a templating language, CSS and JQuery to
> manage it looks and behavior.
> Besides that you can write a custom SearchComponent inside the /browse
> SearchHandler
> to add more functionality to your search application.
>
> Kind regards,
>
> Hanjoyo
>
> On Mon, Dec 3, 2012 at 4:35 PM, Romita Saha  >wrote:
>
> > Hi,
> >
> > I want to change the Solr UI. As far as I understand, Solritas is just
> > for prototyping: I can change the UI according to a predefined template
> > (Velocity), but I cannot add any additional functionality to that page.
> > How else can I change the Solr UI? Any guidance would be appreciated.
> >
> > Thanks and regards,
> > Romita
> >
>

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

Yes.

I expect Solr to give the same search results as Luke does.

The term analysis in Solr gives the expected result, but Solr does not
return the correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky wrote:

> So, does that highlight the problem for you or not? Is the term analyzed
> as you expected?
>
> -- Jack Krupansky
>
> From: Erol Akarsu
> Sent: Monday, December 03, 2012 8:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> Thanks for help.
>
> I removed the Solr data folder and indexed this sample document from
> scratch, so there is only one document in Solr.
>
> When I run the analysis, I can see that stemming is correct, and I can see
> the words "bul", "baş", "gör" and "umut" in the SF row.
> I attached the analysis screens.
>
> Erol Akarsu
>
>
> On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky 
> wrote:
>
>   Have you tried using the Solr Admin Analysis page, using the word and a
> few words of context for index analysis and the word alone for query
> analysis?
>
>   And be sure to fully reindex if you change ANYTHING in the schema fields
> or field types.
>
>   -- Jack Krupansky
>
>   From: Erol Akarsu
>   Sent: Sunday, December 02, 2012 10:38 PM
>   To: solr-user@lucene.apache.org
>   Subject: Luke and SOLR search giving different results
>
>
>   Hi,
>
>   I am trying to apply Solr to the Turkish language for my research.
>
>   Instead of using language identification, I manually assigned the
> Turkish language to a sample test document. I configured Solr's
> schema.xml by activating the part below, and I added the attached document
> testTurkishDoc.xml to the Solr index.
>
>   But searching the raw Lucene index through Luke and searching through
> the Solr 4.0 GUI give different results. In picture Selection_006.png, the
> word "baş" is listed as a top term. When I search for "baş" in Luke, I get
> the only document as the result, shown in Selection_004.png.
>
>   But in the Solr GUI, I get an empty result for the word "baş", shown in
> picture Selection_002.png.
>
>   In the text we have a features field containing the word "baştan", which
> is derived from the root word "baş" in Turkish grammar. Somehow, the Solr
> GUI searches differently from Luke. I could not figure out why I cannot
> find the document in Solr while I can find it in Luke. The same thing
> happens for the words "umut", "bul" and "gör".
>
>   I would appreciate it if you could help me get the same results from the
> Solr UI.
>
>
>   
>  Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
> dedirterek.
> 
>
>
>
>   Added to schema.xml for SOLR:
>
>multiValued="true"/>
>positionIncrementGap="100">
> 
>   
>   
>words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>language="Turkish"/>
> 
> 
>   
>   
>words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>language="Turkish"/>
> 
>   
>
>
>
>

Re: Whole Phrase search in Solr

2012-12-03 Thread NickA

Thank you Jack,

The problem with "AND" is that it does not search for a PHRASE but for
the two words appearing SOMEWHERE in the article.

For example, "Check this" will NOT search for "Check this" as a PHRASE
but for the word "Check" and the word "this" anywhere in the article, even
far apart from each other.

So the suggestions that you made do not work for searching as a "PHRASE".

Unless we are doing something wrong?

Any other ideas on the PHRASE search?

Thank you again!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931p4024029.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Backing up SolR 4.0

2012-12-03 Thread Erick Erickson

There's no real need to do what you ask.

First thing is that you should always be prepared, in the worst-case
scenario, to regenerate your entire index.

That said, perhaps the easiest way to back up Solr is just to use
master/slave replication. Consider having a machine that's a slave to the
master (but not necessarily searched against) and that periodically polls
your master (say daily, or whatever your interval is). You can configure
Solr to keep N copies of the index as extra insurance. These will be fairly
static, so if you _really_ wanted to, you could just copy the /data
directory somewhere, but I don't know if that's necessary.

See: http://wiki.apache.org/solr/SolrReplication
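If you do copy the index directory from such a backup slave, a small rotation script can keep a fixed number of dated copies. The Python sketch below assumes nothing writes to the directory while it is copied, for example a slave between replication polls or a stopped instance. The paths, the snapshot- naming scheme and the helper name snapshot_index are all hypothetical. This is not Solr's own backup mechanism, and the replication handler's command=backup is a separate option.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_index(index_dir, backup_root, keep=7):
    """Copy index_dir into a timestamped folder under backup_root, keeping the newest `keep`.

    Assumes index_dir is not being modified during the copy.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    root = Path(backup_root)
    root.mkdir(parents=True, exist_ok=True)

    # Fixed-width timestamp, so sorting by name is chronological.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = root / f"snapshot-{stamp}"
    shutil.copytree(index_dir, target)

    # Delete the oldest snapshots beyond the retention limit.
    snapshots = sorted(p for p in root.iterdir() if p.name.startswith("snapshot-"))
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

Each snapshot is a plain directory, so it can be archived to tape with ordinary tools. Restoring means putting a snapshot back in place while Solr is stopped.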

Best
Erick

On Mon, Dec 3, 2012 at 6:07 AM, Andy D'Arcy Jewell <
andy.jew...@sysmicro.co.uk> wrote:

> Hi all.
>
> I'm new to SolR, and I have recently had to set up a SolR server running
> 4.0.
>
> I've been searching for info on backing it up, but all I've managed to
> come up with is "it'll be different" or "you'll be able to do push
> replication" or using http and the command=backup parameter, which doesn't
> sound like it will be effective for a production setup (unless I've got
> that wrong)...
>
>
> I was wondering if I can just stop or suspend the SolR server, then do an
> LVM snapshot of the data store, before bringing it back on line, but I'm
> not sure if that will cut it. I gather merely rsyncing the data files won't
> do...
>
> Can anyone give me a pointer to that "easy-to-find" document I have so far
> failed to find? Or failing that, maybe some sound advice on how to proceed?
>
> Regards,
> -Andy
>
>
>
>
> --
> Andy D'Arcy Jewell
>
>
>

Re: Backing up SolR 4.0

2012-12-03 Thread Andy D'Arcy Jewell



Hi Erick,

Thanks for that, I'll take a look.

However, wouldn't re-creating the index on a large dataset take an 
inordinate amount of time? The system I will be backing up is likely to 
undergo rapid development and thus schema changes, so I need some kind 
of insurance against corruption if we need to roll back after a change.


How should I go about creating multiple backup versions I can put aside 
(e.g. on tape) to hedge against the downtime that would be required to 
regenerate the indexes from scratch?


Regards,
-Andy

--
Andy D'Arcy Jewell


Re: AW: Edismax query parser and phrase queries

2012-12-03 Thread Erick Erickson

It _seems_ like just adding "phrase fields" (pf) to your edismax defaults
gets you close. It would have the problem of matching if the field were
longer... but it might be "close enough".

Otherwise, why not just add fq clauses on your exact fields? I ask because
one problem you'll have is that you need to get the parameters past the
parser to the field, which will be...er...interesting.

And one note: rather than String fields (which are case sensitive),
consider KeywordTokenizer and LowerCaseFilter or some such.
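The difference between the two options can be shown with a toy emulation of the analysis chains in Python. This is only an approximation and not Lucene code. StrField indexes the whole value unchanged as one term. KeywordTokenizer plus LowerCaseFilter also produces one whole-value term, but lowercased. Python's str.lower() matches Lucene's lowercasing for ordinary text, but may differ for a few characters.

```python
def string_field_terms(value):
    # StrField: the whole value is indexed as one term, unchanged (case-sensitive).
    return [value]


def keyword_lowercase_terms(value):
    # KeywordTokenizer + LowerCaseFilter: still one term, but lowercased.
    # str.lower() only approximates Lucene's lowercasing for some scripts.
    return [value.lower()]


def term_query_matches(analyze, indexed_value, query_text):
    # A single-term query matches when its analyzed form equals an indexed term.
    return analyze(query_text)[0] in analyze(indexed_value)


title = "Ran Away From Home"
print(term_query_matches(string_field_terms, title, "ran away from home"))       # False
print(term_query_matches(keyword_lowercase_terms, title, "ran away from home"))  # True
print(term_query_matches(keyword_lowercase_terms, title, "ran away"))            # False
```

The last line shows the catch with both approaches: the query must match the entire field value, so a phrase that is only part of a longer field does not match.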

But I'd _really_ prove that you can't get close enough with current
functionality before I went down the custom route. Often things like this
seem like a good idea but then don't improve results enough to be worth the
complexity.

Best
Erick



Re: Whole Phrase search in Solr

2012-12-03 Thread Erick Erickson

As Jack suggested, show the results of adding &debugQuery=on; it'll help us
help you. Particularly with this form: q=search_field:"check this". It
should be doing what you want.

Best
Erick



Re: Whole Phrase search in Solr

2012-12-03 Thread Jack Krupansky

If you use the edismax query parser and set the "pf", "pf2", and "pf3" 
parameters, your phrase matches should show up as top results. This will not 
eliminate non-phrase matches, but it will ensure that phrase matches get boosted.


See:
http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29
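A request using these parameters could be assembled as in the Python sketch below. The field name search_field comes from the original question. The base URL, the boost values (^10, ^5, ^3) and the helper name are illustrative assumptions to tune, not recommended settings.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed location


def edismax_phrase_boost_url(user_query, field="search_field"):
    """Build an edismax request that matches terms via qf and boosts phrase matches."""
    params = [
        ("defType", "edismax"),
        ("q", user_query),
        ("qf", field),            # fields searched for the individual terms
        ("pf", field + "^10"),    # boost docs where all terms occur as a phrase
        ("pf2", field + "^5"),    # boost adjacent word pairs
        ("pf3", field + "^3"),    # boost adjacent word triples
        ("debugQuery", "true"),   # show how the boosts were applied
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(edismax_phrase_boost_url("check this"))
```

Documents that contain only "check" or only "this" are still returned, but documents containing the exact phrase should score higher, which is the behavior described above.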

-- Jack Krupansky


Re: Downloading files from the solr replication Handler

2012-12-03 Thread Eva Lacy

They are '\0' characters.
What is a marker?

I get the following with wget:
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
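One possible explanation for the extra bytes is worth checking against the ReplicationHandler source for 3.6.1; this is my understanding rather than something confirmed in this thread. File contents may be streamed as length-prefixed packets: a 4-byte big-endian length, an optional 8-byte checksum when checksums are requested, then the data, with a final zero length as the end marker. If so, the leading '\0' bytes are the high bytes of a small file's length header and the trailing four are the terminator. Skipping 4 bytes at each end would then only work while the whole file fits in one packet. A Python decoder for that assumed layout:

```python
import io
import struct


def decode_filestream(raw, with_checksum=False):
    """Reassemble file content from an assumed length-prefixed packet stream.

    Assumed layout per packet: 4-byte big-endian signed length, optional 8-byte
    checksum, then `length` bytes of data; a length of 0 ends the stream.
    """
    stream = io.BytesIO(raw)
    out = bytearray()
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise ValueError("stream ended without a zero-length terminator")
        (size,) = struct.unpack(">i", header)
        if size == 0:
            break  # end-of-file marker
        if size < 0:
            raise ValueError("negative packet length")
        if with_checksum:
            stream.read(8)  # checksum value; a real client would verify it
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated packet")
        out += chunk
    return bytes(out)


# A 5-byte file: header 00 00 00 05, data, then terminator 00 00 00 00.
raw = struct.pack(">i", 5) + b"hello" + struct.pack(">i", 0)
print(decode_filestream(raw))  # b'hello'
```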


On Fri, Nov 30, 2012 at 4:58 PM, Alexandre Rafalovitch
wrote:

> What MIME type do you get for binary files? Maybe the server is
> misconfigured for that extension and sends them as text. Then they could
> be the markers.
> Do they look like markers?
>
> Regards,
> Alex
> On 30 Nov 2012 04:06, "Eva Lacy"  wrote:
>
> > Doesn't make much sense if they are in binary files as well.
> >
> >
> > On Thu, Nov 29, 2012 at 10:16 PM, Lance Norskog 
> wrote:
> >
> > > Maybe these are text encoding markers?
> > >
> > > - Original Message -
> > > | From: "Eva Lacy" 
> > > | To: solr-user@lucene.apache.org
> > > | Sent: Thursday, November 29, 2012 3:53:07 AM
> > > | Subject: Re: Downloading files from the solr replication Handler
> > > |
> > > | I tried downloading them with my browser and also with a C#
> > > | WebRequest.
> > > | If I skip the first and last 4 bytes, it seems to work fine.
> > > |
> > > |
> > > | On Thu, Nov 29, 2012 at 2:28 AM, Erick Erickson
> > > | wrote:
> > > |
> > > | > How are you downloading them? I suspect the issue is
> > > | > with the download process rather than Solr, but I'm just guessing.
> > > | >
> > > | > Best
> > > | > Erick
> > > | >
> > > | >
> > > | > On Wed, Nov 28, 2012 at 12:19 PM, Eva Lacy  wrote:
> > > | >
> > > | > > Just to add to that, I'm using solr 3.6.1
> > > | > >
> > > | > >
> > > | > > On Wed, Nov 28, 2012 at 5:18 PM, Eva Lacy  wrote:
> > > | > >
> > > | > > > I downloaded some configuration and data files directly from
> > > | > > > Solr in an attempt to develop a backup solution.
> > > | > > > I noticed there are some characters at the start and end of the
> > > | > > > file that aren't in the configuration files, and I notice the
> > > | > > > same characters at the start and end of the data files.
> > > | > > > Does anyone have any idea how I can download these files without
> > > | > > > the extra characters, or predict how many there are going to be
> > > | > > > so I can skip them?
> > > | > > >
> > > | > >
> > > | >
> > > |
> > >
> >
>

Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky


Two points:

1. Possibly an encoding problem with your container? Is UTF-8 encoding 
enabled?
2. Add &debugQuery=true to your query (from the browser) and see if the 
parsedquery has the expected term that matches what Luke reports for the 
index and what Solr Admin Analysis also reports for index analysis.
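The encoding question in point 1 can be illustrated in a few lines of Python. Clients normally percent-encode "baş" as UTF-8. A container that decodes the URI as ISO-8859-1 turns the two bytes of "ş" into two unrelated characters, so Solr would search for a term that is not in the index. This is a sketch of the symptom only. If the container is Tomcat, the setting usually involved is the URIEncoding attribute on the HTTP connector; check your container's documentation.

```python
from urllib.parse import quote, unquote

term = "baş"

# Browsers and most HTTP clients percent-encode query strings as UTF-8.
encoded = quote(term)
print(encoded)  # ba%C5%9F

# Decoding the same bytes as ISO-8859-1 yields a different, unmatched term.
print(repr(unquote(encoded, encoding="latin-1")))  # 'baÅ\x9f'

# Decoding as UTF-8 recovers the original word.
print(unquote(encoded, encoding="utf-8"))  # baş
```

If parsedquery in the debug output shows something like "baÅ" instead of "baş", the request is being decoded with the wrong charset before it reaches the analyzer.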


-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Yes.

I expect SOLR should give same search results as Luked does.

Term analyzer gives correct answer in SOLR as expected. But SOLR does not
return correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky 
wrote:



So, does that highlight the problem for you or not? Is the term analyzed
as you expected?

-- Jack Krupansky

From: Erol Akarsu
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for help.

I removed data folder  of SOLR and indexed this sample doc from scratch,
there was no document in SOLR but only one.

When I analysed , I can see stemming is correct and I can see these for
words "bul", "baş" ,"gör" and "umut" in SF row
I attached analyse screens

Erol Akarsu


On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky 
wrote:

  Have you tried using the Solr Admin Analysis page, using the word and a
few words of context for index analysis and the word alone for query
analysis?

  And be sure to fully reindex if you change ANYTHING in the schema fields
or field types.

  -- Jack Krupansky

  From: Erol Akarsu
  Sent: Sunday, December 02, 2012 10:38 PM
  To: solr-user@lucene.apache.org
  Subject: Luke and SOLR search giving different results


  Hi,

  I am trying to apply SOLR for Turkish Language for my research.

  Instead of using language identification, I manually assigned Turkish
language for a sample test document. I have configured SOLR schema.xml,
activated the part below. I have added the attached document
testTurkishDoc.xml that is inserted to SOLR database.

  But searching the raw Lucene index through Luke and searching through the
Solr 4.0 GUI give different results. In picture Selection_006.png, the
word "baş" is listed as a top term. When I search for the word "baş" in Luke,
I get the result, which is the only document, shown in Selection_004.png.

  But in the Solr GUI, I get an empty result for the word "baş", as shown in
picture Selection_002.png.

  In the text we have a features field that contains the word "baştan", which
is derived from the root word "baş" in Turkish grammar. Somehow the Solr GUI
searches differently than Luke, and I could not figure out why I cannot find
the word in Solr while I can find it in Luke. The same thing happens for the
words "umut", "bul" and "gör".

  I would appreciate your help in getting the same results from the Solr UI.


  
 Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
dedirterek.




  Added to schema.xml for SOLR:

Re: News clustering

2012-12-03 Thread Stanislaw Osinski

> I mean measuring the similarity between the document in each cluster.
> Also, difference between document on one cluster with another cluster.
>
> I saw the sample code ClusteringQualityBencmark.java
> However, I do not know how to make use of it for assessing my Solr
> Clustering performance.
>

You'd need to write your own code for this. Here are the most common
clustering quality measures of the kind you mentioned:

http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results

These are meant for the general case (numeric attributes); to apply them to
texts, you'd need to use the vector representation of the documents.

On a more general note, synthetic measures test only the document-cluster
assignments; none of them take the quality of labels into account (this is
really hard to measure objectively).
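To make the "vector representation" idea concrete, here is a minimal Python sketch of one internal measure from that page, the silhouette coefficient, computed with cosine distance. It assumes plain term-frequency vectors with naive whitespace tokenization (no TF-IDF, no stemming); the function names are illustrative and are not part of Solr or Carrot2.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector: lowercased, whitespace-split."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def silhouette(docs, labels):
    """Mean silhouette coefficient (range -1..1, higher is better).

    docs: list of texts; labels: cluster id per document, same order.
    A document alone in its cluster scores 0 by convention.
    """
    vecs = [tf_vector(d) for d in docs]
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(cosine_distance(vecs[i], vecs[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, m) for lab, m in clusters.items() if lab != label)  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


docs = ["solr search index", "solr index search query",
        "football match goal", "goal football team"]
print(round(silhouette(docs, [0, 0, 1, 1]), 2))  # 0.77
```

A coherent grouping scores well above zero, while a scrambled assignment such as [0, 1, 0, 1] on the same documents goes negative. TF-IDF weighting would usually give more meaningful distances on real news articles.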

Staszek

Re: Whole Phrase search in Solr

2012-12-03 Thread Jack Krupansky

The edismax "phrase boost" feature boosts the phrase IF it occurs - it's 
optional.


If you want Solr to search ONLY by whole phrase, Solr does have a precise 
way to request that - simply enclose the phrase in quotes. But I presume 
that you knew that.


You can certainly preprocess your query to convert raw phrases into quoted 
phrases.
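A minimal sketch of such preprocessing in Python, for illustration only. The field name search_field is taken from this thread; the escaping rule (backslash before embedded double quotes and backslashes) follows standard Lucene query syntax, and only the q and wt request parameters are shown.

```python
from urllib.parse import urlencode


def to_phrase_query(raw, field=None):
    """Wrap raw user input in double quotes so the Lucene parser treats it as a phrase."""
    # Escape backslashes first, then embedded double quotes.
    escaped = raw.strip().replace("\\", "\\\\").replace('"', '\\"')
    phrase = f'"{escaped}"'
    return f"{field}:{phrase}" if field else phrase


def phrase_select_params(raw, field):
    """Query-string parameters for a /select request with the quoted phrase in q."""
    return urlencode({"q": to_phrase_query(raw, field), "wt": "json"})


print(to_phrase_query("your products", "search_field"))  # search_field:"your products"
```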


-- Jack Krupansky

-Original Message- 
From: NickA

Sent: Monday, December 03, 2012 12:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Whole Phrase search in Solr

Thank you, Jack.

Before making this major change, please note that the problem is that there
are ZERO matches of the "your products" phrase (in my example below). It is
not that the search finds this phrase but ranks it very low;
it NEVER finds this phrase at all.

So how will the search show them among the top results when there are ZERO?

Or do you mean that with this new parser we WILL get phrase results too?

Thank you again!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931p4024048.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Backing up SolR 4.0

2012-12-03 Thread Shawn Heisey


On 12/3/2012 9:47 AM, Andy D'Arcy Jewell wrote:
However, wouldn't re-creating the index on a large dataset take an 
inordinate amount of time? The system I will be backing up is likely 
to undergo rapid development and thus schema changes, so I need some 
kind of insurance against corruption if we need to roll-back after a 
change.


How should I go about creating multiple backup versions I can put aside 
(e.g. on tape) to hedge against the downtime which would be required 
to regenerate the indexes from scratch?


Serious production Solr installs require at least two copies of your 
index.  Failures *will* happen, and sometimes they'll be the kind of 
failures that will take down an entire machine.  You can plan for some 
failures -- redundant power supply and RAID are important for this.  
Some failures will cause downtime, though -- multiple disk failures, 
motherboard, CPU, memory, software problems wiping out your index, user 
error, etc. If you have at least one other copy of your index, you'll be 
able to keep the system operational while you fix the down machine.


Replication is a very good way to accomplish getting two or more copies 
of your index.  I would expect that most production Solr installations 
use either plain replication or SolrCloud.  I do my redundancy a 
different way that gives me a lot more flexibility, but replication is a 
VERY solid way to go.


If you are running on a UNIX/Linux platform (just about anything *other* 
than Windows), and backups via replication are not enough for you, you 
can use the hardlink capability in the OS to avoid taking Solr down 
while you make backups.  Here's the basic sequence:


1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.
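The filesystem part of the sequence above (steps 2, 3, 5 and 6) could be scripted roughly as follows. This is a minimal Python sketch, assuming a Linux/UNIX host; pausing and resuming indexing (steps 1 and 4) depends on your indexing setup and is not shown, and the directory paths you pass in are placeholders rather than Solr defaults.

```python
import os
import shutil


def hardlink_snapshot(index_dir, snapshot_dir):
    """Steps 2-3: hardlink every file of the index into a new snapshot directory.

    Assumes step 1 is done (indexing paused, commits and merges finished) and
    that snapshot_dir is on the same filesystem as index_dir, since hardlinks
    cannot cross filesystems.
    """
    os.makedirs(snapshot_dir)  # raises if it already exists, so snapshots never get mixed
    for name in os.listdir(index_dir):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))


def copy_and_release(snapshot_dir, backup_dir):
    """Steps 5-6: copy the snapshot to the backup location, then drop the hardlinks."""
    shutil.copytree(snapshot_dir, backup_dir)
    shutil.rmtree(snapshot_dir)
```

Indexing can be resumed (step 4) as soon as hardlink_snapshot returns; copy_and_release can run later at your leisure.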

Making hardlinks is a near-instantaneous operation.  The way that 
Solr/Lucene works will guarantee that your hardlink copy will continue 
to be a valid index snapshot no matter what happens to the live index.  
If you can make the backup and get the hardlinks deleted before your 
index undergoes a merge, the hardlinks will use very little extra disk 
space.


If you leave the hardlink copies around, eventually your live index will 
diverge to the point where the copy has different files and therefore 
takes up disk space.  If you have a *LOT* of extra disk space on the 
Solr server, you can keep multiple hardlink copies around as snapshots.
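If you do keep several hardlink snapshots, something has to delete the old ones. A minimal Python sketch, assuming a hypothetical naming scheme of snapshot.<sortable timestamp> for directories under one parent directory on the index filesystem:

```python
import os
import shutil


def prune_snapshots(parent_dir, keep, prefix="snapshot."):
    """Delete all but the `keep` newest snapshot directories in parent_dir.

    Assumes snapshot directories are named prefix + a sortable timestamp
    (e.g. snapshot.20121203153000), so sorting by name sorts by age.
    Returns the names that were deleted.
    """
    snaps = sorted(
        name for name in os.listdir(parent_dir)
        if name.startswith(prefix) and os.path.isdir(os.path.join(parent_dir, name))
    )
    doomed = snaps[:-keep] if keep > 0 else snaps
    for name in doomed:
        # Removing a hardlink frees disk blocks only when no other link
        # (the live index or another snapshot) still refers to the file.
        shutil.rmtree(os.path.join(parent_dir, name))
    return doomed
```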


Recent versions of Windows do have features similar to UNIX links, so 
there may in fact be a way to do this on Windows.  I will leave that for 
someone else to pursue.


Thanks,
Shawn

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

I had already configured the Tomcat server for UTF-8 encoding. I added
URIEncoding="UTF-8" to all <Connector> elements in server.xml in Tomcat 7.
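Why URIEncoding matters: a browser sends "baş" in the query string as percent-encoded UTF-8 bytes, and the container has to decode those bytes as UTF-8 (Tomcat 7 defaults to ISO-8859-1 unless URIEncoding is set). A small standard-library Python sketch of both decodings; the helper names are illustrative only:

```python
from urllib.parse import quote, unquote


def wire_form(term):
    """Percent-encode a query term as UTF-8 bytes, as a browser does in a URL."""
    return quote(term, safe="")


def decode_as(wire, charset):
    """Decode a percent-encoded string the way a container using `charset` would."""
    return unquote(wire, encoding=charset)


encoded = wire_form("baş")
print(encoded)                                    # ba%C5%9F
print(decode_as(encoded, "utf-8"))                # baş
print(decode_as(encoded, "iso-8859-1") == "baş")  # False: the term arrives mangled
```

Since the params section of the debug output below echoes q as baş, the decoding appears to be working already.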

As you can see below, when I search for the word "baş" in debug mode, I get
an empty response. But when I search for the word "baştan", I get the
correct response.

It seems to me that TurkishAnalyzer is not being used in the Solr search,
because only the full word "baştan" matches, not the root word "baş".
Probably the English analyzer is being used and cannot find the root
word. For example, in Luke, if I change "Analyser to use for query parsing"
to EnglishAnalyzer, it cannot find the word "baş"; it finds it only with
TurkishAnalyzer. I guess Solr is not using TurkishAnalyzer.

Is this assumption correct? I could not find any other reason.

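<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">58</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="q">baş</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">baş</str>
  <str name="querystring">baş</str>
  <str name="parsedquery">text:baş</str>
  <str name="parsedquery_toString">text:baş</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">38.0</double>
    <lst name="prepare">
      <double name="time">16.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
    </lst>
    <lst name="process">
      <double name="time">10.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">10.0</double></lst>
    </lst>
  </lst>
</lst>
</response>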

0
2

true
baştan
xml

htt://111.a.b1
6H500F0
tr
Maxtor DiamondMax 11 - hard drive - 500 GB -
SATA-300

Maxtor Corp.
maxtor

electronics
hard drive

SATA 3.0Gb/s, NCQ
8.5ms seek
16MB cache

Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim
senaryoyu!" diyerek
baştan savma reklamlarla kotarmaya bakıyor işi.
Futbolcu Arda Turan
ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un
oynatıldığı
giyim firması reklamı da tam bir fiyasko. Birbirinden
ünlü bu iki
ismin oynadığı reklam Arda'nın kabinde papağan gibi
tekrarladığı
"My darling!" repliği, sonunda Paris'i görünce anlam
veremediğimiz
uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez
izledikten
sonra anlaşılan "Paris seçti, firma yaptı, Arda
bayıldı."
sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
dedirterek.

350.0
350,USD
6
true
2006-02-13T15:26:37Z
1420300467908378624

baştan
baştan
text:baştan
text:baştan

0.028767452 = (MATCH) weight(text:baştan in 0)
[DefaultSimilarity], result of:
0.028767452 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
0.30685282 = idf(docFreq=1, maxDocs=1)
0.09375 = fieldNorm(doc=0)

LuceneQParser

2.0

1.0

1.0

0.0

0.0

0.0

0.0

0.0

1.0

0.0

0.0

0.0

0.0

0.0

1.0

On Mon, Dec 3, 2012 at 12:30 PM, Jack Krupansky wrote:

> Two points:
>
> 1. Possibly an encoding problem with your container? Is UTF-8 encoding
> enabled?
> 2. Add &debugQuery=true to your query (from the browser) and see if the
> parsedquery has the expected term that matches what Luke reports for the
> index and what Solr Admin Analysis also reports for index analysis.
>
> -- Jack Krupansky
>
> -Original Messa

Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky

Ah! See where it says "text:baş"? 
Your query is against the "text" field, which probably doesn't have the 
Turkish analysis.


There is probably a copyField from "features" to "text". You use the 
"text_tr" field type for "features", but probably not for the "text" field.

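That check can also be scripted. A minimal Python sketch, assuming the request was sent with debugQuery=true&wt=json so that the debug section appears under a "debug" key with a "parsedquery" string; the regex is a rough heuristic for spotting field:term prefixes, not a Lucene query parser:

```python
import json
import re

# A field reference in a parsed Lucene query looks like "field:term".
FIELD_PREFIX = re.compile(r"([\w.]+):")


def queried_fields(response_body):
    """Return the field names referenced by the parsed query in a Solr JSON
    response produced with debugQuery=true."""
    parsed = json.loads(response_body)["debug"]["parsedquery"]
    return set(FIELD_PREFIX.findall(parsed))


body = '{"debug": {"parsedquery": "text:baş", "parsedquery_toString": "text:baş"}}'
print(queried_fields(body))  # {'text'}
```

If the set contains "text" rather than "features", the query never touched the field that has the Turkish analysis.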

-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results


Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

I have the following in schema.xml, which defines "features" as type text_tr.

But unfortunately, this fails.


On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky wrote:

> Ah! See where it says "text:baş"?
> Your query is against the "text" field, which probably doesn't have the
> Turkish analysis.
>
> There is probably a copyField from "features" to "text". You use the
> "text_tr" field type for "features", but probably not for the "text" field.
>
>
> -- Jack Krupansky
>

Re: Whole Phrase search in Solr

2012-12-03 Thread NickA

Jack, thank you again.

However, our major problem is that using QUOTES to get "phrase" results
does not return any results AT ALL!

As I mentioned in my initial post, we also tried these:

fq=search_field:"check this"
fq=search_field:'check this' 

But no results appear when quotes are used. What might we be doing wrong in
our configuration?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931p4024071.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: News clustering

2012-12-03 Thread Jorge Luis Betancourt Gonzalez

I'm trying to use it to search through news websites, but I'm interested in 
classification at index time. Is there any solution available for this?

Greetings!

On Dec 3, 2012, at 12:37 PM, Stanislaw Osinski  wrote:

>> I mean measuring the similarity between the document in each cluster.
>> Also, difference between document on one cluster with another cluster.
>> 
>> I saw the sample code ClusteringQualityBencmark.java
>> However, I do not know how to make use of it for assessing my Solr
>> Clustering performance.
>> 
> 
> You'd need to write your own code for this, here are the most common
> clustering quality measures you mentioned:
> 
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
> 
> These are meant for the general case (numeric attributes), to apply them to
> texts, you'd need to use the vector representation of the documents.
> 
> One a more general note, synthetic measures test only the document-cluster
> assignments, but none take the quality of labels into account (this is
> really hard to measure objectively).
> 
> Staszek
> 
> 
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS 
> INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> 
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci



Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

I see something interesting now.

In the Solr GUI, I entered "features:baş" instead of "baş" as the search
query in the "q" field. And I got a result!

My single document has some fields of type text_eng and text_general, and
one field, features, of type text_tr. If I don't specify a field name, Solr
uses EnglishAnalyzer. If I do, it uses the analyzer specific to the field
specified in the query string.

Is this true?

Erol Akarsu

On Mon, Dec 3, 2012 at 1:30 PM, Erol Akarsu  wrote:

> Jack,
>
> I have these in schema.xml that defines "features" as type of text_tr
>
> But unfortunately, this fails.
>
>
>   multiValued="true"/>
> 
>
>
>  positionIncrementGap="100">
>   
>  
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>   language="Turkish"/>
>   
>   
>
> 
> 
>  words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>   language="Turkish"/>
>   
> 

Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky

As I pointed out in my message, your query is indicating that "text" is your 
default search field. So either choose a different default search field, or 
ensure that the "text" field has the desired field type.


If you want to change the default search field, either use a "df" request 
parameter or change the "df" default value for the request handler in 
solrconfig.xml.
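The per-request options can be illustrated with a small Python sketch that only builds the query string for the standard /select handler. "features" is the text_tr field from this thread; with either form, debugQuery=true should then show features:... rather than text:... in parsedquery. The helper itself is hypothetical.

```python
from urllib.parse import urlencode


def select_params(term, field=None, via_df=False):
    """Build /select query-string parameters for one search term.

    field=None            -> the handler's default field is searched ("text" here)
    field set, via_df     -> q stays bare and df redirects it to `field`
    field set, not via_df -> q names the field explicitly as field:term
    """
    params = {"q": term, "debugQuery": "true", "wt": "json"}
    if field and via_df:
        params["df"] = field
    elif field:
        params["q"] = f"{field}:{term}"
    return urlencode(params)


print(select_params("baş", "features", via_df=True))
# q=ba%C5%9F&debugQuery=true&wt=json&df=features
```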


-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results


solr war -> osgi

2012-12-03 Thread Marcos Mendez

Hi,

Has anyone had any experience repackaging the Solr WAR for OSGi? And while I'm 
at it, has anyone done this in Geronimo 3.0?

Regards,
Marcos

Re: Luke and SOLR search giving different results

2012-12-03 Thread Shawn Heisey


On 12/3/2012 1:44 PM, Erol Akarsu wrote:

I tried  as search query  not "baş" but "features:baş" in field "q" in SOLR
GUI. And, I got result!

In the one document, I had some fields type of text_eng, text_general and
one field features type of text_tr. If I don't specify field name, SOLR use
EnglishAnalyzer. If I do, it uses the analyzer specific to field specified
in search query string.


Your config is set up to search against a field named "text" by default 
- either by a setting in schema.xml or a "df" parameter in your search 
handler definition in solrconfig.xml.  If you are using (e)dismax, it 
might be qf/pf parameters instead of df.


The field named text is not properly set up for this search.  Your 
attachment at the beginning of this thread indicates that either you do 
not have a text field for this document at all, or that field is not 
stored.  If the text field is a copyField as Jack has mentioned, note 
that it doesn't matter what analysis you are doing on features -- the 
copy is done before analysis, so it is completely separate.
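A toy Python model of that point, assuming nothing about Solr internals beyond what is described above: copyField copies the raw input value, and each destination field then runs its own analyzer. Both analyzers are stand-ins. The "Turkish" one is a hypothetical suffix stripper for illustration only, not Solr's Snowball Turkish stemmer.

```python
def plain_analyzer(value):
    """Stand-in for the "text" field type: lowercase + whitespace split, no stemming."""
    return value.lower().split()


def toy_turkish_analyzer(value):
    """HYPOTHETICAL stand-in for text_tr: strips a trailing "tan"/"ten" suffix.

    Only illustrates that a different analyzer runs; not a real Turkish stemmer.
    """
    tokens = []
    for tok in value.lower().split():
        for suffix in ("tan", "ten"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 1:
                tok = tok[: -len(suffix)]
                break
        tokens.append(tok)
    return tokens


FIELD_ANALYZERS = {"features": toy_turkish_analyzer, "text": plain_analyzer}
COPY_FIELDS = [("features", "text")]  # like <copyField source="features" dest="text"/>


def index_document(doc):
    """Return the set of indexed terms per field for a {field: [values]} document."""
    raw = {field: list(values) for field, values in doc.items()}
    # copyField operates on the RAW input values, before any analysis.
    for source, dest in COPY_FIELDS:
        raw.setdefault(dest, []).extend(doc.get(source, []))
    # Each field is then analyzed independently with its own analyzer.
    return {
        field: {term for value in values for term in FIELD_ANALYZERS[field](value)}
        for field, values in raw.items()
    }


terms = index_document({"features": ["baştan savma"]})
print(sorted(terms["features"]))  # ['baş', 'savma']
print(sorted(terms["text"]))      # ['baştan', 'savma']
```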


Thanks,
Shawn

Re: Whole Phrase search in Solr

2012-12-03 Thread Jack Krupansky

Ah! You have conflicting tokenizers in your index and query analyzers. They 
should be the same.


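Your index has:
 <tokenizer class="solr.StandardTokenizerFactory"/>

Your query has:
  <tokenizer class="solr.KeywordTokenizerFactory"/>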

That has the effect of treating the entire query text as a single term. 
That works for simple one-word terms, but a quoted phrase is passed to the 
query analyzer as one string, and the keyword tokenizer treats it as one 
token, producing a single query term that will not match the two separate 
terms that were indexed by the standard tokenizer.


Stick with the same tokenizer as you used at index time.
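A toy Python model of the mismatch, with rough stand-ins for the two tokenizers (neither is the real Lucene implementation) and a made-up document. A phrase matches only if the query tokens line up with consecutive indexed tokens:

```python
import re


def standard_like(text):
    """Rough stand-in for StandardTokenizer + lowercasing: one token per word."""
    return re.findall(r"\w+", text.lower())


def keyword_like(text):
    """Stand-in for KeywordTokenizer + lowercasing: the whole input is one token."""
    return [text.lower()]


def phrase_matches(doc_text, phrase, index_analyzer, query_analyzer):
    """True if the query tokens occur at consecutive positions in the indexed tokens."""
    doc_tokens = index_analyzer(doc_text)
    query_tokens = query_analyzer(phrase)
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens for i in range(len(doc_tokens) - n + 1))


doc = "Check this box to compare your products"
print(phrase_matches(doc, "check this", standard_like, standard_like))  # True
print(phrase_matches(doc, "check this", standard_like, keyword_like))   # False
print(phrase_matches(doc, "check", standard_like, keyword_like))        # True
```

The single-word query still matches with the keyword-style query analyzer, which is why the problem only shows up for phrases.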

-- Jack Krupansky

-Original Message- 
From: NickA

Sent: Monday, December 03, 2012 1:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Whole Phrase search in Solr

Jack thank you again,

however we have the major problem that using QUOTES to bring "phrase"
results, actually does not bring any results AT ALL!

I mentioned this at the initial post, that we also used these:

fq=search_field:"check this"
fq=search_field:'check this'

But no results appear when quotes are used. What may be doing wrong in our
configuration?




Re: Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0

2012-12-03 Thread Shawn Heisey


On 11/8/2012 3:25 PM, Dyer, James wrote:

Could this be a side effect of SOLR-4019 (in branch_4.0 this was commit r1405894)?  
Prior to this commit, PingRequestHandler would throw a SolrException for 503 (Service Unavailable). 
The change is that the exception isn't actually thrown but rather sent in place of the 
response.  This prevents the container from logging huge stack traces just because 
PingRequestHandler is in a "disabled" state.  Prior to this, SolrException had 
logging disabled for 503s by hardcoding, but this broke other uses of 503 SolrExceptions.


While working on another issue (SOLR-4143), I figured out why this isn't 
working.  Initially I did not connect the exceptions in the Solr 3.5 log 
to my problems getting ping responses, but the light eventually turned on.


My requests to the 3.5 ping handler from SolrJ 4.1-SNAPSHOT use the 
setRequestHandler method to talk to /admin/ping.  In addition to using 
/admin/ping as the URL path, this also sets the qt parameter to 
/admin/ping.  The PingRequestHandler in Solr 3.x looks at the qt 
parameter that it receives, and if that handler is an instance of 
PingRequestHandler, throws an exception saying that you can't call PRH 
recursively.  This is why I get an exception and no response, but it 
works perfectly in a browser -- I wasn't setting qt in my browser.  Once 
I did that, I get the bad response in the browser too.


There is no way in SolrJ 4.x or trunk to set the request handler without 
also setting qt.  When I looked at SolrJ code trying to make a patch for 
SOLR-4143, I discovered that it's not a trivial change, and it may not 
be possible to even do in branch_4x.


Is there possibly a workaround I can use in SolrJ?  Other thoughts?

Thanks,
Shawn

Re: News clustering

2012-12-03 Thread Iwan Hanjoyo

Hi Stanislaw,

I see. Thank you for the reference.

Kind regards,

Hanjoyo

On Tue, Dec 4, 2012 at 12:37 AM, Stanislaw Osinski
wrote:

> > I mean measuring the similarity between the document in each cluster.
> > Also, difference between document on one cluster with another cluster.
> >
> > I saw the sample code ClusteringQualityBencmark.java
> > However, I do not know how to make use of it for assessing my Solr
> > Clustering performance.
> >
>
> You'd need to write your own code for this, here are the most common
> clustering quality measures you mentioned:
>
>
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
>
> These are meant for the general case (numeric attributes), to apply them to
> texts, you'd need to use the vector representation of the documents.
>
> One a more general note, synthetic measures test only the document-cluster
> assignments, but none take the quality of labels into account (this is
> really hard to measure objectively).
>
> Staszek
>

Re: How to change Solr UI

2012-12-03 Thread Iwan Hanjoyo

>
>
> Note that Velocity _can_ be used for user-facing code, but be very sure you
> secure your Solr. If you allow direct access, a user can easily enter
> something like http://
> /update?commit=true&stream.body=*:*.
> And all your documents will be gone.
>
Hi Erickson,

Thank you for the input.
I'll take note of this and filter out this URL:
* http://
/update?commit=true&stream.body=*:*
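Rather than filtering out one known-bad URL, an allow-list is usually more robust, since there are several ways to send content to Solr or reach an update handler. A minimal sketch of such a check for a proxy or front end, in Python; the paths assume a default single-core /solr deployment and are placeholders, and the blocked parameter list is an assumption, not an exhaustive one:

```python
from urllib.parse import urlsplit, parse_qs

# HYPOTHETICAL allow-list: only the read-only handlers a public UI needs.
ALLOWED_PATHS = {"/solr/select", "/solr/browse"}
# Content-stream parameters, plus qt, which can route a request to another handler.
BLOCKED_PARAMS = {"stream.body", "stream.url", "stream.file", "qt"}


def is_request_allowed(url):
    """Allow-list check for a URL a proxy in front of Solr is about to forward."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") not in ALLOWED_PATHS:
        return False
    params = parse_qs(parts.query, keep_blank_values=True)
    return BLOCKED_PARAMS.isdisjoint(params)
```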

Kind regards,

Hanjoyo

Re: solr war -> osgi

2012-12-03 Thread Iwan Hanjoyo

> Has anyone had any experience repackaging the solr war for osgi? And while
> I'm at it, has anyone done this in geronimo 3.0?
>
>
Hi Marcos,

Start the GlassFish web server.
Put the Solr WAR file inside the autodeploy folder.
Finally, you need to find the Solr home folder location.
Different operating systems will have a different Solr home location for
GlassFish.

You need to find it yourself in the GlassFish log file.
It is a bit difficult.

Good luck

Kind regards,

Hanjoyo

Re: How to change Solr UI

2012-12-03 Thread Jack Krupansky


It is annoying to have to repeat these explanations so much.

Any serious objection to removing the VW UI from Solr proper and replacing 
it with a standalone app?


I mean, Solr should have PHP, Python, Java, and Ruby example apps, right?

-- Jack Krupansky


Solr Query Parameter : ids - What is this used for?

2012-12-03 Thread deniz

Hello, as the title says, I want to know what Solr uses this
parameter for. I see it in a sharded SolrCloud environment, so I guess it is
related to SolrCloud, but there is no explanation of it on any of the wiki
pages I have checked. Can someone explain the usage and purpose of this
parameter?



-
Smart, but doesn't work... If he worked, he could do it...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Query-Parameter-ids-What-is-this-used-for-tp4024152.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Query Parameter : ids - What is this used for?

2012-12-03 Thread Yonik Seeley

On Mon, Dec 3, 2012 at 10:55 PM, deniz  wrote:
> Hello, as it is clear in the title too, i wanna know for what solr uses this
> parameter... i see it on a sharding env on cloud, so i guess it is related
> with cloud but still there is no explanation about it in any of wiki pages
> that i have checked... can someone explain the usage and aim of this
> parameter?

It's an internal implementation detail of distributed search - the
second phase selects specific ids on each shard via the "ids"
parameter.

-Yonik
http://lucidworks.com
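Yonik's two-phase description can be illustrated with a toy simulation. Everything here is an assumption for illustration: the scoring is a plain term count, the shards are in-memory dicts, and the returned dict only mimics the shape of a per-shard "ids" parameter. In phase one each shard returns its top keys with scores and the coordinator merges them; in phase two it asks each shard for just the winning documents by key. In this sketch the keys play the role of the schema's uniqueKey values.

```python
import heapq


def phase_one(shards, query_terms, rows):
    """Each shard returns its top `rows` hits as (score, key, shard); then merge globally."""
    candidates = []
    for shard_name, docs in shards.items():
        hits = []
        for key, text in docs.items():
            # Toy scoring: count occurrences of the query terms in the document text.
            score = sum(text.split().count(t) for t in query_terms)
            if score > 0:
                hits.append((score, key, shard_name))
        candidates.extend(heapq.nlargest(rows, hits))
    # Keep only the global top `rows` across all shards.
    return heapq.nlargest(rows, candidates)


def phase_two_requests(top_hits):
    """Group the winning keys by shard into one comma-separated 'ids' value per shard."""
    by_shard = {}
    for _score, key, shard_name in top_hits:
        by_shard.setdefault(shard_name, []).append(key)
    return {shard: {"ids": ",".join(keys)} for shard, keys in by_shard.items()}
```

With two shards and rows=2, a document that scores well on shard2 and another on shard1 end up in separate per-shard requests, each listing only its own keys; documents that lost the merge are never fetched.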

Difference between 'bf' and 'boost' when using eDismax

2012-12-03 Thread Floyd Wu

Hi there,

I'm not sure I understand this clearly.

Does 'bf' mean that the value returned by bf is added to the score?
For example: score + bf = final score

Does 'boost' mean that the score is multiplied by the value returned by boost?
For example: score * boost = final score

When using both ('bf' and 'boost'):
score * boost + bf = final score

If I want recently created documents to rank higher, is 'bf' or 'boost'
the better solution (assuming bf and boost use the same function,
recip(ms(NOW,datefield),3.16e-11,1,1))?

Please help with this.

search behavior on a case-sensitive field

2012-12-03 Thread Joe Zhang

I have a search like this:

When I query "COST", it gives reasonable results (n1).
When I query "CoSt", however, it gives me n2 (>n1) results, and I can't
locate any actual occurrence of "CoSt" in the docs at all. Can anybody advise?

Re: Solr Query Parameter : ids - What is this used for?

2012-12-03 Thread deniz

Yonik Seeley-4 wrote
> It's an internal implementation detail of distributed search - the
> second phase selects specific ids on each shard via the "ids"
> parameter.
> 
> -Yonik
> http://lucidworks.com

So I suppose it is the unique key field? Or does it depend on which field we
are using for querying on shards?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Query-Parameter-ids-What-is-this-used-for-tp4024152p4024159.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: search behavior on a case-sensitive field

2012-12-03 Thread Jack Krupansky

"CoSt" was split into two terms and the query parser generated an OR of 
them. Adding the autoGeneratePhraseQueries="true" attribute to your field 
type should fix the problem.


You can also change splitOnCaseChange="1" to splitOnCaseChange="0" to avoid 
the term splitting issue.


Be sure to completely reindex in either case.

-- Jack Krupansky
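Jack's explanation can be reproduced with a small simulation. This is a simplified model, not Lucene's WordDelimiterFilter: the hypothetical split_on_case_change only splits on a lowercase-to-uppercase transition (one of the rules enabled by splitOnCaseChange="1"), optionally appends a concatenated token to mimic catenateWords="1", and ignores token positions, digits and punctuation. The matching helpers are also hypothetical: an OR of the parts matches any document containing either "Co" or "St", while a phrase match, roughly what autoGeneratePhraseQueries="true" asks the query parser to build, requires the parts to be adjacent.

```python
def split_on_case_change(token, catenate_words=False):
    """Split a token at each lowercase->uppercase transition (simplified WDF behaviour)."""
    parts = []
    start = 0
    for i in range(1, len(token)):
        if token[i - 1].islower() and token[i].isupper():
            parts.append(token[start:i])
            start = i
    parts.append(token[start:])
    if catenate_words and len(parts) > 1:
        # Mimic catenateWords="1": also emit the parts joined back together.
        parts.append("".join(parts))
    return parts


def or_query_matches(query_terms, doc_terms):
    """True if the document contains ANY of the query terms (what an OR query does)."""
    return any(t in doc_terms for t in query_terms)


def phrase_query_matches(query_terms, doc_terms):
    """True if the query terms appear consecutively, in order, in the document."""
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms for i in range(len(doc_terms) - n + 1))
```

For a document tokenized as ["Acme", "Co", "in", "St", "Louis"], the OR of ["Co", "St"] matches even though "CoSt" never occurs, which is the kind of extra hit Joe saw; the phrase version only matches documents where "Co" is directly followed by "St". "COST" contains no lowercase-to-uppercase transition, so it stays a single term.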


Re: search behavior on a case-sensitive field

2012-12-03 Thread Joe Zhang

haha, makes perfect sense! Thanks a lot!

On Mon, Dec 3, 2012 at 9:25 PM, Jack Krupansky wrote:

> "CoSt" was split into two terms and the query parser generated an OR of
> them. Adding the autoGeneratePhraseQueries="true" attribute to your
> field type should fix the problem.
>
> You can also change splitOnCaseChange="1" to splitOnCaseChange="0" to
> avoid the term splitting issue.
>
> Be sure to completely reindex in either case.
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Zhang
> Sent: Monday, December 03, 2012 11:10 PM
> To: solr-user@lucene.apache.org
> Subject: search behavior on a case-sensitive field
>
>
> I have a search like this:
>
>positionIncrementGap="100">
>
>
>ignoreCase="true" words="stopwords.txt"/>
>generateWordParts="1" generateNumberParts="1"
>catenateWords="1" catenateNumbers="1" catenateAll="0"
>splitOnCaseChange="1"/>
> 
>protected="protwords.txt"/>
>
>
>
>
> When I query "COST", it gives reasonable results (n1);
> When I query "CoSt", however, it gives me n2 (>n1) results, and I can't
> locate any actual occurrence of "CoSt" in the docs at all. Can anybody advise?
>

Re: Difference between 'bf' and 'boost' when using eDismax

2012-12-03 Thread Jack Krupansky


"bf" is processed first, then "boost".

All the bf's will be added, then the resulting scores will be boosted by the 
product of all the "boost" function queries.


-- Jack Krupansky
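Jack's order of operations can be written out as a simplified score model. This sketch assumes the final score is exactly (score + sum of bf values) * product of boost values and ignores Lucene's normalization details; final_score is a hypothetical helper, while recip follows Solr's documented definition recip(x,m,a,b) = a/(m*x+b). With m=3.16e-11, a one-year-old document (about 3.16e10 ms) gets a factor of roughly 0.5.

```python
import math

# Approximate length of one year in milliseconds (what ms(NOW,datefield) returns for a 1-year-old doc).
ONE_YEAR_MS = 365.25 * 24 * 3600 * 1000


def recip(x, m, a, b):
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def final_score(query_score, bf_values=(), boost_values=()):
    """Simplified model: add every bf value first, then multiply by every boost value."""
    return (query_score + sum(bf_values)) * math.prod(boost_values)


age_factor = recip(ONE_YEAR_MS, 3.16e-11, 1, 1)  # roughly 0.5 for a one-year-old document
```

In this model, adding 0.5 via bf lifts a weak match (score 0.2) far more, relative to its score, than a strong one (score 5.0), whereas multiplying by 0.5 via boost halves both, so the recency effect stays proportional to relevance.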


Re: How to change Solr UI

2012-12-03 Thread Erick Erickson

That's only one example; there are others, such as
stream.body=blah. or
id:*

Jack's comment is well taken; consider a real middleware application.


Best
Erick
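The middleware idea, that the browser talks to an application which builds Solr requests itself, can be illustrated with a parameter allowlist. This is a hypothetical sketch: the Solr URL, ALLOWED_PARAMS and build_search_url are assumptions, not part of Solr. Because the path is fixed to the search handler and only allowlisted parameters are forwarded, stream.body, commit, qt and anything else unexpected are dropped, rather than blacklisting dangerous URLs one at a time.

```python
from urllib.parse import urlencode

# Hypothetical allowlist: only these user-supplied parameters are forwarded to Solr.
ALLOWED_PARAMS = {"q", "start", "rows", "fl", "sort"}
# Hypothetical internal Solr address; users never choose the path themselves.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(user_params):
    """Build the Solr URL from user input, keeping allowlisted parameters only."""
    safe = {k: v for k, v in user_params.items() if k in ALLOWED_PARAMS}
    safe["wt"] = "json"  # response format is chosen by the application, not the user
    return SOLR_SELECT_URL + "?" + urlencode(sorted(safe.items()))
```

This checks parameter names only; it does not inspect values (for instance a q such as id:*), so it is a starting point rather than a complete security layer.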



Re: Difference between 'bf' and 'boost' when using eDismax

2012-12-03 Thread Floyd Wu

Thanks Jack!
It helps a lot.

Floyd




Migrating solr 3.6 to solr 4.0

2012-12-03 Thread Shaveta_Chawla

Hi,

I had Solr 3.6 installed on my system, and now I am migrating it to
Solr 4.0, but I am getting this error:

SEVERE: Unable to create core: collection1
java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or
'solr/collection1/conf/', cwd=/opt/tomcat/bin

I don't know how to resolve this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Migrating-solr-3-6-to-solr-4-0-tp4024173.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Migrating solr 3.6 to solr 4.0

2012-12-03 Thread Tirthankar Chatterjee

Can you paste the contents of solr.xml?
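The error message suggests that Solr resolved the relative path solr/collection1/conf/ against Tomcat's working directory (cwd=/opt/tomcat/bin), which is typically what happens when no Solr home is configured (for example via the solr.solr.home system property). A hypothetical helper like the one below shows where a relative home actually resolves and which files are missing there. The list of expected files (solr.xml plus collection1/conf/solrconfig.xml and schema.xml) is an assumption based on the standard Solr 4.0 example layout; adjust it to your cores.

```python
import os


def check_solr_home(solr_home, cwd, core="collection1"):
    """Return the expected config files that are missing for one core.

    A relative solr_home is resolved against cwd (the servlet container's
    working directory); an absolute solr_home is used as-is.
    """
    home = os.path.normpath(os.path.join(cwd, solr_home))
    expected = [
        os.path.join(home, "solr.xml"),
        os.path.join(home, core, "conf", "solrconfig.xml"),
        os.path.join(home, core, "conf", "schema.xml"),
    ]
    return [path for path in expected if not os.path.isfile(path)]
```

For example, check_solr_home("solr", "/opt/tomcat/bin") reports paths under /opt/tomcat/bin/solr/, matching the cwd in the error, while passing an absolute home directory ignores the working directory entirely.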

