Deploying war

2012-12-17 Thread Arkadi Colson

Hi

From time to time it takes quite a while to start Tomcat. The log
shows the snippet below. Any idea? Sorry, I'm a beginner with Java.


Dec 17, 2012 8:01:38 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8009"]
Dec 17, 2012 8:01:38 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 542 ms
Dec 17, 2012 8:01:38 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Dec 17, 2012 8:01:38 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.33
Dec 17, 2012 8:01:38 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/local/apache-tomcat-7.0.33/webapps/solr.war


BR,
Arkadi


Re: Faceted result , based on words specified in search query.

2012-12-17 Thread Upayavira
You can facet on terms in a field. When a document is selected, you
cannot facet on some terms for a field in that document, and not others.

In your case, if Sony, Samsung and LG were in your 'manufacturer' field,
and that is what you searched across, then surely it would be
straightforward to create a facet that shows which of them matched.

If you want combinations, you should do that at display time.
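
A minimal sketch of doing the combinations at display time, assuming the
client already has the 'manufacturer' values of each matching document (for
example from the returned results). The class and method names are made up
for illustration, and the counts only cover the documents handed to it:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr or SolrJ.
class PairFacetCounts {

    // For every pair of searched terms, count the documents that contain both.
    static Map<String, Integer> countPairs(List<String> terms, List<Set<String>> docs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                String a = terms.get(i);
                String b = terms.get(j);
                int n = 0;
                for (Set<String> doc : docs) {
                    if (doc.contains(a) && doc.contains(b)) {
                        n++;
                    }
                }
                counts.put(a + "," + b, n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data: the manufacturer values of three documents.
        List<Set<String>> docs = Arrays.asList(
                new HashSet<String>(Arrays.asList("Sony", "Samsung")),
                new HashSet<String>(Arrays.asList("Samsung", "LG")),
                new HashSet<String>(Arrays.asList("Sony", "Samsung", "LG")));
        System.out.println(countPairs(Arrays.asList("Sony", "Samsung", "LG"), docs));
    }
}
```

The per-term counts can still come from a normal facet on the field; only the
pair counts are computed on the client here.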

Upayavira

On Mon, Dec 17, 2012, at 07:07 AM, veena rani wrote:
>   Hi,
> 
>If searched for three words in query.
> 
> > Eg:
> > Sony, Samsung,LG.
> > i should get faceted result with the count ,combination of two among them
> > like,
> > 1.Sony, Samsung
> > 2.Samsung,LG
> > 3.LG,Sony
> > and also the number of available for each of them
> > and also all of them.
> >
> > --
> > Regards,
> > Veena Rani P N
> > Banglore.
> > 9538440458
> >
> >
> 
> 
> -- 
> Regards,
> Veena Rani P N
> Banglore.
> 9538440458


Re: Update / replication of offline indexes

2012-12-17 Thread Dikchant Sahi
Thanks Erick and Upayavira! This answers my question.
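
For the archive: the first option Erick mentions in the quoted message below
(ship the list of overwritten docs and delete them from the base index before
merging) could look roughly like this on the client side. This is only a
sketch. It builds a standard XML delete-by-id update message; the class name
and document IDs are made up, and posting the message to the core's update
handler and committing before the merge is left out:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
class DeleteByIdMessage {

    // Escape the characters that are not allowed in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build one <delete> message containing an <id> element per document.
    static String build(List<String> ids) {
        StringBuilder sb = new StringBuilder("<delete>");
        for (String id : ids) {
            sb.append("<id>").append(escapeXml(id)).append("</id>");
        }
        return sb.append("</delete>").toString();
    }

    public static void main(String[] args) {
        // Made-up IDs standing in for the docs that the delta overwrites.
        System.out.println(build(Arrays.asList("doc-1", "doc-2")));
    }
}
```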


On Mon, Dec 17, 2012 at 8:05 AM, Erick Erickson wrote:

> See the very last line here:
> http://wiki.apache.org/solr/MergingSolrIndexes
>
> Short answer is that merging will lead to duplicate documents, even with
> uniqueKeys defined.
>
> So you're really kind of stuck handling this outside of merge, either by
> shipping the list of overwritten docs and deleting them from the base index,
> or shipping the JSON/XML format and indexing those. Of the two, I'd think
> the latter is easiest/least prone to surprises, especially since you could
> re-run the indexing as many times as necessary.
>
> The UniqueKey bits are only guaranteed to overwrite older docs when
> indexing, not merging.
>
> Best
> Erick
>
>
> On Thu, Dec 13, 2012 at 3:17 PM, Dikchant Sahi wrote:
>
> > Hi Alex,
> >
> > You got my point right. What I see is that merge adds duplicate documents.
> > Is there a way to overwrite existing documents in one core with those from
> > another? Can a merge operation lead to data corruption, say when the core
> > on the client had uncommitted changes?
> >
> > What would be a better solution for my requirement, merge or indexing
> > XML/JSON?
> >
> > Regards,
> > Dikchant
> >
> > On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
> > wrote:
> >
> > > Not sure I fully understood this and maybe you already cover that by
> > > 'merge', but if you know what you gave the client last time, you can just
> > > build a differential as a second core, then on client mount that second
> > > core and merge it into the first one (e.g. with DIH).
> > >
> > > Just a thought.
> > >
> > > Regards,
> > >Alex.
> > >
> > > Personal blog: http://blog.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at
> > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi wrote:
> > >
> > > > Hi Erick,
> > > >
> > > > Sorry for creating the confusion. By slave, I mean that the indexes on
> > > > the client machine will be replicas of the master; it is not the same as
> > > > the slave in the master-slave model. Below is the detail:
> > > >
> > > > The system is being developed to support search on 1000s of systems, a
> > > > majority of which will be offline.
> > > >
> > > > The idea is that we will have a search system which will be sold on a
> > > > subscription basis. For each subscriber, we will copy the master index
> > > > to their local machine, on a drive or CD. Now, if a subscriber comes
> > > > back after 2 months and wants the updates, we just want to provide the
> > > > deltas for those 2 months, as the volume of data is huge. For this we
> > > > can think of two approaches:
> > > > 1. Fetch the documents which are less than 2 months old in JSON format
> > > > from the master Solr. Copy them to the subscriber machine and index
> > > > those documents (copy through CD / memory sticks).
> > > > 2. Create separate indexes for each month on our master machine. Copy
> > > > the indexes to the client machine and merge. Prior to the merge we need
> > > > to delete the records which the new index has, to avoid duplicates.
> > > >
> > > > As long as the setup is new, we will copy the complete index and restart
> > > > Solr. We are not sure of the best approach for copying the deltas.
> > > >
> > > > Thanks,
> > > > Dikchant
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson
> > > > <erickerick...@gmail.com> wrote:
> > > >
> > > > > This is somewhat confusing. You say that box2 is the slave, yet they're
> > > > > not connected? Then you need to copy the /data index from box 1 to
> > > > > box 2 manually (I'd have box2 solr shut down at the time) and restart
> > > > > Solr.
> > > > >
> > > > > Why can't the boxes be connected? That's a much simpler way of going
> > > > > about it.
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > >
> > > > > On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi
> > > > > <contacts...@gmail.com> wrote:
> > > > >
> > > > > > Hi Walter,
> > > > > >
> > > > > > Thanks for the response.
> > > > > >
> > > > > > Commit will help to reflect changes on Box1. We are able to achieve
> > > > > > this. We want the changes to be reflected on Box2.
> > > > > >
> > > > > > We have two indexes. Say
> > > > > > Box1: Master & DB has been set up. Data Import runs on this.
> > > > > > Box2: Slave running.
> > > > > >
> > > > > > We want all the updates on Box1 to be merged/present in the index on
> > > > > > Box2. The two boxes are not connected over the network. How can we
> > > > > > achieve this?
> > > > > >
> > > > > > Please let me know if I am not clear.
> > > > > >
> > > > > > Thanks again!
> > > > > >
> > > > > > Regards,
> > > > > > Dikchant
> > > > > >
> > > > > > On Tue, Dec 11, 2012 

RE: SolrCloud breaks distributed query strings

2012-12-17 Thread Markus Jelsma
Has anyone else noticed a similar issue where Solr mangles distributed query
parameters? Any hints on how to track this issue? Where to look?

Thanks 
 
-Original message-
> From:Markus Jelsma 
> Sent: Wed 12-Dec-2012 15:11
> To: solr-user@lucene.apache.org
> Subject: RE: SolrCloud breaks distributed query strings
> 
> Hi Per,
> 
> We're running Tomcat6 with today's checkout from trunk. I cannot remember
> having seen it before and I cannot reproduce it manually in my browser, only in
> concurrent stress tests firing queries.
> 
> Thanks
> Markus 
>  
> -Original message-
> > From:Per Steffensen 
> > Sent: Wed 12-Dec-2012 15:04
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud breaks distributed query strings
> > 
> > It doesn't sound exactly like a problem we experienced some time ago,
> > where long requests were mixed up during transport. Jetty was to blame.
> > It might be Jetty that messes up your request too? SOLR-4031. Are you still
> > running 8.1.2?
> > 
> > Regards, Per Steffensen
> > 
> > Markus Jelsma skrev:
> > > Hi,
> > >
> > > We're starting to see issues on a test cluster where Solr breaks up query 
> > > string parameters that are either defined in the request handler or are 
> > > passed in the URL in the initial request.
> > >
> > > In our request handler we have an SF parameter for edismax (SOLR-3925):
> > >
> > >   
> > > title_general~2^4
> > > title_nl~2^4
> > > title_en~2^4
> > > title_de~2^4
> > >  
> > >
> > > Almost all queries pass without issue but some fail because the parameter 
> > > arrives in an incorrect format, i've logged several occurences:
> > >
> > > 2012-12-12 12:01:12,159 ERROR [solr.core.SolrCore] - [http-8080-exec-23] - : org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Invalid arguments for sf, must be sf=FIELD~DISTANCE^BOOST, got
> > > title_general~2^4
> > > title_nl~2^4
> > > title_en~2^4
> > > title_de~2
> > > 4
> > >
> > >
> > > at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:154)
> > > 
> > >
> > > 2012-12-12 12:00:57,164 ERROR [solr.core.SolrCore] - [http-8080-exec-1] - : org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Invalid arguments for sf, must be sf=FIELD~DISTANCE^BOOST, got
> > > title_general~2^4
> > > title_nl~2
> > > 4
> > > title_en~2^4
> > > title_de~2^4
> > >
> > >
> > > at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:154)
> > > 
> > >
> > > 2012-12-12 12:01:11,223 ERROR [solr.core.SolrCore] - [http-8080-exec-8] - : org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Invalid arguments for sf, must be sf=FIELD~DISTANCE^BOOST, got ^
> > > title_general~2^4
> > > title_nl~2^4
> > > title_en~2^4
> > > title_de~2^4
> > >
> > >
> > > at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:154)
> > > 
> > >
> > > This seems crazy! For some reason, sometimes, the parameter gets
> > > corrupted in some manner! We've also seen this with a function query in
> > > the edismax boost parameter, where for some reason a comma is replaced by
> > > a newline:
> > >
> > > 2012-12-12 11:11:45,527 ERROR [solr.core.SolrCore] - [http-8080-exec-16] 
> > > - : org.apache.solr.common.SolrException: 
> > > org.apache.solr.search.SyntaxError: Expected ',' at position 55 in 
> > > 'if(exists(date),max(recip(ms(NOW/DAY,date),3.17e-8,143
> > > .9),.8),.7)'
> > > at 
> > > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:154)
> > > ...
> > > at 
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > at java.lang.Thread.run(Thread.java:662)
> > > Caused by: org.apache.solr.search.SyntaxError: Expected ',' at position 
> > > 55 in 'if(exists(date),max(recip(ms(NOW/DAY,date),3.17e-8,143
> > > .9),.8),.7)'
> > >
> > > Accompanying these errors are a number of ArrayIndexOutOfBoundsExceptions
> > > without stack traces and spellchecker NPEs (SOLR-4049). I'm completely
> > > puzzled here because queries get randomly mangled in some manner. The SF
> > > parameter seems to get mangled only by replacing ^ with a newline. The
> > > boost query seems to be mangled in the same way if it fails. Only about 6%
> > > of all queries fired to the cluster end in such an error.
> > >
> > > We're also seeing strange facets returned where two constraints seem to 
> > > appear in a single returned value for a field, completely messed up :)
> > >
> > > 2012-12-12 12:00:56,341 ERROR [handler.component.FacetComponent] - 
> > > [http-8080-exec-11] - : Unexpected term returned for facet refining. 
> > > key=host term=

Searches with phonetics

2012-12-17 Thread Sangeetha
Hi,

I have not done anything in my schema.xml for phonetic search. But it
searches and returns *july* when I give *juli*. I don't want this. How can I
avoid that?


Thanks,
Sangeetha



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searches-with-phonetics-tp4027487.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searches with phonetics

2012-12-17 Thread Erik Hatcher
This is probably due to stemming. Removing the stemming (porter or snowball)
from your analysis chains should do the trick. (And reindex)

Erik

On Dec 17, 2012, at 2:57, Sangeetha  wrote:

> Hi,
> 
> I have not done anything in my schema.xml for phonetic search. But it
> searches and returns *july* when I give *juli*. I don't want this. How can I
> avoid that?
> 
> 
> Thanks,
> Sangeetha
> 
> 
> 


Re: Solrcloud and Node.js

2012-12-17 Thread Per Steffensen

Luis Cappa Banda skrev:
> Thanks a lot, Per. Now I understand the whole scenario. One last question:
> I've been searching trying to find some kind of request handler that
> retrieves cluster status information, but no luck. I know that there exists
> a JSON called clusterstate.json, but I don't know the way to get it in raw
> JSON format.

If you want the clusterstate in raw JSON format, I believe there is
currently no other way than to go fetch it yourself from ZK. Or maybe
something in the admin-console /zookeeper will help you.

> Do you know how to get its status? Any request handler or Solr
> query? Maybe checking directly from Zookeeper?

Yes, if you want it in raw JSON format. If you want the "information"
parsed as a Java object hierarchy, you can access it through the ClusterState
object. The best way to get a ClusterState (that keeps itself up to date
with changing states) is probably to use the ZkStateReader:
   ZkStateReader zk = new ZkStateReader(, 
, );

   zk.createClusterStateWatchersAndUpdate();
Then whenever you want an updated "picture" of the cluster state:
   zk.getClusterState();
You can also use a CloudSolrServer, which carries a ZkStateReader, if you
are already using one. But I guess not, since it didn't sound like
you would try the node-java bridge to be able to use SolrJ stuff in node.js.

> Best regards,
>
> - Luis Cappa.




Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Hi,

I am not sure if I am missing something, or maybe I do not exactly understand
the index/search analyzer definition and their execution.

I have a field definition like this:



  


  
  


  


Any field starting with cl2 should be recognized as being of type
cl2Tokenized_string:


When I try to search for a token in that sense the query is tokenized at
whitespaces:

{!q.op=AND
df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

I expected the query parser would also tokenize ONLY at the pattern ###,
instead of using a white space tokenizer here?
Is it possible to define a filter query, without using phrases, to achieve
the desired behavior?
Maybe local parameters are not the way to go here?

Best
Dirk


Re: Searches with phonetics

2012-12-17 Thread Sangeetha


I have docs which contain the word July. If I search with juli it also
returns July.

I have removed  in my
schema.xml. Now it returns nothing when I search for July. It returns July only if
I give juli.

What should I do? I want to search for the exact words which are in the docs. I
also need to do auto-suggest.

Thanks,
Sangeetha



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searches-with-phonetics-tp4027487p4027498.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud with Near Realtime Search: buildOnOptimize in IndexBasedSpellChecker

2012-12-17 Thread Artyom
When does an optimize event occur in this case? Should I reindex this
spellchecker on every shard manually? Or does this event occur on every hard or
soft commit?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-with-Near-Realtime-Search-buildOnOptimize-in-IndexBasedSpellChecker-tp4027499.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searches with phonetics

2012-12-17 Thread Erik Hatcher
You need to reindex :)

Erik

On Dec 17, 2012, at 03:59 , Sangeetha wrote:

> 
> 
> I have docs which contain the word July. If I search with juli it also
> returns July.
> 
> I have removed  in my
> schema.xml. Now it returns nothing when I search for July. It returns July only if
> I give juli.
> 
> What should I do? I want to search for the exact words which are in the docs. I
> also need to do auto-suggest.
> 
> Thanks,
> Sangeetha
> 
> 
> 



Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
{!q.op=AND df=cl2Categories_NACE}08
Gewinnung von Steinen und Erden, sonstiger Bergbau+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

That is the relevant debug Output from the query.

2012/12/17 Dirk Högemann 

> Hi,
>
> I am not sure if I am missing something, or maybe I do not exactly
> understand the index/search analyzer definition and their execution.
>
> I have a field definition like this:
>
>
>  sortMissingLast="true" omitNorms="true">
>   
>  group="-1"/>
> 
>   
>   
>  group="-1"/>
> 
>   
> 
>
> Any field starting with cl2 should be recognized as being of type
> cl2Tokenized_string:
>  stored="true" />
>
> When I try to search for a token in that sense the query is tokenized at
> whitespaces:
>
> {!q.op=AND
> df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
> Erden, sonstiger Bergbau name="parsed_filter_queries">+cl2Categories_NACE:08
> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau
>
> I expected the query parser would also tokenize ONLY at the pattern ###,
> instead of using a white space tokenizer here?
> Is it possible to define a filter query, without using phrases, to achieve
> the desired behavior?
> Maybe local parameters are not the way to go here?
>
> Best
> Dirk
>


Re: SolrCloud with Near Realtime Search: buildOnOptimize in IndexBasedSpellChecker

2012-12-17 Thread Tomás Fernández Löbbe
"optimize" operations only occur when you explicitly request them. All
nodes should get the command, so if you have set "buildOnOptimize" on
all nodes (you probably have, as you are using the same configuration) then
all of them should rebuild the spellcheck index.

Tomás


On Mon, Dec 17, 2012 at 7:59 AM, Artyom  wrote:

> When does an optimize event occur in this case? Should I reindex this
> spellchecker on every shard manually? Or does this event occur on every hard
> or soft commit?
>
>
>
>


Re: SolrCloud with Near Realtime Search: buildOnOptimize in IndexBasedSpellChecker

2012-12-17 Thread Artyom
Thank you, Tomás.

This wiki
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
says
 "*Segments are normally merged over time anyway (as determined by the merge
policy), and optimize just forces these merges to occur immediately.*"

Doesn't the merge policy affect the buildOnOptimize setting and trigger
spellcheck index rebuilding?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-with-Near-Realtime-Search-buildOnOptimize-in-IndexBasedSpellChecker-tp4027499p4027510.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud with Near Realtime Search: buildOnOptimize in IndexBasedSpellChecker

2012-12-17 Thread Upayavira
Note with 4.0 you don't need to build a spellcheck index. Spellchecking
can happen from your main index (unless you are providing your own
dictionary).

Upayavira

On Mon, Dec 17, 2012, at 12:36 PM, Artyom wrote:
> Thank you, Tomás.
> 
> This wiki
> http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
> says
>  "*Segments are normally merged over time anyway (as determined by the
>  merge
> policy), and optimize just forces these merges to occur immediately.*"
> 
> Doesn't the merge policy affect the buildOnOptimize setting and trigger
> spellcheck index rebuilding?
> 
> 
> 


Re: SolrCloud with Near Realtime Search: buildOnOptimize in IndexBasedSpellChecker

2012-12-17 Thread Tomás Fernández Löbbe
It only rebuilds on explicit optimize operations. A background merge that
merges all segments (to 1) won't fire the rebuild AFAIK.

And Upayavira is right, you can choose to use a DirectSolrSpellChecker,
that way you don't need an external index at all.
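
To make "explicit optimize" concrete, here is a rough sketch of requesting
one over HTTP with the optimize=true parameter on the update handler. The
class name and the core URL are assumptions for illustration, and in a
cluster you would still have to make sure every node you care about receives
the command:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of Solr or SolrJ.
class ExplicitOptimize {

    // Build the update URL that asks a core for an explicit optimize.
    static String optimizeUrl(String coreUrl) {
        String base = coreUrl.endsWith("/")
                ? coreUrl.substring(0, coreUrl.length() - 1)
                : coreUrl;
        return base + "/update?optimize=true";
    }

    // Send the request and return the HTTP status code.
    static int requestOptimize(String coreUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(optimizeUrl(coreUrl)).openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed URL; needs a running Solr, otherwise this throws an IOException.
        System.out.println(requestOptimize("http://localhost:8983/solr/collection1"));
    }
}
```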


On Mon, Dec 17, 2012 at 9:46 AM, Upayavira  wrote:

> Note with 4.0 you don't need to build a spellcheck index. Spellchecking
> can happen from your main index (unless you are providing your own
> dictionary).
>
> Upayavira
>
> On Mon, Dec 17, 2012, at 12:36 PM, Artyom wrote:
> > Thank you, Tomás.
> >
> > This wiki
> >
> http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
> > says
> >  "*Segments are normally merged over time anyway (as determined by the
> >  merge
> > policy), and optimize just forces these merges to occur immediately.*"
> >
> > Doesn't the merge policy affect the buildOnOptimize setting and trigger
> > spellcheck index rebuilding?
> >
> >
> >
>


Re: SolrCloud with Near Realtime Search: buildOnOptimize in IndexBasedSpellChecker

2012-12-17 Thread Artyom
Thank you, Upayavira, I know about the DirectSpellChecker.
But I want to know how IndexBasedSpellChecker is handled in SolrCloud.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-with-Near-Realtime-Search-buildOnOptimize-in-IndexBasedSpellChecker-tp4027499p4027514.html
Sent from the Solr - User mailing list archive at Nabble.com.


Wildcard inside " " ?

2012-12-17 Thread Bruno Mannina

Dear Users,

Wildcards do not seem to work inside a double-quoted request.
I always get 0 results.

E.g.:
title:"plastic bicycle" <== 79 results in my database
title:"plast* bicycle" <== 0 results found

Is there a solution for that?

Thanks a lot,
Bruno



Re: Wildcard inside " " ?

2012-12-17 Thread Ahmet Arslan
> Wildcards do not seem to work inside a double-quoted request.
> I always get 0 results.
> 
> E.g.:
> title:"plastic bicycle" <== 79 results in my database
> title:"plast* bicycle" <== 0 results found
> 
> Is there a solution for that?


Hi Bruno,

You can make use of https://issues.apache.org/jira/browse/SOLR-1604


Re: Wildcard inside " " ?

2012-12-17 Thread Bruno Mannina

Le 17/12/2012 14:13, Ahmet Arslan a écrit :

> > Wildcards do not seem to work inside a double-quoted request.
> > I always get 0 results.
> >
> > E.g.:
> > title:"plastic bicycle" <== 79 results in my database
> > title:"plast* bicycle" <== 0 results found
> >
> > Is there a solution for that?
>
>
> Hi Bruno,
>
> You can make use of https://issues.apache.org/jira/browse/SOLR-1604



Oh OK, thanks, but after reading this page
it does not seem to be solved... Patches are available, but it is not easy for
me to apply them (or rebuild the war).


Can this be applied to Solr 3.6?

1-) extract ComplexPhrase.zip and run 'mvn package'
2-) copy the ComplexPhrase/target/ComplexPhrase-1.0.jar to solrhome/lib 
directory

3-) register queryparser to solrhome/conf/solrconfig.xml by adding
class="org.apache.solr.search.ComplexPhraseQParserPlugin" />

4-) enable it by appending &defType=complexphrase to search url.
5-) More permanent usage can be configured in solrconfig.xml
default="true">


complexphrase




Re: Core URL with solrj

2012-12-17 Thread Carlos Alexandro Becker
PS: I get the 404 when I try to reload the core, I forgot to say.

Thanks


On Mon, Dec 17, 2012 at 11:50 AM, Carlos Alexandro Becker <
caarl...@gmail.com> wrote:

> I'm trying to use solrj with the new solr4, but having some issues...
>
> I'm used to Solr3. I create the cores with something like:
>
> CoreAdminRequest.Create create = new CoreAdminRequest.Create();
> create.setCoreName("somecorename");
> create.process(new SolrHTTPServer("http://localhost:8080/solr"));
>
> And to access it, I do something like:
>
> SolrServer server = new SolrHTTPServer("http://localhost:8080/solr/somecorename");
>
> and then I use this server instance.
>
> But now, I get a 404 with it.
>
> I tried to add a "#/" before "somecorename", like the web interface, but
> it still doesn't work...
>
>
> Is there a way to pass the base URL and corename and then get the SolrServer
> instance?
>
>
> Thanks in advance.
>
>
> --
> Atenciosamente,
> *Carlos Alexandro Becker*
> http://caarlos0.github.com/about
>



-- 
Atenciosamente,
*Carlos Alexandro Becker*
http://caarlos0.github.com/about


Re: Wildcard inside " " ?

2012-12-17 Thread Jack Krupansky
Not with the standard query parsers, but you can do it with the "surround" 
query parser:


defType=surround&q=title:(plast* w bicycle)

Or

q={!surround}title:(plast* w bicycle)

Two notes:
1. Surround performs NO analysis, so you have to manually analyze your 
terms, like lower case them if necessary.
2. Surround has no dismax feature, so it won't search multiple fields 
automatically.
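
As a rough illustration of note 1, the terms can be normalized on the client
before the query string is built. This sketch assumes that lower-casing is
all the index-time analysis does for the field; the class and method names
are made up:

```java
import java.util.Locale;

// Hypothetical helper, not part of Solr or SolrJ.
class SurroundQuery {

    // Lower-case each term and join the terms with the ordered "w" operator.
    static String orderedProximity(String field, String... terms) {
        StringBuilder sb = new StringBuilder("{!surround}").append(field).append(":(");
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append(" w ");
            }
            sb.append(terms[i].toLowerCase(Locale.ROOT));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Prints: {!surround}title:(plast* w bicycle)
        System.out.println(orderedProximity("title", "Plast*", "Bicycle"));
    }
}
```

If the field also uses stemming or other filters, lower-casing alone will not
be enough to match the indexed terms.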


-- Jack Krupansky

-Original Message- 
From: Bruno Mannina

Sent: Monday, December 17, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Wildcard inside " " ?

Dear Users,

Wildcards do not seem to work inside a double-quoted request.
I always get 0 results.

E.g.:
title:"plastic bicycle" <== 79 results in my database
title:"plast* bicycle" <== 0 results found

Is there a solution for that?

Thanks a lot,
Bruno 



Re: Wildcard inside " " ?

2012-12-17 Thread Bruno Mannina

Neither method works on my Solr 3.6; I get the error message:

_Unknown query type 'surround'_




Le 17/12/2012 15:18, Jack Krupansky a écrit :
Not with the standard query parsers, but you can do it with the 
"surround" query parser:


defType=surround&q=title:(plast* w bicycle)

Or

q={!surround}title:(plast* w bicycle)

Two notes:
1. Surround performs NO analysis, so you have to manually analyze your 
terms, like lower case them if necessary.
2. Surround has no dismax feature, so it won't search multiple fields 
automatically.


-- Jack Krupansky

-Original Message- From: Bruno Mannina
Sent: Monday, December 17, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Wildcard inside " " ?

Dear Users,

Wildcards do not seem to work inside a double-quoted request.
I always get 0 results.

E.g.:
title:"plastic bicycle" <== 79 results in my database
title:"plast* bicycle" <== 0 results found

Is there a solution for that?

Thanks a lot,
Bruno






Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Lee Carroll
I use *analyzer type*="*query*". Can you use "search"?




On 17 December 2012 11:01, Dirk Högemann wrote:

> {!q.op=AND df=cl2Categories_NACE}08
> Gewinnung von Steinen und Erden, sonstiger Bergbau name="parsed_filter_queries">+cl2Categories_NACE:08
> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau
>
> That is the relevant debug Output from the query.
>
> 2012/12/17 Dirk Högemann 
>
> > Hi,
> >
> > I am not sure if I am missing something, or maybe I do not exactly
> > understand the index/search analyzer definition and their execution.
> >
> > I have a field definition like this:
> >
> >
> >  > sortMissingLast="true" omitNorms="true">
> >   
> >  > group="-1"/>
> > 
> >   
> >   
> >  > group="-1"/>
> > 
> >   
> > 
> >
> > Any field starting with cl2 should be recognized as being of type
> > cl2Tokenized_string:
> >  > stored="true" />
> >
> > When I try to search for a token in that sense the query is tokenized at
> > whitespaces:
> >
> > {!q.op=AND
> > df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
> > Erden, sonstiger Bergbau > name="parsed_filter_queries">+cl2Categories_NACE:08
> > +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
> > +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> > +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> > +cl2Categories_NACE:bergbau
> >
> > I expected the query parser would also tokenize ONLY at the pattern ###,
> > instead of using a white space tokenizer here?
> > Is it possible to define a filter query, without using phrases, to achieve
> > the desired behavior?
> > Maybe local parameters are not the way to go here?
> >
> > Best
> > Dirk
> >
>


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Jack Krupansky
The query parsers normally tokenize on white space and query operators, but 
you can escape any white space with backslash or put the text in quotes and 
then it will be tokenized by the analyzer rather than the query parser.
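
Both options can be sketched on the client side as follows. The class and
method names are made up, and the field name and value are taken from the
question; the escaping here only handles white space (first method) or
backslashes and quotes (second method), not every query operator:

```java
// Hypothetical helper, not part of Solr or SolrJ.
class QueryEscaping {

    // Put a backslash in front of every white space character.
    static String escapeWhitespace(String text) {
        return text.replaceAll("(\\s)", "\\\\$1");
    }

    // Wrap the text in double quotes, escaping backslashes and quotes inside it.
    static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        String value = "08 Gewinnung von Steinen und Erden, sonstiger Bergbau";
        System.out.println("cl2Categories_NACE:" + escapeWhitespace(value));
        System.out.println("cl2Categories_NACE:" + quote(value));
    }
}
```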


Also, you have:



Change "search" to "query", but that won't change your problem since Solr 
defaults to using the "index" analyzer if it doesn't "see" a "query" 
analyzer.


-- Jack Krupansky

-Original Message- 
From: Dirk Högemann

Sent: Monday, December 17, 2012 5:59 AM
To: solr-user@lucene.apache.org
Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at 
whitespace?


Hi,

I am not sure if I am missing something, or maybe I do not exactly understand
the index/search analyzer definition and their execution.

I have a field definition like this:


   
 
   
   
 
 
   
   
 
   

Any field starting with cl2 should be recognized as being of type
cl2Tokenized_string:


When I try to search for a token in that sense the query is tokenized at
whitespaces:

{!q.op=AND
df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

I expected the query parser would also tokenize ONLY at the pattern ###,
instead of using a white space tokenizer here?
Is it possible to define a filter query, without using phrases, to achieve
the desired behavior?
Maybe local parameters are not the way to go here?

Best
Dirk 



Re: Wildcard inside " " ?

2012-12-17 Thread Jack Krupansky

Yeah, it's fully integrated into 4.0.

There is a patch here that may or may not work with 3.6:
https://issues.apache.org/jira/browse/SOLR-2703

-- Jack Krupansky

-Original Message- 
From: Bruno Mannina

Sent: Monday, December 17, 2012 9:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Wildcard inside " " ?

Neither method works on my Solr 3.6; I get the error message:

_Unknown query type 'surround'_




Le 17/12/2012 15:18, Jack Krupansky a écrit :

Not with the standard query parsers, but you can do it with the
"surround" query parser:

defType=surround&q=title:(plast* w bicycle)

Or

q={!surround}title:(plast* w bicycle)

Two notes:
1. Surround performs NO analysis, so you have to manually analyze your
terms, like lower case them if necessary.
2. Surround has no dismax feature, so it won't search multiple fields
automatically.
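
The two notes above can be sketched in a few lines of Java. This is only an illustration, not part of the surround parser: the class and method names are made up, and lower-casing is assumed to match what the index analyzer does to the title field.

```java
import java.util.Locale;

public class SurroundQuerySketch {
    // Hypothetical helper: builds an ordered-proximity query for the surround parser.
    // Surround performs no analysis, so the terms are lower-cased by hand here
    // (assumption: the index analyzer for this field lower-cases its tokens).
    static String orderedNear(String field, String first, String second) {
        String a = first.toLowerCase(Locale.ROOT);
        String b = second.toLowerCase(Locale.ROOT);
        return "{!surround}" + field + ":(" + a + " w " + b + ")";
    }

    public static void main(String[] args) {
        // prints {!surround}title:(plast* w bicycle)
        System.out.println(orderedNear("title", "Plast*", "Bicycle"));
    }
}
```

Since there is no dismax feature, searching several fields would mean building one such clause per field.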

-- Jack Krupansky

-Original Message- From: Bruno Mannina
Sent: Monday, December 17, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Wildcard inside " " ?

Dear Users,

Wildcards do not seem to work inside a double-quoted query.
I always get 0 results.

E.g.:
title:"plastic bicycle" <== 79 results in my database
title:"plast* bicycle" <== 0 results found

Is there a solution for that?

Thanks a lot,
Bruno






Index Update for XPathEntityProcessor

2012-12-17 Thread Lighton Phiri
Hello,

First, apologies for cross posting; I initially posted this on
Stackoverflow [1] and just realised I might have better luck posting
the question here.

I am relatively new to Solr and currently working with the
XPathEntityProcessor DIH. I have a dataset that will be periodically
updated with new XML files and would like to find out which approach
would be best suited, seeing that delta-import is only supported in
SqlEntityProcessor [2].

[1] http://stackoverflow.com/q/13914465/664424
[2] http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command-1

--
Phiri
http://lightonphiri.org


Faceting on Dynamic fields

2012-12-17 Thread Mohamed Zahoor
Hi

I have many dynamic fields in my schema, name_X, where X can range from 0 to 10.
Not all documents will have all the fields from 0 to 10.

I want to facet on these fields.
I have seen SOLR-247 and other queries on this list.


Is there any way other than patching SOLR-247 on 4.0?

./Zahoor


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Ok- right, changed that... Nevertheless I thought I should always use the
same analyzers for the query and the index section to have consistent
results.
Does this mean that the tokenizer in the query section will always be
ignored by the given query parsers?



2012/12/17 Jack Krupansky 

> The query parsers normally tokenize on white space and query operators,
> but you can escape any white space with backslash or put the text in quotes
> and then it will be tokenized by the analyzer rather than the query parser.
>
> Also, you have:
>
> <analyzer type="search">
>
> Change "search" to "query", but that won't change your problem since Solr
> defaults to using the "index" analyzer if it doesn't "see" a "query"
> analyzer.
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Monday, December 17, 2012 5:59 AM
> To: solr-user@lucene.apache.org
> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
> whitespace?
>
>
> Hi,
>
> I am not sure if am missing something, or maybe I do not exactly understand
> the index/search analyzer definition and their execution.
>
> I have a field definition like this:
>
>
> sortMissingLast="true" omitNorms="true">
>  
> group="-1"/>
>
>  
>  
> group="-1"/>
>
>  
>
>
> Any field starting with cl2 should be recognized as being of type
> cl2Tokenized_string:
>  stored="true" />
>
> When I try to search for a token in that sense the query is tokenized at
> whitespaces:
>
> {!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
> Erden, sonstiger Bergbau
>
> parsed_filter_queries:
> +cl2Categories_NACE:08
> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau
>
> I expected the query parser would also tokenize ONLY at the pattern ###,
> instead of using a white space tokenizer here?
> Is is possible to define a filter query, without using phrases, to achieve
> the desired behavior?
> Maybe local parameters are not the way to go here?
>
> Best
> Dirk
>


Re: Highlighting data stored outside of Solr

2012-12-17 Thread P Williams
Your problem seems really similar to "It should be possible to highlight
external text"  in JIRA.

Tricia
[https://issues.apache.org/jira/browse/SOLR-1397]

On Tue, Dec 11, 2012 at 12:48 PM, Michael Ryan  wrote:

> Has anyone ever attempted to highlight a field that is not stored in Solr?
>  We have been considering not storing fields in Solr, but still would like
> to use Solr's built-in highlighting.  On first glance, it looks like it
> would be fairly simply to modify DefaultSolrHighlighter to get the stored
> fields from an external source.  We already do not use term vectors, so no
> concerns there.  Any gotchas that I am not seeing?
>
> -Michael
>


Re: how to understand this benchmark test results (compare index size after schema change)

2012-12-17 Thread Jie Sun
thanks Erik ... I did run optimize on both indices to get rid of the deleted
data before comparing them to each other. (and my benchmark tests were just indexing
5000 new documents without duplicates...into a new core...  but I did
optimize just to make sure).

I think one result is consistent: the .fdt/.fdx files are reduced by
30-60% after the stored= changes. So that is a very promising result for my
purpose.

I am trying to get rid of the .frq (which is the 3rd largest segment file in my
production), I have some discussion in another topic about this.
thanks!
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-understand-this-benchmark-test-results-compare-index-size-after-schema-change-tp4026674p4027544.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searches with phonetics

2012-12-17 Thread Steve Rowe
In addition to reindexing, you should ensure that your query analyzer has the 
same lowercasing behavior as your index analyzer.  Otherwise "july" may not 
match "July", and vice versa.

On Dec 17, 2012, at 4:57 AM, Sangeetha  wrote:

> Hi,
> 
> I have not done anything in my schema.xml for phonetics search. But it
> searches and returns *july *when i give *juli*. But i dont want this. How to
> avoid that?
> 
> 
> Thanks,
> Sangeetha
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Searches-with-phonetics-tp4027487.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Jack Krupansky
No, the "query" analyzer tokenizer will simply be applied to each term or 
quoted string AFTER the query parser has already parsed it. You may have 
escaped or quoted characters which will then be seen by the analyzer 
tokenizer.
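
A small Java sketch of the two ways to get a multi-word value past the query parser in one piece, so that the field's own tokenizer sees the whole string. The class and method names are made up for illustration, and only whitespace, quotes and backslashes are handled; other query operators would need escaping too.

```java
public class FilterQuerySketch {
    // Wraps the value in double quotes (escaping embedded backslashes and quotes),
    // so the query parser hands the whole string to the field's analyzer.
    static String quoted(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    // Alternative: put a backslash in front of every whitespace character,
    // so the query parser does not split the value there.
    static String whitespaceEscaped(String field, String value) {
        return field + ":" + value.replaceAll("(\\s)", "\\\\$1");
    }

    public static void main(String[] args) {
        String value = "08 Gewinnung von Steinen und Erden, sonstiger Bergbau";
        System.out.println(quoted("cl2Categories_NACE", value));
        System.out.println(whitespaceEscaped("cl2Categories_NACE", value));
    }
}
```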


-- Jack Krupansky

-Original Message- 
From: Dirk Högemann

Sent: Monday, December 17, 2012 11:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at 
whitespace?


Ok- right, changed that... Nevertheless I thought I should always use the
same analyzers for the query and the index section to have consistent
results.
Does this mean that the tokenizer in the query section will always be
ignored by the given query parsers?



2012/12/17 Jack Krupansky 


The query parsers normally tokenize on white space and query operators,
but you can escape any white space with backslash or put the text in 
quotes
and then it will be tokenized by the analyzer rather than the query 
parser.


Also, you have:

<analyzer type="search">

Change "search" to "query", but that won't change your problem since Solr
defaults to using the "index" analyzer if it doesn't "see" a "query"
analyzer.

-- Jack Krupansky

-Original Message- From: Dirk Högemann
Sent: Monday, December 17, 2012 5:59 AM
To: solr-user@lucene.apache.org
Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
whitespace?


Hi,

I am not sure if am missing something, or maybe I do not exactly 
understand

the index/search analyzer definition and their execution.

I have a field definition like this:


   
 
   
   
 
 
   
   
 
   

Any field starting with cl2 should be recognized as being of type
cl2Tokenized_string:


When I try to search for a token in that sense the query is tokenized at
whitespaces:

{!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau

parsed_filter_queries:
+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

I expected the query parser would also tokenize ONLY at the pattern ###,
instead of using a white space tokenizer here?
Is is possible to define a filter query, without using phrases, to achieve
the desired behavior?
Maybe local parameters are not the way to go here?

Best
Dirk





Re: if I only need exact search, does frequency/score matter?

2012-12-17 Thread Jie Sun
thanks, this is very helpful



--
View this message in context: 
http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893p4027559.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: if I only need exact search, does frequency/score matter?

2012-12-17 Thread Jie Sun
Hi Otis,

do you think I should customize both tf and idf to disable the term
frequency?

i.e. something like:

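// Any match scores the same, no matter how often the term occurs in the document.
public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
}

// Every term scores the same, no matter how many documents contain it.
public float idf(int docFreq, int numDocs) {
    return docFreq > 0 ? 1.0f : 0.0f;
}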

thanks!
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893p4027578.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Ah - now I got it. My solution to this was to use phrase queries - now I
know why: Thanks!
2012/12/17 Jack Krupansky 

> No, the "query" analyzer tokenizer will simply be applied to each term or
> quoted string AFTER the query parser has already parsed it. You may have
> escaped or quoted characters which will then be seen by the analyzer
> tokenizer.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Monday, December 17, 2012 11:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always
> at whitespace?
>
>
> Ok- right, changed that... Nevertheless I thought I should always use the
> same analyzers for the query and the index section to have consistent
> results.
> Does this mean that the tokenizer in the query section will always be
> ignored by the given query parsers?
>
>
>
> 2012/12/17 Jack Krupansky 
>
>  The query parsers normally tokenize on white space and query operators,
>> but you can escape any white space with backslash or put the text in
>> quotes
>> and then it will be tokenized by the analyzer rather than the query
>> parser.
>>
>> Also, you have:
>>
>> <analyzer type="search">
>>
>> Change "search" to "query", but that won't change your problem since Solr
>> defaults to using the "index" analyzer if it doesn't "see" a "query"
>> analyzer.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Dirk Högemann
>> Sent: Monday, December 17, 2012 5:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
>> whitespace?
>>
>>
>> Hi,
>>
>> I am not sure if am missing something, or maybe I do not exactly
>> understand
>> the index/search analyzer definition and their execution.
>>
>> I have a field definition like this:
>>
>>
>>> sortMissingLast="true" omitNorms="true">
>>  
>>> group="-1"/>
>>
>>  
>>  
>>> group="-1"/>
>>
>>
>>  
>>
>>
>> Any field starting with cl2 should be recognized as being of type
>> cl2Tokenized_string:
>> > stored="true" />
>>
>> When I try to search for a token in that sense the query is tokenized at
>> whitespaces:
>>
>> {!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
>> Erden, sonstiger Bergbau
>>
>> parsed_filter_queries:
>> +cl2Categories_NACE:08
>> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
>> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
>> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
>> +cl2Categories_NACE:bergbau
>>
>>
>> I expected the query parser would also tokenize ONLY at the pattern ###,
>> instead of using a white space tokenizer here?
>> Is is possible to define a filter query, without using phrases, to achieve
>> the desired behavior?
>> Maybe local parameters are not the way to go here?
>>
>> Best
>> Dirk
>>
>>
>


Spell Check is not working properly

2012-12-17 Thread Dixline
When I try spell check with query parameter q=testtt it returns
results properly, but when I try q=tett I don't get any
suggestions. The correct value is test. Why does spell check work properly
for certain queries but fail for others? Is there any required format for
the query?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spell-Check-is-not-working-properly-tp4027558.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to retrieve the maximum id in one core/collection effectively

2012-12-17 Thread Otis Gospodnetic
Hello,

Have a look at http://wiki.apache.org/solr/StatsComponent
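
As a rough illustration of what such a request could look like, here is a Java sketch that only builds the query string. The host, core name and the field name "id" are assumptions, and whether the stats output is useful depends on the type of the id field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MaxIdStatsSketch {
    // Builds request parameters that ask the StatsComponent for statistics
    // (min and max among them) on one field, returning no documents (rows=0).
    static String statsParams(String field) {
        try {
            return "q=" + URLEncoder.encode("*:*", "UTF-8")
                    + "&rows=0&stats=true&stats.field=" + URLEncoder.encode(field, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and core name; adjust to your installation.
        System.out.println("http://localhost:8983/solr/collection1/select?" + statsParams("id"));
    }
}
```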

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Dec 17, 2012 at 9:10 AM, SuoNayi  wrote:

> Hi dear list,
>
> How to retrieve the maximum id in one core/collection effectively?What I
> only know is to search with sort on the id field but for solr clound there
> are millions of documents in the core/collection this approach may suffer
> from the penalty on performance.I do not make sure if the functional query
> may help or not?
>
>
> Thanks,
>
>
> SuoNayi


Re: Large import making solr unresponsive

2012-12-17 Thread Otis Gospodnetic
Hi Brent,

You said "from what I can tell there is no disk, network, or memory pressure"
- maybe you can share what you checked and how? (see my signature for
a tool that can help with this)

I'm asking because the above is in conflict with "responses from solr still
come back with a <10ms qtime", which indicates the search itself was fast, but
either disk or network was slow.  Try with rows= and
rows=0 and that will give you an idea where to look.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html





On Mon, Dec 17, 2012 at 1:04 AM, Brent Mills  wrote:

> This is an issue we've only been running into lately so I'm not sure what
> to make of it.  We have 2 cores on a solr machine right now, one of them is
> about 10k documents, the other is about 1.5mil.  None of the documents are
> very large, only about 30 short attributes.  We also have about 10
> requests/sec hitting the smaller core and less on the larger one.  Whenever
> we try to do a full import on the smaller one everything is fine, the
> response times stay the same during the whole 30 seconds it takes to run
> the indexer.  The cpu also stays fairly low.
>
> When we run a full import on the larger one the response times on all
> cores tank from about 10ms to over 8 seconds.  We have a 4 core machine
> (VM) and I've noticed 1 core stays pegged the entire time which is
> understandable since the DIH as I understand it is single threaded.  Also,
> from what I can tell there is no disk, network, or memory pressure (8gb)
> either and the other procs do virtually nothing.  Also the responses from
> solr still come back with a <10ms qtime.  My best guess at this point is
> tomcat is having issues when the single proc gets pegged but I'm at a loss
> on how to further diagnose this to a tomcat issue or something weird that
> solr is doing.
>
> Has anyone run into this before or have ideas about what might be
> happening?
>


Re: fieldType custom search

2012-12-17 Thread Otis Gospodnetic
Hi Antoine,

We didn't use grouping. We didn't try to reorder millions of documents
because that would mean they would all first need to be read from disk/cache,
then reordered, and then returned, which would be slooow.  We did get more
docs than we needed because in our case our goal was to diversify hits (as
opposed to having hits ordered so uniformly according to a strict pattern
as in your example).  From what I can tell, what you are after could be
accomplished with the same thing we did and a slightly different algo
(which just so happens to be pluggable in our case).
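
For the strict A, B, C, A, B, C pattern from the question, the reordering step could look roughly like the Java sketch below. This is not the component described above, just an assumed round-robin over an already fetched page of hits; the class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class RoundRobinReorderSketch {
    // Groups hits by key (e.g. artist) in first-seen order, then takes one hit
    // from each group in turn until every hit has been emitted.
    static <T> List<T> roundRobin(List<T> hits, Function<? super T, String> keyOf) {
        Map<String, Deque<T>> groups = new LinkedHashMap<>();
        for (T hit : hits) {
            groups.computeIfAbsent(keyOf.apply(hit), k -> new ArrayDeque<>()).add(hit);
        }
        List<T> reordered = new ArrayList<>(hits.size());
        while (reordered.size() < hits.size()) {
            for (Deque<T> group : groups.values()) {
                if (!group.isEmpty()) {
                    reordered.add(group.poll());
                }
            }
        }
        return reordered;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("A", "A", "A", "B", "B", "B", "C", "C");
        // prints [A, B, C, A, B, C, A, B]
        System.out.println(roundRobin(hits, artist -> artist));
    }
}
```

This only touches the hits already in memory, which is why fetching somewhat more docs than needed matters: the reordering can only diversify what the page contains.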

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Sat, Dec 15, 2012 at 10:28 AM, Antoine LE FLOC'H wrote:

> Otis,
> Can you give more details on this ? Sounds interesting to me. What about if
> you are trying to re-order millions of Lucene documents ? Did you use
> grouping first ?
> Antoine.
>
>
> On Thu, Dec 13, 2012 at 8:54 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Hi,
> >
> > We've done something very similar to this before.  We implemented it as a
> > custom SearchComponent with a pluggable hit (re)ordering mechanism.
> >
> > Otis
> > --
> > SOLR Performance Monitoring - http://sematext.com/spm/index.html
> > Search Analytics - http://sematext.com/search-analytics/index.html
> >
> >
> >
> >
> > On Thu, Dec 13, 2012 at 8:06 AM, nihed mbarek  wrote:
> >
> > > actually, I have as schema albums with
> > > artist / album_name / album_description
> > >
> > > When I made a search without query the result is (having the some
> score)
> > :
> > > artist A
> > > artist A
> > > artist A
> > > artist B
> > > artist B
> > > artist B
> > > artist C
> > > artist C
> > > => depends on my indexing process
> > >
> > > what I want as result :
> > > artist A
> > > artist B
> > > artist C
> > > artist A
> > > artist B
> > > artist C
> > > artist A
> > > artist B
> > >
> > > => a circular result to see what I have as artist on my solr
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 1:59 PM, Tomás Fernández Löbbe <
> > > tomasflo...@gmail.com> wrote:
> > >
> > > > What do you mean? Could you explain your use case?
> > > >
> > > > Tomás
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 9:36 AM, nihed mbarek 
> > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > Is it possible to define a custom search for a fieldType on a
> schema
> > ?
> > > ?
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > >
> > > > > M'BAREK Med Nihed
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > M'BAREK Med Nihed,
> > > Fedora Ambassador, TUNISIA, Northern Africa
> > > http://www.nihed.com
> > >
> > > 
> > >
> >
>


RE: Spell Check is not working properly

2012-12-17 Thread Dyer, James
The spellcheckers (IndexBasedSpellChecker and DirectSolrSpellChecker) both have 
tuning parameters that control how similar a potential correction needs to be 
to the original query term in order to be considered.  For 
IndexBasedSpellChecker, there is "spellcheck.accuracy", which should be a 
number between 0 and 1.  The default is .5.  Numbers higher than .5 require 
the correction to be more similar to the original.  Numbers lower than .5 
allow the potential correction to be less like the original.  So you 
might want to try something lower than .5 .  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.accuracy .

If you're using DirectSolrSpellChecker, in addition to "accuracy", there are a 
number of other tuning parameters.  See 
http://lucene.apache.org/solr/4_0_0//solr-core/org/apache/solr/spelling/DirectSolrSpellChecker.html
 .  Of particular note is "maxEdits", which can either be 1 or 2 and no higher. 
 (An "edit" is either inserting a character, deleting a character, substituting 
a character or transposing 2 adjacent characters, so going from "tett" to 
"test" takes 1 edit, a single substitution.  In other words, "maxEdits" alone 
should not stop DirectSolrSpellChecker from making this correction for you.)

In practice, I don't think it is fruitful to try and correct anything that 
needs more than 2 or so edits.
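
To check how many edits a given pair of terms really needs under the definition above (insert, delete, substitute, transpose adjacent characters), a plain dynamic-programming sketch in Java is enough. This is a textbook restricted Damerau-Levenshtein distance, not the code Solr runs internally, and the class name is made up.

```java
public class EditDistanceSketch {
    // Counts the minimum number of insertions, deletions, substitutions and
    // transpositions of two adjacent characters needed to turn a into b.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // transposition of two adjacent characters
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("testtt", "test")); // 2 (two deletions)
        System.out.println(editDistance("tset", "test"));   // 1 (one transposition)
    }
}
```

Anything that comes out above 2 here is beyond what "maxEdits" allows.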

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Dixline [mailto:dixli...@gmail.com] 
Sent: Monday, December 17, 2012 11:07 AM
To: solr-user@lucene.apache.org
Subject: Spell Check is not working properly

When i try spell check with query parameter q=testtt it is returning the
results properly but when i try with q=tett i'm not getting any
suggestions. The correct value is test. Why does spell check work properly
for certain query where it fails in certain cases? Is there any format for
the query?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spell-Check-is-not-working-properly-tp4027558.html
Sent from the Solr - User mailing list archive at Nabble.com.




configuring per-field similarity in Solr 4: "the global similarity does not support it"

2012-12-17 Thread Tom Burton-West
Hello,

I have Solr 4 configured with several fields using different similarity
classes according to:
http://wiki.apache.org/solr/SchemaXml#Similarity

However, I get this error message:
" FieldType 'DFR' is configured with a similarity, but the global
similarity does not support it: class
org.apache.solr.search.similarities.DefaultSimilarityFactory"

Excerpt from schema.xml below.

What I am trying to do is have any field that doesn't specify a similarity
to use the default, but to set up 3 specific fields to use the DFR, IB, and
BM25 similarities respectively.

I think I'm missing something here.  Can someone point me to documentation
or examples?

Tom


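Simplified schema.xml excerpt (the XML markup was lost in the archive; only
the similarity parameter values survive, in this order):

  first fieldType:         no similarity parameters
  second fieldType (DFR):  I(F), B, H2
  third fieldType (IB):    SPL, DF, H2
  fourth fieldType (BM25): 1.2, 0.75
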
===-
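Excerpt from actual schema.xml (the XML markup was lost in the archive; only
the similarity parameter values survive, in this order):

  first fieldType:         no similarity parameters
  second fieldType (DFR):  I(F), B, H2
  third fieldType (IB):    SPL, DF, H2
  fourth fieldType (BM25): 1.2, 0.75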

RE: configuring per-field similarity in Solr 4: "the global similarity does not support it"

2012-12-17 Thread Markus Jelsma
Hi Tom,

The global similarity must be able to delegate similarity to your per-field 
setting. Solr has the SchemaSimilarityFactory that can do this. Please replace 
your global similarity with:

<similarity class="solr.SchemaSimilarityFactory"/>

Keep in mind that coord and queryNorm (=1.0f) are not implemented now, so you 
will get different scores for TF-IDF!

Cheers,

 
 
-Original message-
> From:Tom Burton-West 
> Sent: Mon 17-Dec-2012 23:11
> To: solr-user@lucene.apache.org
> Subject: configuring per-field similarity in Solr 4: "the global 
> similarity does not support it"
> 
> Hello,
> 
> I have Solr 4 configured with several fields using different similarity
> classes according to:
> http://wiki.apache.org/solr/SchemaXml#Similarity
> 
> However, I get this error message:
> " FieldType 'DFR' is configured with a similarity, but the global
> similarity does not support it: class
> org.apache.solr.search.similarities.DefaultSimilarityFactory"
> 
> Excerpt from schema.xml below.
> 
> What I am trying to do is have any field that doesn't specify a similarity
> to use the default, but to set up 3 specific fields to use the DFR, IB, and
> BM25 similarities respectively.
> 
> I think I'm missing something here.  Can someone point me to documentation
> or examples?
> 
> Tom
> 
> 
> Simplified schema.xml excerpt (the XML markup was lost in the archive; only
> these fragments survive):
>
>   first fieldType:  positionIncrementGap="100" autoGeneratePhraseQueries="false"
>   second fieldType: autoGeneratePhraseQueries="false"; I(F), B, H2
>   third fieldType:  autoGeneratePhraseQueries="false"; SPL, DF, H2
>   fourth fieldType: autoGeneratePhraseQueries="false"; 1.2, 0.75
> 
> ===-
> Excerpt from actual schema.xml (the XML markup was lost in the archive; only
> these fragments survive):
>
>   every fieldType (each fragment twice):
>     han="true" hiragana="true" katakana="false" hangul="false" />
>     words="1000common.txt" />
>   first fieldType:  positionIncrementGap="100" autoGeneratePhraseQueries="false"
>   second fieldType: autoGeneratePhraseQueries="false"; I(F), B, H2
>   third fieldType:  autoGeneratePhraseQueries="false"; SPL, DF, H2
>   fourth fieldType: autoGeneratePhraseQueries="false"; 1.2, 0.75
> 


Re: configuring per-field similarity in Solr 4: "the global similarity does not support it"

2012-12-17 Thread Tom Burton-West
Thanks Markus!

Adding <similarity class="solr.SchemaSimilarityFactory"/> fixed the problem.

>>Keep in mind that coord and queryNorm (=1.0f) are not implemented now, so
you will get different scores for TF-IDF!

Can you explain more about this, or is it documented somewhere?
Do I need to read the source for solr.SchemaSimilarityFactory?
Is there a plan to implement coord and queryNorm?


Tom

On Mon, Dec 17, 2012 at 5:17 PM, Markus Jelsma
wrote:

> Hi Tom,
>
> The global similarity must be able to delegate similarity to your
> per-field setting. Solr has the SchemaSimilarityFactory that can do this.
> Please replace your global similarity with:
>
> <similarity class="solr.SchemaSimilarityFactory"/>
>
> Keep in mind that coord and queryNorm (=1.0f) are not implemented now, so
> you will get different scores for TF-IDF!
>
> Cheers,
>
>
>


Re: Beginner's view

2012-12-17 Thread Shawn Heisey

On 12/17/2012 11:55 PM, Alexandre Rafalovitch wrote:

Again, thinking from the beginner's view, is there any reason Solr throws
scary looking exception traces in the console when it cannot find optional
html files for admin interfaces (e.g. admin-extra.menu-top.html).

 From my days of tech support, I know that operators are often trained that
ANY exception is not ok, so having several of them (and SEVERE at that)
could make people really confused and worried. And, if there were any real
exceptions in there, they may get shadowed by these less important ones.

Would it make sense to catch those exceptions and log them as Info messages
instead?


I've already filed an issue for this.  No activity so far.

https://issues.apache.org/jira/browse/SOLR-3972

Thanks,
Shawn



Solr atomic update of multi-valued field

2012-12-17 Thread Dikchant Sahi
Hi,

Does Solr 4.0 allow updating the values of a multi-valued field? Say I have
a list of values for the skills field like java, j2ee and I want to change it
to solr, lucene.

I was trying to play with atomic updates and below is my observation:

I have following document in my index:

1
Dikchant
software engineer

java
j2ee



To update the skills to solr, lucene, I indexed document as follows:

**
*  *
*1*
*solr*
*lucene*
*  *
**

The document added to index is as follows:
**
*  1*
*  *
*{set=solr}*
*{set=lucene}*
*  *
**

This is not what I was looking for. I found 2 issues:
1. The value of the name field was lost
2. The skills field had some junk like {set=solr}

Then, to achieve my goal, I tried something different. I tried setting some
single valued field with update="set" parameter to the same value and also
provided the values of multi-valued field as we do while adding new
document.

  
1
*Dikchant*
solr
lucene
  


With this the index looks as follows:

1
Dikchant
software engineer

solr
lucene



The values of the multi-valued field are changed and the value of the other
field is not deleted.

The question that comes to my mind is: does Solr 4.0 allow updating a
multi-valued field? If yes, is this how it works, or am I doing something
wrong?

Regards,
Dikchant


Solr Cloud 4.0 Production Ready?

2012-12-17 Thread Cool Techi
Hi,

We have been using Solr 3.5 in production for some time now and are facing the 
problems that come with a large Solr index. We want to migrate to Solr Cloud and 
have started some experimentation. In the meantime we have been following the user 
forum and have noticed a lot of bugs that were raised after the release 
and will be fixed in 4.1.

Should we wait for the 4.1 release for production, or can we go ahead with the 
current release?

Regards,
Ayush


  

"order" question on solr multi value field

2012-12-17 Thread hellorsanjeev
Hi - We have been using Solr for over a year. Some of our fields are multi
valued. We are saving and retrieving data from those multi valued fields
perfectly fine, assuming that we will get the values back in the same order
in which we saved them.

The question is - Is it documented somewhere that Solr maintains the order
of values inserted into a multi-valued field, OR is it just by luck that we are
getting them in insertion order? In my experience so far, the order has always
been maintained.

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/order-question-on-solr-multi-value-field-tp4027695.html
Sent from the Solr - User mailing list archive at Nabble.com.