Re: Solr 4.0 segment flush times have a big difference between two machines

2012-10-20 Thread Erick Erickson
My first question is why this matters. Is this curiosity, or is there a real
performance issue you're tracking down?

I don't quite understand when you say "machine A forwards...to machine B".
Are you talking about replication here? Or SolrCloud? Details matter, a lot.
DIH has nothing that I know of that forwards anything anywhere, so there
must be something you're not telling us about the setup.

But the first thing I'd check is what the solrconfig.xml values are for
committing on both machines. Are they identical?
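(For anyone comparing: the settings worth diffing are the indexing buffers and commit triggers in solrconfig.xml. The excerpt below is an illustrative sketch with made-up values, not taken from this thread.)

```xml
<!-- Illustrative solrconfig.xml excerpt: buffer/commit settings whose
     mismatch would explain different flush behavior on two machines -->
<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>10000</maxBufferedDocs>
</indexConfig>
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```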

Best
Erick

On Fri, Oct 19, 2012 at 12:53 AM, Jun Wang  wrote:
> Hi
>
> I have 2 machines for a collection, and it's using DIH to import data. DIH
> is triggered via a URL request at one machine, let's call it A, and A will
> forward some of the index to machine B. Recently I have found that segment
> flushes happen more often on machine B. Here is part of INFOSTREAM.txt.
>
> Machine A:
> 
> DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as segment
> _4r3 numDocs=71616
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0 deleted
> docs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no
> vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]:
> flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm,
> _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
> D
>
> Machine B
> --
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings
> as segment _zi0 numDocs=4302
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has
> 0 deleted docs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has
> no vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]:
> flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt,
> _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed
> codec=Lucene40
> D
>
> I have found that a flush occurred when the number of docs in RAM reached
> 7,000~9,000 on machine A, but the number on machine B is very different,
> almost always 4,000. It seems that every buffered doc uses more RAM on
> machine B than on machine A, which results in more flushes. Does anyone
> know why this happens?
>
> My conf is here.
>
> [solrconfig.xml excerpt stripped by the mail archive; only "6410" survives]
>
>
>
>
> --
> from Jun Wang


Re: Solr-4.0.0 DIH not indexing xml attributes

2012-10-20 Thread Billy Newman
Sorry guys, it had nothing to do with the DIH's ability to parse attributes. My
xslt did not work with the DIH. I used xsltproc to test my xslt and it worked
great; however, the DIH xslt transformation failed. I was able to move some
things around in the xslt to get things working.

I'm not sure if the DIH can print out the XML file after the xslt
transformation, but that would have helped in debugging. There were a few
errors in the log file which led me to change the xslt file, so in the end
that was sufficient.

As always thanks again for the response!

Billy

Sent from my iPhone

On Oct 19, 2012, at 9:07 PM, Lance Norskog  wrote:

> Do other fields get added?
> Do these fields have type problems? I.e. is 'attr1' a number and you are 
> adding a string?
> There is a logging EP that I think shows the data found- I don't know how to 
> use it.
> Is it possible to post the whole DIH script?
> 
> - Original Message -
> | From: "Billy Newman" 
> | To: solr-user@lucene.apache.org
> | Sent: Friday, October 19, 2012 9:06:08 AM
> | Subject: Solr-4.0.0 DIH not indexing xml attributes
> | 
> | Hello all,
> | 
> | I am having problems indexing xml attributes using the DIH.
> | 
> | I have the following xml:
> | [the XML sample was stripped by the mail archive]
> | 
> | However nothing is getting inserted into my index.
> | 
> | I am pretty sure this should work so I have no idea what is wrong.
> | 
> | Can anyone else confirm that this is a problem?  Or is it just me?
> | 
> | Thanks,
> | Billy
> | 


Re: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7

2012-10-20 Thread Rogerio Pereira
Here's the catalina.out contents:

Out 20, 2012 12:55:58 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: using system property solr.solr.home: /home/rogerio/Dados/salutisvitae
Out 20, 2012 12:55:58 PM org.apache.solr.core.SolrResourceLoader 
INFO: new SolrResourceLoader for deduced Solr Home:
'/home/rogerio/Dados/salutisvitae/'
Out 20, 2012 12:55:58 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Out 20, 2012 12:55:58 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: No /solr/home in JNDI
Out 20, 2012 12:55:58 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: using system property solr.solr.home: /home/rogerio/Dados/salutisvitae
Out 20, 2012 12:55:58 PM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: /home/rogerio/Dados/salutisvitae/solr.xml
Out 20, 2012 12:55:58 PM org.apache.solr.core.CoreContainer 
INFO: New CoreContainer 1806276996

/home/rogerio/Dados/salutisvitae really exists and has two core dirs,
collection1 and collection2, but only collection1 is initialized as we can
see below:

INFO: unique key field: id
Out 20, 2012 12:56:29 PM org.apache.solr.core.SolrCore 
INFO: [collection1] Opening new SolrCore at
/home/rogerio/Dados/salutisvitae/collection1/,
dataDir=/home/rogerio/Dados/salutisvitae/collection1/data/
Out 20, 2012 12:56:29 PM org.apache.solr.core.SolrCore 
INFO: JMX monitoring not detected for core: collection1
Out 20, 2012 12:56:29 PM org.apache.solr.core.SolrCore getNewIndexDir
WARNING: New index directory detected: old=null
new=/home/rogerio/Dados/salutisvitae/collection1/data/index/
Out 20, 2012 12:56:29 PM org.apache.solr.core.CachingDirectoryFactory get
INFO: return new directory for
/home/rogerio/Dados/salutisvitae/collection1/data/index forceNew:false

No more cores are initialized after collection1.

Note, I'm just making a simple copy of the multicore example
to /home/rogerio/Dados/salutisvitae, renaming core1 to collection1,
copying collection1 to collection2, and making the configuration changes in
solrconfig.xml. To set the path above I'm using the solr.solr.home system
property, with the Solr admin deployed on Tomcat from solr.war.

I'm getting the same strange behavior on both Xubuntu 10.04 and Ubuntu 12.10.
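For reference, the multicore example's solr.xml must declare both cores explicitly. A sketch of what the copied file should contain after the rename (core names per this thread; other attributes follow the stock multicore example):

```xml
<!-- Sketch of /home/rogerio/Dados/salutisvitae/solr.xml for two cores;
     attribute values follow the stock Solr multicore example -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1"/>
    <core name="collection2" instanceDir="collection2"/>
  </cores>
</solr>
```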

2012/10/16 Chris Hostetter 

> : To answer your question, I tried both -Dsolr.solr.home and solr/home JNDI
> : variable, in both cases I got the same result.
> :
> : I checked the logs several times, solr always only loads up the
> collection1,
>
> That doesn't really answer any of the questions i was asking you.
>
> *Before* solr logs anything about loading collection1, it will log
> information about how/where it is locating the solr home dir and
> solr.xml
>
> : if you look at the logging when solr first starts up, you should ese
> : several messages about how/where it's trying to locate the Solr Home Dir
> : ... please double check that it's finding the one you intended.
> :
> : Please give us more details about those log messages related to the solr
> : home dir, as well as how you are trying to set it, and what your
> directory
> : structure looks like in tomcat.
>
> For example, this is what Solr logs if it can't detect either the system
> property, or JNDI, and is assuming it should use "./solr" ...
>
> Oct 16, 2012 8:48:52 AM org.apache.solr.core.SolrResourceLoader
> locateSolrHome
> INFO: JNDI not configured for solr (NoInitialContextEx)
> Oct 16, 2012 8:48:52 AM org.apache.solr.core.SolrResourceLoader
> locateSolrHome
> INFO: solr home defaulted to 'solr/' (could not find system property or
> JNDI)
> Oct 16, 2012 8:48:52 AM org.apache.solr.core.SolrResourceLoader 
> INFO: new SolrResourceLoader for deduced Solr Home: 'solr/'
> Oct 16, 2012 8:48:53 AM org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init()
> Oct 16, 2012 8:48:53 AM org.apache.solr.core.SolrResourceLoader
> locateSolrHome
> INFO: JNDI not configured for solr (NoInitialContextEx)
> Oct 16, 2012 8:48:53 AM org.apache.solr.core.SolrResourceLoader
> locateSolrHome
> INFO: solr home defaulted to 'solr/' (could not find system property or
> JNDI)
> Oct 16, 2012 8:48:53 AM org.apache.solr.core.CoreContainer$Initializer
> initialize
> INFO: looking for solr.xml:
> /home/hossman/lucene/dev/solr/example/solr/solr.xml
>
> What do your startup logs look like as far as finding the solr home dir?
>
> because my suspicion is that the reason it's not loading your
> multicore setup, or complaining about malformed xml in your solr.xml
> file, is that it's not finding the directory you want at all.
>
>
>
> -Hoss
>



-- 
Regards,

Rogério Pereira Araújo

Blogs: http://faces.eti.br, http://ararog.blogspot.com
Twitter: http://twitter.com/ararog
Skype: rogerio.araujo
MSN: ara...@hotmail.com
Gtalk/FaceTime: rogerio.ara...@gmail.com

(0xx62) 8240 7212
(0xx62) 3920 2666


Index polygon/bbox with DIH

2012-10-20 Thread Billy Newman
Hey guys,

Just started using Solr 4, and my main use case involves indexing bounding
boxes/polygons. I have a pretty small dataset and am currently using the DIH
(URLDataSource) to index my XML. Part of my XML comes back as minx, miny,
maxx, maxy. Is it possible to index my bbox using the DIH? If so, could
someone please point me in the right direction? Also, I can modify my bbox as
necessary using xslt if I need it in a different format for the DIH to handle.

As always thanks again,
Billy 

Sent from my iPhone

Re: SimpleTextCodec usage tips?

2012-10-20 Thread Erick Erickson
Yeah, all this is new, usage tips are often something that
gets done on an "as needed" basis. I've been curious about
per-field codecs, and your post prompted me to create a Wiki page here:

http://wiki.apache.org/solr/SimpleTextCodecExample

Feel free to edit it as you try it out, I find that the first time someone
tries something I write, there are invariably things that could be
clearer...

Best
Erick

On Fri, Oct 19, 2012 at 8:29 AM, seralf  wrote:
> Hi
>
> could anybody give some direction / suggestion on how to correctly
> configure and use the SimpleTextCodec?
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/codecs/simpletext/SimpleTextCodec.html
>
> I'd like to do some tests for debugging purposes, but I'm not sure how to
> enable the pluggable codecs interface.
>
> As far as I understand, I have to use the codec factory in the schema.xml,
> but I didn't understand where to configure and choose the specific codec.
>
> Thank you in advance (sorry if this question was posted earlier, I didn't
> find any post on that),
>
> Alfredo Serafini


Re: SimpleTextCodec usage tips?

2012-10-20 Thread seralf
Thanks very much Erick! I'd missed the postingsFormat="SimpleText" part!

Now it works as expected on Solr 4 :-)

I look forward to finding a use case I could contribute to your example,
and if I find one I promise I'll add it. Thanks,
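For the archives, the two pieces that matter are roughly the following (a sketch in the spirit of the wiki example above; the type and field names are illustrative, not from this thread):

```xml
<!-- solrconfig.xml: allow per-field postings formats from the schema -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: a type whose postings are written in plain text -->
<fieldType name="string_simpletext" class="solr.StrField"
           postingsFormat="SimpleText"/>
<field name="debug_field" type="string_simpletext"
       indexed="true" stored="true"/>
```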

Alfredo

2012/10/20 Erick Erickson 

> Yeah, all this is new, usage tips are often something that
> gets done on an "as needed" basis. I've been curious about
> per-field codecs, and your post prompted me to create a Wiki page here:
>
> http://wiki.apache.org/solr/SimpleTextCodecExample
>
> Feel free to edit it as you try it out, I find that the first time someone
> tries something I write, there are invariably things that could be
> clearer...
>
> Best
> Erick
>
> On Fri, Oct 19, 2012 at 8:29 AM, seralf  wrote:
> > Hi
> >
> > does anybody could give some direction / suggestion on how to correctly
> > configure and use the SimpleTextCodec?
> >
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/codecs/simpletext/SimpleTextCodec.html
> >
> > i'd like to do some test for debugging purpose, but i'm not shure on how
> to
> > enable the pluggable codecs interface.
> >
> > as far as i understand, i have to use the codec factory in the
> schema.xml,
> > but i didn't understand where to configure and choice the specific codec.
> >
> > thank you in advance (sorry if this question was earlier posted, i din't
> > find any post on that),
> >
> > Alfredo Serafini
>


Re: number and minus operator

2012-10-20 Thread Erick Erickson
Please review:
http://wiki.apache.org/solr/UsingMailingLists

There's not nearly enough information here to help you.

Best
Erick

On Fri, Oct 19, 2012 at 1:06 PM, calmsoul  wrote:
> I have a document with the name ABC 102030 XYZ, and if I search for this
> document with ABC AND -"10" then I don't get this document (which is
> correct behavior), but when I do ABC AND -10 I don't get the correct
> result back. Any explanation for this?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/number-and-minus-operator-tp4014794.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 copyField not applying index analyzers

2012-10-20 Thread Erick Erickson
Are you sure you're not just seeing the stored values (which never have
analysis applied)? They're what you get back when you specify fl=blah,blivet.

Take a look at admin/schema browser and point it at the field to see what's
actually in the index. Or get a copy of Luke.

Otherwise, as Jack says, you have to tell us what the exact symptoms are.

Best
Erick

On Fri, Oct 19, 2012 at 5:00 PM, Jack Krupansky  wrote:
> What exactly is the precise symptom - give us an example with field names of
> source and dest and what precise value is in fact being indexed. Is the
> entire field value being indexed as a single term/string (if analyzer is not
> being applied)? Or, what?
>
> -- Jack Krupansky
>
> -Original Message- From: davers
> Sent: Friday, October 19, 2012 2:51 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.0 copyField not applying index analyzers
>
>
> I am upgrading from Solr 3.6 to Solr 4.0 and my copyFields do not seem to
> be applying the index analyzers. I'm sure there is something I'm missing in
> my schema.xml. I am also using a DIH but I'm not sure that matters.
>
> [The schema.xml that followed was mangled by the mail archive: the XML
> markup was stripped, leaving only attribute fragments (field and dynamic
> field definitions, copyField directives, and analyzer chains with
> stopword, synonym, word-delimiter, n-gram, and pattern-replace filters),
> and the message is truncated mid-schema, so it is omitted here.]

Re: Easy question ? docs with empty geodata field

2012-10-20 Thread David Smiley (@MITRE.org)
That'll probably work.  Or with Solr 4's new spatial field types you can do a
rectangle query of the whole world: geofieldname:[-90,-180 TO 90,180]. 
Perhaps it'd be nice to add explicit support for [* TO *].
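For anyone scripting this, a minimal sketch of composing that whole-world range as a negated filter query. The field name "geo" and the parameter composition are assumptions for illustration, not from this thread:

```python
from urllib.parse import urlencode

# Sketch: exclude documents that have no indexed point by negating a
# whole-world rectangle query. Field name "geo" is hypothetical; the
# range syntax matches Solr 4's new spatial field types described above.
params = {
    "q": "*:*",
    "fq": "-geo:[-90,-180 TO 90,180]",  # matches docs with NO point in "geo"
    "wt": "json",
}
print(urlencode(params))  # query string to append to /select?
```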



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014938.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Writing Performance of Solr 4.0

2012-10-20 Thread Nagendra Nagarajayya

You may want to look at realtime NRT for this kind of performance:
https://issues.apache.org/jira/browse/SOLR-3816

You can download realtime NRT integrated with Apache Solr from here:
http://solr-ra.tgels.org


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 10/18/2012 11:50 PM, higashihara_hdk wrote:
> Hello everyone.
>
> I have two questions. I am considering using Solr 4.0 to perform full
> searches on the data output in real-time by a Storm cluster
> (http://storm-project.net/).
>
> 1. In particular, I'm concerned whether Solr would be able to keep up
> with the 2000-message-per-second throughput of the Storm cluster. What
> kind of throughput would I be able to expect from Solr 4.0, for example
> on a Xeon 2.5GHz 4-core with HDD?
>
> 2. Also, how efficiently would Solr scale with clustering?
>
> Any pertinent information would be greatly appreciated.
>
> Hideki Higashihara
>
>



Doing facet count using group truncating with distributed search

2012-10-20 Thread Kenneth Vindum
Hi Solr users!

 

Could any of you tell me how to do a facet count across several cores,
excluding duplicates? E.g.:

 

Core A:

Page 1

Id=a

Text=hello world

 

Page 2

Id=b

Text=hello again

 

Core B:

Page 1

Id=a

Text=Hej verden

 

Id=c

Text=Ny besked

 

Doing a facet count on core A gives me 2 elements. Doing a facet count on
core B gives me 2 elements as well. Counting across both cores using shards
should return 3 elements when doing group.truncate on the element with Id=a.
This would work on a single core, but doing so on more than one core always
gives me a facet count of 4.

 

I've read the Solr page saying:

"Grouping is also supported for distributed searches from version Solr 3.5
and from version Solr 4.0. Currently group.truncate and group.func are the
only parameters that aren't supported for distributed searches."

 

Is this because it's not possible to implement this feature, or because
nobody has needed it yet?

 

Thanks guys :)

 

Kind regards

Kenneth Vindum

 



Re: Easy question ? docs with empty geodata field

2012-10-20 Thread darul
Thank you Amit,

I'll let you know on Monday when I'm at the office, because I do not have
access to Solr from home...

But I guess I missed using the dynamic field in the right way; it's been a
long time since I read my basics ;)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014943.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Easy question ? docs with empty geodata field

2012-10-20 Thread darul
Indeed, it would be nice if we could use [* TO *].

Then, is it possible to use the following on Solr 3.6:
geofieldname:[-90,-180 TO 90,180]





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014944.html
Sent from the Solr - User mailing list archive at Nabble.com.


Understanding Filter Queries

2012-10-20 Thread Amit Nithian
Hi all,

Quick question. I've been reading up on the filter query and how it's
implemented and the multiple articles I see keep referring to this
notion of leap frogging and filter query execution in parallel with
the main query. Question: Can someone point me to the code that does
this so I can better understand?

Thanks!
Amit


log4j binding finally working, more problems

2012-10-20 Thread Shawn Heisey
I managed to get a setup with a log4j binding working.  I modified the 
build script so that the dist-war-excl-slf4j target excludes all jars 
with slf4j in the name.  Then I put jars from the newest versions of 
slf4j and log4j into lib/ext under the jetty home.  Then I added 
-Dlog4j.configuration=file:etc/log4j.properties to the java commandline.
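For anyone reproducing this setup, the referenced etc/log4j.properties might look like the following minimal sketch (illustrative only, not Shawn's actual file):

```properties
# Minimal log4j.properties sketch for Solr logging under Jetty
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
```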


ncindex@bigindy5 /opt/solr4 $ ls lib/ext
jcl-over-slf4j-1.7.2.jar  log4j-over-slf4j-1.7.2.jar slf4j-log4j12-1.7.2.jar
log4j-1.2.17.jar  slf4j-api-1.7.2.jar

I've run into a new problem that may be related to seeing "Log watching 
is not yet implemented for log4j" in the log.  I can no longer change 
logging levels in the GUI.  When I click on Logging and then click on 
Level, I get a spinning icon and it says "Loading ..." but never 
finishes.  I'm thinking I should file an issue in Jira.


I started with branch_4x checked out about an hour before writing this 
message.


Thanks,
Shawn



Re: Understanding Filter Queries

2012-10-20 Thread Mikhail Khludnev
Amit,

Sure. this method
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L796
which, beside some other stuff, calculates the fq's docset intersection that is
supplied into the filtered search call:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1474

You are welcome.

On Sun, Oct 21, 2012 at 12:00 AM, Amit Nithian  wrote:

> Hi all,
>
> Quick question. I've been reading up on the filter query and how it's
> implemented and the multiple articles I see keep referring to this
> notion of leap frogging and filter query execution in parallel with
> the main query. Question: Can someone point me to the code that does
> this so I can better understand?
>
> Thanks!
> Amit
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Understanding Filter Queries

2012-10-20 Thread Amit Nithian
Thanks. So I was poking through this and saw that the filters are
calculated up front and stored as docsets that get intersected and
passed into Lucene as the filter. The question though is that somewhere
in the IndexSearcher, and somewhere in the scorer, this happens, but I
can't quite find where.

Thanks
Amit

On Sat, Oct 20, 2012 at 5:22 PM, Mikhail Khludnev
 wrote:
> Amit,
>
> Sure. this method
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L796
> which, beside some other stuff, calculates the fq's docset intersection that is
> supplied into the filtered search call:
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1474
>
> You are welcome.
>
> On Sun, Oct 21, 2012 at 12:00 AM, Amit Nithian  wrote:
>
>> Hi all,
>>
>> Quick question. I've been reading up on the filter query and how it's
>> implemented and the multiple articles I see keep referring to this
>> notion of leap frogging and filter query execution in parallel with
>> the main query. Question: Can someone point me to the code that does
>> this so I can better understand?
>>
>> Thanks!
>> Amit
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> 
>  


Re: Understanding Filter Queries

2012-10-20 Thread Amit Nithian
Okay I think I found it. Let me know if this makes sense (also for
those curious about this).

1) The IndexSearcher will create a FilteredQuery using the
RANDOM_ACCESS_STRATEGY by default (IndexSearcher#wrapFilter).

2) When the searcher requests the scorer, the FilteredQuery uses the
FilterStrategy to retrieve the scorer (FilterStrategy#filteredScorer)

3) The RandomAccessFilterStrategy seems to use a heuristic of whether
or not to use the leapfrog strategy when the first document returned
by the filter is < 100. (RandomAccessFilterStrategy#useRandomAccess).

My simple test had < 100 docs, which is why it never went down this
leapfrog path in my debugging.

Next question though is what is the significance of this  < 100? Is
this supposed to be a heuristic for determining the sparseness of the
filter bit set?
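Since the thread is about what "leapfrogging" actually does, here is a minimal Python sketch of the idea. This is illustrative only: Lucene's real implementation works on DocIdSetIterator advance() calls over postings, not Python lists, and skips via skip lists rather than one step at a time.

```python
def leapfrog_intersect(scorer_docs, filter_docs):
    """Sketch of leapfrog intersection over two sorted doc-id streams.

    Each side repeatedly advances toward the other's current doc id, so
    neither iterator visits ids the other has already skipped past. This
    is the strategy FilteredQuery falls back to when the filter looks
    sparse (random access would waste bitset probes).
    """
    result = []
    i = j = 0
    while i < len(scorer_docs) and j < len(filter_docs):
        a, b = scorer_docs[i], filter_docs[j]
        if a == b:
            result.append(a)  # doc matches both query and filter
            i += 1
            j += 1
        elif a < b:
            i += 1  # advance scorer toward b (skip lists in a real impl)
        else:
            j += 1  # advance filter toward a
    return result

print(leapfrog_intersect([1, 5, 9, 12], [5, 6, 12, 40]))  # → [5, 12]
```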

Thanks again
Amit

On Sat, Oct 20, 2012 at 7:12 PM, Amit Nithian  wrote:
> Thanks. So I was poking through this and see that the filters are
> calculated up front and stored as docsets that get intersected and
> passed into Lucene in the filter. The question though is that
> somewhere in the IndexSearcher and somewhere into the scorer it does
> this but I can't quite find where.
>
> Thanks
> Amit
>
> On Sat, Oct 20, 2012 at 5:22 PM, Mikhail Khludnev
>  wrote:
>> Amit,
>>
>> Sure. this method
>> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L796
>> which, beside some other stuff, calculates the fq's docset intersection that is
>> supplied into the filtered search call:
>> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1474
>>
>> You are welcome.
>>
>> On Sun, Oct 21, 2012 at 12:00 AM, Amit Nithian  wrote:
>>
>>> Hi all,
>>>
>>> Quick question. I've been reading up on the filter query and how it's
>>> implemented and the multiple articles I see keep referring to this
>>> notion of leap frogging and filter query execution in parallel with
>>> the main query. Question: Can someone point me to the code that does
>>> this so I can better understand?
>>>
>>> Thanks!
>>> Amit
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Tech Lead
>> Grid Dynamics
>>
>> 
>>  


Re: Solr Partial word search in a sentance.

2012-10-20 Thread Amit Nithian
On the surface this looks like you could use the minimum should match
feature of the dismax handler and alter that behavior depending on
whether or not the search is your main search or your fallback search
as you described in your (c) case.
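For concreteness, a sketch of how two handler variants differing only in mm could be configured in solrconfig.xml. The handler names, qf field, and mm expressions are illustrative, not from this thread:

```xml
<!-- Sketch: two search handlers differing only in mm (minimum should
     match); names and values here are illustrative assumptions -->
<requestHandler name="/matchAll" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">text</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<requestHandler name="/matchPartial" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">text</str>
    <!-- with more than 3 clauses, allow up to 2 to be missing -->
    <str name="mm">3&lt;-2</str>
  </lst>
</requestHandler>
```

With five terms, mm of 3&lt;-2 requires 5-2 = 3 matches, which lines up with the "at least N-2" fallback described in the quoted requirements.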

On Sat, Oct 20, 2012 at 1:13 AM, Uma Mahesh  wrote:
> Hi All, I am new to SOLR. We have a few requirements, as follows:
>
> a) "*MatchAll*": match all of the search terms supplied by the user. That
> is, all results need to satisfy the query (term1 AND term2 AND term3), etc.
>
> b) "*MatchPartial*": match just some of the search terms supplied by the
> user where three or more terms are used. So, for instance, for a search
> query that includes five terms we would want to return all results that
> contain a minimum of three matches.
>
> c) "*MatchAllPartial*": match all search terms (i.e. "MatchAll" above) BUT
> if that fails to return any results then fall back to matching just some
> (at least N-2) of the search terms (i.e. "MatchPartial" above). This
> strategy implies that each search issued by the end user could
> theoretically mean that two Solr queries are made: the first will use the
> "MatchAll" strategy and (if that returns no hits) the second will use the
> "MatchPartial" strategy. The logic required to manage this will need to be
> added to custom hybris code.
>
> d) "*MatchAny*": match at least one search term supplied by the user. All
> results need to satisfy the query (term1 OR term2 OR term3), etc.
>
> e) "*MatchAllAny*": match all search terms (i.e. "MatchAll" above) BUT if
> that fails to return any results then fall back to matching at least one
> search term. Again, this suggests that each user search could
> theoretically require two Solr queries to be made, with custom hybris
> code responsible for managing the outcome.
>
> Our requirement is to have a separate solrconfig.xml for each strategy.
> The client would change the strategy rarely.
>
> I guess for MatchAll I can use the dismax query handler.
> For MatchPartial I am not sure whether the NGramTokenizer would help.
> How can we implement *MatchAllPartial* and *MatchAllAny*?
> Can anyone provide me some input?
>
> Thanks
> Mahesh
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Partial-word-search-in-a-sentance-tp4014895.html
> Sent from the Solr - User mailing list archive at Nabble.com.


SOLR capacity planning and Disaster relief

2012-10-20 Thread Worthy LaFollette
CAVEAT: I am a newbie w/r to SOLR (some Lucene experience, but not SOLR
itself). Trying to come up to speed.


What have you all done w/r to SOLR capacity planning and disaster relief?

I am curious to the following metrics:

 - File handles and other ulimit/profile concerns
 - Space calculations (particularly w/r to optimizations, etc.)
 - Taxonomy considerations
 - Single Core vs. Multi-core
 - ?

Also, has anyone planned for disaster relief for SOLR across non-metro data
centers? Currently not an issue for me, but it will be shortly.