Re: Best practice for Delta every 2 Minutes.

2010-12-01 Thread stockii

http://10.1.0.10:8983/solr/payment/dataimport?commad=delta-import&debug=on
doesn't work. No debug is started =(

Thanks. I will try mergeFactor=2
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-practice-for-Delta-every-2-Minutes-tp1992714p1997595.html
Sent from the Solr - User mailing list archive at Nabble.com.
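Note: "commad" in the URL above is misspelled; DataImportHandler expects the
parameter name "command", which may be why no debug run started. A corrected
request, assuming the same host and core:

curl "http://10.1.0.10:8983/solr/payment/dataimport?command=delta-import&debug=on"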


RE: distributed architecture

2010-12-01 Thread Upayavira
Okay, I'll see what I can do. 

Also for what it is worth, if anyone is in London tomorrow, I'm giving a
presentation which covers this topic at the (free) Online Information
2010 exhibition at Kensington Olympia, at 3:20pm. Anyone interested is
welcome to come along. I believe we're hoping to video it, so if
successful, I expect it'll get put online somewhere.

Upayavira

On Wed, 01 Dec 2010 03:44, "Jayant Das"
wrote:
> 
> Hi, A diagram will be very much appreciated.
> Thanks,
> Jayant
>  
> > From: u...@odoko.co.uk
> > To: solr-user@lucene.apache.org
> > Subject: Re: distributed architecture
> > Date: Wed, 1 Dec 2010 00:39:40
> > 
> > I cannot say how mature the code for B) is, but it is not yet included
> > in a release.
> > 
> > If you want the ability to distribute content across multiple nodes (due
> > to volume) and want resilience, then use both.
> > 
> > I've had one setup where we have two master servers, each with four
> > cores. Then we have two pairs of slaves. Each pair mirrors the masters,
> > so we have two hosts covering each of our cores.
> > 
> > Then comes the complicated bit to explain...
> > 
> > Each of these four slave hosts had a core that was configured with a
> > hardwired "shards" request parameter, which pointed to each of our
> > shards. Actually, it pointed to VIPs on a load balancer. Those two VIPs
> > then balanced across each of our pair of hosts.
> > 
> > Then, put all four of these servers behind another VIP, and we had a
> > single address we could push requests to, for sharded, and resilient
> > search.
> > 
> > Now if that doesn't make any sense, let me know and I'll have another go
> > at explaining it (or even attempt a diagram).
> > 
> > Upayavira
> > 
> > On Tue, 30 Nov 2010 13:27 -0800, "Cinquini, Luca (3880)"
> >  wrote:
> > > Hi,
> > > I'd like to know if anybody has suggestions/opinions on what is currently 
> > > the best architecture for a distributed search system using Solr. The use 
> > > case is that of a system composed
> > > of N indexes, each hosted on a separate machine, each index containing
> > > unique content.
> > > 
> > > Options that I know of are:
> > > 
> > > A) Using Solr distributed search
> > > B) Using Solr + Zookeeper integration
> > > C) Using replication, i.e. each node replicates all the others
> > > 
> > > It seems like options A) and B) would suffer from a fault-tolerance
> > > standpoint: if any of the nodes goes down, the search won't -at this
> > > time- return partial results, but instead report an exception.
> > > Option C) would provide fault tolerance, at least for any search
> > > initiated at a node that is available, but would incur a large
> > > replication overhead.
> > > 
> > > Did I get any of the above wrong, or does somebody have some insight on
> > > what is the best system architecture for this use case ?
> > > 
> > > thanks in advance,
> > > Luca
> 
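For reference, a minimal sketch of the "hardwired shards" core described above.
The hostnames and core names are hypothetical, not from the thread; each shards
entry points at a load-balancer VIP fronting a pair of slaves:

<!-- solrconfig.xml on each of the four search-head cores -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="shards">shard1-vip:8983/solr/core1,shard2-vip:8983/solr/core2</str>
  </lst>
</requestHandler>

Queries sent to such a core are then fanned out across both shards automatically.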


Re: distributed architecture

2010-12-01 Thread Upayavira
On Tue, 30 Nov 2010 23:11 -0800, "Dennis Gearon" 
wrote:
> Wow, would you put a diagram somewhere up on the Solr site?

> Or, here, and I will put it somewhere there.

I'll see what I can do to make a diagram.

> And, what is a VIP?

Virtual IP. It is what a load balancer uses. You assign a 'virtual IP'
to your load balancer, and it is responsible for forwarding traffic to
that IP to one of the hosts in that particular pool.

Upayavira


Re: Dynamically change master

2010-12-01 Thread Upayavira
Note, all extracted from http://wiki.apache.org/solr/SolrReplication

You'd put:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

into every box you want to be able to act as a master, then use:

http://slave_host:port/solr/replication?command=fetchindex&masterUrl=<master URL>

As the above page says better than I can, "It is possible to pass on
extra attribute 'masterUrl' or other attributes like 'compression' (or
any other parameter which is specified in the <lst name="slave"> tag) to
do a one time replication from a master. This obviates the need for
hardcoding the master in the slave."

HTH, Upayavira
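A concrete one-time pull, with hypothetical host names; masterUrl points at the
chosen master's own replication handler:

curl "http://slave1:8983/solr/replication?command=fetchindex&masterUrl=http://master2:8983/solr/replication"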

On Wed, 01 Dec 2010 06:24 +0100, "Tommaso Teofili"
 wrote:
> Hi Upayavira,
> this is a good start for solving my problem. Can you please tell me what
> such a replication URL looks like?
> Thanks,
> Tommaso
> 
> 2010/12/1 Upayavira 
> 
> > Hi Tommaso,
> >
> > I believe you can tell each server to act as a master (which means it
> > can have its indexes pulled from it).
> >
> > You can then include the master hostname in the URL that triggers a
> > replication process. Thus, if you triggered replication from outside
> > solr, you'd have control over which master you pull from.
> >
> > Does this answer your question?
> >
> > Upayavira
> >
> >
> > On Tue, 30 Nov 2010 09:18 -0800, "Ken Krugler"
> >  wrote:
> > > Hi Tommaso,
> > >
> > > On Nov 30, 2010, at 7:41am, Tommaso Teofili wrote:
> > >
> > > > Hi all,
> > > >
> > > > in a replication environment if the host where the master is running
> > > > goes
> > > > down for some reason, is there a way to communicate to the slaves to
> > > > point
> > > > to a different (backup) master without manually changing
> > > > configuration (and
> > > > restarting the slaves or their cores)?
> > > >
> > > > Basically I'd like to be able to change the replication master
> > > > dynamically
> > > > inside the slaves.
> > > >
> > > > Do you have any idea of how this could be achieved?
> > >
> > > One common approach is to use VIP (virtual IP) support provided by
> > > load balancers.
> > >
> > > Your slaves are configured to use a VIP to talk to the master, so that
> > > it's easy to dynamically change which master they use, via updates to
> > > the load balancer config.
> > >
> > > -- Ken
> > >
> > > --
> > > Ken Krugler
> > > +1 530-210-6378
> > > http://bixolabs.com
> > > e l a s t i c   w e b   m i n i n g
> > >
> > >
> > >
> > >
> > >
> > >
> >
> 


Re: Dynamically change master

2010-12-01 Thread Tommaso Teofili
Thanks Upayavira, that sounds very good.

p.s.:
I read that page some weeks ago and didn't get back to check on it.


2010/12/1 Upayavira 

> Note, all extracted from http://wiki.apache.org/solr/SolrReplication
>
> You'd put:
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">startup</str>
>     <str name="replicateAfter">commit</str>
>   </lst>
> </requestHandler>
>
> into every box you want to be able to act as a master, then use:
>
> http://slave_host:port/solr/replication?command=fetchindex&masterUrl=<master URL>
>
> As the above page says better than I can, "It is possible to pass on
> extra attribute 'masterUrl' or other attributes like 'compression' (or
> any other parameter which is specified in the <lst name="slave"> tag) to
> do a one time replication from a master. This obviates the need for
> hardcoding the master in the slave."
>
> HTH, Upayavira
>
> On Wed, 01 Dec 2010 06:24 +0100, "Tommaso Teofili"
>  wrote:
> > Hi Upayavira,
> > this is a good start for solving my problem. Can you please tell me what
> > such a replication URL looks like?
> > Thanks,
> > Tommaso
> >
> > 2010/12/1 Upayavira 
> >
> > > Hi Tommaso,
> > >
> > > I believe you can tell each server to act as a master (which means it
> > > can have its indexes pulled from it).
> > >
> > > You can then include the master hostname in the URL that triggers a
> > > replication process. Thus, if you triggered replication from outside
> > > solr, you'd have control over which master you pull from.
> > >
> > > Does this answer your question?
> > >
> > > Upayavira
> > >
> > >
> > > On Tue, 30 Nov 2010 09:18 -0800, "Ken Krugler"
> > >  wrote:
> > > > Hi Tommaso,
> > > >
> > > > On Nov 30, 2010, at 7:41am, Tommaso Teofili wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > in a replication environment if the host where the master is
> running
> > > > > goes
> > > > > down for some reason, is there a way to communicate to the slaves
> to
> > > > > point
> > > > > to a different (backup) master without manually changing
> > > > > configuration (and
> > > > > restarting the slaves or their cores)?
> > > > >
> > > > > Basically I'd like to be able to change the replication master
> > > > > dynamically
> > > > > inside the slaves.
> > > > >
> > > > > Do you have any idea of how this could be achieved?
> > > >
> > > > One common approach is to use VIP (virtual IP) support provided by
> > > > load balancers.
> > > >
> > > > Your slaves are configured to use a VIP to talk to the master, so
> that
> > > > it's easy to dynamically change which master they use, via updates to
> > > > the load balancer config.
> > > >
> > > > -- Ken
> > > >
> > > > --
> > > > Ken Krugler
> > > > +1 530-210-6378
> > > > http://bixolabs.com
> > > > e l a s t i c   w e b   m i n i n g
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>


Re: ArrayIndexOutOfBoundsException in sort

2010-12-01 Thread Jerry Li
Sorry, I left that out. Following is my schema.xml config; I use IKTokenizer
for Chinese characters.

(The XML tags below were stripped by the list archive; this is a
reconstruction, with class names inferred from the attributes that survive in
the quoted copy.)

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

[The <field> definitions were stripped by the archive. Most were
required="true"; one was compressed="true" omitNorms="true", and one was
multiValued="true". The "author" field was of type "text".]

<uniqueKey>id</uniqueKey>

<defaultSearchField>text</defaultSearchField>


On Wed, Dec 1, 2010 at 2:50 PM, Gora Mohanty  wrote:

> On Wed, Dec 1, 2010 at 10:56 AM, Jerry Li  wrote:
> > Hi team
> >
> > My solr version is 1.4
> > There is an ArrayIndexOutOfBoundsException when I sort on one field, and the
> > following is my code and log info,
> > any help will be appreciated.
> >
> > Code:
> >
> >SolrQuery query = new SolrQuery();
> >query.setSortField("author", ORDER.desc);
> [...]
>
> Please show us how the field "author" is defined in your
> schema.xml. Sorting has to be done on a non-tokenized
> field, e.g., a StrField.
>
> Regards,
> Gora
>



-- 

Best Regards.
Jerry. Li | 李宗杰



Re: ArrayIndexOutOfBoundsException in sort

2010-12-01 Thread Jerry Li
Hi

It seems to work fine again after I changed the "author" field type from text
to string. Could anybody give some info about why? Very much appreciated.




On Wed, Dec 1, 2010 at 5:20 PM, Jerry Li  wrote:

> Sorry, I left that out. Following is my schema.xml config; I use IKTokenizer
> for Chinese characters.
>
>
> (The XML tags below were stripped by the list archive; reconstruction as in
> the original message above.)
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="true"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> [field definitions stripped by the archive, as above]
>
> <uniqueKey>id</uniqueKey>
>
> <defaultSearchField>text</defaultSearchField>
>
>
>
>
>
>
> On Wed, Dec 1, 2010 at 2:50 PM, Gora Mohanty  wrote:
>
>> On Wed, Dec 1, 2010 at 10:56 AM, Jerry Li  wrote:
>> > Hi team
>> >
>> > My solr version is 1.4
>> > There is an ArrayIndexOutOfBoundsException when I sort on one field, and the
>> > following is my code and log info,
>> > any help will be appreciated.
>> >
>> > Code:
>> >
>> >SolrQuery query = new SolrQuery();
>> >query.setSortField("author", ORDER.desc);
>> [...]
>>
>> Please show us how the field "author" is defined in your
>> schema.xml. Sorting has to be done on a non-tokenized
>> field, e.g., a StrField.
>>
>> Regards,
>> Gora
>>
>
>
>
> --
>
> Best Regards.
> Jerry. Li | 李宗杰
> 
>



-- 

Best Regards.
Jerry. Li | 李宗杰



Spatial Search

2010-12-01 Thread Aisha Zafar
Hi,

I am a newbie to Solr. I found it really interesting, especially spatial
search. I am very interested in going into its depth, but I am facing some
problems using it, as I have version 1.4.1 installed on my machine, while
spatial search is a feature of version 4.0, which is not released yet. I have
also read somewhere that we can use a patch for this purpose. As I am a
newbie, I don't know how to install the patch or where to download it. If
anyone could help me I'll be very thankful.

thanks in advance and bye

Troubles with forming query for solr.

2010-12-01 Thread kolesman

Hi,

I have some troubles with forming query for solr.

Here is my task :
I'm indexing objects with 3 fields, for example {field1, field2, field3}
In solr's response I want to get object in special order :
1. Firstly I want to get objects where all 3 fields are matched
2. Then I want to get objects where ONLY field1 and field2 are matched
3. And finally I want to get objects where ONLY field2 and field3 are
matched.

Could you explain to me how to form a query for my task?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Troubles-with-forming-query-for-solr-tp1996630p1996630.html
Sent from the Solr - User mailing list archive at Nabble.com.
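One hedged sketch, not from the thread: tiers like this are often expressed as
boosted optional clauses, so that documents matching a higher tier always
score higher. The values v1/v2/v3 are placeholders:

q=(+field1:v1 +field2:v2 +field3:v3)^100
  (+field1:v1 +field2:v2 -field3:v3)^10
  (+field2:v2 +field3:v3 -field1:v1)

The boosts dominate the ordering; the exact weights may need tuning against
real scores.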


schema design for related fields

2010-12-01 Thread lee carroll
Hi

I've built a schema for a proof of concept and it is all working fairly
fine; naive maybe, but fine.
However I think we might run into trouble in the future if we ever use
facets.

The data models train destination city routes from an origin city:
Doc:City
Name: cityname [unique key]
CityType: city type values [nine possible values so good for faceting]
... [other city attributes which relate directly to the doc unique key]
all have limited vocab so good for faceting
FareJanStandard: cheapest standard fare in January (float value)
FareJanFirst: cheapest first class fare in January (float value)
FareFebStandard: cheapest standard fare in February (float value)
FareFebFirst: cheapest first fare in February (float value)
... etc

The question is how would I best facet fare price? The desire is to return

number of cities with Jan prices in a set of ranges
etc
number of cities with first prices in a set of ranges
etc

install is 1.4.1 running in WebLogic

Any ideas?



Lee C


Re: ArrayIndexOutOfBoundsException for query with rows=0 and sort param

2010-12-01 Thread Martin Grotzke
On Tue, Nov 30, 2010 at 7:51 PM, Martin Grotzke
 wrote:
> On Tue, Nov 30, 2010 at 3:09 PM, Yonik Seeley
>  wrote:
>> On Tue, Nov 30, 2010 at 8:24 AM, Martin Grotzke
>>  wrote:
>>> Still I'm wondering, why this issue does not occur with the plain
>>> example solr setup with 2 indexed docs. Any explanation?
>>
>> It's an old option you have in your solrconfig.xml that causes a
>> different code path to be followed in Solr:
>>
>>   
>>    true
>>
>> Most apps would be better off commenting that out or setting it to
>> false.  It only makes sense when a high number of queries will be
>> duplicated, but with different sorts.
>
> Great, this sounds really promising, would be a very easy fix. I need
> to check this tomorrow on our test/integration server if changing this
> does the trick for us.
I just verified this fix on our test/integration system and it works - cool!

Thanx a lot for this hint,
cheers,
Martin


Re: SOLR for Log analysis feasibility

2010-12-01 Thread phoey

My thoughts exactly: it may seem fairly straightforward, but I fear the day
when a client wants a perfectly reasonable new feature added to their report
and SOLR simply cannot support it.

I am hoping we won't have any real issues with scalability like Loggly,
because we don't index and store large documents of data within SOLR. Most of
our documents will be very small.

Does anyone have any experience with using field collapsing in a production
environment?

thank you for all your replies. 

Joe

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-for-Log-analysis-feasibility-tp1992202p1998360.html
Sent from the Solr - User mailing list archive at Nabble.com.


send XML multiValued Field Solr-PHP-Client

2010-12-01 Thread stockii

Hello.

Is anyone using Solr-PHP-Client?

How are you using multiValued fields with the method addFields()?

Solr says to me SCHWERWIEGEND (SEVERE): java.lang.NumberFormatException: empty String

when I send a raw XML like this (the field tags were stripped by the archive;
only the values remain):

24038608
778
reason1
reason1

In the schema I defined: [the field definitions were stripped by the archive]

Why doesn't this work? =(
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/send-XML-multiValued-Field-Solr-PHP-Client-tp1998370p1998370.html
Sent from the Solr - User mailing list archive at Nabble.com.
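For reference, a well-formed update message repeats the <field> element once
per value of a multiValued field. The field names here are hypothetical, since
the originals were lost:

<add>
  <doc>
    <field name="id">24038608</field>
    <field name="amount">778</field>
    <field name="reason">reason1</field>
    <field name="reason">reason1</field>
  </doc>
</add>

A "NumberFormatException: empty String" usually means an empty value was sent
for a numeric field, so it is worth checking for empty strings before adding
each field.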


Re: QueryNorm and FieldNorm

2010-12-01 Thread Gastone Penzo
Thanks for the answer.
Is it possible to remove the queryNorm,
so that all the bf boosts become an addition to the Solr score?

Is omitNorms about fieldNorm or queryNorm?

thanks

Gastone

2010/11/30 Jayendra Patil 

> fieldNorm is the combination of length of the field with index and query
> time boosts.
>
>   1. lengthNorm = measure of the importance of a term according to the
>  total number of terms in the field
> 1. Implementation: 1/sqrt(numTerms)
> 2. Implication: a term matched in fields with less terms have a
> higher score
> 3. Rationale: a term in a field with less terms is more important
> than one with more
>  2. boost (index) = boost of the field at index-time
> 1. Index time boost specified. The fieldNorm value in the score
>would include the same.
> 3. boost (query) = boost of the field at query-time
>
>
> bf is the query time boost for a field and should affect fieldNorm value.
>
> queryNorm is just a normalization factor so that queries can be compared
> and
> will differ based on query and results
>
>   1. queryNorm is not related to the relevance of the document, but rather
>  tries to make scores between different queries comparable. It is
> implemented
>  as 1/sqrt(sumOfSquaredWeights)
>
>
> You should not be bothered about queryNorm, as for a query it will have the
> same value for all the results.
>
> Regards,
> Jayendra
>
> On Tue, Nov 30, 2010 at 9:37 AM, Gastone Penzo  >wrote:
>
> > Hello,
> > can someone explain the difference between queryNorm and fieldNorm in
> > debugQuery?
> > Why, if I push one bf boost up, does the queryNorm go down?
> > I made some modifications; before, the situation was different. Why?
> > thanks
> >
> > --
> > Gastone Penzo
> >
>



-- 
Gastone Penzo
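For context, hedged to Lucene's DefaultSimilarity of this era: omitNorms is
about the fieldNorm side; it drops the index-time length/boost norm for that
field and does not touch queryNorm. The formulas behind the answer above:

  lengthNorm(f) = 1 / sqrt(numTerms(f))
  queryNorm(q)  = 1 / sqrt(sumOfSquaredWeights(q))

For example, a field with 4 terms gets lengthNorm = 1/sqrt(4) = 0.5. Raising a
bf boost raises the query's sumOfSquaredWeights, which is exactly why
queryNorm goes down; since it rescales all results of one query identically,
the ranking within that query is unaffected.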


Re: distributed architecture

2010-12-01 Thread Peter Karich

 Hi,

also take a look at solandra:

https://github.com/tjake/Lucandra/tree/solandra

I don't have it in prod yet, but regarding administration overhead it
looks very promising.
And you'll get some other neat features, like (soft) real time, for free.
So it's something like A) + C) + X) - Y) ;-)


Regards,
Peter.



Hi,
I'd like to know if anybody has suggestions/opinions on what is 
currently the best architecture for a distributed search system using Solr. The 
use case is that of a system composed
of N indexes, each hosted on a separate machine, each index containing unique 
content.

Options that I know of are:

A) Using Solr distributed search
B) Using Solr + Zookeeper integration
C) Using replication, i.e. each node replicates all the others

It seems like options A) and B) would suffer from a fault-tolerance standpoint: 
if any of the nodes goes down, the search won't -at this time- return partial 
results, but instead report an exception.
Option C) would provide fault tolerance, at least for any search initiated at a 
node that is available, but would incur a large replication overhead.

Did I get any of the above wrong, or does somebody have some insight on what is 
the best system architecture for this use case ?

thanks in advance,
Luca



--
http://jetwick.com twitter search prototype



Re : Spatial Search

2010-12-01 Thread js . vachon
Check JTeam's spatial search plugin.
It is very easy to install.


Aisha Zafar  a écrit

> Hi,
>
> I am a newbie to Solr. I found it really interesting, especially spatial
> search. I am very interested in going into its depth, but I am facing some
> problems using it, as I have version 1.4.1 installed on my machine, while
> spatial search is a feature of version 4.0, which is not released yet. I have
> also read somewhere that we can use a patch for this purpose. As I am a
> newbie, I don't know how to install the patch or where to download it. If
> anyone could help me I'll be very thankful.
> 
> thanks in advance and bye
> 
> 
> 
> 


This e-mail was sent from an Archos 7.


Re: Solr DataImportHandler (DIH) and Cassandra

2010-12-01 Thread David Stuart
This is good timing. I am/was just about to embark on a spike, if anyone is
keen to help out.


On 30 Nov 2010, at 00:37, Mark wrote:

> The DataSource subclass route is what I will probably be interested in. Are
> there any working examples of this already out there?
> 
> On 11/29/10 12:32 PM, Aaron Morton wrote:
>> AFAIK there is nothing pre-written to pull the data out for you.
>> 
>> You should be able to create your DataSource sub class 
>> http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/DataSource.html
>>  Using the Hector java library to pull data from Cassandra.
>> 
>> I'm guessing you will need to consider how to perform delta imports. Perhaps 
>> using the secondary indexes in 0.7* , or maintaining your own queues or 
>> indexes to know what has changed.
>> 
>> There is also the Lucandra project; not exactly what you're after, but it may
>> be of interest anyway https://github.com/tjake/Lucandra
>> 
>> Hope that helps.
>> Aaron
>> 
>> 
>> On 30 Nov, 2010,at 05:04 AM, Mark  wrote:
>> 
>>> Is there any way to use DIH to import from Cassandra? Thanks
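A hedged sketch of the DataSource subclass route, not working code from the
thread: the init/getData/close contract is per the DataSource API docs linked
above, while the Hector calls are left as comments because their exact
signatures are not established here.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

public class CassandraDataSource extends DataSource<Iterator<Map<String, Object>>> {
  @Override
  public void init(Context context, Properties initProps) {
    // Connect to the Cassandra cluster here (e.g. via Hector), using
    // host/keyspace settings taken from initProps.
  }

  @Override
  public Iterator<Map<String, Object>> getData(String query) {
    // Run a slice/range query (e.g. via Hector) and adapt each row into a
    // column-name -> value map; stubbed with an empty result here.
    List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
    return rows.iterator();
  }

  @Override
  public void close() {
    // Release client/connection resources here.
  }
}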



Re: ArrayIndexOutOfBoundsException in sort

2010-12-01 Thread Ahmet Arslan
> It seems to work fine again after I changed the "author" field type
> from text to
> string. Could anybody give some info about why? Very much
> appreciated.

http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F

And also see Erick's explanation 
http://search-lucene.com/m/7fnj1TtNde/sort+on+a+tokenized+field&subj=Re+Solr+sorting+problem
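The standard pattern from those links, sketched with a hypothetical sort
field: keep the tokenized field for searching and copy it into a non-tokenized
string field that is used only for sorting:

<field name="author"      type="text"   indexed="true" stored="true"/>
<field name="author_sort" type="string" indexed="true" stored="false"/>

<copyField source="author" dest="author_sort"/>

Then sort on the string field, e.g. query.setSortField("author_sort", ORDER.desc).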


  


Re: [PECL-DEV] Re: PHP Solr API

2010-12-01 Thread Stefan Matheis
Hi again,

I am actually trying to implement spellcheck in a different way, and had the
idea to access /solr/spellcheck to get all the required data before executing
the final query to /solr/select. But that seems to be impossible, since there
is no configuration option to change the /select part of the URL; the part
before it can be configured through 'path', but nothing else.

Maybe it would be an idea to allow this part of the URL to be configured, in
whatever way?

Regards
Stefan


Re: [PECL-DEV] Re: PHP Solr API

2010-12-01 Thread Stefan Matheis
oooh, sorry - used the wrong thread for my suggestion ... please, just
ignore this :)

On Wed, Dec 1, 2010 at 2:01 PM, Stefan Matheis <
matheis.ste...@googlemail.com> wrote:

> Hi again,
>
> I am actually trying to implement spellcheck in a different way, and had the
> idea to access /solr/spellcheck to get all the required data before executing
> the final query to /solr/select. But that seems to be impossible, since there
> is no configuration option to change the /select part of the URL; the part
> before it can be configured through 'path', but nothing else.
>
> Maybe it would be an idea to allow this part of the URL to be configured, in
> whatever way?
>
> Regards
> Stefan
>


Re: Failover setup (is this a bad idea)

2010-12-01 Thread robo -
I agree with the Master with multiple slaves setup.  Very easy using
the built-in java setup in 1.4.1.  When we set this up it made our
developers think about how we were writing to Solr.  We were using a
Delta Import Handler (DIH?) for most writes but our app was also
writing 'deletes' directly to Solr.  Since we wanted to load balance
the Slaves we couldn't have the app writing to the Slaves.  Once we
discussed the Master/Slave setup with our developers we found all
areas where we were writing in our app and moved/centralized those
into the DIH. Now the app only does queries against the load balanced
slaves while the Master is used for DIH and backups only.

Thanks,

robo
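For reference, the slave-side counterpart from the SolrReplication wiki, with
a hypothetical master host; each slave polls the master and pulls index
changes:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>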

On Tue, Nov 30, 2010 at 7:58 AM, Jayendra Patil
 wrote:
> Rather, have a Master and multiple Slaves combination, with the master only
> being used for writes and the slaves used for reads.
> Master to Slave replication is easily configurable.
>
> Two Solr instances sharing the same index is not at all a good idea, with both
> writing to the same index.
>
> Regards,
> Jayendra
>
> On Tue, Nov 30, 2010 at 7:13 AM, Keith Pope <
> keith.p...@inflightproductions.com> wrote:
>
>> Hi,
>>
>> I have a windows cluster that I would like to install Solr onto, there are
>> two nodes that provide basic failover. I was thinking of this setup:
>>
>> Tomcat installed as win service
>> Two solr instances sharing the same index
>>
>> The second instance would take over when the first fails, so you should
>> never get two writes/reads at once.
>>
>> Is this a bad idea? Would I end up corrupting my index?
>>
>> Thx
>>
>> Keith
>>
>>
>>
>> -
>> Registered Office: 15 Stukeley Street, London WC2B 5LT, England.
>> Registered in England number 1421223
>>
>> This message is for the designated recipient only and may contain
>> privileged, proprietary, or otherwise private information. If you have
>> received it in error, please notify the sender immediately and delete the
>> original. Any other use of the email by you is prohibited. Please note that
>> the information provided in this e-mail is in any case not legally binding;
>> all committing statements require legally binding signatures.
>>
>>
>> http://www.inflightproductions.com
>>
>>
>>
>>
>


Re: schema design for related fields

2010-12-01 Thread Erick Erickson
I'd think that facet.query would work for you, something like:
&facet=true&facet.query=FareJanStandard:[price1 TO
price2]&facet.query=FareJanStandard:[price2 TO price3]
You can string as many facet.query clauses as you want, across as many
fields as you want, they're all
independent and will get their own sections in the response.

Best
Erick
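A concrete request along those lines, with hypothetical price ranges; rows=0
returns just the counts:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
    &facet.query=FareJanStandard:[0 TO 25]
    &facet.query=FareJanStandard:[25 TO 50]
    &facet.query=FareJanFirst:[0 TO 25]

Each facet.query comes back as its own count under facet_queries in the
response.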

On Wed, Dec 1, 2010 at 4:55 AM, lee carroll wrote:

> Hi
>
> I've built a schema for a proof of concept and it is all working fairly
> fine; naive maybe, but fine.
> However I think we might run into trouble in the future if we ever use
> facets.
>
> The data models train destination city routes from an origin city:
> Doc:City
>    Name: cityname [unique key]
>    CityType: city type values [nine possible values so good for faceting]
>    ... [other city attributes which relate directly to the doc unique key]
> all have limited vocab so good for faceting
>    FareJanStandard: cheapest standard fare in January (float value)
>    FareJanFirst: cheapest first class fare in January (float value)
>    FareFebStandard: cheapest standard fare in February (float value)
>    FareFebFirst: cheapest first fare in February (float value)
>    ... etc
>
> The question is how would I best facet fare price? The desire is to return
>
> number of cities with Jan prices in a set of ranges
> etc
> number of cities with first prices in a set of ranges
> etc
>
> install is 1.4.1 running in WebLogic
>
> Any ideas ?
>
>
>
> Lee C
>


Re: Spatial Search

2010-12-01 Thread Erick Erickson
1.4.1 spatial is pretty much superseded by "geospatial" in the current code;
you can
download a nightly build from here:
https://hudson.apache.org/hudson/

Scroll down to "Solr-trunk" and pick a nightly build that suits you. Follow
the link through
"build artifacts" and checkout/solr/dist and you'll find the zip/tar files.

Hudson is reporting some kinda flaky "failures", but if you look at the
build results you
can determine whether you care. For instance, the Dec-1 build has a red
ball, but
all the tests pass!

Here's a good place to start with geospatial:
http://wiki.apache.org/solr/SpatialSearch

Best
Erick


On Wed, Dec 1, 2010 at 2:35 AM, Aisha Zafar  wrote:

> Hi,
>
> I am a newbie to Solr. I found it really interesting, especially spatial
> search. I am very interested in going into its depth, but I am facing some
> problems using it, as I have version 1.4.1 installed on my machine, while
> spatial search is a feature of version 4.0, which is not released yet. I have
> also read somewhere that we can use a patch for this purpose. As I am a
> newbie, I don't know how to install the patch or where to download it. If
> anyone could help me I'll be very thankful.
>
> thanks in advance and bye
>
>
>
>
>


Re: schema design for related fields

2010-12-01 Thread lee carroll
Hi Erick,
so if I understand you, we could do something like:

if Jan is selected in the user interface and we have 10 price ranges,
the query would have 20 clauses (10 ranges * 2 fare classes)

if first is selected in the user interface and we have 10 price ranges,
the query would have 120 clauses (12 months * 10 price ranges)

if first and Jan are selected with 10 price ranges,
the query would have 10 clauses

if we required facets to be returned for all price combinations we'd need to
supply 240 clauses

the user interface would also need to collate the individual fields into
meaningful aggregates for the user (i.e. numbers by month, numbers by fare
class)

have I understood or missed the point (I usually have)?




On 1 December 2010 15:00, Erick Erickson  wrote:

> I'd think that facet.query would work for you, something like:
> &facet=true&facet.query=FareJanStandard:[price1 TO
> > price2]&facet.query=FareJanStandard:[price2 TO price3]
> You can string as many facet.query clauses as you want, across as many
> fields as you want, they're all
> independent and will get their own sections in the response.
>
> Best
> Erick
>
> On Wed, Dec 1, 2010 at 4:55 AM, lee carroll  >wrote:
>
> > Hi
> >
> > I've built a schema for a proof of concept and it is all working fairly
> > fine, niave maybe but fine.
> > However I think we might run into trouble in the future if we ever use
> > facets.
> >
> > The data models train destination city routes from a origin city:
> > Doc:City
> >Name: cityname [uniq key]
> >CityType: city type values [nine possible values so good for faceting]
> >... [other city attricbutes which relate directy to the doc unique
> key]
> > all have limited vocab so good for faceting
> >FareJanStandard:cheapest standard fare in january(float value)
> >FareJanFirst:cheapest first class fare in january(float value)
> >FareFebStandard:cheapest standard fare in feb(float value)
> >FareFebFirst:cheapest first fare in feb(float value)
> >. etc
> >
> > The question is how would i best facet fare price? The desire is to
> return
> >
> > number of citys with jan prices in a set of ranges
> > etc
> > number of citys with first prices in a set of ranges
> > etc
> >
> > install is 1.4.1 running in weblogic
> >
> > Any ideas ?
> >
> >
> >
> > Lee C
> >
>


Re: Best practice for Delta every 2 Minutes.

2010-12-01 Thread Jonathan Rochkind
If your index warmings take longer than two minutes, but you're doing a 
commit every two minutes -- you're going to run into trouble with 
overlapping index preparations, eventually leading to an OOM.  Could 
this be it?
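A related guard, hedged as 1.4-era solrconfig.xml syntax: capping the number
of concurrent warming searchers makes an over-eager commit fail fast instead
of letting warms stack up until memory runs out:

<maxWarmingSearchers>2</maxWarmingSearchers>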


On 11/30/2010 11:36 AM, Erick Erickson wrote:

I don't know, you'll have to debug it to see if it's the thing that takes so
long. Solr
should be able to handle 1,200 updates in a very short time unless there's
something
else going on, like you're committing after every update or something.

This may help you track down performance with DIH

http://wiki.apache.org/solr/DataImportHandler#interactive

Best
Erick

On Tue, Nov 30, 2010 at 9:01 AM, stockii  wrote:


How do you think the deltaQuery is better? XD
--
View this message in context:
http://lucene.472066.n3.nabble.com/Best-practice-for-Delta-every-2-Minutes-tp1992714p1992774.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Good example of multiple tokenizers for a single field

2010-12-01 Thread Jonathan Rochkind

On 11/29/2010 5:43 PM, Robert Muir wrote:

On Mon, Nov 29, 2010 at 5:41 PM, Jonathan Rochkind  wrote:

* As a tokenizer, I use the WhitespaceTokenizer.

* Then I apply a custom filter that looks for CJK chars, and re-tokenizes
any CJK chars into one-token-per-char. This custom filter was written by
someone other than me; it is open source; but I'm not sure if it's actually
in a public repo, or how well documented it is.  I can put you in touch with
the author to try and ask. There may also be a more standard filter other
than the custom one I'm using that does the same thing?


You are describing what standardtokenizer does.



Wait, standardtokenizer already handles CJK and will put each CJK char 
into its own token?  Really? I had no idea!  Is that documented 
anywhere, or do you just have to look at the source to see it?


I had assumed that standardtokenizer didn't have any special handling of 
bytes known to be UTF-8 CJK, because that wasn't mentioned in the 
documentation -- but it does?   That would be convenient and not require 
my custom code.


Jonathan



Re: Good example of multiple tokenizers for a single field

2010-12-01 Thread Robert Muir
(Jonathan, I apologize for emailing you twice, i meant to hit reply-all)

On Wed, Dec 1, 2010 at 10:49 AM, Jonathan Rochkind  wrote:
>
> Wait, standardtokenizer already handles CJK and will put each CJK char into
> it's own token?  Really? I had no idea!  Is that documented anywhere, or you
> just have to look at the source to see it?
>

Yes, you are right, the documentation should have been more explicit:
in previous releases it didn't say anything about how it tokenizes
CJK. But it does tokenize them this way, tagging
them with the "CJ" token type.

I think the documentation issue is "fixed" in branch_3x and trunk:

 * As of Lucene version 3.1, this class implements the Word Break rules from the
 * Unicode Text Segmentation algorithm, as specified in
 * <a href="http://unicode.org/reports/tr29/">Unicode Standard Annex #29</a>.
(from 
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java)

So you can read the UAX#29 report and then you know how it tokenizes text
You can also just use this demo app to see how the new one works:
http://unicode.org/cldr/utility/breaks.jsp (choose "Word")


Re: schema design for related fields

2010-12-01 Thread Geert-Jan Brits
"if first is selected in the user interface and we have 10 price ranges
query would be 120 cluases (12 months * 10 price ranges)"

What would you intend to do with the returned facet-results in this
situation? I doubt you want to display 12 categories (1 for each month)?

When a user hasn't selected a date, perhaps it would be more useful to show
the cheapest fare regardless of month and facet on that?

This would involve introducing 2 new fields:
FareDateDontCareStandard, FareDateDontCareFirst

Populate these fields on indexing time, by calculating the cheapest fares
over all months.

This then results in every query having to support at most 20 price ranges
(10 for normal and 10 for first class)

HTH,
Geert-Jan
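A sketch of populating such a field at indexing time with SolrJ; the field and
method names are hypothetical:

import org.apache.solr.common.SolrInputDocument;

public class FareIndexer {
  /** Adds the cheapest standard fare across all months as a single field. */
  static void addDontCareFare(SolrInputDocument doc, float[] monthlyStandardFares) {
    float cheapest = Float.MAX_VALUE;
    for (float fare : monthlyStandardFares) {
      cheapest = Math.min(cheapest, fare);
    }
    doc.addField("FareDateDontCareStandard", cheapest);
  }
}

The same would be done for first class into FareDateDontCareFirst.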



2010/12/1 lee carroll 

> Hi Erick,
> so if I understand you, we could do something like:
>
> if Jan is selected in the user interface and we have 10 price ranges,
> the query would have 20 clauses (10 ranges * 2 fare classes)
>
> if first is selected in the user interface and we have 10 price ranges,
> the query would have 120 clauses (12 months * 10 price ranges)
>
> if first and Jan are selected with 10 price ranges,
> the query would have 10 clauses
>
> if we required facets to be returned for all price combinations we'd need
> to supply 240 clauses
>
> the user interface would also need to collate the individual fields into
> meaningful aggregates for the user (i.e. numbers by month, numbers by fare
> class)
>
> have I understood or missed the point (I usually have)?
>
>
>
>
> On 1 December 2010 15:00, Erick Erickson  wrote:
>
> > I'd think that facet.query would work for you, something like:
> > &facet=true&facet.query=FareJanStandard:[price1 TO
> > price2]&facet.query=FareJanStandard:[price2 TO price3]
> > You can string as many facet.query clauses as you want, across as many
> > fields as you want, they're all
> > independent and will get their own sections in the response.
> >
> > Best
> > Erick
> >
> > On Wed, Dec 1, 2010 at 4:55 AM, lee carroll <
> lee.a.carr...@googlemail.com
> > >wrote:
> >
> > > Hi
> > >
> > > I've built a schema for a proof of concept and it is all working fairly
> > > fine; naive maybe, but fine.
> > > However I think we might run into trouble in the future if we ever use
> > > facets.
> > >
> > > The data models train destination city routes from an origin city:
> > > Doc:City
> > >    Name: cityname [unique key]
> > >    CityType: city type values [nine possible values so good for faceting]
> > >    ... [other city attributes which relate directly to the doc unique key]
> > > all have limited vocab so good for faceting
> > >    FareJanStandard: cheapest standard fare in January (float value)
> > >    FareJanFirst: cheapest first class fare in January (float value)
> > >    FareFebStandard: cheapest standard fare in February (float value)
> > >    FareFebFirst: cheapest first fare in February (float value)
> > >    ... etc
> > >
> > > The question is how would I best facet fare price? The desire is to return
> > >
> > > number of cities with Jan prices in a set of ranges
> > > etc
> > > number of cities with first prices in a set of ranges
> > > etc
> > >
> > > install is 1.4.1 running in WebLogic
> > >
> > > Any ideas ?
> > >
> > >
> > >
> > > Lee C
> > >
> >
>


RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Does anyone know how to index a very big PDF file (more than 100MB)?

Thanks so much,
Xiaohui 
-Original Message-
From: Ma, Xiaohui (NIH/NLM/LHC) [C] 
Sent: Tuesday, November 30, 2010 4:22 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: how to set maxFieldLength to unlimitd

I set maxFieldLength to 2147483647, restarted Tomcat and re-indexed the PDF
files again. I also commented out the one in the <mainIndex> section.
Unfortunately the files are still chopped off if the file size is more than 20MB.

Any suggestions? I really appreciate your help!
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd

Set the <maxFieldLength> value in solrconfig.xml to, say, 2147483647

Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html,
it appears you can just comment out the one in the <mainIndex> section.

Best
Erick
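Putting the advice together, a sketch of the relevant solrconfig.xml pieces
(1.4-era syntax assumed):

<indexDefaults>
  <maxFieldLength>2147483647</maxFieldLength>
</indexDefaults>

<mainIndex>
  <!-- comment out any maxFieldLength here so the indexDefaults value applies -->
  <!-- <maxFieldLength>10000</maxFieldLength> -->
</mainIndex>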

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] <
xiao...@mail.nlm.nih.gov> wrote:

> I need to index and search some PDF files which are very big (around 1000
> pages each). How can I set maxFieldLength to unlimited?
>
> Thanks so much for your help in advance,
> Xiaohui
>


Re: schema design for related fields

2010-12-01 Thread Erick Erickson
Hmmm, that's getting to be a pretty clunky query sure enough. Now you're
going to
have to ensure that HTTP requests that long get through and stuff like
that

I'm reaching a bit here, but you can facet on a tokenized field. Although
that's not
often done there's no prohibition against it.

So, what if you had just one field for each city that contained some
abstract
information about your fares etc. Something like
janstdfareclass1 jancheapfareclass3 febstdfareclass6

Now just facet on that field? Not #values# in that field, just the field
itself. You'd then have to make those into human-readable text, but that
would considerably simplify your query. Probably only works if your user is
selecting from pre-defined ranges, if they expect to put in arbitrary ranges
this scheme probably wouldn't work...

Best
Erick

On Wed, Dec 1, 2010 at 10:22 AM, lee carroll
wrote:

> Hi Erick,
> so if I understand you, we could do something like:
>
> if Jan is selected in the user interface and we have 10 price ranges,
> the query would have 20 clauses (10 ranges * 2 fare classes)
>
> if first is selected in the user interface and we have 10 price ranges,
> the query would have 120 clauses (12 months * 10 price ranges)
>
> if first and Jan are selected with 10 price ranges,
> the query would have 10 clauses
>
> if we required facets to be returned for all price combinations we'd need
> to supply 240 clauses
>
> the user interface would also need to collate the individual fields into
> meaningful aggregates for the user (i.e. numbers by month, numbers by fare
> class)
>
> have I understood or missed the point (I usually have)?
>
>
>
>
> On 1 December 2010 15:00, Erick Erickson  wrote:
>
> > I'd think that facet.query would work for you, something like:
> > &facet=true&facet.query=FareJanStandard:[price1 TO
> > price2]&facet.query=FareJanStandard:[price2 TO price3]
> > You can string as many facet.query clauses as you want, across as many
> > fields as you want, they're all
> > independent and will get their own sections in the response.
> >
> > Best
> > Erick
> >
> > On Wed, Dec 1, 2010 at 4:55 AM, lee carroll <
> lee.a.carr...@googlemail.com
> > >wrote:
> >
> > > Hi
> > >
> > > I've built a schema for a proof of concept and it is all working fairly
> > > fine; naive maybe, but fine.
> > > However I think we might run into trouble in the future if we ever use
> > > facets.
> > >
> > > The data models train destination city routes from an origin city:
> > > Doc:City
> > >    Name: cityname [unique key]
> > >    CityType: city type values [nine possible values so good for faceting]
> > >    ... [other city attributes which relate directly to the doc unique key]
> > > all have limited vocab so good for faceting
> > >    FareJanStandard: cheapest standard fare in January (float value)
> > >    FareJanFirst: cheapest first class fare in January (float value)
> > >    FareFebStandard: cheapest standard fare in February (float value)
> > >    FareFebFirst: cheapest first fare in February (float value)
> > >    ... etc
> > >
> > > The question is how would I best facet fare price? The desire is to return
> > >
> > > number of cities with Jan prices in a set of ranges
> > > etc
> > > number of cities with first prices in a set of ranges
> > > etc
> > >
> > > install is 1.4.1 running in WebLogic
> > >
> > > Any ideas ?
> > >
> > >
> > >
> > > Lee C
> > >
> >
>


${dataimporter.last_index_time} Format?

2010-12-01 Thread sahid
Hello All,

I have a simple problem.

In my "conf/dataimport.properties" I have "last_index_time" in this
format: '%Y-%m-%d %H:%M:%S',
for example: last_index_time=2010-12-01 16\:53\:16.

But when I use this property in my data-config.conf, the value arrives in the
format "%Y-%m-%d";
for example:
url="http://server/_solr/?last_time=${dataimporter.last_index_time}"
makes: http://server/_solr/?last_time=2010-12-01

Do you have an idea for me?

Thanks a lot!

-- 
~sahid
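A likely cause, hedged: the value contains a space, so the generated URL is
cut off at the date part. DIH provides built-in functions such as encodeUrl,
which should keep the full timestamp intact, along the lines of:

url="http://server/_solr/?last_time=${dataimporter.functions.encodeUrl(dataimporter.last_index_time)}"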


RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread jan.kurella
You just can't set it to "unlimited". What you could do is ignore the
positions and put in a filter that sets the position increment for all but the
first token to 0 (meaning the field length in positions will be just 1, with
all tokens "stacked" on the first position).
You could also break per page, so you put each "page" at a new position.

Jan
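A sketch of such a filter against the Lucene 2.9/3.x TokenFilter API; this
implements the position-stacking Jan describes and is not an official Solr
filter:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class StackPositionsFilter extends TokenFilter {
  private final PositionIncrementAttribute posIncr =
      addAttribute(PositionIncrementAttribute.class);
  private boolean first = true;

  public StackPositionsFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (first) {
      first = false;                     // leave the first token's increment alone
    } else {
      posIncr.setPositionIncrement(0);   // stack every later token on position one
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    first = true;
  }
}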

>-Original Message-
>From: ext Ma, Xiaohui (NIH/NLM/LHC) [C] [mailto:xiao...@mail.nlm.nih.gov]
>Sent: Dienstag, 30. November 2010 19:49
>To: solr-user@lucene.apache.org; 'solr-user-i...@lucene.apache.org'; 
>'solr-user-...@lucene.apache.org'
>Subject: how to set maxFieldLength to unlimitd
>
>I need to index and search some PDF files which are very big (around 1000 pages 
>each). How can I set maxFieldLength to unlimited?
>
>Thanks so much for your help in advance,
>Xiaohui


Re: schema design for related fields

2010-12-01 Thread lee carroll
Geert

The UI would be something like:
user selections
for the facet price
max price: £100
fare class: any

city attributes facet
cityattribute1 etc: xxx

results displayed something like

Facet price
Standard fares [10]
First fares [3]
in Jan [9]
in feb [10]
in march [1]
etc
Is this compatible with your approach?

Erick, the price is on an interval scale, i.e. a fare can be any value (not
high, low, medium, etc.).

How sensible would the following approach be?
Index city docs with fields only related to the city unique key;
in the same index also index fare docs, which would be something like:
Fare:
cityID: xxx
FareClass: standard
FareMonth: Jan
FarePrice: 100

the query would be something like:
q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
returning facets for FareClass and FareMonth. Hold on, this will not facet
city docs correctly. Sorry, that's not going to work.







On 1 December 2010 16:25, Erick Erickson  wrote:

> Hmmm, that's getting to be a pretty clunky query sure enough. Now you're
> going to
> have to ensure that HTTP requests that long get through and stuff like
> that
>
> I'm reaching a bit here, but you can facet on a tokenized field. Although
> that's not
> often done there's no prohibition against it.
>
> So, what if you had just one field for each city that contained some
> abstract
> information about your fares etc. Something like
> janstdfareclass1 jancheapfareclass3 febstdfareclass6
>
> Now just facet on that field? Not #values# in that field, just the field
> itself. You'd then have to make those into human-readable text, but that
> would considerably simplify your query. Probably only works if your user is
> selecting from pre-defined ranges, if they expect to put in arbitrary
> ranges
> this scheme probably wouldn't work...
>
> Best
> Erick
>
> On Wed, Dec 1, 2010 at 10:22 AM, lee carroll
> wrote:
>
> > Hi Erick,
> > so if I understand you, we could do something like:
> >
> > if Jan is selected in the user interface and we have 10 price ranges,
> > the query would have 20 clauses (10 ranges * 2 fare classes)
> >
> > if first is selected in the user interface and we have 10 price ranges,
> > the query would have 120 clauses (12 months * 10 price ranges)
> >
> > if first and Jan are selected with 10 price ranges,
> > the query would have 10 clauses
> >
> > if we required facets to be returned for all price combinations we'd need
> > to supply 240 clauses
> >
> > the user interface would also need to collate the individual fields into
> > meaningful aggregates for the user (i.e. numbers by month, numbers by fare
> > class)
> >
> > have I understood or missed the point (I usually have)?
> >
> >
> >
> >
> > On 1 December 2010 15:00, Erick Erickson 
> wrote:
> >
> > > I'd think that facet.query would work for you, something like:
> > > &facet=true&facet.query=FareJanStandard:[price1 TO
> > > price2]&facet.query=FareJanStandard:[price2 TO price3]
> > > You can string as many facet.query clauses as you want, across as many
> > > fields as you want, they're all
> > > independent and will get their own sections in the response.
> > >
> > > Best
> > > Erick
> > >
> > > On Wed, Dec 1, 2010 at 4:55 AM, lee carroll <
> > lee.a.carr...@googlemail.com
> > > >wrote:
> > >
> > > > Hi
> > > >
> > > > I've built a schema for a proof of concept and it is all working fairly
> > > > fine; naive maybe, but fine.
> > > > However I think we might run into trouble in the future if we ever use
> > > > facets.
> > > >
> > > > The data models train destination city routes from an origin city:
> > > > Doc:City
> > > >    Name: cityname [unique key]
> > > >    CityType: city type values [nine possible values so good for faceting]
> > > >    ... [other city attributes which relate directly to the doc unique key]
> > > > all have limited vocab so good for faceting
> > > >    FareJanStandard: cheapest standard fare in January (float value)
> > > >    FareJanFirst: cheapest first class fare in January (float value)
> > > >    FareFebStandard: cheapest standard fare in February (float value)
> > > >    FareFebFirst: cheapest first fare in February (float value)
> > > >    ... etc
> > > >
> > > > The question is how would I best facet fare price? The desire is to return
> > > >
> > > > number of cities with Jan prices in a set of ranges
> > > > etc
> > > > number of cities with first prices in a set of ranges
> > > > etc
> > > >
> > > > install is 1.4.1 running in WebLogic
> > > >
> > > > Any ideas ?
> > > >
> > > >
> > > >
> > > > Lee C
> > > >
> > >
> >
>


Solr 3x segments file and deleting index

2010-12-01 Thread Burton-West, Tom
If I want to delete an entire index and start over, in previous versions of 
Solr, you could stop Solr, delete all files in the index directory and restart 
Solr.  Solr would then create empty segments files and you could start 
indexing. In Solr 3.x, if I delete all the files in the index directory, I get 
a large stack trace with this error:

org.apache.lucene.index.IndexNotFoundException: no segments* file found

As a workaround, whenever I delete an index (by deleting all files in the index 
directory), I copy the segments files that come with the Solr example to the 
index directory and then restart Solr.

Is this a feature or a bug?   What is the rationale?

Tom

Tom Burton-West



RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply, Jan. I just found I cannot index PDF files
larger than 20MB.

I use curl to index them, and didn't get any error either. Do you have any
suggestions for indexing PDF files larger than 20MB?

Thanks,
Xiaohui 

-Original Message-
From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com] 
Sent: Wednesday, December 01, 2010 11:30 AM
To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; 
solr-user-...@lucene.apache.org
Subject: RE: how to set maxFieldLength to unlimitd

You just can't set it to "unlimited". What you could do is ignore the
positions and put in a filter that sets the position increment for all but the
first token to 0 (meaning the field length in positions will be just 1, with
all tokens "stacked" on the first position).
You could also break per page, so you put each "page" at a new position.

Jan

>-Original Message-
>From: ext Ma, Xiaohui (NIH/NLM/LHC) [C] [mailto:xiao...@mail.nlm.nih.gov]
>Sent: Dienstag, 30. November 2010 19:49
>To: solr-user@lucene.apache.org; 'solr-user-i...@lucene.apache.org'; 
>'solr-user-...@lucene.apache.org'
>Subject: how to set maxFieldLength to unlimitd
>
>I need to index and search some PDF files which are very big (around 1000 pages 
>each). How can I set maxFieldLength to unlimited?
>
>Thanks so much for your help in advance,
>Xiaohui


Re: schema design for related fields

2010-12-01 Thread lee carroll
Sorry Geert, I missed the price value bit from the user interface, so we'd
display:

Facet price
Standard fares [10]
First fares [3]

When traveling
in Jan [9]
in feb [10]
in march [1]

Fare Price
0 - 25 :  [20]
25 - 50: [10]
50 - 100 [2]

cheers lee c


On 1 December 2010 17:00, lee carroll  wrote:

> Geert
>
> The UI would be something like:
> user selections
> for the facet price
> max price: £100
> fare class: any
>
> city attributes facet
> cityattribute1 etc: xxx
>
> results displayed something like
>
> Facet price
> Standard fares [10]
> First fares [3]
> in Jan [9]
> in feb [10]
> in march [1]
> etc
> is this compatible with your approach ?
>
> Erick the price is an interval scale ie a fare can be any value (not high,
> low, medium etc)
>
> How sensible would the following approach be
> index city docs with fields only related to the city unique key
> in the same index also index fare docs which would be something like:
> Fare:
> cityID: xxx
> Fareclass:standard
> FareMonth: Jan
> FarePrice: 100
>
> the query would be something like:
> q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
> returning facets for FareClass and FareMonth. hold on this will not facet
> city docs correctly. sorry thasts not going to work.
>
>
>
>
>
>
>
>
> On 1 December 2010 16:25, Erick Erickson  wrote:
>
>> Hmmm, that's getting to be a pretty clunky query sure enough. Now you're
>> going to
>> have to ensure that HTTP requests that long get through and stuff like
>> that
>>
>> I'm reaching a bit here, but you can facet on a tokenized field. Although
>> that's not
>> often done there's no prohibition against it.
>>
>> So, what if you had just one field for each city that contained some
>> abstract
>> information about your fares etc. Something like
>> janstdfareclass1 jancheapfareclass3 febstdfareclass6
>>
>> Now just facet on that field? Not #values# in that field, just the field
>> itself. You'd then have to make those into human-readable text, but that
>> would considerably simplify your query. Probably only works if your user
>> is
>> selecting from pre-defined ranges, if they expect to put in arbitrary
>> ranges
>> this scheme probably wouldn't work...
>>
>> Best
>> Erick
>>
>> On Wed, Dec 1, 2010 at 10:22 AM, lee carroll
>> wrote:
>>
>> > Hi Erick,
>> > so if I understand you, we could do something like:
>> >
>> > if Jan is selected in the user interface and we have 10 price ranges,
>> > the query would have 20 clauses (10 ranges * 2 fare classes)
>> >
>> > if first is selected in the user interface and we have 10 price ranges,
>> > the query would have 120 clauses (12 months * 10 price ranges)
>> >
>> > if first and Jan are selected with 10 price ranges,
>> > the query would have 10 clauses
>> >
>> > if we required facets to be returned for all price combinations we'd need
>> > to supply 240 clauses
>> >
>> > the user interface would also need to collate the individual fields into
>> > meaningful aggregates for the user (i.e. numbers by month, numbers by fare
>> > class)
>> >
>> > have I understood or missed the point (I usually have)?
>> >
>> >
>> >
>> >
>> > On 1 December 2010 15:00, Erick Erickson 
>> wrote:
>> >
>> > > I'd think that facet.query would work for you, something like:
>> > > &facet=true&facet.query=FareJanStandard:[price1 TO
>> > > price2]&facet.query=FareJanStandard:[price2 TO price3]
>> > > You can string as many facet.query clauses as you want, across as many
>> > > fields as you want, they're all
>> > > independent and will get their own sections in the response.
>> > >
>> > > Best
>> > > Erick
>> > >
>> > > On Wed, Dec 1, 2010 at 4:55 AM, lee carroll <
>> > lee.a.carr...@googlemail.com
>> > > >wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > I've built a schema for a proof of concept and it is all working fairly
>> > > > fine; naive maybe, but fine.
>> > > > However I think we might run into trouble in the future if we ever use
>> > > > facets.
>> > > >
>> > > > The data models train destination city routes from an origin city:
>> > > > Doc:City
>> > > >    Name: cityname [unique key]
>> > > >    CityType: city type values [nine possible values so good for faceting]
>> > > >    ... [other city attributes which relate directly to the doc unique key]
>> > > > all have limited vocab so good for faceting
>> > > >    FareJanStandard: cheapest standard fare in January (float value)
>> > > >    FareJanFirst: cheapest first class fare in January (float value)
>> > > >    FareFebStandard: cheapest standard fare in February (float value)
>> > > >    FareFebFirst: cheapest first fare in February (float value)
>> > > >    ... etc
>> > > >
>> > > > The question is how would I best facet fare price? The desire is to return
>> > > >
>> > > > number of cities with Jan prices in a set of ranges
>> > > > etc
>> > > > number of cities with first prices in a set of ranges
>> > > > etc
>> > > >
>> > > > install is 1.4.1 running in WebLogic
>> > > >
>> > > > Any ideas ?
>> > > >
>> > > >
>> > > >
>> > > > Lee C
>> >

RE: entire farm fails at the same time with OOM issues

2010-12-01 Thread Robert Petersen
It has typically been when query traffic was lowest!  We are at 12 GB heap, so 
I will try to bump it to 14 GB.  We have 64GB main memory installed now.  Here 
are our settings; do these look OK?

export JAVA_OPTS="-Xmx12228m -Xms12228m -XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode"



-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, November 30, 2010 6:44 PM
To: solr-user@lucene.apache.org
Subject: Re: entire farm fails at the same time with OOM issues

On Tue, Nov 30, 2010 at 6:04 PM, Robert Petersen  wrote:
> My question is this.  Why in the world would all of my slaves, after
> running fine for some days, suddenly all at the exact same minute
> experience OOM heap errors and go dead?

If there is no change in query traffic when this happens, then it's
due to what the index looks like.

My guess is a large index merge happened, which means that when the
searchers re-open on the new index, it requires more memory than
normal (much less can be shared with the previous index).

I'd try bumping the heap a little bit, and then optimizing once a day
during off-peak hours.
If you still get OOM errors, bump the heap a little more.
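A minimal sketch of such an off-peak optimize, assuming the stock XML
update handler on the default port (adjust host and core to your setup):

    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
        --data-binary '<optimize/>'

Run it from cron during a quiet hour, so the post-optimize searcher warms
while traffic is low.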

-Yonik
http://www.lucidimagination.com


Re: distributed architecture

2010-12-01 Thread Cinquini, Luca (3880)
Hi,
thanks all, this has been very instructive. It looks like in the short 
term using a combination of replication and sharding, based on Upayavira's 
setup, might be the safest thing to do, while in the longer term following the 
zookeeper integration and solandra development might provide a more dynamic 
environment and perhaps easier setup.
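For readers who want to try the hardwired-shards part of that setup, a
minimal solrconfig.xml sketch (the handler name, hosts, ports and core
names are purely illustrative):

    <requestHandler name="/distributed" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="shards">shard1-vip:8983/solr/core0,shard2-vip:8983/solr/core1</str>
      </lst>
    </requestHandler>

Queries sent to /distributed then fan out to both shard addresses and
merge the results.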
Please keep the good suggestions coming if you feel like it.
thanks again,
Luca

On Dec 1, 2010, at 4:17 AM, Peter Karich wrote:

>  Hi,
> 
> also take a look at solandra:
> 
> https://github.com/tjake/Lucandra/tree/solandra
> 
> I don't have it in prod yet but regarding administration overhead it 
> looks very promising.
> And you'll get some other neat features, like (soft) real time, for free. 
> So it's the same as A) + C) + X) - Y) ;-)
> 
> Regards,
> Peter.
> 
> 
>> Hi,
>>  I'd like to know if anybody has suggestions/opinions on what is 
>> currently the best architecture for a distributed search system using Solr. 
>> The use case is that of a system composed
>> of N indexes, each hosted on a separate machine, each index containing 
>> unique content.
>> 
>> Options that I know of are:
>> 
>> A) Using Solr distributed search
>> B) Using Solr + Zookeeper integration
>> C) Using replication, i.e. each node replicates all the others
>> 
>> It seems like options A) and B) would suffer from a fault-tolerance 
>> standpoint: if any of the nodes goes down, the search won't -at this time- 
>> return partial results, but instead report an exception.
>> Option C) would provide fault tolerance, at least for any search initiated 
>> at a node that is available, but would incur into a large replication 
>> overhead.
>> 
>> Did I get any of the above wrong, or does somebody have some insight on what 
>> is the best system architecture for this use case ?
>> 
>> thanks in advance,
>> Luca
> 
> 
> -- 
> http://jetwick.com twitter search prototype
> 



Re: Good example of multiple tokenizers for a single field

2010-12-01 Thread Jacob Elder
On Wed, Dec 1, 2010 at 11:01 AM, Robert Muir  wrote:

> (Jonathan, I apologize for emailing you twice, I meant to hit reply-all)
>
> On Wed, Dec 1, 2010 at 10:49 AM, Jonathan Rochkind 
> wrote:
> >
> > Wait, standardtokenizer already handles CJK and will put each CJK char
> into
> > it's own token?  Really? I had no idea!  Is that documented anywhere, or
> you
> > just have to look at the source to see it?
> >
>
> Yes, you are right, the documentation should have been more explicit:
> in previous releases it doesn't say anything about how it tokenizes
> CJK. But it does tokenize them this way, tagging them with the "CJ"
> token type.
>
> I think the documentation issue is "fixed" in branch_3x and trunk:
>
>  * As of Lucene version 3.1, this class implements the Word Break rules
> from the
>  * Unicode Text Segmentation algorithm, as specified in
>  * Unicode Standard Annex #29 (http://unicode.org/reports/tr29/).
> (from
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java
> )
>
> So you can read the UAX#29 report and then you know how it tokenizes text.
> You can also just use this demo app to see how the new one works:
> http://unicode.org/cldr/utility/breaks.jsp (choose "Word")
>

What does this mean to those of us on Solr 1.4 and Lucene 2.9.3? Does the
current stable StandardTokenizer handle CJK?

-- 
Jacob Elder
@jelder
(646) 535-3379


Re: Good example of multiple tokenizers for a single field

2010-12-01 Thread Robert Muir
On Wed, Dec 1, 2010 at 12:25 PM, Jacob Elder  wrote:
>
> What does this mean to those of us on Solr 1.4 and Lucene 2.9.3? Does the
> current stable StandardTokenizer handle CJK?
>

yes
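For anyone who wants to verify this against their own data, a minimal
schema.xml field type using the standard tokenizer (the type name is
illustrative); its output can be inspected on the analysis admin page:

    <fieldType name="text_std" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>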


Re: entire farm fails at the same time with OOM issues

2010-12-01 Thread Ken Krugler


On Nov 30, 2010, at 5:16pm, Robert Petersen wrote:


What would I do with the heap dump though?  Run one of those java heap
analyzers looking for memory leaks or something?  I have no experience
with those. I saw there was a bug fix in Solr 1.4.1 for a 100-byte memory
leak occurring on each commit, but it would take thousands of commits to
make that add up to anything, right?


Typically when I run out of memory in Solr, it's during an index
update, when the new index searcher is getting warmed up.

Looking at the heap often shows ways to reduce memory requirements,
e.g. you'll see a really big chunk used for a sorted field.

See http://wiki.apache.org/solr/SolrCaching and
http://wiki.apache.org/solr/SolrPerformanceFactors for more details.
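A minimal sketch of the launch settings being discussed, assuming Tomcat
and an illustrative dump path:

    export CATALINA_OPTS="$CATALINA_OPTS \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:HeapDumpPath=/var/tmp/solr-dumps"

The resulting .hprof file can then be opened in a heap analyzer such as
Eclipse MAT or jhat.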


-- Ken




-Original Message-
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Tuesday, November 30, 2010 3:12 PM
To: solr-user@lucene.apache.org
Subject: Re: entire farm fails at the same time with OOM issues

Hi Robert,

I'd recommend launching Tomcat with -XX:+HeapDumpOnOutOfMemoryError
and -XX:HeapDumpPath=, so then
you have something to look at versus a Gedankenexperiment :)

-- Ken

On Nov 30, 2010, at 3:04pm, Robert Petersen wrote:


Greetings, we are running one master and four slaves of our multicore
solr setup.  We just served searches for our catalog of 8 million
products with this farm during black Friday and cyber Monday, our
busiest days of the year, and the servers did not break a sweat!  Index
size is about 28GB.

However, twice now recently during a time of low load we have had a fire
drill where I have seen tomcat/solr fail and become unresponsive after
some OOM heap errors.  Solr wouldn't even serve up its admin pages.
I've had to go in and manually knock tomcat out of memory and then
restart it.  These solr slaves are load balanced and the load balancers
always probe the solr slaves so if they stop serving up searches they
are automatically removed from the load balancer.  When all four fail at
the same time we have an issue!

My question is this.  Why in the world would all of my slaves, after
running fine for some days, suddenly all at the exact same minute
experience OOM heap errors and go dead?  The load balancer kicks them
all out at the same time each time.  Each slave only talks to the master
and not to each other, but the master shows no errors in the logs at all.
Something must be triggering this though.  The only other odd thing I
saw in the logs was after the first OOM errors were recorded, the slaves
started occasionally not being able to get to the master.

This behavior makes me a little nervous...=:-o  eek!

Environment:  Lucid Imagination distro of Solr 1.4 on Tomcat

Platform: RHEL with Sun JRE 1.6.0_18 on dual quad xeon machines with
64GB memory etc etc






--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g







Re: Solr 3x segments file and deleting index

2010-12-01 Thread Shawn Heisey

On 12/1/2010 10:12 AM, Burton-West, Tom wrote:

If I want to delete an entire index and start over, in previous versions of 
Solr, you could stop Solr, delete all files in the index directory and restart 
Solr.  Solr would then create empty segments files and you could start 
indexing.   In Solr 3x if I delete all the files in the index  directory I get 
a large stack trace with this error:


You have to delete the index directory entirely.  This looks like a 
change in Lucene, not Solr specifically.  If the directory exists, but 
has nothing in it, it throws an exception.  I'll leave the rationale 
question that you also asked to someone who might actually know.  I 
personally think it shouldn't behave this way, but the dev team may have 
encountered something that required that the directory either be a valid 
index or not exist at all.
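A minimal sketch of the workaround, with the data path purely illustrative:

    # stop Solr first, then remove the whole index directory,
    # not just its contents
    rm -rf /var/solr/data/index
    # on restart, Solr creates a fresh, empty index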


Shawn



Re: schema design for related fields

2010-12-01 Thread Geert-Jan Brits
Ok longer answer than anticipated (and good conceptual practice ;-)

Yeah I believe that would work if I understand correctly that:

'in Jan [9]
in feb [10]
in march [1]'

has nothing to do with pricing, but only with availability?

If so you could separate it out as two separate issues:

1. ) showing pricing (based on context)
2. ) showing availabilities (based on context)

For 1.)  you get 39 pricefields ([jan,feb,..,dec,dc] * [standard,first,dc])
note: 'dc' indicates 'don't care'.

depending on the context you query the correct pricefield to populate the
price facet-values.
for discussion let's call the fields: _p[fare][date].
In other words the price field for no preference at all would become: _pdcdc


For 2.) define a multivalued field 'FaresPerDate' which indicates
availability, which is used to display:

A)
Standard fares [10]
First fares [3]

B)
in Jan [9]
in feb [10]
in march [1]

A) depends on your selection (or not caring) about a month
B) vice versa depends on your selection (or not caring) about a fare type

given all possible date values: [jan,feb,..dec,dontcare]
given all possible fare values:[standard,first,dontcare]

FaresPerDate consists of multiple values per document where each value
indicates the availability of a combination of 'fare' and 'date':
(standardJan,firstJan,DCjan,...,standardDec,firstDec,DCdec,standardDC,firstDC,DCDC)
Note that the nr of possible values = 39.

Example:
1. ) the user hasn't selected any preference:

q=*:*&facet.field=FaresPerDate&facet.query=_pdcdc:[0 TO
20]&facet.query=_pdcdc:[20 TO 40], etc.

in the client you have to make sure to select the correct values of
'FaresPerDate' for display:
in this case:

Standard fares [10] --> FaresPerDate.standardDC
First fares [3] --> FaresPerDate.firstDC

in Jan [9] -> FaresPerDate.DCJan
in feb [10] -> FaresPerDate.DCFeb
in march [1]-> FaresPerDate.DCMarch

2) the user has selected January
q=*:*&facet.field=FaresPerDate&fq=FaresPerDate:DCJan&facet.query=_pDCJan:[0
TO 20]&facet.query=_pDCJan:[20 TO 40]

Standard fares [10] --> FaresPerDate.standardJan
First fares [3] --> FaresPerDate.firstJan

in Jan [9] -> FaresPerDate.DCJan
in feb [10] -> FaresPerDate.DCFeb
in march [1]-> FaresPerDate.DCMarch

Hope that helps,
Geert-Jan


2010/12/1 lee carroll 

> Sorry Geert missed of the price value bit from the user interface so we'd
> display
>
> Facet price
> Standard fares [10]
> First fares [3]
>
> When traveling
> in Jan [9]
> in feb [10]
> in march [1]
>
> Fare Price
> 0 - 25 :  [20]
> 25 - 50: [10]
> 50 - 100 [2]
>
> cheers lee c
>
>
> On 1 December 2010 17:00, lee carroll 
> wrote:
>
> > Geert
> >
> > The UI would be something like:
> > user selections
> > for the facet price
> > max price: £100
> > fare class: any
> >
> > city attributes facet
> > cityattribute1 etc: xxx
> >
> > results displayed something like
> >
> > Facet price
> > Standard fares [10]
> > First fares [3]
> > in Jan [9]
> > in feb [10]
> > in march [1]
> > etc
> > is this compatible with your approach ?
> >
> > Erick the price is an interval scale ie a fare can be any value (not
> high,
> > low, medium etc)
> >
> > How sensible would the following approach be
> > index city docs with fields only related to the city unique key
> > in the same index also index fare docs which would be something like:
> > Fare:
> > cityID: xxx
> > Fareclass:standard
> > FareMonth: Jan
> > FarePrice: 100
> >
> > the query would be something like:
> > q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
> > returning facets for FareClass and FareMonth. hold on this will not facet
> > city docs correctly. sorry thasts not going to work.
> >
> >
> >
> >
> >
> >
> >
> >
> > On 1 December 2010 16:25, Erick Erickson 
> wrote:
> >
> >> Hmmm, that's getting to be a pretty clunky query sure enough. Now you're
> >> going to
> >> have to insure that HTTP request that long get through and stuff like
> >> that
> >>
> >> I'm reaching a bit here, but you can facet on a tokenized field.
> Although
> >> that's not
> >> often done there's no prohibition against it.
> >>
> >> So, what if you had just one field for each city that contained some
> >> abstract
> >> information about your fares etc. Something like
> >> janstdfareclass1 jancheapfareclass3 febstdfareclass6
> >>
> >> Now just facet on that field? Not #values# in that field, just the field
> >> itself. You'd then have to make those into human-readable text, but that
> >> would considerably simplify your query. Probably only works if your user
> >> is
> >> selecting from pre-defined ranges, if they expect to put in arbitrary
> >> ranges
> >> this scheme probably wouldn't work...
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Dec 1, 2010 at 10:22 AM, lee carroll
> >> wrote:
> >>
> >> > Hi Erick,
> >> > so if i understand you we could do something like:
> >> >
> >> > if Jan is selected in the user interface and we have 10 price ranges
> >> >
> >> > query would be 20 clauses in the query (10 * 2 fare classes)
> >> >
> >> > if first is selec

Re: schema design for related fields

2010-12-01 Thread Geert-Jan Brits
Also, filtering and sorting on price can be done as well. Just be sure to
use the correct price field.
Geert-Jan

2010/12/1 Geert-Jan Brits 

> Ok longer answer than anticipated (and good conceptual practice ;-)
>
> Yeah I belief that would work if I understand correctly that:
>
> 'in Jan [9]
> in feb [10]
> in march [1]'
>
> has nothing to do with pricing, but only with availability?
>
> If so you could seperate it out as two seperate issues:
>
> 1. ) showing pricing (based on context)
> 2. ) showing availabilities (based on context)
>
> For 1.)  you get 39 pricefields ([jan,feb,..,dec,dc] *
> [standard,first,dc])
> note: 'dc' indicates 'don't care.
>
> depending on the context you query the correct pricefield to populate the
> price facet-values.
> for discussion lets call the fields: _p[fare][date].
> IN other words the price field for no preference at all would become:
> _pdcdc
>
>
> For 2.) define a multivalued field 'FaresPerDate 'which indicate
> availability, which is used to display:
>
> A)
> Standard fares [10]
> First fares [3]
>
> B)
> in Jan [9]
> in feb [10]
> in march [1]
>
> A) depends on your selection (or dont caring) about a month
> B) vice versa depends on your selection (or dont caring)  about a fare type
>
> given all possible date values: [jan,feb,..dec,dontcare]
> given all possible fare values:[standard,first,dontcare]
>
> FaresPerDate consists of multiple values per document where each value
> indicates the availability of a combination of 'fare' and 'date':
>
> (standardJan,firstJan,DCjan...,standardJan,firstDec,DCdec,standardDC,firstDC,DCDC)
> Note that the nr of possible values = 39.
>
> Example:
> 1. ) the user hasn't selected any preference:
>
> q=*:*&facet.field=FaresPerDate&facet.query=_pdcdc:[0 TO
> 20]&facet.query=_pdcdc:[20 TO 40], etc.
>
> in the client you have to make sure to select the correct values of
> 'FaresPerDate' for display:
> in this case:
>
> Standard fares [10] --> FaresPerDate.standardDC
> First fares [3] --> FaresPerDate.firstDC
>
> in Jan [9] -> FaresPerDate.DCJan
> in feb [10] -> FaresPerDate.DCFeb
> in march [1]-> FaresPerDate.DCMarch
>
> 2) the user has selected January
> q=*:*&facet.field=FaresPerDate&fq=FaresPerDate:DCJan&facet.query=_pDCJan:[0
> TO 20]&facet.query=_pDCJan:[20 TO 40]
>
> Standard fares [10] --> FaresPerDate.standardJan
> First fares [3] --> FaresPerDate.firstJan
>
> in Jan [9] -> FaresPerDate.DCJan
> in feb [10] -> FaresPerDate.DCFeb
> in march [1]-> FaresPerDate.DCMarch
>
> Hope that helps,
> Geert-Jan
>
>
> 2010/12/1 lee carroll 
>
> Sorry Geert missed of the price value bit from the user interface so we'd
>> display
>>
>> Facet price
>> Standard fares [10]
>> First fares [3]
>>
>> When traveling
>> in Jan [9]
>> in feb [10]
>> in march [1]
>>
>> Fare Price
>> 0 - 25 :  [20]
>> 25 - 50: [10]
>> 50 - 100 [2]
>>
>> cheers lee c
>>
>>
>> On 1 December 2010 17:00, lee carroll 
>> wrote:
>>
>> > Geert
>> >
>> > The UI would be something like:
>> > user selections
>> > for the facet price
>> > max price: £100
>> > fare class: any
>> >
>> > city attributes facet
>> > cityattribute1 etc: xxx
>> >
>> > results displayed something like
>> >
>> > Facet price
>> > Standard fares [10]
>> > First fares [3]
>> > in Jan [9]
>> > in feb [10]
>> > in march [1]
>> > etc
>> > is this compatible with your approach ?
>> >
>> > Erick the price is an interval scale ie a fare can be any value (not
>> high,
>> > low, medium etc)
>> >
>> > How sensible would the following approach be
>> > index city docs with fields only related to the city unique key
>> > in the same index also index fare docs which would be something like:
>> > Fare:
>> > cityID: xxx
>> > Fareclass:standard
>> > FareMonth: Jan
>> > FarePrice: 100
>> >
>> > the query would be something like:
>> > q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
>> > returning facets for FareClass and FareMonth. hold on this will not
>> facet
>> > city docs correctly. sorry thasts not going to work.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On 1 December 2010 16:25, Erick Erickson 
>> wrote:
>> >
>> >> Hmmm, that's getting to be a pretty clunky query sure enough. Now
>> you're
>> >> going to
>> >> have to insure that HTTP request that long get through and stuff like
>> >> that
>> >>
>> >> I'm reaching a bit here, but you can facet on a tokenized field.
>> Although
>> >> that's not
>> >> often done there's no prohibition against it.
>> >>
>> >> So, what if you had just one field for each city that contained some
>> >> abstract
>> >> information about your fares etc. Something like
>> >> janstdfareclass1 jancheapfareclass3 febstdfareclass6
>> >>
>> >> Now just facet on that field? Not #values# in that field, just the
>> field
>> >> itself. You'd then have to make those into human-readable text, but
>> that
>> >> would considerably simplify your query. Probably only works if your
>> user
>> >> is
>> >> selecting from pre-defined ranges, if they expect to put in arbitrary
>> >> ranges
>> >> this 

Re: schema design for related fields

2010-12-01 Thread lee carroll
Hi Geert,

Ok I think I follow. The magic is in the multi-valued field.

The only danger would be complexity if we allow users to multi-select
months/prices/fare classes. For example they can search for first prices in
Jan, April and November. I think what you describe is possible in this case,
just complicated. I'll see if I can hack some facets into the prototype
tomorrow. Thanks for your help

Lee C

On 1 December 2010 17:57, Geert-Jan Brits  wrote:

> Ok longer answer than anticipated (and good conceptual practice ;-)
>
> Yeah I belief that would work if I understand correctly that:
>
> 'in Jan [9]
> in feb [10]
> in march [1]'
>
> has nothing to do with pricing, but only with availability?
>
> If so you could seperate it out as two seperate issues:
>
> 1. ) showing pricing (based on context)
> 2. ) showing availabilities (based on context)
>
> For 1.)  you get 39 pricefields ([jan,feb,..,dec,dc] * [standard,first,dc])
> note: 'dc' indicates 'don't care.
>
> depending on the context you query the correct pricefield to populate the
> price facet-values.
> for discussion lets call the fields: _p[fare][date].
> IN other words the price field for no preference at all would become:
> _pdcdc
>
>
> For 2.) define a multivalued field 'FaresPerDate 'which indicate
> availability, which is used to display:
>
> A)
> Standard fares [10]
> First fares [3]
>
> B)
> in Jan [9]
> in feb [10]
> in march [1]
>
> A) depends on your selection (or dont caring) about a month
> B) vice versa depends on your selection (or dont caring)  about a fare type
>
> given all possible date values: [jan,feb,..dec,dontcare]
> given all possible fare values:[standard,first,dontcare]
>
> FaresPerDate consists of multiple values per document where each value
> indicates the availability of a combination of 'fare' and 'date':
>
> (standardJan,firstJan,DCjan...,standardJan,firstDec,DCdec,standardDC,firstDC,DCDC)
> Note that the nr of possible values = 39.
>
> Example:
> 1. ) the user hasn't selected any preference:
>
> q=*:*&facet.field=FaresPerDate&facet.query=_pdcdc:[0 TO
> 20]&facet.query=_pdcdc:[20 TO 40], etc.
>
> in the client you have to make sure to select the correct values of
> 'FaresPerDate' for display:
> in this case:
>
> Standard fares [10] --> FaresPerDate.standardDC
> First fares [3] --> FaresPerDate.firstDC
>
> in Jan [9] -> FaresPerDate.DCJan
> in feb [10] -> FaresPerDate.DCFeb
> in march [1]-> FaresPerDate.DCMarch
>
> 2) the user has selected January
> q=*:*&facet.field=FaresPerDate&fq=FaresPerDate:DCJan&facet.query=_pDCJan:[0
> TO 20]&facet.query=_pDCJan:[20 TO 40]
>
> Standard fares [10] --> FaresPerDate.standardJan
> First fares [3] --> FaresPerDate.firstJan
>
> in Jan [9] -> FaresPerDate.DCJan
> in feb [10] -> FaresPerDate.DCFeb
> in march [1]-> FaresPerDate.DCMarch
>
> Hope that helps,
> Geert-Jan
>
>
> 2010/12/1 lee carroll 
>
> > Sorry Geert missed of the price value bit from the user interface so we'd
> > display
> >
> > Facet price
> > Standard fares [10]
> > First fares [3]
> >
> > When traveling
> > in Jan [9]
> > in feb [10]
> > in march [1]
> >
> > Fare Price
> > 0 - 25 :  [20]
> > 25 - 50: [10]
> > 50 - 100 [2]
> >
> > cheers lee c
> >
> >
> > On 1 December 2010 17:00, lee carroll 
> > wrote:
> >
> > > Geert
> > >
> > > The UI would be something like:
> > > user selections
> > > for the facet price
> > > max price: £100
> > > fare class: any
> > >
> > > city attributes facet
> > > cityattribute1 etc: xxx
> > >
> > > results displayed something like
> > >
> > > Facet price
> > > Standard fares [10]
> > > First fares [3]
> > > in Jan [9]
> > > in feb [10]
> > > in march [1]
> > > etc
> > > is this compatible with your approach ?
> > >
> > > Erick the price is an interval scale ie a fare can be any value (not
> > high,
> > > low, medium etc)
> > >
> > > How sensible would the following approach be
> > > index city docs with fields only related to the city unique key
> > > in the same index also index fare docs which would be something like:
> > > Fare:
> > > cityID: xxx
> > > Fareclass:standard
> > > FareMonth: Jan
> > > FarePrice: 100
> > >
> > > the query would be something like:
> > > q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
> > > returning facets for FareClass and FareMonth. hold on this will not
> facet
> > > city docs correctly. sorry thasts not going to work.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 1 December 2010 16:25, Erick Erickson 
> > wrote:
> > >
> > >> Hmmm, that's getting to be a pretty clunky query sure enough. Now
> you're
> > >> going to
> > >> have to insure that HTTP request that long get through and stuff like
> > >> that
> > >>
> > >> I'm reaching a bit here, but you can facet on a tokenized field.
> > Although
> > >> that's not
> > >> often done there's no prohibition against it.
> > >>
> > >> So, what if you had just one field for each city that contained some
> > >> abstract
> > >> information about your fares etc. Something like
> > >> janstdfareclass1 j

ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Burton-West, Tom
We are using a recent Solr 3.x (See below for exact version).

We have set the ramBufferSizeMB to 320 in both the indexDefaults and the 
mainIndex sections of our solrconfig.xml:

<ramBufferSizeMB>320</ramBufferSizeMB>
<mergeFactor>20</mergeFactor>

We expected that this would mean that the index would not write to disk until 
it reached approximately 300MB or more in size.
However, we see many small segments that look to be around 80MB in size.

We have not yet issued a single commit so nothing else should force a write to 
disk.

With a merge factor of 20 we also expected to see larger segments somewhere 
around 320 * 20 = 6GB in size, however we see several around 1GB.

We understand that the sizes are approximate, but these seem nowhere near what 
we expected.

Can anyone explain what is going on?

BTW
maxBufferedDocs is commented out, so this should not be affecting the buffer 
flushes



Solr Specification Version: 3.0.0.2010.11.19.16.00.54
Solr Implementation Version: 3.1-SNAPSHOT 1036094 - root - 2010-11-19 16:00:54
Lucene Specification Version: 3.1-SNAPSHOT
Lucene Implementation Version: 3.1-SNAPSHOT 1036094 - 2010-11-19 16:01:10

Tom Burton-West



Re: how to set maxFieldLength to unlimited

2010-12-01 Thread jan.kurella
I don't know about upload limitations, but for sure there are some in
the default settings; this could explain the limit of 20MB. Which
upload mechanism on the Solr side do you use? I guess this is not a
Lucene problem but rather the HTTP layer of Solr.
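If the 20MB cap does come from Solr's multipart upload limit (an
assumption worth checking), it is configured in solrconfig.xml; a sketch
with an illustrative 200MB cap:

    <requestDispatcher handleSelect="true">
      <requestParsers enableRemoteStreaming="false"
                      multipartUploadLimitInKB="204800" />
    </requestDispatcher>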

If you manage to stream your PDF and start parsing it on the stream,
you should then go for the filter that sets the positionIncrement to
0, as mentioned.

What we did once for PDF files: we parsed them beforehand into plain
text and indexed that (but we were using Lucene directly) with a
stream reader.
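A minimal Lucene sketch of the position-flattening filter described above
(untested; the class name is made up, and the API is the attribute-based
TokenStream API of Lucene 2.9/3.x):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    // Sets the position increment of every token after the first to 0,
    // "stacking" all tokens on a single position.
    public final class FlattenPositionsFilter extends TokenFilter {
      private final PositionIncrementAttribute posIncrAtt =
          addAttribute(PositionIncrementAttribute.class);
      private boolean first = true;

      public FlattenPositionsFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false;
        }
        if (first) {
          first = false;  // leave the first token's increment unchanged
        } else {
          posIncrAtt.setPositionIncrement(0);  // stack on the first position
        }
        return true;
      }

      @Override
      public void reset() throws IOException {
        super.reset();
        first = true;
      }
    }

Note that phrase queries against such a field will behave strangely, since
every term appears to sit at the same position.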


Regards, Jan

On 01.12.2010 at 18:13, "ext Ma, Xiaohui (NIH/NLM/LHC) [C]"
wrote:

> Thanks so much for your reply, Jan. I just found I cannot index pdf
> files with a file size of more than 20MB.
>
> I use curl to index them, and didn't get any error either. Do you have
> any suggestions for indexing pdf files of more than 20MB?
>
> Thanks,
> Xiaohui
>
> -Original Message-
> From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com]
> Sent: Wednesday, December 01, 2010 11:30 AM
> To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; 
> solr-user-...@lucene.apache.org
> Subject: RE: how to set maxFieldLength to unlimited
>
> You just can't set it to "unlimited". What you could do is ignore
> the positions and put a filter in that sets the position increment for
> all but the first token to 0 (meaning the field length will be just 1,
> with all tokens "stacked" on the first position).
> You could also break per page, so you put each "page" on a new
> position.
>
> Jan
>
>> -Original Message-
>> From: ext Ma, Xiaohui (NIH/NLM/LHC) [C]  
>> [mailto:xiao...@mail.nlm.nih.gov]
>> Sent: Dienstag, 30. November 2010 19:49
>> To: solr-user@lucene.apache.org; 'solr-user- 
>> i...@lucene.apache.org'; 'solr-user-...@lucene.apache.org'
>> Subject: how to set maxFieldLength to unlimited
>>
>> I need to index and search some pdf files which are very big (around  
>> 1000 pages each). How can I set maxFieldLength to unlimited?
>>
>> Thanks so much for your help in advance,
>> Xiaohui


Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Michael McCandless
The ram efficiency (= size of segment once flushed divided by size of
RAM buffer) can vary drastically.

Because the in-RAM data structures must be "growable" (to append new
docs to the postings as they are encountered), the efficiency is never
100%.  I think 50% is actually a "good" ram efficiency, and lower than
that (even down to 27%) I think is still normal.

Do you have many unique or low-doc-freq terms?  That brings the efficiency down.

If you turn on IndexWriter's infoStream and post the output we can see
if anything odd is going on...

80 * 20 = ~1.6 GB so I'm not sure why you're getting 1 GB segments.
Do you do any deletions in this run?  A merged segment size will often
be less than the sum of the parts, especially if there are many terms
but across segments these terms are shared but the infoStream will
also show what merges are taking place.
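In Solr 3.x the infoStream can be switched on from the indexDefaults
section of solrconfig.xml; a sketch, with the output file name illustrative:

    <indexDefaults>
      ...
      <infoStream file="INFOSTREAM.txt">true</infoStream>
    </indexDefaults>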

Mike

On Wed, Dec 1, 2010 at 2:13 PM, Burton-West, Tom  wrote:
> We are using a recent Solr 3.x (See below for exact version).
>
> We have set the ramBufferSizeMB to 320 in both the indexDefaults and the 
> mainIndex sections of our solrconfig.xml:
>
> <ramBufferSizeMB>320</ramBufferSizeMB>
> <mergeFactor>20</mergeFactor>
>
> We expected that this would mean that the index would not write to disk until 
> it reached somewhere approximately over 300MB in size.
> However, we see many small segments that look to be around 80MB in size.
>
> We have not yet issued a single commit so nothing else should force a write 
> to disk.
>
> With a merge factor of 20 we also expected to see larger segments somewhere 
> around 320 * 20 = 6GB in size, however we see several around 1GB.
>
> We understand that the sizes are approximate, but these seem nowhere near 
> what we expected.
>
> Can anyone explain what is going on?
>
> BTW
> maxBufferedDocs is commented out, so this should not be affecting the buffer 
> flushes
> 
>
>
> Solr Specification Version: 3.0.0.2010.11.19.16.00.54
> Solr Implementation Version: 3.1-SNAPSHOT 1036094 - root - 2010-11-19 16:00:54
> Lucene Specification Version: 3.1-SNAPSHOT
> Lucene Implementation Version: 3.1-SNAPSHOT 1036094 - 2010-11-19 16:01:10
>
> Tom Burton-West
>
>


Solr highlighting is double-quotes-aware?

2010-12-01 Thread Scott Gonyea
Not sure how to write that subject line.  I'm getting some weird behavior out 
of the highlighter in Solr.  It seems like an edge case, but I'm curious to 
hear if this is known about, or if it's something worth looking into further.

Background:

I'm using Solr's highlighting facility to tag words found in content crawled 
via Nutch. I split up the content based on those tags, and the result is later 
fed into a moderation process.

Sample Data (snippet from larger content):
[url=\"http://www.sampleurl.com/baffle_prices.html\"]baffle[/url]

(My "hl.simple.pre" is set to "TEST_KEYWORD_START" and my "hl.simple.post" is 
set to "TEST_KEYWORD_END")

Query for "baffle", and solr highlights it thus:

TEST_KEYWORD_STARTbaffle_prices.html\"]baffleTEST_KEYWORD_END

What should be happening is this:

TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END_prices.html\"]TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END


Is there something about this data that makes the highlighter not want to split 
it up? Do I have to have Solr tokenize the words by some character that I 
somehow excluded?
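One thing worth checking (an editorial sketch, assuming the field
currently uses a whitespace-only tokenizer): with whitespace tokenization,
[url=...]baffle[/url] stays a single token, so the highlighter can only
wrap the whole token. An index analyzer that also splits on punctuation,
for example:

    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>

would index "baffle" as its own token and let the highlighter wrap each
occurrence separately.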

Thank you,
Scott Gonyea

Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Shawn Heisey

On 12/1/2010 12:13 PM, Burton-West, Tom wrote:

We have set the ramBufferSizeMB to 320 in both the indexDefaults and the 
mainIndex sections of our solrconfig.xml:

<ramBufferSizeMB>320</ramBufferSizeMB>
<mergeFactor>20</mergeFactor>

We expected that this would mean that the index would not write to disk until 
it reached somewhere approximately over 300MB in size.
However, we see many small segments that look to be around 80MB in size.

We have not yet issued a single commit so nothing else should force a write to 
disk.

With a merge factor of 20 we also expected to see larger segments somewhere 
around 320 * 20 = 6GB in size, however we see several around 1GB.

We understand that the sizes are approximate, but these seem nowhere near what 
we expected.


I have seen this.  In Solr 1.4.1, the .fdt, .fdx, and the .tv* files are 
not written per segment, but all the other files are.  I can't remember 
whether it behaves the same under 3.1, or whether it also creates these 
files in each segment.


Here's the first segment created during a test reindex I just started, 
excluding the previously mentioned files, which will be prefixed by _57 
until I choose to optimize the index:


-rw-r--r-- 1 ncindex ncindex        315 Dec  1 12:40 _58.fnm
-rw-r--r-- 1 ncindex ncindex   26000115 Dec  1 12:40 _58.frq
-rw-r--r-- 1 ncindex ncindex     399124 Dec  1 12:40 _58.nrm
-rw-r--r-- 1 ncindex ncindex   23879227 Dec  1 12:40 _58.prx
-rw-r--r-- 1 ncindex ncindex     205874 Dec  1 12:40 _58.tii
-rw-r--r-- 1 ncindex ncindex   16000953 Dec  1 12:40 _58.tis

My ramBufferSize is 256MB, and those files add up to about 66MB.  My 
guess is that it takes  256MB of RAM to represent what condenses down to 
66MB on the disk.


When it had accumulated 16 segments, it merged them down to this, all 
the while continuing to index.  This is about 870MB:


-rw-r--r-- 1 ncindex ncindex        338 Dec  1 12:56 _5n.fnm
-rw-r--r-- 1 ncindex ncindex  376423659 Dec  1 12:58 _5n.frq
-rw-r--r-- 1 ncindex ncindex    5726860 Dec  1 12:58 _5n.nrm
-rw-r--r-- 1 ncindex ncindex  331890058 Dec  1 12:58 _5n.prx
-rw-r--r-- 1 ncindex ncindex    2037072 Dec  1 12:58 _5n.tii
-rw-r--r-- 1 ncindex ncindex  154470775 Dec  1 12:58 _5n.tis

If this merge were to happen 16 more times (256 segments created), it 
would then do a super-merge down to one very large segment.  In your 
case, with a mergeFactor of 20, that would take 400 segments.  I only 
ever saw this happen once - when I built a single index with all 49 
million documents in it.


Shawn



Re: schema design for related fields

2010-12-01 Thread Geert-Jan Brits
Indeed, selecting the best price for January OR April OR November and
sorting on it isn't possible with this solution (if that's what you mean).
However, any combination of selecting 1 month and/or 1 price-range and/or 1
fare-type IS possible.

2010/12/1 lee carroll 

> Hi Geert,
>
> Ok I think I follow. the magic is in the multi-valued field.
>
> The only danger would be complexity if we allow users to multi select
> months/prices/fare classes. For example they can search for first prices in
> jan, april and november. I think what you describe is possible in this case
> just complicated. I'll see if i can hack some facets into the proto type
> tommorrow. Thanks for your help
>
> Lee C
>
> On 1 December 2010 17:57, Geert-Jan Brits  wrote:
>
> > Ok longer answer than anticipated (and good conceptual practice ;-)
> >
> > Yeah I belief that would work if I understand correctly that:
> >
> > 'in Jan [9]
> > in feb [10]
> > in march [1]'
> >
> > has nothing to do with pricing, but only with availability?
> >
> > If so you could seperate it out as two seperate issues:
> >
> > 1. ) showing pricing (based on context)
> > 2. ) showing availabilities (based on context)
> >
> > For 1.)  you get 39 pricefields ([jan,feb,..,dec,dc] *
> [standard,first,dc])
> > note: 'dc' indicates 'don't care.
> >
> > depending on the context you query the correct pricefield to populate the
> > price facet-values.
> > for discussion lets call the fields: _p[fare][date].
> > IN other words the price field for no preference at all would become:
> > _pdcdc
> >
> >
> > For 2.) define a multivalued field 'FaresPerDate 'which indicate
> > availability, which is used to display:
> >
> > A)
> > Standard fares [10]
> > First fares [3]
> >
> > B)
> > in Jan [9]
> > in feb [10]
> > in march [1]
> >
> > A) depends on your selection (or dont caring) about a month
> > B) vice versa depends on your selection (or dont caring)  about a fare
> type
> >
> > given all possible date values: [jan,feb,..dec,dontcare]
> > given all possible fare values:[standard,first,dontcare]
> >
> > FaresPerDate consists of multiple values per document where each value
> > indicates the availability of a combination of 'fare' and 'date':
> >
> >
> (standardJan,firstJan,DCjan...,standardJan,firstDec,DCdec,standardDC,firstDC,DCDC)
> > Note that the nr of possible values = 39.
> >
> > Example:
> > 1. ) the user hasn't selected any preference:
> >
> > q=*:*&facet.field=FaresPerDate&facet.query=_pdcdc:[0 TO
> > 20]&facet.query=_pdcdc:[20 TO 40], etc.
> >
> > in the client you have to make sure to select the correct values of
> > 'FaresPerDate' for display:
> > in this case:
> >
> > Standard fares [10] --> FaresPerDate.standardDC
> > First fares [3] --> FaresPerDate.firstDC
> >
> > in Jan [9] -> FaresPerDate.DCJan
> > in feb [10] -> FaresPerDate.DCFeb
> > in march [1]-> FaresPerDate.DCMarch
> >
> > 2) the user has selected January
> >
> q=*:*&facet.field=FaresPerDate&fq=FaresPerDate:DCJan&facet.query=_pDCJan:[0
> > TO 20]&facet.query=_pDCJan:[20 TO 40]
> >
> > Standard fares [10] --> FaresPerDate.standardJan
> > First fares [3] --> FaresPerDate.firstJan
> >
> > in Jan [9] -> FaresPerDate.DCJan
> > in feb [10] -> FaresPerDate.DCFeb
> > in march [1]-> FaresPerDate.DCMarch
> >
> > Hope that helps,
> > Geert-Jan
> >
> >
> > 2010/12/1 lee carroll 
> >
> > > Sorry Geert missed of the price value bit from the user interface so
> we'd
> > > display
> > >
> > > Facet price
> > > Standard fares [10]
> > > First fares [3]
> > >
> > > When traveling
> > > in Jan [9]
> > > in feb [10]
> > > in march [1]
> > >
> > > Fare Price
> > > 0 - 25 :  [20]
> > > 25 - 50: [10]
> > > 50 - 100 [2]
> > >
> > > cheers lee c
> > >
> > >
> > > On 1 December 2010 17:00, lee carroll 
> > > wrote:
> > >
> > > > Geert
> > > >
> > > > The UI would be something like:
> > > > user selections
> > > > for the facet price
> > > > max price: £100
> > > > fare class: any
> > > >
> > > > city attributes facet
> > > > cityattribute1 etc: xxx
> > > >
> > > > results displayed something like
> > > >
> > > > Facet price
> > > > Standard fares [10]
> > > > First fares [3]
> > > > in Jan [9]
> > > > in feb [10]
> > > > in march [1]
> > > > etc
> > > > is this compatible with your approach ?
> > > >
> > > > Erick the price is an interval scale ie a fare can be any value (not
> > > high,
> > > > low, medium etc)
> > > >
> > > > How sensible would the following approach be
> > > > index city docs with fields only related to the city unique key
> > > > in the same index also index fare docs which would be something like:
> > > > Fare:
> > > > cityID: xxx
> > > > Fareclass:standard
> > > > FareMonth: Jan
> > > > FarePrice: 100
> > > >
> > > > the query would be something like:
> > > > q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
> > > > returning facets for FareClass and FareMonth. hold on this will not
> > facet
> > > > city docs correctly. sorry thasts not going to work.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >

RE: how to set maxFieldLength to unlimited

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much, Jan. I use curl to index pdf files. Is there another way to do it?

I changed the positionIncrement to 0, but I didn't get it to work either.

Thanks,
Xiaohui 

-Original Message-
From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com] 
Sent: Wednesday, December 01, 2010 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimited

I don't know about upload limitations, but for sure there are some in
the default settings; this could explain the limit of 20MB. Which
upload mechanism on the Solr side do you use? I guess this is not a
Lucene problem but rather the HTTP layer of Solr.

If you manage to stream your PDF and start parsing it on the stream,
you should then go for the filter that sets the positionIncrement to
0, as mentioned.

What we did once for PDF files: we parsed them beforehand into plain
text and indexed that (but we were using Lucene directly) with a
stream reader.


Regards, Jan

On 01.12.2010 at 18:13, "ext Ma, Xiaohui (NIH/NLM/LHC) [C]"
wrote:

> Thanks so much for your reply, Jan. I just found I cannot index pdf
> files with a file size of more than 20MB.
>
> I use curl to index them, and didn't get any error either. Do you have
> any suggestions for indexing pdf files of more than 20MB?
>
> Thanks,
> Xiaohui
>
> -Original Message-
> From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com]
> Sent: Wednesday, December 01, 2010 11:30 AM
> To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; 
> solr-user-...@lucene.apache.org
> Subject: RE: how to set maxFieldLength to unlimited
>
> You just can't set it to "unlimited". What you could do is ignore
> the positions and put a filter in that sets the position increment for
> all but the first token to 0 (meaning the field length will be just 1,
> with all tokens "stacked" on the first position).
> You could also break per page, so you put each "page" on a new
> position.
>
> Jan
>
>> -Original Message-
>> From: ext Ma, Xiaohui (NIH/NLM/LHC) [C]  
>> [mailto:xiao...@mail.nlm.nih.gov]
>> Sent: Dienstag, 30. November 2010 19:49
>> To: solr-user@lucene.apache.org; 'solr-user- 
>> i...@lucene.apache.org'; 'solr-user-...@lucene.apache.org'
>> Subject: how to set maxFieldLength to unlimited
>>
>> I need to index and search some pdf files which are very big (around  
>> 1000 pages each). How can I set maxFieldLength to unlimited?
>>
>> Thanks so much for your help in advance,
>> Xiaohui


RE: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Burton-West, Tom
Thanks Mike,

Yes we have many unique terms due to dirty OCR and 400 languages and probably 
lots of low doc freq terms as well (although with the ICUTokenizer and 
ICUFoldingFilter we should get fewer terms due to bad tokenization and 
normalization.)

Is this additional overhead because each unique term takes a certain amount of 
space compared to adding entries to a list for an existing term?

Does turning on IndexWriter's infoStream have a significant impact on memory use 
or indexing speed?  

If it does, I'll reproduce this on our test server rather than turning it on 
for a bit on the production indexer.  If it doesn't, I'll turn it on and post 
here.

Tom

-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com] 
Sent: Wednesday, December 01, 2010 2:43 PM
To: solr-user@lucene.apache.org
Subject: Re: ramBufferSizeMB not reflected in segment sizes in index

The ram efficiency (= size of segment once flushed divided by size of
RAM buffer) can vary drastically.

Because the in-RAM data structures must be "growable" (to append new
docs to the postings as they are encountered), the efficiency is never
100%.  I think 50% is actually a "good" ram efficiency, and lower than
that (even down to 27%) I think is still normal.

Do you have many unique or low-doc-freq terms?  That brings the efficiency down.

If you turn on IndexWriter's infoStream and post the output we can see
if anything odd is going on...

80 * 20 = ~1.6 GB so I'm not sure why you're getting 1 GB segments.
Do you do any deletions in this run?  A merged segment size will often
be less than the sum of the parts, especially if there are many terms
but across segments these terms are shared but the infoStream will
also show what merges are taking place.

Mike

On Wed, Dec 1, 2010 at 2:13 PM, Burton-West, Tom  wrote:
> We are using a recent Solr 3.x (See below for exact version).
>
> We have set the ramBufferSizeMB to 320 in both the indexDefaults and the 
> mainIndex sections of our solrconfig.xml:
>
> <ramBufferSizeMB>320</ramBufferSizeMB>
> <mergeFactor>20</mergeFactor>
>
> We expected that this would mean that the index would not write to disk until 
> it reached somewhere approximately over 300MB in size.
> However, we see many small segments that look to be around 80MB in size.
>
> We have not yet issued a single commit so nothing else should force a write 
> to disk.
>
> With a merge factor of 20 we also expected to see larger segments somewhere 
> around 320 * 20 = 6GB in size, however we see several around 1GB.
>
> We understand that the sizes are approximate, but these seem nowhere near 
> what we expected.
>
> Can anyone explain what is going on?
>
> BTW
> maxBufferedDocs is commented out, so this should not be affecting the buffer 
> flushes
> 
>
>
> Solr Specification Version: 3.0.0.2010.11.19.16.00.54
> Solr Implementation Version: 3.1-SNAPSHOT 1036094 - root - 2010-11-19 16:00:54
> Lucene Specification Version: 3.1-SNAPSHOT
> Lucene Implementation Version: 3.1-SNAPSHOT 1036094 - 2010-11-19 16:01:10
>
> Tom Burton-West
>
>


Re: entire farm fails at the same time with OOM issues

2010-12-01 Thread Peter Karich

Also try to minimize maxWarmingSearchers to 1(?) or 2.
And decrease cache usage (especially autowarming) if possible at all. 
But again: only if it doesn't affect performance ...
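A sketch of where those knobs live in solrconfig.xml (values illustrative):

    <maxWarmingSearchers>2</maxWarmingSearchers>
    <filterCache class="solr.LRUCache" size="512"
                 initialSize="512" autowarmCount="0"/>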


Regards,
Peter.


On Tue, Nov 30, 2010 at 6:04 PM, Robert Petersen  wrote:

My question is this.  Why in the world would all of my slaves, after
running fine for some days, suddenly all at the exact same minute
experience OOM heap errors and go dead?

If there is no change in query traffic when this happens, then it's
due to what the index looks like.

My guess is a large index merge happened, which means that when the
searchers re-open on the new index, it requires more memory than
normal (much less can be shared with the previous index).

I'd try bumping the heap a little bit, and then optimizing once a day
during off-peak hours.
If you still get OOM errors, bump the heap a little more.

-Yonik
http://www.lucidimagination.com




Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Michael McCandless
On Wed, Dec 1, 2010 at 3:16 PM, Burton-West, Tom  wrote:
> Thanks Mike,
>
> Yes we have many unique terms due to dirty OCR and 400 languages and probably 
> lots of low doc freq terms as well (although with the ICUTokenizer and 
> ICUFoldingFilter we should get fewer terms due to bad tokenization and 
> normalization.)

OK likely this explains the lowish RAM efficiency.

> Is this additional overhead because each unique term takes a certain amount 
> of space compared to adding entries to a list for an existing term?

Exactly.  There's a highish "startup cost" for each term, but then
appending docs/positions to that term is more efficient, especially for
higher-frequency terms.  In the limit, a single unique term across
all docs will have very high RAM efficiency...

> Does turning on IndexWriters infostream have a significant impact on memory 
> use or indexing speed?

I don't believe so

Mike


RE: entire farm fails at the same time with OOM issues

2010-12-01 Thread Robert Petersen
Good idea.  Our farm is behind Akamai so that should be ok to do.

-Original Message-
From: Peter Karich [mailto:peat...@yahoo.de] 
Sent: Wednesday, December 01, 2010 12:21 PM
To: solr-user@lucene.apache.org
Subject: Re: entire farm fails at the same time with OOM issues


  Also try to minimize maxWarmingSearchers to 1(?) or 2.
And decrease cache usage (especially autowarming) if possible at all. 
But again: only if it doesn't affect performance ...

Regards,
Peter.

> On Tue, Nov 30, 2010 at 6:04 PM, Robert Petersen
wrote:
>> My question is this.  Why in the world would all of my slaves, after
>> running fine for some days, suddenly all at the exact same minute
>> experience OOM heap errors and go dead?
> If there is no change in query traffic when this happens, then it's
> due to what the index looks like.
>
> My guess is a large index merge happened, which means that when the
> searchers re-open on the new index, it requires more memory than
> normal (much less can be shared with the previous index).
>
> I'd try bumping the heap a little bit, and then optimizing once a day
> during off-peak hours.
> If you still get OOM errors, bump the heap a little more.
>
> -Yonik
> http://www.lucidimagination.com



Re: Good example of multiple tokenizers for a single field

2010-12-01 Thread Jacob Elder
On Tue, Nov 30, 2010 at 10:07 AM, Robert Muir  wrote:

> On Tue, Nov 30, 2010 at 9:45 AM, Jacob Elder  wrote:
> > Right. CJK doesn't tend to have a lot of whitespace to begin with. In the
> > past, we were using a patched version of StandardTokenizer which treated
> > @twitteruser and #hashtag better, but this became a release engineering
> > nightmare so we switched to Whitespace.
>
> in this case, have you considered using a CharFilter (e.g.
> MappingCharFilter) before the tokenizer?
>
> This way you could map your special things such as @ and # to some
> other string that the tokenizer doesn't split on,
> e.g. # => "HASH_".
>
> then your #foobar goes to HASH_foobar.
> If you want searches of "#foobar" to only match "#foobar" and not also
> "foobar" itself, and vice versa, you are done.
> Maybe you want searches of #foobar to only match #foobar, but searches
> of "foobar" to match both "#foobar" and "foobar".
> In this case, you would probably use a worddelimiterfilter w/
> preserveOriginal at index-time only , followed by a StopFilter
> containing HASH, so you index HASH_foobar and foobar.
>
> anyway i think you have a lot of flexibility to reuse
> standardtokenizer but customize things like this without maintaining
> your own tokenizer, this is the purpose of CharFilters.
>

That worked brilliantly. Thank you very much, Robert.
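For reference, a minimal sketch of that setup in schema.xml plus the
mapping file (the type name, file name and the AT_ mapping are
illustrative):

    <fieldType name="text_social" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-social.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>

    # mapping-social.txt
    "#" => "HASH_"
    "@" => "AT_"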

-- 
Jacob Elder
@jelder
(646) 535-3379


spatial query parsing error: org.apache.lucene.queryParser.ParseException

2010-12-01 Thread Dennis Gearon
I am trying to get spatial search to work on my Solr installation. I am running 
version 1.4.1 with the Jayway Team spatial-solr-plugin. I am performing the 
search with the following url:

http://localhost:8080/solr/select?wt=json&indent=true&q=title:Art%20Loft{!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3}


The result that I get is the following error:

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 
'title:Art Loft{!spatial lat=37.326375 lng=-121.892639 radius=3 unit=km 
threadCount=3}': Encountered "  "lng=-121.892639 "" at line 1, 
column 38. Was expecting: "}"

Not sure why it would be complaining about the lng parameter in the query. I 
double-checked to make sure that I had the right name for the longitude field 
in 
my solrconfig.xml file.

Any help/suggestions would be greatly appreciated

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Re: Return Lucene DocId in Solr Results

2010-12-01 Thread Sasank Mudunuri
Take this with a sizeable grain of salt as I haven't actually tried doing
this. But you might try using an IndexReader which it looks like you can get
from this class:

http://lucene.apache.org/solr/api/org/apache/solr/core/StandardIndexReaderFactory.html
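For the key-to-docid part of the question below, a minimal Lucene 2.9-era
sketch (untested; class and field names are illustrative, and internal
docids are only valid for the reader they came from):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    public final class DocIdLookup {
      // Returns the internal Lucene docid for the given unique-key value,
      // or -1 if the key is not present in the index.
      public static int docIdFor(IndexReader reader, String keyField, String key)
          throws IOException {
        TermDocs td = reader.termDocs(new Term(keyField, key));
        try {
          return td.next() ? td.doc() : -1;
        } finally {
          td.close();
        }
      }
    }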

sasank

On Tue, Nov 30, 2010 at 6:45 AM, Lohrenz, Steven
wrote:

> Hmm, I found some similar queries on stackoverflow and they did not
> recommend exposing the lucene docId.
>
> So, I guess my question becomes: What is the best way, from within my
> custom QParser, to take a list of solr primary keys (that were retrieved
> from elsewhere) and turn them into docIds? I also saw something about
> caching them using a FieldCache - how would I do that?
>
> Thanks,
> Steve
>
> -Original Message-
> From: Lohrenz, Steven [mailto:steven.lohr...@hmhpub.com]
> Sent: 30 November 2010 11:57
> To: solr-user@lucene.apache.org
> Subject: Return Lucene DocId in Solr Results
>
> Hi,
>
> I was wondering how I would go about getting the lucene docid included in
> the results from a solr query?
>
> I've built a QueryParser to query another solr instance and and join the
> results of the two instances through the use of a Filter.  The Filter needs
> the lucene docid to work. This is the only bit I'm missing right now.
>
> Thanks,
> Steve
>
>


Re: Return Lucene DocId in Solr Results

2010-12-01 Thread Erick Erickson
On the face of it, this doesn't make sense, so perhaps you can explain a
bit. The doc IDs
from one Solr instance have no relation to the doc IDs from another Solr
instance. So anything
that uses doc IDs from one Solr instance to create a filter on another
instance doesn't seem
to be something you'd want to do...

Which may just mean I don't understand what you're trying to do. Can you
back up a bit
and describe the higher-level problem? This seems like it may be an XY
problem, see:
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven
wrote:

> Hi,
>
> I was wondering how I would go about getting the lucene docid included in
> the results from a solr query?
>
> I've built a QueryParser to query another solr instance and and join the
> results of the two instances through the use of a Filter.  The Filter needs
> the lucene docid to work. This is the only bit I'm missing right now.
>
> Thanks,
> Steve
>
>


Re: ArrayIndexOutOfBoundsException in sort

2010-12-01 Thread Jerry Li
Got it with thanks.

On Wed, Dec 1, 2010 at 8:02 PM, Ahmet Arslan  wrote:

> > It seems work fine again after I change "author" field type
> > from text to
> > string, could anybody give some info about it? very
> > appriciated.
>
>
> http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F
>
> And also see Erick's explanation
>
> http://search-lucene.com/m/7fnj1TtNde/sort+on+a+tokenized+field&subj=Re+Solr+sorting+problem
>
>
>
>


-- 

Best Regards.
Jerry. Li



Re: spatial query parsing error: org.apache.lucene.queryParser.ParseException

2010-12-01 Thread Jean-Sebastien Vachon

Try this...

http://localhost:8080/solr/select?wt=json&indent=true&q={!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft

- Original Message - 
From: "Dennis Gearon" 

To: 
Sent: Wednesday, December 01, 2010 7:51 PM
Subject: spatial query parsing error: 
org.apache.lucene.queryParser.ParseException



I am trying to get spatial search to work on my Solr installation. I am running
version 1.4.1 with the Jayway Team spatial-solr-plugin. I am performing the
search with the following url:

http://localhost:8080/solr/select?wt=json&indent=true&q=title:Art%20Loft{!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3}


The result that I get is the following error:

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse
'title:Art Loft{!spatial lat=37.326375 lng=-121.892639 radius=3 unit=km
threadCount=3}': Encountered "  "lng=-121.892639 "" at line 1,
column 38. Was expecting: "}"

Not sure why it would be complaining about the lng parameter in the query. I
double-checked to make sure that I had the right name for the longitude field in
my solrconfig.xml file.

Any help/suggestions would be greatly appreciated

Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better
idea to learn from others’ mistakes, so you do not have to make them 
yourself.

from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Re: Preventing index segment corruption when windows crashes

2010-12-01 Thread Lance Norskog
Is there any way that Windows 7 and disk drivers are not honoring the
fsync() calls? That would cause files and/or blocks to get saved out
of order.
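For reference, the CheckIndex -fix repair discussed in the quoted thread
below is run straight from the Lucene core jar; a sketch, with the jar and
index paths illustrative (-fix permanently drops unrecoverable segments,
so back the index up first):

    java -cp lucene-core-3.1-dev.jar \
        org.apache.lucene.index.CheckIndex /path/to/solr/data/index -fix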

On Tue, Nov 30, 2010 at 3:24 PM, Peter Sturge  wrote:
> After a recent Windows 7 crash (:-\), upon restart, Solr starts giving
> LockObtainFailedException errors: (excerpt)
>
>   30-Nov-2010 23:10:51 org.apache.solr.common.SolrException log
>   SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock
> obtain timed out:
> nativefsl...@solr\.\.\data0\index\lucene-ad25f73e3c87e6f192c4421756925f47-write.lock
>
>
> When I run CheckIndex, I get: (excerpt)
>
>  30 of 30: name=_2fi docCount=857
>    compound=false
>    hasProx=true
>    numFiles=8
>    size (MB)=0.769
>    diagnostics = {os.version=6.1, os=Windows 7, lucene.version=3.1-dev
> ${svnversion} - 2010-09-11 11:09:06, source=flush, os.arch=amd64, java.version=1.6.0_18,
> sion} - 2010-09-11 11:09:06, source=flush, os.arch=amd64, 
> java.version=1.6.0_18,
> java.vendor=Sun Microsystems Inc.}
>    no deletions
>    test: open reader.FAILED
>    WARNING: fixIndex() would remove reference to this segment; full exception:
> org.apache.lucene.index.CorruptIndexException: did not read all bytes from 
> file
> "_2fi.fnm": read 1 vs size 512
>        at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:367)
>        at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
>        at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:119)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:583)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:561)
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:467)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:878)
>
> WARNING: 1 broken segments (containing 857 documents) detected
>
>
> This seems to happen every time Windows 7 crashes, and it would seem
> extraordinary bad luck for this tiny test index to be in the middle of
> a commit every time.
> (it is set to commit every 40secs, but for such a small index it only
> takes millis to complete)
>
> Does this seem right? I don't remember seeing so many corruptions in
> the index - maybe it is the world of Win7 dodgy drivers, but it would
> be worth investigating if there's something amiss in Solr/Lucene when
> things go down unexpectedly...
>
> Thanks,
> Peter
>
>
> On Tue, Nov 30, 2010 at 9:19 AM, Peter Sturge  wrote:
>> The index itself isn't corrupt - just one of the segment files. This
>> means you can read the index (less the offending segment(s)), but once
>> this happens it's no longer possible to
>> access the documents that were in that segment (they're gone forever),
>> nor write/commit to the index (depending on the env/request, you get
>> 'Error reading from index file..' and/or WriteLockError)
>> (note that for my use case, documents are dynamically created so can't
>> be re-indexed).
>>
>> Restarting Solr fixes the write lock errors (an indirect environmental
>> symptom of the problem), and running CheckIndex -fix is the only way
>> I've found to repair the index so it can be written to (rewrites the
>> corrupted segment(s)).
>>
>> I guess I was wondering if there's a mechanism that would support
>> something akin to a transactional rollback for segments.
>>
>> Thanks,
>> Peter
>>
>>
>>
>> On Mon, Nov 29, 2010 at 5:33 PM, Yonik Seeley
>>  wrote:
>>> On Mon, Nov 29, 2010 at 10:46 AM, Peter Sturge  
>>> wrote:
 If a Solr index is running at the time of a system halt, this can
 often corrupt a segments file, requiring the index to be -fix'ed by
 rewriting the offending file.
>>>
>>> Really?  That shouldn't be possible (if you mean the index is truly
>>> corrupt - i.e. you can't open it).
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>
>



-- 
Lance Norskog
goks...@gmail.com
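
For reference, the CheckIndex -fix step Peter describes is run from the
command line against the index directory. A sketch - the jar version and the
paths are placeholders for whatever your Solr 1.4.x install actually ships:

    java -cp lucene-core-2.9.x.jar org.apache.lucene.index.CheckIndex \
        /path/to/solr/data/index -fix

Back up the index first: -fix repairs the index by dropping unreadable
segments, so the 857 documents in the broken segment above would be gone
for good.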


Re: spatial query parsing error: org.apache.lucene.queryParser.ParseException

2010-12-01 Thread Dennis Gearon
Thanks, Jean-Sebastien. I forwarded it to my partner. His membership is still
being held up.

I'll be the go-between until he has access.

 Dennis Gearon





- Original Message 
From: Jean-Sebastien Vachon 
To: solr-user@lucene.apache.org
Sent: Wed, December 1, 2010 7:12:20 PM
Subject: Re: spatial query parsing error:
org.apache.lucene.queryParser.ParseException

Try this...

http://localhost:8080/solr/select?wt=json&indent=true&q={!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft




Re: spatial query parsing error: org.apache.lucene.queryParser.ParseException

2010-12-01 Thread Jean-Sebastien Vachon
I just saw the parameter 'lng' in your query... I believe it should be
'long'. Give it a try if the link I sent you is not working.
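
For anyone hitting the same ParseException: the key difference in the working
URL above is that the {!spatial ...} local-params block leads the q parameter
instead of trailing it. A minimal SolrJ sketch of the corrected query - the
'lat'/'long' parameter names follow the suggestion above and depend on how
the plugin is configured in solrconfig.xml, so treat them as assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SpatialQueryExample {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8080/solr");

            // The {!spatial ...} prefix must come first in the query string.
            // Anything in front of it is handed to the default Lucene query
            // parser, which chokes on the unescaped local params - hence the
            // "Encountered ... at line 1, column 38" error.
            SolrQuery q = new SolrQuery(
                "{!spatial lat=37.326375 long=-121.892639 radius=3 unit=km "
                + "threadCount=3}title:Art Loft");
            QueryResponse rsp = server.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }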





best way to get maxDocs in java (i.e. as on stats.jsp page).

2010-12-01 Thread Will Milspec
hi all,

What's the best way to get the 'maxDoc' attribute programmatically, in Java
(as seen on the stats.jsp page)?

I don't see any hooks on the solrj api.

Currently I plan to use an HTTP client to get stats.jsp (which returns XML)
and parse it using XPath.

If anyone can recommend a better approach, please opine.

thanks

will


Re: spatial query parsing error: org.apache.lucene.queryParser.ParseException

2010-12-01 Thread Dennis Gearon
Forwarded to my partner, thanks, will let you know.

 Dennis Gearon








Re: best way to get maxDocs in java (i.e. as on stats.jsp page).

2010-12-01 Thread Koji Sekiguchi


Will,

Try:
http://localhost:8983/solr/admin/luke

LukeRequestHandler
http://wiki.apache.org/solr/LukeRequestHandler

Koji
--
http://www.rondhuit.com/en/
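
If you want to stay inside SolrJ rather than scraping XML by hand, the Luke
handler can also be queried with SolrJ's LukeRequest/LukeResponse helpers. A
minimal sketch, assuming those classes as shipped with SolrJ 1.4 (the exact
keys in the index section may vary by version):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.LukeRequest;
    import org.apache.solr.client.solrj.response.LukeResponse;

    public class MaxDocExample {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            LukeRequest luke = new LukeRequest(); // hits /admin/luke by default
            luke.setShowSchema(false);            // index-level stats only
            LukeResponse rsp = luke.process(server);

            // maxDoc and numDocs come back in the "index" section.
            System.out.println("maxDoc=" + rsp.getIndexInfo().get("maxDoc"));
            System.out.println("numDocs=" + rsp.getIndexInfo().get("numDocs"));
        }
    }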


problems with custom SolrCache.init() - fails on startup

2010-12-01 Thread Kevin Osborn
My project has a couple of custom caches that descend from FastLRUCache. These
worked fine in Solr 1.3. Then I started migrating my project to Solr 1.4.1 and
had problems during startup.

I believe the problem is that I attempt to access the core in the init
process. I currently use the deprecated SolrCore.getSolrCore(), but had the
same problem when attempting to use CoreContainer. During initialization I
need access to the IndexSchema object. I assume the problem is that startup
now creates objects in a different order.

Does anyone have any suggestions on how to get access to the core
infrastructure at startup of the caches?
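
One possible workaround for this kind of ordering problem is to move the core
lookup out of init() and defer it until the cache is first used, by which
point the core is fully constructed. An untested sketch - the class name and
the lazy getter are illustrative, not from any released code:

    import org.apache.solr.core.SolrCore;
    import org.apache.solr.schema.IndexSchema;
    import org.apache.solr.search.FastLRUCache;

    public class MyCustomCache extends FastLRUCache {
        private volatile IndexSchema schema;

        // Don't touch the core in init(); during startup the caches may be
        // constructed before the core exists. Resolve the schema lazily.
        protected IndexSchema getSchema() {
            if (schema == null) {
                // Deprecated, but still functional for single-core setups.
                schema = SolrCore.getSolrCore().getSchema();
            }
            return schema;
        }
    }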



Restrict access to localhost

2010-12-01 Thread Ganesh
Hello all,

1)
I want to restrict access to Solr to localhost only. How can I achieve that?

2)
If I want to allow clients to search but not to delete, how do I restrict
that access?

Any thoughts?

Regards
Ganesh.
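
For (1), if you are running the example Jetty that ships with Solr, one
option is to bind the connector to the loopback address in etc/jetty.xml.
A sketch for Jetty 6 (which the 1.4 example distribution uses) - adjust the
connector class and port to your own setup:

    <Call name="addConnector">
      <Arg>
        <New class="org.mortbay.jetty.nio.SelectChannelConnector">
          <!-- Bind to loopback only so nothing off-box can reach Solr -->
          <Set name="host">127.0.0.1</Set>
          <Set name="port">8983</Set>
        </New>
      </Arg>
    </Call>

For (2), a common pattern is to keep Solr loopback-only as above and put a
reverse proxy or your application tier in front of it, forwarding only the
/select handler so that /update (and therefore deletes) is never reachable
by clients.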