Re: How does one sort facet queries?

2010-02-19 Thread gwk

On 2/19/2010 2:15 AM, Kelly Taylor wrote:

All sorting of facets works great at the field level (count/index)...all good
there...but how is sorting accomplished with range queries? The solrj
response doesn't seem to maintain the order the queries are sent in, and the
order is not in index or count order. What's the trick?

http://localhost:8983/solr/select?q=someterm
   &rows=0
   &facet=true
   &facet.limit=-1
   &facet.query=price:[* TO 100]
   &facet.query=price:[100 TO 200]
   &facet.query=price:[200 TO 300]
   &facet.query=price:[300 TO 400]
   &facet.query=price:[400 TO 500]
   &facet.query=price:[500 TO 600]
   &facet.query=price:[600 TO 700]
   &facet.query=price:[700 TO *]
   &facet.mincount=1
   &collapse.field=dedupe_hash
   &collapse.threshold=1
   &collapse.type=normal
   &collapse.facet=before

   
The "trick" I use is to use LocalParams to give eacht facet query a well 
defined name. Afterwards you can loop through the names in whatever 
order you want.

so basically facet.query={!key=price_0}[* TO 100] etc.
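
In full, the request then contains something like this (the key names are arbitrary):

   &facet.query={!key=price_0}price:[* TO 100]
   &facet.query={!key=price_1}price:[100 TO 200]
   ...
   &facet.query={!key=price_7}price:[700 TO *]

and on the SolrJ side getFacetQuery() on the QueryResponse should then give you a 
Map keyed by price_0, price_1, ... so you can output the counts in whatever order 
you like.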

N.B. the facet queries in your example will cause some documents to be 
counted twice (e.g. when the price is exactly 100, 200 or 300), because the 
range endpoints overlap.


Regards,

gwk


Re: replications issue

2010-02-19 Thread giskard
Ciao,

Hmm, after some time a new index in data/index on the slave has been written,
roughly the same size as the master index.

The configuration on both master and slave is the same as the one on the
SolrReplication wiki page, under "enable/disable master/slave in a node":


  
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://localhost:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>


When the master is started, pass in -Denable.master=true, and on the slave pass 
in -Denable.slave=true. Alternatively, these values can be stored in a 
solrcore.properties file as follows:

#solrcore.properties in master
enable.master=true
enable.slave=false

Il giorno 19/feb/2010, alle ore 03.43, Otis Gospodnetic ha scritto:

> giskard,
> 
> Is this on the master or on the slave(s)?
> Maybe you can paste your replication handler config for the master and your 
> replication handler config for the slave.
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 
> 
> 
> From: giskard 
> To: solr-user@lucene.apache.org
> Sent: Thu, February 18, 2010 12:16:37 PM
> Subject: replications issue
> 
> Hi all,
> 
> I've setup solr replication as described in the wiki.
> 
> when I start the replication, a directory called index.$numbers is created;
> after a while
> it disappears and a new index.$othernumbers is created
> 
> index/ remains untouched with an empty index.
> 
> any clue?
> 
> thank you in advance,
> Riccardo
> 
> --
> ciao,
> giskard

--
ciao,
giskard





Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Glen Newton
I've run Lucene with heap sizes as large as 28GB of RAM (on a 32GB
machine, 64bit, Linux) and a ramBufferSize of 3GB. While I haven't
noticed the GC issues Mark mentioned in this configuration, I have
seen them in the ranges he discusses (on 1.6).

http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

On 18 February 2010 21:34, Otis Gospodnetic  wrote:
> Hi Tom,
>
> It wouldn't.  I didn't see the mention of parallel indexing in the original 
> email. :)
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> - Original Message 
>> From: Tom Burton-West 
>> To: solr-user@lucene.apache.org
>> Sent: Thu, February 18, 2010 3:30:05 PM
>> Subject: Re: What is largest reasonable setting for ramBufferSizeMB?
>>
>>
>> Thanks Otis,
>>
>> I don't know enough about Hadoop to understand the advantage of using Hadoop
>> in this use case.  How would using Hadoop differ from distributing the
>> indexing over 10 shards on 10 machines with Solr?
>>
>> Tom
>>
>>
>>
>> Otis Gospodnetic wrote:
>> >
>> > Hi Tom,
>> >
>> > 32MB is very low, 320MB is medium, and I think you could go higher, just
>> > pick whichever garbage collector is good for throughput.  I know Java 1.6
>> > update 18 also has some Hotspot and maybe also GC fixes, so I'd use that.
>> > Finally, this sounds like a good use case for reindexing with Hadoop!
>> >
>> >  Otis
>> > 
>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > Hadoop ecosystem search :: http://search-hadoop.com/
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB--tp27631231p27645167.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>





Question regarding wildcards and dismax

2010-02-19 Thread Roland Villemoes
Hi all,

We have a web application built on top of Solr, and we are using a lot of 
facets - everything works just fine.
When the user first hits the search page, we would like to do a "get all" query 
and thereby get all facets, so we can build up the user 
interface from that result/facets.

So I would like to do a q=*:* on the search. But since I switched to the 
dismax request handler this does not work anymore.

My request/url looks like this:


a)   /solr/da/mysearcher/?q=*:*   Does not work

b)  /solr/da/select?q=*:*  Does work


But I really need to use a) since I control boosting/ranking in that handler's definition.
Furthermore, when the user "drills down" into the search result by selecting from the 
facets, I still need to get the full search result, like:

/solr/da/mysearcher/?q=*:*&fq=color:red     Does not work.

Can anyone help here? I think the situation for my web application is 
quite normal (get a full result set to build facets, then let the user do a 
drill down, etc.)


Thanks a lot in advance


med venlig hilsen/best regards

Roland Villemoes
Tel: (+45) 22 69 59 62
E-Mail: mailto:r...@alpha-solutions.dk

Alpha Solutions A/S
Borgergade 2, 3.sal, 1300 København K
Tel: (+45) 70 20 65 38
Web: http://www.alpha-solutions.dk

** This message including any attachments may contain confidential and/or 
privileged information intended only for the person or entity to which it is 
addressed. If you are not the intended recipient you should delete this 
message. Any printing, copying, distribution or other use of this message is 
strictly prohibited. If you have received this message in error, please notify 
the sender immediately by telephone, or e-mail and delete all copies of this 
message and any attachments from your system. Thank you.



Re: Question regarding wildcards and dismax

2010-02-19 Thread gwk
Have a look at the q.alt parameter 
(http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt) which is used 
for exactly this issue. Basically putting q.alt=*:* in your query means 
you can leave out the q parameter if you want all documents to be selected.
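
For example, the "get all" request from your message becomes (note that there is 
no q parameter at all):

   /solr/da/mysearcher/?q.alt=*:*&fq=color:red

You can also put q.alt=*:* in the defaults section of that request handler in 
solrconfig.xml, so it only kicks in when no q is given.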


Regards,

gwk

On 2/19/2010 11:28 AM, Roland Villemoes wrote:

Hi all,

We have a web application build on top of Solr, and we are using a lot of 
facets - everything works just fine.
When the user first hits the searchpage - we would like to do a "get all query" 
to the a result, and thereby get all facets so we can build up the user interface from 
this result/facets.

So I would like to do a q=*:* on the search. But since I have switched to the 
dismax requesthandler this does not work anymore. ?

My request/url looks like this:


a)   /solr/da/mysearcher/?q=*:*   Does not work

b)  /solr/da/select?q=*:*  Does work


But I really need to use a) since I control boosting/ranking in the definition.
Furthermore when the user "drill down" the search result, by selecting from the 
facets, I still need to get the full searchresult, like:

/solr/da/mysearcher/?q=*:*&fq=color:red Does not work.
   





range of scores : queryNorm()

2010-02-19 Thread Smith G
Hello,
   I have observed that even if we change boosting
drastically, scores end up being normalized because of the
queryNorm value. Is there anything (regarding queryNorm) that
we can rely on, like the score always staying under 10 or some fixed
value? The main objective is to provide scores in a fixed range to
the partner. Have you experienced anything like this? Is it
possible to do so?
Have you ever experienced a strange situation where, for a
particular query, the result scores were really high compared to routine?
If yes, I would like to know the factor that affected the scores so
drastically, because it may help me to proceed or understand such
cases.

Thanks.


Re: Range Searches in Collections

2010-02-19 Thread cjkadakia

Unfortunately the number of fees is unknown, so we couldn't add the fields
to the Solr schema until runtime. The work-around we did was to create an
additional column in the view I'm pulling from for the index, determine
each record's minimum "fee" and throw that into the column. A total hack,
but now I can simply sort on "minFee" and the problem is (hackishly) solved! :)

Otis Gospodnetic wrote:
> 
> Hm, yes, it sounds like your "fees" field has multiple values/tokens, one
> for each fee.  That's full-text search for you. :)
> How about having multiple fee fields, each with just one fee value?
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 
> 
> 
> From: cjkadakia 
> To: solr-user@lucene.apache.org
> Sent: Thu, February 18, 2010 7:58:23 PM
> Subject: Range Searches in Collections
> 
> 
> Hi, I'm trying to do a search on a range of floats that are part of my
> solr
> schema. Basically we have a collection of "fees" that are associated with
> each document in our index.
> 
> The query I tried was:
> 
> q=fees:[3 TO 10]
> 
> This should return me documents with Fee values between 3 and 10
> inclusively, which it does. However, I need it to check for ALL items in
> this collection, not just one that satisfies it. Currently, this is
> returning me documents with fee values above 10 and below 3 as long as it
> contains at least one other within.
> 
> Any suggestions on how to accomplish this?
> -- 
> View this message in context:
> http://old.nabble.com/Range-Searches-in-Collections-tp27648470p27648470.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Range-Searches-in-Collections-tp27648470p27653341.html
Sent from the Solr - User mailing list archive at Nabble.com.



highlighting fragments EMPTY

2010-02-19 Thread adeelmahmood

hi
i am trying to get highlighting working and its turning out to be a pain.
here is my schema

 <field name="id" type="string" indexed="true" stored="true" required="true" />
 <field name="title" type="string" indexed="true" stored="true" />
 <field name="pi" type="string" indexed="true" stored="true" />
 <field name="status" type="string" indexed="true" stored="true" />

here is the catchall field (default field for search as well)

 <field name="content" type="text" indexed="true" stored="false" multiValued="true"/>

here is how I have setup the solrconfig file

 <str name="hl.fl">title pi status</str>

 <str name="f.name.hl.fragsize">0</str>

 <str name="f.title.hl.alternateField">content</str>
 <str name="f.pi.hl.alternateField">content</str>
 <str name="f.status.hl.alternateField">content</str>

 <str name="f.title.hl.fragmenter">regex</str>
 <str name="f.pi.hl.fragmenter">regex</str>
 <str name="f.status.hl.fragmenter">regex</str>

after this when I search for lets say
http://localhost:8983/solr/select?q=submit&hl=true
I get these results in highlight section

   
   
   
   
   
  
with no reference to the actual string .. this number thats being returned
is the id of the records .. and is also the unique identifier .. why am I
not getting the string fragments with search terms highlighted

thanks for ur help
-- 
View this message in context: 
http://old.nabble.com/highlighting-fragments-EMPTY-tp27654005p27654005.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: highlighting fragments EMPTY

2010-02-19 Thread Jan
All of your fields seem to be of a "string" type; that's why the highlighting 
doesn't work. 

The highlighting fields must be tokenized before you can do the highlighting on 
them. 

Jan.


--- On Fri, 2/19/10, adeelmahmood  wrote:

From: adeelmahmood 
Subject: highlighting fragments EMPTY
To: solr-user@lucene.apache.org
Date: Friday, February 19, 2010, 4:46 PM


hi
i am trying to get highlighting working and its turning out to be a pain.
here is my schema

 
 
 
 

here is the catchall field (default field for search as well)


here is how I have setup the solrconfig file

     title pi status
     
     0
     
     content
 content
 content
     
 regex 
 regex 
 regex     
    
after this when I search for lets say
http://localhost:8983/solr/select?q=submit&hl=true
I get these results in highlight section

   
   
   
   
   
  
with no reference to the actual string .. this number thats being returned
is the id of the records .. and is also the unique identifier .. why am I
not getting the string fragments with search terms highlighted

thanks for ur help
-- 
View this message in context: 
http://old.nabble.com/highlighting-fragments-EMPTY-tp27654005p27654005.html
Sent from the Solr - User mailing list archive at Nabble.com.




  

Re: highlighting fragments EMPTY

2010-02-19 Thread Ahmet Arslan
> hi
> i am trying to get highlighting working and its turning out
> to be a pain.
> here is my schema
> 
>  stored="true" required="true"
> /> 
>  stored="true"  /> 
>  stored="true" /> 
>  stored="true" /> 
> 
> here is the catchall field (default field for search as
> well)
>  stored="false"
> multiValued="true"/>
> 
> here is how I have setup the solrconfig file
> 
>      title pi
> status
>      
>       name="f.name.hl.fragsize">0
>      
>       name="f.title.hl.alternateField">content
>   name="f.pi.hl.alternateField">content
>   name="f.status.hl.alternateField">content
>      
>   name="f.title.hl.fragmenter">regex 
>   name="f.pi.hl.fragmenter">regex 
>   name="f.status.hl.fragmenter">regex     
>     
> after this when I search for lets say
> http://localhost:8983/solr/select?q=submit&hl=true
> I get these results in highlight section
> 
>    
>    
>    
>    
>    
>   
> with no reference to the actual string .. this number thats
> being returned
> is the id of the records .. and is also the unique
> identifier .. why am I
> not getting the string fragments with search terms
> highlighted

You need to change the type of the fields (title, pi, status) from string to text (same 
as the content field). 

There should be a match/hit on that field in order to create highlighted 
snippets.

For example q=title:submit should return documents so that snippet of title can 
be generated.

FYI: You can search title, pi, status at the same time using 
http://wiki.apache.org/solr/DisMaxRequestHandler without copying all of them 
into a catch all field.
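
A rough sketch of such a handler in solrconfig.xml (the handler name and the boosts 
are just placeholders):

<requestHandler name="/mysearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2 pi status</str>
    <str name="hl">true</str>
    <str name="hl.fl">title pi status</str>
  </lst>
</requestHandler>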








Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Yonik Seeley
On Fri, Feb 19, 2010 at 5:03 AM, Glen Newton  wrote:
> You may consider using LuSql[1] to create the indexes, if your source
> content is in a JDBC accessible db. It is quite a bit faster than
> Solr, as it is a tool specifically created and tuned for Lucene
> indexing.

Any idea why it's faster?
AFAIK, the main purpose of DIH is indexing databases too.  If DIH is
much slower, we should speed it up!

-Yonik
http://www.lucidimagination.com


Re: long warmup duration

2010-02-19 Thread Stefan Neumann
Hey,

I am quite confused by your configuration. It seems to me that your
caches are extremely small for 30 million documents (128) and during
warmup you only put up to 20 docs in them. Please correct me if I
misunderstand anything.

In my opinion your warm up duration is not that impressive; since we
currently disabled warmup, our new searcher is registered in only a few
seconds.

Actually, I would not drop these cache numbers. With a cache of 30k
documents we had a hit ratio of 60%, and decreasing this size the hit ratio
decreased as well. With a hit ratio of currently 30% it seems to be
better to disable caching anyway. Of course we would love to use caching
;-).

with best regards,

Stefan


Antonio Lobato wrote:
> Drop those cache numbers.  Way down.  I warm up 30 million documents in about 
> 2 minutes with the following configuration:
> 
>class="solr.FastLRUCache"
> size="128"
> initialSize="10"
> cleanupThread="true" />
> 
>class="solr.FastLRUCache"
> size="128"
> initialSize="10"
> autowarmCount="20"
> cleanupThread="true" />
> 
>class="solr.FastLRUCache"
> size="128"
> initialSize="10"
> autowarmCount="20"
> cleanupThread="true" />
> 
>class="solr.FastLRUCache"
> size="128"
> initialSize="10"
> autowarmCount="20"
> cleanupThread="true" />
> 
> Mind you, I also use Solr 1.4.  Also, setup a decent warming query or two, as 
> so:
> <lst> <str name="q">date:[NOW-2DAYS TO NOW]</str> <str name="start">0</str>
> <str name="rows">100</str> <str name="sort">date desc</str> </lst>
> 
> Don't warm facets that have a large amount of terms or you will kill your 
> warm up time.
> 
> Hope this helps!
> 
> On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote:
> 
>> Hi all,
>>
>> we have been facing extremely increasing warmup times over the last 15 days, which
>> we are not able to explain, since the number of documents and their size
>> is stable. Before the increase we could commit our changes in nearly 20
>> minutes; now it takes about 2 hours.
>>
>> We were able to identify the warmup of the caches (queryresultCache and
>> filterCache) as the reason. We tried to decrease the number of warmup
>> elements from 3 to 1 without any impact.
>>
>> What influences the runtime during the warmup? Is there any possibility
>> to boost the warmup?
>>
>> I attach some more information and statistics.
>>
>> Thanks a lot for your help.
>>
>> Stefan
>>
>>
>> Solr:1.3
>> Documents:   4.000.000
>> -Xmx 12G
>> index size/disc 4.7G
>>
>> config:
>>
>> 100
>> 200
>>
>> No queries configured for warming.
>>
>> CACHES:
>> ===
>>
>> name:   queryResultCache
>> class:  org.apache.solr.search.LRUCache
>> version:1.0
>> description:LRU Cache(maxSize=20,
>>  initialSize=3,
>>autowarmCount=1,
>>  regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
>> stats:
>>
>> lookups:15958
>> hits :  9589
>> hitratio:   0.60
>> inserts:16211
>> evictions:  0
>> size:   16169
>> warmupTime :1960239
>> cumulative_lookups: 436250
>> cumulative_hits:260678
>> cumulative_hitratio:0.59
>> cumulative_inserts: 174066
>> cumulative_evictions:   0
>>
>>
>> name:filterCache
>> class:   org.apache.solr.search.LRUCache
>> version: 1.0
>> description: LRU Cache(maxSize=20,
>>initialSize=3,
>>  autowarmCount=3,
>>  regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
>> stats:   
>> lookups: 6313622
>> hits:   6304004
>> hitratio: 0.99
>> inserts: 42266
>> evictions: 0
>> size: 40827
>> warmupTime: 1268074
>> cumulative_lookups: 118887830
>> cumulative_hits: 118605224
>> cumulative_hitratio: 0.99
>> cumulative_inserts: 296134
>> cumulative_evictions: 0
>>
>>
>>
> 
> 

-- 

Stefan Neumann
Dipl.-Ing.

freiheit.com technologies gmbh
Straßenbahnring 22 / 20251 Hamburg, Germany
fon   +49 (0)40 / 890584-0
fax   +49 (0)40 / 890584-20
HRB Hamburg 70814

1CB2 BA3C 168F 0C2B 6005 FC5E 3EBA BCE2 1BF0 21D3
Geschäftsführer: Claudia Dietze, Stefan Richter, Jörg Kirchhof



Re: Run Solr within my war

2010-02-19 Thread Pulkit Singhal
Using EmbeddedSolrServer is a client-side way of communicating with
Solr via the file system; Solr still has to be up and running before
that. My question is more along the lines of how to take the server
jars that provide the core functionality and bundle them so they start up
within a war which is also the application war for the program that
will communicate with the Solr server as the client.

On Thu, Feb 18, 2010 at 5:49 PM, Richard Frovarp  wrote:
> On 2/18/2010 4:22 PM, Pulkit Singhal wrote:
>>
>> Hello Everyone,
>>
>> I do NOT want to host Solr separately. I want to run it within my war
>> with the Java Application which is using it. How easy/difficult is
>> that to setup? Can anyone with past experience on this topic, please
>> comment.
>>
>> thanks,
>> - Pulkit
>>
>>
>
> So basically you're talking about running an embedded version of Solr like
> the EmbeddedSolrServer? I have no experience on this, but this should
> provide you the correct search term to find documentation on use. From what
> little code I've seen to run test cases against Solr, it looks relatively
> straight forward to get running. To use you would use the SolrJ library to
> communicate with the embedded solr server.
>
> Richard
>


Re: @Field annotation support

2010-02-19 Thread Pulkit Singhal
Ok then, is this the correct class to support the @Field annotation?
Because I have it on the classpath but it's not working.

org\apache\solr\solr-solrj\1.4.0\solr-solrj-1.4.0.jar/org\apache\solr\client\solrj\beans\Field.class
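
For reference, what I'm trying to compile is roughly this (the bean and field names 
are just an illustration):

import org.apache.solr.client.solrj.beans.Field;

public class Item {
    @Field              // maps to the schema field with the same name ("id")
    String id;

    @Field("cat")       // maps this member to the "cat" field in the schema
    String[] categories;
}

// indexing:  server.addBean(new Item());
// querying:  List<Item> items = server.query(query).getBeans(Item.class);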

2010/2/18 Noble Paul നോബിള്‍  नोब्ळ् :
> solrj jar
>
> On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal
>  wrote:
>> Hello All,
>>
>> When I use Maven or Eclipse to try and compile my bean which has the
>> @Field annotation as specified in http://wiki.apache.org/solr/Solrj
>> page ... the compiler doesn't find any class to support the
>> annotation. What jar should we use to bring in this custom Solr
>> annotation?
>>
>
>
>
> --
> -
> Noble Paul | Systems Architect| AOL | http://aol.com
>


Re: Run Solr within my war

2010-02-19 Thread Richard Frovarp

Pulkit Singhal wrote:

Using EmbeddedSolrServer is a client side way of communicating with
Solr via the file system. Solr has to still be up and running before
that. My question is more along the lines of how to put the server
jars that perform the core functionality and bundle them to start up
within a war which is also the application war for the program that
will communicate as the client with the Solr server.
  
I could be way wrong, but my interpretation is that EmbeddedSolrServer 
provides a way to embed Solr into an application without requiring that 
anything else is running.


http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

If you are looking for a method of your application doing SolrJ calls to 
Solr, without having to install a separate Solr instance, 
EmbeddedSolrServer would meet your needs. You'd have to use a few other 
functions to load the core and register it, but it's doable without 
having anything else running.
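
A minimal sketch of that setup, assuming a standard Solr home with a solr.xml (the 
paths and core name are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

// point Solr at its home directory, then load the cores defined in solr.xml
System.setProperty("solr.solr.home", "/path/to/solr/home");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();

// from here on, use the normal SolrJ API (add, commit, query, ...)
SolrServer server = new EmbeddedSolrServer(coreContainer, "core0");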


Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Tom Burton-West

Hi Glen,

I'd love to use LuSql, but our data is not in a db.  It's 6-8TB of files
containing OCR (one file per page for about 1.5 billion pages) gzipped on
disk, which are gunzipped, concatenated, and converted to Solr documents
on-the-fly.  We have multiple instances of our Solr document producer script
running. At this point we can run enough producers that the rate at
which Solr can ingest and index documents is our current bottleneck, and so
far the bottleneck we see for indexing appears to be disk I/O for
Solr/Lucene during merges.

Is there any obvious relationship between the size of the ramBuffer and how
much heap you need to give the JVM, or is there some reasonable method of
finding this out by experimentation?
We would rather not find out by decreasing the amount of memory allocated to
the JVM until we get an OOM.

Tom



I've run Lucene with heap sizes as large as 28GB of RAM (on a 32GB
machine, 64bit, Linux) and a ramBufferSize of 3GB. While I haven't
noticed the GC issues mark mentioned in this configuration, I have
seen them in the ranges he discusses (on 1.6 http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql


-- 
View this message in context: 
http://old.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB--tp27631231p27658384.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multicore Example

2010-02-19 Thread Lee Smith
Hey All

Trying to dip my feet into multicore and hoping someone can advise why the 
example is not working.

Basically I have been working with the example single core fine so I have 
stopped the server and restarted with the new command line for multicore

ie, java -Dsolr.solr.home=multicore -jar start.jar

When it launches I get this error:

2010-02-19 11:13:39.740::WARN:  EXCEPTION
java.net.BindException: Address already in use
at java.net.PlainSocketImpl.socketBind(Native Method)
at etc

Any ideas what this can be because I have stopped the first one.

Thank you if you can advise.




Documents disappearing

2010-02-19 Thread Pascal Dimassimo

Hi,

I have encountered a situation that I can't explain. We are indexing documents
that are often duplicates, so we activated deduplication like this:


<processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <bool name="overwriteDupes">true</bool>
  <str name="signatureField">signature</str>
  <str name="fields">title,text</str>
  <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>

What I can't explain is that when I look at the documents count in the log,
I see documents disappearing.

11:24:23 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
14:04:24 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
14:17:07 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
14:25:42 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
14:47:12 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
15:17:22 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
15:47:31 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
16:17:42 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
16:38:17 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
16:39:10 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
16:47:40 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
16:51:24 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
17:02:13 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
17:17:41 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8

11:24 was the time at which Solr was started that day. Around 13:30, we
started the indexing.

At some point during the indexing, I noticed that a batch of documents was
resent (i.e., documents with the same id field were sent again to the index).
And according to the log, NO delete was sent to Solr.

I understand that if I send duplicates (either documents with the same id or
with the same signature), the count of documents should stay the same. But
how can we explain that it is going down? What are the possible causes of this
behavior?

Thanks! 
-- 
View this message in context: 
http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Pascal Dimassimo

Are you sure that you don't have any java processes that are still running?

Did you change the port or are you still using 8983?


Lee Smith-6 wrote:
> 
> Hey All
> 
> Trying to dip my feet into multicore and hoping someone can advise why the
> example is not working.
> 
> Basically I have been working with the example single core fine so I have
> stopped the server and restarted with the new command line for multicore
> 
> ie, java -Dsolr.solr.home=multicore -jar start.jar
> 
> When it launches I get this error:
> 
> 2010-02-19 11:13:39.740::WARN:  EXCEPTION
> java.net.BindException: Address already in use
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at etc
> 
> Any ideas what this can be because I have stopped the first one.
> 
> Thank you if you can advise.
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Multicore-Example-tp27659052p27659102.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Dave Searle
Do you have something else using port 8983 or 8080?

Sent from my iPhone

On 19 Feb 2010, at 19:22, "Lee Smith"  wrote:

> Hey All
>
> Trying to dip my feet into multicore and hoping someone can advise  
> why the example is not working.
>
> Basically I have been working with the example single core fine so I  
> have stopped the server and restarted with the new command line for  
> multicore
>
> ie, java -Dsolr.solr.home=multicore -jar start.jar
>
> When it launches I get this error:
>
> 2010-02-19 11:13:39.740::WARN:  EXCEPTION
> java.net.BindException: Address already in use
>at java.net.PlainSocketImpl.socketBind(Native Method)
>at etc
>
> Any ideas what this can be because I have stopped the first one.
>
> Thank you if you can advise.
>
>


Re: Seattle Hadoop/Lucene/NoSQL Meetup; Wed Feb 24th, Feat. MongoDB

2010-02-19 Thread Nick Dimiduk
Reminder: this month's Seattle Hadoop Meetup is this Wednesday. Don't forget
to RSVP!

On Tue, Feb 16, 2010 at 6:09 PM, Bradford Stephens <
bradfordsteph...@gmail.com> wrote:

> Greetings,
>
> It's time for another awesome Seattle Hadoop/Lucene/Scalability/NoSQL
> Meetup!
>
> As always, it's at the University of Washington, Allen Computer
> Science building, Room 303 at 6:45pm. You can find a map here:
> http://www.washington.edu/home/maps/southcentral.html?cse
>
> Last month, we had a great talk from Steve McPherson of Razorfish on
> their usage of Hadoop. This month, we'll have Richard Kreuter from
> MongoDB talking about, well, MongoDB. As well as assorted discussion
> on the Hadoop ecosystem.
>
> If you can, please RSVP here (not required, but very nice):
> http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/
>
> My cell # is 904-415-3009 if you have questions/get lost.
>
> Cheers,
> Bradford
>
> --
> http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform.
>
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
>


Re: Multicore Example

2010-02-19 Thread Lee Smith
How can I find out ??


On 19 Feb 2010, at 19:26, Dave Searle wrote:

> Do you have something else using port 8983 or 8080?
> 
> Sent from my iPhone
> 
> On 19 Feb 2010, at 19:22, "Lee Smith"  wrote:
> 
>> Hey All
>> 
>> Trying to dip my feet into multicore and hoping someone can advise  
>> why the example is not working.
>> 
>> Basically I have been working with the example single core fine so I  
>> have stopped the server and restarted with the new command line for  
>> multicore
>> 
>> ie, java -Dsolr.solr.home=multicore -jar start.jar
>> 
>> When it launches I get this error:
>> 
>> 2010-02-19 11:13:39.740::WARN:  EXCEPTION
>> java.net.BindException: Address already in use
>>   at java.net.PlainSocketImpl.socketBind(Native Method)
>>   at etc
>> 
>> Any ideas what this can be because I have stopped the first one.
>> 
>> Thank you if you can advise.
>> 
>> 



Re: long warmup duration

2010-02-19 Thread Yonik Seeley
On Fri, Feb 19, 2010 at 12:17 PM, Stefan Neumann
 wrote:
> I am quite confused with your configuration. It seems to me, that your
> caches are extremly small for 30 million documents (128)

The units of the cache are entries, not documents.
So a queryResultCache autowarm count of a few dozen is normally
perfectly sufficient.
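
For example, something like this in solrconfig.xml is usually plenty (the sizes here 
are only illustrative):

  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>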

-Yonik
http://www.lucidimagination.com


Strange performance behaviour when concurrent requests are done

2010-02-19 Thread Marc Sturlese

Hey there,
I have been doing some stress testing on a server with 2 physical CPUs (4 cores
each).
After some reading about GC performance tuning I have configured it this
way:

/usr/lib/jvm/java-6-sun/bin/java -server -Xms7000m -Xmx7000m
-XX:ReservedCodeCacheSize=10m -XX:NewSize=1000m -XX:MaxNewSize=1000m
-XX:SurvivorRatio=16 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled -XX:PermSize=35m
-XX:MaxPermSize=35m
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.endorsed.dirs=/opt/tomcat-shard-00/common/endorsed

My java version is:
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)

My index is optimized, uses the compound file format and is opened in readOnly mode.
I have just one Solr core on this box.

I have launched different tests against the index and to my surprise the
results are:

test1:
number of concurrent threads: 2
throughput: 15
response ms: 130

test2:
number of concurrent threads: 3
throughput: 22.3
average response ms: 130

test3:
number of concurrent threads: 4
throughput: 28
average response ms: 140

test4:
number of concurrent threads: 5
throughput: 26.8
average response ms: 190

test5:
number of concurrent threads: 6
throughput: 22
average response ms: 270

All requests are launched against the same IndexSearcher (no reloads or warming
are done during the test).
I have activated GC debug output in the JVM to see when a GC happens. It is
happening every 3 seconds and takes approx 20ms in test1, test2 and test3.
In test4 and test5 it happens every 3 seconds as well and takes 40ms. So it
looks like GC is not delaying the average 
response time of the requests.
The machine has 4 cores and it is really not stressed in terms of CPU,
nor I/O (I am using an SSD disk).

Given this scenario, how is it possible that going from 5 concurrent
threads to 6 almost doubles the average response time?
(Going from 4 to 5 doesn't double it, but it is still significantly more.)
I think GC can't be the cause given the numbers I have mentioned.
As far as I have always understood, a Lucene IndexSearcher deals perfectly with
concurrency, but it seems that there is something that blocks
when there are more than 2 requests at the same time.

Compound file optimization gives better response times, but could it in any way
be bad for performance?

I am so confused about this... can someone explain to me whether this is normal,
or why it happens? I mean, does Lucene or Solr have something that blocks?

Thanks in advance

-- 
View this message in context: 
http://old.nabble.com/Strange-performance-behaviour-when-concurrent-requests-are-done-tp27659695p27659695.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Documents disappearing

2010-02-19 Thread Ankit Bhatnagar
Try inspecting your index with luke


Ankit


-Original Message-
From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] 
Sent: Friday, February 19, 2010 2:22 PM
To: solr-user@lucene.apache.org
Subject: Documents disappearing


Hi,

I have encounter a situation that I can't explain. We are indexing documents
that are often duplicates so we activated deduplication like this:


  true
  true
  signature
  title,text
  org.apache.solr.update.processor.Lookup3Signature


What I can't explain is that when I look at the documents count in the log,
I see documents disappearing.

11:24:23 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
14:04:24 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
14:17:07 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
14:25:42 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
14:47:12 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
15:17:22 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
15:47:31 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
16:17:42 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
16:38:17 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
16:39:10 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
16:47:40 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
16:51:24 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
17:02:13 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
17:17:41 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8

11:24 was the time at which Solr was started that day. Around 13:30, we
started the indexation.

At some point during the indexation, I notice that a batch a documents were
resend (i.e, documents with the same id field were sent again to the index).
And according to the log, NO delete was sent to Solr.

I understand that if I send duplicates (either documents with the same id or
with the same signature), the count of documents should stay the same. But
how can we explain that it is lowering? What are the possible causes of this
behavior?

Thanks! 
-- 
View this message in context: 
http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Dave Searle
Are you on windows? Try netstat -a

Sent from my iPhone

On 19 Feb 2010, at 20:02, "Lee Smith"  wrote:

> How can I find out ??
>
>
> On 19 Feb 2010, at 19:26, Dave Searle wrote:
>
>> Do you have something else using port 8983 or 8080?
>>
>> Sent from my iPhone
>>
>> On 19 Feb 2010, at 19:22, "Lee Smith"  wrote:
>>
>>> Hey All
>>>
>>> Trying to dip my feet into multicore and hoping someone can advise
>>> why the example is not working.
>>>
>>> Basically I have been working with the example single core fine so I
>>> have stopped the server and restarted with the new command line for
>>> multicore
>>>
>>> ie, java -Dsolr.solr.home=multicore -jar start.jar
>>>
>>> When it launches I get this error:
>>>
>>> 2010-02-19 11:13:39.740::WARN:  EXCEPTION
>>> java.net.BindException: Address already in use
>>>  at java.net.PlainSocketImpl.socketBind(Native Method)
>>>  at etc
>>>
>>> Any ideas what this can be because I have stopped the first one.
>>>
>>> Thank you if you can advise.
>>>
>>>
>


Re: Multicore Example

2010-02-19 Thread Shawn Heisey
Assuming you are on a unix variant with a working lsof, use this.  This 
probably won't work correctly on Solaris 10:


lsof -nPi | grep 8983
lsof -nPi | grep 8080

On Windows, you can do this in a command prompt.  It requires elevation 
on Vista or later.  The -b option was added in WinXP SP2 and Win2003 
SP1, without it you can't see the program name that's got the port open:


netstat -b > ports.txt
ports.txt

Shawn


On 2/19/2010 1:01 PM, Lee Smith wrote:

How can I find out ??


On 19 Feb 2010, at 19:26, Dave Searle wrote:

   

Do you have something else using port 8983 or 8080?
 




RE: Documents disappearing

2010-02-19 Thread Pascal Dimassimo

Using LukeRequestHandler, I see:

<int name="numDocs">7725</int>
<int name="maxDoc">28099</int>
<int name="numTerms">758826</int>
<long name="version">1266355690710</long>
<bool name="optimized">false</bool>
<bool name="current">true</bool>
<bool name="hasDeletions">true</bool>
<str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index</str>


I will copy the index to my local machine so I can open it with luke. Should
I look for something specific?

Thanks!


ANKITBHATNAGAR wrote:
> 
> Try inspecting your index with luke
> 
> 
> Ankit
> 
> 
> -Original Message-
> From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] 
> Sent: Friday, February 19, 2010 2:22 PM
> To: solr-user@lucene.apache.org
> Subject: Documents disappearing
> 
> 
> Hi,
> 
> I have encounter a situation that I can't explain. We are indexing
> documents
> that are often duplicates so we activated deduplication like this:
> 
>  class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>   true
>   true
>   signature
>   title,text
>name="signatureClass">org.apache.solr.update.processor.Lookup3Signature
> 
> 
> What I can't explain is that when I look at the documents count in the
> log,
> I see documents disappearing.
> 
> 11:24:23 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
> 14:04:24 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
> 14:17:07 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
> 14:25:42 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
> 14:47:12 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
> 15:17:22 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
> 15:47:31 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
> 16:17:42 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
> 16:38:17 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
> 16:39:10 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
> 16:47:40 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
> 16:51:24 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
> 17:02:13 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
> 17:17:41 INFO  - [myindex] webapp=null path=null
> params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8
> 
> 11:24 was the time at which Solr was started that day. Around 13:30, we
> started the indexation.
> 
> At some point during the indexation, I notice that a batch a documents
> were
> resend (i.e, documents with the same id field were sent again to the
> index).
> And according to the log, NO delete was sent to Solr.
> 
> I understand that if I send duplicates (either documents with the same id
> or
> with the same signature), the count of documents should stay the same. But
> how can we explain that it is lowering? What are the possible causes of
> this
> behavior?
> 
> Thanks! 
> -- 
> View this message in context:
> http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Documents-disappearing-tp27659047p27660077.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Lee Smith
Thanks Shawn

I am actually running it on mac

It does not like those unix commands ??

Any further advice ?

Lee

On 19 Feb 2010, at 20:32, Shawn Heisey wrote:

> Assuming you are on a unix variant with a working lsof, use this.  This 
> probably won't work correctly on Solaris 10:
> 
> lsof -nPi | grep 8983
> lsof -nPi | grep 8080
> 
> On Windows, you can do this in a command prompt.  It requires elevation on 
> Vista or later.  The -b option was added in WinXP SP2 and Win2003 SP1, 
> without it you can't see the program name that's got the port open:
> 
> netstat -b > ports.txt
> ports.txt
> 
> Shawn
> 
> 
> On 2/19/2010 1:01 PM, Lee Smith wrote:
>> How can I find out ??
>> 
>> 
>> On 19 Feb 2010, at 19:26, Dave Searle wrote:
>> 
>>   
>>> Do you have something else using port 8983 or 8080?
>>> 
> 



Re: Multicore Example

2010-02-19 Thread K Wong
The point that these guys are trying to make is that if another
program is using the port that Solr is trying to bind to then they
will both fight over the exclusive use of the port.

Both the netstat and lsof command work fine on my Mac (Leopard 10.5.8).

Trinity:~ kelvin$ which netstat
/usr/sbin/netstat
Trinity:~ kelvin$ which lsof
/usr/sbin/lsof
Trinity:~ kelvin$

If you use MacPorts, you can also find out port information using 'nmap'.

If something is already using the port Solr is trying to use then you
need to configure Solr to use a different port.
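
With the Jetty that ships with the Solr example, that is typically just a system
property on the command line, e.g. (8984 is an arbitrary free port):

java -Djetty.port=8984 -Dsolr.solr.home=multicore -jar start.jar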

K



On Fri, Feb 19, 2010 at 12:51 PM, Lee Smith  wrote:
> Thanks Shawn
>
> I am actually running it on mac
>
> It does not like those unix commands ??
>
> Any further advice ?
>
> Lee
>
> On 19 Feb 2010, at 20:32, Shawn Heisey wrote:
>
>> Assuming you are on a unix variant with a working lsof, use this.  This 
>> probably won't work correctly on Solaris 10:
>>
>> lsof -nPi | grep 8983
>> lsof -nPi | grep 8080
>>
>> On Windows, you can do this in a command prompt.  It requires elevation on 
>> Vista or later.  The -b option was added in WinXP SP2 and Win2003 SP1, 
>> without it you can't see the program name that's got the port open:
>>
>> netstat -b > ports.txt
>> ports.txt
>>
>> Shawn
>>
>>
>> On 2/19/2010 1:01 PM, Lee Smith wrote:
>>> How can I find out ??
>>>
>>>
>>> On 19 Feb 2010, at 19:26, Dave Searle wrote:
>>>
>>>
 Do you have something else using port 8983 or 8080?

>>
>
>


Solr 1.5 in production

2010-02-19 Thread Asif Rahman
What is the prevailing opinion on using solr 1.5 in a production
environment?  I know that many people were using 1.4 in production for a
while before it became an official release.

Specifically I'm interested in using some of the new spatial features.

Thanks,

Asif

-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com


Re: Solr 1.5 in production

2010-02-19 Thread Grant Ingersoll

On Feb 19, 2010, at 4:54 PM, Asif Rahman wrote:

> What is the prevailing opinion on using solr 1.5 in a production
> environment?  I know that many people were using 1.4 in production for a
> while before it became an official release.
> 
> Specifically I'm interested in using some of the new spatial features.

These aren't fully baked yet (still need some spatial filtering capabilities 
which I'm getting close to done with, or close enough to submit a patch 
anyway), but feedback would be welcome.  The main risk, I suppose, is that any 
new APIs could change.  Other than that, the usual advice applies:  test it 
out in your environment and see if it meets your needs.  On the spatial stuff, 
we'd definitely appreciate feedback on performance, functionality, APIs, etc.

-Grant

Re: long warmup duration

2010-02-19 Thread Antonio Lobato
You can disable warming, and a new searcher will register (almost) 
instantly, no matter the size.  However, once you run your first search, 
you will be "warming" your searcher, and it will block for a long, long 
time, giving the end user a "frozen" page.


Warming is just another word for "running a set of queries before the 
searcher is pushed to the front end."  Naturally if you disable warming, 
your searcher will register right away.  I wouldn't recommend it 
though.  If I disable warming on my documents, my new searchers would 
register instantly, but my first search on my web page would be stuck 
for 50 seconds or so.


As for the cache size, the caches hold entries, not 
documents.  That's what the warming is for.


On 2/19/2010 12:17 PM, Stefan Neumann wrote:

Hey,

I am quite confused with your configuration. It seems to me, that your
caches are extremly small for 30 million documents (128) and during
warmup you only put up to 20 docs in it. Please correct me if I
misunderstand anything.

In my opinion your warm up duration is not that impressiv, since we
currently disabled warmup, the new searcher is registered only in a few
seconds.

Actually, I would not drop these cache numbers. With a cache of 30k
documents we had a hitraion of 60%, decreasing this size the hitratio
decreased as well. With a hitratio of currently 30% it seems to be
better to disable caching anyway. Of course we would love to use caching
;-).

with best regards,

Stefan


Antonio Lobato wrote:
   

Drop those cache numbers.  Way down.  I warm up 30 million documents in about 2 
minutes with the following configuration:

   

   

   

   

Mind you, I also use Solr 1.4.  Also, setup a decent warming query or two, as 
so:
  date:[NOW-2DAYS TO NOW]  0  100  date desc

Don't warm facets that have a large amount of terms or you will kill your warm 
up time.

Hope this helps!

On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote:

 

Hi all,

we are facing extremly increasing warmup times the last 15 days, which
we are not able to explain, since the number of documents and their size
is stable. Before the increase we can commit our changes in nearly 20
minutes, now it is about 2 hours.

We were able to identify the warmup of the caches (queryresultCache and
filterCache) as the reason. We tried to decrease the number of warmup
elements from 3 to 1 without any impact.

What influences the runtime during the warmup? Is there any possibility
to boost the warmup?

I attach some more information and statistics.

Thanks a lot for your help.

Stefan


Solr:   1.3
Documents:  4.000.000
-Xmx12G
index size/disc 4.7G

config:

100
200

No queries configured for warming.

CACHES:
===

name:   queryResultCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=1,
regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
stats:

lookups:15958
hits :  9589
hitratio:   0.60
inserts:16211
evictions:  0
size:   16169
warmupTime :1960239
cumulative_lookups: 436250
cumulative_hits:260678
cumulative_hitratio:0.59
cumulative_inserts: 174066
cumulative_evictions:   0


name:   filterCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=3,  
regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
stats:  
lookups:6313622
hits:   6304004
hitratio: 0.99
inserts: 42266
evictions: 0
size: 40827
warmupTime: 1268074
cumulative_lookups: 118887830
cumulative_hits: 118605224
cumulative_hitratio: 0.99
cumulative_inserts: 296134
cumulative_evictions: 0



   


 
   


filter result by catalog

2010-02-19 Thread Kevin Osborn
So, I am looking at better ways to filter a result set by catalog. I have an 
index of products, and based on the user, I want to filter the search results 
to what they are allowed to see. I will probably have up to 200 or so different 
catalogs.



  

Re: highlighting fragments EMPTY

2010-02-19 Thread adeelmahmood

well ok I guess that makes sense and I tried changing my title field to text
type and then highlighting worked on it .. but
1) as far as not merging all fields into a catchall field and instead
configuring the dismax handler to search through them .. do you mean then
I'll have to specify the field I want to do the search in .. e.g.
q=something&hl.fl=title or q=somethingelse&hl.fl=status .. and another thing
is that I have about 20-some fields which I am merging into my catchall
field .. with that many fields do you still think it's better to use dismax
rather than a catchall field ???

2) secondly, for highlighting, q=title:searchterm also didn't work .. it only
works if I change the type of the title field to text instead of string .. even
if I give the full string in the q param .. it still doesn't highlight it unless,
like I said, I change the field type to text ... so why is that .. and if
that's just how it is and I have to change some of my fields to text .. then
my question is whether Solr will analyze them first in their own field and then
copy them to the catchall field, doing the analysis one more time ..
since the catchall field is also text .. I guess this is just more of an
understanding question

thanks for all your help


Ahmet Arslan wrote:
> 
>> hi
>> i am trying to get highlighting working and its turning out
>> to be a pain.
>> here is my schema
>> 
>> > stored="true" required="true"
>> /> 
>> > stored="true"  /> 
>> > stored="true" /> 
>> > stored="true" /> 
>> 
>> here is the catchall field (default field for search as
>> well)
>> > stored="false"
>> multiValued="true"/>
>> 
>> here is how I have setup the solrconfig file
>> 
>>      title pi
>> status
>>      
>>      > name="f.name.hl.fragsize">0
>>      
>>      > name="f.title.hl.alternateField">content
>>  > name="f.pi.hl.alternateField">content
>>  > name="f.status.hl.alternateField">content
>>      
>>  > name="f.title.hl.fragmenter">regex 
>>  > name="f.pi.hl.fragmenter">regex 
>>  > name="f.status.hl.fragmenter">regex     
>>     
>> after this when I search for lets say
>> http://localhost:8983/solr/select?q=submit&hl=true
>> I get these results in highlight section
>> 
>>    
>>    
>>    
>>    
>>    
>>   
>> with no reference to the actual string .. this number thats
>> being returned
>> is the id of the records .. and is also the unique
>> identifier .. why am I
>> not getting the string fragments with search terms
>> highlighted
> 
> You need to change type of fields (title, pi, staus) from string to text
> (same as content field). 
> 
> There should be a match/hit on that field in order to create highlighted
> snippets.
> 
> For example q=title:submit should return documents so that snippet of
> title can be generated.
> 
> FYI: You can search title, pi, status at the same time using
> http://wiki.apache.org/solr/DisMaxRequestHandler without copying all of
> them into a catch all field.
> 
> 
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/highlighting-fragments-EMPTY-tp27654005p27661657.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheck.build=true has no effect

2010-02-19 Thread darniz

Hello
Can someone please correct me or confirm whether this is the correct
behaviour?

Thanks,
darniz

darniz wrote:
> 
> Hello All.
> After doing a lot of research i came to this conclusion please correct me
> if i am wrong.
> I noticed that if you have buildOnCommit and buildOnOptimize set to true in
> your spellcheck component, then the spellcheck index builds whenever a commit
> or optimize happens, which is the desired and correct behaviour. 
> please read on.
> 
> I am using an index-based spellchecker and I am copying make and model to my
> spellcheck field. I index some documents and the make and model are
> copied to the spellcheck field when I commit.
> Now I stopped my Solr server and
> added one more field, bodyType, to be copied to my spellcheck field.
> I don't want to reindex the data, so I issued an HTTP request to rebuild my
> spellchecker:
> &spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
> It looks like the above command has no effect; the bodyType is not being
> copied to the spellcheck field.
> 
> The only time the spellcheck field has the bodyType value copied into it is
> when I reindex the documents again and do a commit.
> 
> Is this the desired behaviour?
> Adding buildOnCommit and buildOnOptimize will force the spellchecker to
> rebuild only if a commit or optimize happens.
> Please let me know if there are configurable parameters so that I can
> issue the HTTP command rather than indexing the data again and again.
> 
> 
> thanks
> darniz
> 
> 

-- 
View this message in context: 
http://old.nabble.com/spellcheck.build%3Dtrue-has-no-effect-tp27648346p27661847.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: filter result by catalog

2010-02-19 Thread Otis Gospodnetic
So, hello Kevin,

So what have you tried so far?  I see from 
http://www.search-lucene.com/m?id=839141.906...@web81107.mail.mud.yahoo.com||acl
 you've tried the "acl field" approach.
How about the bitset approach described there?
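
For reference, the acl-field approach usually boils down to indexing the catalog ids
with each product (a multiValued field) and sending a filter query per request,
something like this (the field name is hypothetical):

fq=catalog_id:(12 OR 57 OR 84)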


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Kevin Osborn 
> To: Solr 
> Sent: Fri, February 19, 2010 6:06:51 PM
> Subject: filter result by catalog
> 
> So, I am looking at better ways to filter a resultset by catalog. So, I have 
> an 
> index of products. And based on the user, I want to filter the search results 
> to 
> what they are allowed to see. I will probably have up to 200 or so different 
> catalogs.



Re: Documents disappearing

2010-02-19 Thread Otis Gospodnetic
Pascal,

Look at that difference between numDocs and maxDocs.  That delta represents 
deleted docs.  Maybe there is something deleting your docs after all!

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Pascal Dimassimo 
> To: solr-user@lucene.apache.org
> Sent: Fri, February 19, 2010 3:50:26 PM
> Subject: RE: Documents disappearing
> 
> 
> Using LukeRequestHandler, I see:
> 
> 7725
> 28099
> 758826
> 1266355690710
> false
> true
> true
> 
> org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index
> 
> 
> I will copy the index to my local machine so I can open it with luke. Should
> I look for something specific?
> 
> Thanks!
> 
> 
> ANKITBHATNAGAR wrote:
> > 
> > Try inspecting your index with luke
> > 
> > 
> > Ankit
> > 
> > 
> > -Original Message-
> > From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] 
> > Sent: Friday, February 19, 2010 2:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Documents disappearing
> > 
> > 
> > Hi,
> > 
> > I have encounter a situation that I can't explain. We are indexing
> > documents
> > that are often duplicates so we activated deduplication like this:
> > 
> > 
> > class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> >  true
> >  true
> >  signature
> >  title,text
> >  
> > name="signatureClass">org.apache.solr.update.processor.Lookup3Signature
> > 
> > 
> > What I can't explain is that when I look at the documents count in the
> > log,
> > I see documents disappearing.
> > 
> > 11:24:23 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
> > 14:04:24 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
> > 14:17:07 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
> > 14:25:42 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
> > 14:47:12 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
> > 15:17:22 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
> > 15:47:31 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
> > 16:17:42 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
> > 16:38:17 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
> > 16:39:10 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
> > 16:47:40 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
> > 16:51:24 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
> > 17:02:13 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
> > 17:17:41 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8
> > 
> > 11:24 was the time at which Solr was started that day. Around 13:30, we
> > started indexing.
> > 
> > At some point during indexing, I noticed that a batch of documents was
> > resent (i.e., documents with the same id field were sent again to the
> > index).
> > And according to the log, NO delete was sent to Solr.
> > 
> > I understand that if I send duplicates (either documents with the same id
> > or with the same signature), the count of documents should stay the same.
> > But how can we explain that it keeps dropping? What are the possible
> > causes of this behavior?
> > 
> > Thanks! 
> > -- 
> > View this message in context:
> > http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> > 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/Documents-disappearing-tp27659047p27660077.html
> Sent from the Solr - User mailing list archive at Nabble.com.
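
If deduplication is the suspect, the delta is easy to inspect; a quick check, assuming the default LukeRequestHandler path (adjust the core name to your setup):

  http://localhost:8983/solr/myindex/admin/luke?numTerms=0

maxDoc minus numDocs is the number of deleted documents still present in the index. With overwriteDupes set to true, re-sending a document whose title/text produce the same signature as an already-indexed document deletes the older copy even though no explicit delete command was issued, which is one way the visible document count can go down while only adds are being sent.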



Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Otis Gospodnetic
Glen may be referring to LuSql indexing with multiple threads?
Does/can DIH do that, too?


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Yonik Seeley 
> To: solr-user@lucene.apache.org
> Sent: Fri, February 19, 2010 11:41:07 AM
> Subject: Re: What is largest reasonable setting for ramBufferSizeMB?
> 
> On Fri, Feb 19, 2010 at 5:03 AM, Glen Newton wrote:
> > You may consider using LuSql[1] to create the indexes, if your source
> > content is in a JDBC accessible db. It is quite a bit faster than
> > Solr, as it is a tool specifically created and tuned for Lucene
> > indexing.
> 
> Any idea why it's faster?
> AFAIK, the main purpose of DIH is indexing databases too.  If DIH is
> much slower, we should speed it up!
> 
> -Yonik
> http://www.lucidimagination.com



Re: replications issue

2010-02-19 Thread Otis Gospodnetic
Hello,

You are replicating every 60 seconds?  I hope you don't have a large index with 
lots of continuous index updates on the master, as replicating every 60 
seconds, while doable, may be a bit too frequent (depending on index size, 
amount of changes, cache settings, etc.).

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: giskard 
> To: solr-user@lucene.apache.org
> Sent: Fri, February 19, 2010 4:11:56 AM
> Subject: Re: replications issue
> 
> Ciao,
> 
> Uhm after some time a new index in data/index on the slave has been written
> with the ~size of the master index.
> 
> the configure on both master slave is the same one on the solrReplication 
> wiki 
> page
> "enable/disable master/slave in a node"
> 
> 
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="enable">${enable.master:false}</str>
>     <str name="replicateAfter">commit</str>
>     <str name="confFiles">schema.xml,stopwords.txt</str>
>   </lst>
>   <lst name="slave">
>     <str name="enable">${enable.slave:false}</str>
>     <str name="masterUrl">http://localhost:8983/solr/replication</str>
>     <str name="pollInterval">00:00:60</str>
>   </lst>
> </requestHandler>
> 
> 
> When the master is started, pass in -Denable.master=true and in the slave 
> pass 
> in -Denable.slave=true. Alternately , these values can be stored in a 
> solrcore.properties file as follows
> 
> #solrcore.properties in master
> enable.master=true
> enable.slave=false
> 
> Il giorno 19/feb/2010, alle ore 03.43, Otis Gospodnetic ha scritto:
> 
> > giskard,
> > 
> > Is this on the master or on the slave(s)?
> > Maybe you can paste your replication handler config for the master and your 
> replication handler config for the slave.
> > 
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Hadoop ecosystem search :: http://search-hadoop.com/
> > 
> > 
> > 
> > 
> > 
> > From: giskard 
> > To: solr-user@lucene.apache.org
> > Sent: Thu, February 18, 2010 12:16:37 PM
> > Subject: replications issue
> > 
> > Hi all,
> > 
> > I've setup solr replication as described in the wiki.
> > 
> > when i start the replication a directory called index.$numebers is created 
> after a while
> > it disappears and a new index.$othernumbers is created
> > 
> > index/ remains untouched with an empty index.
> > 
> > any clue?
> > 
> > thank you in advance,
> > Riccardo
> > 
> > --
> > ciao,
> > giskard
> 
> --
> ciao,
> giskard
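
For completeness, the slave counterpart of the solrcore.properties shown above would presumably be:

  #solrcore.properties in slave
  enable.master=false
  enable.slave=true

And if 60-second polling turns out to be too aggressive for the index in question, the pollInterval in the slave section can simply be raised, for example to 00:05:00.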



Re: optimize is taking too much time

2010-02-19 Thread Otis Gospodnetic
Hello,

Solr will never optimize the whole index without somebody explicitly asking for 
it.
Lucene will merge index segments on the master as documents are indexed.  How 
often it does that depends on mergeFactor.

See:
http://search-lucene.com/?q=mergeFactor+segment+merge&fc_project=Lucene&fc_project=Solr&fc_type=mail+_hash_+user


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: mklprasad 
> To: solr-user@lucene.apache.org
> Sent: Fri, February 19, 2010 1:02:11 AM
> Subject: Re: optimize is taking too much time
> 
> 
> 
> 
> Jagdish Vasani-2 wrote:
> > 
> > Hi,
> > 
> > You should not optimize the index after each document insert; instead, you
> > should optimize after inserting a good number of documents,
> > because optimize merges all segments into one according to the
> > settings of the Lucene index.
> > 
> > thanks,
> > Jagdish
> > On Fri, Feb 12, 2010 at 4:01 PM, mklprasad wrote:
> > 
> >>
> >> Hi,
> >> in my Solr I have 1,42,45,223 records, about 50 GB.
> >> Now when I am loading a new record and it tries to optimize the docs,
> >> it takes too much memory and time.
> >>
> >>
> >> Can anybody please tell me whether there is any property in Solr to get
> >> rid of this?
> >>
> >> Thanks in advance
> >>
> >> --
> >> View this message in context:
> >> 
> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> > 
> > 
> 
> Yes,
> thanks for the reply.
> I have removed the optimize() call from the code, but I have a doubt:
> 1. Will mergeFactor internally do any optimization, or do we have to trigger it explicitly?
> 
> 2. Even if Solr initiates an optimize, with a large index like 52 GB will
> that take a huge amount of time?
> 
> Thanks,
> Prasad
> 
> 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27650028.html
> Sent from the Solr - User mailing list archive at Nabble.com.
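
For reference, the two knobs discussed above are separate: background segment merging is controlled by mergeFactor in solrconfig.xml (a lower value means fewer segments but more frequent merging), while a full optimize only happens when it is explicitly requested, for example with something like:

  curl http://localhost:8983/solr/update --data-binary '<optimize/>' -H 'Content-type:text/xml'

So with the optimize() call removed, the 52 GB index still has its segments merged incrementally, but the cost of rewriting everything into a single segment is only paid when an optimize is actually sent.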



Re: filter result by catalog

2010-02-19 Thread Kevin Osborn
Yes, I thought about both methods. The ACL method is easier, but has some 
scalability issues. We use the bitset method in another product, but there are 
some complexity and resource problems.

This is a new project so I am revisiting the issue to see if anyone had any 
better ideas.

On Fri Feb 19th, 2010 6:18 PM PST Otis Gospodnetic wrote:

>So, hello Kevin,
>
>So what have you tried so far?  I see from 
>http://www.search-lucene.com/m?id=839141.906...@web81107.mail.mud.yahoo.com||acl
> you've tried the "acl field" approach.
>How about the bitset approach described there?
>
>
>Otis 
>Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
>- Original Message 
>> From: Kevin Osborn 
>> To: Solr 
>> Sent: Fri, February 19, 2010 6:06:51 PM
>> Subject: filter result by catalog
>> 
>> So, I am looking at better ways to filter a resultset by catalog. So, I have 
>> an 
>> index of products. And based on the user, I want to filter the search 
>> results to 
>> what they are allowed to see. I will probably have up to 200 or so different 
>> catalogs.
>