Re: Term Dictionary + scoring

2010-01-16 Thread MitchK

Grant,

thank you for the link to the wiki. TermsComponent was unknown to me until
now. It sounds good!

> Generally, this clickthrough tracking is tied to the query, so you need a
> layer above just popularity.  You need popularity per query (or in all
> likelihood a subset of the queries, since you likely only care about this
> where you have a certain level of clickthroughs/queries).

Yes, that's true, but how can I implement that? Saving every query that
ever led to a click in a field, together with its clickthrough rate, does
not sound clean. Okay, I could try to retrieve the values per query, but that
sounds really expensive.
What did you mean by "layer"?
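
One possible reading of such a "layer", sketched in Python purely for illustration: a small store outside the index that counts clicks per (query, document) pair and only reports popularity once a query has collected enough clicks. The class name, the `min_clicks` threshold, and the query normalization are assumptions, not something proposed in this thread.

```python
from collections import defaultdict


class ClickthroughLayer:
    """Hypothetical per-query click store kept outside the search index."""

    def __init__(self, min_clicks=3):
        # Queries with fewer total clicks than this are ignored (assumed cutoff).
        self.min_clicks = min_clicks
        self._clicks = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _normalize(query):
        # Treat "Solr " and "solr" as the same query.
        return query.strip().lower()

    def record_click(self, query, doc_id):
        self._clicks[self._normalize(query)][doc_id] += 1

    def popular_docs(self, query):
        """Return doc ids for this query, most clicked first, or [] if too few clicks."""
        per_doc = self._clicks.get(self._normalize(query), {})
        if sum(per_doc.values()) < self.min_clicks:
            return []
        return sorted(per_doc, key=lambda doc_id: (-per_doc[doc_id], doc_id))


layer = ClickthroughLayer(min_clicks=3)
for doc in ("doc1", "doc1", "doc2"):
    layer.record_click("solr", doc)
layer.record_click("lucene", "doc3")

print(layer.popular_docs("Solr "))   # enough clicks: per-query popularity
print(layer.popular_docs("lucene"))  # below the threshold: no popularity data
```

The ids returned by such a layer could then be used to boost or reorder results for that query only, which keeps the popularity data out of the document fields.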

Thank you
Mitch
-- 
View this message in context: 
http://old.nabble.com/Term-Dictionary-%2B-scoring-tp27174862p27187981.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: OverlappingFileLockException when using <str name="replicateAfter">startup</str>

2010-01-16 Thread Joe Kessel

There is no issue here, we had patched our solr to include SOLR-1595 and our 
webapps directory contained two wars.  With only a single war file there is no 
issue with this replication handler.

 

thanks for the quick response.

 

Joe
 
> From: isjust...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: OverlappingFileLockException when using <str name="replicateAfter">startup</str>
> Date: Fri, 15 Jan 2010 18:36:13 -0600
> 
> 
> I am using the example solrconfig.xml with only a few changes. Mainly the 
> replication section for the master has been changed. I am not using any 
> plugins that I am aware of. Here is my replication section:
> 
> 
> 
> 
> 
> <str name="replicateAfter">startup</str>
> <str name="replicateAfter">optimize</str>
> 
> 
> 
> 
> 
> If this is valid, then I will open a jira issue.
> 
> 
> 
> Thanks,
> 
> Joe
> 
> > Date: Fri, 15 Jan 2010 19:06:15 -0500
> > Subject: Re: OverlappingFileLockException when using <str name="replicateAfter">startup</str>
> > From: yo...@lucidimagination.com
> > To: solr-user@lucene.apache.org
> > 
> > Interesting... this should be impossible.
> > Unless there is a bug in Lucene's NativeFSLock (and it doesn't look
> > like it), the only way I see that this could happen is if there were
> > multiple instances of that class loaded in different classloaders.
> > Are you using any kind of plugins?
> > 
> > Could you open a JIRA issue for this?
> > 
> > -Yonik
> > http://www.lucidimagination.com
> > 
> > 
> > 
> > On Fri, Jan 15, 2010 at 5:50 PM, Joe Kessel  wrote:
> > >
> > > I have an instance of Solr that won't start since I added replicateAfter
> > > startup to the replication config. I am using Solr 1.4
> > > and only see this with my index that contains 200k documents with a total
> > > size of 400MB. When I remove the replicateAfter startup setting, the
> > > instance starts without error. We found that we needed replicateAfter
> > > startup because there was no version information on the master after
> > > restarting the instance. Is there something special that needs to be done
> > > when using replicateAfter startup? Or is this a bug?
> > >
> > >
> > >
> > > below is the solr portion of the stacktrace.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Joe
> > >
> > >
> > >
> > > INFO: QuerySenderListener sending requests to searc...@5a425eb9 main
> > > Jan 15, 2010 5:29:46 PM org.apache.solr.common.SolrException log
> > > SEVERE: java.nio.channels.OverlappingFileLockException
> > > at sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1170)
> > > at sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1072)
> > > at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:878)
> > > at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
> > > at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:233)
> > > at org.apache.lucene.store.Lock.obtain(Lock.java:73)
> > > at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
> > > at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
> > > at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
> > > at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
> > > at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
> > > at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
> > > at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:845)
> > > at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486)
> > > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
> > > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
> > > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
> > > at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
> > > at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> > > at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> > >
> 
  

Re: How to start using Solr

2010-01-16 Thread nfire


Glad that I found this thread, I was searching for this issue throughout the
forum. 

Does this mean that if I host my website on a Virtual Private Server, it
will be okay to ask my hosting provider for Windows Server (since the
website was developed using asp.net) with Apache Tomcat installed? Are there
any other requirements that I should ask for?

Thank you.
-- 
View this message in context: 
http://old.nabble.com/How-to-start-using-Solr-tp25738958p27189121.html
Sent from the Solr - User mailing list archive at Nabble.com.



Fundamental questions of how to build up solr for huge portals

2010-01-16 Thread Peter

Hello!

Our team wants to use Solr for a community portal made up of three or
more sub-portals. We are unsure how we should build the overall
architecture, because we have more than one portal and we want to make
them all connected and searchable by Solr. Could some experts help us with
these questions?


- What's the best way to use Solr to get the best performance for a huge
portal with >5000 users that might expand quickly?
- Which client should we use (Java, PHP...)? The portal is currently almost
entirely PHP/MySQL based, but we want to make Solr as good as it can be in
every respect (performance, accessibility, good programming practice, using
the full feature set of Lucene, such as tagging, faceting and so on...)



We are thankful for every suggestion :)

Thanks,
Peter


Re: Fundamental questions of how to build up solr for huge portals

2010-01-16 Thread MitchK

Hello Peter,

well, I am no expert on Solr, but what you want to do sounds like a case for
several SolrCores [1].
I am thinking of one core per portal and one super-core to search over all
portals.
This would be redundant, since some information would be stored two or more
times.
Another way would be to build one super-index.
In your schema you have to define a field (let's call it "portal") that records
which portal each "row" belongs to.
If you are searching for content from the news portal, you have to filter on
portal:news and so on.
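
As a rough sketch of the single-index idea, a request restricted to one portal could be built like this. The base URL (`http://localhost:8983/solr/select`) and the field name `portal` are assumptions for illustration; `q`, `fq` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed location of a single Solr instance; adjust to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_portal_query(user_query, portal=None):
    """Return a select URL, optionally restricted to one portal.

    `portal` is a value of the hypothetical "portal" field in the schema.
    """
    params = [("q", user_query), ("wt", "json")]
    if portal is not None:
        # A filter query narrows the result set without changing the scores.
        params.append(("fq", "portal:" + portal))
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_portal_query("solr faceting", portal="news"))  # news portal only
print(build_portal_query("solr faceting"))                 # all portals
```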

Just some thoughts.

Kind regards from Germany
Mitch

[1]http://wiki.apache.org/solr/CoreAdmin



Peter Gabriel wrote:
> 
> Hello!
> 
> Our team wants to use Solr for a community portal made up of three or
> more sub-portals. We are unsure how we should build the overall
> architecture, because we have more than one portal and we want to make
> them all connected and searchable by Solr. Could some experts help us with
> these questions?
> 
> - What's the best way to use Solr to get the best performance for a huge
> portal with >5000 users that might expand quickly?
> - Which client should we use (Java, PHP...)? The portal is currently almost
> entirely PHP/MySQL based, but we want to make Solr as good as it can be in
> every respect (performance, accessibility, good programming practice, using
> the full feature set of Lucene, such as tagging, faceting and so on...)
> 
> 
> We are thankful for every suggestion :)
> 
> Thanks,
> Peter
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Fundamental-questions-of-how-to-build-up-solr-for-huge-portals-tp27189739p27189905.html
Sent from the Solr - User mailing list archive at Nabble.com.



java heap space error when faceting

2010-01-16 Thread Matt Mitchell
I have an index with more than 6 million docs. All is well, until I turn on
faceting and specify a facet.field. There are only about 20 unique values for
this particular facet throughout the entire index. I was able to make things
a little better by using facet.method=enum. That seems to work, until I add
another facet.field to the request, which is another facet that doesn't have
that many unique values. I ultimately end up running out of heap space.
I should also mention that in every case, the "rows" param is set to
0.

I've thrown as much memory as I can at the JVM (+3G for start-up and max),
tweaked filter cache settings etc.. I can't seem to get this error to go
away. Anyone have any tips to throw my way?

-- using a recent nightly build of Solr 1.5 dev and Jetty as my servlet
container.

Thanks!
Matt


Re: OverlappingFileLockException when using <str name="replicateAfter">startup</str>

2010-01-16 Thread Yonik Seeley
On Sat, Jan 16, 2010 at 7:38 AM, Joe Kessel  wrote:
> There is no issue here, we had patched our solr to include SOLR-1595 and our 
> webapps directory contained two wars.  With only a single war file there is 
> no issue with this replication handler.

Thanks Joe, for now I've added a note to the example solrconfig.xml
about "native" not working for multiple solr webapps in the same JVM.

-Yonik
http://www.lucidimagination.com


Re: java heap space error when faceting

2010-01-16 Thread Yonik Seeley
On Sat, Jan 16, 2010 at 10:01 AM, Matt Mitchell  wrote:
> I have an index with more than 6 million docs. All is well, until I turn on
> faceting and specify a facet.field. There are only about 20 unique values for
> this particular facet throughout the entire index.

Hmmm, that doesn't sound right... unless you're already near max
memory usage due to other things.
Is this a single-valued or multi-valued field?  If multi-valued, how
many values does each document have on average?

-Yonik
http://www.lucidimagination.com


Re: java heap space error when faceting

2010-01-16 Thread Matt Mitchell
These are single valued fields. Strings and integers. Is there more specific
info I could post to help diagnose what might be happening?
Thanks!
Matt

On Sat, Jan 16, 2010 at 10:42 AM, Yonik Seeley
wrote:

> On Sat, Jan 16, 2010 at 10:01 AM, Matt Mitchell wrote:
> > I have an index with more than 6 million docs. All is well, until I turn on
> > faceting and specify a facet.field. There are only about 20 unique values
> > for this particular facet throughout the entire index.
>
> Hmmm, that doesn't sound right... unless you're already near max
> memory usage due to other things.
> Is this a single-valued or multi-valued field?  If multi-valued, how
> many values does each document have on average?
>
> -Yonik
> http://www.lucidimagination.com
>


Re: java heap space error when faceting

2010-01-16 Thread Yonik Seeley
On Sat, Jan 16, 2010 at 11:04 AM, Matt Mitchell  wrote:
> These are single valued fields. Strings and integers. Is there more specific
> info I could post to help diagnose what might be happening?

Faceting on either should currently take ~24MB (6M docs @ 4 bytes per
doc + size_of_unique_values)
With that small number of values, facet.enum may be faster in general
(and take up less room: 6M/8*20 or 15MB).
But you certainly shouldn't be running out of space with the heap
sizes you mentioned.
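
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the helper names are made up, the size of the unique values is ignored as negligible, and the "one bit per document per unique value" reading of the 6M/8*20 figure is an interpretation of the numbers quoted in this message.

```python
NUM_DOCS = 6_000_000
UNIQUE_VALUES = 20


def field_cache_bytes(num_docs, bytes_per_doc=4):
    # One 4-byte entry per document; the handful of unique values is ignored.
    return num_docs * bytes_per_doc


def enum_bitset_bytes(num_docs, unique_values):
    # One bit per document for each unique value, i.e. num_docs / 8 bytes each.
    return num_docs // 8 * unique_values


print(field_cache_bytes(NUM_DOCS) / 1e6, "MB")                 # 24.0 MB
print(enum_bitset_bytes(NUM_DOCS, UNIQUE_VALUES) / 1e6, "MB")  # 15.0 MB
```

Either figure is tiny compared with a 3G heap, which is why the out-of-memory error points at something other than the facet data itself.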

Perhaps look at the stats.jsp page in the admin and see what's listed
in the fieldCache?
And verify that your heap is really as big as you think it is.
You can also use something like jconsole that ships with the JDK to
manually do a GC and check out how much of the heap is in use before
you try to facet.

-Yonik
http://www.lucidimagination.com


Re: Stripping Punctuation in a fieldType

2010-01-16 Thread Chris Hostetter

: Subject: Stripping Punctuation in a fieldType
: In-Reply-To: <27179780.p...@talk.nabble.com>
: References:
: 
: <27178423.p...@talk.nabble.com> <27179780.p...@talk.nabble.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss



Re: Index Courruption after replication by new Solr 1.4 Replication

2010-01-16 Thread Chris Hostetter

: Subject: Index Courruption after replication by new Solr 1.4 Replication
: References: <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
:  <667725.5147...@web52905.mail.re2.yahoo.com>
:  <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
:  <359a92831001151042n73a47daby46ee728a86bb...@mail.gmail.com>
:  <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
:  <359a92831001151131o10f71619se49d66bea6fe5...@mail.gmail.com>
:  <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
: In-Reply-To: <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss



Re: Errors when registering MBeans

2010-01-16 Thread Chris Hostetter

: MBeans. I have tried to deploy it without generating MBeans but with
: no luck.

First off, a quick solution: if you don't care about using JMX to monitor 
Solr, just completely remove the "<jmx />" config option from 
solrconfig.xml.  That should eliminate all attempts by Solr to register 
MBeans at all.

If you do care about JMX or are interested in helping diagnose this 
further...

: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: searcher
: 
: javax.management.InstanceAlreadyExistsException: 
: solr:cell=WC_default_cell,type=searcher,node=WC_default_node,process=server1,id=org.apache.solr.search.SolrIndexSearcher

...at first glance, this seems like it *might* be because the "current" 
index searcher is in fact tracked twice by Solr: once using a unique name, 
and once using a generic name ("searcher" I believe) ... however I've 
never seen this cause a problem with any JmxMBeanServers before -- You can 
definitely get problems if you attempt to register the same bean with 
the same name more than once, but unique names aren't supposed to be a 
problem.

A quick skim of Google results for InstanceAlreadyExistsException seems to 
bear this out, and even if there was a disconnect between your (IBM) 
MBeanServer impl and Solr's use of JMX on this point, it still wouldn't 
explain the rest of these errors below.

Could you try using some JMX tools to query your servlet container to see 
what is/isn't registered?

...

: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: fieldValueCache
: 
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=fieldValueCache,node=WC_default_node,process=server1,id=org.apache.solr.search.FastLRUCache

: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: filterCache
: 
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=filterCache,node=WC_default_node,process=server1,id=org.apache.solr.search.FastLRUCache

: [1/15/10 10:15:04:897 CET] 046e JmxMonitoredM W
: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: queryResultCache
: 
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=queryResultCache,node=WC_default_node,process=server1,id=org.apache.solr.search.LRUCache

: [1/15/10 10:15:04:897 CET] 046e JmxMonitoredM W
: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: documentCache
: 
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=documentCache,node=WC_default_node,process=server1,id=org.apache.solr.search.LRUCache




-Hoss



Re: only use sorting when there's no "q" is "*:*"?

2010-01-16 Thread Chris Hostetter

: It uses the doc insertion order by default.

Strictly speaking: it sorts by score, and when multiple docs have 
identical scores, the secondary sorting is undefined (as an implementation 
detail it is _usually_ doc insertion order, but that's not really 
guaranteed).

As for your original question...

: > > > Is it possible to set up Solr such that when there's no query
: > > > (client would send in "*:*" for "q"), Solr would sort results
: > > > (basically all the documents) by date or some other criterion.

why not use:   sort = score desc, myDateField asc
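
Why this works can be illustrated without Solr at all. The sketch below only simulates the ordering in plain Python with made-up documents and scores; it assumes that a "*:*" query gives every document the same score, so the secondary sort key decides the order, while for a real keyword query the score still comes first.

```python
def sort_score_desc_date_asc(docs):
    # Mimics "sort = score desc, myDateField asc"; ISO dates sort as strings.
    return sorted(docs, key=lambda d: (-d["score"], d["myDateField"]))


# "*:*": every document has the same score, so only the date matters.
match_all = [
    {"id": "a", "score": 1.0, "myDateField": "2010-01-12"},
    {"id": "b", "score": 1.0, "myDateField": "2010-01-03"},
    {"id": "c", "score": 1.0, "myDateField": "2010-01-08"},
]

# A keyword query: scores differ, the date only breaks the tie between b and c.
keyword = [
    {"id": "a", "score": 0.9, "myDateField": "2010-01-12"},
    {"id": "b", "score": 0.4, "myDateField": "2010-01-03"},
    {"id": "c", "score": 0.4, "myDateField": "2010-01-08"},
]

print([d["id"] for d in sort_score_desc_date_asc(match_all)])
print([d["id"] for d in sort_score_desc_date_asc(keyword)])
```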


-Hoss



Re: java heap space error when faceting

2010-01-16 Thread Matt Mitchell
I'm embarrassed (but hugely relieved) to say that the script I had for
starting Jetty had a bug in the way it set Java options! So my heap
start/max was always set to the default. I did end up using jconsole and
learned quite a bit from that too.

Thanks for your help Yonik :)

Matt

On Sat, Jan 16, 2010 at 11:13 AM, Yonik Seeley
wrote:

> On Sat, Jan 16, 2010 at 11:04 AM, Matt Mitchell wrote:
> > These are single valued fields. Strings and integers. Is there more
> > specific info I could post to help diagnose what might be happening?
>
> Faceting on either should currently take ~24MB (6M docs @ 4 bytes per
> doc + size_of_unique_values)
> With that small number of values, facet.enum may be faster in general
> (and take up less room: 6M/8*20 or 15MB).
> But you certainly shouldn't be running out of space with the heap
> sizes you mentioned.
>
> Perhaps look at the stats.jsp page in the admin and see what's listed
> in the fieldCache?
> And verify that your heap is really as big as you think it is.
> You can also use something like jconsole that ships with the JDK to
> manually do a GC and check out how much of the heap is in use before
> you try to facet.
>
> -Yonik
> http://www.lucidimagination.com
>


Re: How to start using Solr

2010-01-16 Thread Lance Norskog
Java 1.6. Also decide whether you need 32-bit Java (limited to about 2G of
JVM heap) or 64-bit. You will also want some kind of log file rolling or
size control.

On Sat, Jan 16, 2010 at 4:56 AM, nfire  wrote:
>
>
> Glad that I found this thread, I was searching for this issue throughout the
> forum.
>
> Does this mean that if I host my website on a Virtual Private Server, it
> will be okay to ask my hosting provider for Windows Server (since the
> website was developed using asp.net) with Apache Tomcat installed? Are there
> any other requirements that I should ask for?
>
> Thank you.
> --
> View this message in context: 
> http://old.nabble.com/How-to-start-using-Solr-tp25738958p27189121.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Fundamental questions of how to build up solr for huge portals

2010-01-16 Thread Sven Maurmann

Hi!

Your question is quite general in nature, therefore here are only a few
initial remarks on how to get started:

If you want to have a global search over all of your portals it might be
best to start with one Solr instance and access it from all the portals.
If you plan to build collections that are special to one or another portal
you can do so during index-time: Just mark the indexed object in a dedicated
field of the index.

If you provide query handlers for each of the portals you can control the
behaviour of the search based on the respective portal. You may then use
query filters to filter results based on the portal.

So much for the server side. For your question about which client (language)
to use:

Since Solr is able to generate responses for a number of client platforms
you may want to consult http://wiki.apache.org/solr/IntegratingSolr for
additional information. I like to use a very lightweight solution using
JavaScript, with the query responses from Solr being delivered via JSON.
Since you can also do this for PHP clients, you might want to give it a
try.

Regards,

Sven


--On Saturday, 16 January 2010 15:16 +0100 Peter wrote:


Hello!

Our team wants to use Solr for a community portal made up of three or
more sub-portals. We are unsure how we should build the overall
architecture, because we have more than one portal and we want to make
them all connected and searchable by Solr. Could some experts help us with
these questions?

- What's the best way to use Solr to get the best performance for a huge
portal with >5000 users that might expand quickly?
- Which client should we use (Java, PHP...)? The portal is currently almost
entirely PHP/MySQL based, but we want to make Solr as good as it can be in
every respect (performance, accessibility, good programming practice, using
the full feature set of Lucene, such as tagging, faceting and so on...)


We are thankful for every suggestion :)

Thanks,
Peter




--
kippdata informationstechnologie GmbH
Sven Maurmann       Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn        sven.maurm...@kippdata.de

HRB 8018, Amtsgericht Bonn (district court) / VAT ID no. DE 196 457 417
Managing directors: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann



Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-16 Thread David MARTIN
I'm really interested in reading the answer to this thread, as my problem is
much the same. The main difference is probably the huge number of SKUs per
product that I may have.


David

On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor  wrote:

>
> Hoss,
>
> Would you suggest using dedup for my use case; and if so, do you know of a
> working example I can reference?
>
> I don't have an issue using the patched version of Solr, but I'd much
> rather use the GA version.
>
> -Kelly
>
>
>
> hossman wrote:
> >
> >
> > : Dedupe is completely the wrong word. Deduping is something else
> > : entirely - it is about trying not to index the same document twice.
> >
> > Dedup can also certainly be used with field collapsing -- that was one of
> > the initial use cases identified for the SignatureUpdateProcessorFactory
> > ... you can compute an 'expensive' signature when adding a document,
> > index it, and then FieldCollapse on that signature field.
> >
> > This gives you "query time deduplication" based on a value computed when
> > indexing (the canonical example is multiple URLs referencing the "same"
> > content but with slightly different boilerplate markup).  You can use a
> > Signature class that recognizes the boilerplate and computes an identical
> > signature value for each URL whose content is "the same" but still index
> > all of the URLs and their content as distinct documents ... so use cases
> > where people only want "distinct" URLs work using field collapse, but by
> > default all matching documents can still be returned and searches on text
> > in the boilerplate markup also still work.
> >
> >
> > -Hoss
> >
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
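
The signature-plus-collapse idea quoted above can be sketched in plain Python. This is not Solr's SignatureUpdateProcessorFactory or the field collapsing patch; it is a stand-in that assumes boilerplate lives in `<nav>` and `<footer>` tags, uses SHA-256 as the signature, and uses made-up example.com URLs.

```python
import hashlib
import re

# Assumed definition of "boilerplate": anything inside <nav> or <footer>.
BOILERPLATE = re.compile(r"<nav>.*?</nav>|<footer>.*?</footer>", re.S)


def signature(content):
    """Index time: hash the content with boilerplate and extra whitespace removed."""
    core = " ".join(BOILERPLATE.sub("", content).split())
    return hashlib.sha256(core.encode("utf-8")).hexdigest()


def index(pages):
    # Every URL is still stored as its own document, with a signature field.
    return [{"url": url, "content": text, "sig": signature(text)} for url, text in pages]


def collapse(docs, field="sig"):
    """Query time: keep only the first document for each signature value."""
    first_seen = {}
    for doc in docs:
        first_seen.setdefault(doc[field], doc)
    return list(first_seen.values())


docs = index([
    ("http://a.example.com/x", "<nav>A</nav> Solr rocks <footer>a</footer>"),
    ("http://b.example.com/x", "<nav>B</nav> Solr   rocks <footer>b</footer>"),
    ("http://c.example.com/y", "<nav>C</nav> Lucene too"),
])

print(len(docs))                          # all three URLs are indexed
print([d["url"] for d in collapse(docs)])  # collapsed view: one URL per signature
```

Because collapsing happens only at query time, a search that skips `collapse` still sees all three documents, including the text in their boilerplate.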


Re: recent query execution cache in Solr

2010-01-16 Thread Chris Hostetter

: Yes, it's the cache.  But not document/query/filter cache, but http 
: cache.  Yes, you can disable it in solrconfig.xml

Specifically: it is (probably) your browser cache, as Solr doesn't cache 
anything between restarts.

Info about disabling (or changing the rules for) HTTP caching can be found 
here...

http://wiki.apache.org/solr/SolrConfigXml#HTTP_Caching


-Hoss