Re: SolrCloud and external file fields

2012-11-21 Thread Martin Koch
On Wed, Nov 21, 2012 at 7:08 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> On Wed, Nov 21, 2012 at 2:07 AM, Martin Koch  wrote:
>
> >  I'm not sure about the mmap directory or where that
> > would be configured in solr - can you explain that?
> >
>
> You can check it at Solr Admin/Statistics/core/searcher/stats/readerDir;
> it should be org.apache.lucene.store.MMapDirectory.
>
It says
'org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@'

/Martin

--
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>


RE: Reduce QueryComponent prepare time

2012-11-21 Thread Markus Jelsma
Hi Mikhail,

Thanks for sharing your experiences. I'll look into the flexible query parser.

Markus
 
 
-Original message-
> From:Mikhail Khludnev 
> Sent: Tue 20-Nov-2012 19:53
> To: solr-user@lucene.apache.org
> Subject: Re: Reduce QueryComponent prepare time
> 
> Markus,
> 
> It seems you have faced the challenge of optimizing the complex eDisMax code
> for your particular use case, which is not so common. I cannot help with that
> coding; I can just share some experience: we have mind-blowing queries too -
> they span many fields and enumerate many phrase shingles. We have a similar
> counterintuitive hot spot - query parsing takes more time than searching and
> faceting. But in our case, dictionary lookups - i.e. term substitutions and
> transformations - are the main CPU consumers. We built our own query
> parser with something like
> http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html.
> Representing the core query structure as a DOM-like node skeleton and then
> transforming it into concrete query instances *might be more performant*
> (and *might not be* for you) than the current eDismax.
> Nothing more useful from me.
> 
> Bye.
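
As a concrete illustration of the node-based pipeline described above, here is
a minimal sketch against the Lucene 4.0 flexible query parser API; the field
name and query string are placeholders:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
    import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class FlexibleParserSketch {
      public static void main(String[] args) throws QueryNodeException {
        // The flexible parser first builds a tree of query nodes, runs a
        // processor pipeline over that tree (this is where term substitution
        // and transformation could be plugged in), and only then builds
        // concrete Query instances from the processed tree.
        StandardQueryParser parser =
            new StandardQueryParser(new StandardAnalyzer(Version.LUCENE_40));
        Query q = parser.parse("title:(matrix AND reloaded)", "title");
        System.out.println(q);
      }
    }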
> 
> 
> On Tue, Nov 20, 2012 at 7:01 PM, Markus Jelsma
> wrote:
> 
> > Hi,
> >
> > Profiling pointed me directly to the method I already suspected:
> > ExtendedDismaxQParser.parse(). I added manual timers to parts of the method
> > and made sure the timers add up to the QueryComponent prepare time. After
> > starting Solr there's one small part taking almost 100ms on a fast machine
> > with lots of memory; fortunately this happens only once. KStemmer, the
> > loading of the KStemData, and the ThaiWordFilter's init take the bulk of it.
> >
> >   ExtendedSolrQueryParser up =
> > new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
> >   up.addAlias(IMPOSSIBLE_FIELD_NAME,
> > tiebreaker, queryFields);
> >   addAliasesFromRequest(up, tiebreaker);
> >   up.setPhraseSlop(qslop); // slop for explicit user phrase queries
> >   up.setAllowLeadingWildcard(true);
> >
> > After it's been running for some time, two parts continue to take a lot of
> > time: parsing the query
> >
> >   if (parsedUserQuery == null) {
> > sb = new StringBuilder();
> > for (Clause clause : clauses) {
> >
> > 
> >
> > if (parsedUserQuery instanceof BooleanQuery) {
> >   BooleanQuery t = new BooleanQuery();
> >   SolrPluginUtils.flattenBooleanQuery(t,
> > (BooleanQuery)parsedUserQuery);
> >   SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
> >   parsedUserQuery = t;
> > }
> >   }
> >
> > and handling the phrase fields (pf, pf2, pf3):
> >
> >   if (allPhraseFields.size() > 0) {
> > // full phrase and shingles
> > for (FieldParams phraseField: allPhraseFields) {
> >   Map pf = new HashMap(1);
> >   pf.put(phraseField.getField(),phraseField.getBoost());
> >   addShingledPhraseQueries(query, normalClauses, pf,
> >   phraseField.getWordGrams(),tiebreaker, phraseField.getSlop());
> > }
> >   }
> >
> > The problem is significant when there are a lot of fields: the prepare time
> > is usually higher than the process times of query, highlight, and facet
> > combined.
> >
> >
> >
> > -Original message-
> > > From:Mikhail Khludnev 
> > > Sent: Mon 19-Nov-2012 12:52
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Reduce QueryComponent prepare time
> > >
> > > Markus,
> > >
> > > It's hard to suggest anything until you provide a profiler snapshot which
> > > shows what prepare spends its time on. As far as I know, prepare is where
> > > queries get parsed; e.g. we have really heavy query parsers, but I don't
> > > think that's really common.
> > >
> > >
> > > On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma
> > > wrote:
> > >
> > > > I'd also like to know which parts of the entire query constitute the
> > > > prepare time and if it would matter significantly if we extend the
> > edismax
> > > > plugin and hardcode the parameters we pass into (reusable) objects.
> > > >
> > > > Thanks,
> > > > Markus
> > > >
> > > > -Original message-
> > > > > From:Markus Jelsma 
> > > > > Sent: Fri 16-Nov-2012 15:57
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Reduce QueryComponent prepare time
> > > > >
> > > > > Hi,
> > > > >
> > > > > We're seeing high prepare times for the QueryComponent, obviously due
> > > > > to the vast number of fields and queries. It's common to have a prepare
> > > > > time of 70-80ms while the process times drop significantly thanks to
> > > > > warmed searchers, OS cache etc. The prepare time is a recurring issue,
> > > > > and I'd hope there are people here who can share some thoughts or hints.
> > > > >
> > > > > We're using a recent checkout on a 10-node test cluster with SSDs
> > > > > (althoug

Re: user session id / cookie to record search query

2012-11-21 Thread Paul Libbrecht
Record?
E.g. output the cookie value of a given name in the log?
Provided you use Apache mod_proxy, we do this with a special log format.
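
For illustration, a hedged sketch of what such an httpd log format could look
like - the cookie name (JSESSIONID) and the log name are assumptions:

    # %{NAME}C logs the value of cookie NAME; %r is the request line (the search query)
    LogFormat "%h %t \"%r\" %{JSESSIONID}C" searchlog
    CustomLog logs/search_log searchlog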

paul


Le 21 nov. 2012 à 09:50, Romita Saha a écrit :

> Hi All,
> 
> Does anyone have an idea how to use a user session id / cookie to record
> search queries from that particular user?
> 
> Thanks and regards,
> Romita



Re: user session id / cookie to record search query

2012-11-21 Thread Rafał Kuć
Hello!

You want it to be written into the logs? If that is the case, you can just
add an additional parameter that is not recognized by Solr, for example
'userId', and send a query like this:

http://localhost:8983/solr/select?q=*:*&userId=user1

In the logs you should see something like this:
INFO: [collection1] webapp=/solr path=/select
params={q=*:*&userId=user1} hits=0 status=0 QTime=1

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi All,

> Does anyone have an idea how to use a user session id / cookie to record
> search queries from that particular user?

> Thanks and regards,
> Romita



Re: Solr defining Schema structure trouble.

2012-11-21 Thread denl0
Isn't it possible to combine the document-related values and the page-related
values at query time?

Book1
Page1 with ref to book1
Page2 with ref to book1

When querying, merge all pages: (page1+book1) and (page2+book1). Or would this
be hard to achieve?

I'm pretty sure they want to search on book-related metadata too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-defining-Schema-structure-trouble-tp4020305p4021531.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: user session id / cookie to record search query

2012-11-21 Thread Romita Saha
Hello Rafał Kuć

Thanks a lot for your guidance. I am not quite sure how to collect the
logs. Could you please help?

Romita 



From:   Rafał Kuć 
To: solr-user@lucene.apache.org, 
Date:   11/21/2012 04:57 PM
Subject:Re: user session id / cookie to record search query



Hello!

You want it to be written into the logs? If that is the case, you can just
add an additional parameter that is not recognized by Solr, for example
'userId', and send a query like this:

http://localhost:8983/solr/select?q=*:*&userId=user1

In the logs you should see something like this:
INFO: [collection1] webapp=/solr path=/select
params={q=*:*&userId=user1} hits=0 status=0 QTime=1

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi All,

> Does anyone have an idea how to use a user session id / cookie to record
> search queries from that particular user?

> Thanks and regards,
> Romita 




Re: SolrCloud and external file fields

2012-11-21 Thread Mikhail Khludnev
On Wed, Nov 21, 2012 at 11:53 AM, Martin Koch  wrote:

>
> I wasn't aware until now that it is possible to send a commit to one core
> only. What we observed was the effect of curl
> localhost:8080/solr/update?commit=true but perhaps we should experiment
> with solr/coreN/update?commit=true. A quick trial run seems to indicate
> that a commit to a single core causes commits on all cores.
>
You should see something like this in the log:
... SolrCmdDistributor  Distrib commit to: ...

>
>
> Perhaps I should clarify that we are using SOLR as a black box; we do not
> touch the code at all - we only install the distribution WAR file and
> proceed from there.
>
I still don't understand how you deploy/launch Solr. How many jettys do you
start? Do you have -DzkRun -DzkHost -DnumShards=2, or do you specify the
shards= param for every request and distribute updates yourself? What
collections do you create, and with which settings?


>
>
> > Also, from my POV such deployments should start from at least *16* 4-way
> > vboxes; it's more expensive, but availability should be much better during
> > CPU-consuming operations.
> >
>
> Do you mean that you recommend 16 hosts with 4 cores each? Or 4 hosts with
> 16 cores? Or am I misunderstanding something :) ?
>
I prefer to start from 16 hosts with 4 cores each.


>
>
> > Other details: if you use a single jetty for all of them, are you sure that
> > jetty's threadpool doesn't limit requests? Is it large enough?
> > You have 60G and set -Xmx=10G. Are you sure that the total size of the
> > cores' index directories is less than 45G?
> >
> The total index size is 230 GB, so it won't fit in RAM, but we're using an
> SSD disk to minimize disk access time. We have tried putting the EFF onto a
> ram disk, but this didn't have a measurable effect.
>
> Thanks,
> /Martin
>
>
> > Thanks
> >
> >
> > On Wed, Nov 21, 2012 at 2:07 AM, Martin Koch  wrote:
> >
> > > Mikhail
> > >
> > > PSB
> > >
> > > On Tue, Nov 20, 2012 at 7:22 PM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > Martin,
> > > >
> > > > Please find additional question from me below.
> > > >
> > > > Simone,
> > > >
> > > > I'm sorry for hijacking your thread. The only thing I've heard about it
> > > > at recent ApacheCon sessions is that Zookeeper is supposed to replicate
> > > > those files as configs under the solr home. And I'm really looking
> > > > forward to knowing how it works with huge files in production.
> > > >
> > > > Thank You, Guys!
> > > >
> > > > On 20.11.2012 at 18:06, "Martin Koch" wrote:
> > > > >
> > > > > Hi Mikhail
> > > > >
> > > > > Please see answers below.
> > > > >
> > > > > On Tue, Nov 20, 2012 at 12:28 PM, Mikhail Khludnev <
> > > > > mkhlud...@griddynamics.com> wrote:
> > > > >
> > > > > > Martin,
> > > > > >
> > > > > > Thank you for telling your own "war story". It's really useful for
> > > > > > the community.
> > > > > > The first question might seem naive, but would you tell me
> > > > > > what blocks searching during EFF reload, when it's triggered by a
> > > > > > handler or by a listener?
> > > > > >
> > > > >
> > > > > We continuously index new documents using CommitWithin to get regular
> > > > > commits. However, we observed that the EFFs were not re-read, so we had
> > > > > to do external commits (curl '.../solr/update?commit=true') to force a
> > > > > reload. When this is done, Solr blocks. I can't tell you exactly why
> > > > > it's doing that (it was related to SOLR-3985).
> > > >
> > > > Is there a chance to get a thread dump when they are blocked?
> > > >
> > > >
> > > Well, I could try to recreate the situation. But the setup is fairly simple:
> > > create a large EFF in a largeish index with many shards. Issue a commit,
> > > and then try to do a search. Solr will not respond to the search before the
> > > commit has completed, and this will take a long time.
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > > I don't really get the sentence about sequential commits and number
> > > > > > of cores. Do I get it right that the file is replicated via Zookeeper?
> > > > > > Doesn't it
> > > > > >
> > > > >
> > > > > Again, this is observed behavior. When we issue a commit on a system
> > > > > with many solr cores using EFFs, the system blocks for a long time
> > > > > (15 minutes). We do NOT use zookeeper for anything. The EFF is a symlink
> > > > > from each core's index dir to the actual file, which is updated by an
> > > > > external process.
> > > >
> > > > Hold on, I asked about Zookeeper because the subject mentions SolrCloud.
> > > >
> > > > Do you use SolrCloud, SolrShards, or are these cores just replicas of the
> > > > same index?
> > > >
> > >
> > > Ah - we use solr 4 out of the box, so I guess this is SolrCloud. I'm a bit
> > > unsure about the terminology here, but we've got a single index divided
> > > into 16

Re: user session id / cookie to record search query

2012-11-21 Thread Rafał Kuć
Hello!

Which Solr version are you using? If not 4.0, information on logging can be
found on the wiki - http://wiki.apache.org/solr/SolrLogging

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hello Rafał Kuć

> Thanks a lot for your guidance. I am not quite sure how to collect the
> logs. Could you please help?

> Romita 



> From:   Rafał Kuć 
> To: solr-user@lucene.apache.org, 
> Date:   11/21/2012 04:57 PM
> Subject:Re: user session id / cookie to record search query



> Hello!

> You want it to be written into the logs? If that is the case, you can just
> add an additional parameter that is not recognized by Solr, for example
> 'userId', and send a query like this:

> http://localhost:8983/solr/select?q=*:*&userId=user1

> In the logs you should see something like this:
> INFO: [collection1] webapp=/solr path=/select
> params={q=*:*&userId=user1} hits=0 status=0 QTime=1



Recip m parameter to take function value

2012-11-21 Thread Markus Jelsma
Hi,

We need the recip function's m parameter to take other functions, e.g.
recip(dateField, div(1,prod(1,2)), 1, 1), but ValueSourceParser wants to read a
float instead. How could we modify either Solr or Lucene to take
functions for that parameter? I've been looking at the various extended
ValueSource classes and FunctionValues classes, but I'm not yet sure if that's
the right place.

ReciprocalFloatFunction wants a float, but can I resolve a function's value to
a float in ValueSourceParser? Or must I do it in ReciprocalFloatFunction?
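
Not an authoritative answer, but one possible direction as a hedged sketch:
since ReciprocalFloatFunction stores m as a constant float, a per-document m
seems to require a custom ValueSource that computes a/(m(doc)*x(doc)+b) itself;
it could then be registered via a custom ValueSourceParser whose parse() calls
fp.parseValueSource() twice instead of fp.parseFloat(). The class name below is
made up:

    import java.io.IOException;
    import java.util.Map;
    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.queries.function.FunctionValues;
    import org.apache.lucene.queries.function.ValueSource;
    import org.apache.lucene.queries.function.docvalues.FloatDocValues;

    // Hypothetical recip variant: a / (m(doc) * x(doc) + b), where m is a ValueSource.
    public class RecipVariableMFunction extends ValueSource {
      private final ValueSource source, m;
      private final float a, b;

      public RecipVariableMFunction(ValueSource source, ValueSource m, float a, float b) {
        this.source = source; this.m = m; this.a = a; this.b = b;
      }

      @Override
      public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
          throws IOException {
        final FunctionValues x = source.getValues(context, readerContext);
        final FunctionValues mVals = m.getValues(context, readerContext);
        return new FloatDocValues(this) {
          @Override
          public float floatVal(int doc) {
            // evaluate m per document instead of using a constant
            return a / (mVals.floatVal(doc) * x.floatVal(doc) + b);
          }
        };
      }

      @Override
      public String description() {
        return "recipvm(" + source.description() + "," + m.description() + "," + a + "," + b + ")";
      }

      @Override
      public boolean equals(Object o) {
        if (!(o instanceof RecipVariableMFunction)) return false;
        RecipVariableMFunction other = (RecipVariableMFunction) o;
        return source.equals(other.source) && m.equals(other.m)
            && a == other.a && b == other.b;
      }

      @Override
      public int hashCode() {
        return source.hashCode() + 31 * m.hashCode()
            + Float.floatToIntBits(a) + 7 * Float.floatToIntBits(b);
      }
    }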

Thanks,
Markus


Re: user session id / cookie to record search query

2012-11-21 Thread Romita Saha
Hi,

Thanks a lot. Will follow the same.

Thanks and regards,
Romita 



From:   Rafał Kuć 
To: solr-user@lucene.apache.org, 
Date:   11/21/2012 05:34 PM
Subject:Re: user session id / cookie to record search query



Hello!

Which Solr version are you using? If not 4.0, information on logging can be
found on the wiki - http://wiki.apache.org/solr/SolrLogging

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hello Rafał Kuć

> Thanks a lot for your guidance. I am not quite sure how to collect the
> logs. Could you please help?

> Romita 



> From:   Rafał Kuć 
> To: solr-user@lucene.apache.org, 
> Date:   11/21/2012 04:57 PM
> Subject:Re: user session id / cookie to record search query



> Hello!

> You want it to be written into the logs? If that is the case, you can just
> add an additional parameter that is not recognized by Solr, for example
> 'userId', and send a query like this:

> http://localhost:8983/solr/select?q=*:*&userId=user1

> In the logs you should see something like this:
> INFO: [collection1] webapp=/solr path=/select
> params={q=*:*&userId=user1} hits=0 status=0 QTime=1




From Solr3.1 to SolrCloud

2012-11-21 Thread roySolr
hello,

We are using solr 3.1 for searching on our webpage right now. We want to use
the nice features of solr 4: realtime search. Our current configuration
looks like this:

Master
Slave1
Slave2
Slave3

We have 3 slaves and 1 master, and the data is replicated every night. In
the future we want to update every ~5 seconds. I was looking at SolrCloud
and got a few questions:

- We aren't using shards because our index only contains 1 mil simple docs.
We only need multiple servers because of the amount of traffic. In the examples
of SolrCloud I see only examples with shards. Is numShards=1 possible? Is one
big index faster than multiple shards? Do I need 1 collection with multiple
nodes?

- Should I run a single zookeeper instance (without solr) on a separate
server?

- Is the DIH still there in solr 4?

Any help is welcome!

Thanks
Roy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to retrieve 20 specific documents

2012-11-21 Thread Dotan Cohen
On Tue, Nov 20, 2012 at 12:45 AM, Shawn Heisey  wrote:
> You can also use this query format:
>
> id:(123 OR 456 OR 789)
>
> This does get expanded internally by the query parser to the format that has
> the field name on every clause, but it is sometimes easier to write code
> that produces the above form.
>

Thank you Shawn, that is much cleaner and will be easier to debug when
/ if things go wrong.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Replication Backup

2012-11-21 Thread Eva Lacy
Hi Otis,

It seems to me that I'm going to have to write a script anyway that
handles the retention of the backups.
Plus it doesn't seem optimal to run a solr instance on that
server, taking up memory, when I could probably
write a script that pulls all the data directly using the replication
handler.
My current thinking is that I could imitate a slave and pull the index
directly from the master by calling the replication
handler with HTTP requests.
Does this seem reasonable?
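
For reference, a hedged sketch of the replication handler commands such a
script could use (host and port are illustrative, and the numberToKeep
parameter depends on the Solr version):

    # Ask the master to snapshot its index into a backup directory
    curl 'http://master:8080/solr/replication?command=backup&numberToKeep=3'

    # Or imitate a slave: find the current index version, list its files,
    # then fetch each file with command=filecontent
    curl 'http://master:8080/solr/replication?command=indexversion'
    curl 'http://master:8080/solr/replication?command=filelist&indexversion=<version>'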

Eva


On Wed, Nov 21, 2012 at 3:29 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Eva,
>
> I think you just need to configure the Solr instance on your Windows and
> point it to your Solr master.  It will then copy the index from the master
> periodically.
> Please see http://search-lucene.com/?q=solr+replication+backup for some
> more info about doing backups - you don't need rsync.  Once you set up
> Windows as described above, you can call the Solr replication handler on it
> and tell it to make an index snapshot, which is basically a copy of the
> index at that point in time.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
>
>
>
>
> On Tue, Nov 20, 2012 at 12:31 PM, Eva Lacy  wrote:
>
> > Hi All,
> >
> > It takes a long time to reindex our database, so I'd like to be able to
> > back up the solr server.
> > I'm running solr 3.6.1 using tomcat on debian squeeze, and I'd like to be
> > able to back up to a
> > windows server that contains the rest of our backups.
> >
> > There isn't much free space on the solr server. Enough for maybe 3 backups.
> > Ideally I would like to be able to back up the index by pulling it, similar
> > to how a solr slave does,
> > but from the windows server. I heard that it uses something like rsync to
> > do that.
> > That way I could create a backup solution that backs up daily for 3 days,
> > then holds onto one of those every 10 days
> > or something similar.
> >
> > Any ideas on this would be appreciated.
> >
> > Eva
> >
>


Re: [Solrj] How can I get unique field name?

2012-11-21 Thread zakaria benzidalmal
That's right.
Thank you Jack.

Cordialement.
__
Zakaria BENZIDALMAL
mobile: 06 31 40 04 33


2012/11/20 Jack Krupansky 

> There is no absolute requirement that a Solr schema have a unique key
> field, so you could get a null value for the field.
>
> -- Jack Krupansky
>
> -Original Message- From: zakaria benzidalmal
> Sent: Tuesday, November 20, 2012 6:02 AM
> To: solr-user@lucene.apache.org
> Subject: Re: [Solrj] How can I get unique field name?
>
>
> Thank you Mikhail,
>
> Yes it does.
> I can access it through the SolrQueryRequest object.
>
> this.uniqueKeyFieldName = req.getSchema().getUniqueKeyField().getName();
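
Given Jack's caveat above that a schema may lack a uniqueKey, a defensive
variant of that line might look like this (a sketch):

    // getUniqueKeyField() returns null when the schema defines no <uniqueKey>
    SchemaField keyField = req.getSchema().getUniqueKeyField();
    this.uniqueKeyFieldName = (keyField == null) ? null : keyField.getName();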
>
>
>
> 2012/11/20 Mikhail Khludnev 
>
>  Hello Zakharia,
>>
>> org.apache.solr.schema.IndexSchema.getUniqueKeyField()
>> Does it help?
>>
>>
>>
>> On Tue, Nov 20, 2012 at 2:40 PM, zakaria benzidalmal > >wrote:
>>
>> > Hi all,
>> >
>> > I am writing a custom query response writer and I would like to handle the
>> > unique key field without knowing its actual name, to stay generic.
>> >
>> > My question is: how can I get the uniqueKey field name of a result
>> > document?
>> >
>> >
>> > Regards.
>> > __
>> > Zakaria BENZIDALMAL
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>


Re: From Solr3.1 to SolrCloud

2012-11-21 Thread Tomás Fernández Löbbe
>
> - We aren't using shards because our index only contains 1 mil simple docs.
> We only need multiple server because the amount of traffic. In the examples
> of solrCloud i see only examples with shards. Is numshards=1 possible? One
> big index is faster than multiple shards? I need 1 collection with multiple
> nodes?
>
Yes, you can use SolrCloud and specify numShards=1. With 1M docs, I
would use one shard too; the overhead of the distribution may be bigger
than the time it takes to process a query on a single node with an index
this size (I do encourage you to test and see, because it usually depends
on more factors than just the index size, but I think 1 shard will be
best).


> - Should i run a single zookeeper instance(without solr) on a seperate
> server?
>
Separate, and even better if you use a 3-node ZK ensemble; otherwise
Zookeeper becomes a single point of failure.

>
> - Is the DIH still there in solr 4?
>
Yes, you'll see it in all nodes, and you can run it from any of them. That
said, you may see some improvements if you execute the DIH on the leader
node (which may not always be the same). I don't think the
dataimport.properties gets distributed though, you may have to figure that
out.


Tomás

>
> Any help is welcome!
>
> Thanks
> Roy
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SolrCloud and external file fields

2012-11-21 Thread Martin Koch
Mikhail,

PSB

On Wed, Nov 21, 2012 at 10:08 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> On Wed, Nov 21, 2012 at 11:53 AM, Martin Koch  wrote:
>
> >
> > I wasn't aware until now that it is possible to send a commit to one core
> > only. What we observed was the effect of curl
> > localhost:8080/solr/update?commit=true but perhaps we should experiment
> > with solr/coreN/update?commit=true. A quick trial run seems to indicate
> > that a commit to a single core causes commits on all cores.
> >
> You should see something like this in the log:
> ... SolrCmdDistributor  Distrib commit to: ...
>
> Yup, a commit towards a single core results in a commit on all cores.


> >
> >
> > Perhaps I should clarify that we are using SOLR as a black box; we do not
> > touch the code at all - we only install the distribution WAR file and
> > proceed from there.
> >
> I still don't understand how you deploy/launch Solr. How many jettys do you
> start? Do you have -DzkRun -DzkHost -DnumShards=2, or do you specify the
> shards= param for every request and distribute updates yourself? What
> collections do you create, and with which settings?
>
We let SOLR do the sharding, using one collection with 16 SOLR cores
holding one shard each. We launch only one instance of jetty with the
following arguments:

-DnumShards=16
-DzkHost=
-Xmx10G
-Xms10G
-Xmn2G
-server

Would you like to see the solrconfig.xml?

/Martin


> >
> >
> > > Also, from my POV such deployments should start from at least *16* 4-way
> > > vboxes; it's more expensive, but availability should be much better during
> > > CPU-consuming operations.
> > >
> >
> > Do you mean that you recommend 16 hosts with 4 cores each? Or 4 hosts
> with
> > 16 cores? Or am I misunderstanding something :) ?
> >
> I prefer to start from 16 hosts with 4 cores each.
>
>
> >
> >
> > > Other details: if you use a single jetty for all of them, are you sure
> > > that jetty's threadpool doesn't limit requests? Is it large enough?
> > > You have 60G and set -Xmx=10G. Are you sure that the total size of the
> > > cores' index directories is less than 45G?
> > >
> > The total index size is 230 GB, so it won't fit in RAM, but we're using an
> > SSD disk to minimize disk access time. We have tried putting the EFF onto a
> > ram disk, but this didn't have a measurable effect.
> >
> > Thanks,
> > /Martin
> >
> >
> > > Thanks
> > >
> > >
> > > On Wed, Nov 21, 2012 at 2:07 AM, Martin Koch  wrote:
> > >
> > > > Mikhail
> > > >
> > > > PSB
> > > >
> > > > On Tue, Nov 20, 2012 at 7:22 PM, Mikhail Khludnev <
> > > > mkhlud...@griddynamics.com> wrote:
> > > >
> > > > > Martin,
> > > > >
> > > > > Please find additional question from me below.
> > > > >
> > > > > Simone,
> > > > >
> > > > > I'm sorry for hijacking your thread. The only thing I've heard about
> > > > > it at recent ApacheCon sessions is that Zookeeper is supposed to
> > > > > replicate those files as configs under the solr home. And I'm really
> > > > > looking forward to knowing how it works with huge files in production.
> > > > >
> > > > > Thank You, Guys!
> > > > >
> > > > > On 20.11.2012 at 18:06, "Martin Koch" wrote:
> > > > > >
> > > > > > Hi Mikhail
> > > > > >
> > > > > > Please see answers below.
> > > > > >
> > > > > > On Tue, Nov 20, 2012 at 12:28 PM, Mikhail Khludnev <
> > > > > > mkhlud...@griddynamics.com> wrote:
> > > > > >
> > > > > > > Martin,
> > > > > > >
> > > > > > > Thank you for telling your own "war story". It's really useful for
> > > > > > > the community.
> > > > > > > The first question might seem naive, but would you tell me
> > > > > > > what blocks searching during EFF reload, when it's triggered by a
> > > > > > > handler or by a listener?
> > > > > > >
> > > > > >
> > > > > > We continuously index new documents using CommitWithin to get regular
> > > > > > commits. However, we observed that the EFFs were not re-read, so we
> > > > > > had to do external commits (curl '.../solr/update?commit=true') to
> > > > > > force a reload. When this is done, Solr blocks. I can't tell you
> > > > > > exactly why it's doing that (it was related to SOLR-3985).
> > > > >
> > > > > Is there a chance to get a thread dump when they are blocked?
> > > > >
> > > > >
> > > > Well, I could try to recreate the situation. But the setup is fairly
> > > > simple: create a large EFF in a largeish index with many shards. Issue a
> > > > commit, and then try to do a search. Solr will not respond to the search
> > > > before the commit has completed, and this will take a long time.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > > I don't really get the sentence about sequential commits and number
> > > > > > > of cores. Do I get it right that the file is replicated via
> > > > > > > Zookeeper? Doesn't it
> > > > > > >
> > > > > >
> > > > > > Again, this is observed behavior

Single Tomcat Multiple Shards

2012-11-21 Thread Cool Techi
Hey Guys,

We are experimenting with solr cloud. This is what we want to set up:

- 2 machines, each having 8 master shards, so 16 shards in total. The
  assumption is that we want to store approximately 4-5 TB of data over a
  period of 1 year or so.
- A replication factor of 1, with the replicas again distributed across 3-4
  machines.

Initially we want to start with 8 shards in a single Tomcat on a single
machine, but I cannot find a way of having multiple shards in a single
SOLR_HOME and a single Tomcat. Can this be achieved?

Regards,
Ayush

SolrCloud and external Zookeeper ensemble

2012-11-21 Thread Marcin Rzewucki
Hi,

I have 4 solr collections, 2-3mn documents per collection, with up to 100K
updates per collection daily (roughly). I'm going to create a SolrCloud of 4
Amazon m1.large instances (7GB mem, 2x2.4GHz cpu each). The question is:
what about zookeeper? It's going to be an external ensemble, but is it better
to use the same nodes as solr or dedicated micro instances? Zookeeper does not
seem to be a resource-demanding process, but what would be better in this
case? To keep it inside the solrcloud, or separate (micro instances seem
to be enough here)?

Thanks in advance.
Regards.


Re: From Solr3.1 to SolrCloud

2012-11-21 Thread roySolr
Thanks Tomás,

I will use numShards=1. Are there instructions on how to install only
zookeeper on a separate server? Or do I have to install solr 4 on that
server?

How do I make the connection between the solr instances and the zk
instance (server)?

Thanks so far,

Roy




--
View this message in context: 
http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536p4021583.html
Sent from the Solr - User mailing list archive at Nabble.com.


[SolrCloud] is softcommit cluster-wide for the collection ?

2012-11-21 Thread GIROLAMI Philippe
Hello,
We're working on integrating SolrCloud and we're wondering whether issuing a
softCommit via Solrj forces the soft commit:

a) only on the receiving core, or
b) on the whole cluster, with the receiving core forwarding the soft commit to
all replicas.

If the answer is a), what is the best practice to ensure data is indeed
committed cluster-wide?
If the answer is b), what would happen on a 1-replica setup if one commit
succeeded and the replica commit failed?

Thanks
Philippe Girolami


solr autocomplete

2012-11-21 Thread sasho
Hi all,

I'm using apache-solr-4.0.0 and the autocomplete feature. In general it
works fine, but I still have two problems which I can't solve. I
need the autocomplete to show movie titles.

1. The first thing is that the autocomplete search ignores all characters
after the space.
For example, when I search for the word "the" I get "the matrix" and "the
girl next door". And when I type "the m", instead of getting only "the
matrix", the server keeps returning the previous two results. It seems that
the search has been processed only for the first token.
I read a post here which says that the solution can be implementing a custom
QueryConverter for it, but I also saw that there is a SuggestQueryConverter
which may do the same?

2. The second problem is configuring autocompletion to suggest not
only movie titles starting with the searched string but also titles which
have this string in the middle of the title.
For example, when I type "tibet" I'd like to get suggestions not only for
"tibet and tibetan" but also for "seven years in tibet". Is this possible at
all?

Here is my configuration:

schema.xml

[field type and field definitions stripped by the mailing list archive]

config.xml

  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">title_autocomplete</str>
      <str name="queryAnalyzerFieldType">title</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">false</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

Thank you.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-autocomplete-tp4021587.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: From Solr3.1 to SolrCloud

2012-11-21 Thread Tomás Fernández Löbbe
>
> I will use numShards=1. Are there instructions on how to install only
> zookeeper on a separate server? Or do I have to install solr 4 on that
> server?
>

You don't need to install Solr in that server. See
http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html


>
> How do I make the connection between the solr instances and the zk
> instance (server)?
>

With -DzkHost=host:port, as described on the SolrCloud wiki page - but now
you have to set it on all the Solr instances, and none of them should use
"-DzkRun".
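
For illustration, each Solr node would then be started along these lines
(hostnames, ports, and shard count are placeholders):

    java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -DnumShards=1 -jar start.jar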

Tomás


> Thanks so far,
>
> Roy
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536p4021583.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr 4 Admin UI Dashboard Not Populating

2012-11-21 Thread richardg
Our Admin UI Dashboard is not populating on one of our servers; we're not sure
if it is a permission issue or what. We have three other servers on which it is
working.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-Admin-UI-Dashboard-Not-Populating-tp4021602.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Copying few field using copyField to non multiValued field

2012-11-21 Thread Barry Galaxy
I would also like to copy a few fields to a single-valued field.
My reasoning for this is to then perform an exact-match search on the
concatenated field.
E.g.

full_name = first_name + last_name

I would then like to search:

full_name:"john foo"

but copyField is making the full_name field look like this:

john
foo

which is not working for the exact match...

ideas?
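
For what it's worth, one way to build the concatenated value at index time is
a hedged sketch like the following, assuming the clone/concat update processors
available since Solr 4.0 (the chain name and delimiter are illustrative):

    <updateRequestProcessorChain name="concat-full-name">
      <!-- copy first_name and last_name values into full_name at index time -->
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <arr name="source">
          <str>first_name</str>
          <str>last_name</str>
        </arr>
        <str name="dest">full_name</str>
      </processor>
      <!-- collapse the multiple values into one space-separated value -->
      <processor class="solr.ConcatFieldUpdateProcessorFactory">
        <str name="fieldName">full_name</str>
        <str name="delimiter"> </str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>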



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Copying-few-field-using-copyField-to-non-multiValued-field-tp3066979p4021605.html
Sent from the Solr - User mailing list archive at Nabble.com.


Writing SOLR custom search component to search SOLR to fetch more documents

2012-11-21 Thread ashokr
I have stored the following documents in my solr schema. Just to tell you in
brief about the schema: it is about teachers, courses, and their relationships.

Example data:

|ID| Name| Type    | FromID | ToID |
|1 | t1  | Teacher |        |      |
|2 | t2  | Teacher |        |      |
|3 | c1  | Course  |        |      |
|4 | c2  | Course  |        |      |
|5 | c3  | Course  |        |      |
|6 | r1  | Relation|   1    |  3   |
|7 | r2  | Relation|   1    |  4   |
|8 | r3  | Relation|   2    |  3   |
|9 | r4  | Relation|   1    |  5   |
|10| r5  | Relation|   2    |  5   |

I want the output to be something like this:

|ID| Name| Type    | Handles  | Handled By |
|1 | t1  | Teacher | c1,c2,c3 |            |
|2 | t2  | Teacher | c1,c3    |            |
|3 | c1  | Course  |          | t1,t2      |
|4 | c2  | Course  |          | t1         |
|5 | c3  | Course  |          | t1,t2      |

Can we do this in one query? I tried; I couldn't. So I thought we would first
get records like this:

|ID| Name| Type    |
|1 | t1  | Teacher |
|2 | t2  | Teacher |
|3 | c1  | Course  |
|4 | c2  | Course  |
|5 | c3  | Course  |

Then, for each document, get the appropriate relation documents and convert
them to CSV. So I started writing a custom search component configured as
"last-components" for the search handler. I was able to fetch the result as
above, but I am not able to proceed further to make the additional calls that
fetch the relations. Can anyone give me a code snippet or something that will
do a SOLR call with a JOIN query from a custom search component?

Thanks,
~Ashok
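
A hedged sketch of the kind of last-component described above - it runs one
extra local query per returned document; the class name, field names, and row
limit are illustrative, and error handling is omitted:

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocList;
    import org.apache.solr.search.SolrIndexSearcher;

    public class RelationJoinComponent extends SearchComponent {

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        SolrIndexSearcher searcher = rb.req.getSearcher();
        DocList results = rb.getResults().docList;
        for (DocIterator it = results.iterator(); it.hasNext(); ) {
          int docId = it.nextDoc();
          String id = searcher.doc(docId).get("ID");
          // one extra local query per hit: all Relation docs whose FromID is this ID
          Query relQuery = new TermQuery(new Term("FromID", id));
          DocList related = searcher.getDocList(relQuery, (Query) null, null, 0, 100);
          // ... collect the "Name" field of each related doc into a CSV string
          // and attach it to the response, e.g. via rb.rsp.add(...)
        }
      }

      @Override
      public String getDescription() { return "joins Relation docs onto results"; }

      @Override
      public String getSource() { return ""; }
    }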



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Writing-SOLR-custom-search-component-to-search-SOLR-to-fetch-more-documents-tp4021598.html
Sent from the Solr - User mailing list archive at Nabble.com.

Pls help: Very long query - what to do?

2012-11-21 Thread uwe72
My query is like the one below. I already use a POST request.

I got a solr exception:
org.apache.solr.client.solrj.SolrServerException: Server at
http://server:7056/solr returned non ok status:400, message:Bad Request

Is there a way to prevent this?

id:("ModuleImpl@20117" OR "ModuleImpl@37886" OR "ModuleImpl@9379" OR
"ModuleImpl@37906" OR "ModuleImpl@19969" OR "ModuleImpl@37936" OR
"ModuleImpl@115568" OR "ModuleImpl@19901" OR "ModuleImpl@115472" OR
"ModuleImpl@20044" OR "ModuleImpl@25168" OR "ModuleImpl@38026" OR
"ModuleImpl@115647" OR "ModuleImpl@115648" OR "ModuleImpl@115649" OR
"ModuleImpl@20045" OR "ModuleImpl@25169" OR "ModuleImpl@38031" OR
"ModuleImpl@115650" OR "ModuleImpl@21090" OR "ModuleImpl@38037" OR
"ModuleImpl@117097" OR "ModuleImpl@21091" OR "ModuleImpl@38038" OR
"ModuleImpl@117098" OR "ModuleImpl@117099" OR "ModuleImpl@19973" OR
"ModuleImpl@38040" OR "ModuleImpl@115571" OR "ModuleImpl@115572" OR
"ModuleImpl@115573" OR "ModuleImpl@21092" OR "ModuleImpl@38135" OR
"ModuleImpl@117100" OR "ModuleImpl@21093" OR "ModuleImpl@38136" OR
"ModuleImpl@117101" OR "ModuleImpl@117102" OR "ModuleImpl@19979" OR
"ModuleImpl@38140" OR "ModuleImpl@115581" OR "ModuleImpl@19980" OR
"ModuleImpl@38143" OR "ModuleImpl@115582" OR "ModuleImpl@115583" OR
"ModuleImpl@21094" OR "ModuleImpl@38223" OR "ModuleImpl@117104" OR
"ModuleImpl@117105" OR "ModuleImpl@117106" OR "ModuleImpl@117107" OR
"ModuleImpl@117108" OR "ModuleImpl@21095" OR "ModuleImpl@38224" OR
"ModuleImpl@117109" OR "ModuleImpl@19920" OR "ModuleImpl@25157" OR
"ModuleImpl@38240" OR "ModuleImpl@115493" OR "ModuleImpl@20139" OR
"ModuleImpl@38286" OR "ModuleImpl@115752" OR "ModuleImpl@21096" OR
"ModuleImpl@38327" OR "ModuleImpl@117111" OR "ModuleImpl@117112" OR
"ModuleImpl@117113" OR "ModuleImpl@21097" OR "ModuleImpl@38328" OR
"ModuleImpl@117114" OR "ModuleImpl@19989" OR "ModuleImpl@25166" OR
"ModuleImpl@38332" OR "ModuleImpl@115585" OR "ModuleImpl@115586" OR
"ModuleImpl@19990" OR "ModuleImpl@38339" OR "ModuleImpl@115587" OR
"ModuleImpl@115588" OR "ModuleImpl@115589" OR "ModuleImpl@115590" OR
"ModuleImpl@115591" OR "ModuleImpl@115592" OR "ModuleImpl@115593" OR
"ModuleImpl@115594" OR "ModuleImpl@115595" OR "ModuleImpl@19807" OR
"ModuleImpl@38365" OR "ModuleImpl@115365" OR "ModuleImpl@115366" OR
"ModuleImpl@19808" OR "ModuleImpl@38373" OR "ModuleImpl@115367" OR
"ModuleImpl@115368" OR "ModuleImpl@115369" OR "ModuleImpl@115370" OR
"ModuleImpl@115371" OR "ModuleImpl@21121" OR "ModuleImpl@38418" OR
"ModuleImpl@117132" OR "ModuleImpl@117133" OR "ModuleImpl@117134" OR
"ModuleImpl@732" OR "ModuleImpl@38438" OR "ModuleImpl@117115" OR
"ModuleImpl@21099" OR "ModuleImpl@38440" OR "ModuleImpl@117116" OR
"ModuleImpl@19929" OR "ModuleImpl@38450" OR "ModuleImpl@115501" OR
"ModuleImpl@115502" OR "ModuleImpl@19810" OR "ModuleImpl@38471" OR
"ModuleImpl@115372" OR "ModuleImpl@115373" OR "ModuleImpl@21124" OR
"ModuleImpl@38529" OR "ModuleImpl@117135" OR "ModuleImpl@117136" OR
"ModuleImpl@117137" OR "ModuleImpl@117138" OR "ModuleImpl@19931" OR
"ModuleImpl@115505" OR "ModuleImpl@21074" OR "ModuleImpl@38546" OR
"ModuleImpl@117077" OR "ModuleImpl@19934" OR "ModuleImpl@38548" OR
"ModuleImpl@115507" OR "ModuleImpl@115508" OR "ModuleImpl@115509" OR
"ModuleImpl@115510" OR "ModuleImpl@20550" OR "ModuleImpl@38607" OR
"ModuleImpl@115885" OR "ModuleImpl@21127" OR "ModuleImpl@38638" OR
"ModuleImpl@117139" OR "ModuleImpl@21077" OR "ModuleImpl@25182" OR
"ModuleImpl@38657" OR "ModuleImpl@117078" OR "ModuleImpl@117079" OR
"ModuleImpl@117080" OR "ModuleImpl@19938" OR "ModuleImpl@38658" OR
"ModuleImpl@115516" OR "ModuleImpl@115517" OR "ModuleImpl@115518" OR
"ModuleImpl@115519" OR "ModuleImpl@19864" OR "ModuleImpl@115432" OR
"ModuleImpl@19769" OR "ModuleImpl@38695" OR "ModuleImpl@115320" OR
"ModuleImpl@20556" OR "ModuleImpl@38720" OR "ModuleImpl@20494" OR
"ModuleImpl@38736" OR "ModuleImpl@19871" OR "ModuleImpl@115438" OR
"ModuleImpl@21056" OR "ModuleImpl@38771" OR "ModuleImpl@19775" OR
"ModuleImpl@19776" OR "ModuleImpl@38802" OR "ModuleImpl@115330" OR
"ModuleImpl@115331" OR "ModuleImpl@115332" OR "ModuleImpl@20566" OR
"ModuleImpl@38835" OR "ModuleImpl@115889" OR "ModuleImpl@115890" OR
"ModuleImpl@20501" OR "ModuleImpl@38846" OR "ModuleImpl@115869" OR
"ModuleImpl@115870" OR "ModuleImpl@21107" OR "ModuleImpl@38859" OR
"ModuleImpl@117118" OR "ModuleImpl@19879" OR "ModuleImpl@38871" OR
"ModuleImpl@115444" OR "ModuleImpl@115445" OR "ModuleImpl@21058" OR
"ModuleImpl@38873" OR "ModuleImpl@19823" OR "ModuleImpl@25153" OR
"ModuleImpl@38896" OR "ModuleImpl@115396" OR "ModuleImpl@115397" OR
"ModuleImpl@19779" OR "ModuleImpl@38904" OR "ModuleImpl@115334" OR
"ModuleImpl@115335" OR "ModuleImpl@115336" OR "ModuleImpl@20574" OR
"ModuleImpl@38932" OR "ModuleImpl@115892" OR "ModuleImpl@115893" OR
"ModuleImpl@20504" OR "ModuleImpl@38941" OR "ModuleImpl@115871" OR
"ModuleImpl@115872" OR "ModuleImpl@21083" OR "ModuleImpl@38962" OR
"ModuleImpl@117081" OR "ModuleImpl@117082" OR "ModuleImpl@19884" OR
"ModuleImpl@3896

Re: Pls help: Very long query - what to do?

2012-11-21 Thread Péter Király
Hi,

you have to set maxHttpHeaderSize on the <Connector> element in
server.xml. The default is about 8K.
See it in more detail:
http://serverfault.com/questions/56691/whats-the-maximum-url-length-in-tomcat
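
For example, a hedged sketch of the relevant server.xml fragment (port and
size are illustrative):

    <Connector port="8080" protocol="HTTP/1.1"
               maxHttpHeaderSize="65536"
               connectionTimeout="20000" redirectPort="8443" />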

Regards,
Péter Király
portal backend developer
http://europeana.eu

2012/11/21 uwe72 :
> My query is like the one below. I already use a POST request.
>
> I got a solr exception:
> org.apache.solr.client.solrj.SolrServerException: Server at
> http://server:7056/solr returned non ok status:400, message:Bad Request
>
> Is there a way to prevent this?
>
> id:("ModuleImpl@20117" OR "ModuleImpl@37886" OR "ModuleImpl@9379" OR
> "ModuleImpl@37906" OR "ModuleImpl@19969" OR "ModuleImpl@37936" OR
> [...])

Re: Pls help: Very long query - what to do?

2012-11-21 Thread Rafał Kuć
Hello!

If you really need a query that long, then one of the things to do is to
increase the allowed header length in Jetty. Add the following to your
Jetty connector configuration:

<Set name="requestHeaderSize">16384</Set>

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> My query is like the one below. I already use a POST request.

> I got a solr exception:
> org.apache.solr.client.solrj.SolrServerException: Server at
> http://server:7056/solr returned non ok status:400, message:Bad Request

> Is there a way to prevent this?

> id:("ModuleImpl@20117" OR "ModuleImpl@37886" OR "ModuleImpl@9379" OR
> "ModuleImpl@37906" OR "ModuleImpl@19969" OR "ModuleImpl@37936" OR
> [...])

Re: Pls help: Very long query - what to do?

2012-11-21 Thread uwe72
I have already:

[connector configuration snippet stripped by the mailing list archive]
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pls-help-Very-long-query-what-to-do-tp4021606p4021619.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pls help: Very long query - what to do?

2012-11-21 Thread Luis Cappa Banda
Hello,

Do not forget to increase maxBooleanClauses.
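
That limit lives in solrconfig.xml; a minimal sketch (the value is
illustrative and must be at least the number of OR'ed clauses):

    <query>
      <maxBooleanClauses>4096</maxBooleanClauses>
    </query>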

Regards,


- Luis Cappa.


2012/11/21 uwe72 

> I am using Tomcat
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Pls-help-Very-long-query-what-to-do-tp4021606p4021620.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

- Luis Cappa


Re: [SolrCloud] is softcommit cluster-wide for the collection ?

2012-11-21 Thread Mark Miller

On Nov 21, 2012, at 9:11 AM, GIROLAMI Philippe  
wrote:

> Hello,
> We're working on integrating SolrCloud andwe're  wondering whether issuing a 
> softCommit via Solrj forces the soft commit :
> 
> a) only on the receiving core or
> b) to the whole cluster and the receiving cores forwards the soft commit to 
> all replicas.

The answer is b.

> 
> If the answer is a), what is the best practice to ensure data is indeed
> committed cluster-wide?

Commit is no longer what ensures durability in solrcloud. Because of the 
transactionlog, once a request is ack'd, it's in. Hard commits then become 
about relieving the memory pressure of the transactionlog, and soft commits are 
about visibility. Neither is required for durability.

> If the answer is b), what would happen on a 1-replica setup if one commit
> succeeded and the replica commit failed?

What would be the reason the commit failed? It would be a really bad problem:
that node will need to be restarted, and it either won't answer requests or it
will be asked by the leader to recover when an update sent to it fails.

Because commits are not required for durability, it's probably not the issue 
that you think.

- Mark



Re: SolrCloud and external Zookeeper ensemble

2012-11-21 Thread Mark Miller
Separate is generally nice because then you can restart Solr nodes without 
consideration for ZooKeeper.

Performance-wise, I doubt it's a big deal either way.

- Mark

On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki  wrote:

> Hi,
> 
> I have 4 solr collections, 2-3mn documents per collection, up to 100K
> updates per collection daily (roughly). I'm going to create SolrCloud4x on
> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The question is
> what about zookeeper? It's going to be external ensemble, but is it better
> to use same nodes as solr or dedicated micro instances? Zookeeper does not
> seem to be resources demanding process, but what would be better in this
> case ? To keep it inside of solrcloud or separately (micro instances seem
> to be enough here) ?
> 
> Thanks in advance.
> Regards.



Re: Using SolrCloud for update often lose response and get 503 error

2012-11-21 Thread Mark Miller
Have you looked at the logs?

- Mark

On Nov 21, 2012, at 1:07 AM, Qun Wang  wrote:

> Hello,
> 
> Does anyone get a 503 error when updating via SolrCloud? In my tests I found
> that if updates are too frequent, Solr often returns a 503 error and all
> servers become inaccessible. Could someone provide any suggestion for how to
> avoid this issue?
> I used 200 threads for building the index, with each thread submitting 100
> documents to the server. After running for a while, the server would lose
> responsiveness for updates, although search remained reachable at the same
> time.
> 
> Thanks.



RE: [SolrCloud] is softcommit cluster-wide for the collection ?

2012-11-21 Thread GIROLAMI Philippe
Hi Mark,
Thanks for the details
>> If the answer is b), what would happen on a 1-replica setup if one commit
>> succeeded and the replica commit failed?
> What would be the reason the commit failed? It would be a really bad problem:
> that node will need to be restarted, and it either won't answer requests or
> it will be asked by the leader to recover when an update sent to it fails.
Something dumb like a full disk, for example. So I understand that the leader
for the shard writes to the transaction log, which means that if, in the worst
case, it crashes and does not lose disk data, it will replay it. And when
"slaves" crash, they will get the commit log from the leader.
Is this right?

>Because commits are not required for durability, it's probably not the issue 
>that you think.
Sure looks like it !

Thanks


Re: Out Of Memory =( Too many cores on one server?

2012-11-21 Thread Shawn Heisey

On 11/21/2012 12:36 AM, stockii wrote:

Okay, I will try out more RAM.

I am not using much caching because of "near-real-time" search. In this
case, is it better to increase Xmn, or only Xmx and Xms?


I have personally found that increasing the size of the young generation 
(Eden) is beneficial to Solr, at least if you are using the parallel GC 
options.  I theorize that the collector for the young generation is more 
efficient than the full GC, but that's just a guess.  When I started 
doing that, the amount of time my Solr JVM spent doing garbage 
collection went way down, even though the number of garbage collections 
went up.


Lately I have been increasing the Eden size by using -XX:NewRatio=1 
rather than an explicit value on -Xmn.  This has one advantage - if you 
change the min/max heap size, the same value for NewRatio will still work.


Here are the options that I am currently using in production with Java6:

-Xms4096M
-Xmx8192M
-XX:NewRatio=1
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Here is what I am planning for the future with Solr4 and beyond with 
Java7,  including an environment variable for Xmx. Due to the 
experimental nature of the G1 collector, I would only trust it with the 
latest Java releases, especially for Java6.  The Unlock option is not 
required on Java7, only Java6.


-Xms256M
-Xmx${JMEM}
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC

Thanks,
Shawn



Re: [SolrCloud] is softcommit cluster-wide for the collection ?

2012-11-21 Thread Mark Miller

On Nov 21, 2012, at 11:00 AM, GIROLAMI Philippe  
wrote:

> Hi Mark,
> Thanks for the details
>>> If the answer is b), what would happen on a 1-replica setup if one commit 
>>> succeeded and the replica commit failed  ?
>> What's the reason the commit failed? Either a really bad problem and that 
>> node will need to be restarted and either won't answer requests or it will 
>> be asked to recover by the leader when sending it an update that failed.
> Something dumb like a full disk for example. So I understand that the leader 
> for the shard stored to the transaction log which means that if, in the worst 
> case, it crashes and does not loose disk data, it will replay it. And for 
> "slaves" crashes, they will get the commit log from the leader.
> Is this right ?

All of the nodes have their own transaction log. When a node comes back up, he 
will replay his local transaction log. Then he will contact the leader and 
compare versions - if he matches, it's all good - if not, he recovers from the 
leader. If he is the leader, he just replays his own local transaction log.

- Mark

Re: SolrCloud and external Zookeeper ensemble

2012-11-21 Thread Rafał Kuć
Hello!

Zookeeper by itself is not demanding, but if something happens to your
nodes that have Solr on them, you'll lose ZooKeeper too if you have
them installed side by side. However if you will have 4 Solr nodes and
3 ZK instances you can get them running side by side. 
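
For completeness, a standalone three-node ensemble is configured with a 
minimal zoo.cfg along these lines on each ZooKeeper host (hostnames and paths 
are assumptions; each host also needs a myid file matching its server.N entry):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888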

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Separate is generally nice because then you can restart Solr nodes
> without consideration for ZooKeeper.

> Performance-wise, I doubt it's a big deal either way.

> - Mark

> On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki  wrote:

>> Hi,
>> 
>> I have 4 solr collections, 2-3mn documents per collection, up to 100K
>> updates per collection daily (roughly). I'm going to create SolrCloud4x on
>> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The question is
>> what about zookeeper? It's going to be an external ensemble, but is it better
>> to use the same nodes as Solr or dedicated micro instances? Zookeeper does not
>> seem to be a resource-demanding process, but what would be better in this
>> case? To keep it inside of SolrCloud or separately (micro instances seem
>> to be enough here)?
>> 
>> Thanks in advance.
>> Regards.



Re: Single Tomcat Multiple Shards

2012-11-21 Thread Mark Miller

On Nov 21, 2012, at 8:32 AM, Cool Techi  wrote:

> Hey Guys,
> 
> We are experimenting with Solr Cloud; this is what we want to set up:
> 
> 2 machines each having 8 master shards, so a total of 16 shards. The 
> assumption is we want to store approximately 4-5 TB of data over a period of 1 
> year or so. 
> Replication factor of 1, with replicas again distributed across 3-4 machines.
> Initially we want to start with 8 shards in a single Tomcat on a single 
> machine, but I cannot find a way of having multiple shards in a single 
> SOLR_HOME and single Tomcat. Can this be achieved?
> Regards, Ayush

Yes, it can be achieved. There are a couple of bugs when you try to run more 
than one shard from the same collection in a single instance, though - they will 
be fixed in the upcoming 4.1 release.

You simply give the cores the same collection name but different core names.
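
As a sketch, in the old-style solr.xml this might look something like the 
following (core and directory names are illustrative):

<cores adminPath="/admin/cores">
  <core name="core_shard1" instanceDir="core_shard1" collection="collection1"/>
  <core name="core_shard2" instanceDir="core_shard2" collection="collection1"/>
</cores>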

Later, you can migrate those shards to other servers if you'd like.

- Mark

group.facet=true performances

2012-11-21 Thread Mickael Magniez
Hi,

I'm trying to use field collapsing, and I'm facing a performance issue when
using group.facet=true.

On a small index (100,000 documents), a query with group=true and
group.facet=false takes 20ms, while group.facet=true takes 800ms.

Maybe I'm missing some configuration option?

Best regards,

Mickael





Re: Out Of Memory =( Too many cores on one server?

2012-11-21 Thread Mark Miller
> I have personally found that increasing the size of the young generation 
> (Eden) is beneficial to Solr,

I've seen the same thing - I think it's because requests create a lot
of short lived objects and if the eden is not large enough, a lot of
those objects will make it to the tenured space, which is basically an
alg fail.

It's not a bad knob to tweak, because if you just keep raising the
heap, you can wastefully keep giving more unnecessary RAM to the
tenured space when you might only want to give more to the eden space.

- Mark


On Wed, Nov 21, 2012 at 11:00 AM, Shawn Heisey  wrote:
> On 11/21/2012 12:36 AM, stockii wrote:
>>
>> okay. i will try out more RAM.
>>
>> i am using not much caching because of "near-realt-time"-search. in this
>> case its better to increase xmn or only xmx and xms?
>
>
> I have personally found that increasing the size of the young generation
> (Eden) is beneficial to Solr, at least if you are using the parallel GC
> options.  I theorize that the collector for the young generation is more
> efficient than the full GC, but that's just a guess.  When I started doing
> that, the amount of time my Solr JVM spent doing garbage collection went way
> down, even though the number of garbage collections went up.
>
> Lately I have been increasing the Eden size by using -XX:NewRatio=1 rather
> than an explicit value on -Xmn.  This has one advantage - if you change the
> min/max heap size, the same value for NewRatio will still work.
>
> Here are the options that I am currently using in production with Java6:
>
> -Xms4096M
> -Xmx8192M
> -XX:NewRatio=1
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
>
> Here is what I am planning for the future with Solr4 and beyond with Java7,
> including an environment variable for Xmx. Due to the experimental nature of
> the G1 collector, I would only trust it with the latest Java releases,
> especially for Java6.  The Unlock option is not required on Java7, only
> Java6.
>
> -Xms256M
> -Xmx${JMEM}
> -XX:+UnlockExperimentalVMOptions
> -XX:+UseG1GC
>
> Thanks,
> Shawn
>



-- 
- Mark


Re: Pls help: Very long query - what to do?

2012-11-21 Thread Jack Krupansky
Check the Solr log to see what the actual error (Solr vs. SolrJ) message 
was.


-- Jack Krupansky

-Original Message- 
From: uwe72

Sent: Wednesday, November 21, 2012 10:31 AM
To: solr-user@lucene.apache.org
Subject: Pls help: Very long query - what to do?

my query is like this, see below. I use already POST request.

i got a solr exception:
org.apache.solr.client.solrj.SolrServerException: Server at
http://server:7056/solr returned non ok status:400, message:Bad Request

is there a way in order to prevent this?

id:("ModuleImpl@20117" OR "ModuleImpl@37886" OR "ModuleImpl@9379" OR
"ModuleImpl@37906" OR "ModuleImpl@19969" OR "ModuleImpl@37936" OR
"ModuleImpl@115568" OR "ModuleImpl@19901" OR "ModuleImpl@115472" OR
"ModuleImpl@20044" OR "ModuleImpl@25168" OR "ModuleImpl@38026" OR
"ModuleImpl@115647" OR "ModuleImpl@115648" OR "ModuleImpl@115649" OR
"ModuleImpl@20045" OR "ModuleImpl@25169" OR "ModuleImpl@38031" OR
"ModuleImpl@115650" OR "ModuleImpl@21090" OR "ModuleImpl@38037" OR
"ModuleImpl@117097" OR "ModuleImpl@21091" OR "ModuleImpl@38038" OR
"ModuleImpl@117098" OR "ModuleImpl@117099" OR "ModuleImpl@19973" OR
"ModuleImpl@38040" OR "ModuleImpl@115571" OR "ModuleImpl@115572" OR
"ModuleImpl@115573" OR "ModuleImpl@21092" OR "ModuleImpl@38135" OR
"ModuleImpl@117100" OR "ModuleImpl@21093" OR "ModuleImpl@38136" OR
"ModuleImpl@117101" OR "ModuleImpl@117102" OR "ModuleImpl@19979" OR
"ModuleImpl@38140" OR "ModuleImpl@115581" OR "ModuleImpl@19980" OR
"ModuleImpl@38143" OR "ModuleImpl@115582" OR "ModuleImpl@115583" OR
"ModuleImpl@21094" OR "ModuleImpl@38223" OR "ModuleImpl@117104" OR
"ModuleImpl@117105" OR "ModuleImpl@117106" OR "ModuleImpl@117107" OR
"ModuleImpl@117108" OR "ModuleImpl@21095" OR "ModuleImpl@38224" OR
"ModuleImpl@117109" OR "ModuleImpl@19920" OR "ModuleImpl@25157" OR
"ModuleImpl@38240" OR "ModuleImpl@115493" OR "ModuleImpl@20139" OR
"ModuleImpl@38286" OR "ModuleImpl@115752" OR "ModuleImpl@21096" OR
"ModuleImpl@38327" OR "ModuleImpl@117111" OR "ModuleImpl@117112" OR
"ModuleImpl@117113" OR "ModuleImpl@21097" OR "ModuleImpl@38328" OR
"ModuleImpl@117114" OR "ModuleImpl@19989" OR "ModuleImpl@25166" OR
"ModuleImpl@38332" OR "ModuleImpl@115585" OR "ModuleImpl@115586" OR
"ModuleImpl@19990" OR "ModuleImpl@38339" OR "ModuleImpl@115587" OR
"ModuleImpl@115588" OR "ModuleImpl@115589" OR "ModuleImpl@115590" OR
"ModuleImpl@115591" OR "ModuleImpl@115592" OR "ModuleImpl@115593" OR
"ModuleImpl@115594" OR "ModuleImpl@115595" OR "ModuleImpl@19807" OR
"ModuleImpl@38365" OR "ModuleImpl@115365" OR "ModuleImpl@115366" OR
"ModuleImpl@19808" OR "ModuleImpl@38373" OR "ModuleImpl@115367" OR
"ModuleImpl@115368" OR "ModuleImpl@115369" OR "ModuleImpl@115370" OR
"ModuleImpl@115371" OR "ModuleImpl@21121" OR "ModuleImpl@38418" OR
"ModuleImpl@117132" OR "ModuleImpl@117133" OR "ModuleImpl@117134" OR
"ModuleImpl@732" OR "ModuleImpl@38438" OR "ModuleImpl@117115" OR
"ModuleImpl@21099" OR "ModuleImpl@38440" OR "ModuleImpl@117116" OR
"ModuleImpl@19929" OR "ModuleImpl@38450" OR "ModuleImpl@115501" OR
"ModuleImpl@115502" OR "ModuleImpl@19810" OR "ModuleImpl@38471" OR
"ModuleImpl@115372" OR "ModuleImpl@115373" OR "ModuleImpl@21124" OR
"ModuleImpl@38529" OR "ModuleImpl@117135" OR "ModuleImpl@117136" OR
"ModuleImpl@117137" OR "ModuleImpl@117138" OR "ModuleImpl@19931" OR
"ModuleImpl@115505" OR "ModuleImpl@21074" OR "ModuleImpl@38546" OR
"ModuleImpl@117077" OR "ModuleImpl@19934" OR "ModuleImpl@38548" OR
"ModuleImpl@115507" OR "ModuleImpl@115508" OR "ModuleImpl@115509" OR
"ModuleImpl@115510" OR "ModuleImpl@20550" OR "ModuleImpl@38607" OR
"ModuleImpl@115885" OR "ModuleImpl@21127" OR "ModuleImpl@38638" OR
"ModuleImpl@117139" OR "ModuleImpl@21077" OR "ModuleImpl@25182" OR
"ModuleImpl@38657" OR "ModuleImpl@117078" OR "ModuleImpl@117079" OR
"ModuleImpl@117080" OR "ModuleImpl@19938" OR "ModuleImpl@38658" OR
"ModuleImpl@115516" OR "ModuleImpl@115517" OR "ModuleImpl@115518" OR
"ModuleImpl@115519" OR "ModuleImpl@19864" OR "ModuleImpl@115432" OR
"ModuleImpl@19769" OR "ModuleImpl@38695" OR "ModuleImpl@115320" OR
"ModuleImpl@20556" OR "ModuleImpl@38720" OR "ModuleImpl@20494" OR
"ModuleImpl@38736" OR "ModuleImpl@19871" OR "ModuleImpl@115438" OR
"ModuleImpl@21056" OR "ModuleImpl@38771" OR "ModuleImpl@19775" OR
"ModuleImpl@19776" OR "ModuleImpl@38802" OR "ModuleImpl@115330" OR
"ModuleImpl@115331" OR "ModuleImpl@115332" OR "ModuleImpl@20566" OR
"ModuleImpl@38835" OR "ModuleImpl@115889" OR "ModuleImpl@115890" OR
"ModuleImpl@20501" OR "ModuleImpl@38846" OR "ModuleImpl@115869" OR
"ModuleImpl@115870" OR "ModuleImpl@21107" OR "ModuleImpl@38859" OR
"ModuleImpl@117118" OR "ModuleImpl@19879" OR "ModuleImpl@38871" OR
"ModuleImpl@115444" OR "ModuleImpl@115445" OR "ModuleImpl@21058" OR
"ModuleImpl@38873" OR "ModuleImpl@19823" OR "ModuleImpl@25153" OR
"ModuleImpl@38896" OR "ModuleImpl@115396" OR "ModuleImpl@115397" OR
"ModuleImpl@19779" OR "ModuleImpl@38904" OR "ModuleImpl@115334" OR
"ModuleImpl@115335" OR "ModuleImpl@115336" OR "ModuleImpl@20574" OR
"ModuleImpl@38932" 

Re: Pls help: Very long query - what to do?

2012-11-21 Thread Shawn Heisey

On 11/21/2012 8:53 AM, Luis Cappa Banda wrote:

Do not forget to increase maxBooleanClauses.


I believe this is the culprit right here.  I counted 1576 instances of 
"OR" in the long query, which is rather a lot higher than the default 
maxBooleanClauses value of 1024.


I think that the maxBooleanClauses parameter (and its default) exist for 
good reason, probably to reduce the likelihood of resource exhaustion.  
It can't scale to many thousands or millions of clauses.


If possible, it would be a good idea to redesign so you can stay lower 
than 1024.  I construct a query like this when I do document deletes, 
and I limit it to 512 records at a time. If you can't redesign, be aware 
that you won't be able to scale indefinitely, even if you change 
maxBooleanClauses.
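
As an illustration of that batching approach, here is a minimal SolrJ sketch 
(class, field and method names are assumptions, not code from this thread):

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;

public class BatchedDeletes {
    // stay well under the default maxBooleanClauses of 1024
    private static final int BATCH_SIZE = 512;

    public static void deleteByIds(SolrServer server, List<String> ids)
            throws Exception {
        for (int from = 0; from < ids.size(); from += BATCH_SIZE) {
            List<String> chunk =
                ids.subList(from, Math.min(from + BATCH_SIZE, ids.size()));
            StringBuilder q = new StringBuilder("id:(");
            for (int i = 0; i < chunk.size(); i++) {
                if (i > 0) q.append(" OR ");
                q.append('"').append(chunk.get(i)).append('"');
            }
            q.append(')');
            // each request carries at most 512 boolean clauses
            server.deleteByQuery(q.toString());
        }
    }
}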


Thanks,
Shawn



Re: Solr 4 Admin UI Dashboard Not Populating

2012-11-21 Thread Stefan Matheis
Richard

From what I see on the screen, the JavaScript stopped executing because of 
an error. My first guess would be that if you request 
"http://solr-host:port/solr/production/admin/system?wt=json" manually, you'll 
not see a "host" property in the "core" object, right?

Normally that looks like:

{
  core: {
    schema: "example",
    host: "hostname.tld",
    now: "2012-11-21T16:47:59.172Z",
    ...
  }
}


To verify that, you can easily make the following change in 
solr/webapp/web/index.js (adjust to whichever copy of the file your running 
instance uses; grep for it). It's line 236:

- 'host' : app.dashboard_values['core']['host'],
+ 'host' : app.dashboard_values['core']['host'] || '-',

Stefan 


On Wednesday, November 21, 2012 at 4:20 PM, richardg wrote:

> Our Admin UI Dashboard is not populating on one of our servers, not sure if
> it is a permission issue or what. We have three others that it is working
> on.
> 
> 
>  
> 
> 
> 
> 





Re: Pls help: Very long query - what to do?

2012-11-21 Thread Jack Krupansky

You can increase that limit in your solrconfig.xml:

<maxBooleanClauses>1024</maxBooleanClauses>

Don't go wild with it, but upping it to 2000 or 5000 shouldn't be a big deal 
considering that hardware performance has increased significantly since 
Lucene was started.


-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Wednesday, November 21, 2012 12:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Pls help: Very long query - what to do?

On 11/21/2012 8:53 AM, Luis Cappa Banda wrote:

Do not forget to increase maxBooleanClauses.


I believe this is the culprit right here.  I counted 1576 instances of
"OR" in the long query, which is rather a lot higher than the default
maxBooleanClauses value of 1024.

I think that the maxBooleanClauses parameter (and its default) exist for
good reason, probably to reduce the likelihood of resource exhaustion.
It can't scale to many thousands or millions of clauses.

If possible, it would be a good idea to redesign so you can stay lower
than 1024.  I construct a query like this when I do document deletes,
and I limit it to 512 records at a time. If you can't redesign, be aware
that you won't be able to scale indefinitely, even if you change
maxBooleanClauses.

Thanks,
Shawn 



Suggester for numbers

2012-11-21 Thread Gustav
Hello guys,
Please, I need help. I'm using the Suggest search component for autocomplete
in Solr 3.6.1. I have an autocomplete field which is populated from two other
fields, conteiner_name and conteiner_id.

When I search for a username in my suggester handler it returns the
suggestions just fine, but when I search for an ID (numbers) it doesn't return
results at all. Does anyone know why?





Re: Pls help: Very long query - what to do?

2012-11-21 Thread uwe72
Yes, it works when I increase maxBooleanClauses.

But in any case I have to think about how to redesign the document structure.

I have big problems with the relations between documents.

Also, a document can be changed, and then I have to update the many documents
that have a relation to the modified one.





Re: Solr 4 Admin UI Dashboard Not Populating

2012-11-21 Thread richardg
I was able to figure it out, I ran solr/admin/system?wt=xml and noticed that
the host entry was blank.  Our servers are Linux so I looked at /etc/hosts
file and noticed it was messed up.  I made the change and everything is
populating now.





Re: Solr defining Schema structure trouble.

2012-11-21 Thread Jack Krupansky
You could implement a custom search component that takes the pages found by 
the query and then re-queries to find the book-level documents and adds them 
to the search results. Or, you could even have a query/parameter that found 
the pages but then discarded them and only kept the book metadata.
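
A bare skeleton of such a component might look like this (a sketch only: the 
second-query logic is left as comments, and depending on your Solr version 
additional SolrInfoMBean methods may need implementing):

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class BookMetadataComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare in this sketch
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // 1. inspect the page-level hits produced by the query component
        // 2. collect their book ids (e.g. from a stored "book_id" field)
        // 3. run a second query against rb.req.getSearcher() for the books
        // 4. attach the book documents to the response,
        //    e.g. rb.rsp.add("books", bookDocList)
    }

    @Override
    public String getDescription() {
        return "Adds book-level documents to page-level search results";
    }

    @Override
    public String getSource() {
        return "";
    }
}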


-- Jack Krupansky

-Original Message- 
From: denl0

Sent: Wednesday, November 21, 2012 4:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr defining Schema structure trouble.

Isn't it possible to combine the document-related values and page-related
values at query time?

Book1
Page1 with ref to book1
Page2 with ref to book1

When querying, combine them so all pages become (page1+book1) and (page2+book1).
Or would this be hard to achieve?

I'm pretty sure they want to search on book-related metadata too.






Re: Pls help: Very long query - what to do?

2012-11-21 Thread uwe72
My design is like this at the moment:

Documents in general have relations to each other.

So a document has an id, some attributes and a multivalued field
"navigateTo".

E.g.

Document1: id1, some attributes, naviagteToAllDocumentsWhenColor:red,
navigateTo: id2, id3

Document2: id2, some attributes, color:red, navigateTo: id1 (backlink)
Document3: id3, some attributes, color:red, navigateTo: id1 (backlink)
Document4: id5, some attributes, color:black, navigateTo:

My first problem is that when I re-import document3, I have to load into the
cache all documents that have a relation to my document, because its color
is red. Especially when its color is not red anymore, I have to update
document1 and delete the relation to document3.

Always running queries to find out which documents I have to update, then
loading and updating them, costs a lot of performance.

That's why I changed the design. I don't do all the relations at import time
anymore. I have some serialized hashmaps and store and update them outside
of Solr.

In these maps I have the information about which documents are related to a
document. I have all the ids. But now I have the problem that this can be
up to 20,000 ids. So I think it is impossible to load them with
OR...OR...OR.

It is a bit complicated to explain... I am using Solr 3.6.1. I think Solr 4
has a join feature, where you can join to other queries. Not sure
if this would fix my problem.

REGARDS, Uwe






Re: Inconsistent search results.

2012-11-21 Thread Jack Krupansky
Try the Solr Admin Analysis page and see how your failing examples analyze 
for both index and query.


Also, if you experiment with analyzer settings, be sure to FULLY reindex 
your documents since a mismatch between how the documents were ORIGINALLY 
analyzed and the latest query analysis can cause mismatches. Changing an 
index analyzer does not force an automatic reindex.


Also, check to see that there is not a delimiter character, such as a colon, 
immediately before a term with no white space.


-- Jack Krupansky

-Original Message- 
From: Sohail Aboobaker

Sent: Wednesday, November 21, 2012 8:13 AM
To: solr-user@lucene.apache.org
Subject: Inconsistent search results.

Hi,

We have 500k+ documents indexed with many fields. One of the fields is a
simple text field that is defined as the default search field, and we copy many
field values into that field.

Some values are composed of two components with a "." as separator. When we
search for the partial terms for such values, we get inconsistent results.
Following are some examples:

Value: KWJ1112.MC2850

we search on MC2850, it returns result.
we search on KWJ1112, no results.

Value: ACW9920.KL1230

we search on ACW9920, gives results.
we search on KL1230, gives results.

The results are inconsistent. Sometimes, it will give results on both sides
of partial search. For others, it would give results on only the last part
of word. The last part search always works.

We are using standard tokenizer as follows:

<fieldType ... positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What should we use in order to get consistent results for both sides of
component? Should we be using whitespace with worddelimiterfactory? Some
examples will be helpful.

Thanks

Sohail 



Re: SolrCloud and exernal file fields

2012-11-21 Thread Simone Gianni
Hi Martin,
thanks for sharing your experience with EFF and saving me a lot of time
figuring it out myself, I was afraid of exactly this kind of problems.

Mikhail, thanks for expanding the thread with even more useful information!

Simone


2012/11/20 Martin Koch 

> Solr 4.0 does support using EFFs, but it might not give you what you're
> hoping for.
>
> We tried using Solr Cloud, and have given up again.
>
> The EFF is placed in the parent of the index directory in each core; each
> core reads the entire EFF and picks out the IDs that it is responsible for.
>
> In the current 4.0.0 release of solr, solr blocks (doesn't answer queries)
> while re-reading the EFF. Even worse, it seems that the time to re-read the
> EFF is multiplied by the number of cores in use (i.e. the EFF is re-read by
> each core sequentially). The contents of the EFF become active after the
> first EXTERNAL commit (commitWithin does NOT work here) after the file has
> been updated.
>
> In our case, the EFF was quite large - around 450MB - and we use 16 shards,
> so when we triggered an external commit to force re-reading, the whole
> system would block for several (10-15) minutes. This won't work in a
> production environment. The reason for the size of the EFF is that we have
> around 7M documents in the index; each document has a 45 character ID.
>
> We got some help to try to fix the problem so that the re-read of the EFF
> proceeds in the background (see
> here for
> a fix on the 4.1 branch). However, even though the re-read proceeds in the
> background, the time required to launch solr now takes at least as long as
> re-reading the EFFs. Again, this is not good enough for our needs.
>
> The next issue is that you cannot sort on EFF fields (though you can return
> them as values using &fl=field(my_eff_field)). This is also fixed in the 4.1
> branch here .
>
> So: Even after these fixes, EFF performance is not that great. Our solution
> is as follows: The actual value of the popularity measure (say, reads) that
> we want to report to the user is inserted into the search response
> post-query by our query front-end. This value will then be the
> authoritative value at the time of the query. The value of the popularity
> measure that we use for boosting in the ranking of the search results is
> only updated when the value has changed enough so that the impact on the
> boost will be significant (say, more than 2%). This does require frequent
> re-indexing of the documents that have significant changes in the number of
> reads, but at least we won't have to update a document if it moves from,
> say, 100 to 101 reads.
>
> /Martin Koch - ISSUU - senior systems architect.
>
> On Mon, Nov 19, 2012 at 3:22 PM, Simone Gianni  wrote:
>
> > Hi all,
> > I'm planning to move a quite big Solr index to SolrCloud. However, in
> this
> > index, an external file field is used for popularity ranking.
> >
> > Does SolrCloud supports external file fields? How does it cope with
> > sharding and replication? Where should the external file be placed now
> that
> > the index folder is not local but in the cloud?
> >
> > Are there otherwise other best practices to deal with the use cases
> > external file fields were used for, like popularity/ranking, in
> SolrCloud?
> > Custom ValueSources going to something external?
> >
> > Thanks in advance,
> > Simone
> >
>
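
For readers wondering what the schema side of an EFF looks like, it is declared
along these lines (a sketch modeled on the stock example schema; the data file
itself lives in the data directory as external_<fieldname>):

<fieldType name="popularityFile" keyField="id" defVal="0" stored="false"
           indexed="false" class="solr.ExternalFileField" valType="pfloat"/>
<field name="popularity" type="popularityFile"/>

The field can then be used in a function query, e.g. bf=field(popularity) with
dismax, which is the popularity-boosting setup discussed above.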


Re: Inconsistent search results.

2012-11-21 Thread Luis Cappa Banda
Hello!

I suggest you try the PatternTokenizer with a regex that includes "." and
blank spaces, for example, in the query and index analyzers for that fieldType.
The input will be tokenized by that regex and your queries will then
succeed. Unfortunately, you will have to reindex everything if you change
your schema.
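
A sketch of what such a fieldType might look like (the pattern and the names
are assumptions):

<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on "." and whitespace so KWJ1112.MC2850 yields kwj1112 and mc2850 -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[.\s]+"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>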

Regards,

- Luis Cappa
El 21/11/2012 19:13, "Jack Krupansky"  escribió:

> Try the Solr Admin Analysis page and see how your failing examples analyze
> for both index and query.
>
> Also, if you experiment with analyzer settings, be sure to FULLY reindex
> your documents since a mismatch between how the documents were ORIGINALLY
> analyzed and the latest query analysis can cause mismatches. Changing an
> index analyzer does not force an automatic reindex.
>
> Also, check to see that there is not a delimiter character, such as a
> colon, immediately before a term with no white space.
>
> -- Jack Krupansky
>
> -Original Message- From: Sohail Aboobaker
> Sent: Wednesday, November 21, 2012 8:13 AM
> To: solr-user@lucene.apache.org
> Subject: Inconsistent search results.
>
> Hi,
>
> We have 500k+ documents indexed with many fields. One of the fields is a
> simple text field that is defined as the default search field, and we copy many
> field values into that field.
>
> Some values are composed of two components with a "." as separator. When we
> search for the partial terms for such values, we get inconsistent results.
> Following are some examples:
>
> Value: KWJ1112.MC2850
>
> we search on MC2850, it returns result.
> we search on KWJ1112, no results.
>
> Value: ACW9920.KL1230
>
> we search on ACW9920, gives results.
> we search on KL1230, gives results.
>
> The results are inconsistent. Sometimes, it will give results on both sides
> of partial search. For others, it would give results on only the last part
> of word. The last part search always works.
>
> We are using standard tokenizer as follows:
>
> <fieldType ... positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> What should we use in order to get consistent results for both sides of
> component? Should we be using whitespace with worddelimiterfactory? Some
> examples will be helpful.
>
> Thanks
>
> Sohail
>


Re: Weird Behaviour on Solr 5x (SolrCloud)

2012-11-21 Thread Mark Miller
I'm not sure - I guess I'll have to look into it - could you file a
JIRA issue with these details?

- Mark

On Wed, Nov 21, 2012 at 1:19 AM, deniz  wrote:
> Well... I found a way to avoid this... I don't know if it is the correct way or
> if I am simply bypassing the problem instead of fixing it.
>
> When I delete the data/ folder's contents before restarting, it can get the
> index information from the cloud without any problem...
>
> So is this the way SolrCloud works? Or am I missing something important
> here?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...



-- 
- Mark


Re: Pls help: Very long query - what to do?

2012-11-21 Thread Mikhail Khludnev
Uwe,

Do you think BlockJoin can help you?
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
Do your docs form parent-child blocks? How often do you need to reindex a
particular doc?


On Wed, Nov 21, 2012 at 9:54 PM, uwe72  wrote:

> My design is like this at the moment:
>
> Documents in general have relations to each other.
>
> So a document has an id, some attributes and a multivalued field
> "navigateTo".
>
> E.g.
>
> Document1: id1, some attributes, naviagteToAllDocumentsWhenColor:red,
> navigateTo: id2, id3
>
> Document2: id2, some attributes, color:red, navigateTo: id1 (backlink)
> Document3: id3, some attributes, color:red, navigateTo: id1 (backlink)
> Document4: id5, some attributes, color:black, navigateTo:
>
> My first problem is that when I re-import document3, I have to load into the
> cache all documents that have a relation to my document, because its color
> is red. Especially when its color is not red anymore, I have to update
> document1 and delete the relation to document3.
>
> Always running queries to find out which documents I have to update, then
> loading and updating them, costs a lot of performance.
>
> That's why I changed the design. I don't do all the relations at import time
> anymore. I have some serialized hashmaps and store and update them outside
> of Solr.
>
> In these maps I have the information about which documents are related to a
> document. I have all the ids. But now I have the problem that this can be
> up to 20,000 ids. So I think it is impossible to load them with
> OR...OR...OR.
>
> It is a bit complicated to explain... I am using Solr 3.6.1. I think Solr 4
> has a join feature, where you can join to other queries. Not sure
> if this would fix my problem.
>
> REGARDS, Uwe
>
>
>
>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: is there a way to prevent abusing rows parameter

2012-11-21 Thread Alexandre Rafalovitch
Does that 'someone' have direct access to the Solr endpoint? Is that the right
thing to do in the first place?

But assuming they do (e.g. on an intranet), you could build on Jack's
suggestion and create a couple of query-handler endpoints that differ only in
an invariant rows value. So your default search goes to search10, your
25-results page goes to search25, etc.
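
For illustration, one such endpoint in solrconfig.xml might look like this
(the handler name and rows value are assumptions):

<requestHandler name="/search10" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- invariants cannot be overridden by request parameters -->
    <int name="rows">10</int>
  </lst>
</requestHandler>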

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Nov 20, 2012 at 8:23 PM, solr-user  wrote:

> silly question
>
> is there any configuration value I can set to prevent someone from entering
> a bad value for the rows parameter?
>
> ie to prevent something like "&rows=1"  from crashing my servers?
>
> the server I am looking at is a solr v3.6
>
>
>
>


Re: SolrCloud and exernal file fields

2012-11-21 Thread Mikhail Khludnev
Martin,

I don't think solrconfig.xml sheds any light on it. I've just found what I
didn't get in your setup - how to explicitly assign a core to a
collection. Now I've realized most of the details after all!
The ball is on your side; let us know whether you have managed to get your
cores to commit one by one to avoid the freeze, or whether you could eliminate
the pauses by allocating more hardware.
Thanks in advance!


On Wed, Nov 21, 2012 at 3:56 PM, Martin Koch  wrote:

> Mikhail,
>
> PSB
>
> On Wed, Nov 21, 2012 at 10:08 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > On Wed, Nov 21, 2012 at 11:53 AM, Martin Koch  wrote:
> >
> > >
> > > I wasn't aware until now that it is possible to send a commit to one
> core
> > > only. What we observed was the effect of curl
> > > localhost:8080/solr/update?commit=true but perhaps we should experiment
> > > with solr/coreN/update?commit=true. A quick trial run seems to indicate
> > > that a commit to a single core causes commits on all cores.
> > >
> > You should see something like this in the log:
> > ... SolrCmdDistributor  Distrib commit to: ...
> >
> > Yup, a commit towards a single core results in a commit on all cores.
>
>
> > >
> > >
> > > Perhaps I should clarify that we are using SOLR as a black box; we do
> not
> > > touch the code at all - we only install the distribution WAR file and
> > > proceed from there.
> > >
> > I still don't understand how you deploy/launch Solr. How many jettys you
> > start whether you have -DzkRun -DzkHost -DnumShards=2  or you specifies
> > shards= param for every request and distributes updates yourself? What
> > collections do you create and with which settings?
> >
> > We let SOLR do the sharding using one collection with 16 SOLR cores
> holding one shard each. We launch only one instance of jetty with the
> following arguments:
>
> -DnumShards=16
> -DzkHost=
> -Xmx10G
> -Xms10G
> -Xmn2G
> -server
>
> Would you like to see the solrconfig.xml?
>
> /Martin
>
>
> > >
> > >
> > > > Also from my POV such deployments should start at least from *16*
> 4-way
> > > > vboxes, it's more expensive, but should be much better available
> during
> > > > cpu-consuming operations.
> > > >
> > >
> > > Do you mean that you recommend 16 hosts with 4 cores each? Or 4 hosts
> > with
> > > 16 cores? Or am I misunderstanding something :) ?
> > >
> > I prefer to start from 16 hosts with 4 cores each.
> >
> >
> > >
> > >
> > > > Other details, if you use single jetty for all of them, are you sure
> > that
> > > > jetty's threadpool doesn't limit requests? is it large enough?
> > > > You have 60G and set -Xmx=10G. are you sure that total size of cores
> > > index
> > > > directories is less than 45G?
> > > >
> > > > The total index size is 230 GB, so it won't fit in ram, but we're
> using
> > > an
> > > SSD disk to minimize disk access time. We have tried putting the EFF
> > onto a
> > > ram disk, but this didn't have a measurable effect.
> > >
> > > Thanks,
> > > /Martin
> > >
> > >
> > > > Thanks
> > > >
> > > >
> > > > On Wed, Nov 21, 2012 at 2:07 AM, Martin Koch  wrote:
> > > >
> > > > > Mikhail
> > > > >
> > > > > PSB
> > > > >
> > > > > On Tue, Nov 20, 2012 at 7:22 PM, Mikhail Khludnev <
> > > > > mkhlud...@griddynamics.com> wrote:
> > > > >
> > > > > > Martin,
> > > > > >
> > > > > > Please find additional question from me below.
> > > > > >
> > > > > > Simone,
> > > > > >
> > > > > > I'm sorry for hijacking your thread. The only what I've heard
> about
> > > it
> > > > at
> > > > > > recent ApacheCon sessions is that Zookeeper is supposed to
> > replicate
> > > > > those
> > > > > > files as configs under solr home. And I'm really looking forward
> to
> > > > know
> > > > > > how it works with huge files in production.
> > > > > >
> > > > > > Thank You, Guys!
> > > > > >
> > > > > > On 20.11.2012 at 18:06, "Martin Koch" wrote:
> > > > > > >
> > > > > > > Hi Mikhail
> > > > > > >
> > > > > > > Please see answers below.
> > > > > > >
> > > > > > > On Tue, Nov 20, 2012 at 12:28 PM, Mikhail Khludnev <
> > > > > > > mkhlud...@griddynamics.com> wrote:
> > > > > > >
> > > > > > > > Martin,
> > > > > > > >
> > > > > > > > Thank you for telling your own "war-story". It's really
> useful
> > > for
> > > > > > > > community.
> > > > > > > > The first question might seems not really conscious, but
> would
> > > you
> > > > > tell
> > > > > > me
> > > > > > > > what blocks searching during EFF reload, when it's triggered
> by
> > > > > handler
> > > > > > or
> > > > > > > > by listener?
> > > > > > > >
> > > > > > >
> > > > > > > We continuously index new documents using CommitWithin to get
> > > regular
> > > > > > > commits. However, we observed that the EFFs were not re-read,
> so
> > we
> > > > had
> > > > > > to
> > > > > > > do external commits (curl '.../solr/update?commit=true') to
> force
> > > > > reload.
> > > > > > > When this is done, solr blocks. I can't tell you exactly why
> it's
> > > > doing
> > > > > > > that (it was related 

Re: SolrCloud and external Zookeeper ensemble

2012-11-21 Thread Marcin Rzewucki
First of all: thank you for your answers. Yes, I meant a side-by-side
configuration. I think the worst case for the ZKs here is to lose two of them.
However, I'm going to use 4 availability zones in the same region, so at least
this will reduce the risk of losing both of them at the same time.
Regards.

On 21 November 2012 17:06, Rafał Kuć  wrote:

> Hello!
>
> Zookeeper by itself is not demanding, but if something happens to your
> nodes that have Solr on it, you'll loose ZooKeeper too if you have
> them installed side by side. However if you will have 4 Solr nodes and
> 3 ZK instances you can get them running side by side.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > Separate is generally nice because then you can restart Solr nodes
> > without consideration for ZooKeeper.
>
> > Performance-wise, I doubt it's a big deal either way.
>
> > - Mark
>
> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki 
> wrote:
>
> >> Hi,
> >>
> >> I have 4 solr collections, 2-3mn documents per collection, up to 100K
> >> updates per collection daily (roughly). I'm going to create SolrCloud4x
> on
> >> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The question is
> >> what about zookeeper? It's going to be external ensemble, but is it
> better
> >> to use same nodes as solr or dedicated micro instances? Zookeeper does
> not
> >> seem to be resources demanding process, but what would be better in this
> >> case ? To keep it inside of solrcloud or separately (micro instances
> seem
> >> to be enough here) ?
> >>
> >> Thanks in advance.
> >> Regards.
>
>


Re: Solr 4 Admin UI Dashboard Not Populating

2012-11-21 Thread Stefan Matheis
Glad it worked Richard, i've created an issue anyway: 
https://issues.apache.org/jira/browse/SOLR-4102



On Wednesday, November 21, 2012 at 6:40 PM, richardg wrote:

> Thanks Stefan host was the issue, I responded to my post before I saw yours.
> 
> 
> 





Using payloads to encode part-of-speech in Solr 4.0.0

2012-11-21 Thread Martí Quixal
Dear list members,

I am trying to figure out how to configure schema.xml in solr 4.0.0 so that
it takes into account part-of-speech (PoS) tags to index documents and
filter queries, all of it by using payloads.

The schema.xml file includes a payloads field in Solr 4.0.0. From the
comments I have learnt that payloads require an encoder whose values can be
float, integer or identity. However none of these seem to me appropriate to
encode PoS tags (they are rather strings: NNS, VLInf, CSUB, etc. well,
actually ugly|ADJ man|NNS).
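
For reference, the payloads field type in the example schema is declared
roughly like this (reproduced from memory as a sketch):

<fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- the encoder attribute accepts float, integer or identity -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity"
            delimiter="|"/>
  </analyzer>
</fieldtype>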

Using identity as the encoder type, the indexer does not complain (using the
other two it does). However, I haven't been able to filter queries by the
information on the right-hand side of the payload delimiter (|), i.e. the
ADJ or NNS in ugly|ADJ man|NNS.

I will appreciate any help or pointers you can provide me with. I will be
happy to provide more details if needed.

Best regards,
Martí

-- 
Martí Quixal
Computational Linguist & Educational Technologist
http://www.iqubo.org/quixal


Re: SolrCloud and external Zookeeper ensemble

2012-11-21 Thread Rafał Kuć
Hello!

As I said, I wouldn't use the ZooKeeper that is embedded into Solr, but
rather set up a standalone one. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> First of all: thank you for your answers. Yes, I meant side by side
> configuration. I think the worst case for ZKs here is to loose two of them.
> However, I'm going to use 4 availability zones in same region so at least
> this will reduce the risk of loosing both of them at the same time.
> Regards.

> On 21 November 2012 17:06, Rafał Kuć  wrote:

>> Hello!
>>
>> Zookeeper by itself is not demanding, but if something happens to your
>> nodes that have Solr on them, you'll lose ZooKeeper too if you have
>> them installed side by side. However if you will have 4 Solr nodes and
>> 3 ZK instances you can get them running side by side.
>>
>> --
>> Regards,
>>  Rafał Kuć
>>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>>
>> > Separate is generally nice because then you can restart Solr nodes
>> > without consideration for ZooKeeper.
>>
>> > Performance-wise, I doubt it's a big deal either way.
>>
>> > - Mark
>>
>> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki 
>> wrote:
>>
>> >> Hi,
>> >>
>> >> I have 4 solr collections, 2-3mn documents per collection, up to 100K
>> >> updates per collection daily (roughly). I'm going to create SolrCloud4x
>> on
>> >> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The question is
>> >> what about zookeeper? It's going to be external ensemble, but is it
>> better
>> >> to use same nodes as solr or dedicated micro instances? Zookeeper does
>> not
>> >> seem to be resources demanding process, but what would be better in this
>> >> case ? To keep it inside of solrcloud or separately (micro instances
>> seem
>> >> to be enough here) ?
>> >>
>> >> Thanks in advance.
>> >> Regards.
>>
>>



Re: SolrCloud and external Zookeeper ensemble

2012-11-21 Thread Marcin Rzewucki
Yes, I meant the same (not -zkRun). However, I was asking whether it is safe
to have ZooKeeper and Solr processes running on the same node, or better to
run them on different machines.

On 21 November 2012 21:18, Rafał Kuć  wrote:

> Hello!
>
> As I told I wouldn't use the Zookeeper that is embedded into Solr, but
> rather setup a standalone one.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > First of all: thank you for your answers. Yes, I meant side by side
> > configuration. I think the worst case for ZKs here is to loose two of
> them.
> > However, I'm going to use 4 availability zones in same region so at least
> > this will reduce the risk of loosing both of them at the same time.
> > Regards.
>
> > On 21 November 2012 17:06, Rafał Kuć  wrote:
>
> >> Hello!
> >>
> >> Zookeeper by itself is not demanding, but if something happens to your
> >> nodes that have Solr on them, you'll lose ZooKeeper too if you have
> >> them installed side by side. However if you will have 4 Solr nodes and
> >> 3 ZK instances you can get them running side by side.
> >>
> >> --
> >> Regards,
> >>  Rafał Kuć
> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> ElasticSearch
> >>
> >> > Separate is generally nice because then you can restart Solr nodes
> >> > without consideration for ZooKeeper.
> >>
> >> > Performance-wise, I doubt it's a big deal either way.
> >>
> >> > - Mark
> >>
> >> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki 
> >> wrote:
> >>
> >> >> Hi,
> >> >>
> >> >> I have 4 solr collections, 2-3mn documents per collection, up to 100K
> >> >> updates per collection daily (roughly). I'm going to create
> SolrCloud4x
> >> on
> >> >> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The
> question is
> >> >> what about zookeeper? It's going to be external ensemble, but is it
> >> better
> >> >> to use same nodes as solr or dedicated micro instances? Zookeeper
> does
> >> not
> >> >> seem to be resources demanding process, but what would be better in
> this
> >> >> case ? To keep it inside of solrcloud or separately (micro instances
> >> seem
> >> >> to be enough here) ?
> >> >>
> >> >> Thanks in advance.
> >> >> Regards.
> >>
> >>
>
>


Partial results with not enough hits

2012-11-21 Thread Aleksey Vorona
In all of my queries I have the timeAllowed parameter. My application is 
ready for partial results. However, whenever Solr returns a partial result, 
it is a very bad result.


For example, I have a test query; here is its execution log with the 
strict time allowed:
WARNING: Query: ; Elapsed time: 120. Exceeded allowed search 
time: 100 ms.
INFO: [] webapp=/solr path=/select 
params={&timeAllowed=100} hits=189 status=0 QTime=119

Here it is without such a strict limitation:
INFO: [] webapp=/solr path=/select 
params={&timeAllowed=1} hits=582 status=0 QTime=124


The total execution time differs by a mere 5 ms, but the partial 
result has only about 1/3 of the full result.


Is it the expected behaviour? Does that mean I can never rely on the 
partial results?


I added timeAllowed to protect against overly expensive wide queries, but I 
still want to return something relevant to the user. This query returned 
30% of the full result, but I have other queries in the log where the 
partial result is just empty. Am I doing something wrong?


P.S. I am using Solr 3.6.1, index size is 3Gb and easily fits in memory. 
Load Average on the Solr box is very low.


-- Aleksey


Re: Partial results with not enough hits

2012-11-21 Thread Jack Krupansky
It could be that the time to get set up to return even the first result is 
high and then each additional document is a minimal increment in time.


Do a query with &rows=1 (or even 0) and see what the minimum query time is 
for your query, index, and environment.
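
A quick SolrJ probe along those lines (the URL and query string are
placeholders; partialResults is the flag Solr puts in the response header when
timeAllowed is exceeded):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimeAllowedProbe {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("your wide query here");
        q.setRows(0);          // measure setup cost without fetching documents
        q.setTimeAllowed(100); // time budget in milliseconds
        QueryResponse rsp = solr.query(q);
        Object partial = rsp.getHeader().get("partialResults");
        System.out.println("QTime=" + rsp.getQTime()
                + " hits=" + rsp.getResults().getNumFound()
                + " partial=" + (partial != null));
    }
}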


-- Jack Krupansky

-Original Message- 
From: Aleksey Vorona

Sent: Wednesday, November 21, 2012 6:04 PM
To: solr-user@lucene.apache.org
Subject: Partial results with not enough hits

In all of my queries I have timeAllowed parameter. My application is
ready for partial results. However, whenever Solr returns partial result
it is a very bad result.

For example, I have a test query and here its execution log with the
strict time allowed:
WARNING: Query: ; Elapsed time: 120Exceeded allowed search
time: 100 ms.
INFO: [] webapp=/solr path=/select
params={&timeAllowed=100} hits=189 status=0 QTime=119
Here it is without such a strict limitation:
INFO: [] webapp=/solr path=/select
params={&timeAllowed=1} hits=582 status=0 QTime=124

The total execution time is different by mere 5 ms, but the partial
result has only about 1/3 of the full result.

Is it the expected behaviour? Does that mean I can never rely on the
partial results?

I added timeAllowed to protect from too expensive wide queries, but I
still want to return something relevant to the user. This query returned
30% of the full result, but I have other queries in the log where
partial result is just empty. Am I doing something wrong?

P.S. I am using Solr 3.6.1, index size is 3Gb and easily fits in memory.
Load Average on the Solr box is very low.

-- Aleksey 



Re: Weird Behaviour on Solr 5x (SolrCloud)

2012-11-21 Thread deniz
Mark Miller-3 wrote
> I'm not sure - I guess I'll have to look into it - could you file a
> JIRA issue with these details?

Sure...
But before that: could it be because of using a RAM dir? Basically, when
you restart Solr the RAM is gone, and it tries to check the old folder that
it had used... and as it can't find anything in RAM it shows an empty
index... Could that be the reason? Though this still doesn't explain why, after
the restart, it was not filled with the data from the cloud...





-
Zeki ama calismiyor... Calissa yapar...


Re: SolrCloud(5x) - Detects all of the Solr insrances on a machine

2012-11-21 Thread deniz
After putting the port information into solr.xml too, it seems to work
properly... I don't know why this only happens on remote machines and not
locally, but could this be a minor bug in Solr? Basically, if we are giving the
port information in the start command, then we shouldn't have to deal with the
port information in config files, IMHO.



-
Zeki ama calismiyor... Calissa yapar...


How to use CloudSolrServer in multi threaded indexing program

2012-11-21 Thread ss
I am a newbie to SolrCloud.

I have set up a SolrCloud of n leaders, n replicas and a ZooKeeper ensemble.
I have a client that uses SolrJ and has access to millions of docs. This
client program runs on a separate machine. Since I want these docs to be
indexed as fast as possible, I would like to spawn multiple threads, each
adding a set of docs to the SolrCloud. In this scenario, should each thread be
using CloudSolrServer? Since CloudSolrServer is not thread safe, should
each thread maintain its own instance of CloudSolrServer, or should they
create a new instance of CloudSolrServer for each doc being submitted? Should
I be using ConcurrentUpdateSolrServer instead? But ConcurrentUpdateSolrServer
is attached to a single URL. Should I be passing a load balancer URL to
ConcurrentUpdateSolrServer then?

I would like to hear from the Solr gurus out there how they would design
the indexer/submitter client program for optimal throughput.

Thanks in advance. Happy Thanksgiving!
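
For illustration, here is a minimal sketch of the one-client-per-thread
variant (the zkHost string, collection name and document contents are
placeholders, not a tested recommendation):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        final String zkHost = "zk1:2181,zk2:2181,zk3:2181";
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            final int threadId = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // one client per thread, reused for many batches;
                        // never a new instance per document
                        CloudSolrServer solr = new CloudSolrServer(zkHost);
                        solr.setDefaultCollection("collection1");
                        List<SolrInputDocument> batch =
                            new ArrayList<SolrInputDocument>();
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", "thread" + threadId + "-doc1");
                        batch.add(doc);
                        solr.add(batch); // batch adds, not per-document calls
                        solr.shutdown();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}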





Re: SolrCloud(5x) - Detects all of the Solr insrances on a machine

2012-11-21 Thread Mark Miller
Limitation of web containers. There is not a clean way to get the port without 
making some request.

If you pass the port as a sys prop on the cmd line and use jetty, it works out 
of the box. If you don't do that, there is config necessary.

- Mark

On Nov 21, 2012, at 8:34 PM, deniz  wrote:

> After putting the port information into solr.xml too, it seems to work
> properly... I don't know why this only happens on remote machines and not
> locally, but could this be a minor bug in Solr? Basically, if we are giving the
> port information in the start command, then we shouldn't have to deal with the
> port information in config files, IMHO.
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...