Re: Memory Using In Faceted Search (UnInvertedField's)

2013-09-19 Thread Anton M
I ran some load tests and working memory usage was always about 10-11 GB
(rising very slowly - probably because the query cache was filling up, I
think). Heap size was always 6 GB, while 4-5 GB was reported as shareable
memory.
At first I was afraid that Solr would keep taking memory until everything
available was used, but it looks like it stops somewhere after the
fieldValueCache is filled.
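For reference, the fieldValueCache that backs UnInvertedField faceting can
be capped explicitly in solrconfig.xml. A minimal sketch, assuming the stock
cache implementation (the size value is illustrative, not a recommendation):

    <fieldValueCache class="solr.FastLRUCache"
                     size="512"
                     autowarmCount="0"
                     showItems="32" />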

Shawn, my swap file was growing (up to 50-60%) and being used while the load
tests ran. Did you configure 'swappiness' on your Linux box (set it to 0
earlier, maybe)? If not, my Windows OS could be the cause of that difference.

I'm not sure whether this is entirely a shareable-memory issue, some missing
JVM configuration (I don't have anything special except -Xmx, -Xms and
-XX:MaxPermSize=512M), or some Solr memory leak.
I'd appreciate any thoughts on that.

Thanks.





Migrating from Endeca

2013-09-19 Thread Gareth Poulton
Hi,
A customer wants us to move their entire enterprise platform - of which one
of the many components is Oracle Endeca - to open source.
However, customers being the way they are, they don't want to give up any of
the features they currently use, the most prominent of which are
user-friendly web-based editors that let non-technical people edit
things like:
- Schema
- Dimensions (i.e. facets)
- Dimension groups (not sure what these are)
- Thesaurus
- Stopwords
- Report generation
- Boosting individual records (i.e. sponsored links)
- Relevance ranking settings
- Process pipeline editor for, e.g. adding new languages
- ...all without touching any XML.

My question is, are there any solr features, plugins, modules, third party
applications, or the like that will do this for us? Or will we have to
develop all the above from scratch?

thanks,
Gareth


solr atomic updates stored="true", and copyField limitation

2013-09-19 Thread Tanguy Moal
Hello,

I'm using solr 4.4. I have a solr core with a schema defining a bunch of 
different fields, and among them, a date field:
- date: indexed and stored   // the date used at search time
In practice it's a TrieDateField but I think that's not relevant for the 
concern.

It also has a multi-valued, not required, "string" field named "tags" which
contains, well, a list of tags for some of the documents.

So far, so good: everything works as expected and I'm glad.
I'm able to perform partial (or atomic) updates on the tags field whenever it 
gets modified, and I love it.

Now I have a new source that also pushes updates to the same solr core.
Unfortunately, that source's incoming documents have their date in another
field, of the same type, named created_time instead of date.
- created_time: stored only  // some documents come in with this field set
To be able to sort any document by time, I decided to ask solr to copy the 
contents of the field created_time to the field named date:
    <copyField source="created_time" dest="date" />

I updated my schema and reloaded my core and everything seemed fine. In fact, I 
did break something 8-)
But I figured it out later…
Quoting http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations :
> all fields in your SchemaXml must be configured as stored="true" except for
> fields which are <copyField> destinations -- which must be configured as
> stored="false"


However at that time, I was not aware of the limitation and I was able to sort 
by time across all the documents in my solr core.
I then decided to make sure that partial (or atomic) updates could still be 
performed, and then I was surprised:
* documents from the more recent source (having both a date and a created_time 
field) are updated fine, the date field is kept (the copyField directive is 
replayed, I guess)
* documents from the first source (having only the date field set) are however
a little less lucky: the date gets lost in the process (it looks like the date
field is overwritten by the execution of the copyField directive with nothing
in its source field)
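To make that concrete, here is the kind of atomic update I send against the
tags field (a sketch in Solr's XML update format; the id value is made up):

    <add>
      <doc>
        <field name="id">doc-1</field>
        <field name="tags" update="add">some-new-tag</field>
      </doc>
    </add>

It only mentions id and tags, yet after it is applied the date field is gone
for documents that never had a created_time.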

I then became aware of the caveats and limitations of atomic updates, but now I 
want to understand why ;-)

So my question is: how does copyField behaviour differ between a normal
(classic) update and a partial (atomic) one?
In practice, I don't understand why the target of every copyField directive is
*always* cleared during partial updates.
Could the destination field be cleared only when one of the source fields of a
copyField is present in the atomic update? Maybe we didn't want to do that
because it would have put complexity where it should not be (updates must be
fast), but that's just an idea.

I have two ways to handle my problem:
1/ Create a stored="false" search_date field with two copyField directives,
one for the original "date" field and another for the newer "created_time"
field, and make the search application rely on the search_date field
(sketched below)
2/ Since I have some control over the second source pushing documents, I can
make sure that documents are pushed with the same date field, and work around
the limitation by removing the copyField directive entirely.
Since it simplifies my solr schema, I chose option #2.
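Option #1 would look roughly like this in schema.xml (a sketch; the tdate
type name is an assumption about the schema):

    <field name="search_date" type="tdate" indexed="true" stored="false" />
    <copyField source="date" dest="search_date" />
    <copyField source="created_time" dest="search_date" />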

Thank you very much for your attention

Tanguy

Re: Migrating from Endeca

2013-09-19 Thread Jack Krupansky

Take a look at LucidWorks Enterprise. It has a graphical UI.

But if you must meet all of the listed requirements and Lucid doesn't meet 
all of them, then... you will have to develop everything on your own. Or, 
maybe Lucid might be interested in partnering with you to allow you to add
extensions to their UI. If you really are committed to a deep replacement of 
Endeca's UI, then rolling your own is probably the way to go. Then the 
question is whether you should open source that UI.


You can also consider extending the Solr Admin UI. It does not do most of 
your listed features, but having better integration with the Solr Admin UI 
is a good idea.


-- Jack Krupansky

-Original Message- 
From: Gareth Poulton

Sent: Thursday, September 19, 2013 7:50 AM
To: solr-user@lucene.apache.org
Subject: Migrating from Endeca

[original message quoted in full; snipped]



How to highlight multiple words in document

2013-09-19 Thread bramha
Hi All,

I want to highlight multiple words in a document.

e.g. if I search for "Rework AND Build", then opening a document returned in
the search results should highlight both words ("Rework" as well as "Build")
in that document.

Currently I am adding the word to highlight in the highlight field.
In this example I am setting highlight = "Rework AND Build", but it is
treated as a single word and only that exact occurrence is highlighted in
the document.
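Is the standard highlighter what I should be using instead? A sketch of what
I mean, assuming a highlightable field named "content" (URL encoding
omitted):

    http://localhost:8983/solr/select?q=content:(Rework AND Build)&hl=true&hl.fl=content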

Thanks in advance.
- Bramha





solr4.4 admin page show "loading"

2013-09-19 Thread Micheal Chao
hi, I have installed solr4.4 on tomcat7.0. the problem is I can't see the
solr admin page; it always shows "loading". I can't find any errors in the
tomcat logs, and I can send search requests and get results.

what can I do? please help me, thank you very much. 





Re: Solrcloud - adding a node as a replica?

2013-09-19 Thread didier deshommes
Thanks Furkan,
That's exactly what I was looking for.


On Wed, Sep 18, 2013 at 4:21 PM, Furkan KAMACI wrote:

> Are you looking for this:
>
> http://lucene.472066.n3.nabble.com/SOLR-Cloud-Collection-Management-quesiotn-td4063305.html
>
> On Wednesday, September 18, 2013, didier deshommes wrote:
> > Hi,
> > How do I add a node as a replica to a solrcloud cluster? Here is my
> > situation: some time ago, I created several collections
> > with replicationFactor=2. Now I need to add a new replica. I thought just
> > starting a new node and re-using the same zookeeper instance would make it
> > automatically a replica, but that isn't the case. Do I need to delete and
> > re-create my collections with the right replicationFactor (3 in this
> case)
> > again? I am using solr 4.3.0.
> >
> > Thanks,
> > didier
> >
>
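For anyone finding this thread later: on Solr 4.x the usual approach is to
start the new node against the same ZooKeeper ensemble and then create a core
on it with the CoreAdmin API, naming the target collection and shard. A
sketch with illustrative host, collection and shard names:

    http://newhost:8983/solr/admin/cores?action=CREATE&name=mycoll_shard1_replica3&collection=mycoll&shard=shard1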


Re: Solrcloud - adding a node as a replica?

2013-09-19 Thread Furkan KAMACI
Do not hesitate to ask questions if you have any problems about it.


2013/9/19 didier deshommes 

> [earlier messages quoted in full; snipped]


I can't open the admin page, it's always loading.

2013-09-19 Thread Micheal Chao
Hi, I followed the tutorial to download solr4.4 and unzip it, and then i
started jetty. i can post data and search correctly, but when i try to open
the admin page, it always shows "loading".

and then i set up solr on tomcat 7.0, but it's the same.

what's wrong? please help, thanks.





SolrCloud setup - any advice?

2013-09-19 Thread Neil Prosser
Apologies for the giant email. Hopefully it makes sense.

We've been trying out SolrCloud to solve some scalability issues with our
current setup and have run into problems. I'd like to describe our current
setup, our queries and the sort of load we see and am hoping someone might
be able to spot the massive flaw in the way I've been trying to set things
up.

We currently run Solr 4.0.0 in the old style Master/Slave replication. We
have five slaves, each running Centos with 96GB of RAM, 24 cores and with
48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs) but
aren't slow either. Our GC parameters aren't particularly exciting, just
-XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.

Our index size ranges between 144GB and 200GB (when we optimise it back
down, since we've had bad experiences with large cores). We've got just
over 37M documents; some are smallish but most range between 1000-6000
bytes. We regularly update documents, so large portions of the index will be
touched, leading to a maxDocs value of around 43M.

Query load ranges between 400req/s to 800req/s across the five slaves
throughout the day, increasing and decreasing gradually over a period of
hours, rather than bursting.

Most of our documents have upwards of twenty fields. We use different
fields to store territory variant (we have around 30 territories) values
and also boost based on the values in some of these fields (integer ones).

So an average query can do a range filter by two of the territory variant
fields, filter by a non-territory variant field. Facet by a field or two
(may be territory variant). Bring back the values of 60 fields. Boost query
on field values of a non-territory variant field. Boost by values of two
territory-variant fields. Dismax query on up to 20 fields (with boosts) and
phrase boost on those fields too. They're pretty big queries. We don't do
any index-time boosting. We try to keep things dynamic so we can alter our
boosts on-the-fly.

Another common query is to list documents with a given set of IDs and
select documents with a common reference and order them by one of their
fields.

Auto-commit every 30 minutes. Replication polls every 30 minutes.

Document cache:
  * initialSize - 32768
  * size - 32768

Filter cache:
  * autowarmCount - 128
  * initialSize - 8192
  * size - 8192

Query result cache:
  * autowarmCount - 128
  * initialSize - 8192
  * size - 8192
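In solrconfig.xml terms those are just the stock cache entries, i.e. roughly
(the cache classes here are assumptions about the defaults; a sketch):

    <documentCache class="solr.LRUCache" size="32768" initialSize="32768" />
    <filterCache class="solr.FastLRUCache" size="8192" initialSize="8192"
                 autowarmCount="128" />
    <queryResultCache class="solr.LRUCache" size="8192" initialSize="8192"
                      autowarmCount="128" />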

After a replicated core has finished downloading (probably while it's
warming) we see requests which usually take around 100ms taking over 5s. GC
logs show concurrent mode failure.

I was wondering whether anyone can help with sizing the boxes required to
split this index down into shards for use with SolrCloud and roughly how
much memory we should be assigning to the JVM. Everything I've read
suggests that running with a 48GB heap is way too high but every attempt
I've made to reduce the cache sizes seems to wind up causing out-of-memory
problems. Even dropping all cache sizes by 50% and reducing the heap by 50%
caused problems.

I've already tried SolrCloud with 10 shards (around 3.7M documents per
shard, each with one replica) and kept the cache sizes low:

Document cache:
  * initialSize - 1024
  * size - 1024

Filter cache:
  * autowarmCount - 128
  * initialSize - 512
  * size - 512

Query result cache:
  * autowarmCount - 32
  * initialSize - 128
  * size - 128

Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB
memory) and four shards on two boxes and three on the rest I still see
concurrent mode failure. This looks like it's causing ZooKeeper to mark the
node as down and things begin to struggle.

Is concurrent mode failure just something that will inevitably happen or is
it avoidable by dropping the CMSInitiatingOccupancyFraction?
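For reference, the GC flags in question would look like this (the threshold
value is only an example):

    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:+UseCMSInitiatingOccupancyOnly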

If anyone has anything that might shove me in the right direction I'd be
very grateful. I'm wondering whether our set-up will just never work and
maybe we're expecting too much.

Many thanks,

Neil


Problem with stopword

2013-09-19 Thread mpcmarcos
Hello everybody, 

I have a problem with stopwords. I have an index with some stopwords, and
when I search for one of them alone, solr doesn't select any documents. How
can I fix this? I need all the documents.

Example:

*Stopwords*: hello, goodbye
*Query*: http://localhost:8893/solr/select?q=hello
*DebugQuery*: 
*Total Results*: 0

I tried to do this with edismax, but it only works if I call solr without
"q", not when "q" ends up empty because of stopwords.

http://localhost:8983/solr/select?q=&defType=edismax&q.alt=*:*



Thank you.





Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

I'd like to index into Solr (4.4.0) documents that I previously annotated
with GATE (7.1).
I use Behemoth to be able to run my GATE application on a corpus of
documents on Hadoop, and then Behemoth allows me to directly send my
annotated documents to solr. But my question is not about the Behemoth or
Hadoop parts.

The annotations produced by my GATE application usually have several
features (for example, annotation type Person has the following features :
Person.title, Person.firstName, Person.lastName, Person.gender).
Each of my documents may contain more than one Person annotation, which is
why I would like to index all the features for one annotation in one field
in solr.
How do I do that ?

I thought I'd add the following lines in schema.xml :

[schema.xml snippet stripped by the mail archive; judging from the error
below, it declared a fieldType named "ladate" (a StrField subtype) with a
subSuffix=_ladate argument, plus the fields that use it]
But as soon as I start my solr instances and try to access solr from my
browser, I get an HTTP ERROR 500 :

Problem accessing /solr/. Reason:

{msg=SolrCore 'collection1' is not available due to init failure:
Plugin Initializing failure for [schema.xml]
fieldType,trace=org.apache.solr.common.SolrException: SolrCore
'collection1' is not available due to init failure: Plugin Initializing
failure for [schema.xml] fieldType
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:287)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin Initializing
failure for [schema.xml] fieldType
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:193)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
... 1 more
Caused by: java.lang.RuntimeException: schema fieldtype
ladate(org.apache.solr.schema.StrField) invalid
arguments:{subSuffix=_ladate}
at org.apache

decimal numeric queries are too slow in solr

2013-09-19 Thread Karan jindal
Hi all,

I am using solr 3.4 and my index size is around 250GB.
The issue I am facing is that queries which contain a decimal number take a
long time to execute.
I am using dismax query handler with *qf* (15 fields) and *pf * (4 fields)
and a boost function on time.

Also I am using WordDelimiterFilterFactory with the following options (only
mentioning the options related to numbers):
generateNumberParts="1"
preserveOriginal="1"
catenateNumbers="1"
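In schema.xml terms that corresponds to something like (other attributes
omitted):

    <filter class="solr.WordDelimiterFilterFactory"
            generateNumberParts="1"
            preserveOriginal="1"
            catenateNumbers="1" />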

Example Query :
"solr 3.4" takes about 20 seconds
"solr 3" takes less than 1 second

I couldn't understand the reason for such a big difference.
I can understand that internally 3.4 will be translated into something like
"(3.4 3) (4 34)" because of WordDelimiterFilterFactory, but the difference is
still huge.

What factors does query execution time depend on?
Any help in understanding the reason will be appreciated.

Regards,
Karan


Question on ICUFoldingFilterFactory

2013-09-19 Thread Nemani, Raj
Hello,

I was wondering if anybody who has experience with ICUFoldingFilterFactory can 
help out with the following issue.  Thank you so much in advance.

Raj

--

Problem:
When a document is created/updated, the value's casing is indexed properly. 
However, when it's queried, the value is returned in lowercase.
Example:
Document input: NBAE
Document value: NBAE
Query input: NBAE,nbae,Nbae...etc
Query Output: nbae

If I remove the ICUFoldingFilterFactory filter, the casing problem goes away,
but then searches for nbae (lowercase) or Nbae (mixed case) return no values.


Field Type:

[fieldType definition stripped by the mail archive; per the description it
was an analyzer chain that included solr.ICUFoldingFilterFactory]


Let me know if that makes sense. I'm curious whether
solr.ICUFoldingFilterFactory has additional attributes that I can use to
control the casing behavior but retain its other filtering properties
(ASCIIFoldingFilter and ICUNormalizer2Filter).
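A typical chain matching that description might look like the following (a
sketch only, since the archive stripped the XML; not necessarily the exact
schema in question):

    <fieldType name="text_folded" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>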

Thanks!!!



RE: Solr 4.4.0: Plugin init failure for [schema.xml] analyzer/tokenizer

2013-09-19 Thread Chris Hostetter

Ok, first off -- let's clear up some confusion...

1) except for needing to put the logging jars in your servlet container's
top-level classpath, you should not be putting any jars that come from solr
or lucene, or any jars for custom plugins you have written, in "tomcat/lib"

2) you should never manually add/remove jars of any kind from solr's
"WEB-INF/lib/" directory.

3) if you have custom plugins you want to load, you should create a *new*
lib directory, put your custom plugin jars (and their dependencies) in
that directory, and configure it (either with sharedLib in solr.xml, or
with a <lib> directive in your solrconfig.xml file)
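for example, a minimal solrconfig.xml directive (the path is illustrative):

    <lib dir="/opt/solr/mylibs" regex=".*\.jar" />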


As for the situation you find yourself in...

here are the big, gigantic (the size of football fields even) red flags 
that jump out at me as being the sort of thing that could cause all sorts 
of classloader nightmares with your setup...

: * Following are the jars placed in "tomcat/lib" dir:
...
: lucene-core.jar 
: solr-core-1.3.0.jar 
: solr-dataimporthandler-4.4.0.jar 
: solr-dataimporthandler-extras-4.4.0.jar
: solr-solrj-4.4.0.jar
: lucene-analyzers-common-4.2.0.jar
...
: Jars in "tomcat/ webapps/ROOT/WEB-INF/lib/"
...
: lucene-core-4.4.0.jar 
: nps-solr-plugin-1.0-SNAPSHOT.jar
: solr-core-4.4.0.jar
: solr-dataimporthandler-4.4.0.jar
: lucene-analyzers-common-4.4.0.jar 
: solr-solrj-4.4.0.jar
...

You clearly have two radically different versions of solr-core and lucene-core
in your classpath, which could easily explain the ClassCastException problems
related to the TokenizerFactory class -- because there
are going to be two radically different versions of that class in the
classpath, and who knows which one java is trying to cast your custom impl
to.

separate from that: even if the multiple solr-dataimporthandler,
lucene-analyzers-common, and solr-solrj jars in each of those directories are
the exact same binary files, when loaded into the hierarchical
classloaders of a servlet container they produce different copies of the
same java classes -- so you can again have classloader problems where
some execution paths use a "leaf" classloader to access ClassX while
another thread might use a "parent" classloader to access ClassX -- these
different class instances will have different static fields, and instances
of these classes will (probably) not be .equals(), etc


-Hoss


Re: SolrCloud setup - any advice?

2013-09-19 Thread Shreejay Nair
Hi Neil,

Although you haven't mentioned it, just wanted to confirm - do you have
soft commits enabled?

Also what's the version of solr you are using for the solr cloud setup?
4.0.0 had lots of memory and zk related issues. What's the warmup time for
your caches? Have you tried disabling the caches?

Is this a static index, or are documents added continuously?

The answers to these questions might help us pin point the issue...

On Thursday, September 19, 2013, Neil Prosser wrote:

> [original message quoted in full; snipped]


Will Solr work with a mapped drive?

2013-09-19 Thread johnmunir
Hi,


I'm having the same problem as described here:
http://stackoverflow.com/questions/17708163/absolute-paths-in-solr-xml-configuration-using-tomcat6-on-windows
Does anyone know whether this is a limitation of Solr?


I searched the web, nothing came up.


Thanks!!!


-- MJ


Re: Indexing several sub-fields in one solr field

2013-09-19 Thread Jack Krupansky
There is no such fieldType attribute as "subSuffix". Solr is just 
complaining about extraneous, junk attributes. Delete the crap.


-- Jack Krupansky

-Original Message- 
From: jimmy nguyen

Sent: Thursday, September 19, 2013 12:43 PM
To: solr-user@lucene.apache.org
Subject: Indexing several sub-fields in one solr field

[original message quoted in full; snipped]

Re: Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

thanks for the answer. Sorry, I actually meant the attribute "subFieldSuffix".

So, in order to index several features in one solr field, should I write a
new Java class extending AbstractSubTypeFieldType? Or is there another way to
do it?
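For example, would a parallel dynamic-field approach like the following be
reasonable? (a sketch; the names are made up, and the values would have to
stay position-aligned across the fields for each Person annotation)

    <dynamicField name="person_*" type="string" indexed="true" stored="true"
                  multiValued="true" />

with values then going into person_title, person_firstname, person_lastname
and person_gender.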

Thanks !
Jim


On Thu, Sep 19, 2013 at 4:05 PM, Jack Krupansky wrote:

> [earlier messages quoted in full; snipped]

Re: SOLR-5250

2013-09-19 Thread Chris Hostetter

: widget, BUT while researching for this message, I've learned about the 
: important difference between a text field and a string field in solr and 
: it appears that by default, the Drupal apachesolr module indexes text 
: fields as "text" and not strings. Now I just need to figure out how to 
: alter this process to suit my own needs. I'll update that d.org ticket 
: with my findings so hopefully that will prevent some other future, 
: confused developer from reaching out to the Apache Foundation 
: prematurely.

John: glad to hear you were able to track down the root cause of your
problem.

Thanks for closing the loop, and good luck on finding a solution that 
works nicely with the drupal bridge you are using.

Please feel free to follow up on this list with any additional questions
you have on the solr side of things.

-Hoss


Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Furkan KAMACI
Could you paste your jetty logs from when you try to open the admin page?

On Thursday, September 19, 2013, Micheal Chao wrote:
> Hi, I followed the tutoral to download solr4.4 and unzip it, and then i
> started jetty. i can post data and search correctly, but when i try to
open
> admin page, it's always show "loading".
>
> and then i setup solr on tomcat 7.0, but it's the same.
>
> what's wrong? please help, thanks.
>
>
>
>


Re: Problem with stopword

2013-09-19 Thread Furkan KAMACI
Firstly, you should read this:

https://cwiki.apache.org/confluence/display/solr/Running+Your+Analyzer

Secondly, when you write a query, stopwords are filtered out of it if you use
a stopword analyzer, so there may be nothing left to search for.
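If you need stopwords to be searchable, they have to survive analysis at
both index and query time, i.e. an analyzer chain without the stop filter. A
minimal sketch:

    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>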

On Thursday, September 19, 2013, mpcmarcos wrote:
> [original message quoted in full; snipped]


SOLR-5250

2013-09-19 Thread John Brandenburg
Greetings, 

This is a follow up to https://issues.apache.org/jira/browse/SOLR-5250 where I 
reported a possible issue with sorting content which contains hyphens. Hoss Man 
suggested that I likely have a misconfiguration on my field settings and that I 
send a message to this list.

I am using the Drupal apachesolr module version 1.4 (Where I actually also 
posted an issue at https://drupal.org/node/2092363) with a hosted Acquia solr 
index. So the schema settings will reflect what is packaged with the apachesolr 
module in "drupal-3.0-rc2-solr3."

I wasn't initially familiar with how Drupal field types are mapped to Solr 
field types, and the field in question is using the Text field widget, BUT 
while researching for this message, I've learned about the important difference 
between a text field and a string field in solr and it appears that by default, 
the Drupal apachesolr module indexes text fields as "text" and not strings. Now 
I just need to figure out how to alter this process to suit my own needs. I'll 
update that d.org ticket with my findings so hopefully that will prevent some 
other future, confused developer from reaching out to the Apache Foundation 
prematurely.

--
John P. Brandenburg
Developer

jbrandenb...@forumone.com
www.forumone.com
703-894-4362

Forum One Communications
Communicate • Collaborate • Change the World

Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Chris Hostetter

: Hi, I followed the tutoral to download solr4.4 and unzip it, and then i
: started jetty. i can post data and search correctly, but when i try to open
: admin page, it's always show "loading". 

the admin UI is entirely rendered by client side javascript in your 
browser -- so the most important question we need to know is what OS & 
browser you are using to access the web UI.

if your browser has a debug/error console available, it would also help to 
know if it mentions any errors/warnings.


-Hoss


Re: SolrCloud setup - any advice?

2013-09-19 Thread Otis Gospodnetic
Hi Neil,

Consider using G1 instead.  See http://blog.sematext.com/?s=g1
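At its simplest that would mean replacing -XX:+UseConcMarkSweepGC with:

    -XX:+UseG1GC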

If that doesn't help, we can play with various JVM parameters.  The latest
version of SPM for Solr exposes information about sizes and utilization of
JVM memory pools, which may help you understand which JVM params you need
to change, how, and whether your changes are achieving the desired effect.

Otis
Solr & ElasticSearch Support
http://sematext.com/


On Sep 19, 2013 11:21 AM, "Neil Prosser"  wrote:

> [original message quoted in full; snipped]


Re: Unknown attribute id in add:allowDups

2013-09-19 Thread Chris Hostetter

: I'm working with the Pecl package, with Solr 4.3.1. I have a doc defined in my
...
: $client = new SolrClient($options);
: $doc = new SolrInputDocument();
: $doc->addField('id', 12345);
: $doc->addField('description', 'This is the content of the doc');
: $updateResponse = $client->addDocument($doc);
: 
: When I do this, the doc is not added to the index, and I get the following
: error in the logs in admin
: 
:  Unknown attribute id in add:allowDups

"id" is a red herring here -- it's not refering to your "id" field it's 
refering to the fact that an XML attribute node exists with an "XML id" 
that it doesn't recognize.

or to put it another way: Pecl is generating an  xml element that 
contains an attribute like this:  allowDups="false|true" ...and solr 
doesn't know what to do with that.

"allowDups" was an option that existed prior to 4.0, but is no longer 
supported (the "overwrite" attribute now takes it's place)

So my best guess is that the Pecl code you are using was designed for 3.x, 
and doesn't entirely work correctly with 4.x.

the warning you are getting isn't fatal or anything -- it's just letting
you know that the unknown attribute is being ignored -- but you may want to
look into whether there is an updated Pecl library (for example: if you
really wanted allowDups="true" you should now be using
overwrite="false", and maybe a newer version of your client library will
let you)
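in other words, where the old client sent allowDups, a 4.x update message
should look like, for example:

    <add overwrite="false">
      <doc>
        <field name="id">12345</field>
        ...
      </doc>
    </add>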

I've updated some places in the ref guide and wiki where it wasn't obvious
that allowDups is gone, gone, gone ... i'll also update that error message
so it will be clearer starting in 4.6...

https://issues.apache.org/jira/browse/SOLR-5257

-Hoss


Re: solrcloud shards backup/restoration

2013-09-19 Thread Aditya Sakhuja
Hi,

Sorry for the late followup on this. Let me put in more details here.

*The problem:*

Cannot successfully restore the index backed up with
'/replication?command=backup'. The backup was generated as
*snapshot.yyyymmdd*.

*My setup and steps:*
6 solrcloud instances
7 zookeepers instances

Steps:

1.> Take a snapshot using *http://host1:8893/solr/replication?command=backup*,
on one host only. Move *snapshot.yyyymmdd* to some reliable storage.

2.> Stop all 6 solr instances, all 7 zk instances.

3.> Delete ../collectionname/data/* on all solrcloud nodes. ie. deleting
the index data completely.

4.> Delete zookeeper/data/version*/* on all zookeeper nodes.

5.> Copy the index back from the backup to one of the nodes:
    cp snapshot.yyyymmdd/* ../collectionname/data/index/

6.> Restart all zk instances. Restart all solrcloud instances.


*Outcome:*
All solr instances are up. However, *num of docs = 0* for all nodes.
Looking at the node where the index was restored, there is a new
index.yymmddhhmmss directory being created and index.properties pointing to
it. That explains why no documents are reported.


How do I get solrcloud to pick up the data from the index directory on a restart?
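For reference, index.properties is just a properties file naming the live
index directory, e.g. (the directory name is illustrative):

    # data/index.properties
    index=index.20130920123456

...so presumably the restored files need to end up in whichever directory
that property points at, rather than in data/index.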

Thanks in advance,
Aditya



On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja wrote:

> Thanks Shalin and Mark for your responses. I am on the same page about the
> conventions for taking the backup. However, I am less sure about the
> restoration of the index. Lets say we have 3 shards across 3 solrcloud
> servers.
>
> 1.> I am assuming we should take a backup from each of the shard leaders
> to get a complete collection. do you think that will get the complete index
> ( not worrying about what is not hard committed at the time of backup ). ?
>
> 2.> How do we go about restoring the index in a fresh solrcloud cluster ?
> From the structure of the snapshot I took, I did not see any
> replication.properties or index.properties  which I see normally on a
> healthy solrcloud cluster nodes.
> if I have the snapshot named snapshot.20130905 does the
> snapshot.20130905/* go into data/index ?
>
> Thanks
> Aditya
>
>
>
> On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller  wrote:
>
>> Phone typing. The end should not say "don't hard commit" - it should say
>> "do a hard commit and take a snapshot".
>>
>> Mark
>>
>> Sent from my iPhone
>>
>> On Sep 6, 2013, at 7:26 AM, Mark Miller  wrote:
>>
>> > I don't know that it's too bad though - its always been the case that
>> if you do a backup while indexing, it's just going to get up to the last
>> hard commit. With SolrCloud that will still be the case. So just make sure
>> you do a hard commit right before taking the backup - yes, it might miss a
>> few docs in the tran log, but if you are taking a back up while indexing,
>> you don't have great precision in any case - you will roughly get a
>> snapshot for around that time - even without SolrCloud, if you are worried
>> about precision and getting every update into that backup, you want to stop
>> indexing and commit first. But if you just want a rough snapshot for around
>> that time, in both cases you can still just don't hard commit and take a
>> snapshot.
>> >
>> > Mark
>> >
>> > Sent from my iPhone
>> >
>> > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>> >
>> >> The replication handler's backup command was built for pre-SolrCloud.
>> >> It takes a snapshot of the index but it is unaware of the transaction
>> >> log which is a key component in SolrCloud. Hence unless you stop
>> >> updates, commit your changes and then take a backup, you will likely
>> >> miss some updates.
>> >>
>> >> That being said, I'm curious to see how peer sync behaves when you try
>> >> to restore from a snapshot. When you say that you haven't been
>> >> successful in restoring, what exactly is the behaviour you observed?
>> >>
>> >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja <
>> aditya.sakh...@gmail.com> wrote:
>> >>> Hello,
>> >>>
>> >>> I was looking for a good backup / recovery solution for the solrcloud
>> >>> indexes. I am more looking for restoring the indexes from the index
>> >>> snapshot, which can be taken using the replicationHandler's backup
>> command.
>> >>>
>> >>> I am looking for something that works with solrcloud 4.3 eventually,
>> but
>> >>> still relevant if you tested with a previous version.
>> >>>
>> >>> I haven't been successful in have the restored index replicate across
>> the
>> >>> new replicas, after I restart all the nodes, with one node having the
>> >>> restored index.
>> >>>
>> >>> Is restoring the indexes on all the nodes the best way to do it ?
>> >>> --
>> >>> Regards,
>> >>> -Aditya Sakhuja
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>>
>
>
>
> --
> Regards,
> -Aditya Sakhuja
>



-- 
Regards,
-Aditya Sakhuja


[ANN] Lux Release 0.10.5

2013-09-19 Thread Michael Sokolov
I'm pleased to announce the release of the XML search engine Lux, 
version 0.10.5.  There has been a lot of progress made since our last 
announced release, which was 0.9.1.  Some highlights:


The app server now provides full access to HTTP request data and control 
of HTTP responses.  We've implemented the excellent EXPath specification 
for this (http://expath.org/spec/webapp) with only a few gaps (e.g. no 
binary file upload yet).


Range comparisons (like [@title > 'median']) are now rewritten by the 
optimizer to use the lux:key() function when a suitable index is 
available, and comparisons involving lux:key() are optimized using the 
Lucene index.
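
For example, a path query with a range predicate like the one below can now
be answered from the index rather than by scanning documents (an illustrative
sketch; the element name and the existence of a string index on @title are
assumptions, not taken from the release notes):

(: with a string index covering @title, the optimizer rewrites this
   range predicate into a lux:key() comparison evaluated against the
   Lucene index :)
//section[@title > 'median']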


and there have been numerous performance optimizations and bug fixes, 
detailed at http://issues.luxdb.org/ and in the release notes here: 
http://luxdb.org/RELEASE-0.10.html.


Lots more information, including downloads, documentation and setup 
instructions, is available at http://luxdb.org, 
source code is at http://github.com/msokolov/lux, and there is an email 
list: lu...@luxdb.org, archived at 
https://groups.google.com/forum/?fromgroups#!topic/luxdb.


Finally, I'll be presenting Lux at Lucene/Solr Revolution in Dublin Nov. 
6-7, so if you're anywhere nearby, I encourage you to come, and I look 
forward to seeing you there!


-Mike Sokolov
soko...@falutin.net


Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Alexandre Rafalovitch
You may have some over-eager ad blockers! Check the network panel of
Firebug/Chrome console/whatever you have, and see if some resources are not
being loaded.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Sep 19, 2013 at 9:21 PM, Micheal Chao wrote:

> Hi, I followed the tutorial to download Solr 4.4 and unzip it, and then I
> started Jetty. I can post data and search correctly, but when I try to open
> the admin page, it always shows "loading".
>
> I then set up Solr on Tomcat 7.0, but it's the same.
>
> What's wrong? Please help, thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Question on ICUFoldingFilterFactory

2013-09-19 Thread Alexandre Rafalovitch
What do you mean by "output"? Are you looking at fields in returned
documents? In that case you should see the original stored value. Or are you -
for example - looking at facet/group values, which use the tokenized,
post-processed results?
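
As a sketch of the distinction (the field and type names below are made up
for illustration, not taken from the schema in question): with a stored,
analyzed field such as

<!-- hypothetical field: queries match against the folded, lowercased
     tokens, but search results return the original stored string -->
<field name="network" type="text_folded" indexed="true" stored="true"/>

a search for nbae, Nbae or NBAE should all match, while the document
returned should still show NBAE in that field.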

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Sep 20, 2013 at 2:22 AM, Nemani, Raj  wrote:

> Hello,
>
> I was wondering if anybody who has experience with ICUFoldingFilterFactory
> can help out with the following issue.  Thank you so much in advance.
>
> Raj
>
> --
>
> Problem:
> When a document is created/updated, the value's casing is indexed
> properly. However, when it's queried, the value is returned in lowercase.
> Example:
> Document input: NBAE
> Document value: NBAE
> Query input: NBAE, nbae, Nbae, etc.
> Query Output: nbae
>
> If I remove the ICUFoldingFilterFactory filter, the casing problem goes
> away, but then searches for nbae (lowercase) or Nbae (mixed case) return no
> values.
>
>
> Field Type:
> <fieldType positionIncrementGap="20" autoGeneratePhraseQueries="true">
>   <!-- the fieldType's name/class attributes and several tags below were
>   eaten by the mail archive; per the text, the chain also includes a
>   tokenizer and solr.ICUFoldingFilterFactory -->
>   <analyzer>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\s&\s"
> replacement="\sand\s"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="[\p{Punct}\u00BF\u00A1]" replaceWith=" "/>
>     <!-- tag lost by the archive -->
>     <!-- tag lost by the archive -->
>     <filter class="solr.PatternReplaceFilterFactory" pattern="[\p{Cntrl}]"
> replacement=""/>
>     <!-- tag lost by the archive -->
>     <filter class="solr.StopFilterFactory"
> words="stopwords_en.txt" enablePositionIncrements="true" />
>   </analyzer>
> </fieldType>
>
>
> Let me know if that makes sense. I'm curious if
> solr.ICUFoldingFilterFactory has additional attributes that I can use to
> control the casing behavior but retain its other filtering properties
> (ASCIIFoldingFilter and ICUNormalizer2Filter).
>
> Thanks!!!
>
>


Re: Memory Using In Faceted Search (UnInvertedField's)

2013-09-19 Thread Shawn Heisey
On 9/19/2013 3:14 AM, Anton M wrote:
> Shawn, I had swap file growing (up to 50-60%) and working while load tests
> ran. Did you configure 'swapiness' on your Linux box (set it to 0 earlier,
> maybe)? If not, my Windows OS could be cause of that difference.

The vm.swappiness sysctl setting is 1.  I have used 0 as well.  I don't
want it to start swapping unless it *REALLY* needs to.  The default of
60 is pretty aggressive.
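
For reference, a minimal sketch of making that setting persistent on Linux
(standard sysctl usage; pick the value that suits your box):

# /etc/sysctl.conf -- prefer dropping page cache over swapping out
# the JVM heap; 0 or 1 instead of the default 60
vm.swappiness = 1

# apply immediately without a reboot:
#   sysctl -w vm.swappiness=1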

> I'm not sure if that's completely an issue about shareable memory or some
> missing JVM configurations (I don't have anything special except -Xmx, -Xms
> and -XX:MaxPermSize=512M) or some Solr memory leak.
> I'd appreciate any thoughts on that.

As I said before, I think that the memory reported as shareable is not
actually allocated.  It probably should be listed under virtual memory.
Our app rarely does facets, and it typically sorts on one field, so I
have absolutely no idea what's being measured in the 11g of shared
memory for the solr process.

I was present for a conversation between Lucene committers on IRC where
they seemed to be discussing this issue, and it sounded like it is a
side effect of using MMap in a particular way.  It sounded like they
didn't want to change the way it's used, because it was the correct way
of using it.

Thanks,
Shawn



Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Micheal Chao
I'm using Windows 7 and IE8. I debugged the script, and it showed an error on: "var
d3_formatPrefixes =
["y","z","a","f","p","n","μ","m","","k","M","G","T","P","E","Z","Y"].map(d3_formatPrefix);"
- it can't find the object's method.

So I changed my browser, and it works.

Thanks a lot.
Is this a bug in Solr?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051p4091157.html
Sent from the Solr - User mailing list archive at Nabble.com.


JVM Crash using solr 4.4 on Centos

2013-09-19 Thread Oak McIlwain
I have solr 4.4 running on tomcat 7 on my local development environment
which is ubuntu based and it works fine (Querying, Posting Documents, Data
Import etc.)

I am trying to move into a staging environment which is CentOS based (still
using Tomcat 7 and Solr 4.4); however, when attempting to post documents and
do a data import from MySQL through JDBC, after a few hundred documents,
the Tomcat server crashes and it logs:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z

I'm using Sun Java JDK 1.7.0

Anyone got any ideas I can pursue to resolve this?


Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Alexandre Rafalovitch
I think IE8 itself might be the bug! :-) Many popular libraries have dropped
<=IE7 support completely and are phasing out IE8 as well. Looks like D3 - a
visualization library used for some of the Admin UI - is doing that as well.

Though I thought the Admin JavaScript loading was more robust than that.
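
The failing call is Array.prototype.map, which IE8 (pre-ES5) does not
implement. A rough sketch of the kind of shim that papers over it
(illustrative only, not something the admin UI actually ships):

// minimal Array.prototype.map shim for pre-ES5 browsers
if (!Array.prototype.map) {
  Array.prototype.map = function (fn, thisArg) {
    var out = [];
    for (var i = 0; i < this.length; i++) {
      if (i in this) out[i] = fn.call(thisArg, this[i], i, this);
    }
    return out;
  };
}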

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Sep 20, 2013 at 8:48 AM, Micheal Chao wrote:

> I'm using Windows 7 and IE8. I debugged the script, and it showed an error on: "var
> d3_formatPrefixes =
>
> ["y","z","a","f","p","n","μ","m","","k","M","G","T","P","E","Z","Y"].map(d3_formatPrefix);"
> - it can't find the object's method.
>
> So I changed my browser, and it works.
>
> Thanks a lot.
> Is this a bug in Solr?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051p4091157.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: JVM Crash using solr 4.4 on Centos

2013-09-19 Thread Michael Ryan
This is a known bug in that JDK version. Upgrade to a newer version of JDK 7 
(any build within the last two years or so should be fine). If that's not 
possible for you, you can add -XX:-UseLoopPredicate as a command line option to 
java to work around this.
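
For example, with Tomcat the flag can be passed through CATALINA_OPTS (a
sketch assuming the conventional setenv.sh; adapt it to however you start
Tomcat):

# $CATALINA_BASE/bin/setenv.sh -- disable the loop-predication
# optimization that triggers this HotSpot crash on early JDK 7 builds
CATALINA_OPTS="$CATALINA_OPTS -XX:-UseLoopPredicate"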

-Michael

-Original Message-
From: Oak McIlwain [mailto:oak.mcilw...@gmail.com] 
Sent: Thursday, September 19, 2013 10:10 PM
To: solr-user@lucene.apache.org
Subject: JVM Crash using solr 4.4 on Centos

I have Solr 4.4 running on Tomcat 7 on my local development environment, which 
is Ubuntu based, and it works fine (querying, posting documents, data import, 
etc.)

I am trying to move into a staging environment which is CentOS based (still 
using Tomcat 7 and Solr 4.4); however, when attempting to post documents and do a 
data import from MySQL through JDBC, after a few hundred documents, the Tomcat 
server crashes and it logs:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z

I'm using Sun Java JDK 1.7.0

Anyone got any ideas I can pursue to resolve this?


Re: solrcloud shards backup/restoration

2013-09-19 Thread Aditya Sakhuja
How does one recover from index corruption? That's what I am ultimately
trying to tackle here.

Thanks
Aditya

On Thursday, September 19, 2013, Aditya Sakhuja wrote:

> Hi,
>
> Sorry for the late followup on this. Let me put in more details here.
>
> *The problem:*
>
> Cannot successfully restore the index backed up with
> '/replication?command=backup'. The backup was generated as
> *snapshot.mmdd*
>
> *My setup and steps:*
> 6 solrcloud instances
> 7 zookeeper instances
>
> Steps:
>
> 1.> Take a snapshot using *http://host1:8893/solr/replication?command=backup*,
> on one host only. Move *snapshot.mmdd* to some reliable storage.
>
> 2.> Stop all 6 solr instances, all 7 zk instances.
>
> 3.> Delete ../collectionname/data/* on all solrcloud nodes, i.e. delete
> the index data completely.
>
> 4.> Delete zookeeper/data/version*/* on all zookeeper nodes.
>
> 5.> Copy back index from backup to one of the nodes.
>  \> cp *snapshot.mmdd/*  *../collectionname/data/index/*
>
> 6.> Restart all zk instances. Restart all solrcloud instances.
>
>
> *Outcome:*
>
> All solr instances are up. However, *num of docs = 0* for all nodes.
> Looking at the node where the index was restored, there is a new
> index.yymmddhhmmss directory being created and index.properties pointing to
> it. That explains why no documents are reported.
>
>
> How do I have solrcloud pick up data from the index directory on a
> restart?
>
> Thanks in advance,
> Aditya
>
>
>
> On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja 
> wrote:
>
> Thanks Shalin and Mark for your responses. I am on the same page about the
> conventions for taking the backup. However, I am less sure about the
> restoration of the index. Lets say we have 3 shards across 3 solrcloud
> servers.
>
> 1.> I am assuming we should take a backup from each of the shard leaders
> to get a complete collection. Do you think that will get the complete index
> (not worrying about what is not hard committed at the time of backup)?
>
> 2.> How do we go about restoring the index in a fresh solrcloud cluster ?
> From the structure of the snapshot I took, I did not see any
> replication.properties or index.properties  which I see normally on a
> healthy solrcloud cluster nodes.
> if I have the snapshot named snapshot.20130905 does the
> snapshot.20130905/* go into data/index ?
>
> Thanks
> Aditya
>
>
>
> On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller  wrote:
>
> Phone typing. The end should not say "don't hard commit" - it should say
> "do a hard commit and take a snapshot".
>
> Mark
>
> Sent from my iPhone
>
> On Sep 6, 2013, at 7:26 AM, Mark Miller  wrote:
>
> > I don't know that it's too bad though - its always been the case that if
> you do a backup while indexing, it's just going to get up to the last hard
> commit. With SolrCloud that will still be the case. So just make sure you
> do a hard commit right before taking the backup - yes, it might miss a few
> docs in the tran log, but if you are taking a back up while indexing, you
> don't have great precision in any case - you will roughly get a snapshot
> for around that time - even without SolrCloud, if you are worried about
> precision and getting every update into that backup, you want to stop
> indexing and commit first. But if you just want a rough snapshot for around
> that time, in both cases you can still just don't hard commit and take a
> snapshot.
> >
> > Mark
> >
> > Sent from my iPhone
> >
> > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
> >
> >> The replication handler's backup command was built for pre-SolrCloud.
> >> It takes a snapshot of the index but it is unaware of the transaction
> >> log which is a key component in SolrCloud. Hence unless you stop
> >> updates, commit your changes and then take a backup, you will likely
> >> miss some updates.
> >>
> >> That being said, I'm curious to see how peer sync behaves when you try
> >> to restore from a snapshot. When you say that you haven't been
> >> successful in restoring, what exactly is the behaviour you observed?
> >>
> >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja <
> aditya.sakh...@gmail.com> wrote:
> >>> Hello,
> >>>
> >>> I was looking for a good backup / recovery solution for the solrcloud
> >>> indexes. I am more looking for restoring the indexes from the index
> >>> snapshot, which can be taken using the replicationHandler's backup
> command.
> >>>
> >>> I am looking for something that works with solrcloud 4.3 eventually,
> but
> >>> still relevant if you tested with a previous version.
> >>>
> >>> I haven't been successful in having the restored index replicate
> >>> across the new replicas, after I restart all the nodes, with one node
> >>> having the restored index.
> >>>
> >>> Is restoring the indexes on all the nodes the best way to do it ?
> >>> --
> >>> Regards,
> >>> -Aditya Sakhuja
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
>
>
>
>
> --
> Regards,
> -A
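
For reference, step 1 above boils down to two HTTP calls against the node
being backed up (host, port and paths as used earlier in this thread; the
hard commit first ensures the snapshot reflects everything flushed to the
index):

curl 'http://host1:8893/solr/update?commit=true'
curl 'http://host1:8893/solr/replication?command=backup'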

Re: Migrating from Endeca

2013-09-19 Thread Alexandre Rafalovitch
I think Hue ( http://cloudera.github.io/hue/ ), which Cloudera uses for Solr
search among other things, has some UI customization. And it is
open source, so it would make for a much better base.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Sep 19, 2013 at 8:21 PM, Jack Krupansky wrote:

> Take a look at LucidWorks Enterprise. It has a graphical UI.
>
> But if you must meet all of the listed requirements and Lucid doesn't meet
> all of them, then... you will have to develop everything on your own. Or,
> maybe Lucid might be interested in partnering with you to allow your to add
> extensions to their UI. If you really are committed to a deep replacement
> of Endeca's UI, then rolling your own is probably the way to go. Then the
> question is whether you should open source that UI.
>
> You can also consider extending the Solr Admin UI. It does not do most of
> your listed features, but having better integration with the Solr Admin UI
> is a good idea.
>
> -- Jack Krupansky
>
> -Original Message- From: Gareth Poulton
> Sent: Thursday, September 19, 2013 7:50 AM
> To: solr-user@lucene.apache.org
> Subject: Migrating from Endeca
>
>
> Hi,
> A customer wants us to move their entire enterprise platform - of which one
> of the many components is Oracle Endeca - to open source.
> However, customers being the way they are, they don't want to have to give
> up any of the features they currently use, the most prominent of which are
> user friendly web-based editors for non-technical people to be able to edit
> things like:
> - Schema
> - Dimensions (i.e. facets)
> - Dimension groups (not sure what these are)
> - Thesaurus
> - Stopwords
> - Report generation
> - Boosting individual records (i.e. sponsored links)
> - Relevance ranking settings
> - Process pipeline editor for, e.g. adding new languages
> -...all without touching any xml.
>
> My question is, are there any solr features, plugins, modules, third party
> applications, or the like that will do this for us? Or will we have to
> develop all the above from scratch?
>
> thanks,
> Gareth
>