Re: Solr is NoSQL database or not?

2014-03-03 Thread Charlie Hull

On 01/03/2014 23:53, Jack Krupansky wrote:

NoSQL? To me it's just a marketing term, like Big Data.


+1

Depends very much who you talk to. Marketing folks like to ride the 
current wave, so if NoSQL is current, they'll jump on that one, likewise 
Big Data. Technical types like to be correct in their definitions :)


C


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Solr Shard Query From Inside Search Component Sometimes Gives Wrong Results

2014-03-03 Thread Vishnu Mishra
Hi,
  I am using Solr 4.6 and  doing Solr query on shard from inside
Solr search component and try to use the obtained results for my custom
logic. I have used a Solrj for doing distributed search, but the result
coming from this distributed search vary some time.  So the my questions
are,

1.  Can we do distributed search from Solr Search component. 
2.  Do we need to handle concurrency between Solr server by using
synchronization or other technique. 

Is there a way to make a distributed search in the Solr Search Component and
get the matched documents from all the shards? If anyone have Idea then help
me.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shard-Query-From-Inside-Search-Component-Sometimes-Gives-Wrong-Results-tp4120840.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR cloud disaster recovery

2014-03-03 Thread Jan Van Besien
On Fri, Feb 28, 2014 at 7:50 PM, Per Steffensen  wrote:
> I might be able to find something for you. Which version are you using - I
> have some scripts that work on 4.0 and some other scripts that work for 4.4
> (and maybe later).

This sounds useful. I am using 4.6.1.

Kind regards
Jan


Re: Slow query time on stemmed fields

2014-03-03 Thread Jens Meiners
Sorry for the delay,

I did not have access to the server and could not query anything.

This is my Query:
http://server:port
/solr/core/select?q=keyword1+keyword2&wt=xml&indent=true&hl.fragsize=120&f.file_URI_tokenized.hl.fragsize=1000&spellcheck=true&f.file_content.hl.alternateField=spell&hl.simple.pre=%3Cb%3E&hl.fl=file_URI_tokenized,xmp_title,file_content&hl=true&rows=10&fl=file_URI,file_URI_tokenized,file_name,file_lastModification,file_lastModification_raw,xmp_creation_date,xmp_title,xmp_content_type,score,file_URI,host,xmp_manual_summary&hl.snippets=1&hl.useFastVectorHighlighter=true&hl.maxAlternateFieldLength=120&start=0&q=itdz+berlin&hl.simple.post=%3C/b%3E&fq=file_readright:%22wiki-access%22&debugQuery=true&defType=edismax&qf=file_URI_tokenized^10.0+file_content^10.0+xmp_title^5.0+spell^0.001&pf=file_URI_tokenized~2^1.0+file_content~100^2.0+xmp_title~2^1.0

newly extended testing showed that the normal QTime without a search on the
spell field is expected to be about 713 while it turns out to be at 70503
with the stemming parameter included like in the url above. Therefor its
just 100x slower at the moment.

Here comes the debug:


keyword1 keyword2
keyword1 keyword2
(+((DisjunctionMaxQuery((file_URI_tokenized:keyword1^10.0
| xmp_title:keyword1^5.0 | spell:keyword1^0.0010 |
file_content:keyword1^10.0))
DisjunctionMaxQuery((file_URI_tokenized:keyword2^10.0 |
xmp_title:keyword2^5.0 | spell:keyword2^0.0010 |
file_content:keyword2^10.0)))~2)
DisjunctionMaxQuery((file_URI_tokenized:"keyword1 keyword2"~2))
DisjunctionMaxQuery((file_content:"keyword1 keyword2"~100^2.0))
DisjunctionMaxQuery((xmp_title:"keyword1 keyword2"~2)))/no_coord
+(((file_URI_tokenized:keyword1^10.0 |
xmp_title:keyword1^5.0 | spell:keyword1^0.0010 |
file_content:keyword1^10.0) (file_URI_tokenized:keyword2^10.0 |
xmp_title:keyword2^5.0 | spell:keyword2^0.0010 |
file_content:keyword2^10.0))~2) (file_URI_tokenized:"keyword1 keyword2"~2)
(file_content:"keyword1 keyword2"~100^2.0) (xmp_title:"keyword1
keyword2"~2)











0.035045296 = (MATCH) sum of:
  0.035045296 = (MATCH) sum of:
0.0318122 = (MATCH) max of:
  8.29798E-4 = (MATCH) weight(spell:keyword1^0.0010 in 71660)
[DefaultSimilarity], result of:
8.29798E-4 = score(doc=71660,freq=2.0 = termFreq=2.0
), product of:
  6.7839865E-5 = queryWeight, product of:
0.0010 = boost
8.64913 = idf(docFreq=618, maxDocs=1299169)
0.0078435475 = queryNorm
  12.231716 = fieldWeight in 71660, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
8.64913 = idf(docFreq=618, maxDocs=1299169)
1.0 = fieldNorm(doc=71660)
  0.0318122 = (MATCH) weight(file_content:keyword1^10.0 in 71660)
[DefaultSimilarity], result of:
0.0318122 = score(doc=71660,freq=2.0 = termFreq=2.0
), product of:
  0.6720717 = queryWeight, product of:
10.0 = boost
8.568466 = idf(docFreq=670, maxDocs=1299169)
0.0078435475 = queryNorm
  0.047334533 = fieldWeight in 71660, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
8.568466 = idf(docFreq=670, maxDocs=1299169)
0.00390625 = fieldNorm(doc=71660)
0.003233097 = (MATCH) max of:
  0.003233097 = (MATCH) weight(file_content:keyword2^10.0 in 71660)
[DefaultSimilarity], result of:
0.003233097 = score(doc=71660,freq=1.0 = termFreq=1.0
), product of:
  0.25479192 = queryWeight, product of:
10.0 = boost
3.2484267 = idf(docFreq=137146, maxDocs=1299169)
0.0078435475 = queryNorm
  0.012689167 = fieldWeight in 71660, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.2484267 = idf(docFreq=137146, maxDocs=1299169)
0.00390625 = fieldNorm(doc=71660)


ExtendedDismaxQParser





file_readright:"wiki-access"
file_readright:wiki-access

66359.0


66357.0

80.0

0.0

0.0

65981.0

0.0

38.0

258.0



Why does the Highlighting take up this mutch time? is it a problem with my
parameter overload or does highlighting on the spell field actually take
place ?

I Noticed a 13MB file poping up only if the search results are extended via
the spell field. but highlighting this doc on a query that brings only this
doc up does not take any amount of time like this.

Thanks for your comments and time.

Best,
Jens


2014-02-24 17:32 GMT+01:00 Jack Krupansky :

> Maybe some heap/GC issue from using more of this 20 GB index. Maybe it was
> running at the edge and just one more field was too much for the heap.
>
> The "timing" section of the debug query response should shed a little
> light.
>
> -- Jack Krupansky
>
> -Original Message- From: Erick Erickson
> Sent: Monday, February 24, 2014 11:25 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Slow query time on stemmed fields
>
>
> This is really strange. You shoul

Re: Solr Shard Query From Inside Search Component Sometimes Gives Wrong Results

2014-03-03 Thread Shalin Shekhar Mangar
What was the query you are making? What is the sort order for the
query? Are you sure you are not indexing data in between making these
requests? Are you able to reproduce this outside of your search
component?

It is hard to answer questions about custom code without actually
looking at the code.

On Mon, Mar 3, 2014 at 3:37 PM, Vishnu Mishra  wrote:
> Hi,
>   I am using Solr 4.6 and  doing Solr query on shard from inside
> Solr search component and try to use the obtained results for my custom
> logic. I have used a Solrj for doing distributed search, but the result
> coming from this distributed search vary some time.  So the my questions
> are,
>
> 1.  Can we do distributed search from Solr Search component.
> 2.  Do we need to handle concurrency between Solr server by using
> synchronization or other technique.
>
> Is there a way to make a distributed search in the Solr Search Component and
> get the matched documents from all the shards? If anyone have Idea then help
> me.
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Shard-Query-From-Inside-Search-Component-Sometimes-Gives-Wrong-Results-tp4120840.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Michael Sokolov

On 3/3/2014 1:54 AM, KNitin wrote:

3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
As others have pointed out, this is really unusual for Solr.  We often 
see high permgen in our app servers due to dynamic class loading that 
the framework performs; maybe you are somehow loading lots of new Solr 
plugins, or otherwise creating lots of classes?  Of course if you have a 
plugin or something that does a lot of string interning, that could also 
be an explanation.


-Mike


Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
If I understand the docs right, it is only possible to sort facets by
count or value in ascending order. Both variants are not very helpful
for year facets if I want the most recent years at the top (or appear at
all if I restrict the number of facet entries).

It looks like a requirement that was articulated repeatedly and the
recommended solution seems to be to do some math like 1 - year and
index that. So far so good. Only problem is that I have many data
sources and I would like to avoid to change every connector to include
the new field. I think a better solution would be to have a custom
TokenFilterFactory that does it.

Since it seems a common request, did someone already build such a
TokenFilterFactory? If not, do you think I could build one myself? I do
some (script-)programming but have no experience with Java, so I think I
could adapt an example. Are there any guides out there?

Or even better, is there a built-in solution I haven't heard of?

-Michael


Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Greg Walters
Josh,

You've mentioned a couple of times that you've got PermGen set to 512M but then 
you say you're running with -XX:MaxPermSize=64M. These two statements are 
contradictory so are you *sure* that you're running with 512M of PermGen? 
Assuming your on a *nix box can you provide `ps` output proving this?

Thanks,
Greg

On Feb 28, 2014, at 5:22 PM, Furkan KAMACI  wrote:

> Hi;
> 
> You can also check here:
> http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled
> 
> Thanks;
> Furkan KAMACI
> 
> 
> 2014-02-26 22:35 GMT+02:00 Josh :
> 
>> Thanks Timothy,
>> 
>> I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause the
>> error to happen more quickly. With this option on it didn't seemed to do
>> any intermittent garbage collecting that delayed the issue in with it off.
>> I was already using a max of 512MB, and I can reproduce it with it set this
>> high or even higher. Right now because of how we have this implemented just
>> increasing it to something high just delays the problem :/
>> 
>> Anything else you could suggest I would really appreciate.
>> 
>> 
>> On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter >> wrote:
>> 
>>> Hi Josh,
>>> 
>>> Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM
>>> versions, permgen collection was disabled by default.
>>> 
>>> Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may
>>> be too small.
>>> 
>>> 
>>> Timothy Potter
>>> Sr. Software Engineer, LucidWorks
>>> www.lucidworks.com
>>> 
>>> 
>>> From: Josh 
>>> Sent: Wednesday, February 26, 2014 12:27 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Solr Permgen Exceptions when creating/removing cores
>>> 
>>> We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows
>>> installation with 64bit Java 1.7U51 and we are seeing consistent issues
>>> with PermGen exceptions. We have the permgen configured to be 512MB.
>>> Bitnami ships with a 32bit version of Java for windows and we are
>> replacing
>>> it with a 64bit version.
>>> 
>>> Passed in Java Options:
>>> 
>>> -XX:MaxPermSize=64M
>>> -Xms3072M
>>> -Xmx6144M
>>> -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC
>>> -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:+CMSClassUnloadingEnabled
>>> -XX:NewRatio=3
>>> 
>>> -XX:MaxTenuringThreshold=8
>>> 
>>> This is our use case:
>>> 
>>> We have what we call a database core which remains fairly static and
>>> contains the imported contents of a table from SQL server. We then have
>>> user cores which contain the record ids of results from a text search
>>> outside of Solr. We then query for the data we want from the database
>> core
>>> and limit the results to the content of the user core. This allows us to
>>> combine facet data from Solr with the search results from another engine.
>>> We are creating the user cores on demand and removing them when the user
>>> logs out.
>>> 
>>> Our issue is the constant creation and removal of user cores combined
>> with
>>> the constant importing seems to push us over our PermGen limit. The user
>>> cores are removed at the end of every session and as a test I made an
>>> application that would loop creating the user core, import a set of data
>> to
>>> it, query the database core using it as a limiter and then remove the
>> user
>>> core. My expectation was in this scenario that all the permgen associated
>>> with that user cores would be freed upon it's unload and allow permgen to
>>> reclaim that memory during a garbage collection. This was not the case,
>> it
>>> would constantly go up until the application would exhaust the memory.
>>> 
>>> I also investigated whether the there was a connection between the two
>>> cores left behind because I was joining them together in a query but even
>>> unloading the database core after unloading all the user cores won't
>>> prevent the limit from being hit or any memory to be garbage collected
>> from
>>> Solr.
>>> 
>>> Is this a known issue with creating and unloading a large number of
>> cores?
>>> Could it be configuration based for the core? Is there something other
>> than
>>> unloading that needs to happen to free the references?
>>> 
>>> Thanks
>>> 
>>> Notes: I've tried using tools to determine if it's a leak within Solr
>> such
>>> as Plumbr and my activities turned up nothing.
>>> 
>> 



Re: Solr 4.5.0 replication numDocs larger in slave

2014-03-03 Thread Greg Walters
I just ran into an issue similar to this that effected document scores on 
distributed searches. You might try doing an optimize and purging your deleted 
documents while no indexing is being done then checking your counts. Once I 
optimized all my indexes the document counts on all of my cores matched up and 
scoring was consistent.

Thanks,
Greg

On Feb 28, 2014, at 8:22 PM, Erick Erickson  wrote:

> That really shouldn't be happening IF indexing is shut off. Otherwise
> the slave is taking a snapshot of the master index and synching.
> 
> bq: The slave has about 33 more documents and one fewer
> segements (according to Overview in solr admin
> 
> Sounds like the master is still indexing and you've deleted documents
> on the master.
> 
> Best,
> Erick
> 
> 
> On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank 
> wrote:
> 
>> Hi,
>> 
>> I'm using Solr 4.5.0, I have a single master replicating to a single
>> slave.  Only the master is being indexed to - never the slave.  The master
>> is committed once each night.  After the first commit and replication the
>> numDoc counts are identical.  After the next nightly commit and after the
>> second replication a few minutes later, the numDocs has increased in both
>> the master and the slave as expected, but numDocs is not the same in the
>> master as it is in the slave.  The slave has about 33 more documents and
>> one fewer segements (according to Overview in solr admin).
>> 
>> I suspect the numDocs may be in sync again after tonight, but can anyone
>> explain what is going on here?   Is it possible a few deletions got
>> committed to the master but not replicated to the slave?
>> 
>> Thanks
>> 
>> Frank
>> 
>> 
>> 



Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Josh
It's a windows installation using a bitnami solr installer. I incorrectly
put 64M into the configuration for this, as I had copied the test
configuration I was using to recreate the permgen issue we were seeing on
our production system (that is configured to 512M) as it takes awhile with
to recreate the issue with larger permgen values. In the test scenario
there was a small 180 document data core that's static with 8 dynamic user
cores that are used to index the unique document ids in the users view,
which is then merged into a single user core. The final user core contains
the same number of document ids as the data core and the data core is
queried against with the ids in the final merged user core as the limiter.
The user cores are then unloaded, and deleted from the drive and then the
test is reran again with the user cores re-created

We are also using the core discovery mode to store/find our cores and the
database data core is using dynamic fields with a mix of single value and
multi value fields. The user cores use a static configuration. The data is
indexed from SQL Server using jtDS for both the user and data cores. As a
note we also reversed the test case I mention above where we keep the user
cores static and dynamically create the database core and this created the
same issue only it leaked faster. We assumed this because the configuration
was larger/loaded more classes then the simpler user core.

When I get the time I'm going to put together a SolrJ test app to recreate
the issue outside of our environment to see if others see the same issue
we're seeing to rule out any kind of configuration problem. Right now we're
interacting with solr with POCO via the restful interface and it's not very
easy for us to spin this off into something someone else could use. In the
mean time we've made changes to make the user cores more static, this has
slowed down the build up of permgen to something that can be managed by a
weekly reset.

Sorry about the confusion in my initial email and I appreciate the
response. Anything about my configuration that you can think might be
useful just let me know and I can provide it. We have a work around, but it
really hampers what our long term goals were for our Solr implementation.

Thanks
Josh


On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters wrote:

> Josh,
>
> You've mentioned a couple of times that you've got PermGen set to 512M but
> then you say you're running with -XX:MaxPermSize=64M. These two statements
> are contradictory so are you *sure* that you're running with 512M of
> PermGen? Assuming your on a *nix box can you provide `ps` output proving
> this?
>
> Thanks,
> Greg
>
> On Feb 28, 2014, at 5:22 PM, Furkan KAMACI  wrote:
>
> > Hi;
> >
> > You can also check here:
> >
> http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled
> >
> > Thanks;
> > Furkan KAMACI
> >
> >
> > 2014-02-26 22:35 GMT+02:00 Josh :
> >
> >> Thanks Timothy,
> >>
> >> I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause
> the
> >> error to happen more quickly. With this option on it didn't seemed to do
> >> any intermittent garbage collecting that delayed the issue in with it
> off.
> >> I was already using a max of 512MB, and I can reproduce it with it set
> this
> >> high or even higher. Right now because of how we have this implemented
> just
> >> increasing it to something high just delays the problem :/
> >>
> >> Anything else you could suggest I would really appreciate.
> >>
> >>
> >> On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter  >>> wrote:
> >>
> >>> Hi Josh,
> >>>
> >>> Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM
> >>> versions, permgen collection was disabled by default.
> >>>
> >>> Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M
> may
> >>> be too small.
> >>>
> >>>
> >>> Timothy Potter
> >>> Sr. Software Engineer, LucidWorks
> >>> www.lucidworks.com
> >>>
> >>> 
> >>> From: Josh 
> >>> Sent: Wednesday, February 26, 2014 12:27 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Solr Permgen Exceptions when creating/removing cores
> >>>
> >>> We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows
> >>> installation with 64bit Java 1.7U51 and we are seeing consistent issues
> >>> with PermGen exceptions. We have the permgen configured to be 512MB.
> >>> Bitnami ships with a 32bit version of Java for windows and we are
> >> replacing
> >>> it with a 64bit version.
> >>>
> >>> Passed in Java Options:
> >>>
> >>> -XX:MaxPermSize=64M
> >>> -Xms3072M
> >>> -Xmx6144M
> >>> -XX:+UseParNewGC
> >>> -XX:+UseConcMarkSweepGC
> >>> -XX:CMSInitiatingOccupancyFraction=75
> >>> -XX:+CMSClassUnloadingEnabled
> >>> -XX:NewRatio=3
> >>>
> >>> -XX:MaxTenuringThreshold=8
> >>>
> >>> This is our use case:
> >>>
> >>> We have what we call a database core which remains fairly static and
> >>> contains the imported contents of a table from SQL serv

Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi,

Currently there are two storing criteria available. However sort by index - to 
return the constraints sorted in their index order (lexicographic by indexed 
term) - should return most recent year at top, no?

Ahmet



On Monday, March 3, 2014 4:36 PM, Michael Lackhoff  wrote:
If I understand the docs right, it is only possible to sort facets by
count or value in ascending order. Both variants are not very helpful
for year facets if I want the most recent years at the top (or appear at
all if I restrict the number of facet entries).

It looks like a requirement that was articulated repeatedly and the
recommended solution seems to be to do some math like 1 - year and
index that. So far so good. Only problem is that I have many data
sources and I would like to avoid to change every connector to include
the new field. I think a better solution would be to have a custom
TokenFilterFactory that does it.

Since it seems a common request, did someone already build such a
TokenFilterFactory? If not, do you think I could build one myself? I do
some (script-)programming but have no experience with Java, so I think I
could adapt an example. Are there any guides out there?

Or even better, is there a built-in solution I haven't heard of?

-Michael



Multiple partial match

2014-03-03 Thread Zwer
Hi Guys,

Faced with a problem: make query to SOLR *name:co*^5*

It returns me two docs with equal score: {id: 1, name: 'Coca-Cola Company'},
{id: 2, name: Microsoft Corporation}.


How can I boost Coca-Cola Company because it contains more partial matches ?


P.S. All normalization used by TF-IDF engine disabled.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
On 03.03.2014 16:33 Ahmet Arslan wrote:

> Currently there are two storing criteria available. However sort by index - 
> to return the constraints sorted in their index order (lexicographic by 
> indexed term) - should return most recent year at top, no?

No, it returns them -- as you say -- in lexicographic order and that
means oldest first, like:
1815
1820
...
2012
2013
(might well stop before we get here)
2014

-Michael


Re: Solr is NoSQL database or not?

2014-03-03 Thread Furkan KAMACI
Hi;

I said that:

"What are the main differences between ElasticSearch
and Solr that makes ElasticSearc a NoSQL store but not Solr."

because it is just a marketing term as Jack indicated after me. Also I said:

"The first link you provided includes ElasticSearch:
http://en.wikipedia.org/wiki/NoSQL
 as a Document Store"

I mean if you can add Solr to the wikipedia page but it is not a reference.
Because these are all "marketin terms" as like Big Data. You should
remember the definition of Big Data: "Data that is much more than you can
process with traditional methods" so it is not an exactly defined
definition. One can say Big Data for something but one can not. It is
similar to NoSQL.

Thanks;
Furkan KAMACI


2014-03-03 11:28 GMT+02:00 Charlie Hull :

> On 01/03/2014 23:53, Jack Krupansky wrote:
>
>> NoSQL? To me it's just a marketing term, like Big Data.
>>
>>  +1
>
> Depends very much who you talk to. Marketing folks like to ride the
> current wave, so if NoSQL is current, they'll jump on that one, likewise
> Big Data. Technical types like to be correct in their definitions :)
>
> C
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


RE: Solr 4.5.0 replication numDocs larger in slave

2014-03-03 Thread Geary, Frank
Thanks Erick.  Indexing is not happening to the slave since it has never been 
set up there - there aren't even any commits happening on the slave (which we 
normally do via cron job).  But Indexing is definitely happening to the master 
at the time replication happens.  

" Sounds like the master is still indexing and you've deleted documents on the 
master.":

Yes, that's exactly what I suspect is happening.  But if that's true, I'd like 
to understand how those deletes could find there way into being replicated to 
the slave when the only commit happening on the master was presumably completed 
before the replication.  Do deletes get committed in some special way outside 
of an explicit commit?  Or do they get copied over to the slave as part of the 
replication and therefore effectively get committed to the slave before they 
are committed to the master?

My replication is configured to replicate after commit and after startup.  The 
slave polls the master every 10 minutes.  The master commits only once a day.  
Presumably the only time the number of documents changes is at the end of the 
commit.  Then once the commit is done I'd expect replication to begin.  So in 
order to end up with a different numDocs in the slave there would need to be 
some sort of commit happening during the replication, right?

Frank

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, February 28, 2014 9:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.5.0 replication numDocs larger in slave

That really shouldn't be happening IF indexing is shut off. Otherwise the slave 
is taking a snapshot of the master index and synching.

bq: The slave has about 33 more documents and one fewer segements (according to 
Overview in solr admin

Sounds like the master is still indexing and you've deleted documents on the 
master.

Best,
Erick


On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank wrote:

> Hi,
>
> I'm using Solr 4.5.0, I have a single master replicating to a single 
> slave.  Only the master is being indexed to - never the slave.  The 
> master is committed once each night.  After the first commit and 
> replication the numDoc counts are identical.  After the next nightly 
> commit and after the second replication a few minutes later, the 
> numDocs has increased in both the master and the slave as expected, 
> but numDocs is not the same in the master as it is in the slave.  The 
> slave has about 33 more documents and one fewer segements (according to 
> Overview in solr admin).
>
> I suspect the numDocs may be in sync again after tonight, but can anyone
> explain what is going on here?   Is it possible a few deletions got
> committed to the master but not replicated to the slave?
>
> Thanks
>
> Frank
>
>
>




Re: SolrCloud: heartbeat succeeding while node has failing SSD?

2014-03-03 Thread Gregg Donovan
Thanks, Mark!

The supervised process sounds very promising but complicated to get right.
E.g. where does the supervisor run, where do nodes report their status to,
are the checks active or passive, etc.

Having each node perform a regular background self-check and remove itself
from the cluster if that healthcheck doesn't pass seems like a great first
step, though. The most common failure we've seen has been disk failure and
a self-check should usually detect that. (JIRA:
https://issues.apache.org/jira/browse/SOLR-5805)

It would also be nice, as a cluster operator, to have an easy way to remove
a failing node from the cluster. Ideally, right from the Solr UI, but even
from a command-line script would be great. In the cases of disk failure, we
can often not SSH into a node to shut down the VM that's still connected to
ZooKeeper. We have to physically power it down. Having something quicker
would be great. (JIRA: https://issues.apache.org/jira/browse/SOLR-5806)




On Sun, Mar 2, 2014 at 9:36 PM, Mark Miller  wrote:

> The heartbeat that keeps the node alive is the connection it maintains
> with ZooKeeper.
>
> We don't currently have anything built in that will actively make sure
> each node can serve queries and remove it from clusterstatem.json if it
> cannot. If a replica is maintaining it's connection with ZooKeeper and in
> most cases, if it is accepting updates, it will appear up. Load balancing
> should handle the failures, but I guess it depends on how sticky the
> request fails are.
>
> In the past, I've seen this handled on a different search engine by having
> a variety of external agent scripts that would occasionally attempt to do a
> query, and if things did not go right, it killed the process to cause it to
> try and startup again (supervised process).
>
> I'm not sure what the right long term feature for Solr is here, but feel
> free to start a JIRA issue around it.
>
> One simple improvement might even be a background thread that periodically
> checks some local readings and depending on the results, pulls itself out
> of the mix as best it can (remove itself from clusterstate.json or simply
> closes it's zk conneciton).
>
> - Mark
>
> http://about.me/markrmiller
>
> On Mar 2, 2014, at 3:42 PM, Gregg Donovan  wrote:
>
> > We had a brief SolrCloud outage this weekend when a node's SSD began to
> > fail but the node still appeared to be up to the rest of the SolrCloud
> > cluster (i.e. still green in clusterstate.json). Distributed queries that
> > reached this node would fail but whatever heartbeat keeps the node in the
> > clustrstate.json must have continued to succeed.
> >
> > We eventually had to power the node down to get it to be removed from
> > clusterstate.json.
> >
> > This is our first foray into SolrCloud, so I'm still somewhat fuzzy on
> what
> > the default heartbeat mechanism is and how we may augment it to be sure
> > that the disk is checked as part of the heartbeat and/or we verify that
> it
> > can serve queries.
> >
> > Any pointers would be appreciated.
> >
> > Thanks!
> >
> > --Gregg
>
>


Configuration problem

2014-03-03 Thread Thomas Fischer
Hello,

for some reason I have problems to get my local solr system to run (MacBook, 
tomcat 6.0.35).

The setting is
solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the 
new "discovery type" (no cores), and inside the core directories are empty 
files core.properties and symbolic links to the universal conf directory.
 
solr webapps (I use very different webapps simultaneously):
/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file 
solr4.6.1.xml into the cataiina/localhost folder with the contents





The Tomcat Manager shows solr4.6.1 as started, but following the given link 
gives an error with the message:
"SolrCore 'collection1' is not available due to init failure: Could not load 
config file /srv/solr4.6.1/collection1/solrconfig.xml"
which is plausible, since
1. there is no folder /srv/solr4.6.1/collection1 and
2.for the actual cores solrconfig.xml is inside of 
/srv/solr4.6.1/cores/geo/conf/

But why does Tomcat try to find a solrconfig.xml there?
The problem persists if I start tomcat with 
-Dsolr.solr.home=/srv/solr/solr4.6.1, it seems that the system just ignores the 
solr home setting.

Can somebody give me a hint what I'm doing wrong?

Best regards
Thomas

P.S.: Is there a way to stop Tomcat from throwing these errors into my face 
threefold: once as heading (!), once as message and once as description?




Re: Solr is NoSQL database or not?

2014-03-03 Thread Jack Krupansky
For the record, I am +1 for somebody to add Solr to the NoSQL wikipedia 
page, in much the same way that Elasticsearch is already there.


From a LucidWorks webinar blurb: "The long awaited Solr 4 release brings a 
large amount of new functionality that blurs the line between search engines 
and NoSQL databases. Now you can have your cake and search it too with 
Atomic updates, Versioning and Optimistic Concurrency, Durability, and 
Real-time Get! Learn about new Solr NoSQL features and implementation 
details of how the distributed indexing of Solr Cloud was designed from the 
ground up to accommodate them."


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Monday, March 3, 2014 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr is NoSQL database or not?

Hi;

I said that:

"What are the main differences between ElasticSearch
and Solr that makes ElasticSearc a NoSQL store but not Solr."

because it is just a marketing term as Jack indicated after me. Also I said:

"The first link you provided includes ElasticSearch:
http://en.wikipedia.org/wiki/NoSQL
as a Document Store"

I mean if you can add Solr to the wikipedia page but it is not a reference.
Because these are all "marketin terms" as like Big Data. You should
remember the definition of Big Data: "Data that is much more than you can
process with traditional methods" so it is not an exactly defined
definition. One can say Big Data for something but one can not. It is
similar to NoSQL.

Thanks;
Furkan KAMACI


2014-03-03 11:28 GMT+02:00 Charlie Hull :


On 01/03/2014 23:53, Jack Krupansky wrote:


NoSQL? To me it's just a marketing term, like Big Data.

 +1


Depends very much who you talk to. Marketing folks like to ride the
current wave, so if NoSQL is current, they'll jump on that one, likewise
Big Data. Technical types like to be correct in their definitions :)

C


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk





Re: Fetching uniqueKey and other int quickly from documentCache?

2014-03-03 Thread Gregg Donovan
Yonik,

That's a very clever idea. Unfortunately, I think that will skip the
distributed query optimization we were hoping to take advantage of in
SOLR-1880 [1], but it should work with the proposed distrib.singlePass
optimization in SOLR-5768 [2]. Does that sound right?

--Gregg

[1] https://issues.apache.org/jira/browse/SOLR-1880
[2] https://issues.apache.org/jira/browse/SOLR-5768


On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley  wrote:

> You could try forcing things to go through function queries (via
> pseudo-fields):
>
> fl=field(id), field(myfield)
>
> If you're not requesting any stored fields, that *might* currently
> skip that step.
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr
>
>
> On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan  wrote:
> > We fetch a large number of documents -- 1000+ -- for each search. Each
> > request fetches only the uniqueKey or the uniqueKey plus one secondary
> > integer key. Despite this, we find that we spent a sizable amount of time
> > in SolrIndexSearcher#doc(int docId, Set fields). Time is spent
> > fetching the two stored fields, LZ4 decoding, etc.
> >
> > I would love to be able to tell Solr to always fetch these two fields
> from
> > memory. We have them both in the fieldCache so we're already spending the
> > RAM. I've seen this asked previously [1], so it seems like a fairly
> common
> > need, especially for distributed search. Any ideas?
> >
> > A few possible ideas I had:
> >
> > --Check FieldCache.html#getCacheEntries() before going to stored fields.
> > --Give the documentCache config a list of fields it should load from the
> > fieldCache
> >
> >
> > Having an in-memory mapping from docId->uniqueKey has come up for us
> > before. We've used a custom SolrCache maintaining that mapping to quickly
> > filter over personalized collections. Maybe the uniqueKey should be more
> > optimized out of the box? Perhaps a custom "uniqueKey" codec that also
> > maintained the docId->uniqueKey mapping in memory?
> >
> > --Gregg
> >
> > [1] http://search-lucene.com/m/oCUKJ1heHUU1
>


Solr Filter Cache Size

2014-03-03 Thread Benjamin Wiens
How can we calculate how much heap memory the filter cache will consume? We
understand that in order to determine a good size we also need to evaluate
how many filterqueries would be used over a certain time period.



Here's our setting:







According to the post below, 53 GB of RAM would be needed just by the
filter cache alone with 1.4 million Docs. Not sure if this true and how
this would work.



Reference:
http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem



We filled the filterquery cache with Solr Meter and had a JVM Heap Size of
far less than 53 GB.



Can anyone chime in and enlighten us?



Thank you!


Ben Wiens & Benjamin Mosior


Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Tri Cao
Hey Josh,I am not an expert in Java performance, but I would start with  dumping a the heapand investigate with visualvm (the free tool that comes with JDK).In my experience, the most common cause for PermGen exception is the app createstoo many interned strings. Solr (actually Lucene) interns the field names so if you havetoo many fields, it might be the cause. How many fields in total across cores did youcreate before the exception?Can you reproduce the problem with the standard Solr? Is the bitnami distribution justSolr or do they have some other libraries?Hope this helps,TriOn Mar 03, 2014, at 07:28 AM, Josh  wrote:It's a windows installation using a bitnami solr installer. I incorrectly put 64M into the configuration for this, as I had copied the test configuration I was using to recreate the permgen issue we were seeing on our production system (that is configured to 512M) as it takes awhile with to recreate the issue with larger permgen values. In the test scenario there was a small 180 document data core that's static with 8 dynamic user cores that are used to index the unique document ids in the users view, which is then merged into a single user core. The final user core contains the same number of document ids as the data core and the data core is queried against with the ids in the final merged user core as the limiter. The user cores are then unloaded, and deleted from the drive and then the test is reran again with the user cores re-created  We are also using the core discovery mode to store/find our cores and the database data core is using dynamic fields with a mix of single value and multi value fields. The user cores use a static configuration. The data is indexed from SQL Server using jtDS for both the user and data cores. As a note we also reversed the test case I mention above where we keep the user cores static and dynamically create the database core and this created the same issue only it leaked faster. We assumed this because the configuration was larger/loaded more classes then the simpler user core.  When I get the time I'm going to put together a SolrJ test app to recreate the issue outside of our environment to see if others see the same issue we're seeing to rule out any kind of configuration problem. Right now we're interacting with solr with POCO via the restful interface and it's not very easy for us to spin this off into something someone else could use. In the mean time we've made changes to make the user cores more static, this has slowed down the build up of permgen to something that can be managed by a weekly reset.  Sorry about the confusion in my initial email and I appreciate the response. Anything about my configuration that you can think might be useful just let me know and I can provide it. We have a work around, but it really hampers what our long term goals were for our Solr implementation.  Thanks Josh   On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters wrote: Josh,You've mentioned a couple of times that you've got PermGen set to 512M butthen you say you're running with -XX:MaxPermSize=64M. These two statementsare contradictory so are you *sure* that you're running with 512M ofPermGen? Assuming your on a *nix box can you provide `ps` output provingthis?Thanks,GregOn Feb 28, 2014, at 5:22 PM, Furkan KAMACI  wrote:> Hi;>> You can also check here:>http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled>> Thanks;> Furkan KAMACI>>> 2014-02-26 22:35 GMT+02:00 Josh :>>> Thanks Timothy, I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to causethe>> error to happen more quickly. With this option on it didn't seemed to do>> any intermittent garbage collecting that delayed the issue in with itoff.>> I was already using a max of 512MB, and I can reproduce it with it setthis>> high or even higher. Right now because of how we have this implementedjust>> increasing it to something high just delays the problem :/ Anything else you could suggest I would really appreciate.>> On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter >> wrote:> Hi Josh,>> Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM>>> versions, permgen collection was disabled by default.>> Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64Mmay>>> be too small.> Timothy Potter>>> Sr. Software Engineer, LucidWorks>>> www.lucidworks.com>> >>> From: Josh >> To: solr-user@lucene.apache.org>>> Subject: Solr Permgen Exceptions when creating/removing cores>> We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows>>> installation with 64bit Java 1.7U51 and we are seeing consistent issues>>> with PermGen exceptions. We have the permgen configured to be 512MB.>>> Bitnami ships with a 32bit version of 

Re: Multiple partial match

2014-03-03 Thread Jack Krupansky

Add a function query boost that uses the term frequency, "tf":

bf=tf(name,'co')  -- additive boost

boost=tf(name,'co')  -- multiplicative boost

That does of course require that term frequency is not disabled for that 
field in the schema.


You can multiply the term frequency as well in the function query.

boost=product(tf(name,'co'),10)

-- Jack Krupansky

-Original Message- 
From: Zwer

Sent: Monday, March 3, 2014 10:34 AM
To: solr-user@lucene.apache.org
Subject: Multiple partial match

Hi Guys,

Faced with a problem: make query to SOLR *name:co*^5*

It returns me two docs with equal score: {id: 1, name: 'Coca-Cola Company'},
{id: 2, name: Microsoft Corporation}.


How can I boost Coca-Cola Company because it contains more partial matches ?


P.S. All normalization used by TF-IDF engine disabled.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Fetching uniqueKey and other int quickly from documentCache?

2014-03-03 Thread Yonik Seeley
On Mon, Mar 3, 2014 at 11:14 AM, Gregg Donovan  wrote:
> Yonik,
>
> That's a very clever idea. Unfortunately, I think that will skip the
> distributed query optimization we were hoping to take advantage of in
> SOLR-1880 [1], but it should work with the proposed distrib.singlePass
> optimization in SOLR-5768 [2]. Does that sound right?


Yep, the two together should do the trick.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr


> --Gregg
>
> [1] https://issues.apache.org/jira/browse/SOLR-1880
> [2] https://issues.apache.org/jira/browse/SOLR-5768
>
>
> On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley  wrote:
>
>> You could try forcing things to go through function queries (via
>> pseudo-fields):
>>
>> fl=field(id), field(myfield)
>>
>> If you're not requesting any stored fields, that *might* currently
>> skip that step.
>>
>> -Yonik
>> http://heliosearch.org - native off-heap filters and fieldcache for solr
>>
>>
>> On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan  wrote:
>> > We fetch a large number of documents -- 1000+ -- for each search. Each
>> > request fetches only the uniqueKey or the uniqueKey plus one secondary
>> > integer key. Despite this, we find that we spent a sizable amount of time
>> > in SolrIndexSearcher#doc(int docId, Set fields). Time is spent
>> > fetching the two stored fields, LZ4 decoding, etc.
>> >
>> > I would love to be able to tell Solr to always fetch these two fields
>> from
>> > memory. We have them both in the fieldCache so we're already spending the
>> > RAM. I've seen this asked previously [1], so it seems like a fairly
>> common
>> > need, especially for distributed search. Any ideas?
>> >
>> > A few possible ideas I had:
>> >
>> > --Check FieldCache.html#getCacheEntries() before going to stored fields.
>> > --Give the documentCache config a list of fields it should load from the
>> > fieldCache
>> >
>> >
>> > Having an in-memory mapping from docId->uniqueKey has come up for us
>> > before. We've used a custom SolrCache maintaining that mapping to quickly
>> > filter over personalized collections. Maybe the uniqueKey should be more
>> > optimized out of the box? Perhaps a custom "uniqueKey" codec that also
>> > maintained the docId->uniqueKey mapping in memory?
>> >
>> > --Gregg
>> >
>> > [1] http://search-lucene.com/m/oCUKJ1heHUU1
>>


Re: Multiple partial match

2014-03-03 Thread Zwer
AFAICS tf(name, 'co') returns 0 on the {id:1, name:'Coca-Cola Company'}
because it does not support partial match. 
tf(name, 'company') will return 1



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886p4120919.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Josh
In the user core there are two fields, the database core in question was
40, but in production environments the database core is dynamic. My time
has been pretty crazy trying to get this out the door and we haven't tried
a standard solr install yet but it's on my plate for the test app and I
don't know enough about Solr/Bitnami to know if they've done any serious
modifications to it.

I had tried doing a dump from VisualVM previously but it didn't seem to
give me anything useful but then again I didn't know how to look for
interned strings. This is something I can take another look at in the
coming weeks when I do my test case against a standard solr install with
SolrJ. The exception with user cores happens after 80'ish runs, so 640'ish
user cores with the PermGen set to 64MB. The database core test was far
lower, it was in the 10-15 range. As a note once the permgen limit is hit,
if we simply restart the service with the same number of cores loaded the
permgen usage is minimal even with the amount of user cores being high in
our production environment (500-600).

If this does end up being the interning of strings, is there anyway it can
be mitigated? Our production environment for our heavier users would see in
the range of 3200+ user cores created a day.

Thanks for the help.
Josh


On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao  wrote:

> Hey Josh,
>
> I am not an expert in Java performance, but I would start with  dumping a
> the heap
> and investigate with visualvm (the free tool that comes with JDK).
>
> In my experience, the most common cause for PermGen exception is the app
> creates
> too many interned strings. Solr (actually Lucene) interns the field names
> so if you have
> too many fields, it might be the cause. How many fields in total across
> cores did you
> create before the exception?
>
> Can you reproduce the problem with the standard Solr? Is the bitnami
> distribution just
> Solr or do they have some other libraries?
>
> Hope this helps,
> Tri
>
> On Mar 03, 2014, at 07:28 AM, Josh  wrote:
>
> It's a windows installation using a bitnami solr installer. I incorrectly
> put 64M into the configuration for this, as I had copied the test
> configuration I was using to recreate the permgen issue we were seeing on
> our production system (that is configured to 512M) as it takes awhile with
> to recreate the issue with larger permgen values. In the test scenario
> there was a small 180 document data core that's static with 8 dynamic user
> cores that are used to index the unique document ids in the users view,
> which is then merged into a single user core. The final user core contains
> the same number of document ids as the data core and the data core is
> queried against with the ids in the final merged user core as the limiter.
> The user cores are then unloaded, and deleted from the drive and then the
> test is reran again with the user cores re-created
>
> We are also using the core discovery mode to store/find our cores and the
> database data core is using dynamic fields with a mix of single value and
> multi value fields. The user cores use a static configuration. The data is
> indexed from SQL Server using jtDS for both the user and data cores. As a
> note we also reversed the test case I mention above where we keep the user
> cores static and dynamically create the database core and this created the
> same issue only it leaked faster. We assumed this because the configuration
> was larger/loaded more classes then the simpler user core.
>
> When I get the time I'm going to put together a SolrJ test app to recreate
> the issue outside of our environment to see if others see the same issue
> we're seeing to rule out any kind of configuration problem. Right now we're
> interacting with solr with POCO via the restful interface and it's not very
> easy for us to spin this off into something someone else could use. In the
> mean time we've made changes to make the user cores more static, this has
> slowed down the build up of permgen to something that can be managed by a
> weekly reset.
>
> Sorry about the confusion in my initial email and I appreciate the
> response. Anything about my configuration that you can think might be
> useful just let me know and I can provide it. We have a work around, but it
> really hampers what our long term goals were for our Solr implementation.
>
> Thanks
> Josh
>
>
> On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters  >wrote:
>
> Josh,
>
> You've mentioned a couple of times that you've got PermGen set to 512M but
>
> then you say you're running with -XX:MaxPermSize=64M. These two statements
>
> are contradictory so are you *sure* that you're running with 512M of
>
> PermGen? Assuming your on a *nix box can you provide `ps` output proving
>
> this?
>
> Thanks,
>
> Greg
>
> On Feb 28, 2014, at 5:22 PM, Furkan KAMACI  wrote:
>
> > Hi;
>
> >
>
> > You can also check here:
>
> >
>
>
> http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloading

RE: Solr 4.5.0 replication numDocs larger in slave

2014-03-03 Thread Geary, Frank
Thanks Greg.  We optimize the master once a week (early in the day Sunday) and 
we do not do a commit Sunday evening (the only evening of the week when we do 
not commit).  So now after optimization/replication the master/slave pair that 
were out on sync on Friday now have the same numDocs (and every other value on 
the Overview page agrees except "size" under Replication where it shows the 
slave is smaller).  Unfortunately, a different master/slave pair now have 
different numDocs after the optimize and replication done yesterday.  

For the newly out of sync master/slave pair, the Version (Under Statistics on 
the Overview page) is 4 revisions earlier on the slave than on the master and 
there are two fewer segments on the slave than there are on the master.   Under 
Replication on the Overview page, the Versions and Gen's are all the same, but 
the size of the slave is smaller than the master.  The slave has 51 fewer 
documents than the master.   But indexing is continuing on the master (but no 
commit has happened since the optimization early Sunday.)

I wonder if this is related to the NRT functionality in some way.  I see "Impl: 
org.apache.solr.core.NRTCachingDirectoryFactory" on the Overview page.  I've 
been trying to rely on default behavior whenever possible.  But perhaps I need 
to turn something off? 

Frank

-Original Message-
From: Greg Walters [mailto:greg.walt...@answers.com] 
Sent: Monday, March 03, 2014 10:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.5.0 replication numDocs larger in slave

I just ran into an issue similar to this that effected document scores on 
distributed searches. You might try doing an optimize and purging your deleted 
documents while no indexing is being done then checking your counts. Once I 
optimized all my indexes the document counts on all of my cores matched up and 
scoring was consistent.

Thanks,
Greg

On Feb 28, 2014, at 8:22 PM, Erick Erickson  wrote:

> That really shouldn't be happening IF indexing is shut off. Otherwise 
> the slave is taking a snapshot of the master index and synching.
> 
> bq: The slave has about 33 more documents and one fewer segements 
> (according to Overview in solr admin
> 
> Sounds like the master is still indexing and you've deleted documents 
> on the master.
> 
> Best,
> Erick
> 
> 
> On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank 
> wrote:
> 
>> Hi,
>> 
>> I'm using Solr 4.5.0, I have a single master replicating to a single 
>> slave.  Only the master is being indexed to - never the slave.  The 
>> master is committed once each night.  After the first commit and 
>> replication the numDoc counts are identical.  After the next nightly 
>> commit and after the second replication a few minutes later, the 
>> numDocs has increased in both the master and the slave as expected, 
>> but numDocs is not the same in the master as it is in the slave.  The 
>> slave has about 33 more documents and one fewer segements (according to 
>> Overview in solr admin).
>> 
>> I suspect the numDocs may be in sync again after tonight, but can anyone
>> explain what is going on here?   Is it possible a few deletions got
>> committed to the master but not replicated to the slave?
>> 
>> Thanks
>> 
>> Frank
>> 
>> 
>> 



Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Tri Cao
If it's really the interned strings, you could try upgrade JDK, as the newer HotSpotJVM puts interned strings in regular heap:http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html(search for String.intern() in that release)I haven't got a chance to look into the new core auto discovery code, so I don't knowif it's implemented with reflection or not. Reflection and dynamic class loading is anothersource of PermGen exception, in my experience.I don't see anything wrong with your JVM config, which is very much standard.Hope this helps,TriOn Mar 03, 2014, at 08:52 AM, Josh  wrote:In the user core there are two fields, the database core in question was 40, but in production environments the database core is dynamic. My time has been pretty crazy trying to get this out the door and we haven't tried a standard solr install yet but it's on my plate for the test app and I don't know enough about Solr/Bitnami to know if they've done any serious modifications to it.  I had tried doing a dump from VisualVM previously but it didn't seem to give me anything useful but then again I didn't know how to look for interned strings. This is something I can take another look at in the coming weeks when I do my test case against a standard solr install with SolrJ. The exception with user cores happens after 80'ish runs, so 640'ish user cores with the PermGen set to 64MB. The database core test was far lower, it was in the 10-15 range. As a note once the permgen limit is hit, if we simply restart the service with the same number of cores loaded the permgen usage is minimal even with the amount of user cores being high in our production environment (500-600).  If this does end up being the interning of strings, is there anyway it can be mitigated? Our production environment for our heavier users would see in the range of 3200+ user cores created a day.  Thanks for the help. Josh   On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao  wrote: Hey Josh,I am not an expert in Java performance, but I would start with dumping athe heapand investigate with visualvm (the free tool that comes with JDK).In my experience, the most common cause for PermGen exception is the appcreatestoo many interned strings. Solr (actually Lucene) interns the field namesso if you havetoo many fields, it might be the cause. How many fields in total acrosscores did youcreate before the exception?Can you reproduce the problem with the standard Solr? Is the bitnamidistribution justSolr or do they have some other libraries?Hope this helps,TriOn Mar 03, 2014, at 07:28 AM, Josh  wrote:It's a windows installation using a bitnami solr installer. I incorrectlyput 64M into the configuration for this, as I had copied the testconfiguration I was using to recreate the permgen issue we were seeing onour production system (that is configured to 512M) as it takes awhile withto recreate the issue with larger permgen values. In the test scenariothere was a small 180 document data core that's static with 8 dynamic usercores that are used to index the unique document ids in the users view,which is then merged into a single user core. The final user core containsthe same number of document ids as the data core and the data core isqueried against with the ids in the final merged user core as the limiter.The user cores are then unloaded, and deleted from the drive and then thetest is reran again with the user cores re-createdWe are also using the core discovery mode to store/find our cores and thedatabase data core is using dynamic fields with a mix of single value andmulti value fields. The user cores use a static configuration. The data isindexed from SQL Server using jtDS for both the user and data cores. As anote we also reversed the test case I mention above where we keep the usercores static and dynamically create the database core and this created thesame issue only it leaked faster. We assumed this because the configurationwas larger/loaded more classes then the simpler user core.When I get the time I'm going to put together a SolrJ test app to recreatethe issue outside of our environment to see if others see the same issuewe're seeing to rule out any kind of configuration problem. Right now we'reinteracting with solr with POCO via the restful interface and it's not veryeasy for us to spin this off into something someone else could use. In themean time we've made changes to make the user cores more static, this hasslowed down the build up of permgen to something that can be managed by aweekly reset.Sorry about the confusion in my initial email and I appreciate theresponse. Anything about my configuration that you can think might beuseful just let me know and I can provide it. We have a work around, but itreally hampers what our long term goals were for our Solr implementation.ThanksJoshOn Mon, Mar 3, 2014 at 9:57 AM, Greg Walters wrote:Josh,You've mentioned a couple of times that you've got PermGen set to 512M 

RE: How to best handle search like Dave & David

2014-03-03 Thread Susheel Kumar
Thanks, Arun for sharing the idea on EdgeNGramFilter. In our case we are doing 
search using automated process so may EdgeNGramFilter may not work.  Wwe have 
used NGramFilterFactory in the past but will look into it again.

For cases like Dave & David and other English names does anyone has  idea which 
stemmer (currently using PorterStemFilterFactory) can work better? 

-Original Message-
From: Arun Rangarajan [mailto:arunrangara...@gmail.com] 
Sent: Sunday, March 02, 2014 1:47 PM
To: solr-user@lucene.apache.org
Subject: Re: How to best handle search like Dave & David

If you are trying to serve results as users are typing, then you can use 
EdgeNGramFilter (see 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
).

Let's say you configure your field like this, as shown in the Solr wiki:


   
  
  
   
   
  
   


Then this is what happens at index time for your tokens:

David ---> | LowerCaseTokenizerFactory | ---> david ---> | 
EdgeNGramFilterFactory
| ---> da dav davi david
Dave ---> | LowerCaseTokenizerFactory | ---> dave ---> | EdgeNGramFilterFactory
| ---> da dav dave

And at query time, when your user enters 'Dav' it will match both those tokens. 
Note that the moment your user starts typing more, say 'davi' it won't match 
'Dave' since you are doing edge N gramming only at index time and not at query 
time. You can also do edge N gramming at query time if you want 'Dave' to match 
'David', probably keeping a larger minGramSize (in this case 3) to avoid noise 
(like say 'Dave' matching 'Dana' though with a lower score), but it will be 
expensive to do n-gramming at query time.




On Fri, Feb 28, 2014 at 3:22 PM, Susheel Kumar < 
susheel.ku...@thedigitalgroup.net> wrote:

> Hi,
>
> We have name searches on Solr for millions of documents. User may 
> search like "Morrison Dave" or other may search like "Morrison David".  
> What's the best way to handle that both brings similar results. Adding 
> Synonym is the option we are using right.
>
> But we may need to add around such 50,000+ synonyms for different 
> names for each specific name there can be couple of synonyms like for 
> Richard, it can be Rich, Rick, Richie etc.
>
> Any experience adding so many synonyms or any other thoughts? Stemming 
> may help in few situations but not like Dave and David.
>
> Thanks,
> Susheel
>


RegexTransformer and xpath in DataImportHandler

2014-03-03 Thread eShard
Good afternoon,
I have this DIH:




https://redacted/";
processor="XPathEntityProcessor"
forEach="/rss/channel/item"
   
transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">


















I can't seem to populate BOTH blogtitle and short_blogtitle with the same
xpath.
I can only do one or the other; why can't I put the same xpath in 2
different fields?
I removed the short_blogtitle (with the xpath statement) and left in the
regex statement and blogtitle gets populated and short_blogtitle goes to my
update.chain (to the auto complete index) but the field itself is blank in
this index.

If I leave the dih as above, then blogtitle doesn't get populated but
short_blogtitle does.

What am I doing wrong here? Is there a way to populate both? 
And I CANNOT use copyfield here because then the update.chain won't work

Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/RegexTransformer-and-xpath-in-DataImportHandler-tp4120946.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-03-03 Thread epnRui
Hi guys,

I'm on my way to solve it properly.

This is how my field looks like now:



  









  

I still have one case where I'm facing issues because in fact I want to
preserve the #:
 - #European Parliament is translated into one token instead of two:
"#European" and "Parliament"... anyway, I have some ideas on how to do it.
Ill let you know whatss the final solution



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Elevation and core create

2014-03-03 Thread David Stuart
HI Erick,

Thanks for the response. 
On the wiki it states

config-file
Path to the file that defines query elevation. This file must exist in 
$/conf/ or$/. 

If the file exists in the /conf/ directory it will be loaded once at startup. 
If it exists in the data directory, it will be reloaded for each IndexReader.

Which is the elevate.xml. So looks like I will go down the custom coding route.

Regards,


David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60

This e-mail is strictly confidential and intended solely for the ordinary user 
of the e-mail account to which it is addressed. If you have received this 
e-mail in error please inform Axis12 immediately by return e-mail or telephone. 
We advise that in keeping with good computing practice the recipient of this 
e-mail should ensure that it is virus free. We do not accept any responsibility 
for any loss or damage that may arise from the use of this email or its 
contents.



On 2 Mar 2014, at 18:07, Erick Erickson  wrote:

> Hmmm, you _ought_ to be able to specify a relative path
> in solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml
> 
> But there's certainly the chance that this is hard-coded in
> the query elevation component so I can't say that this'll work
> with assurance.
> 
> Best,
> Erick
> 
> On Sun, Mar 2, 2014 at 6:14 AM, David Stuart  wrote:
>> Hi sorry for the cross post but I got no response in the dev group so 
>> assumed I posted in the wrong place.
>> 
>> 
>> 
>> I am using Solr 3.6 and am trying to automate the deployment of cores with a 
>> custom elevate file. It is proving to be difficult as most of the file 
>> (schema, stop words etc) support absolute path elevate seems to need to be 
>> in either a conf directory as a sibling to data or in the data directory 
>> itself. I am able to achieve my goal by having a secondary process that 
>> places the file but thought I would as the group just in case I have missed 
>> the obvious. Should I move to Solr 4 is it fixed here? I could also go down 
>> the root of extending the SolrCore create function to accept additional 
>> params and move the file into the defined data directory.
>> 
>> Ideas?
>> 
>> Thanks for your help
>> David Stuart
>> M  +44(0) 778 854 2157
>> T   +44(0) 845 519 5465
>> www.axistwelve.com
>> Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK
>> 
>> AXIS12 - Enterprise Web Solutions
>> 
>> Reg Company No. 7215135
>> VAT No. 997 4801 60
>> 
>> This e-mail is strictly confidential and intended solely for the ordinary 
>> user of the e-mail account to which it is addressed. If you have received 
>> this e-mail in error please inform Axis12 immediately by return e-mail or 
>> telephone. We advise that in keeping with good computing practice the 
>> recipient of this e-mail should ensure that it is virus free. We do not 
>> accept any responsibility for any loss or damage that may arise from the use 
>> of this email or its contents.
>> 
>> 
>> 



Re: range types in SOLR

2014-03-03 Thread Smiley, David W.
The main reference for this approach is here:
http://wiki.apache.org/solr/SpatialForTimeDurations


Hoss’s illustrations he developed for the meetup presentation are great.
However, there are bugs in the instruction — specifically it’s important
to slightly buffer the query and choose an appropriate maxDistErr.  Also,
it’s more preferable to use the rectangle range query style of spatial
query (e.g. field:[“minX minY” TO “maxX maxY”] as opposed to using
“Intersects(minX minY maxX maxY)”.  There’s no technical difference but
the latter is deprecated and will eventually be removed from Solr 5 /
trunk.

All this said, recognize this is a bit of a hack (one that works well).
There is a good chance a more ideal implementation approach is going to be
developed this year.

~ David


On 3/1/14, 2:54 PM, "Shawn Heisey"  wrote:

>On 3/1/2014 11:41 AM, Thomas Scheffler wrote:
>> Am 01.03.14 18:24, schrieb Erick Erickson:
>>> I'm not clear what you're really after here.
>>>
>>> Solr certainly supports ranges, things like time:[* TO date_spec] or
>>> date_field:[date_spec TO date_spec] etc.
>>>
>>>
>>> There's also a really creative use of spatial (of all things) to, say
>>> answer questions involving multiple dates per record. Imagine, for
>>> instance, employees with different hours on different days. You can
>>> use spatial to answer questions like "which employees are available
>>> on Wednesday between 4PM and 8PM".
>>>
>>> And if none of this is relevant, how about you give us some
>>> use-cases? This could well be an XY problem.
>> 
>> Hi,
>> 
>> lets try this example to show the problem. You have some old text that
>> was written in two periods of time:
>> 
>> 1.) 2nd half of 13th century: -> 1250-1299
>> 2.) Beginning of 18th century: -> 1700-1715
>> 
>> You are searching for text that were written between 1300-1699, than
>> this document described above should not be hit.
>> 
>> If you make start date and end date multiple this results in:
>> 
>> start: [1250, 1700]
>> end: [1299, 1715]
>> 
>> A search for documents written between 1300-1699 would be:
>> 
>> (+start:[1300 TO 1699] +end:[1300-1699]) (+start:[* TO 1300] +end:[1300
>> TO *]) (+start:[*-1699] +end:[1700 TO *])
>> 
>> You see that the document above would obviously hit by "(+start:[* TO
>> 1300] +end:[1300 TO *])"
>
>This sounds exactly like the spatial use case that Erick just described.
>
>http://wiki.apache.org/solr/SpatialForTimeDurations
>https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117
>/
>
>I am not sure whether the following presentation covers time series with
>spatial, but it does say deep dive.  It's over an hour long, and done by
>David Smiley, who wrote most of the Spatial code in Solr:
>
>http://www.lucenerevolution.org/2013/Lucene-Solr4-Spatial-Deep-Dive
>
>Hopefully someone who has actually used this can hop in and give you
>some additional pointers.
>
>Thanks,
>Shawn
>



Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi Michael,

Yes you are correct, oldest comes fist. 

There is no built in solution for this.

Two workaround :

1) use facet.limit=-1 and invert the list (faceting response) at client side

2) use multiples facet.query
   a)facet.query=year:[2012 TO 2014]&facet.query=year:[2010 TO 2012] 
   b)facet.query=year:2014&facet.query=year:2013 ...



On Monday, March 3, 2014 5:45 PM, Michael Lackhoff  wrote:
On 03.03.2014 16:33 Ahmet Arslan wrote:

> Currently there are two storing criteria available. However sort by index - 
> to return the constraints sorted in their index order (lexicographic by 
> indexed term) - should return most recent year at top, no?

No, it returns them -- as you say -- in lexicographic order and that
means oldest first, like:
1815
1820
...
2012
2013
(might well stop before we get here)

2014

-Michael



Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Josh
Thanks Tri,

I really appreciate the response. When I get some free time shortly I'll
start giving some of these a try and report back.


On Mon, Mar 3, 2014 at 12:42 PM, Tri Cao  wrote:

> If it's really the interned strings, you could try upgrade JDK, as the
> newer HotSpot
> JVM puts interned strings in regular heap:
>
> http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html
>
> (search
> for String.intern() in that release)
>
> I haven't got a chance to look into the new core auto discovery code, so I
> don't know
> if it's implemented with reflection or not. Reflection and dynamic class
> loading is another
> source of PermGen exception, in my experience.
>
> I don't see anything wrong with your JVM config, which is very much
> standard.
>
> Hope this helps,
> Tri
>
>
> On Mar 03, 2014, at 08:52 AM, Josh  wrote:
>
> In the user core there are two fields, the database core in question was
> 40, but in production environments the database core is dynamic. My time
> has been pretty crazy trying to get this out the door and we haven't tried
> a standard solr install yet but it's on my plate for the test app and I
> don't know enough about Solr/Bitnami to know if they've done any serious
> modifications to it.
>
> I had tried doing a dump from VisualVM previously but it didn't seem to
> give me anything useful but then again I didn't know how to look for
> interned strings. This is something I can take another look at in the
> coming weeks when I do my test case against a standard solr install with
> SolrJ. The exception with user cores happens after 80'ish runs, so 640'ish
> user cores with the PermGen set to 64MB. The database core test was far
> lower, it was in the 10-15 range. As a note once the permgen limit is hit,
> if we simply restart the service with the same number of cores loaded the
> permgen usage is minimal even with the amount of user cores being high in
> our production environment (500-600).
>
> If this does end up being the interning of strings, is there anyway it can
> be mitigated? Our production environment for our heavier users would see in
> the range of 3200+ user cores created a day.
>
> Thanks for the help.
> Josh
>
>
> On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao  wrote:
>
> Hey Josh,
>
> I am not an expert in Java performance, but I would start with dumping a
>
> the heap
>
> and investigate with visualvm (the free tool that comes with JDK).
>
> In my experience, the most common cause for PermGen exception is the app
>
> creates
>
> too many interned strings. Solr (actually Lucene) interns the field names
>
> so if you have
>
> too many fields, it might be the cause. How many fields in total across
>
> cores did you
>
> create before the exception?
>
> Can you reproduce the problem with the standard Solr? Is the bitnami
>
> distribution just
>
> Solr or do they have some other libraries?
>
> Hope this helps,
>
> Tri
>
> On Mar 03, 2014, at 07:28 AM, Josh  wrote:
>
> It's a windows installation using a bitnami solr installer. I incorrectly
>
> put 64M into the configuration for this, as I had copied the test
>
> configuration I was using to recreate the permgen issue we were seeing on
>
> our production system (that is configured to 512M) as it takes awhile with
>
> to recreate the issue with larger permgen values. In the test scenario
>
> there was a small 180 document data core that's static with 8 dynamic user
>
> cores that are used to index the unique document ids in the users view,
>
> which is then merged into a single user core. The final user core contains
>
> the same number of document ids as the data core and the data core is
>
> queried against with the ids in the final merged user core as the limiter.
>
> The user cores are then unloaded, and deleted from the drive and then the
>
> test is reran again with the user cores re-created
>
> We are also using the core discovery mode to store/find our cores and the
>
> database data core is using dynamic fields with a mix of single value and
>
> multi value fields. The user cores use a static configuration. The data is
>
> indexed from SQL Server using jtDS for both the user and data cores. As a
>
> note we also reversed the test case I mention above where we keep the user
>
> cores static and dynamically create the database core and this created the
>
> same issue only it leaked faster. We assumed this because the configuration
>
> was larger/loaded more classes then the simpler user core.
>
> When I get the time I'm going to put together a SolrJ test app to recreate
>
> the issue outside of our environment to see if others see the same issue
>
> we're seeing to rule out any kind of configuration problem. Right now we're
>
> interacting with solr with POCO via the restful interface and it's not very
>
> easy for us to spin this off into something someone else could use. In the
>
> mean time we've made changes to make the user cores mor

Re: Solution for reverse order of year facets?

2014-03-03 Thread Shawn Heisey

On 3/3/2014 7:35 AM, Michael Lackhoff wrote:

If I understand the docs right, it is only possible to sort facets by
count or value in ascending order. Both variants are not very helpful
for year facets if I want the most recent years at the top (or appear at
all if I restrict the number of facet entries).


There's already an issue in Jira.

https://issues.apache.org/jira/browse/SOLR-1672

I can't take a look now, but I will later if someone else hasn't taken 
it up.


Thanks,
Shawn



Re: Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
Hi Ahmet,

> There is no built in solution for this.

Yes, I know, that's why I would like the TokenFilterFactory

> Two workaround :
> 
> 1) use facet.limit=-1 and invert the list (faceting response) at client side
> 
> 2) use multiples facet.query
>a)facet.query=year:[2012 TO 2014]&facet.query=year:[2010 TO 2012] 
>b)facet.query=year:2014&facet.query=year:2013 ...

I thought about these but they have the disadvantage that 1) could
return hundreds of facet entries. 2b) is better but would need about 30
facet-queries which makes quite a long URL and it wouldn't always work
as expected. There are subjects that were very popular in the past but
with no (or very few) recent publications. For these I would get empty
results for my 2014-1985 facet-queries but miss all the stuff from the
1960s.

>From all these thoughts I came to the conclusion that a custom
TokenFilterFactory could do exactly what I want. In effect it would give
me a reverse sort:
1 - 2014 = 7986
1 - 2013 = 7987
...
The client code can easily regain the original year values for display.

And I think it shouldn't be too difficult to write such a beast, only
problem is I am not a Java programmer. That is why I asked if someone
has done it already or if there is a guide I could use.
After all it is just a simple subtraction...

-Michael



Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud
(containing custom parsing, analyzers). But I haven't specifically enabled
any string interning. Does solr intern all strings in a collection by
default?

I agree with doc and Filter Query Cache. Query Result cache hits are
practically 0 for the large collection since our queries are tail by nature


Thanks
Nitin


On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> On 3/3/2014 1:54 AM, KNitin wrote:
>
>> 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
>>
> As others have pointed out, this is really unusual for Solr.  We often see
> high permgen in our app servers due to dynamic class loading that the
> framework performs; maybe you are somehow loading lots of new Solr plugins,
> or otherwise creating lots of classes?  Of course if you have a plugin or
> something that does a lot of string interning, that could also be an
> explanation.
>
> -Mike
>


Re: Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
On 03.03.2014 19:58 Shawn Heisey wrote:

> There's already an issue in Jira.
> 
> https://issues.apache.org/jira/browse/SOLR-1672

Thanks, this is of course the best solution. Only problem is that I use
a custom verson from a vendor (based on version 4.3) I want to enhance.
But perhaps they apply the patch. In the meantime I still think the
custom filter could be a workaround.

> I can't take a look now, but I will later if someone else hasn't taken 
> it up.

That would be great!

Thanks
-Michael



Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Is there a way to dump the contents of permgen and look at which classes
are occupying the most memory in that?

- Nitin


On Mon, Mar 3, 2014 at 11:19 AM, KNitin  wrote:

> Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud
> (containing custom parsing, analyzers). But I haven't specifically enabled
> any string interning. Does solr intern all strings in a collection by
> default?
>
> I agree with doc and Filter Query Cache. Query Result cache hits are
> practically 0 for the large collection since our queries are tail by nature
>
>
> Thanks
> Nitin
>
>
> On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov <
> msoko...@safaribooksonline.com> wrote:
>
>> On 3/3/2014 1:54 AM, KNitin wrote:
>>
>>> 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
>>>
>> As others have pointed out, this is really unusual for Solr.  We often
>> see high permgen in our app servers due to dynamic class loading that the
>> framework performs; maybe you are somehow loading lots of new Solr plugins,
>> or otherwise creating lots of classes?  Of course if you have a plugin or
>> something that does a lot of string interning, that could also be an
>> explanation.
>>
>> -Mike
>>
>
>


SOLR and Kerberos enabled HDFS

2014-03-03 Thread Jimmy
Hello,

I am trying to connect SOLR (tried 4.4 and 4.7) to kerberos enabled HDFS -
I am using Cloudera CDH 4.2.1
http://maven-repository.com/artifact/com.cloudera.cdh/cdh-root/4.2.1/pom_effective

the keytab and principal is valid (I tested it with flume as well as simple
hdfs cli)


did anobody successfully connect SOLR 4.x to CDH 4.2.1?



${solr.hdfs.security.kerberos.enabled:true}
${solr.hdfs.security.kerberos.keytabfile:/my.keytab}
${
solr.hdfs.security.kerberos.principal:m...@mydomain.com}


I am getting follow error


HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init
failure: java.io.IOException: Login failure for m...@mydomain.com from keytab
/my.keytab,
trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not
available due to init failure:
java.io.IOException: Login failure for m...@mydomain.com from keytab
/my.keytab
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Caused by: java.lang.RuntimeException: java.io.IOException: Login failure
for me@MYDOMAIN.COMfrom keytab /my.keytab
at
org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:282)
at
org.apache.solr.core.HdfsDirectoryFactory.init(HdfsDirectoryFactory.java:90)
at org.apache.solr.core.SolrCore.initDirectoryFactory(SolrCore.java:443)
at org.apache.solr.core.SolrCore.(SolrCore.java:672)
at org.apache.solr.core.SolrCore.(SolrCore.java:629)
at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138) ...

... 3 more Caused by: java.io.IOException: Login failure for
m...@mydomain.com from
keytab /my.keytab
at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:825)
at
org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:280)

... 16 more Caused by: javax.security.auth.login.LoginException:
java.lang.IllegalArgumentException: Illegal principal name m...@mydomain.com
at org.apache.hadoop.security.User.(User.java:50)
at org.apache.hadoop.security.User.(User.java:43)
at
org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
at java.security.AccessController.doPrivileged(Native Method)
at
javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
at javax.security.auth.login.LoginContext.login(LoginContext.java:576)
at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:816)
at
org.apache.solr.core.HdfsDirectoryFactory.initKerbe

Wildcard searches and tokenization

2014-03-03 Thread Hayden Muhl
I'm working on a user name autocomplete feature, and am having some issues
with the way we are tokenizing user names.

We're using the StandardTokenizerFactory to tokenize user names, so
"foo-bar" gets split into two tokens. We take input from the user and use
it as a prefix to search on the user name. This means wildcard searches of
"fo*" and "ba*" both return "foo-bar", which is what we want.

We have a problem when someone types in "foo-b" as a prefix. I would like
to split this into "foo" and "b", then use each as a prefix in a wildcard
search. Is there an easy way to tell Solr, "Tokenize this, then do a prefix
search"?

I've written at least one QParserPlugin, so that's an option. Hopefully
there's an easier way I'm unaware of.

- Hayden


What types is supported by Solrj addBean() in the fields of POJO objects?

2014-03-03 Thread T. Kuro Kurosaka
What are supported types of the POJO objects that are sent to 
SolrServer.addBean(obj)?

A quick glance of DocumentObjectBinder seems to suggest that
an arbitrary combination of an Collection, List, ArrayList, array ([]), Map, 
Hashmap,

of primitive types, String and Date is supported, but I'm not too sure. I would 
also
like to know what Solr field types are allowed for each object's (Java) field 
types.
Is there documentation explaining this?

Kuro


Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi,

Regarding "just a simple subtraction" you do it in indexer code or in a update 
prcessor too. You can either modify original field or you can create an 
additional one. Java-script could be used : 
http://wiki.apache.org/solr/ScriptUpdateProcessor

Ahmet


On Monday, March 3, 2014 9:11 PM, Michael Lackhoff  wrote:
Hi Ahmet,

> There is no built in solution for this.

Yes, I know, that's why I would like the TokenFilterFactory

> Two workaround :
> 
> 1) use facet.limit=-1 and invert the list (faceting response) at client side
> 
> 2) use multiples facet.query
>    a)facet.query=year:[2012 TO 2014]&facet.query=year:[2010 TO 2012] 
>    b)facet.query=year:2014&facet.query=year:2013 ...

I thought about these but they have the disadvantage that 1) could
return hundreds of facet entries. 2b) is better but would need about 30
facet-queries which makes quite a long URL and it wouldn't always work
as expected. There are subjects that were very popular in the past but
with no (or very few) recent publications. For these I would get empty
results for my 2014-1985 facet-queries but miss all the stuff from the
1960s.

From all these thoughts I came to the conclusion that a custom
TokenFilterFactory could do exactly what I want. In effect it would give
me a reverse sort:
1 - 2014 = 7986
1 - 2013 = 7987
...
The client code can easily regain the original year values for display.

And I think it shouldn't be too difficult to write such a beast, only
problem is I am not a Java programmer. That is why I asked if someone
has done it already or if there is a guide I could use.
After all it is just a simple subtraction...


-Michael


Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Tri Cao
If you just want to see which classes are occupying the most memory in a live JVM,you can do:jmap -permstat I don't think you can dump the contents of PERM space.Hope this helps,TriOn Mar 03, 2014, at 11:41 AM, KNitin  wrote:Is there a way to dump the contents of permgen and look at which classes are occupying the most memory in that?  - Nitin   On Mon, Mar 3, 2014 at 11:19 AM, KNitin  wrote: Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud(containing custom parsing, analyzers). But I haven't specifically enabledany string interning. Does solr intern all strings in a collection bydefault?I agree with doc and Filter Query Cache. Query Result cache hits arepractically 0 for the large collection since our queries are tail by natureThanksNitinOn Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov  wrote:On 3/3/2014 1:54 AM, KNitin wrote:3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)As others have pointed out, this is really unusual for Solr. We oftensee high permgen in our app servers due to dynamic class loading that theframework performs; maybe you are somehow loading lots of new Solr plugins,or otherwise creating lots of classes? Of course if you have a plugin orsomething that does a lot of string interning, that could also be anexplanation.-Mike

Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi Michael,


I forgot to include what I did for one customer :

1) Using StatsComponent I get min and max values of the field (year)
2) Calculate "smart gap/range values" according to minimum and maximum.
3) Re-issue the same query (for thee second time) that includes a set of 
facet.query.

Ahmet



On Monday, March 3, 2014 10:30 PM, Ahmet Arslan  wrote:
Hi,

Regarding "just a simple subtraction" you do it in indexer code or in a update 
prcessor too. You can either modify original field or you can create an 
additional one. Java-script could be used : 
http://wiki.apache.org/solr/ScriptUpdateProcessor

Ahmet



On Monday, March 3, 2014 9:11 PM, Michael Lackhoff  wrote:
Hi Ahmet,

> There is no built in solution for this.

Yes, I know, that's why I would like the TokenFilterFactory

> Two workaround :
> 
> 1) use facet.limit=-1 and invert the list (faceting response) at client side
> 
> 2) use multiples facet.query
>    a)facet.query=year:[2012 TO 2014]&facet.query=year:[2010 TO 2012] 
>    b)facet.query=year:2014&facet.query=year:2013 ...

I thought about these but they have the disadvantage that 1) could
return hundreds of facet entries. 2b) is better but would need about 30
facet-queries which makes quite a long URL and it wouldn't always work
as expected. There are subjects that were very popular in the past but
with no (or very few) recent publications. For these I would get empty
results for my 2014-1985 facet-queries but miss all the stuff from the
1960s.

From all these thoughts I came to the conclusion that a custom
TokenFilterFactory could do exactly what I want. In effect it would give
me a reverse sort:
1 - 2014 = 7986
1 - 2013 = 7987
...
The client code can easily regain the original year values for display.

And I think it shouldn't be too difficult to write such a beast, only
problem is I am not a Java programmer. That is why I asked if someone
has done it already or if there is a guide I could use.
After all it is just a simple subtraction...


-Michael


Re: network slows when solr is running - help

2014-03-03 Thread Lan
How frequently are you committing? Frequent commits can slow everything down.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/network-slows-when-solr-is-running-help-tp4120523p4120992.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boost query syntax error

2014-03-03 Thread Chris Hostetter

: But this query does not work:
: 
: q={!boost
: b=if(exists(query({!v='user_type:ADMIN'})),10,1)}id:1&rows=1&fl=*,score
: It gives an error like this:

The problem is the way you are trying to nest queries inside of each other 
w/o any sort of quoting -- the parser has no indication that the "b" param 
is "if(exists(query({!v='user_type:ADMIN'})),10,1)" it thinks it' 
"if(exists(query({!v='user_type:ADMIN'" and the rest is confusing it.

If you quote the "b" param to the boost parser, then it should work...

http://localhost:8983/solr/select?q={!boost%20b=%22if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29%22}id:1

...or if you could use variable derefrencing, either of these should 
work...

http://localhost:8983/solr/select?q={!boost%20b=$b}id:1&b=if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29
http://localhost:8983/solr/select?q={!boost%20b=if(exists(query($nestedq)),10,1)}id:1&nestedq=foo_s:ADMIN


-Hoss
http://www.lucidworks.com/


Re[2]: query parameters

2014-03-03 Thread Andreas Owen
ok i like the logic, you can do much more. i think this should do it for me:

         (-organisations:["" TO *] -roles:["" TO *]) (+organisations:(150 42) 
+roles:(174 72))


i want to use this in fq and i need to set the operator to OR. My q.op is AND 
but I need OR in fq. I have read about ofq but that is for putting OR between 
multiple fq. Can I set the operator for fq?

The statement should find all docs without organisations and roles or those 
that have at least one roles and organisations entry. these fields are 
multivalued.

-Original-Nachricht- 
> Von: "Erick Erickson"  
> An: solr-user@lucene.apache.org 
> Datum: 19/02/2014 04:09 
> Betreff: Re: query parameters 
> 
> Solr/Lucene query language is NOT strictly boolean, see
> Chris's excellent blog here:
> http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/
> 
> Best,
> Erick
> 
> 
> On Tue, Feb 18, 2014 at 11:54 AM, Andreas Owen  wrote:
> 
> > I tried it in solr admin query and it showed me all the docs without a
> > value
> > in ogranisations and roles. It didn't matter if i used a base term, isn't
> > that give through the q-parameter?
> >
> > -Original Message-
> > From: Raymond Wiker [mailto:rwi...@gmail.com]
> > Sent: Dienstag, 18. Februar 2014 13:19
> > To: solr-user@lucene.apache.org
> > Subject: Re: query parameters
> >
> > That could be because the second condition does not do what you think it
> > does... have you tried running the second condition separately?
> >
> > You may have to add a "base term" to the second condition, like what you
> > have for the "bq" parameter in your config file; i.e, something like
> >
> > (*:* -organisations:["" TO *] -roles:["" TO *])
> >
> >
> >
> >
> > On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen  wrote:
> >
> > > It seams that fq doesn't except OR because: (organisations:(150 OR 41)
> > > AND
> > > roles:(174)) OR  (-organisations:["" TO *] AND -roles:["" TO *]) only
> > > returns docs that match the first conditions. it doesn't return any
> > > docs with the empty fields organisations and roles.
> > >
> > > -Original Message-
> > > From: Andreas Owen [mailto:a...@conx.ch]
> > > Sent: Montag, 17. Februar 2014 05:08
> > > To: solr-user@lucene.apache.org
> > > Subject: query parameters
> > >
> > >
> > > in solrconfig of my solr 4.3 i have a userdefined requestHandler. i
> > > would like to use fq to force the following conditions:
> > >    1: organisations is empty and roles is empty
> > >    2: organisations contains one of the commadelimited list in
> > > variable $org
> > >    3: roles contains one of the commadelimited list in variable $r
> > >    4: rule 2 and 3
> > >
> > > snipet of what i got (havent checked out if the is a "in" operator
> > > like in sql for the list value)
> > >
> > > 
> > >        explicit
> > >        10
> > >        edismax
> > >            true
> > >            plain_text^10 editorschoice^200
> > >                 title^20 h_*^14
> > >                 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
> > >                 contentmanager^5 links^5
> > >                 last_modified^5 url^5
> > >            
> > >            (organisations='' roles='') or
> > > (organisations=$org roles=$r) or (organisations='' roles=$r) or
> > > (organisations=$org roles='')
> > >            (expiration:[NOW TO *] OR (*:*
> > > -expiration:*))^6  
> > >            div(clicks,max(displays,1))^8 
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >





Re: Configuration problem

2014-03-03 Thread Shawn Heisey

On 3/3/2014 9:02 AM, Thomas Fischer wrote:

The setting is
solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the new 
"discovery type" (no cores), and inside the core directories are empty files 
core.properties and symbolic links to the universal conf directory.
  
solr webapps (I use very different webapps simultaneously):

/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file 
solr4.6.1.xml into the cataiina/localhost folder with the contents






Your message is buried deep in another message thread about NoSQL, 
because you replied to an existing message rather than starting a new 
message to solr-user@lucene.apache.org.  On list-mirroring forums like 
Nabble, nobody will even see your message (or this reply) unless they 
actually open that other thread.  This is what it looks like on a 
threading mail reader (Thunderbird):


https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

I don't use Tomcat, so I can't even begin to comment on that.  I can 
talk about your solr home setting and what Solr is going to do with that.


You probably do not have /srv/solr/solr4.6.1/solr.xml on your system.  
Solr will look for solr.mxl in your solr home, and if it cannot find it, 
it assumes that you are not running multicore, so it look for things 
like collection1/conf/solrconfig.xml instead.


There is a solr.xml in the example.  Use that, changing as necessary, or 
create a solr.xml file with just the following line in it.  It will 
probably start working:




You *might* need the following instead, but since Solr uses standard XML 
parsing libraries, I would guess that the above line will work.





Thanks,
Shawn



is it possible to consolidate filterquery cache strings

2014-03-03 Thread solr-user
lets say I have a largish set of data (120M docs) and that I am partitioning
my data by groups of states (using the state codes)

Someone suggested that I could use the following format in my solrconfig.xml
when defining the filterqueries work:


  

  *:*
  State:AL
  State:AK
...
  State:WY
  


Would that work, and if so how would I know that the cache is being hit?

Or do I need to use the following traditional syntax instead:


  

  *:*
  State:AL


  *:*
  State:AK

...

  *:*
  State:WY

  


any help appreciated



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Startup

2014-03-03 Thread KNitin
A quick ping on this. To give more stats, I have 100's of collections on
every node. The time it takes for one collection to boot up /loadonStartup
is around 10-20 seconds ("and sometimes even 1 minute). I do not have any
query auto warming etc. On a per collection basis I load a bunch of
libraries (for custom analyzer plugins) to compute the classpath. That
might be a reason for the high boot up time

  My solrconfig.xml entry is as follows

  

 Every core that boots up seems to be loading all jars over and over again.
Is there a way to ask solr to load all jars only once?

Thanks
- Nitin


On Wed, Feb 26, 2014 at 3:06 PM, KNitin  wrote:

> Thanks, Shawn. I will try to upgrade solr soon
>
> Reg firstSearcher: I think it does nothing now. I have configured to use
> ExternalFileLoader but there the external file has no contents. Most of the
> queries hitting the collection are expensive and tail queries. What will be
> your recommendation to warm the first Searcher/new Searcher?
>
> Thanks
> Nitin
>
>
> On Tue, Feb 25, 2014 at 4:12 PM, Shawn Heisey  wrote:
>
>> On 2/25/2014 4:30 PM, KNitin wrote:
>>
>>> Jeff :  Thanks. I have tried reload before but it is not reliable
>>> (atleast
>>> in 4.3.1). A few cores get initialized and few dont (show as just
>>> recovering or down) and hence had to move away from it. Is it a known
>>> issue
>>> in 4.3.1?
>>>
>>
>> With Solr 4.3.1, you are running into this bug with reloads under
>> SolrCloud:
>>
>> https://issues.apache.org/jira/browse/SOLR-4805
>>
>> The only way to recover from this bug is to restart Solr.The bug is fixed
>> in 4.4.0 and later.
>>
>>
>>  Shawn,Otis,Erick
>>>
>>>   Yes I have reviewed the page before and have given 1/4 of my mem to JVM
>>> and the rest to RAM/Os Cache. (15 Gb heap and 45 G to rest. Totally 60G
>>> machine). I have also reviewed the tlog file and they are in the order of
>>> KB (4-10 or 30). I have SSD and the reads are hardly noticable (in the
>>> order of 100Kb during that time frame). I have also disabled swap on all
>>> machines
>>>
>>> Regarding firstSearcher, It is currently set to externalFileLoader. What
>>> is
>>> the use of first searcher? I havent played around with it
>>>
>>
>> I don't think it's a good idea to have extensive warming queries.  I do
>> exactly one query in firstSearcher and newSearcher: a query for all
>> documents with zero rows, sorted on our most common sort field.  This is
>> designed purely to preload the sort data into the FieldCache.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: SolrCloud Startup

2014-03-03 Thread Shawn Heisey

On 3/3/2014 3:30 PM, KNitin wrote:

A quick ping on this. To give more stats, I have 100's of collections on
every node. The time it takes for one collection to boot up /loadonStartup
is around 10-20 seconds ("and sometimes even 1 minute). I do not have any
query auto warming etc. On a per collection basis I load a bunch of
libraries (for custom analyzer plugins) to compute the classpath. That
might be a reason for the high boot up time

   My solrconfig.xml entry is as follows

   

  Every core that boots up seems to be loading all jars over and over again.
Is there a way to ask solr to load all jars only once?


Three steps:

1) Get rid of all your  directives in solrconfig.xml entirely.
2) Copy all the extra jars that you need into ${solr.solr.home}/lib.
3) Remove any "sharedLib" parameter from your solr.xml file.

Step 3 is required because you are on 4.3.1 (or later if you have 
already upgraded).


The final comment on the following issue summarizes issues that I ran 
into while migrating this approach from 4.2.1 to later releases:


https://issues.apache.org/jira/browse/SOLR-4852

Thanks,
Shawn



Re: Configuration problem

2014-03-03 Thread Thomas Fischer
Am 03.03.2014 um 22:43 schrieb Shawn Heisey:

> On 3/3/2014 9:02 AM, Thomas Fischer wrote:
>> The setting is
>> solr directories (I use different solr versions at the same time):
>> /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the 
>> new "discovery type" (no cores), and inside the core directories are empty 
>> files core.properties and symbolic links to the universal conf directory.
>>  solr webapps (I use very different webapps simultaneously):
>> /srv/www/webapps/solr/solr4.6.1 is the solr webapp
>> 
>> I tried to convey this information to the tomcat server by putting a file 
>> solr4.6.1.xml into the cataiina/localhost folder with the contents
>> 
>> > crossContext="true">
>>  > value="/srv/solr/solr4.6.1" override="true"/>
>> 
> 
> Your message is buried deep in another message thread about NoSQL, because 
> you replied to an existing message rather than starting a new message to 
> solr-user@lucene.apache.org.  On list-mirroring forums like Nabble, nobody 
> will even see your message (or this reply) unless they actually open that 
> other thread.  This is what it looks like on a threading mail reader 
> (Thunderbird):
> 
> https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

Yes, I'm sorry, I only afterwards realized that my question inherited the 
thread from the E-Mail I was reading and using as a template for the answer.

Meanwhile I figured out that I overlooked the third place to define solr home 
for Tomcat (after JAVA_OPTS and JNDI): web.xml in WEB-INF of the given webapp.
This overrides the other definitions and created the impression that I couldn't 
set  solr home.

But now I get the message
"Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"
for the core "geo".
In the solr wiki I read (http://wiki.apache.org/solr/ConfiguringSolr):
"In each core, Solr will look for a conf/solrconfig.xml file" and expected solr 
to look for
/srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml (which exists), but obviously 
it doesn't.
Why? My misunderstanding?

Best
Thomas





Re: is it possible to consolidate filterquery cache strings

2014-03-03 Thread solr-user
note: by partitioning I mean that I have sharded the 120M docs into 9 Solr
partitions (each on a separate server)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005p4121012.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Startup

2014-03-03 Thread KNitin
Thanks, Shawn.  Right now my solr.solr.home is not being passed from the
java runtime

Lets say /mnt/solr/ is my solr root. I can add all jars to /mnt/solr/lib/
and use -Dsolr.solr.home=/mnt/solr/  , that should do it right?

Thanks
Nitin


On Mon, Mar 3, 2014 at 2:44 PM, Shawn Heisey  wrote:

> On 3/3/2014 3:30 PM, KNitin wrote:
>
>> A quick ping on this. To give more stats, I have 100's of collections on
>> every node. The time it takes for one collection to boot up /loadonStartup
>> is around 10-20 seconds ("and sometimes even 1 minute). I do not have any
>> query auto warming etc. On a per collection basis I load a bunch of
>> libraries (for custom analyzer plugins) to compute the classpath. That
>> might be a reason for the high boot up time
>>
>>My solrconfig.xml entry is as follows
>>
>>
>>
>>   Every core that boots up seems to be loading all jars over and over
>> again.
>> Is there a way to ask solr to load all jars only once?
>>
>
> Three steps:
>
> 1) Get rid of all your  directives in solrconfig.xml entirely.
> 2) Copy all the extra jars that you need into ${solr.solr.home}/lib.
> 3) Remove any "sharedLib" parameter from your solr.xml file.
>
> Step 3 is required because you are on 4.3.1 (or later if you have already
> upgraded).
>
> The final comment on the following issue summarizes issues that I ran into
> while migrating this approach from 4.2.1 to later releases:
>
> https://issues.apache.org/jira/browse/SOLR-4852
>
> Thanks,
> Shawn
>
>


solrconfig.xml

2014-03-03 Thread Thomas Fischer
Hello,

I'm sorry to repeat myself but I didn't manage to get out of the thread I 
inadvertently slipped into.

My problem now is this:
I have a core "geo" (with an empty file core.properties inside) and 
solrconfig.xml at
/srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml
following the hint from the solr wiki  
(http://wiki.apache.org/solr/ConfiguringSolr):
"In each core, Solr will look for a conf/solrconfig.xml file"
But I get the error message:
"Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"
Why? My misunderstanding?

Best
Thomas


Re: is it possible to consolidate filterquery cache strings

2014-03-03 Thread Chris Hostetter


: Would that work, and if so how would I know that the cache is being hit?

It should work -- filters are evaluated independently, so the fact that 
you are using all of them in query query (vs all of them in individual 
queries) won't change anything as far as the filterCache goes.

You can prove that it works by looking at the cache stats (available 
from the Admin UI) after opening a new searcher and verifying that they 
are all in the new caches.  you can also then do a query for soemthing 
like "q=foo&fq=State:AK" and reload the cache stats and see a "hit" on 
your filterCcahe.

: Or do I need to use the following traditional syntax instead:

The only reason to break them all out like that is if you in addition to 
populating the *filterCache* you also want to populate the 
*queryResultCache* with ~50 queries for "*:*" each with a different "fq" 
applied.



-Hoss
http://www.lucidworks.com/


Re: Boost query syntax error

2014-03-03 Thread Arun Rangarajan
All of them work like a charm! Thanks, Chris.


On Mon, Mar 3, 2014 at 1:28 PM, Chris Hostetter wrote:

>
> : But this query does not work:
> :
> : q={!boost
> : b=if(exists(query({!v='user_type:ADMIN'})),10,1)}id:1&rows=1&fl=*,score
> : It gives an error like this:
>
> The problem is the way you are trying to nest queries inside of each other
> w/o any sort of quoting -- the parser has no indication that the "b" param
> is "if(exists(query({!v='user_type:ADMIN'})),10,1)" it thinks it'
> "if(exists(query({!v='user_type:ADMIN'" and the rest is confusing it.
>
> If you quote the "b" param to the boost parser, then it should work...
>
>
> http://localhost:8983/solr/select?q={!boost%20b=%22if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29%22}id:1
>
> ...or if you could use variable derefrencing, either of these should
> work...
>
>
> http://localhost:8983/solr/select?q={!boost%20b=$b}id:1&b=if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29
>
> http://localhost:8983/solr/select?q={!boost%20b=if(exists(query($nestedq)),10,1)}id:1&nestedq=foo_s:ADMIN
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: solrconfig.xml

2014-03-03 Thread Alexandre Rafalovitch
File permissions? Malformed XML? Are there any other exceptions
earlier in the log? If you substitute that file with one from example
distribution, does it work?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Mar 4, 2014 at 6:07 AM, Thomas Fischer  wrote:
> Hello,
>
> I'm sorry to repeat myself but I didn't manage to get out of the thread I 
> inadvertently slipped into.
>
> My problem now is this:
> I have a core "geo" (with an empty file core.properties inside) and 
> solrconfig.xml at
> /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml
> following the hint from the solr wiki  
> (http://wiki.apache.org/solr/ConfiguringSolr):
> "In each core, Solr will look for a conf/solrconfig.xml file"
> But I get the error message:
> "Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"
> Why? My misunderstanding?
>
> Best
> Thomas


Re: is it possible to consolidate filterquery cache strings

2014-03-03 Thread solr-user
would not breaking the FQs out by state be faster for warming up the fq
caches?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005p4121030.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrconfig.xml

2014-03-03 Thread Chris Hostetter

: I have a core "geo" (with an empty file core.properties inside) and 
solrconfig.xml at
: /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml
...
: But I get the error message:
: "Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"

1) what does your solr.xml file look like?
2) what does cores/geo/core.properties look like?
3) do you get any other errors before this one in your log?
4) what kind of file permissions are set on "cores", "cores/geo", 
"cores/geo/conf", etc... ?


It's possible that this just a mistake in the error message after some 
"real" error with your actual geo/conf/solrconfig.xml has already been 
logged.  Or it's possible that solr couldn't read geo/conf/solrconfig.xml 
(permissions) and tried to fallback by looking for geo/solrconfig.xml (we 
used to do that, look in the instanceDir as a last resort -- not sure if 
the code is still in there) and you're just looking at the last errror.


-Hoss
http://www.lucidworks.com/


Re: java.lang.Exception: Conflict with StreamingUpdateSolrServer

2014-03-03 Thread Chris Hostetter

: Subject: java.lang.Exception: Conflict with StreamingUpdateSolrServer

the fact that you are using StreamingUpdateSolrServer isn't really a 
factor here -- what matters is the data you are sending to solr in the 
updates...

: location=StreamingUpdateSolrServer line=162 Status for: null is 409
...
: Conflict

A "409" HTTP Status is a "Conflict".  

It means that Optimistic concurrency failed.  Your update indicated a 
document version but the version of hte document on the server didn't meet 
the version requirements...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency



-Hoss
http://www.lucidworks.com/


Re: Searching with special chars

2014-03-03 Thread deniz
So as there was no quick work around to this issue, we simply change the http
method from get to post, to avoid further problems which could be triggered
by user input too. though this violates the restful standards... at least we
have something running properly



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-with-special-chars-tp4120047p4121043.html
Sent from the Solr - User mailing list archive at Nabble.com.


Please add me to wiki contributors

2014-03-03 Thread Susheel Kumar
Hi,

Can you please add me to wiki contributors. I wanted to add some stats on Linux 
vs Windows we came across recently, CSV update handler examples,  and also 
wanted to add company name to public server page.

Thanks,
Susheel


Automate search results filtering based on scoring

2014-03-03 Thread Susheel Kumar
Hi,

We are looking to automate searches (name searches) & filter out the results 
based on some scoring confidence. Any suggestions on what different approaches 
we can use to pick only top closer matches and filter out rest of the results.


Thanks,
Susheel



Re: java.lang.Exception: Conflict with StreamingUpdateSolrServer

2014-03-03 Thread Gopal Patwa
Thanks Chirs,  I found in our application code it was related to optimistic
concurrency failure.


On Mon, Mar 3, 2014 at 6:13 PM, Chris Hostetter wrote:

>
> : Subject: java.lang.Exception: Conflict with StreamingUpdateSolrServer
>
> the fact that you are using StreamingUpdateSolrServer isn't really a
> factor here -- what matters is the data you are sending to solr in the
> updates...
>
> : location=StreamingUpdateSolrServer line=162 Status for: null is 409
> ...
> : Conflict
>
> A "409" HTTP Status is a "Conflict".
>
> It means that Optimistic concurrency failed.  Your update indicated a
> document version but the version of hte document on the server didn't meet
> the version requirements...
>
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


RE: Please add me to wiki contributors

2014-03-03 Thread Susheel Kumar
My user name is SusheelKumar for solr wiki.

-Original Message-
From: Susheel Kumar [mailto:susheel.ku...@thedigitalgroup.net] 
Sent: Monday, March 03, 2014 9:36 PM
To: solr-user@lucene.apache.org
Subject: Please add me to wiki contributors

Hi,

Can you please add me to wiki contributors. I wanted to add some stats on Linux 
vs Windows we came across recently, CSV update handler examples,  and also 
wanted to add company name to public server page.

Thanks,
Susheel


Re: range types in SOLR

2014-03-03 Thread Thomas Scheffler

Am 03.03.2014 19:12, schrieb Smiley, David W.:

The main reference for this approach is here:
http://wiki.apache.org/solr/SpatialForTimeDurations


Hoss’s illustrations he developed for the meetup presentation are great.
However, there are bugs in the instruction — specifically it’s important
to slightly buffer the query and choose an appropriate maxDistErr.  Also,
it’s more preferable to use the rectangle range query style of spatial
query (e.g. field:[“minX minY” TO “maxX maxY”] as opposed to using
“Intersects(minX minY maxX maxY)”.  There’s no technical difference but
the latter is deprecated and will eventually be removed from Solr 5 /
trunk.

All this said, recognize this is a bit of a hack (one that works well).
There is a good chance a more ideal implementation approach is going to be
developed this year.


Thank you,

having a working example is great but having a practically working 
example that hides this implementation detail would even better.


I would like to store:

2014-03-04T07:05:12,345Z, 2014-03-04, 2014-03 and 2014 into one field 
and make queries on that field.


Currently I have to normalize all to the first format (inventing 
information). That is only the worst approximation. Normalize them to a 
range would be the best in my opinion. So a query like "date:2014" would 
hit all but also "date:[2014-01 TO 2014-03]".


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

Am 27.02.2014 09:15, schrieb Shawn Heisey:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually bug reports arrive me that sound like

"Unknown type 19"


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for "Unknown type 19".

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 face the same issue with SOLR 
4.5.1. I received a client stack trace this morning and still waiting 
for a Log-Output from the Server:


--
ERROR unable to submit tasks
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unknown type 19
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
--

There is not much information in that Stacktrace, I know.
I'll send further information, when I receive more. In the mean time I 
asked our customer not to upgrade the SOLR server to resolve the issue. 
So we could dig deeper.


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

Am 04.03.2014 07:21, schrieb Thomas Scheffler:

Am 27.02.2014 09:15, schrieb Shawn Heisey:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually bug reports arrive me that sound like

"Unknown type 19"


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for "Unknown type 19".

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 face the same issue with SOLR
4.5.1. I received a client stack trace this morning and still waiting
for a Log-Output from the Server:


Here we go for the server side (4.5.1):

Mrz 03, 2014 2:39:26 PM org.apache.solr.core.SolrCore execute
Information: [clausthal_test] webapp=/solr path=/select
params={fl=*,score&sort=mods.dateIssued+desc&q=%2BobjectType:"mods"+%2Bcategory:"clausthal_status\:published"&wt=javabin&version=2&rows=3}
hits=186 status=0 QTime=2
Mrz 03, 2014 2:39:38 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
Information: [clausthal_test] webapp=/solr path=/update
params={wt=javabin&version=2} {} 0 0
Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: null:java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
o