Re: Deleting Fields

2015-06-01 Thread Charlie Hull

On 30/05/2015 00:30, Shawn Heisey wrote:

On 5/29/2015 5:08 PM, Joseph Obernberger wrote:

Hi All - I have a lot of fields to delete, but noticed that once I
started deleting them, I quickly ran out of heap space.  Is
delete-field a memory intensive operation?  Should I delete one field,
wait a while, then delete the next?


I'm not aware of a way to delete a field.  I may have a different
definition of what a field is than you do, though.

Solr lets you delete entire documents, but deleting a field from the
entire index would involve re-indexing every document in the index,
excluding that field.
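If all the other fields are stored (and an <updateLog> is configured), one way to do that per-document re-index without the unwanted field is an atomic update that sets the field to null. A sketch only; the document id and field name here are placeholders, not from this thread:

```json
[
  {"id": "doc1",
   "unwanted_field": {"set": null}}
]
```

POSTed to the collection's /update handler with Content-Type application/json, this rewrites the document from its stored fields and drops the field; the old copy's space is reclaimed as segments merge.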

Can you be more specific about exactly what you are doing, what you are
seeing, and what you want to see instead?

Also, please be aware of this:

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn


Here's a rather old post on how we did something similar:
http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Number of clustering labels to show

2015-06-01 Thread Alessandro Benedetti
Just to clarify the initial mail: carrot.fragSize has nothing to do
with the number of clusters produced.

When you choose to work with a field summary (you will work only on snippets
from the original content, snippets produced by highlighting the query
in the content), fragSize specifies the size of those fragments.

From the Carrot documentation:

carrot.produceSummary

When true, the carrot.snippet field (or, if no snippet field is defined, the
carrot.title field) will be highlighted, and the highlighted text will be
used for clustering. Highlighting is recommended when the snippet field
contains a lot of content. Highlighting can also increase the quality of
clustering because the clustered content will get an additional
query-specific context.

carrot.fragSize

The fragment size to use for highlighting. Meaningful only when
carrot.produceSummary is true. If not specified, the default highlighting
fragsize (hl.fragsize) will be used. If that isn't specified either, then 100.
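To change the number of clusters themselves (rather than the fragment size), the algorithm-specific parameter can be passed on the request. This is only a sketch, assuming the default Lingo algorithm; the collection name and handler path are placeholders, not taken from this thread:

```shell
# Hypothetical collection "mycoll" with a /clustering request handler.
URL="http://localhost:8983/solr/mycoll/clustering?q=*:*&rows=100"
URL="$URL&carrot.produceSummary=true"
# Lingo-specific knob governing how many clusters are produced:
URL="$URL&LingoClusteringAlgorithm.desiredClusterCountBase=20"
echo "$URL"
```

carrot.fragSize would only change what text each snippet contributes, not the cluster count.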


Cheers

2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo :

> Thank you Stanislaw for the links. Will read them up to better understand
> how the algorithm works.
>
> Regards,
> Edwin
>
> On 29 May 2015 at 17:22, Stanislaw Osinski <
> stanislaw.osin...@carrotsearch.com> wrote:
>
> > Hi,
> >
> > The number of clusters primarily depends on the parameters of the
> > specific clustering algorithm. If you're using the default Lingo
> > algorithm, the number of clusters is governed by the
> > LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look
> > at the documentation (
> > https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
> > )
> > for some more details (the "Tweaking at Query-Time" section shows how to
> > pass the specific parameters at request time). A complete overview of the
> > Lingo clustering algorithm parameters is here:
> > http://doc.carrot2.org/#section.component.lingo.
> >
> > Stanislaw
> >
> > --
> > Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
> > http://carrotsearch.com
> >
> > On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo
> > <edwinye...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to increase the number of cluster results to be shown during
> > > the search. I tried to set carrot.fragSize=20 but only 15 cluster labels
> > > are shown. Even when I tried to set carrot.fragSize=5, there are also 15
> > > labels shown.
> > >
> > > Is this the correct way to do this? I understand that setting it to 20
> > > might not necessarily mean 20 labels will be shown, as the setting is for
> > > the maximum number. But when I set this to 5, shouldn't it reduce the
> > > number of labels to 5?
> > >
> > > I'm using Solr 5.1.
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


AW: Occasionally getting error in solr suggester component.

2015-06-01 Thread Clemens Wyss DEV
Lucene 5.1:
I am (also) facing 
"java.lang.IllegalStateException: suggester was not built"

At the very moment no new documents seem to be added to the index/core. Will a 
reboot "sanitize" the index/core?

I (still) have 
<str name="buildOnCommit">true</str>

How can I tell Solr to periodically update the suggestions? If not possible per 
configuration (in solrconfig.xml), what is the preferred approach through SolrJ?

Thx
Clemens

-Ursprüngliche Nachricht-
Von: Michael Sokolov [mailto:msoko...@safaribooksonline.com] 
Gesendet: Donnerstag, 15. Januar 2015 19:52
An: solr-user@lucene.apache.org
Betreff: Re: Occasionally getting error in solr suggester component.

That sounds like a good approach to me.  Of course it depends how often you 
commit, and what your tolerance is for delay in having suggestions appear, but 
it sounds as if you have a good understanding of the tradeoffs there.

-Mike

On 1/15/15 10:31 AM, Dhanesh Radhakrishnan wrote:
> Hi,
>  From Solr 4.7 onwards, the implementation of this suggester has 
> changed. The old SpellChecker-based search component is replaced with 
> a new suggester that utilizes the Lucene suggester module. The latest Solr 
> download is preconfigured with this new suggester. I'm using Solr 4.10, 
> and suggestions are based on the query handler /suggest instead of /spell.
> So what I did is change the setting to <str name="buildOnCommit">false</str>.
> It's not good to rebuild the index on every commit; instead, I would like 
> to build the index at a certain time interval, say every hour.
> The lookup data will be built only when requested by URL parameter 
> suggest.build=true
>
> "http://localhost:8983/solr/ha/suggest?suggest.build=true";
>
> So this will rebuild the index again and the changes will reflect in 
> the suggester.
>
> There are certain pros and cons for this.
> Issue is that the change will reflect only on certain time interval, 
> here 1 hour. Advantage is that we can avoid the  rebuilt index  on 
> every commit or optimize.
>
> Is this the right way, or is there anything I missed?
>
> Regards
> dhanesh s.r
>
>
>
>
> On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov < 
> msoko...@safaribooksonline.com> wrote:
>
>> did you build the spellcheck index using spellcheck.build as 
>> described
>> here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ?
>>
>> -Mike
>>
>>
>> On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote:
>>
>>> Hi,
>>> Thanks for the reply.
>>> As you mentioned in the previous mail I changed buildOnCommit=false 
>>> in solrConfig.
>>> After that change, suggestions are not working.
>>> Solr 4.7 introduced a new approach based on a dedicated 
>>> SuggestComponent. I'm using that component to build suggestions, and the 
>>> lookup implementation is "AnalyzingInfixLookupFactory".
>>> Is there any workaround?
>>>
>>>
>>>
>>>
>>> On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov < 
>>> msoko...@safaribooksonline.com> wrote:
>>>
>>>   I think you are probably getting bitten by one of the issues 
>>> addressed in
 LUCENE-5889

 I would recommend against using buildOnCommit=true - with a large 
 index this can be a performance-killer.  Instead, build the index 
 yourself using the Solr spellchecker support 
 (spellcheck.build=true)

 -Mike


 On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:

   Hi all,
> I am experiencing a problem in Solr SuggestComponent Occasionally 
> solr suggester component throws an  error like
>
> Solr failed:
> {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggest
> er
> was
> not built","trace":"java.lang.IllegalStateException: suggester was 
> not built\n\tat 
> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
> lookup(AnalyzingInfixSuggester.java:368)\n\tat
> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
> lookup(AnalyzingInfixSuggester.java:342)\n\tat
> org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)\n\
> tat org.apache.solr.spelling.suggest.SolrSuggester.
> getSuggestions(SolrSuggester.java:199)\n\tat
> org.apache.solr.handler.component.SuggestComponent.
> process(SuggestComponent.java:234)\n\tat
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(
> SearchHandler.java:218)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:135)\n\tat
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.
> handleRequest(RequestHandlers.java:246)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.execute(
> SolrDispatchFilter.java:777)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:418)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:207)\n\tat
> org.apache.catalina.core.ApplicationFilterChain.in

UI Admin - and "stored=false" fields

2015-06-01 Thread Sznajder ForMailingList
Hi

I am indexing some content under "text" field.
In the schema.xml "text" field is defined as :


<field name="text" ... stored="false" multiValued="true"/>


However, when I am looking to the documents via the UI
http://localhost:8983/solr/#/sec_600b/query

I see the text field content in the returned documents.

Am I making a mistake? Or is this behavior (i.e. non-stored fields being
displayed in the admin UI) expected?

thanks!

Benjamin.


Synonyms within FQ

2015-06-01 Thread John Blythe
morning everyone,

i'm attempting to find related documents based on a manufacturer's
competitor. as such i'm querying against the 'description' field with
manufacturer1's product description but running a filter query with
manufacturer2's name against the 'mfgname' field.

one of the ways that we help boost our document finding is with a synonym
dictionary for manufacturer names. many of the larger players have multiple
divisions, have absorbed smaller companies, etc. so we need all of their
potential names to map to our record.

i may be wrong, but from my initial testing it doesn't seem to be applying
to a fq. is there any way of doing this?

thanks-


Re: Synonyms within FQ

2015-06-01 Thread John Blythe
after further investigation it looks like the synonym i was testing against
was only associated with one of their multiple divisions (despite being the
most common name for them!). it looks like this may clear the issue up, but
thanks anyway!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, Jun 1, 2015 at 8:33 AM, John Blythe  wrote:

> morning everyone,
>
> i'm attempting to find related documents based on a manufacturer's
> competitor. as such i'm querying against the 'description' field with
> manufacturer1's product description but running a filter query with
> manufacturer2's name against the 'mfgname' field.
>
> one of the ways that we help boost our document finding is with a synonym
> dictionary for manufacturer names. many of the larger players have multiple
> divisions, have absorbed smaller companies, etc. so we need all of their
> potential names to map to our record.
>
> i may be wrong, but from my initial testing it doesn't seem to be applying
> to a fq. is there any way of doing this?
>
> thanks-
>


Re: Deleting Fields

2015-06-01 Thread Joseph Obernberger

Hi - we are using 64bit OS and 64bit JVM.  The JVM settings are currently:
-
-DSTOP.KEY=solrrocks
-DSTOP.PORT=8100
-Dhost=helios
-Djava.net.preferIPv4Stack=true
-Djetty.port=9100
-DnumShards=27
-Dsolr.clustering.enabled=true
-Dsolr.install.dir=/opt/solr
-Dsolr.lock.type=hdfs
-Dsolr.solr.home=/opt/solr/server/solr
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=eris.querymasters.com:2181,daphnis.querymasters.com:2181,triton.querymasters.com:2181,oberon.querymasters.com:2181,portia.querymasters.com:2181,puck.querymasters.com:2181/solr5
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseLargePages
-XX:+UseParNewGC
-XX:CMSFullGCsBeforeCompaction=1
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:CMSTriggerPermRatio=80
-XX:ConcGCThreads=8
-XX:MaxDirectMemorySize=26g
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 9100 /opt/solr/server/logs
-XX:ParallelGCThreads=8
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:/opt/solr/server/logs/solr_gc.log
-Xms8g
-Xmx16g
-Xss256k
-verbose:gc
-

At the time of the OOM error, Xmx was set to 10g.  OS limits, for 
the most part, are 'factory' Scientific Linux 6.6.  I didn't see any 
messages in the log about too many open files.  Thank you for the tips!


-Joe

On 5/31/2015 4:24 AM, Tomasz Borek wrote:

Joseph,

You are doing a memory intensive operation and perhaps an IO intensive
operation at once. That makes your C-heap run out of memory or hit a thread
limit (thus first problem, java.lang.OutOfMemoryError: unable to create new
native thread) and later you're also hitting the problem of Java heap being
full or - more precisely - GC being unable to free enough space there even
with collection, to allocate new object that you want allocated (thus
second throw: java.lang.OutOfMemoryError: Java heap space).

Important is:
- whether your OS is 32-bit or 64-bit
- whether your JVM is 32-bit or 64-bit
- what are your OS limits on thread creation and have you touched them or
changed them (1st problem)
- how do you start JVM (Xms, Xmx, Xss, dumps, direct memory, permgen size -
both problems)

What you can do to solve your problems differs depending on what exactly
causes them, but in general:

NATIVE:
1) Either your operation causes many threads to spawn and you hit your
thread limit (OS limits how many threads process can create) - have a
threaddump and see
2) Or your op causes many thread creation and the memory settings you start
JVM with plus 32/64 bits of OS and JVM make it impossible for the C-heap to
have this much memory thus you're hitting the OOM error - adjust settings,
move to 64-bit architectures, add RAM while on 64-bit (32-bit really chokes
you down, less than 4GB is available for you for EVERYTHING: Java heap,
PermGen space AND C-Heap)

Usually, with such thread-greedy operation, it's also nice to look at the
code and see if perhaps one can optimize thread creation/management.

OOM on Java heap:
Add crash on memory dump parameter to your JVM and walk dominator tree or
see histogram to actually tell what's eating your heap space. MAT is a good
tool for this, unless your heap is like 150GB, then Netbeans may help or
see Alexey Ragozin's work, I think he forked the code for NetBeans heap
analyzer and made some adjustments specially for such cases. Light Google
search and here it is: http://blog.ragozin.info/


pozdrawiam,
LAFK

2015-05-30 20:48 GMT+02:00 Erick Erickson :


Faceting on very high cardinality fields can use up memory, no doubt
about that. I think the entire delete question was a red herring, but
you know that already ;)

So I think you can forget about the delete stuff. Although do note
that if you do re-index your old documents, the new version won't have
the field, and as segments are merged the deleted documents will have
all their resources reclaimed, effectively deleting the field from the
old docs So you could gradually re-index your corpus and get this
stuff out of there.

Best,
Erick

On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger
 wrote:

Thank you Erick.  I was thinking that it actually went through and removed
the index data; thank you for the clarification.  What happened was I had
some bad data that created a lot of fields (some 8000).  I was getting some
errors adding new fields where Solr could not talk to ZooKeeper, and I
thought it may be because there are so many fields.  The index size is some
420 million docs.
I'm hesitant to try to re-create as when the shards crash, they leave a
write.lock file in HDFS, and I need to manually delete that file (on 27
machines) before bringing 

Best strategy for logging & security

2015-06-01 Thread Vishal Swaroop
It will be great if you can provide your valuable inputs on strategy for
logging & security...


Thanks a lot in advance...



Logging :

- Is there a way to implement logging for each core separately?

- What will be the best strategy to log every query details (like source
IP, search query, etc.) at some point we will need monthly reports for
analysis.



Securing SOLR :

- We need to implement SOLR security from client as well as server side...
requests will be performed via web app as well as other server side apps
e.g. curl...

Please suggest about the best approach we can follow... link to any
documentation will also help.



Environment : SOLR 4.7 configured on Tomcat 7  (Linux)


Re: Occasionally getting error in solr suggester component.

2015-06-01 Thread Erick Erickson
Attach suggest.build=true or suggest.buildAll=true to any request
to the suggester to rebuild.
OR
add
buildOnStartup or buildOnCommit or buildOnOptimize to the definition
in solrConfig.

BUT:
building can be a _very_ expensive operation. For document-based
indexes, the build process
reads through _all_ of the _stored_ documents in your index, and that
can take many minutes, so
I recommend against these options for a large index, and strongly
recommend you test them
with a large corpus.
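For reference, a suggester definition with those build triggers might look like the following in solrconfig.xml. This is only a sketch: the component name, dictionary field, and lookup implementation are placeholders, not taken from this thread.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <!-- Field whose stored values feed the dictionary: -->
    <str name="field">title</str>
    <!-- Build only when explicitly requested (suggest.build=true),
         avoiding an expensive rebuild on every commit: -->
    <str name="buildOnCommit">false</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```

With both flags false, a cron job or client can hit the handler with suggest.build=true at whatever interval suits the index size.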

Best,
Erick



On Mon, Jun 1, 2015 at 4:01 AM, Clemens Wyss DEV  wrote:
> Lucene 5.1:
> I am (also) facing
> "java.lang.IllegalStateException: suggester was not built"
>
> At the very moment no new documents seem to be added to the index/core. Will 
> a reboot "sanitize" the index/core?
>
> I (still) have
> <str name="buildOnCommit">true</str>
>
> How can I tell Solr to periodically update the suggestions? If not possible 
> per configuration (in solrconfig.xml), what is the preferred approach through 
> SolrJ?
>
> Thx
> Clemens
>
> -Ursprüngliche Nachricht-
> Von: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
> Gesendet: Donnerstag, 15. Januar 2015 19:52
> An: solr-user@lucene.apache.org
> Betreff: Re: Occasionally getting error in solr suggester component.
>
> That sounds like a good approach to me.  Of course it depends how often you 
> commit, and what your tolerance is for delay in having suggestions appear, 
> but it sounds as if you have a good understanding of the tradeoffs there.
>
> -Mike
>
> On 1/15/15 10:31 AM, Dhanesh Radhakrishnan wrote:
>> Hi,
>>  From Solr 4.7 onwards, the implementation of this suggester has
>> changed. The old SpellChecker-based search component is replaced with
>> a new suggester that utilizes the Lucene suggester module. The latest Solr
>> download is preconfigured with this new suggester. I'm using Solr 4.10,
>> and suggestions are based on the query handler /suggest instead of /spell.
>> So what I did is change the setting to <str name="buildOnCommit">false</str>.
>> It's not good to rebuild the index on every commit; instead, I would like
>> to build the index at a certain time interval, say every hour.
>> The lookup data will be built only when requested by URL parameter
>> suggest.build=true
>>
>> "http://localhost:8983/solr/ha/suggest?suggest.build=true";
>>
>> So this will rebuild the index again and the changes will reflect in
>> the suggester.
>>
>> There are certain pros and cons for this.
>> Issue is that the change will reflect only on certain time interval,
>> here 1 hour. Advantage is that we can avoid the  rebuilt index  on
>> every commit or optimize.
>>
>> Is this the right way, or is there anything I missed?
>>
>> Regards
>> dhanesh s.r
>>
>>
>>
>>
>> On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov <
>> msoko...@safaribooksonline.com> wrote:
>>
>>> did you build the spellcheck index using spellcheck.build as
>>> described
>>> here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ?
>>>
>>> -Mike
>>>
>>>
>>> On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote:
>>>
 Hi,
 Thanks for the reply.
 As you mentioned in the previous mail I changed buildOnCommit=false
 in solrConfig.
 After that change, suggestions are not working.
 Solr 4.7 introduced a new approach based on a dedicated
 SuggestComponent. I'm using that component to build suggestions, and the
 lookup implementation is "AnalyzingInfixLookupFactory".
 Is there any workaround?




 On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov <
 msoko...@safaribooksonline.com> wrote:

   I think you are probably getting bitten by one of the issues
 addressed in
> LUCENE-5889
>
> I would recommend against using buildOnCommit=true - with a large
> index this can be a performance-killer.  Instead, build the index
> yourself using the Solr spellchecker support
> (spellcheck.build=true)
>
> -Mike
>
>
> On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:
>
>   Hi all,
>> I am experiencing a problem in Solr SuggestComponent Occasionally
>> solr suggester component throws an  error like
>>
>> Solr failed:
>> {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggest
>> er
>> was
>> not built","trace":"java.lang.IllegalStateException: suggester was
>> not built\n\tat
>> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
>> lookup(AnalyzingInfixSuggester.java:368)\n\tat
>> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
>> lookup(AnalyzingInfixSuggester.java:342)\n\tat
>> org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)\n\
>> tat org.apache.solr.spelling.suggest.SolrSuggester.
>> getSuggestions(SolrSuggester.java:199)\n\tat
>> org.apache.solr.handler.component.SuggestComponent.
>> process(SuggestComponent.java:234)\n\tat
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(

Re: UI Admin - and "stored=false" fields

2015-06-01 Thread Erick Erickson
That's the whole point of having a true/false option for stored.
Stored="true" implies that those fields are available for display to
the user in results lists. stored="false" and they're not.

Best,
Erick

On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
 wrote:
> Hi
>
> I am indexing some content under "text" field.
> In the schema.xml "text" field is defined as :
>
>
> <field name="text" ... stored="false" multiValued="true"/>
>
>
> However, when I am looking to the documents via the UI
> http://localhost:8983/solr/#/sec_600b/query
>
> I see the text field content in the returned documents.
>
> Am I making a mistake? Or is this behavior (i.e. non-stored fields being
> displayed in the admin UI) expected?
>
> thanks!
>
> Benjamin.


Re: Synonyms within FQ

2015-06-01 Thread Erick Erickson
For future reference, fq clauses are parsed just like the q clause;
they can be arbitrarily complex.
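For example, a local-params prefix can select a parser for a single fq, and with it the query-time analysis (including any synonym filter) of the field it targets. A sketch only; the collection, field, and values are placeholders, not from this thread:

```shell
# Hypothetical: run the fq through edismax against the mfgname field, whose
# query-time analyzer is assumed to apply the synonym filter.
FQ='{!edismax qf=mfgname}manufacturer2'
URL="http://localhost:8983/solr/mycoll/select?q=description:widget&fq=${FQ}"
echo "$URL"
```

(In a real request the fq value would be URL-encoded.)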

Best,
Erick

On Mon, Jun 1, 2015 at 5:52 AM, John Blythe  wrote:
> after further investigation it looks like the synonym i was testing against
> was only associated with one of their multiple divisions (despite being the
> most common name for them!). it looks like this may clear the issue up, but
> thanks anyway!
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Mon, Jun 1, 2015 at 8:33 AM, John Blythe  wrote:
>
>> morning everyone,
>>
>> i'm attempting to find related documents based on a manufacturer's
>> competitor. as such i'm querying against the 'description' field with
>> manufacturer1's product description but running a filter query with
>> manufacturer2's name against the 'mfgname' field.
>>
>> one of the ways that we help boost our document finding is with a synonym
>> dictionary for manufacturer names. many of the larger players have multiple
>> divisions, have absorbed smaller companies, etc. so we need all of their
>> potential names to map to our record.
>>
>> i may be wrong, but from my initial testing it doesn't seem to be applying
>> to a fq. is there any way of doing this?
>>
>> thanks-
>>


Re: Synonyms within FQ

2015-06-01 Thread John Blythe
Thanks Erick!

On Mon, Jun 1, 2015 at 11:29 AM, Erick Erickson 
wrote:

> For future reference, fq clauses are parsed just like the q clause;
> they can be arbitrarily complex.
> Best,
> Erick
> On Mon, Jun 1, 2015 at 5:52 AM, John Blythe  wrote:
>> after further investigation it looks like the synonym i was testing against
>> was only associated with one of their multiple divisions (despite being the
>> most common name for them!). it looks like this may clear the issue up, but
>> thanks anyway!
>>
>> --
>> *John Blythe*
>> Product Manager & Lead Developer
>>
>> 251.605.3071 | j...@curvolabs.com
>> www.curvolabs.com
>>
>> 58 Adams Ave
>> Evansville, IN 47713
>>
>> On Mon, Jun 1, 2015 at 8:33 AM, John Blythe  wrote:
>>
>>> morning everyone,
>>>
>>> i'm attempting to find related documents based on a manufacturer's
>>> competitor. as such i'm querying against the 'description' field with
>>> manufacturer1's product description but running a filter query with
>>> manufacturer2's name against the 'mfgname' field.
>>>
>>> one of the ways that we help boost our document finding is with a synonym
>>> dictionary for manufacturer names. many of the larger players have multiple
>>> divisions, have absorbed smaller companies, etc. so we need all of their
>>> potential names to map to our record.
>>>
>>> i may be wrong, but from my initial testing it doesn't seem to be applying
>>> to a fq. is there any way of doing this?
>>>
>>> thanks-
>>>

Sorting in Solr

2015-06-01 Thread Steven White
Hi everyone,

I need to be able to sort in Solr.  Obviously, I need to do this in a way
that sorting won't cause OOM when a result may contain 1000's of hits if not
millions.  Can you guide me on how I can do this?  Is there a way to tell
Solr to sort the top N results (discarding everything else), or must such
sorting be done on the client side?

Thanks in advance

Steve


Re: Sorting in Solr

2015-06-01 Thread Shawn Heisey
On 6/1/2015 9:29 AM, Steven White wrote:
> I need to be able to sort in Solr.  Obviously, I need to do this in a way
> that sorting won't cause OOM when a result may contain 1000's of hits if not
> millions.  Can you guide me on how I can do this?  Is there a way to tell
> Solr to sort the top N results (discarding everything else), or must such
> sorting be done on the client side?

Solr supports sorting.

https://wiki.apache.org/solr/CommonQueryParameters#sort

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

I think we may have an omission from the docs -- docValues can also be
used for sorting, and may also offer a performance advantage.
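In other words, the client never has to sort the full result set itself: Solr sorts server-side and returns only the top `rows` documents. A sketch with placeholder collection and field names:

```shell
# Sort on a hypothetical "price" field and keep only the top 10 documents.
URL="http://localhost:8983/solr/mycoll/select?q=*:*&sort=price+desc,score+desc&rows=10"
echo "$URL"
```

A field used this way must be indexed (or have docValues="true", which keeps the sort data largely out of the Java heap).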

Thanks,
Shawn



Re: UI Admin - and "stored=false" fields

2015-06-01 Thread Erik Hatcher
Did you happen to change the field type definition without reindexing?  (it 
requires reindexing to “unstore” them if they were originally stored)

If you’re seeing a field value in a document result (not facets, those are 
driven by indexed terms) when stored=“false” then something is wrong and I’d 
guess it’s because of the field definition changing as mentioned.


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




> On Jun 1, 2015, at 11:27 AM, Erick Erickson  wrote:
> 
> That's the whole point of having a true/false option for stored.
> Stored="true" implies that those fields are available for display to
> the user in results lists. stored="false" and they're not.
> 
> Best,
> Erick
> 
> On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
>  wrote:
>> Hi
>> 
>> I am indexing some content under "text" field.
>> In the schema.xml "text" field is defined as :
>> 
>>   <field name="text" ... stored="false" multiValued="true"/>
>>   > multiValued="true"/>
>> 
>> 
>> However, when I am looking to the documents via the UI
>> http://localhost:8983/solr/#/sec_600b/query
>> 
>> I see the text field content in the returned documents.
>> 
>> Am I making a mistake? Or is this behavior (i.e. non-stored fields being
>> displayed in the admin UI) expected?
>> 
>> thanks!
>> 
>> Benjamin.



Re: Sorting in Solr

2015-06-01 Thread Erick Erickson
Steve:

Surprisingly, the number of hits is completely irrelevant for the
memory requirements for sorting. The base memory size is, AFAIK, an
array of maxDoc ints (you can find maxDoc on the admin screen).
There's some additional overhead, but that's the base size. If you use
DocValues, much of the overhead is kept in the MMapDirectory space
IIRC.

Best,
Erick


On Mon, Jun 1, 2015 at 8:41 AM, Shawn Heisey  wrote:
> On 6/1/2015 9:29 AM, Steven White wrote:
>> I need to be able to sort in Solr.  Obviously, I need to do this in a way
>> that sorting won't cause OOM when a result may contain 1000's of hits if not
>> millions.  Can you guide me on how I can do this?  Is there a way to tell
>> Solr to sort the top N results (discarding everything else), or must such
>> sorting be done on the client side?
>
> Solr supports sorting.
>
> https://wiki.apache.org/solr/CommonQueryParameters#sort
>
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter
>
> I think we may have an omission from the docs -- docValues can also be
> used for sorting, and may also offer a performance advantage.
>
> Thanks,
> Shawn
>


Re: UI Admin - and "stored=false" fields

2015-06-01 Thread Erick Erickson
Reset, pay attention to Erik. I didn't read it all the way through.

Erick

On Mon, Jun 1, 2015 at 8:27 AM, Erick Erickson  wrote:
> That's the whole point of having a true/false option for stored.
> Stored="true" implies that those fields are available for display to
> the user in results lists. stored="false" and they're not.
>
> Best,
> Erick
>
> On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
>  wrote:
>> Hi
>>
>> I am indexing some content under "text" field.
>> In the schema.xml "text" field is defined as :
>>
>>
>>> <field name="text" ... stored="false" multiValued="true"/>
>>
>>
>> However, when I am looking to the documents via the UI
>> http://localhost:8983/solr/#/sec_600b/query
>>
>> I see the text field content in the returned documents.
>>
>> Am I making a mistake? Or is this behavior (i.e. non-stored fields being
>> displayed in the admin UI) expected?
>>
>> thanks!
>>
>> Benjamin.


fq and defType

2015-06-01 Thread david . davila
Hello,

I need to parse some complicated queries that only work properly with the 
edismax query parser, in both the q and fq parameters. I am testing with 
defType=edismax, but it seems that this setting only affects the q 
parameter. Is there any way to apply edismax to the fq parameter?

Thank you very much, 


David Dávila Atienza
DIT
Teléfono: 915828763
Extensión: 36763

Re: fq and defType

2015-06-01 Thread Shawn Heisey
On 6/1/2015 10:44 AM, david.dav...@correo.aeat.es wrote:
> I need to parse some complicated queries that only work properly with the 
> edismax query parser, in both the q and fq parameters. I am testing with 
> defType=edismax, but it seems that this setting only affects the q 
> parameter. Is there any way to apply edismax to the fq parameter?

fq={!edismax}querystring

The other edismax parameters on your request (qf, etc) apply to those
filter queries just like they would for the q parameter.

Thanks,
Shawn



Re: fq and defType

2015-06-01 Thread david . davila
Thank you!

David



De: Shawn Heisey 
Para:   solr-user@lucene.apache.org, 
Fecha:  01/06/2015 18:53
Asunto: Re: fq and defType



On 6/1/2015 10:44 AM, david.dav...@correo.aeat.es wrote:
> I need to parse some complicated queries that only work properly with the 
> edismax query parser, in both the q and fq parameters. I am testing with 
> defType=edismax, but it seems that this setting only affects the q 
> parameter. Is there any way to apply edismax to the fq parameter?

fq={!edismax}querystring

The other edismax parameters on your request (qf, etc) apply to those
filter queries just like they would for the q parameter.

Thanks,
Shawn




Re: Best strategy for logging & security

2015-06-01 Thread Rajesh Hazari
Logging :

Just use logstash to parse your logs for all collections, with logstash
forwarder and lumberjack on your Solr replicas in your SolrCloud shipping
the log events to your central logstash server, which then sends them back
to Solr (either the same or a different instance) into a different
collection.

The default log4j.properties that comes with the Solr distribution can log
the core name with each query log entry.

Security:
suggest you to go through this wiki
https://wiki.apache.org/solr/SolrSecurity

*Thanks,*
*Rajesh,*
*(mobile) : 8328789519.*

On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop 
wrote:

> It will be great if you can provide your valuable inputs on strategy for
> logging & security...
>
>
> Thanks a lot in advance...
>
>
>
> Logging :
>
> - Is there a way to implement logging for each cores separately.
>
> - What will be the best strategy to log every query details (like source
> IP, search query, etc.) at some point we will need monthly reports for
> analysis.
>
>
>
> Securing SOLR :
>
> - We need to implement SOLR security from client as well as server side...
> requests will be performed via web app as well as other server side apps
> e.g. curl...
>
> Please suggest about the best approach we can follow... link to any
> documentation will also help.
>
>
>
> Environment : SOLR 4.7 configured on Tomcat 7  (Linux)
>


Chef recipes for Solr

2015-06-01 Thread Walter Underwood
Anyone have Chef recipes they like for deploying Solr? 

I’d especially appreciate one for uploading the configs directly to a Zookeeper 
ensemble.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Multiple Word Synonyms with Autophrasing

2015-06-01 Thread Chris Morley
Hello everyone @ solr-user,
  
 At Wayfair, I have implemented multiple-word synonyms in a clean and 
efficient way, in conjunction with a slightly modified version of the 
LucidWorks Autophrasing plugin, by also tacking on a modified version of 
edismax.  It is not released or in use on our public website yet, but it 
will be very soon.  While it is not ready to officially open source yet, I 
know some people out there are anxious to implement this type of thing.  
Please feel free to contact me if you are interested in learning about how 
to theoretically accomplish this on your own.  Note that while this may 
have some concepts in common with Named Entity Recognition implementations, 
I think it really is a completely different thing.  I get a lot of spam, so 
if you please, would you write me privately your questions with the subject 
line being "MWSwA" so I can easily compile everyone's questions about this. 
 I will respond to everyone at some point soon with some beta documentation 
or possibly with an invitation to a private github or something so that you 
can review an example.
  
 Thanks!
 -Chris.
  



Re: Best strategy for logging & security

2015-06-01 Thread Vishal Swaroop
Thanks Rajesh... just trying to figure out if *logstash* is open source and
free?

On Mon, Jun 1, 2015 at 2:13 PM, Rajesh Hazari 
wrote:

> Logging :
>
> Just use logstash to a parse your logs for all collection and  logstash
> forwarder and lumberjack at your solr replicas in your solr cloud to send
> the log events to you central logstash server and send it to back to solr
> (either the same or different instance) to a different collection.
>
> The default log4j.properties that comes with solr dist can log core name
> with each query log.
>
> Security:
> suggest you to go through this wiki
> https://wiki.apache.org/solr/SolrSecurity
>
> *Thanks,*
> *Rajesh,*
> *(mobile) : 8328789519.*
>
> On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop 
> wrote:
>
> > It will be great if you can provide your valuable inputs on strategy for
> > logging & security...
> >
> >
> > Thanks a lot in advance...
> >
> >
> >
> > Logging :
> >
> > - Is there a way to implement logging for each cores separately.
> >
> > - What will be the best strategy to log every query details (like source
> > IP, search query, etc.) at some point we will need monthly reports for
> > analysis.
> >
> >
> >
> > Securing SOLR :
> >
> > - We need to implement SOLR security from client as well as server
> side...
> > requests will be performed via web app as well as other server side apps
> > e.g. curl...
> >
> > Please suggest about the best approach we can follow... link to any
> > documentation will also help.
> >
> >
> >
> > Environment : SOLR 4.7 configured on Tomcat 7  (Linux)
> >
>


Re: fq and defType

2015-06-01 Thread Mikhail Khludnev
fq={!edismax}you are welcome


On Mon, Jun 1, 2015 at 6:44 PM,  wrote:

> Hello,
>
> I need to parse some complicated queries that only works properly with the
> edismax query parser, in q and fq parameters. I am testing with
> defType=edismax, but it seems that this clause only affects to the q
> parameter. Is there any way to set edismax to the fq parameter?
>
> Thank you very much,
>
>
> David Dávila Atienza
> DIT
> Teléfono: 915828763
> Extensión: 36763




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: solr uima and opennlp

2015-06-01 Thread Tommaso Teofili
yeah, I think you'd rather post it to d...@uima.apache.org .

Regards,
Tommaso

2015-05-28 15:19 GMT+02:00 hossmaa :

> Hi Tommaso
>
> Thanks for the quick reply! I have another question about using the
> Dictionary Annotator, but I guess it's better to post it separately.
>
> Cheers
> Andreea
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873p4208348.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread tuxedomoon
I followed these steps and I am unable to launch in cloud mode.

1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3

2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2

3. uploaded a configset to zk1  (solr home is /volume/solr/data)
---
/opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost 
zk1:2181  -confname mycollection_cloud_conf -solrhome /volume/solr/data
-confdir  /home/ec2-user/mycollection/conf


4. on s1, added these params to solr.in.sh
---
ZK_HOST=zk1:2181,zk2:2181,zk3:2181
SOLR_HOST=s1
ZK_CLIENT_TIMEOUT="15000"
SOLR_OPTS="$SOLR_OPTS -DnumShards=2"


5. on s1 created core directory and file

/volume/solr/data/mycollection/core.properties (name=mycollection)


6. repeated steps 4,5 for s2 minus the numShards param


Starting the service on s1 gives me

mycollection:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Could not load conf for core mycollection: Error loading solr config from
/volume/solr/data/mycollection/conf/solrconfig.xml 

but aren't the config files supposed to be in Zookeeper?  

Tux


   
   







--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread Erick Erickson
bq: but aren't the config files supposed to be in Zookeeper

Yes, but you haven't done anything to tell Solr that the nodes you've
created are part of SolrCloud!

You're confusing, I think, core discovery with creating collections.
Basically you were pretty much OK up until step 5 (although I'm not at
all sure that SOLR_HOST is doing you any good, and certainly setting
numShards in SOLR_OPTS isn't a good idea, what happens if you want to
create a collection with 5 shards?)

You don't need to create any directories on your Solr nodes, that'll
be done for you automatically by the collection creation command from
the Collections API. So I'd down the nodes and nuke the directories
you created by hand and bring the nodes back up. It's probably not
necessary to take the nodes down, but I tend to be paranoid about
that.

Then just create the collection via the Collections API CREATE
command, see: 
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

You can use curl or a browser to issue something like this to any
active Solr node, Solr will do the rest:
http://some_solr_node:port/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=my_collection_cloud_conf&etc..

I believe it's _possible_ to carefully construct the core.properties
files on all the Solr instances, but unless you know _exactly_ what's
going on under the covers it'll lead to endless tail-chasing. You can
control which nodes the collection ends up on with the createNodeSet
parameter etc

Best,
Erick

On Mon, Jun 1, 2015 at 4:37 PM, tuxedomoon  wrote:
> I followed these steps and I am unable to launch in cloud mode.
>
> 1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3
>
> 2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2
>
> 3. uploaded a configset to zk1  (solr home is /volume/solr/data)
> ---
> /opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost
> zk1:2181  -confname mycollection_cloud_conf -solrhome /volume/solr/data
> -confdir  /home/ec2-user/mycollection/conf
>
>
> 4. on s1, added these params to solr.in.sh
> ---
> ZK_HOST=zk1:2181,zk2:2181,zk3:2181
> SOLR_HOST=s1
> ZK_CLIENT_TIMEOUT="15000"
> SOLR_OPTS="$SOLR_OPTS -DnumShards=2"
>
>
> 5. on s1 created core directory and file
> 
> /volume/solr/data/mycollection/core.properties (name=mycollection)
>
>
> 6. repeated steps 4,5 for s2 minus the numShards param
>
>
> Starting the service on s1 gives me
>
> mycollection:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Could not load conf for core mycollection: Error loading solr config from
> /volume/solr/data/mycollection/conf/solrconfig.xml
>
> but aren't the config files supposed to be in Zookeeper?
>
> Tux
>
>
>
>
>
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Number of clustering labels to show

2015-06-01 Thread Zheng Lin Edwin Yeo
Thank you so much Alessandro.

But I do not find any difference in the quality of the clustering results
when I change the hl.fragSize to a different value, even though I've set my
carrot.produceSummary to true.


Regards,
Edwin


On 1 June 2015 at 17:31, Alessandro Benedetti 
wrote:

> Only to clarify the initial mail, The carrot.fragSize has nothing to do
> with the number of clusters produced.
>
> When you select to work with field summary ( you will work only on snippets
> from the original content, snippets produced by the highlight of the query
> in the content), the fragSize will specify the size of these fragments.
>
> From Carrot documentation :
>
> carrot.produceSummary
>
> When true, the carrot.snippet field (if no snippet field, then the
> carrot.title field) will be highlighted and the highlighted text will be
> used for clustering. Highlighting is recommended when the snippet field
> contains a lot of content. Highlighting can also increase the quality of
> clustering because the clustered content will get an additional
> query-specific context.
>
> carrot.fragSize
>
> The frag size to use for highlighting. Meaningful only when
> carrot.produceSummary is true. If not specified, the default highlighting
> fragsize (hl.fragsize) will be used. If that isn't specified, then 100.
>
>
> Cheers
>
> 2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo :
>
> > Thank you Stanislaw for the links. Will read them up to better understand
> > how the algorithm works.
> >
> > Regards,
> > Edwin
> >
> > On 29 May 2015 at 17:22, Stanislaw Osinski <
> > stanislaw.osin...@carrotsearch.com> wrote:
> >
> > > Hi,
> > >
> > > The number of clusters primarily depends on the parameters of the
> > specific
> > > clustering algorithm. If you're using the default Lingo algorithm, the
> > > number of clusters is governed by
> > > the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a
> > look
> > > at the documentation (
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
> > > )
> > > for some more details (the "Tweaking at Query-Time" section shows how
> to
> > > pass the specific parameters at request time). A complete overview of
> the
> > > Lingo clustering algorithm parameters is here:
> > > http://doc.carrot2.org/#section.component.lingo.
> > >
> > > Stanislaw
> > >
> > > --
> > > Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
> > > http://carrotsearch.com
> > >
> > > On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm trying to increase the number of cluster result to be shown
> during
> > > the
> > > > search. I tried to set carrot.fragSize=20 but only 15 cluster labels
> is
> > > > shown. Even when I tried to set carrot.fragSize=5, there's also 15
> > labels
> > > > shown.
> > > >
> > > > Is this the correct way to do this? I understand that setting it to
> 20
> > > > might not necessary mean 20 lables will be shown, as the setting is
> for
> > > > maximum number. But when I set this to 5, it should reduce the number
> > of
> > > > labels to 5?
> > > >
> > > > I'm using Solr 5.1.
> > > >
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > >
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Derive suggestions across multiple fields

2015-06-01 Thread Zheng Lin Edwin Yeo
Hi,

Does anyone know if we can derive suggestions across multiple fields?

I tried to set something like this in the field entry of my suggest
searchComponent in solrconfig.xml, but nothing is returned. It only works
when I set a single field, not multiple fields.

  

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">Content, Summary</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

  

I'm using solr 5.1.

Regards,
Edwin
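A common workaround worth noting here (a sketch, not something confirmed in this thread): the suggester component reads a single field, so multiple sources are usually merged with copyField in schema.xml and the suggester pointed at the merged field:

```xml
<!-- schema.xml sketch: combined suggestion field (names illustrative) -->
<field name="suggest_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="Content" dest="suggest_text"/>
<copyField source="Summary" dest="suggest_text"/>
```

The suggest searchComponent's field entry would then name suggest_text instead of a two-field list.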


Re: Chef recipes for Solr

2015-06-01 Thread Upayavira

I have many. My SolrCloud code has the app push configs to zookeeper.

I am afk at the mo. Feel free to bug me about it!

Upayavira

On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote:
> Anyone have Chef recipes they like for deploying Solr? 
> 
> I’d especially appreciate one for uploading the configs directly to a
> Zookeeper ensemble.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 


Re: Chef recipes for Solr

2015-06-01 Thread Walter Underwood
That sounds great. Someone else here will be making the recipes, so I’ll put 
him in touch with you.

As always, this is a really helpful list.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jun 1, 2015, at 10:20 PM, Upayavira  wrote:

> 
> I have many. My SolrCloud code has the app push configs to zookeeper.
> 
> I am afk at the mo. Feel free to bug me about it!
> 
> Upayavira
> 
> On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote:
>> Anyone have Chef recipes they like for deploying Solr? 
>> 
>> I’d especially appreciate one for uploading the configs directly to a
>> Zookeeper ensemble.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>