SOLR cloud sharding

2016-06-02 Thread Selvam
Hello all,

We need to run a heavy Solr setup with 300 million documents, with each document
having around 350 fields. The average length of the fields will be around
100 characters; there may be date and integer fields as well. Now we are
not sure whether to have a single server or run multiple servers (one for each
node/shard?). We are using Solr 5.5 and want the best performance.

We are new to SolrCloud, so I would like to request your input on how many
nodes/shards we need to have and how many servers for the best performance. We
primarily use geo-spatial search.

-- 
Regards,
Selvam


Re: Last modified time is not updated on SolrCloud 5.2.1 SSL

2016-06-02 Thread Ilan Schwarts
Erick, you were right. I don't know why there is a difference when using SSL.
When I explicitly added commit=true, it did enforce Last Modified to be
updated.
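For reference, the explicit commit was just the update handler with
commit=true, something like this (the collection name here is a placeholder):

curl "https://localhost:8983/solr/mycollection/update?commit=true"

Case is closed, thank you.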
On Jun 1, 2016 8:55 PM, "Erick Erickson"  wrote:

> Issue an explicit commit to be sure.
>
> And as to whether the SSL makes a difference... I'm more
> going on the theory that you happened to look after
> the autocommit kicked in on the non-SSL case and
> before that on the SSL case. Admittedly a shot in the
> dark.
>
> Browser caching issues have tripped me up more
> than once too.
>
> Best,
> Erick
>
> On Wed, Jun 1, 2016 at 9:26 AM, Ilan Schwarts  wrote:
> > Since it's working in non-SSL, I don't think it's a commit issue. It is the
> > same PC; I just updated the scheme on ZooKeeper to https and un-commented
> > the SSL settings in solr.in.cmd.
> > On Jun 1, 2016 7:25 PM, "Ilan Schwarts"  wrote:
> >
> >> If a document was added on both cores/nodes, doesn't that mean the document
> >> was successfully added and committed?
> >> On Jun 1, 2016 7:23 PM, "Erick Erickson" 
> wrote:
> >>
> >>> Did you issue a commit?
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Jun 1, 2016 at 8:15 AM, Ilan Schwarts 
> wrote:
> >>> > Hi all,
> >>> > I have a working SolrCloud 5.2.1 environment. When I am using it
> >>> > without SSL, after adding a document, I can see on the core's
> >>> > information page, under "Statistics", that Last Modified is working
> >>> > fine; it reads "Less than a minute".
> >>> >
> >>> > But when I set SolrCloud to SSL, after adding the document, it is
> >>> > added to the collection, but Last Modified is not updated.
> >>> > Is this a bug? A known issue?
> >>> >
> >>> > Update:
> >>> > After restarting all the cores, the value of Last Modified is valid,
> >>> > but it happens only after a restart and not after an update of the
> >>> > collection/index document.
> >>> >
> >>> >
> >>> > Thanks
> >>> >
> >>> > --
> >>> >
> >>> >
> >>> > -
> >>> > Ilan Schwarts
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> >
> >>> >
> >>> > -
> >>> > Ilan Schwarts
> >>>
> >>
>


Re: SOLR cloud sharding

2016-06-02 Thread Selvam
Hi,

On a side note, we also need all 350 fields to be stored and indexed.

On Thu, Jun 2, 2016 at 12:58 PM, Selvam  wrote:

> Hello all,
>
> We need to run a heavy Solr setup with 300 million documents, with each
> document having around 350 fields. The average length of the fields will be
> around 100 characters; there may be date and integer fields as well. Now we
> are not sure whether to have a single server or run multiple servers (one
> for each node/shard?). We are using Solr 5.5 and want the best performance.
>
> We are new to SolrCloud, so I would like to request your input on how many
> nodes/shards we need to have and how many servers for the best performance.
> We primarily use geo-spatial search.
>
> --
> Regards,
> Selvam
>
>
>


-- 
Regards,
Selvam
KnackForge 


Re: Metadata and HTML ending up in searchable text

2016-06-02 Thread Simon Blandford
I have investigated different Solr versions. I have found that 4.10.3 is 
the last version that completely strips the HTML to text as expected. 
4.10.4 starts introducing some HTML comments and Javascript and anything 
over 5.0 is full of mangled HTML and attribute artefacts such as 
"X-Parsed-By".


So for now the best solution for me is to just use 4.10.3, although I 
really miss the core and process management.


https://issues.apache.org/jira/browse/SOLR-9178

On 31/05/16 13:22, Allison, Timothy B. wrote:

  From the same page, extractFormat=text only applies when extractOnly
is true, which just shows the output from tika without indexing the document.

Y, sorry.  I just looked through the source code.  You're right.  If you use DIH 
(TikaEntityProcessor) instead of Solr Cell (ExtractingDocumentLoader), you should be able to set 
the handler type by setting the "format" attribute, and "text" is one option 
there.
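A sketch of what that might look like in the DIH data config (untested; the
url and field names are placeholders):

<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="tika" processor="TikaEntityProcessor"
            url="/path/to/UsingMailingLists.html" format="text">
      <field column="text" name="body_txt_en"/>
    </entity>
  </document>
</dataConfig>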


I just want to make sure I'm not missing something really obvious before 
submitting a bug report.

I don't think you are.


  From the same page, extractFormat=text only applies when extractOnly
is true, which just shows the output from tika without indexing the document.
Running it in "extractOnly" mode resulting in a XML output. The
difference between selecting "text" or "xml" format is that the
escaped document in the  tag is either the original HTML
(xml mode) or stripped HTML (text mode). It seems some Javascript
creeps into the text version. (See below)

Regards,
Simon

HTML mode sample:
[escaped XML output garbled by the archive; only the fragment "051" survived]
TEXT mode (Blank lines stripped):

047
UsingMailingLists - Solr Wiki
Search:

Solr Wiki
Login






On 27/05/16 13:31, Allison, Timothy B. wrote:

I'm only minimally familiar with Solr Cell, but...

1) It looks like you aren't setting extractFormat=text.  According
to [0]...the default is xhtml which will include a bunch of the metadata.
2) is there an attr_* dynamic field in your index with type="ignored"?
This would strip out the attr_ fields so they wouldn't even be
indexed...if you don't want them.

As for the HTML file, it looks like Tika is failing to strip out the
style section.  Try running the file alone with tika-app: java -jar
tika-app.jar -t inputfile.html.  If you are finding the noise there,
please open an issue on our JIRA:
https://issues.apache.org/jira/browse/tika


[0]
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika


-Original Message-
From: Simon Blandford [mailto:simon.blandf...@bkconnect.net]
Sent: Thursday, May 26, 2016 9:49 AM
To: solr-user@lucene.apache.org
Subject: Metadata and HTML ending up in searchable text

Hi,

I am using Solr 6.0 on Ubuntu 14.04.

I am ending up with loads of junk in the text body.

The JSON output of a search result shows the indexed text starting with...
body_txt_en: " stream_size 36499 X-Parsed-By
org.apache.tika.parser.DefaultParser X-Parsed-By"

And then once it gets to the actual text I get CSS class names
appearing that were in <span> or <div> tags etc.
e.g. "the power of calibre3 silence calibre2 and", where
"calibre3" etc are the CSS class names.

All this junk is searchable and is polluting the index.

I would like to index _only_ the actual content I am interested in
searching for.

Steps to reproduce:

1) Solr installed by untarring the Solr tgz in /opt.

2) Core created by typing "bin/solr create -c mycore"

3) Solr started with bin/solr start

4) TXT document indexed using the following command:
curl "http://localhost:8983/solr/mycore/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=body_txt_en&commit=true"
-F
"content/UsingMailingLists.txt=@/home/user/Documents/library/UsingMailingLists.txt"

5) HTML document indexed using the following command:
curl "http://localhost:8983/solr/mycore/update/extract?literal.id=doc2&uprefix=attr_&fmap.content=body_txt_en&commit=true"
-F
"content/UsingMailingLists.html=@/home/user/Documents/library/UsingMailingLists.html"

6) Query using URL:
http://localhost:8983/solr/mycore/select?q=especially&wt=json

Result:

For the txt file, I get the following JSON for the document...

{
id: "doc1",
attr_stream_size: [
"8107"
],
attr_x_parsed_by: [
"org.apache.tika.parser.DefaultPar

Re: After Solr 5.5, mm parameter doesn't work properly

2016-06-02 Thread Jan Høydahl
[Aside] Your quote style is confusing, leaving my lines unquoted and your new 
lines quoted?? [/Aside]

> So in relation to the OP's sample queries I was pointing out that 'q.op=OR
> + mm=2' and 'q.op=AND + mm=2' are treated as identical queries by Solr 5.4,
> but 5.5+ will manipulate the occurs flags differently before it applies mm
> afterwards... because that is what q.op does.

If a user explicitly says mm=2, then the user's intent is that they should
have neither pure OR (no clauses required) nor pure AND (all clauses required),
but exactly two clauses required.

So I think we need to go back to a solution where q.op technically
stays as OR for custom mm. How that would affect queries with explicit operators
I don’t know...
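To make the sample queries concrete (hypothetical terms):

q=apple banana cherry&defType=edismax&q.op=AND&mm=2

On 5.4 the explicit mm=2 won: any two of the three terms had to match,
regardless of q.op. On 5.5+, q.op=AND first marks every clause as required,
so effectively all three terms must match and the explicit mm=2 no longer
does what the user asked for.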

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 2 Jun 2016, at 05:12, Greg Pendlebury wrote:
> 
> I would describe that subtly differently, and I think it is where the
> difference lies:
> 
> "Then from 4.x it did not care about q.op if mm was set explicitly"
>>> I agree. q.op was not actually used in the query, but rather as a way of
> inferring the default mm value. eDismax still ignored whatever q.op was set
> and built your query operators (i.e. the occurs flags) using q.op=OR.
> 
> "And from 5.5 it seems as q.op does something even if mm is set..."
>>> Yes, although I think it is the words 'even if' drawing too strong a
> relationship between the two parameters. q.op has a function of its own,
> and that now functions as it 'should' (opinionated, I know) in the query
> construction, and continues to influence the default value of mm if it has
> not been explicitly set. SOLR-8812 further evolves that influence by trying
> to improve backwards compatibility for users who were not explicitly
> setting mm, and only ever changed 'q.op' despite it being a step removed
> from the actual parameter they were trying to manipulate.
> 
> So in relation to the OP's sample queries I was pointing out that 'q.op=OR
> + mm=2' and 'q.op=AND + mm=2' are treated as identical queries by Solr 5.4,
> but 5.5+ will manipulate the occurs flags differently before it applies mm
> afterwards... because that is what q.op does.
> 
> 
> On 2 June 2016 at 07:13, Jan Høydahl  wrote:
> 
>> Edismax used to default to mm=100% and not care about q.op at all
>> 
>> Then from 4.x it did not care about q.op if mm was set explicitly,
>> but if mm was not set, then q.op=OR —> mm=0%, q.op=AND —> mm=100%
>> 
>> And from 5.5 it seems as q.op does something even if mm is set...
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> On 1 Jun 2016, at 23:05, Greg Pendlebury wrote:
>>> 
>>> But isn't that the default value? In this case the OP is setting mm
>>> explicitly to 2.
>>> 
>>> Will have to look at those code links more thoroughly at work this
>> morning.
>>> Apologies if I am wrong.
>>> 
>>> Ta,
>>> Greg
>>> 
>>> On Wednesday, 1 June 2016, Jan Høydahl  wrote:
>>> 
> On 1 Jun 2016, at 03:47, Greg Pendlebury <greg.pendleb...@gmail.com> wrote:
 
> I don't think it is 8812. q.op was completely ignored by edismax prior
>> to
> 5.5, so it is not mm that changed.
 
 That is not the case. Prior to 5.5, mm would be automatically set to
>> 100%
 if q.op==AND
 See https://issues.apache.org/jira/browse/SOLR-1889 and
 https://svn.apache.org/viewvc?view=revision&revision=950710
 
 Jan
>> 
>> 



Small setFacetLimit() terminates Solr

2016-06-02 Thread Markus Jelsma
Hello,

I ran across an awkward situation where I collect all ~7.000.000 distinct 
values for a field via faceting. To keep things optimized and reduce memory 
consumption I don't do setFacetLimit(-1) but a reasonable limit of 10.000 or 
100.000.

To my surprise, Solr just stops or crashes. So, instead of decreasing the 
limit, I increased the limit to 1.000.000! And it works! The weird thing is 
that with a limit of 100.000 or 200.000 and a heap of 3.5 GB, Solr stops. But 
with a limit of 1.000.000 and a reduced heap of 2.5 GB, it just works fine.

When it fails, it sometimes doesn't crash, but throws:

396882 WARN  (NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983) [   ] 
o.a.z.s.NIOServerCnxn caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x155111cc413000d, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)

This is on Solr 6.0.0 in cloud mode, 3 shards and 2 replicas on my local 
machine.

What is happening here?

Many thanks,
Markus


Re: Recommended api/lib to search Solr using PHP

2016-06-02 Thread Scott Chu

Thanks to the many people who answered my question and gave me considerable 
opinions and suggestions. I'd like to share what came out of this help here.

Installing Solarium
=
# Under Windows
1. Solarium needs PHP 5.3.3 or up (5.3.4 is recommended). Make sure you install 
the correct PHP engine and make sure the PHP CLI executes correctly. Run the 
PHP CLI; if there are any error(s)/warning(s), you won't be able to install 
Composer (because the Composer setup exe will refuse to run further). So try to 
eliminate these error(s)/warning(s), if any. (In my case, I had a warning on 
imagick.dll, so I went to php.ini and commented out that extension.)
2. Download Composer-Setup.exe from https://getcomposer.org (or 
http://getcomposer.org/Composer-Setup.exe).
3. Double-click the exe file, follow the instructions and install it OK. It 
doesn't give you a chance to change the install folder. I have Win7 64-bit and 
the installed folder is c:\ProgramData\composer; setup will add it to the PATH 
environment variable.
4. At this point, you should be able to open a DOS window and run 'composer' 
OK, so go ahead and open a DOS window and 'cd' to your web root folder (in my 
case, it's c:\appserv\www). Create a subfolder 'solarium'. 'cd' into it.
5. Create a text file named 'composer.json' with the following contents (the 
current newest stable version is 3.6.0):
{
  "require": {
"solarium/solarium": "3.6.0"
  }
}
    then run 'composer install'. It'll show a lot of messages. Be patient; 
when it's done, the last message will normally be 'Generating autoload files'. 
If everything is OK, you'll see a file 'composer.lock' and a folder 'vendor'.
6. Congratulations! You just installed Solarium OK.
#Under CentOS
1. As said above, prepare the correct PHP version and make sure the PHP CLI 
runs OK.
2. Make a folder, say /local/composer, cd into it, and make a subfolder 'bin'. 
cd to it.
3. Run 'curl https://getcomposer.org/installer | php'. When it's done, run 'mv 
composer.phar composer' (Note: .phar is a PHP archive executable).
4. 'cd' to your web root folder (in my case, it's /local/htdocs). Create a 
subfolder 'solarium'. 'cd' to it.
5. Create a file named 'composer.json' with its contents as described above. Run 
'/local/composer/bin/composer install'. Be patient; when it's done, it will 
also show 'Generating autoload files' as the last message.
6. Again, congratulations! You installed Solarium OK.

Checking installation
===
#Under Windows
1. 'cd' to the generated 'vendor' folder. 'cd' into 
vendor\solarium\solarium\examples.
2. Create a file named 'config.php' with the following contents:
<?php
$config = array(
    'endpoint' => array(
        'localhost' => array(
            'host' => '127.0.0.1',
            'port' => 8080,
            'core' => 'collection1',
            'path' => '/solr/',
        )
    )
);
but replace some values according to your Solr server environment. (In my case, 
'localhost' is just an id name we can give at will, so I change it to 'mediaSE'. 
My mediaSE service url is http://10.18.1.237:8983/mediaSE/, so I replace 
127.0.0.1 with 10.18.1.237, 8080 with 8983, and /solr/ with /mediaSE/. I don't 
have a core since mediaSE is Solr 1.4, so I comment out the 'core' => line.)
3. Edit init.php by changing the require statement to 'require 
__DIR__.'/../../../../vendor/autoload.php';' (i.e. make sure there are 4 ../ 
before vendor, i.e. make it point to the correct folder that contains 
autoload.php).
4. Use your PHP to run, either CLI or web, the 1.1-check-solarium-and-ping.php 
script. If things are OK, you'll see a message such as:
Solarium library version: 3.2.0 - Ping query successful
array(1) {
 ["status"]=>
 string(2) "OK"
}
(Note: For v3.6.0, it shows the incorrect version '3.4.0', which is already 
confirmed as a bug and fixed, see: 
https://github.com/solariumphp/solarium/issues/428)
#Under CentOS:
1. Just repeat the above steps, except change the folder delimiter \ to /.

To use Solarium in PHP
=
I'll just state some important things here. I can't provide an intuitive 
tutorial, I just have no time for now!
1. Put in a 'require .../autoload.php' statement. Make sure the path part 
points to the correct folder that contains autoload.php.
2. 'new' a Solarium\Client object first. Say the variable name is $client.
3. Use configuration mode, i.e. create a query config array variable, say its 
name is $config, and use '$client->createSelect($config)' to create a 'query' 
object. -OR-
Use standard mode, i.e. '$client->createSelect()' only, i.e. without any 
argument, to create a 'query' object.
Say the query object's variable name is $query.
4. Use the $query->setXXX methods (please see the official documentation for 
what XXX can be) to set the necessary values.
5. Issue a '$result = $client->select($query)' statement to execute the query.
6. You can check if there are any document(s) returned by checking the value of 
$result->getNumFound().
7. If there are any document(s) returned, $result will essentially be an array 
of document objects. Use 'foreach' to iterate it.
8. A document object is essentially an array too; you can iterate its fields 
with 'foreach' as well.
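Putting steps 1-8 together, a minimal sketch (assuming Solarium 3.x installed 
via Composer as above; the core name, query string and field are placeholders):

<?php
require __DIR__ . '/vendor/autoload.php';

$config = array(
    'endpoint' => array(
        'localhost' => array(
            'host' => '127.0.0.1',
            'port' => 8983,
            'core' => 'collection1',
            'path' => '/solr/',
        )
    )
);

// Step 2: create the client.
$client = new Solarium\Client($config);

// Steps 3-4: create a query object and set values on it.
$query = $client->createSelect();
$query->setQuery('video');
$query->setRows(10);

// Step 5: execute the query.
$result = $client->select($query);

// Step 6: number of matching documents.
echo 'Found ' . $result->getNumFound() . " documents\n";

// Steps 7-8: iterate the returned documents; each document's
// fields can be iterated the same way.
foreach ($result as $document) {
    echo $document->id . "\n";
}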

Re: SOLR cloud sharding

2016-06-02 Thread Shawn Heisey
On 6/2/2016 1:28 AM, Selvam wrote:
> We need to run a heavy Solr setup with 300 million documents, with each
> document having around 350 fields. The average length of the fields
> will be around 100 characters; there may be date and integer fields as
> well. Now we are not sure whether to have a single server or run
> multiple servers (one for each node/shard?). We are using Solr 5.5 and
> want the best performance. We are new to SolrCloud, so I would like to
> request your input on how many nodes/shards we need to have and how
> many servers for the best performance. We primarily use geo-spatial search.

The really fast answer, which I know isn't really an answer, is this:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

This is *also* the answer if I take time to really think about it ...
and I do realize that none of this actually helps you.  You will need to
prototype.  Ideally, your prototype should be the entire index. 
Performance will generally not scale linearly, so if you make decisions
based on a small-scale prototype, you might find that you don't have
enough hardware.

The answer will be *heavily* influenced by how many of those 350 fields
will be used for searching, sorting, faceting, etc.  It will also be
influenced by the complexity of the queries, how fast the queries must
complete, and how many queries per second the cluster must handle.

With the information you have supplied, your whole index is likely to be
in the 10-20TB range.  Performance on an index that large, even with
plenty of hardware and good tuning, is probably not going to be
stellar.  You are likely to need several terabytes of total RAM (across
all servers) to achieve reasonable performance *on a single copy*.  If
you want two copies of the index for high availability, your RAM
requirements will double.  Handling an index this size is not going to
be inexpensive.

An unavoidable fact about Solr performance:  For best results, Solr must
be able to read critical data entirely from RAM for queries.  If it must
go to disk, then performance will not be optimal -- disks are REALLY
slow.  Putting the data on SSD will help, but even SSD storage is quite
a lot slower than RAM.

For *perfect* performance, the index data on a server must fit entirely
into unallocated memory -- which means memory beyond the Java heap and
the basic operating system requirements.  The operating system (not
Java) will automatically handle caching the index in this available
memory.  This perfect situation is usually not required in practice,
though -- the *entire* index is not needed when you do a query.

Here's something I wrote about the topic of Solr performance.  It is not
as comprehensive as I would like it to be, because I have tried to make
it relatively concise and useful:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Small setFacetLimit() terminates Solr

2016-06-02 Thread Yonik Seeley
My guess would be that the smaller limit causes large facet refinement
requests to be sent out on the second phase.
It's not clear what's happening after that though (i.e. why that
causes things to crash)

-Yonik


On Thu, Jun 2, 2016 at 8:47 AM, Markus Jelsma
 wrote:
> Hello,
>
> I ran across an awkward situation where I collect all ~7.000.000 distinct 
> values for a field via faceting. To keep things optimized and reduce memory 
> consumption I don't do setFacetLimit(-1) but a reasonable limit of 10.000 or 
> 100.000.
>
> To my surprise, Solr just stops or crashes. So, instead of decreasing the 
> limit, I increased the limit to 1.000.000! And it works! The weird thing is 
> that with a limit of 100.000 or 200.000 and a heap of 3.5 GB, Solr stops. But 
> with a limit of 1.000.000 and a reduced heap of 2.5 GB, it just works fine.
>
> When it fails, it sometimes doesn't crash, but throws:
>
> 396882 WARN  (NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983) [   ] 
> o.a.z.s.NIOServerCnxn caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x155111cc413000d, likely client has closed socket
> at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
>
> This is on Solr 6.0.0 in cloud mode, 3 shards and 2 replicas on my local 
> machine.
>
> What is happening here?
>
> Many thanks,
> Markus


Re: issues using BlendedInfixLookupFactory in solr5.5

2016-06-02 Thread jmlucjav
hey Arcadius,

sorry I missed your reply and just saw it now. Thanks for the answers! I
will need to use some of those advanced settings for the suggesters, so
I'll have more questions/comments, and hopefully some fixes too (for
example for SOLR-8928 if I have the time)

xavi

On Thu, May 12, 2016 at 12:30 AM, Arcadius Ahouansou 
wrote:

> Hi Xavi.
>
> The blenderType=linear not working was introduced in
> https://issues.apache.org/jira/browse/LUCENE-6939
>
> "linear" has been refactored to "position_linear"
>
> I would be grateful if a committer could help update the wiki with the
> comments at
>
>
>
> https://issues.apache.org/jira/browse/LUCENE-6939?focusedCommentId=15068054#comment-15068054
>
>
> About your question:
> "does SolrCloud totally support suggesters?"
> Yes, SolrCloud supports the BlendedInfixSuggester to some extent.
> What worked for us was buildOnCommit=true
>
> We used 2 collections one is live, the other one is in stand-by mode.
> We update the stand-by one in batches and we commit at the end...
> triggering the suggester rebuild
> Then we swap the stand-by to become the live collection using aliases.
>
>
> Arcadius
>
>
> On 31 March 2016 at 18:04, xavi jmlucjav  wrote:
>
> > Hi,
> >
> > I have been working with
> > AnalyzingInfixLookupFactory/BlendedInfixLookupFactory in 5.5.0, and I
> have
> > a number of questions/comments, hopefully I get some insight into this:
> >
> > - Doc not complete/up-to-date:
> > - blenderType param does not accept 'linear' value, it did in 5.3. I
> > commented it out as it's the default.
> > - it should be mentioned contextField must be a stored field
> > - if the field used is whitespace tokenized, and you search for 'one t',
> > the suggestions are sorted by weight, not score. So if you give a
> constant
> > score to all docs, you might get this:
> > 1. one four two
> > 2. one two four
> >   Would taking the score into account (something not done yet but could
> be
> > done according to something I saw in code/jira) return 2,1 instead of
> 1,2?
> > My guess is it would, correct?
> > - what would we need to return the score too? Could it be done easily?
> > along with the payload or something.
> > - would it be possible to make BlendedInfixLookupFactory allow for some
> > fuzziness a la FuzzyLookupFactory?
> > - when building a big suggester, it can take a long time, you just send a
> > request with suggest.build=true and wait. Is there any possible way to
> > monitor the progress of this? I did not find one.
> > - for weightExpression, one typical use case would be to provide the
> users'
> > lat/lon to weight the suggestions by proximity, is this somehow feasible?
> > What would be needed?
> > - does SolrCloud totally support suggesters? If so does each shard build
> > its own suggester and it works just like a normal distributed search ?
> > - I filed SOLR-8928 suggest.cfq does not work with
> > DocumentExpressionDictionaryFactory/weightExpression as I found that
> combo
> > not working.
> >
> > regards
> > xavi
> >
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Applied Knowledge Is Power
> M: 07908761999
> W: www.menelic.com
> ---
>


Re: Small setFacetLimit() terminates Solr

2016-06-02 Thread Toke Eskildsen
On Thu, 2016-06-02 at 09:26 -0400, Yonik Seeley wrote:
> My guess would be that the smaller limit causes large facet refinement
> requests to be sent out on the second phase.
> It's not clear what's happening after that though (i.e. why that
> causes things to crash)

The facet refinement can be a lot heavier than the initial call. For
some of our queries (with unpatched Solr), we observed that it took 10
times as long.


Markus: You are hitting Solr in a way that scales very poorly. Maybe you
can use export instead?
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
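A minimal export request would be something like this (sketch; assumes the
field is called 'myfield' and has docValues, which /export requires):

curl "http://localhost:8983/solr/collection/export?q=*:*&fl=myfield&sort=myfield+asc"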


If you really need the faceting with full counts & everything, consider
switching to a single-shard (and multiple replicas) setup as that
removes the need for the refinement phase.


- Toke Eskildsen, State and University Library, Denmark




Indexing date types

2016-06-02 Thread Steven White
Hi everyone,

This is two part question about date in Solr.

Question #1:

My understanding is, in order for me to index date types, the date data
must be formatted and indexed as such:

YYYY-MM-DDThh:mm:ssZ

What if I do not have the time part, should I be indexing it as such and
still get all the features of facet search on date (obviously, excluding
time):

YYYY-MM-DD

I have setup my Solr schema as such to index dates:

<fieldType name="dateRange" class="solr.DateRangeField"/>
<field name="other_dates" type="dateRange" multiValued="true" indexed="true" required="false" stored="false"/>

Question #2:

Per the above schema design, I will be indexing my date type as
"multiValued" which, as you know, more than 1 date data will be indexed
into the field "other_dates".  Will this be a problem when I facet search
on this field?  That is, will all the date facet capability still work,
such as range and math per
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
(obviously, excluding time)?
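For example, the kind of range facet I hope will still work on that
multiValued field (field name is mine):

facet=true&facet.range=other_dates&facet.range.start=2010-01-01T00:00:00Z&facet.range.end=2017-01-01T00:00:00Z&facet.range.gap=%2B1YEAR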

Thanks in advance.

Steve


Re: Indexing date types

2016-06-02 Thread Steven White
I forgot to mention another issue I run into.  Looks like "docValues" is
not supported with DateRangeField, is this true?

If I have:

<field name="other_dates" type="dateRange" docValues="true" multiValued="true" indexed="true" required="false" stored="false"/>

Solr will fail to start, reporting the following error:

org.apache.solr.core.CoreContainer; Error creating core [openpages]:
Could not load conf for core openpages: Field type
dateRange{class=org.apache.solr.schema.DateRangeField,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={class=solr.DateRangeField}}
does not support doc values.

I have to remove "docValues" to fix this.  Is this the case or have I
missed something?

Thanks.

Steve

On Thu, Jun 2, 2016 at 11:46 AM, Steven White  wrote:

> Hi everyone,
>
> This is two part question about date in Solr.
>
> Question #1:
>
> My understanding is, in order for me to index date types, the date data
> must be formatted and indexed as such:
>
> YYYY-MM-DDThh:mm:ssZ
>
> What if I do not have the time part, should I be indexing it as such and
> still get all the features of facet search on date (obviously, excluding
> time):
>
> YYYY-MM-DD
>
> I have setup my Solr schema as such to index dates:
>
> <fieldType name="dateRange" class="solr.DateRangeField"/>
> <field name="other_dates" type="dateRange" multiValued="true" indexed="true" required="false" stored="false"/>
>
> Question #2:
>
> Per the above schema design, I will be indexing my date type as
> "multiValued" which, as you know, more than 1 date data will be indexed
> into the field "other_dates".  Will this be a problem when I facet search
> on this field?  That is, will all the date facet capability still work,
> such as range and math per
> https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
> (obviously, excluding time)?
>
> Thanks in advance.
>
> Steve
>


solr 5.4.1

2016-06-02 Thread Adnane Falh
Hi, I would like to create a new field structure (tika-config.xml) for
indexing my files using Tika (ExtractingRequestHandler), and I just want a
working example to follow so that I can create my file. Thank you.


scale

2016-06-02 Thread John Blythe
hi all,

having lots of processing happening using multiple solr cores to do some
data linkage with our customers' transactional data. it runs pretty slowly
at the moment. we were wondering if there were some solr or jetty tunings
that we could implement to help make it more powerful and efficient. it
currently is using less than 2GB on our box, can we open it up to use more
memory and get speedier as a result?
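(from what i've read, the heap can be raised at startup with something like
"bin/solr start -m 4g", or by setting SOLR_HEAP in solr.in.sh on solr 5+ -
is that the right knob here?)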

thanks for any tips!


Re: scale

2016-06-02 Thread Erick Erickson
Without having a lot more data it's hard to say anything helpful.

_What_ is slow? What does "data linkage" mean exactly? Etc.

Best,
Erick

On Thu, Jun 2, 2016 at 9:33 AM, John Blythe  wrote:
> hi all,
>
> having lots of processing happening using multiple solr cores to do some
> data linkage with our customers' transactional data. it runs pretty slowly
> at the moment. we were wondering if there were some solr or jetty tunings
> that we could implement to help make it more powerful and efficient. it
> currently is using less than 2GB on our box, can we open it up to use more
> memory and get speedier as a result?
>
> thanks for any tips!


Re: scale

2016-06-02 Thread John Blythe
sure.

the processes we run to do linkage take hours. we're processing ~600k
records, bouncing our users' data up against a few data sources that act as
'sources of truth' for us for the sake of this linkage. we get the top 3
results and run some quick checks on it algorithmically to determine if we
have a match. we use parallel requests of 100 at a time.

solr isn't built for this sort of purpose specifically, i'm pretty sure,
but even so i'm imagining/hoping there is a way to give it a bit more
processing power.

thanks for any continued discussion!


-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Thu, Jun 2, 2016 at 12:49 PM, Erick Erickson 
wrote:

> Without having a lot more data it's hard to say anything helpful.
>
> _What_ is slow? What does "data linkage" mean exactly? Etc.
>
> Best,
> Erick
>
> On Thu, Jun 2, 2016 at 9:33 AM, John Blythe  wrote:
> > hi all,
> >
> > having lots of processing happening using multiple solr cores to do some
> > data linkage with our customers' transactional data. it runs pretty
> slowly
> > at the moment. we were wondering if there were some solr or jetty tunings
> > that we could implement to help make it more powerful and efficient. it
> > currently is using less than 2GB on our box, can we open it up to use
> more
> > memory and get speedier as a result?
> >
> > thanks for any tips!
>


data import handler for solr 5.4.1 to index rich Data

2016-06-02 Thread kostali hassan
I am looking to define a multi-valued field, for example the field 'links', to
extract all links from the text field of each file.
In tika-config.xml I define a regex for the link expression, but when the
indexing process finishes I get just one value, even though in
schema.xml I define the field 'links' as multiValued (true). And I notice
the update/extract handler gets all the links automatically (multi-valued).
What do I have to do to get all links present in each file with the data
import handler?


Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Hello Everyone,

I am working on implementing some basic faceting into my project.

I have it working the way I want to, but I feel like there is probably a better 
way than the way I went about it.

* I want to show a category and its count.
* when someone clicks a category, it sets a FQ= to that category.

But now that the results are being filtered, the category counts from the 
original query without the filters are off.

So, I have a single api call that I make with rows set to 0 and the base query 
without any filters, and use that to display my categories.

And then I call the api again, this time to get the results. And the category 
count is the same.

I hope that makes sense.

I was hoping  facet.query would be of help, but I am not sure I understood it 
properly.

Thanks in advance =)

Sas


Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Jamal - what is your q= set to? And do you have a fq for the original
query? I have found that if you do a wildcard search (*:*) you have to be
careful about other parameters you set as that can often result in the
numbers returned being off. In my case, my defaults had things like edismax
settings for phrase boosting, etc. that don't apply if there isn't a search
term, and once I removed those for a wildcard search I got the correct
numbers. So possibly your facet query itself may be set up correctly but
something else in the parameters and/or filters with the two queries may be
the cause of the difference.

Mary Jo


On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably a
> better way than the way I went about it.
>
> * I want to show a category and its count.
> * when someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from the
> original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the base
> query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the
> category count is the same.
>
> I hope that makes sense.
>
> I was hoping  facet.query would be of help, but I am not sure I understood
> it properly.
>
> Thanks in advance =)
>
> Sas
>


MongoDB and Solr - Massive re-indexing

2016-06-02 Thread Robert Brown

Hi,

Currently we import data-sets from various sources (csv, xml, json, 
etc.) and POST to Solr, after some pre-processing to get it into a 
consistent format, and some other transformations.


We currently dump out to a json file in batches of 1,000 documents and 
POST that file to Solr.


Roughly 50m documents come in throughout the day, and are fully 
re-indexed.  Following the update calls, we then delete any docs based 
on a last_seen datetime field, which removes documents before the most 
recent run, related to that run.


I'm now importing our raw data into MongoDB first, in its raw format. The 
data will then be translated and stored in another Mongo collection.  
These 2 steps are for business reasons.


That final Mongo collection then needs to be sent to Solr.

My question is whether sending batches of 1,000 documents to Solr is 
still beneficial (thinking about docs that may not change), or if I 
should look at the MongoDB connector for Solr, based on the volume of 
incoming data we see.


Would the connector still see all docs updating if I re-insert them 
blindly, and thus still send all 50m documents back to Solr every day anyway?


Is my setup quite typical for the MongoDB connector?

Thanks,
Rob





Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
In other words... to diagnose such a problem it would really help to see
the exact parameters and filters you are using on each of the searches.

Mary Jo

On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably a
> better way than the way I went about it.
>
> * I want to show a category and its count.
> * when someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from the
> original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the base
> query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the
> category count is the same.
>
> I hope that makes sense.
>
> I was hoping  facet.query would be of help, but I am not sure I understood
> it properly.
>
> Thanks in advance =)
>
> Sas
>


RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Absolutely,

Here is what it looks like:

This brings the right counts as it should
http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team

Then when I specify which team
http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback

The counts are obviously different now, as the result set is limited to one 
team.

Sas

-Original Message-
From: MaryJo Sminkey [mailto:mjsmin...@gmail.com] 
Sent: Thursday, June 2, 2016 1:56 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Faceting Question(s)

Jamal - what is your q= set to? And do you have a fq for the original query? I 
have found that if you do a wildcard search (*:*) you have to be careful about 
other parameters you set as that can often result in the numbers returned being 
off. In my case, my defaults had things like edismax settings for phrase 
boosting, etc. that don't apply if there isn't a search term, and once I 
removed those for a wildcard search I got the correct numbers. So possibly your 
facet query itself may be set up correctly but something else in the parameters 
and/or filters with the two queries may be the cause of the difference.

Mary Jo


On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably 
> a better way than the way I went about it.
>
> * I want to show a category and its count.
> * when someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from 
> the original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the 
> base query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the 
> category count is the same.
>
> I hope that makes sense.
>
> I was hoping  facet.query would be of help, but I am not sure I 
> understood it properly.
>
> Thanks in advance =)
>
> Sas
>


Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
And you're saying the count for the second query is different than what was
returned in the facet? You may need to check for any defaults you have set
up in the solrconfig for the select parser; if, for instance, you have any
grouping going on but aren't doing grouping in your facet, that could
result in the counts being off.

MJ




On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Absolutely,
>
> Here is what it looks like:
>
> This brings the right counts as it should
> http://
> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team
>
> Then when I specify which team
> http://
> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback
>
> The counts are obviously different now, as the result set is limited to
> one team.
>
> Sas
>
> -Original Message-
> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
> Sent: Thursday, June 2, 2016 1:56 PM
> To: solr-user@lucene.apache.org
> Subject: [E] Re: Faceting Question(s)
>
> Jamai - what is your q= set to? And do you have a fq for the original
> query? I have found that if you do a wildcard search (*.*) you have to be
> careful about other parameters you set as that can often result in the
> numbers returned being off. In my case, my defaults had things like edismax
> settings for phrase boosting, etc. that don't apply if there isn't a search
> term, and once I removed those for a wildcard search I got the correct
> numbers. So possibly your facet query itself may be set up correctly but
> something else in the parameters and/or filters with the two queries may be
> the cause of the difference.
>
> Mary Jo
>
>
> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
> > Hello Everyone,
> >
> > I am working on implementing some basic faceting into my project.
> >
> > I have it working the way I want to, but I feel like there is probably
> > a better way than the way I went about it.
> >
> > * I want to show a category and its count.
> > * when someone clicks a category, it sets a FQ= to that category.
> >
> > But now that the results are being filtered, the category counts from
> > the original query without the filters are off.
> >
> > So, I have a single api call that I make with rows set to 0 and the
> > base query without any filters, and use that to display my categories.
> >
> > And then I call the api again, this time to get the results. And the
> > category count is the same.
> >
> > I hope that makes sense.
> >
> > I was hoping  facet.query would be of help, but I am not sure I
> > understood it properly.
> >
> > Thanks in advance =)
> >
> > Sas
> >
>


Re: [E] Re: Faceting Question(s)

2016-06-02 Thread Robert Brown
MaryJo, I think you've misunderstood.  The counts are different simply 
because the 2nd query contains a filter of a facet value from the 1st 
query - that's completely expected.


The issue is how to get the original facet counts (with no filters but 
same q) in the same call as also filtering by one of those facet values.


Personally I don't think it's possible, but will be interested to hear 
others input, since it's a very common situation for me - I cache the 
first result in memcached and tag future queries as related to the first.


Or you could always make 2 calls back to Solr (one original (again), and 
one with the filters); the caches should help massively.




On 02/06/16 19:07, MaryJo Sminkey wrote:

And you're saying the count for the second query is different than what was
returned in the facet? You may need to check for any defaults you have set
up in the solrconfig for the select parser, if for instance you have any
grouping going on, but aren't doing grouping in your facet, that could
result in the counts being off.

MJ




On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:


Absolutely,

Here is what it looks like:

This brings the right counts as it should
http://
**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team

Then when I specify which team
http://
**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback

The counts are obviously different now, as the result set is limited to
one team.

Sas

-Original Message-
From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
Sent: Thursday, June 2, 2016 1:56 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Faceting Question(s)

Jamal - what is your q= set to? And do you have a fq for the original
query? I have found that if you do a wildcard search (*:*) you have to be
careful about other parameters you set as that can often result in the
numbers returned being off. In my case, my defaults had things like edismax
settings for phrase boosting, etc. that don't apply if there isn't a search
term, and once I removed those for a wildcard search I got the correct
numbers. So possibly your facet query itself may be set up correctly but
something else in the parameters and/or filters with the two queries may be
the cause of the difference.

Mary Jo


On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:


Hello Everyone,

I am working on implementing some basic faceting into my project.

I have it working the way I want to, but I feel like there is probably
a better way than the way I went about it.

* I want to show a category and its count.
* when someone clicks a category, it sets a FQ= to that category.

But now that the results are being filtered, the category counts from
the original query without the filters are off.

So, I have a single api call that I make with rows set to 0 and the
base query without any filters, and use that to display my categories.

And then I call the api again, this time to get the results. And the
category count is the same.

I hope that makes sense.

I was hoping  facet.query would be of help, but I am not sure I
understood it properly.

Thanks in advance =)

Sas





RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Andrew Chillrud
It is possible to get the original facet counts for the field you are filtering 
on (we have been using this since Solr 3.6). Don't know if this can be extended 
to get the original counts for all fields however. 

This syntax is described here: 
https://cwiki.apache.org/confluence/display/solr/Faceting

Tagging and Excluding Filters

You can tag specific filters and exclude those filters when faceting. This is 
useful when doing multi-select faceting.

Consider the following example query with faceting:

q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype

Because everything is already constrained by the filter doctype:pdf, the 
facet.field=doctype facet command is currently redundant and will return 0 
counts for everything except doctype:pdf.

To implement a multi-select facet for doctype, a GUI may want to still display 
the other doctype values and their associated counts, as if the doctype:pdf 
constraint had not yet been applied. For example:
=== Document Type ===
  [ ] Word (42)
  [x] PDF  (96)
  [ ] Excel(11)
  [ ] HTML (63)

To return counts for doctype values that are currently not selected, tag 
filters that directly constrain doctype, and exclude those filters when 
faceting on doctype.

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

Filter exclusion is supported for all types of facets. Both the tag and ex 
local parameters may specify multiple values by separating them with commas.
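Applied to the team example from earlier in this thread, that would be
something like (sketch):

q=video&facet=true&fq={!tag=t}team:rollback&facet.field={!ex=t}team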

- Andy -

-Original Message-
From: Robert Brown [mailto:r...@intelcompute.com] 
Sent: Thursday, June 02, 2016 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Faceting Question(s)

MaryJo, I think you've misunderstood.  The counts are different simply because 
the 2nd query contains a filter of a facet value from the 1st query - that's 
completely expected.

The issue is how to get the original facet counts (with no filters but same q) 
in the same call as also filtering by one of those facet values.

Personally I don't think it's possible, but will be interested to hear others 
input, since it's a very common situation for me - I cache the first result in 
memcached and tag future queries as related to the first.

Or you could always make 2 calls back to Solr (one original (again), and one 
with the filters); the caches should help massively.



On 02/06/16 19:07, MaryJo Sminkey wrote:
> And you're saying the count for the second query is different than what was
> returned in the facet? You may need to check for any defaults you have set
> up in the solrconfig for the select parser, if for instance you have any
> grouping going on, but aren't doing grouping in your facet, that could
> result in the counts being off.
>
> MJ
>
>
>
>
> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Absolutely,
>>
>> Here is what it looks like:
>>
>> This brings the right counts as it should
>> http://
>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team
>>
>> Then when I specify which team
>> http://
>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback
>>
>> The counts are obviously different now, as the result set is limited to
>> one team.
>>
>> Sas
>>
>> -Original Message-
>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>> Sent: Thursday, June 2, 2016 1:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: [E] Re: Faceting Question(s)
>>
>> Jamal - what is your q= set to? And do you have a fq for the original
>> query? I have found that if you do a wildcard search (*:*) you have to be
>> careful about other parameters you set as that can often result in the
>> numbers returned being off. In my case, my defaults had things like edismax
>> settings for phrase boosting, etc. that don't apply if there isn't a search
>> term, and once I removed those for a wildcard search I got the correct
>> numbers. So possibly your facet query itself may be set up correctly but
>> something else in the parameters and/or filters with the two queries may be
>> the cause of the difference.
>>
>> Mary Jo
>>
>>
>> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I am working on implementing some basic faceting into my project.
>>>
>>> I have it working the way I want to, but I feel like there is probably
>>> a better way than the way I went about it.
>>>
>>> * I want to show a category and its count.
>>> * when someone clicks a category, it sets a FQ= to that category.
>>>
>>> But now that the results are being filtered, the category counts from
>>> the original query without the filters are off.
>>>
>>> So, I have a single api call that I make with rows set to 0 and the
>>> base query without any filters, and use that to display my categories.
>>>
>>> And then I call the api again, this time to get the results. And the
>>> category count is the same.

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Ah yes, I did misunderstand the question; I thought he was just saying the
count was not the same as what the facet in the first query had returned.

MJ



On Thu, Jun 2, 2016 at 2:11 PM, Robert Brown  wrote:

> MaryJo, I think you've misunderstood.  The counts are different simply
> because the 2nd query contains a filter of a facet value from the 1st
> query - that's completely expected.
>
> The issue is how to get the original facet counts (with no filters but
> same q) in the same call as also filtering by one of those facet values.
>
> Personally I don't think it's possible, but will be interested to hear
> others input, since it's a very common situation for me - I cache the first
> result in memcached and tag future queries as related to the first.
>
> Or you could always make 2 calls back to Solr (one original (again), and
> one with the filters); the caches should help massively.
>
>
>
> On 02/06/16 19:07, MaryJo Sminkey wrote:
>
>> And you're saying the count for the second query is different than what
>> was
>> returned in the facet? You may need to check for any defaults you have set
>> up in the solrconfig for the select parser, if for instance you have any
>> grouping going on, but aren't doing grouping in your facet, that could
>> result in the counts being off.
>>
>> MJ
>>
>>
>>
>>
>> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>
>> Absolutely,
>>>
>>> Here is what it looks like:
>>>
>>> This brings the right counts as it should
>>> http://
>>>
>>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team
>>>
>>> Then when I specify which team
>>> http://
>>>
>>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback
>>>
>>> The counts are obviously different now, as the result set is limited to
>>> one team.
>>>
>>> Sas
>>>
>>> -Original Message-
>>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>>> Sent: Thursday, June 2, 2016 1:56 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: [E] Re: Faceting Question(s)
>>>
>>> Jamal - what is your q= set to? And do you have a fq for the original
>>> query? I have found that if you do a wildcard search (*:*) you have to be
>>> careful about other parameters you set as that can often result in the
>>> numbers returned being off. In my case, my defaults had things like
>>> edismax
>>> settings for phrase boosting, etc. that don't apply if there isn't a
>>> search
>>> term, and once I removed those for a wildcard search I got the correct
>>> numbers. So possibly your facet query itself may be set up correctly but
>>> something else in the parameters and/or filters with the two queries may
>>> be
>>> the cause of the difference.
>>>
>>> Mary Jo
>>>
>>>
>>> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
>>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>>
>>> Hello Everyone,

 I am working on implementing some basic faceting into my project.

 I have it working the way I want to, but I feel like there is probably
 a better way than the way I went about it.

 * I want to show a category and its count.
 * when someone clicks a category, it sets a FQ= to that category.

 But now that the results are being filtered, the category counts from
 the original query without the filters are off.

 So, I have a single api call that I make with rows set to 0 and the
 base query without any filters, and use that to display my categories.

 And then I call the api again, this time to get the results. And the
 category count is the same.

 I hope that makes sense.

 I was hoping  facet.query would be of help, but I am not sure I
 understood it properly.

 Thanks in advance =)

 Sas


>


RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Thank you Andrew, that looks like exactly what I am looking for =)
Thank you Robert, it looks like we are both doing it in similar fashion =)
Thank you MaryJo  for jumping right in!

Sas



-Original Message-
From: Andrew Chillrud [mailto:achill...@opentext.com] 
Sent: Thursday, June 2, 2016 2:17 PM
To: solr-user@lucene.apache.org
Subject: RE: [E] Re: Faceting Question(s)

It is possible to get the original facet counts for the field you are filtering 
on (we have been using this since Solr 3.6). Don't know if this can be extended 
to get the original counts for all fields however. 

This syntax is described here: 
https://cwiki.apache.org/confluence/display/solr/Faceting

Tagging and Excluding Filters

You can tag specific filters and exclude those filters when faceting. This is 
useful when doing multi-select faceting.

Consider the following example query with faceting:

q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype

Because everything is already constrained by the filter doctype:pdf, the 
facet.field=doctype facet command is currently redundant and will return 0 
counts for everything except doctype:pdf.

To implement a multi-select facet for doctype, a GUI may want to still display 
the other doctype values and their associated counts, as if the doctype:pdf 
constraint had not yet been applied. For example:
=== Document Type ===
  [ ] Word (42)
  [x] PDF  (96)
  [ ] Excel(11)
  [ ] HTML (63)

To return counts for doctype values that are currently not selected, tag 
filters that directly constrain doctype, and exclude those filters when 
faceting on doctype.

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

Filter exclusion is supported for all types of facets. Both the tag and ex 
local parameters may specify multiple values by separating them with commas.

- Andy -

-Original Message-
From: Robert Brown [mailto:r...@intelcompute.com]
Sent: Thursday, June 02, 2016 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Faceting Question(s)

MaryJo, I think you've misunderstood.  The counts are different simply because 
the 2nd query contains a filter of a facet value from the 1st query - that's 
completely expected.

The issue is how to get the original facet counts (with no filters but same q) 
in the same call as also filtering by one of those facet values.

Personally I don't think it's possible, but will be interested to hear others 
input, since it's a very common situation for me - I cache the first result in 
memcached and tag future queries as related to the first.

Or could you always make 2 calls back to Solr (one original (again), and one 
with the filters), the caches should help massively.



On 02/06/16 19:07, MaryJo Sminkey wrote:
> And you're saying the count for the second query is different than 
> what was returned in the facet? You may need to check for any defaults 
> you have set up in the solrconfig for the select parser, if for 
> instance you have any grouping going on, but aren't doing grouping in 
> your facet, that could result in the counts being off.
>
> MJ
>
>
>
>
> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz < 
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Absolutely,
>>
>> Here is what it looks like:
>>
>> This brings the right counts as it should http:// 
>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&fa
>> cet.field=team
>>
>> Then when I specify which team
>> http://
>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&fa
>> cet.field=team&fq=team:rollback
>>
>> The counts are obviously different now, as the result set is limited 
>> to one team.
>>
>> Sas
>>
>> -Original Message-
>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>> Sent: Thursday, June 2, 2016 1:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: [E] Re: Faceting Question(s)
>>
>> Jamal - what is your q= set to? And do you have a fq for the original 
>> query? I have found that if you do a wildcard search (*:*) you have 
>> to be careful about other parameters you set as that can often result 
>> in the numbers returned being off. In my case, my defaults had things 
>> like edismax settings for phrase boosting, etc. that don't apply if 
>> there isn't a search term, and once I removed those for a wildcard 
>> search I got the correct numbers. So possibly your facet query itself 
>> may be set up correctly but something else in the parameters and/or 
>> filters with the two queries may be the cause of the difference.
>>
>> Mary Jo
>>
>>
>> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz < 
>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I am working on implementing some basic faceting into my project.
>>>
>>> I have it working the way I want to, but I feel like there is 
>>> probably a better way the way I went about it.
>>>
>>> * I want to show a category and its count.
 >>> * when someone clicks a category, it sets a FQ= to that category.

Zookeeper hanging after a commit

2016-06-02 Thread Jordan Drake
Hi all,

We are in the process of streamlining our indexing and trying to
increase performance. We came across an issue where ZooKeeper seems to
hang for 10+ minutes (we've seen it as high as 40 min) after committing.
See the portion of the logs below.

Our indexing is being done using the MapReduceIndexerTool with the go-live
option to merge into our live Solr.
The creation of the segments in mapreduce is fairly quick, and the merge is
usually fast. It's just that we occasionally see this issue in one of our
environments.

I'm not sure whether this is a Zookeeper or Solr issue or if this is just
expected behavior. Any ideas on where to look for debugging?



16/06/02 09:03:06 INFO hadoop.MapReduceIndexerTool: Indexing 1 files
using 1 real mappers into 1 reducers
16/06/02 09:04:08 INFO hadoop.MapReduceIndexerTool: Done. Indexing 1
files using 1 real mappers into 1 reducers took 2.06103613E10 secs
16/06/02 09:04:08 INFO hadoop.GoLive: Live merging of output shards
into Solr cluster...
16/06/02 09:04:08 INFO hadoop.GoLive: Live merge
hdfs://192.168.5.228:8020/indexed/tmp/e2e/2223/results/part-0 into
http://192.168.5.227:8983/solr
16/06/02 09:04:22 INFO hadoop.GoLive: Committing live merge...
16/06/02 09:04:22 INFO zookeeper.ZooKeeper: Initiating client
connection, connectString=192.168.5.227:9983 sessionTimeout=1
watcher=org.apache.solr.common.cloud.ConnectionManager@1deca477
16/06/02 09:04:22 INFO cloud.ConnectionManager: Waiting for client to
connect to ZooKeeper
16/06/02 09:04:22 INFO zookeeper.ClientCnxn: Opening socket connection
to server 192.168.5.227/192.168.5.227:9983. Will not attempt to
authenticate using SASL (unknown error)
16/06/02 09:04:22 INFO zookeeper.ClientCnxn: Socket connection
established to 192.168.5.227/192.168.5.227:9983, initiating session
16/06/02 09:04:22 INFO zookeeper.ClientCnxn: Session establishment
complete on server 192.168.5.227/192.168.5.227:9983, sessionid =
0x154e9ea749c028f, negotiated timeout = 1
16/06/02 09:04:22 INFO cloud.ConnectionManager: Watcher
org.apache.solr.common.cloud.ConnectionManager@1deca477
name:ZooKeeperConnection Watcher:192.168.5.227:9983 got event
WatchedEvent state:SyncConnected type:None path:null path:null
type:None
16/06/02 09:04:22 INFO cloud.ConnectionManager: Client is connected to
ZooKeeper*16/06/02 09:04:22 INFO cloud.ZkStateReader: Updating cluster
state from ZooKeeper...
16/06/02 09:18:17 INFO zookeeper.ZooKeeper: Session: 0x154e9ea749c028f closed*
16/06/02 09:18:17 INFO zookeeper.ClientCnxn: EventThread shut down
16/06/02 09:18:17 INFO hadoop.GoLive: Done committing live merge
16/06/02 09:18:17 INFO hadoop.GoLive: Live merging of index shards
into Solr cluster took 2.83196359E11 secs
16/06/02 09:18:17 INFO hadoop.GoLive: Live merging completed successfully
16/06/02 09:18:17 INFO hadoop.MapReduceIndexerTool: Succeeded with
job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper,
jobId: job_1464681461364_0604
16/06/02 09:18:17 INFO hadoop.MapReduceIndexerTool: Success. Done.
Program took 3.04902275E11 secs. Goodbye.



Thanks,
Jordan Drake


Re: MongoDB and Solr - Massive re-indexing

2016-06-02 Thread Shawn Heisey
On 6/2/2016 11:56 AM, Robert Brown wrote:
> My question is whether sending batches of 1,000 documents to Solr is
> still beneficial (thinking about docs that may not change), or if I
> should look at the MongoDB connector for Solr, based on the volume of
> incoming data we see.
>
> Would the connector still see all docs updating if I re-insert them
> blindly, and thus still send all 50m documents back to Solr everyday
> anyway?
>
> Is my setup quite typical for the MongoDB connector?

Sending update requests to Solr containing batches of 1000 docs is a
good idea.  Depending on how large they are, you may be able to send
even more than 1000.  If you can avoid sending documents that haven't
changed, Solr will likely perform better and relevance scoring will be
better, because you won't have as many deleted docs.
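
For example, a minimal sketch of a batched JSON update over HTTP (the
collection name, fields, and commitWithin interval here are placeholders,
not from this thread):

curl 'http://localhost:8983/solr/mycollection/update?commitWithin=60000' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1","title":"first doc"},
       {"id":"2","title":"second doc"}]'

The JSON array can carry as many documents as you choose to batch together.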

The mongo connector is not software from the Solr project, or even from
Apache.  We don't know anything about it.  If you have questions about
that software, please contact the people who maintain it.  If their
answers lead to questions about Solr itself, then you can bring those
back here.

Thanks,
Shawn



Question(s) about Highlighting

2016-06-02 Thread Jamal, Sarfaraz
I am having some difficulty understanding how to do something and if it is even 
possible

I have tried the following sets of Synonyms:

1.  sarfaraz, sas, sasjamal
2.  sasjamal,sas => Sarfaraz

In the second instance, any searches with the word 'sasjamal' do not appear in 
the results, as it has been converted to Sarfaraz (I believe).
In the first instance it works better - I believe all instances of any of those 
words appear in the results. However, the highlighted snippets stop 
working when any of those words is matched. Is there any documentation, 
insight or help about this issue?

Thanks in advance,

Sas





RE: Alternate Port Not Working for Solr 6.0.0

2016-06-02 Thread Teague James
  Hi Shawn!

Thanks for that suggestion, but I had found that file and I had changed it to 
80, but still no luck. Solr isn't running because it never started in the first 
place. I also tried the -p 80 flag using the install script and it failed. 

Tried: ./install_solr_service.sh solr-6.0.0.tgz –p 80
Result: ERROR: Unrecognized or misplaced argument -p!

Tried: ./install_solr_service.sh solr-6.0.0.tgz -i /opt -d /var/solr -u solr -s 
solr -p 80
Result: 
  id: solr: no such user
  Creating new user: solr
  Adding system user 'solr' (UID 116) ...
  Adding new group 'solr' (GID 125) ...
  Adding new user 'solr' (UID 116) with group 'solr' ...
  Creating home directory '/var/solr' ...

  Extracting solr-6.0.0.tgz to /opt

  Installing symlink /opt/solr -> /opt/solr-6.0.0 ...

  Installing /etc/init.d/solr script ...

  Installing /etc/default/solr.in.sh ...

Adding system startup for /etc/init.d/solr ...
  /etc/rc0.d/K20solr -> ../init.d/solr
  /etc/rc1.d/K20solr -> ../init.d/solr
  /etc/rc6.d/K20solr -> ../init.d/solr
  /etc/rc2.d/K20solr -> ../init.d/solr
  /etc/rc3.d/K20solr -> ../init.d/solr
  /etc/rc4.d/K20solr -> ../init.d/solr
  /etc/rc5.d/K20solr -> ../init.d/solr
Waiting up to 30 seconds to see Solr running on port 80 [\]  Still not seeing 
Solr listening on 80 after 30 seconds!
(This is followed by several lines of INFO from the log. The only WARNs I got 
are below)
970  WARN  (main) [   ] o.e.j.s.SecurityHandler 
ServletContext@o.e.j.w.WebAppContext@2286778{/solr,file:///opt/solr-6.0.0/server/solr-webapp/webapp/,STARTING}{/opt/solr-6.0.0/server/solr-webapp/webapp}
 has uncovered http methods for path: /
1102 WARN  (main) [   ] o.a.s.c.CoreContainer Couldn't add files from 
/var/solr/data/lib to classpath: /var/solr/data/lib

In the "solr-800-console.log" I found the same log entries, plus the following:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:214)
at org.eclipse.jetty.start.Main.start(Main.java:457)
at org.eclipse.jetty.start.Main.main(Main.java:75)
Caused by: java.net.SocketException: Permission denied
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:326)
at 
org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at 
org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:244)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:384)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1510)
at java.security.AccessController.doPrivileged(Native Method)
at 
org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1435)
... 7 more

IF I execute the exact same command, but with port 8983 instead of 80, no 
issues! 

Tried: ./install_solr_service.sh solr-6.0.0.tgz -i /opt -d /var/solr -u solr -s 
solr -p 8983
Result: Solr server on port 8983 (pid=10503). Happy searching!

This led me to question if this was just an issue with port 80...

IF I do as you suggest, and using the successful port 8983 installation, change 
the port assignment in /etc/default/solr.in.sh to 80 while Solr is stopped, 
then start Solr, I get the exact same log dump to the screen and failure to 
start Solr.

IF I change the port assignment back to 8983, no issues - happy searching!

IF I change the port assignment to 81, same screen dump/failure to load as with 
port 80.

IF I change the port assignment to 800, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 8000, no issues - happy searching!

IF I change the port assignment to 999, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 1000, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 7000, no issues - happy searching!

IF I change the port assignment to 4000, no issues - happy searching!

IF I change the port assignment to 2000, no issues - happy searching!

IF I change the port assignment to 1500, no issues - happy searching!

IF I change the port assignment to 1001, same screen dump/failure to load as 
with port 80.

Re: Alternate Port Not Working for Solr 6.0.0

2016-06-02 Thread Shawn Heisey
On 6/2/2016 12:51 PM, Teague James wrote:
> Thanks for that suggestion, but I had found that file and I had
> changed it to 80, but still no luck. Solr isn't running because it
> never started in the first place. I also tried the -p 80 flag using
> the install script and it failed.

Something I just thought of, but should have remembered earlier:  In
order to bind to port 80, you must run as root.  Binding to any port
below 1024 requires privilege.  It looks like you installed Solr to run
as the user named "solr" -- so it cannot do what it is being asked to do.

It might be possible to fiddle with selinux and achieve this without
running as root, but I have no idea how that is done.  You can also
install a proxy in front of Solr that runs on port 80, and accesses Solr
via some other port.
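
On Linux it may also be possible to grant the Java binary the capability to
bind low ports without running as root; a sketch, assuming an OpenJDK path
(adjust for your JVM, and note this affects every process started from that
binary):

sudo setcap 'cap_net_bind_service=+ep' /usr/lib/jvm/java-8-openjdk-amd64/bin/java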

This is one of the reasons that Solr runs on a high port number by default.

Thanks,
Shawn



Re: Alternate Port Not Working for Solr 6.0.0

2016-06-02 Thread Robert Brown
In addition to a separate proxy you could use iptables, I use this 
technique for another app (running on port 5000 but requests come in 
port 80)...



*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]

-A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 5000

COMMIT
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
COMMIT





On 02/06/16 20:48, Shawn Heisey wrote:

On 6/2/2016 12:51 PM, Teague James wrote:

Thanks for that suggestion, but I had found that file and I had
changed it to 80, but still no luck. Solr isn't running because it
never started in the first place. I also tried the -p 80 flag using
the install script and it failed.

Something I just thought of, but should have remembered earlier:  In
order to bind to port 80, you must run as root.  Binding to any port
below 1024 requires privilege.  It looks like you installed Solr to run
as the user named "solr" -- so it cannot do what it is being asked to do.

It might be possible to fiddle with selinux and achieve this without
running as root, but I have no idea how that is done.  You can also
install a proxy in front of Solr that runs on port 80, and accesses Solr
via some other port.

This is one of the reasons that Solr runs on a high port number by default.

Thanks,
Shawn





Re: Question(s) about Highlighting

2016-06-02 Thread Alessandro Benedetti
Hi Jamal,
I assume you are using the Synonym token filter.
From the observation I can assume you are using it only at indexing time.
This means that when you index:

1) given a comma-separated row in synonym.txt, you index all the terms of
that row in place of any one of them;

2) given any of the terms on the left side of a "=>" expression, you index
the term on the right side of the expression.

You can verify this easily with the analysis tool in the Solr UI.



On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> I am having some difficulty understanding how to do something and if it is
> even possible
>
> I have tried the following sets of Synonyms:
>
> 1.  sarfaraz, sas, sasjamal
> 2.  sasjamal,sas => Sarfaraz
>
> In the second instance, any searches with the world 'sasjamal' do not
> appear in the results, as it has been converted to Sarfaraz (I believe) -
>

This means you don't use the same synonym.txt at query time; indeed,
'sasjamal' is not in the index at all.
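
A sketch of what that means for the second rule, assuming
"sasjamal,sas => Sarfaraz" is applied only in the index-time analyzer:

index time:  sasjamal  ->  sarfaraz   (only "sarfaraz" is written to the index)
query time:  sasjamal  ->  sasjamal   (no rewrite, so the query matches nothing)

Applying the same rule in the query-time analyzer as well would rewrite the
query term too, and the two would match again.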


> In the first instance it works better - I believe all instances of any of
> those words  appear in the results. However the highlighted snippets also
> stop working when any of those words are
> Matched. Is there any documentation, insights or help about this issue?
>

I should verify that; it could be related to the term offsets.
Please take a look at the analysis tool as well to understand better how
the offsets are assigned.
I remember a long time ago there was a discussion about it and a bug (or
similar) was raised.

Cheers

>
> Thanks in advance,
>
> Sas


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Help: Lucidwork Fusion documentation

2016-06-02 Thread Aman Tandon
Hi,

How could I download the Fusion documentation pdf ? If anyone is aware,
please help me!!

With Regards
Aman Tandon


RE: Help: Lucidwork Fusion documentation

2016-06-02 Thread Davis, Daniel (NIH/NLM) [C]
Is the Solr Reference Guide what you are looking for?

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.0.pdf

I don't know how to find older versions.


From: Aman Tandon [amantandon...@gmail.com]
Sent: Thursday, June 02, 2016 7:10 PM
To: solr-user@lucene.apache.org
Subject: Help: Lucidwork Fusion documentation

Hi,

How could I download the Fusion documentation pdf ? If anyone is aware,
please help me!!

With Regards
Aman Tandon



Re: Help: Lucidwork Fusion documentation

2016-06-02 Thread Chris Hostetter

Lucidworks Fusion is a commercial product, not a part of the Apache 
Software Foundation - questions about using it are not really appropriate 
for this mailing list.  You should contact Lucidworks support directly...

https://lucidworks.com/company/contact/

...with that in mind, the docs for Fusion can be found here...

https://doc.lucidworks.com/index.html



: Date: Fri, 3 Jun 2016 04:40:57 +0530
: From: Aman Tandon 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: Help: Lucidwork Fusion documentation
: 
: Hi,
: 
: How could I download the Fusion documentation pdf ? If anyone is aware,
: please help me!!
: 
: With Regards
: Aman Tandon
: 

-Hoss
http://www.lucidworks.com/


Re: How can we incrementally build the solr suggestions

2016-06-02 Thread Erick Erickson
In a word "no". However, there is a _third_ option which is to explicitly
build the suggesters on whatever schedule you want by issuing
(using cURL or the like, perhaps with a cron job) where the
URL looks something like
http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&;
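
For instance, a minimal crontab sketch (host, core, and the schedule are just
example values, not a recommendation):

# rebuild the suggester every night at 2am
0 2 * * * curl -s 'http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true' > /dev/null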

Best,
Erick

On Wed, Jun 1, 2016 at 2:54 AM, Subrahmanyam MadhavaBotla
 wrote:
> Hi Team,
>
> We are using Solr suggestions based on indexed terms.
> However, we see only two options for building the Solr suggester: on commit and
> on startup.
> We understand that these will completely rebuild the suggestions every time
> they are called.
> How can we incrementally build the Solr suggestions? Is there any 
> configuration we can supply for this?
>
>
> Thanks and Regards,
> Subrahmanyam MadhavaBotla
> Senior Product Engineer | Products | Accelerite
> madhava_subrahmn...@persistent.co.in 
> | Cell: +91-9923051689 | Tel: +91-712-6691129 | IT PARK NAGPUR
> Persistent Systems Ltd. Partners in Innovation | 
> www.persistent.co.in
>
>
> DISCLAIMER
> ==
> This e-mail may contain privileged and confidential information which is the 
> property of Persistent Systems Ltd. It is intended only for the use of the 
> individual or entity to which it is addressed. If you are not the intended 
> recipient, you are not authorized to read, retain, copy, print, distribute or 
> use this message. If you have received this communication in error, please 
> notify the sender and delete all copies of this message. Persistent Systems 
> Ltd. does not accept any liability for virus infected mails.
>


Re: Configure SolrCloud for Loadbalance for .net client

2016-06-02 Thread Erick Erickson
Most people just put a hardware load balancer in front of their Solr
cluster and leave it at that. Since you're using .net, you can't
use CloudSolrClient which has a software LB built in so you'll
have to do something external.
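
If a software proxy is an option, here is a minimal HAProxy sketch (node
addresses and the listen port are placeholders) that round-robins requests
across two Solr nodes:

frontend solr_front
    bind *:8983
    mode http
    default_backend solr_nodes

backend solr_nodes
    mode http
    balance roundrobin
    server solr1 10.0.0.1:8983 check
    server solr2 10.0.0.2:8983 check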

Best,
Erick

On Wed, Jun 1, 2016 at 4:10 AM, shivendra.tiwari
 wrote:
> Hi,
>
> I have to configure SolrCloud for loadbalance on .net application please 
> suggest what we have to needed and how to configure it. We are currently 
> working on lower version of Solr with Master and Slave concept.
>
> Please suggest.
>
>
> Warm Regards!
> Shivendra Kumar Tiwari


Re: Solr off-heap FieldCache & HelioSearch

2016-06-02 Thread Erick Erickson
Basically it never reached consensus, see the discussion at:
https://issues.apache.org/jira/browse/SOLR-6638

If you can afford it I've seen people with very good results
using Zing/Azul, but that can be expensive.

DocValues can help for fields you facet and sort on,
those essentially move memory into the OS
cache.
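
In schema.xml that is a per-field flag; a sketch, with the field name and
type as placeholders:

<field name="team" type="string" indexed="true" stored="true" docValues="true"/>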

But memory is an ongoing struggle I'm afraid.

Best,
Erick

On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok  wrote:
> Hey everyone,
>
> I've been using Solr for some time now and running into GC issues as most
> others have.  Now I've exhausted all the traditional GC settings
> recommended by various individuals (ie Shawn Heisey, etc) but neither
> proved sufficient.  The one solution that I've seen that proved useful is
> Heliosearch and the off-heap implementation.
>
> My question is this, why wasn't the off-heap FieldCache implementation (
> http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever rolled into
> Solr when the other HelioSearch improvements were merged? Was there a
> fundamental design problem or just a matter of time/testing that would be
> incurred by the move?
>
> Thanks,
> Phil


Re: SolrCloud 5.2.1 nodes are out of sync - how to handle

2016-06-02 Thread Erick Erickson
A pedantic nit... leader/replica is not much like
"old master/slave".

That out of the way, here's what I'd do.
1> Use ADDREPLICA to add a new replica for the shard
   _on the same node as the bad one_.
2> Once that has recovered (green in the admin UI) and you are confident of
   its integrity (you can verify by running queries against this
   new replica and the leader with &distrib=false), use
   DELETEREPLICA on the "bad" core.
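
A sketch of the Collections API calls involved (collection, shard, and node
names are placeholders; check the admin UI for the actual core_node name
before deleting):

http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=192.168.5.228:8983_solr

http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node3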

Best,
Erick

On Wed, Jun 1, 2016 at 5:54 AM, Ilan Schwarts  wrote:
> Hi,
> We have in lab SolrCloud 5.2.1
> 2 Shards, each shard has 2 cores/nodes, replication factor is 1. meaning
> that one node is leader (like old master-slave).
> (upon collection creation numShards=1 rp=1)
>
> Now there is a problem in the lab, shard 1 has 2 cores, but the number of
> docs is different, and when adding a document to one of the cores, it will
> not replicate the data to the other one.
> If i check cluster state.json it appears fine, it writes there are 2 active
> cores and only 1 is set as leader.
>
> What is the recovery method for a scenario like this ? I dont have logs
> anymore and cannot reproduce.
> Is it possible to merge the 2 cores into 1, and then split that core to 2
> cores ?
> Or maybe to enforce sync if possible ?
>
> The other shard, Shard 2 is functioning well, the replication works fine,
> when adding a document to 1 core, it will replicate it to the other.
>
> --
>
>
> -
> Ilan Schwarts


Re: Solr off-heap FieldCache & HelioSearch

2016-06-02 Thread Phillip Peleshok
Fantastic! I'm sorry I couldn't find that JIRA before, and for making you
track it down.

Yup, I noticed that for the docvalues with the ordinal map and I'm
definitely leveraging all that but I'm hitting the terms limit now and that
ends up pushing me over.  I'll see about giving Zing/Azul a try.  From all
my readings using theUnsafe seemed a little sketchy (
http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/) so
I'm glad that seemed to be the point of contention bringing it in and not
anything else.

Thank you very much for the info,
Phil

On Thu, Jun 2, 2016 at 6:14 PM, Erick Erickson 
wrote:

> Basically it never reached consensus, see the discussion at:
> https://issues.apache.org/jira/browse/SOLR-6638
>
> If you can afford it I've seen people with very good results
> using Zing/Azul, but that can be expensive.
>
> DocValues can help for fields you facet and sort on,
> those essentially move memory into the OS
> cache.
>
> But memory is an ongoing struggle I'm afraid.
>
> Best,
> Erick
>
> On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok 
> wrote:
> > Hey everyone,
> >
> > I've been using Solr for some time now and running into GC issues as most
> > others have.  Now I've exhausted all the traditional GC settings
> > recommended by various individuals (ie Shawn Heisey, etc) but neither
> > proved sufficient.  The one solution that I've seen that proved useful is
> > Heliosearch and the off-heap implementation.
> >
> > My question is this, why wasn't the off-heap FieldCache implementation (
> > http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever rolled
> into
> > Solr when the other HelioSearch improvement were merged? Was there a
> > fundamental design problem or just a matter of time/testing that would be
> > incurred by the move?
> >
> > Thanks,
> > Phil
>


Re: After Solr 5.5, mm parameter doesn't work properly

2016-06-02 Thread Greg Pendlebury
I think the confusion stems from the legacy implementation partially
conflating q.op with mm for users, when they are very different things.
q.op tells Solr how to insert boolean operators before they are converted
into occurs flags, and then downstream, mm applies to _only_ the SHOULD
occurs flags, not MUST or NOT flags.

So if the user is setting mm=2, they are asking for a minimum of 2 of the
SHOULD clauses to be found, not 2 of ALL clauses. mm has absolutely nothing
to do with q.op other than (because of the implementation) q.op is used to
derive a default value when it is not explicitly set.

The legacy implementation has situations where it was not possible to
generate the search you wanted because of the conflation, hence why
SOLR-2649 was so popular. I fully acknowledge that there are cases where
the change is disrupting users that (for whatever reason) are/were not
necessarily aware of what the parameters they are using actually do, or
users that were very aware, but forced to rely on a non-intuitive settings
to work around the behaviour eDismax had. SOLR-8812 (although not relevant
to the OP) goes part way towards helping the former users, but the latter
will want to adjust their parameters to be explicit now instead of
leveraging a workaround.

I haven't yet seen a use case where the final solution we put in for
SOLR-2649 does not work, but I have seen lots of user parameters used that
Solr handles perfectly... just in a way that the user did not expect. I
suspect this is mainly because the topic and the implementation are fairly
technically dense (from q.op, then to boolean to occurs conversion, then
finally to mm) and difficult to explain and document accurately for an end
user.
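
As an illustration of the mechanics (terms invented, not from the OP), take
q=solr lucene search with mm=2 on 5.5+:

  q.op=OR   ->  (solr lucene search)      three SHOULD clauses; mm=2 requires any two
  q.op=AND  ->  (+solr +lucene +search)   three MUST clauses; mm has no SHOULD clauses to act on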

I am writing this in a rush, sorry; I have to go collect a child from school.

Ta,
Greg


On 2 June 2016 at 19:08, Jan Høydahl  wrote:

> [Aside] Your quote style is confusing, leaving my lines unquoted and your
> new lines quoted?? [/Aside]
>
> > So in relation to the OP's sample queries I was pointing out that 'q.op=OR
> > + mm=2' and 'q.op=AND + mm=2' are treated as identical queries by Solr 5.4,
> > but 5.5+ will manipulate the occurs flags differently before it applies mm
> > afterwards... because that is what q.op does.
>
> If a user explicitly says mm=2, then the user's intent is that he should
> neither have pure OR (no clauses required) nor pure AND (all clauses required),
> but exactly two clauses required.
>
> So I think we need to go back to a solution where q.op technically
> stays as OR for custom mm. How that would affect queries with explicit
> operators
> I don’t know...
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 2. jun. 2016 kl. 05.12 skrev Greg Pendlebury  >:
> >
> > I would describe that subtly differently, and I think it is where the
> > difference lies:
> >
> > "Then from 4.x it did not care about q.op if mm was set explicitly"
> >>> I agree. q.op was not actually used in the query, but rather as a way of
> >>> inferring the default mm value. eDismax still ignored whatever q.op was set
> >>> and built your query operators (ie. the occurs flags) using q.op=OR.
> >
> > "And from 5.5 it seems as q.op does something even if mm is set..."
> >>> Yes, although I think it is the words 'even if' drawing too strong a
> > relationship between the two parameters. q.op has a function of its own,
> > and that now functions as it 'should' (opinionated, I know) in the query
> > construction, and continues to influence the default value of mm if it has
> > not been explicitly set. SOLR-8812 further evolves that influence by trying
> > to improve backwards compatibility for users who were not explicitly
> > setting mm, and only ever changed 'q.op' despite it being a step removed
> > from the actual parameter they were trying to manipulate.
> >
> > So in relation to the OP's sample queries I was pointing out that 'q.op=OR
> > + mm=2' and 'q.op=AND + mm=2' are treated as identical queries by Solr 5.4,
> > but 5.5+ will manipulate the occurs flags differently before it applies mm
> > afterwards... because that is what q.op does.
> >
> >
> > On 2 June 2016 at 07:13, Jan Høydahl  wrote:
> >
> >> Edismax used to default to mm=100% and not care about q.op at all
> >>
> >> Then from 4.x it did not care about q.op if mm was set explicitly,
> >> but if mm was not set, then q.op=OR —> mm=0%, q.op=AND —> mm=100%
> >>
> >> And from 5.5 it seems as q.op does something even if mm is set...
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>> 1. jun. 2016 kl. 23.05 skrev Greg Pendlebury <
> greg.pendleb...@gmail.com
> >>> :
> >>>
> >>> But isn't that the default value? In this case the OP is setting mm
> >>> explicitly to 2.
> >>>
> >>> Will have to look at those code links more thoroughly at work this
> >> morning.
> >>> Apologies if I am wrong.
> >>>
> >>> Ta,
> >>> Greg
> >>>
> >>> On Wednesday, 1 June 2016, Jan Høydahl  wr

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Well thanks for asking the question because I had no idea what Andrew
posted was even possible... and I most definitely will be using that
myself! Totally brilliant stuff. I am so loving Solr... well, when it's not
driving me bonkers.

Mary Jo


On Thu, Jun 2, 2016 at 2:33 PM, Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Thank you Andrew, that looks like exactly what I am looking for =)
> Thank you Robert, it looks like we are both doing it in similar fashion =)
> Thank you MaryJo  for jumping right in!
>
> Sas

Re: [E] Re: Faceting Question(s)

2016-06-02 Thread Erick Erickson
One of the most valuable things I did when I started out
(way back in the Lucene-only days) was try to answer _one_
question every so often. Even if someone else beat me to the
punch, I benefitted from the research. And the rest of the time
I discovered things I never knew about Solr/Lucene!

I think one of the most valuable lessons was "Somebody's
probably run into this before, I wonder what _they_ did?"
;)

Best,
Erick

On Thu, Jun 2, 2016 at 9:46 PM, MaryJo Sminkey  wrote:
> Well thanks for asking the question because I had no idea what Andrew
> posted was even possible... and I most definitely will be using that
> myself! Totally brilliant stuff. I am so loving Solr... well, when it's not
> driving me bonkers.
>
> Mary Jo
>
>
> On Thu, Jun 2, 2016 at 2:33 PM, Jamal, Sarfaraz <
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Thank you Andrew, that looks like exactly what I am looking for =)
>> Thank you Robert, it looks like we are both doing it in similar fashion =)
>> Thank you MaryJo  for jumping right in!
>>
>> Sas

Re: Using solr with increasing complicated access control

2016-06-02 Thread Erick Erickson
Lisheng:

I'm not too up on the details of Lucene block join, but I don't
think it really applies to access control. You'd have to
have documents grouped by access control (i.e. every
child doc of doc X has the same access control). If you
can do that, you can put an "authorization token" in the
doc (or more than one) and just use simple fq clauses.
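
For example, if each document carries the access tokens of the principals
allowed to read it, the filter is just (field and token names made up):

fq=acl_tokens:(group_a OR group_b OR user_42)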

Here's one technique I've seen: Implement a custom
post-filter that computes access rights when each doc
comes through (see:
http://qaware.blogspot.com/2014/11/how-to-write-postfilter-for-solr-49.html
it's a little old but it'll give you an idea of how to do this).

Then, in the collect method, quit calculating these after N docs
(where N is "how many you can compute quickly enough
to satisfy your SLA") and after reaching N, fail all other docs.
Then return some indicator about "please refine your search"
so the user knows they may not have seen the best docs,
but there were a _lot_ of docs that matched.

It's not perfect, but it often suffices. You certainly don't want
to be in the situation where you have to calculate the access
privileges for every doc in the corpus or, as you indicated,
it gets really slow.

Or get the principals to have more reasonable access rules ;)

Best,
Erick

On Wed, Jun 1, 2016 at 5:07 PM, Lisheng Zhang  wrote:
> Erick, very sorry that i misspelled your name earlier! later i read more
> and found that lucene seemed to implement approach 2/ (search a few times
> and combine results), i guess when joining becomes complicated the
> performance may suffer? later i will try to study more,
>
> thanks for helps, Lisheng
>
> On Wed, Jun 1, 2016 at 12:34 PM, Lisheng Zhang  wrote:
>
>> Eric: thanks very much for your quick response (somehow msg was sent to
>> spam initially, sorry about that)
>>
>> yes the rules has to be complicated beyond my control, we also tried to
>> filter after search, but after data amount grows, it becomes slow ..
>>
>> Right now Lucene has features like document blocks or joins to simulate
>> relational database behavior. Did Lucene implement join by:
>>
>> 1/ internally flatten out documents to generate one new document
>> 2/ or search more than once, then merge results
>> 3/ or better way i could not see?
>>
>> For now i only need a high level understanding, thanks for your helps,
>> Lisheng
>>
>>
>> On Mon, May 23, 2016 at 6:23 PM, Erick Erickson 
>> wrote:
>>
>>> I know this seems facetious, but Talk to your
>>> clients about _why_ they want such increasingly
>>> complex access requirements. Often the logic
>>> is pretty flawed for the complexity. Things like
>>> "allow user X to see document Y if they're part of
>>> groups A, B, C but not D or E unless they are
>>> also part of sub-group F and it's raining outside"...
>>>
>>> If the rules _must_ be complicated, that's what
>>> post-filters were actually invented for. Pretty often
>>> I'll build in some "bailout" because whatever you
>>> build has, eventually, to deal with the system
>>> admin searching all documents, i.e. doing the
>>> ACL calcs for every document.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, May 23, 2016 at 6:02 PM, Lisheng Zhang 
>>> wrote:
>>> > Hi, i have been using solr for many years and it is VERY helpful.
>>> >
>>> > My problem is that our app has an increasingly more complicated access
>>> > control to satisfy client's requirement, in solr/lucene  it means we
>>> need
>>> > to add more and more fields into each document and use more and more
>>> > complicated filter conditions, so code is hard to maintain and indexing
>>> > becomes a serious issue because we want to search as real time as
>>> possible.
>>> >
>>> > I would appreciate a high level guidance on how to deal with this issue?
>>> > recently i investigated mySQL fulltext search (our app uses mySQL),
>>> using
>>> > mySQL means we simply reuse DB for access control, but mySQL fulltext
>>> > search performance is far from ideal compared to solr.
>>> >
>>> > Thanks very much for helps, Lisheng
>>>
>>
>>


Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
Yeah even though I'm still fairly new to this, I'm generally a good problem
solver or I'd never have gotten as far as I have already on my own (really
wanted to hire a Solr consultant and pushed VERY hard for it, but my boss
really likes us to figure things out on our own!) Just wish I'd found this
list long before now, I have a feeling it would have saved me some very
long nights and weekends trying to work out some of the more baffling
issues. That's why I jumped in and why I misinterpreted the question...
because the way I read it was the thing that literally drove me crazy for
two days straight trying to figure out. ;-)  But I'm very excited to find
out the real question and answer as that is something that definitely
applies to us as well and will certainly speed up our searches to drop the
extra server call.

MJ



On Fri, Jun 3, 2016 at 12:59 AM, Erick Erickson 
wrote:

> One of the most valuable things I did when I started out
> (way back in the Lucene-only days) was try to answer _one_
> question every so often. Even if someone else beat me to the
> punch, I benefitted from the research. And the rest of the time
> I discovered things I never knew about Solr/Lucene!
>
> I think one of the most valuable lessons was "Somebody's
> probably run into this before, I wonder what _they_ did?"
> ;)
>
> Best,
> Erick

Re: Help: Lucidwork Fusion documentation

2016-06-02 Thread Aman Tandon
I am looking for the Lucidworks documentation.

OK Chris, I will contact Lucidworks then.
Thank you.

On Friday, June 3, 2016, Chris Hostetter  wrote:
>
> Lucidworks Fusion is a commercial product, not a part of the Apache
> Software Foundation - questions about using it are not really appropriate
> for this mailing list.  You should contact Lucidworks support directly...
>
> https://lucidworks.com/company/contact/
>
> ...with that in mind, the docs for Fusion can be found here...
>
> https://doc.lucidworks.com/index.html
>
>
>
> : Date: Fri, 3 Jun 2016 04:40:57 +0530
> : From: Aman Tandon 
> : Reply-To: solr-user@lucene.apache.org
> : To: "solr-user@lucene.apache.org" 
> : Subject: Help: Lucidwork Fusion documentation
> :
> : Hi,
> :
> : How could I download the Fusion documentation pdf ? If anyone is aware,
> : please help me!!
> :
> : With Regards
> : Aman Tandon
> :
>
> -Hoss
> http://www.lucidworks.com/
>

-- 
Sent from Gmail Mobile


Re: [E] Re: Faceting Question(s)

2016-06-02 Thread Erick Erickson
We can always use more documentation. One of the
valuable things about people getting started is that it's an
opportunity to clarify documents. Sometimes the people who
develop/write the docs jump into the middle and assume
the reader has knowledge they couldn't be expected to have.

Hint, hint.

Best,
Erick

On Thu, Jun 2, 2016 at 10:09 PM, MaryJo Sminkey  wrote:
> Yeah even though I'm still fairly new to this, I'm generally a good problem
> solver or I'd never have gotten as far as I have already on my own (really
> wanted to hire a Solr consultant and pushed VERY hard for it, but my boss
> really likes us to figure things out on our own!) Just wish I'd found this
> list long before now, I have a feeling it would have saved me some very
> long nights and weekends trying to work out some of the more baffling
> issues. That's why I jumped in and why I misinterpreted the question...
> because the way I read it was the thing that literally drove me crazy for
> two days straight trying to figure out. ;-)  But I'm very excited to find
> out the real question and answer as that is something that definitely
> applies to us as well and will certainly speed up our searches to drop the
> extra server call.
>
> MJ
>
>
>
> On Fri, Jun 3, 2016 at 12:59 AM, Erick Erickson 
> wrote:
>
>> One of the most valuable things I did when I started out
>> (way back in the Lucene-only days) was try to answer _one_
>> question every so often. Even if someone else beat me to the
>> punch, I benefitted from the research. And the rest of the time
>> I discovered things I never knew about Solr/Lucene!
>>
>> I think one of the most valuable lessons was "Somebody's
>> probably run into this before, I wonder what _they_ did?"
>> ;)
>>
>> Best,
>> Erick
>>
>> On Thu, Jun 2, 2016 at 9:46 PM, MaryJo Sminkey 
>> wrote:
>> > Well thanks for asking the question because I had no idea what Andrew
>> > posted was even possible... and I most definitely will be using that
>> > myself! Totally brilliant stuff. I am so loving Solr... well, when it's
>> not
>> > driving me bonkers.
>> >
>> > Mary Jo
>> >
>> >
>> > On Thu, Jun 2, 2016 at 2:33 PM, Jamal, Sarfaraz <
>> > sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>> >
>> >> Thank you Andrew, that looks like exactly what I am looking for =)
>> >> Thank you Robert, it looks like we are both doing it in similar fashion
>> =)
>> >> Thank you MaryJo  for jumping right in!
>> >>
>> >> Sas
>> >>
>> >>
>> >>
>> >> -Original Message-
>> >> From: Andrew Chillrud [mailto:achill...@opentext.com]
>> >> Sent: Thursday, June 2, 2016 2:17 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: RE: [E] Re: Faceting Question(s)
>> >>
>> >> It is possible to get the original facet counts for the field you are
>> >> filtering on (we have been using this since Solr 3.6). Don't know if
>> this
>> >> can be extended to get the original counts for all fields however.
>> >>
>> >> This syntax is described here:
>> >> https://cwiki.apache.org/confluence/display/solr/Faceting
>> >>
>> >> Tagging and Excluding Filters
>> >>
>> >> You can tag specific filters and exclude those filters when faceting.
>> This
>> >> is useful when doing multi-select faceting.
>> >>
>> >> Consider the following example query with faceting:
>> >>
>> >>
>> >> q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype
>> >>
>> >> Because everything is already constrained by the filter doctype:pdf, the
>> >> facet.field=doctype facet command is currently redundant and will return 0
>> >> counts for everything except doctype:pdf.
>> >>
>> >> To implement a multi-select facet for doctype, a GUI may want to still
>> >> display the other doctype values and their associated counts, as if the
>> >> doctype:pdf constraint had not yet been applied. For example:
>> >> === Document Type ===
>> >>   [ ] Word (42)
>> >>   [x] PDF  (96)
>> >>   [ ] Excel(11)
>> >>   [ ] HTML (63)
>> >>
>> >> To return counts for doctype values that are currently not selected, tag
>> >> filters that directly constrain doctype, and exclude those filters when
>> >> faceting on doctype.
>> >>
>> >>
>> >>
>> >> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype
>> >>
>> >> Filter exclusion is supported for all types of facets. Both the tag and ex
>> >> local parameters may specify multiple values by separating them with commas.
>> >>
>> >> - Andy -
>> >>
>> >> -Original Message-
>> >> From: Robert Brown [mailto:r...@intelcompute.com]
>> >> Sent: Thursday, June 02, 2016 2:12 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: [E] Re: Faceting Question(s)
>> >>
>> >> MaryJo, I think you've misunderstood. The counts are different simply
>> >> because the 2nd query contains a filter on a facet value from the 1st
>> >> query - that's completely expected.
>> >>
>> >> The issue is how to get the original facet counts (with no filters but the
>> >> same q) in the same call as also filtering by one of those values.
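
To get that in a single call, the tag/exclude trick generalizes: tag every fq
and exclude all of the tags on each facet.field. A sketch, reusing the field
names from Andy's excerpt above:

q=mainquery&fq={!tag=st}status:public&fq={!tag=dt}doctype:pdf
  &facet=true&facet.field={!ex=st,dt}status&facet.field={!ex=st,dt}doctype

Each facet then reports counts as if no filters had been applied, while the
document list itself stays filtered. To un-filter only the field currently
being faceted (classic multi-select behavior), exclude just that field's own
tag instead.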

Re: Configure SolrCloud for Loadbalance for .net client

2016-06-02 Thread shivendra.tiwari

Thanks Erick,

Actually, I am using the SolrNet fork for SolrCloud from
https://github.com/vladen/SolrNet, but I am unable to communicate with
ZooKeeper. Do you have any idea whether it is stable for SolrCloud? I am
using SolrNet with a simple master/slave setup and it works fine, but for
cloud mode I can't work out what I should use.


Thank you very much!

Warm Regards!
Shivendra Kumar Tiwari

-Original Message- 
From: Erick Erickson

Sent: Friday, June 03, 2016 6:40 AM
To: solr-user
Subject: Re: Configure SolrCloud for Loadbalance for .net client

Most people just put a hardware load balancer in front of their Solr
cluster and leave it at that. Since you're using .net, you can't
use CloudSolrClient, which has a software LB built in, so you'll
have to do something external.

Best,
Erick

On Wed, Jun 1, 2016 at 4:10 AM, shivendra.tiwari
 wrote:

Hi,

I have to configure SolrCloud for load balancing with a .net application;
please suggest what is needed and how to configure it. We are currently
working on a lower version of Solr with the master/slave concept.


Please suggest.


Warm Regards!
Shivendra Kumar Tiwari 
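
As background on why an external LB suffices: in SolrCloud, any node can
accept a request for a collection and route it internally to the right
shards, so the load balancer only needs to round-robin plain HTTP across the
nodes. A minimal sketch (hostnames and collection name are made up):

http://solr-node1:8983/solr/mycollection/select?q=*:*&wt=json
http://solr-node2:8983/solr/mycollection/select?q=*:*&wt=json

For health checks, the LB can probe each node's /solr/mycollection/admin/ping
handler, assuming the default ping handler is configured in solrconfig.xml.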





Re: SolrCloud 5.2.1 nodes are out of sync - how to handle

2016-06-02 Thread Ilan Schwarts
In my question I confused you: there are 2 shards and 2 nodes on each
shard, one leader and one not. When the collection was created, the number
of shards was 2 and the replication factor was 2.
Now the status is that shard 1 has 2 out-of-sync nodes, so they need to be
merged/synced. Do you still suggest the same approach: add a replica to the
damaged shard and then delete the bad one? Is that possible if the collection
was created with composite routing?
On Jun 3, 2016 4:18 AM, "Erick Erickson"  wrote:

> A pedantic nit... leader/replica is not much like
> "old master/slave".
>
> That out of the way, here's what I'd do.
> 1> Use ADDREPLICA to add a new replica for the shard
> _on the same node as the bad one_.
> 2> Once that has recovered (green in the admin UI) and you
> are confident of its integrity (you can verify by running
> queries against this new replica and the leader with
> &distrib=false), use DELETEREPLICA on the "bad" core.
>
> Best,
> Erick
>
> On Wed, Jun 1, 2016 at 5:54 AM, Ilan Schwarts  wrote:
> > Hi,
> > We have in lab SolrCloud 5.2.1
> > 2 shards, each shard has 2 cores/nodes, replication factor is 1, meaning
> > that one node is the leader (like the old master/slave).
> > (upon collection creation numShards=1 rp=1)
> >
> > Now there is a problem in the lab: shard 1 has 2 cores, but the number of
> > docs is different, and when adding a document to one of the cores, it will
> > not replicate the data to the other one.
> > If I check the cluster state.json it appears fine; it says there are 2
> > active cores and only 1 is set as leader.
> >
> > What is the recovery method for a scenario like this? I don't have logs
> > anymore and cannot reproduce.
> > Is it possible to merge the 2 cores into 1, and then split that core into
> > 2 cores?
> > Or maybe to force a sync, if possible?
> >
> > The other shard, Shard 2 is functioning well, the replication works fine,
> > when adding a document to 1 core, it will replicate it to the other.
> >
> > --
> >
> >
> > -
> > Ilan Schwarts
>
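
For reference, Erick's procedure maps onto the Collections API roughly as
follows (collection, shard, and replica names here are hypothetical; the real
core_nodeN name of the bad replica can be read from CLUSTERSTATUS or the
admin UI, and the optional node parameter pins the new replica to a specific
node, per step 1):

# 1. add a fresh replica to the damaged shard
http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1

# 2. once it shows active, spot-check it against the leader
http://host:8983/solr/mycollection_shard1_replica3/select?q=*:*&rows=0&distrib=false

# 3. then remove the out-of-sync core
http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node2

Composite-ID routing shouldn't get in the way: ADDREPLICA copies the shard's
whole index during recovery, regardless of how documents were routed to it.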


Re: [E] Re: Faceting Question(s)

2016-06-02 Thread MaryJo Sminkey
On Fri, Jun 3, 2016 at 1:25 AM, Erick Erickson 
wrote:

> We can always use more documentation. One of the
> valuable things about people getting started is that it's an
> opportunity to clarify documents. Sometimes the people who
> develop/write the docs jump into the middle and assume
> the reader has knowledge they couldn't be expected to have
>
> Hint, hint.
>

Well, I'm not sure how best to document this, but it basically related to using
some of the edismax parameters: boosts on search fields, phrase
matching and phrase boosts, etc., all of which are intended to work on
actual search terms. I had added these to my config, but in some of my
searches the filters do all the work and I just have a wildcard search
(q=*:*). It seems that if you have entries for these edismax settings and
do this kind of search, you can get some odd results (I didn't really
track down exactly which setting was the culprit, but it was definitely
related to the edismax-specific params). In my case, some of the docs that
should have shown up based on the filters were going missing. Once I figured
this out, moved the edismax params out of the defaults, and only turned them
on when I had actual search terms, I got the results I expected. But it
took quite a lot of time to track them down as the cause, due to the
complexity of the code I am working with.

So I guess what would be useful would be a caution on the edismax page
about having these parameters set when wildcard searches are possible
(not sure whether this applies to the dismax parser as well, but it
probably does).

Mary Jo
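
A sketch of the workaround described above (handler defaults, field names,
and boosts are made up): keep the edismax parameters out of the request
handler's defaults and have the client add them only when there is a real
query string.

# filter-only request: plain match-all query, no edismax params
q=*:*&fq=category:widgets&fq=in_stock:true

# term search: edismax boosts supplied per request
q=blue+widget&defType=edismax&qf=name^3+description&pf=name^5&fq=in_stock:true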






Re: Configure SolrCloud for Loadbalance for .net client

2016-06-02 Thread Mikhail Khludnev
Hello,

How does it work now? Do you have a list of slaves configured in the client
app? By the way, what do you use to call Solr from .net?
On Jun 1, 2016 2:08 PM, "shivendra.tiwari" <
shivendra.tiw...@arcscorp.net> wrote:

> Hi,
>
> I have to configure SolrCloud for load balancing with a .net application;
> please suggest what is needed and how to configure it. We are currently
> working on a lower version of Solr with the master/slave concept.
>
> Please suggest.
>
>
> Warm Regards!
> Shivendra Kumar Tiwari