Re: Accessing Solr collections at different ports - Need help

2019-05-03 Thread Jörn Franke
This is just the setup for an experimental cluster (and generally it does not
make sense to run many instances on the same server). Once you have gained more
experience, take a look at
https://lucene.apache.org/solr/guide/7_7/taking-solr-to-production.html
to see how to set up clusters for production.

> Am 03.05.2019 um 08:52 schrieb Salmaan Rashid Syed 
> :
> 
> Thanks Jorn for your reply.
> 
> I say that the nodes are limited to 4 because when I launch Solr in cloud
> mode, the first prompt I get is to choose the number of nodes [1-4]. When
> I tried to enter 7, it said that is more than 4 and asked me to choose a
> smaller number.
> 
> 
> *Thanks and Regards,*
> Salmaan Rashid Syed
> +91 8978353445 | www.panna.ai |
> 5550 Granite Pkwy, Suite #225, Plano TX-75024.
> Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.
> 
> 
> 
>> On Fri, May 3, 2019 at 12:05 PM Jörn Franke  wrote:
>> 
>> BTW why do you think that SolrCloud is limited to 4 nodes? More are for
>> sure possible.
>> 
>>> Am 03.05.2019 um 07:54 schrieb Salmaan Rashid Syed <
>> salmaan.ras...@mroads.com>:
>>> 
>>> Hi Solr Users,
>>> 
>>> I am using Solr 7.6 in cloud mode with external zookeeper installed at
>>> ports 2181, 2182, 2183. Currently we have only one server allocated for
>>> Solr. We are planning to move to multiple servers for better sharding,
>>> replication etc. in the near future.
>>> 
>>> Now the issue is that our organisation has data indexed for different
>>> clients as separate collections. We want to uniquely access, update and
>>> index each collection separately so that each individual client has
>> access
>>> to their respective collections at their respective ports. Eg:—
>> Collection1
>>> at port 8983, Collection2 at port 8984, Collection3 at port 8985 etc.
>>> 
>>> I have two options I guess, one is to run Solr in cloud mode with 4 nodes
>>> (max as limited by Solr) at 4 different ports. I don’t know how to go
>>> beyond 4 nodes/ports in this case.
>>> 
>>> The other option is to run Solr as service and create multiple copies of
>>> Solr folder within the Server folder and access each Solr at different
>> port
>>> with its own collection as shown by
>>> https://www.youtube.com/watch?v=wmQFwK2sujE
>>> 
>>> I am really confused as to which is the better path to choose. Please
>> help
>>> me out.
>>> 
>>> Thanks.
>>> 
>>> Regards,
>>> Salmaan
>>> 
>>> 
>>> *Thanks and Regards,*
>>> Salmaan Rashid Syed
>>> +91 8978353445 | www.panna.ai |
>>> 5550 Granite Pkwy, Suite #225, Plano TX-75024.
>>> Cyber Gateways, Hi-tech City, Hyderabad, Telangana, India.
>> 


Re: problem indexing GPS metadata for video upload

2019-05-03 Thread Where is Where
Thank you very much Tim. I wonder how to make the Tika change apply to
Solr? I saw the Tika core, parsers and xml jar files (tika-core.jar,
tika-parsers.jar, tika-xml.jar) in the Solr contrib/extraction/lib folder. Do we
just replace these files? Thanks!

On Thu, May 2, 2019 at 12:16 PM Where is Where  wrote:

> Thank you Alex and Tim.
> I have looked at the solrconfig.xml file (I am trying the techproducts
> demo config); the only related place I can find is the extract handler:
>
>   <requestHandler name="/update/extract"
>                   startup="lazy"
>                   class="solr.extraction.ExtractingRequestHandler" >
>     <lst name="defaults">
>       <str name="lowernames">true</str>
>
>       <!-- capture link hrefs but ignore div attributes -->
>       <str name="captureAttr">true</str>
>       <str name="fmap.a">links</str>
>       <str name="fmap.div">ignored_</str>
>     </lst>
>   </requestHandler>
>
> I am using this command bin/post -c techproducts
> example/exampledocs/1.mp4 -params "literal.id=mp4_1&uprefix=attr_"
>
> I have tried commenting out the ignored_ mapping and changing it to div,
> but it is still not working. I don't quite get why images get GPS and other
> metadata but video acts differently, while it goes through the same
> solrconfig and the GPS metadata is in the same fields. There is no
> differentiation in the solrconfig settings between image and video.
>
> Tim yes this is related to the TIKA link. Thank you!
>
> Here is the output in solr for mp4.
>
> {
> "attr_meta":["stream_size",
>   "5721559",
>   "date",
>   "2019-03-29T04:36:39Z",
>   "X-Parsed-By",
>   "org.apache.tika.parser.DefaultParser",
>   "X-Parsed-By",
>   "org.apache.tika.parser.mp4.MP4Parser",
>   "stream_content_type",
>   "application/octet-stream",
>   "meta:creation-date",
>   "2019-03-29T04:36:39Z",
>   "Creation-Date",
>   "2019-03-29T04:36:39Z",
>   "tiff:ImageLength",
>   "1080",
>   "resourceName",
>   "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
>   "dcterms:created",
>   "2019-03-29T04:36:39Z",
>   "dcterms:modified",
>   "2019-03-29T04:36:39Z",
>   "Last-Modified",
>   "2019-03-29T04:36:39Z",
>   "Last-Save-Date",
>   "2019-03-29T04:36:39Z",
>   "xmpDM:audioSampleRate",
>   "1000",
>   "meta:save-date",
>   "2019-03-29T04:36:39Z",
>   "modified",
>   "2019-03-29T04:36:39Z",
>   "tiff:ImageWidth",
>   "1920",
>   "xmpDM:duration",
>   "2.64",
>   "Content-Type",
>   "video/mp4"],
> "id":"mp4_4",
> "attr_stream_size":["5721559"],
> "attr_date":["2019-03-29T04:36:39Z"],
> "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
>   "org.apache.tika.parser.mp4.MP4Parser"],
> "attr_stream_content_type":["application/octet-stream"],
> "attr_meta_creation_date":["2019-03-29T04:36:39Z"],
> "attr_creation_date":["2019-03-29T04:36:39Z"],
> "attr_tiff_imagelength":["1080"],
> 
> "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
> "attr_dcterms_created":["2019-03-29T04:36:39Z"],
> "attr_dcterms_modified":["2019-03-29T04:36:39Z"],
> "last_modified":"2019-03-29T04:36:39Z",
> "attr_last_save_date":["2019-03-29T04:36:39Z"],
> "attr_xmpdm_audiosamplerate":["1000"],
> "attr_meta_save_date":["2019-03-29T04:36:39Z"],
> "attr_modified":["2019-03-29T04:36:39Z"],
> "attr_tiff_imagewidth":["1920"],
> "attr_xmpdm_duration":["2.64"],
> "content_type":["video/mp4"],
> "content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  
> \n  \n  \n  \n  \n  \n  \n  \n \n   "],
> "_version_":1632383499325407232}]
>   }}
>
> JPEG is getting these:
> "attr_meta":[
> "GPS Latitude",
>   "37° 47' 41.99\"",
> 
> "attr_gps_latitude":["37° 47' 41.99\""],
>
>
> On Wed, May 1, 2019 at 2:57 PM Where is Where  wrote:
>
>> I am uploading video to Solr via Tika:
>> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
>> The index has no GPS metadata for video, although it is extracted and indexed
>> for images such as JPEG. I have checked both MP4 and MOV files; the files I
>> checked all have GPS Exif data embedded in the same fields as the images. Any
>> idea? Thanks!
>>
>


Re: Status of solR / HDFS-v3 compatibility

2019-05-03 Thread Hendrik Haddorp

We have some Solr 7.6 setups connecting to HDFS 3 clusters. So far that
did not show any compatibility problems.

On 02.05.19 15:37, Kevin Risden wrote:

For Apache Solr 7.x or older yes - Apache Hadoop 2.x was the dependency.
Apache Solr 8.0+ has Hadoop 3 compatibility with SOLR-9515. I did some
testing to make sure that Solr 8.0 worked on Hadoop 2 as well as Hadoop 3,
but the libraries are Hadoop 3.

The reference guide for 8.0+ hasn't been released yet, but I also don't think
it has been updated.

Kevin Risden


On Thu, May 2, 2019 at 9:32 AM Nicolas Paris 
wrote:


Hi

The Solr doc [1] says it's only compatible with HDFS 2.x.
Is that true?


[1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html

--
nicolas





Sort field values by client-specified order

2019-05-03 Thread Andreas Hubold

Hi,

we have a fixed number of values in a String field (up to around 100) that 
should be used for sorting query results. Is there some way to let 
the client specify the sort order as part of its query?


I was thinking about using a function query. Is it possible to specify 
the order of values as part of the request? If so, do you have an idea 
for how many values that would work? 10, 100, 1000?
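
For example, something along these lines is what I had in mind: a sort on a 
function query where the client assigns a rank to each value via termfreq() 
(the field name category_s and the values are made up, and the line breaks are 
only for readability; the parameter would be URL-encoded on one line):

    sort=sum(mul(termfreq(category_s,'BOOKS'),1),
             mul(termfreq(category_s,'MOVIES'),2),
             mul(termfreq(category_s,'MUSIC'),3)) asc

For a single-valued string field, termfreq() should be 1 for the value a 
document holds and 0 otherwise, so the sum evaluates to whatever rank the 
client assigned to that value.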


The use case behind this is that different clients have different 
localizations and expect results sorted according to their localization. 
If the above isn't possible, I would probably add fields with the 
localized values (one for each locale), but that would be a bit complex 
in our setup and require re-indexing when localizations change or new 
locales are added.


Thanks for any hints.

Kind regards,
Andreas



Solr Log rotation

2019-05-03 Thread shruti suri
Hi,

My log size keeps growing and it takes up most of the disk space. Please suggest
how to handle this. Also, is there a way to clean up logs other than on
startup? My servers don't restart daily, so the size keeps increasing.


log4j.properties

#  Logging level
solr.log=/var/log/solr
log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender

log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r %-5p (%t)
[%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10

#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS}
%-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF

Thanks
Shruti






-
Regards
Shruti
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr long q values

2019-05-03 Thread solrnoobie
So whenever we have long q values (from a sentence to a small paragraph), we
encounter heap problems (OOM). I guess this is normal?

So my question is: how should we handle this type of problem? Of
course we could always limit the size of the search term queries on the
application side, but is there anything we could do in our configuration that
could prevent the OOM issues even if some random user intentionally bombards
us with long search queries from the front end?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Search using filter query on multivalued fields

2019-05-03 Thread Srinivas Kashyap
Hi,

I have indexed data as shown below using DIH:

"INGREDIENT_NAME": [
  "EGG",
  "CANOLA OIL",
  "SALT"
],
"INGREDIENT_NO": [
  "550",
  "297",
  "314"
],
"COMPOSITION PERCENTAGE": [
  20,
  60,
  40
],

Similar to this, many other records are also indexed. These are multi-valued 
fields.

I have a requirement to search all the records which have the ingredient name SALT 
and whose composition percentage is more than 20.

How do I write a filter query for this?

P.S: I should only fetch records whose SALT composition percentage is more 
than 20, not records where some other ingredient's percentage exceeds 20.

Thanks and Regards,
Srinivas Kashyap



Facetting heat map, too many cells

2019-05-03 Thread Markus Jelsma
Hello,

With gridLevel set to 3 I have a map of 256 x 128. However, I would really like 
a higher resolution, preferably twice as high. But with any gridLevel higher 
than 3, or distErrPct of 0.1 or lower, I get an IllegalArgumentException saying 
it does not want to give me a 1024x1024 sized map.

How can I get a 512x256 sized heat map for the whole Earth?

Many thanks,
Markus


Unresolved dependencies (io.dropwizard.metrics) while building Solr

2019-05-03 Thread Erlend Garåsen


I'm trying to build the latest Solr release from Git, but I'm stuck at
this stage:

[ivy:retrieve]  ::
[ivy:retrieve]  ::  UNRESOLVED DEPENDENCIES ::
[ivy:retrieve]  ::
[ivy:retrieve]  :: io.dropwizard.metrics#metrics-jetty9;4.0.5: several
problems occurred while resolving dependency:
io.dropwizard.metrics#metrics-jetty9;4.0.5 {metrics=[master]}:
[ivy:retrieve]  several problems occurred while resolving dependency:
io.dropwizard.metrics#metrics-bom;4.0.5 {}:
[ivy:retrieve]  io.dropwizard.metrics#metrics-parent;4.0.5

I have tried to clean the Ivy cache several times, but with no luck.

Is this a known problem? I need to test a patch a Solr developer has
made, and it seems that the patch is only compatible with the latest
version from Git.

Erlend


SSL in Solr 7.6.0

2019-05-03 Thread dinesh naik
Hi all,
I am working on securing Solr and client communication by implementing SSL
for a multi-node cluster (100+ nodes).

The clients connect to Solr via CloudSolrClient through ZooKeeper, and
I am looking for the best way to create the certificate so that the
connection is secured.

For a cluster of 100-plus nodes it becomes hard to list all the
hostnames/IPs when generating the certificate, and the wildcard option is
ruled out due to security concerns, so what is the best way to handle this
scenario?

Also, could you shed some light on the usage of the SOLR_SSL_CHECK_PEER_NAME
parameter and whether it would help in any way?
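
For context, these are the settings I am referring to in bin/solr.in.sh (the
values below are just placeholders, not our real configuration):

    SOLR_SSL_KEY_STORE=/opt/solr/server/etc/solr-ssl.keystore.jks
    SOLR_SSL_KEY_STORE_PASSWORD=changeit
    SOLR_SSL_TRUST_STORE=/opt/solr/server/etc/solr-ssl.keystore.jks
    SOLR_SSL_TRUST_STORE_PASSWORD=changeit
    SOLR_SSL_NEED_CLIENT_AUTH=false
    SOLR_SSL_WANT_CLIENT_AUTH=false
    # whether clients verify the hostname against the certificate
    SOLR_SSL_CHECK_PEER_NAME=true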

-- 
Best Regards,
Dinesh Naik


Re: Why did Solr stats min/max values were returned as float number for field of type="pint"?

2019-05-03 Thread Wendy2
Hi Joel,

Thanks for your response.

Regarding your response "This syntax is bringing back correct data types",

I have a pint field, the stats returned the following min/max values.
"min":0.0, 
"max":1356.0, 

But I was expecting min/max values like below. Is it possible? Thanks! 
"min":0 
"max":1356



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Unable to tag queries (q) in SOLR >= 7.2

2019-05-03 Thread Fredrik Rodland
Thank you for a quick response David.

Your suggestion works like a charm.

(And you were of course right about the query being manually edited).

Regards,

Fredrik

> On 30 Apr 2019, at 14:48, David Smiley  wrote:
> 
> Hi Frederik,
> 
> In your example, I think you may have typed it manually since there are
> mistakes like df=edismax where I think you meant defType=edismax.  Anyway,
> assuming you need local-param syntax in 'q' (for tagging or whatever other
> reason), this means you must specify the query parser there and *not* via
> defType (don't set defType, or set it to "lucene", which is the default).
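> 
> For example, something like this (field names are made up, and edismax options
> such as qf would still come from your request handler defaults or explicit
> params):
> 
>   q={!edismax tag=mytag}house
>   facet=true
>   facet.field={!ex=mytag}category
> 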
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
> 
> 
> On Tue, Apr 30, 2019 at 8:17 AM Fredrik Rodland  wrote:
> 
>> Hi.
>> 
>> It seems SOLR-11501 may have changed more than just the ability to control
>> the query parser set through {!queryparser}.  We tag our queries to provide
>> facets both with and without the query in the same request, just as tagging
>> in fq described here:
>> https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-TaggingandExcludingFilters
>> 
>> After upgrading to 7.2 this does not work anymore (for the q-parameter)
>> using edismax.  We’ve tried to add the uf-paramter:
>> 
>> select?q={!tag%3Dmytag}house&debug=query&rows=0&uf=query&df=edismax
>> 
>> But this only result in q being allowed through, but not parsed - i.e.:
>> "+(+DisjunctionMaxQuery(((synrank80:tagmytagingeniør)^8.0 |
>> (stemrank40:tagmytagingeniør)^4.0…
>> 
>> Does anybody have any experience or tips for enabling tagging of queries
>> for SOLR >= 7.2?
>> 
>> Regards
>> 
>> Fredrik



Help extracting text from PDF images when indexing files

2019-05-03 Thread Miguel Fernandes
Hi all,

I'm new to Solr. I've recently downloaded Solr 8.0.0 and have been
following the tutorials. Using the 2 example instances created, I'm trying
to create my own collection. I've made a copy of the _default configset and
used it to create my collection.

In my case, the files I want to index are PDF files composed of images. I
have Tesseract installed and I can parse the PDF files correctly using a
Tika server instance I downloaded, i.e. I can get the extracted text from
the images.

I'm following the instructions on the page "Uploading Data with Solr Cell
Using Apache Tika" to properly configure the PDF image extraction, but I'm
not able to get this working. My aim is for the content of the PDF
file to go into a field named content that I've created in my schema. In
my attempts this field is either non-existent or, when it does exist, it doesn't
contain the expected text from the parsed images.

In the configuration of the ExtractingRequestHandler, the lib clauses are
present in my solrconfig.xml; that section is as below:

  <requestHandler name="/update/extract" startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
      <str name="parseContext.config">parseContext.xml</str>
    </lst>
  </requestHandler>
And my parseContext.xml file is the example from that page:

  <entries>
    <entry class="org.apache.tika.parser.pdf.PDFParserConfig"
           impl="org.apache.tika.parser.pdf.PDFParserConfig">
      <property name="extractInlineImages" value="true"/>
    </entry>
  </entries>

Any help on how to correctly extract the text from the PDF images would be
great.
Thanks
Miguel


Re: Accessing Solr collections at different ports

2019-05-03 Thread Shawn Heisey

On 5/2/2019 11:47 PM, Salmaan Rashid Syed wrote:

I am using Solr 7.6 in cloud mode with external zookeeper installed at ports 
2181, 2182, 2183. Currently we have only one server allocated for Solr. We are 
planning to move to multiple servers for better sharing, replication etc in 
near future.

Now the issue is that, our organisation has data indexed for different clients 
as separate collections. We want to uniquely access, update and index each 
collection separately so that each individual client has access to their 
respective collections at their respective ports. Eg:— Collection1 at port 
8983, Collection2 at port 8984, Collection3 at port 8985 etc.


This is not something you can do with a single instance of Solr.


I have two options I guess, one is to run Solr in cloud mode with 4 nodes (max 
as limited by Solr) at 4 different ports. I don’t know how to go beyond 4 
nodes/ports in this case.


There are no limits to the number of nodes.  I know people are running 
SolrCloud clusters with hundreds of nodes.  And there might be some out 
there with thousands ... although if those exist, they're really pushing 
the limits.



The other option is to run Solr as service and create multiple copies of Solr folder 
within the Server folder and access each Solr at different port with its own 
collection as shown by https://www.youtube.com/watch?v=wmQFwK2sujE 



If you have multiple Solr nodes in a single cluster, you can access any 
collection from any node.  This will probably present a security problem 
for you.  Even if the receiving node doesn't have any of the 
collection's data, SolrCloud will proxy the connection over to the nodes 
that DO have that data.


If you give each Solr node (instance) a different chroot on its zkHost 
string, then each one would be a completely separate cluster from all 
the others, and you can run them all in one zookeeper ensemble.  Each 
one would have zkHost strings that look something like this:


zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr1
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr2
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr3
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr4

You'll need to find the section in the documentation that talks about 
creating a chroot in ZK.
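
As a rough sketch, assuming a Solr version new enough that bin/solr has the 
"zk mkroot" subcommand (recent 6.x and all 7.x releases, if I remember right), 
creating the chroots looks something like:

    bin/solr zk mkroot /solr1 -z zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    bin/solr zk mkroot /solr2 -z zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

and so on for /solr3 and /solr4.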


I think that would give you what you're after.

It's probably easiest to set up each Solr instance in its own directory, 
not try to run multiple services out of one installation directory.  The 
entire extracted archive is less than 200MB ... tiny by modern standards.


Thanks,
Shawn


Re: Accessing Solr collections at different ports - Need help

2019-05-03 Thread Shawn Heisey

On 5/3/2019 12:52 AM, Salmaan Rashid Syed wrote:

I say that the nodes are limited to 4 because when I launch Solr in cloud
mode, the first prompt that I get is to choose number of nodes [1-4]. When
I tried to enter 7, it says that they are more than 4 and choose a smaller
number.


That's the cloud *EXAMPLE*.  It sets everything up on one server that 
would normally be on separate servers, and runs an embedded zookeeper in 
the first node.


Example setups are not meant for production.

Thanks,
Shawn


Re: Accessing Solr collections at different ports

2019-05-03 Thread Erick Erickson
This is not true. You can run as many separate JVMs on a single physical 
machine as you have available ports. There’s no capability to address a Solr 
_collection_ in the _same_ JVM by a different port though.

But what you didn’t mention is having separate collections per client. A single 
Solr instance (defined by a running JVM) can host a number of different 
collections. So you have URLS like
http://some_sever:port/solr/collection1/query
http://some_sever:port/solr/collection2/query

Now you restrict the URLs available to client1 to collection1, client2 to 
collection2 etc.

Best,
Erick

> On May 3, 2019, at 1:47 AM, Salmaan Rashid Syed  
> wrote:
> 
> Solr in cloud mode with 4 nodes (max as limited by Solr) at 4 different ports



Re: Unresolved dependencies (io.dropwizard.metrics) while building Solr

2019-05-03 Thread Erick Erickson
Sometimes this is caused by leftover files in the checkout tree, occasionally it has to 
do with checksum files, sometimes it’s gremlins. I have never had a problem if 
I try some combination of: 

1> clone the repo into a new directory
2> clean the ivy cache
3> ant clean-jars jar-checksums

But usually I just do 1 and 2 as that combo is much faster than trying to 
figure it out…..
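
Concretely, that combination is roughly (assuming the default Ivy cache location
in your home directory):

    git clone https://gitbox.apache.org/repos/asf/lucene-solr.git fresh-lucene-solr
    rm -rf ~/.ivy2/cache
    cd fresh-lucene-solr
    ant clean-jars jar-checksums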

Best,
Erick

Or as Randall Munroe (xkcd) so accurately reveals when it comes to how to deal 
with Git issues: https://xkcd.com/1597/

> On May 3, 2019, at 8:59 AM, Erlend Garåsen  wrote:
> 
> 
> I'm trying to build the latest Solr release from Git, but I'm stuck at
> this stage:
> 
> [ivy:retrieve] ::
> [ivy:retrieve]::  UNRESOLVED DEPENDENCIES ::
> [ivy:retrieve]::
> [ivy:retrieve]:: io.dropwizard.metrics#metrics-jetty9;4.0.5: 
> several
> problems occurred while resolving dependency:
> io.dropwizard.metrics#metrics-jetty9;4.0.5 {metrics=[master]}:
> [ivy:retrieve]several problems occurred while resolving dependency:
> io.dropwizard.metrics#metrics-bom;4.0.5 {}:
> [ivy:retrieve]io.dropwizard.metrics#metrics-parent;4.0.5
> 
> I have tried to clean the Ivy cache several times, but with no luck.
> 
> Is this a known problem? I need to test a patch a Solr developer has
> made, and it seems that the patch is only compatible with the latest
> version from Git.
> 
> Erlend



Re: Search using filter query on multivalued fields

2019-05-03 Thread Erick Erickson
There is no way to do this with the setup you describe. That is, there’s no way 
to say “only use the third element of a multiValued field”.

What I’d do is index (perhaps in a separate field) with payloads, so you have 
input like SALT|20, then use some of the payload functionality to make this 
happen. See: https://lucidworks.com/2017/09/14/solr-payloads/
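
A rough sketch of what that could look like (field and type names are made up,
details and caveats are in the blog post above):

    <!-- schema: payloaded field, one NAME|percentage token per ingredient -->
    <fieldType name="delimited_payloads_float" class="solr.TextField"
               indexed="true" stored="false">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldType>
    <field name="ingredient_pct" type="delimited_payloads_float"
           indexed="true" stored="true"/>

At index time the field would get a value like "EGG|20 CANOLA_OIL|60 SALT|40",
and the filter becomes a function range over the payload:

    fq={!frange l=20 incl=false}payload(ingredient_pct,SALT)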

There are some other strategies that are simpler. One could index (again, 
perhaps in a separate field) SALT_20. Then you can form filter queries like 
fq=ingredient:[SALT_20 TO *]. That’s not very flexible and you have to 
normalize/zero-pad the numbers (i.e. 1% couldn’t be SALT_1), so “it depends”.

The point is that you have to index cleverly to do what you want.

Best,
Erick

> On May 3, 2019, at 6:26 AM, Srinivas Kashyap  wrote:
> 
> Hi,
> 
> I have indexed data as shown below using DIH:
> 
> "INGREDIENT_NAME": [
>  "EGG",
>  "CANOLA OIL",
>  "SALT"
>],
> "INGREDIENT_NO": [
>  "550",
>  "297",
>  "314"
>],
> "COMPOSITION PERCENTAGE": [
>  20,
>  60,
>  40
>],
> 
> Similar to this, many other records are also indexed. These are multi-valued 
> fields.
> 
> I have a requirement to search all the records which has ingredient name salt 
> and it's composition percentage is more than 20.
> 
> How do I write a filter query for this?
> 
> P.S: I should only fetch records, whose Salt Composition percentage is more 
> than 20 and not other percentages.
> 
> Thanks and Regards,
> Srinivas Kashyap
> 



Re: Solr Log rotation

2019-05-03 Thread Erick Erickson
Shouldn’t be happening like this; you should have at most 10 files of approximately 10MB each. 
Did you by any chance upgrade to a Solr that uses Log4j2 and keep the old 
config files? If so, log4j2.xml should be your config, and it has a much 
different format than what you’re showing.
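
If you are indeed on a Log4j2-based Solr (7.4 and later), the rolling setup lives
in server/resources/log4j2.xml and the size/retention knobs look roughly like this
(a trimmed sketch, not the exact file Solr ships):

    <Configuration>
      <Appenders>
        <RollingFile name="MainLogFile"
                     fileName="${sys:solr.log.dir}/solr.log"
                     filePattern="${sys:solr.log.dir}/solr.log.%i">
          <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) %c{1.} %m%n"/>
          <Policies>
            <SizeBasedTriggeringPolicy size="32 MB"/>
          </Policies>
          <DefaultRolloverStrategy max="10"/>
        </RollingFile>
      </Appenders>
      <Loggers>
        <Root level="info">
          <AppenderRef ref="MainLogFile"/>
        </Root>
      </Loggers>
    </Configuration>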

The when-to-purge decision is made whenever a hard commit happens, so the next 
question is: do you hard commit? See: 
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Or do you have CDCR enabled?

In short, something about your setup is changed from the standard distribution, 
you need to identify what..

Best,
Erick

> On May 3, 2019, at 4:22 AM, shruti suri  wrote:
> 
> Hi,
> 
> My log size is growing larger and it take most of the space. Please suggest
> how to handle this. Also Is there a way for log cleanup other than on
> startup as my servers didn't restart daily and the size keep on increasing.
> 
> 
> log4j.properties
> 
> #  Logging level
> solr.log=/var/log/solr
> log4j.rootLogger=INFO, file, CONSOLE
> 
> log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
> 
> log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout
> log4j.appender.CONSOLE.layout.ConversionPattern=%-4r %-5p (%t)
> [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n
> 
> #- size rotation with log cleanup.
> log4j.appender.file=org.apache.log4j.RollingFileAppender
> log4j.appender.file.MaxFileSize=10MB
> log4j.appender.file.MaxBackupIndex=10
> 
> #- File to log to and log format
> log4j.appender.file.File=${solr.log}/solr.log
> log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout
> log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS}
> %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n
> 
> log4j.logger.org.apache.zookeeper=WARN
> log4j.logger.org.apache.hadoop=WARN
> 
> # set to INFO to enable infostream log messages
> log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF
> 
> Thanks
> Shruti
> 
> 
> 
> 
> 
> 
> -
> Regards
> Shruti
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Search using filter query on multivalued fields

2019-05-03 Thread David Hastings
Another option is to index dynamically, so in this case you would index
(this is what I would do):
INGREDIENT_SALT_i:40
INGREDIENT_EGG_i:20
etc

and query
INGREDIENT_SALT_i:[20 TO *]
or an arbitrary max value, since these are percentages

INGREDIENT_SALT_i:[20 TO 100]
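
This leans on the *_i dynamic field that the stock schemas usually define,
something like:

    <dynamicField name="*_i" type="pint" indexed="true" stored="true"/>

so each ingredient gets its own integer field without any schema changes.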


On Fri, May 3, 2019 at 12:01 PM Erick Erickson 
wrote:

> There is no way to do this with the setup you describe. That is, there’s
> no way to say “only use the third element of a multiValued field”.
>
> What I’d do is index (perhaps in a separate field) with payloads, so you
> have input like SALT|20, then use some of the payload functionality to make
> this happen. See: https://lucidworks.com/2017/09/14/solr-payloads/
>
> There are some other strategies that are simpler, one could index (again,
> perhaps in a separate field) SALT_20. Then you can form filter queries like
> “fq=ingredient:[SALT_20 TO *]. That’s not very flexible and you have to
> normalize (i.e. 1% couldn’t be SALT_1), so “it depends”.
>
> The point is that you have to index cleverly to do what you want.
>
> Best,
> Erick
>
> > On May 3, 2019, at 6:26 AM, Srinivas Kashyap 
> wrote:
> >
> > Hi,
> >
> > I have indexed data as shown below using DIH:
> >
> > "INGREDIENT_NAME": [
> >  "EGG",
> >  "CANOLA OIL",
> >  "SALT"
> >],
> > "INGREDIENT_NO": [
> >  "550",
> >  "297",
> >  "314"
> >],
> > "COMPOSITION PERCENTAGE": [
> >  20,
> >  60,
> >  40
> >],
> >
> > Similar to this, many other records are also indexed. These are
> multi-valued fields.
> >
> > I have a requirement to search all the records which has ingredient name
> salt and it's composition percentage is more than 20.
> >
> > How do I write a filter query for this?
> >
> > P.S: I should only fetch records, whose Salt Composition percentage is
> more than 20 and not other percentages.
> >
> > Thanks and Regards,
> > Srinivas Kashyap
> > 
>
>


Re: Reverse-engineering existing installation

2019-05-03 Thread Doug Reeder
Thanks! Alexandre's presentation is helpful in understanding what's not
essential.  David's suggestion of comparing config files is good - I'll
have to see if I can dig up the config files for version 4.2, which we're
currently running.

I'll also look into updating to a supported version. I guess I'll be
reading https://lucene.apache.org/solr/guide/6_6/upgrading-solr.html and
the similar ones for later versions.  Is an upgrade guide for version 4 to
5 still around somewhere?

On Fri, May 3, 2019 at 12:21 AM David Smiley 
wrote:

> Consider trying to diff configs from a default at the version it was copied
> from, if possible. Even better, the configs should be in source control and
> then you can browse history with commentary and sometimes links to issue
> trackers and code reviews.
>
> Also a big part that you can’t see by staring at configs is what the
> queries look like. You should examine the system interacting with Solr to
> observe embedded comments/docs for insights.
>
> On Thu, May 2, 2019 at 11:21 PM Doug Reeder 
> wrote:
>
> > The documentation for SOLR is good.  However it is oriented toward
> setting
> > up a new installation, with the data model known.
> >
> > I have inherited an existing installation.  Aspects of the data model I
> > know, but there's a lot of ways things could have been configured in
> SOLR,
> > and for some cases, I don't know what SOLR was supposed to do.
> >
> > Can you reccomend any documentation on working out the configuration of
> an
> > existing installation?
> >
> --
> Sent from Gmail Mobile
>


Re: Solr long q values

2019-05-03 Thread Shawn Heisey

On 5/3/2019 2:32 AM, solrnoobie wrote:

So whenever we have long q values (from a sentence to a small paragraph), we
encounter some heap problems (OOM) and I guess this is normal?

So my question would be is how should we handle this type of problem? Of
course we could always limit the size of the search term queries in the
application side but is there anything we could do in our configuration that
could prevent the OOM issues even if some random user intentionally bombard
us with long search queries in the front end?


If you're running out of memory, then Solr will need a larger heap, or 
you'll need to change something so it requires less heap.


A large query string is one of those things that might require a larger 
heap.


The default heap size that Solr has shipped with since 5.0 is 512MB ... 
which is VERY small.  Virtually all Solr users will need to increase 
this or they will run into OOME, or find that their server is running 
extremely slow.  It does not take very much index data to require more 
than 512MB heap.
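
Increasing it is straightforward, something like this (4g is only an example, 
the right value depends on your index size and query load):

    # one-off, on the command line:
    bin/solr start -c -m 4g

    # or permanently, in bin/solr.in.sh (solr.in.cmd on Windows):
    SOLR_HEAP="4g"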


A thought for Erick and other committers:  I know we are trying to 
reduce log verbosity.  But along the same lines as the log entries about 
file and process limits, I was thinking it might be a good idea to have 
a one-line WARN entry if the max heap size is 1GB or less.  And a config 
option to disable the logging.


Thanks,
Shawn


Solr 7.7.1 issue: TemplateTransformer doesn't take the value of static template attribute value

2019-05-03 Thread Irfan Nagoo
Hi,

Recently we upgraded our Solr from 5.1 to 7.7.1.  Here is an example of an 
entity in data-config.xml to illustrate the issue we are facing:





Re: Solr long q values

2019-05-03 Thread Walter Underwood
We run very long queries with an 8 GB heap. 30 million documents in 8 shards 
with an average query length of 25 terms.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 3, 2019, at 6:49 PM, Shawn Heisey  wrote:
> 
> On 5/3/2019 2:32 AM, solrnoobie wrote:
>> So whenever we have long q values (from a sentence to a small paragraph), we
>> encounter some heap problems (OOM) and I guess this is normal?
>> So my question would be is how should we handle this type of problem? Of
>> course we could always limit the size of the search term queries in the
>> application side but is there anything we could do in our configuration that
>> could prevent the OOM issues even if some random user intentionally bombard
>> us with long search queries in the front end?
> 
> If you're running out of memory, then Solr will need a larger heap, or you'll 
> need to change something so it requires less heap.
> 
> A large query string is one of those things that might require a larger heap.
> 
> The default heap size that Solr has shipped with since 5.0 is 512MB ... which 
> is VERY small.  Virtually all Solr users will need to increase this or they 
> will run into OOME, or find that their server is running extremely slow.  It 
> does not take very much index data to require more than 512MB heap.
> 
> A thought for Erick and other committers:  I know we are trying to reduce log 
> verbosity.  But along the same lines as the log entries about file and 
> process limits, I was thinking it might be a good idea to have a one-line 
> WARN entry if the max heap size is 1GB or less.  And a config option to 
> disable the logging.
> 
> Thanks,
> Shawn



Re: Solr long q values

2019-05-03 Thread Erick Erickson
Shawn:

We already do warnings for ulimits, so memory seems reasonable. Along the same 
vein, does starting with 512M make sense either?

Feel free to raise a JIRA, but I won’t have any time to work on it….

> On May 3, 2019, at 3:27 PM, Walter Underwood  wrote:
> 
> We run very long queries with an 8 GB heap. 30 million documents in 8 shards 
> with an average query length of 25 terms.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On May 3, 2019, at 6:49 PM, Shawn Heisey  wrote:
>> 
>> On 5/3/2019 2:32 AM, solrnoobie wrote:
>>> So whenever we have long q values (from a sentence to a small paragraph), we
>>> encounter some heap problems (OOM) and I guess this is normal?
>>> So my question would be is how should we handle this type of problem? Of
>>> course we could always limit the size of the search term queries in the
>>> application side but is there anything we could do in our configuration that
>>> could prevent the OOM issues even if some random user intentionally bombard
>>> us with long search queries in the front end?
>> 
>> If you're running out of memory, then Solr will need a larger heap, or 
>> you'll need to change something so it requires less heap.
>> 
>> A large query string is one of those things that might require a larger heap.
>> 
>> The default heap size that Solr has shipped with since 5.0 is 512MB ... 
>> which is VERY small.  Virtually all Solr users will need to increase this or 
>> they will run into OOME, or find that their server is running extremely 
>> slow.  It does not take very much index data to require more than 512MB heap.
>> 
>> A thought for Erick and other committers:  I know we are trying to reduce 
>> log verbosity.  But along the same lines as the log entries about file and 
>> process limits, I was thinking it might be a good idea to have a one-line 
>> WARN entry if the max heap size is 1GB or less.  And a config option to 
>> disable the logging.
>> 
>> Thanks,
>> Shawn
> 



Re: Reverse-engineering existing installation

2019-05-03 Thread Erick Erickson
Doug:

You can pull any version of Solr from Git.

git clone https://gitbox.apache.org/repos/asf/lucene-solr.git some_local_dir

Then git will let you check out any previous branch. 4.2 is from before we 
switched to Git, so I’m not sure you can go that far back, but 4x is probably 
close enough for comparing configs.

All that said, and assuming you’re going to either 7x or 8x… I’d just think 
about starting over. Once you get the old configs and account for 

1> any schema changes.
2> any config changes, _especially_ any custom components

Consider starting with a current version of Solr and re-indexing. You’ll 
absolutely _have_ to re-index _all_ your source material anyway, so trying to 
step through 4x->5x->6x->7x->8x index upgrades is futile.

Best,
Erick

> On May 3, 2019, at 12:51 PM, Doug Reeder  wrote:
> 
> Thanks! Alexandre's presentation is helpful in understanding what's not
> essential.  David's suggesting of comparing config files is good - I'll
> have to see if I can dig up the config files for version 4.2, which we're
> currently running.
> 
> I'll also look into updating to a supported version. I guess I'll be
> reading https://lucene.apache.org/solr/guide/6_6/upgrading-solr.html and
> the similar ones for later versions.  Is an upgrade guide for version 4 to
> 5 still around somewhere?
> 
> On Fri, May 3, 2019 at 12:21 AM David Smiley 
> wrote:
> 
>> Consider trying to diff configs from a default at the version it was copied
>> from, if possible. Even better, the configs should be in source control and
>> then you can browse history with commentary and sometimes links to issue
>> trackers and code reviews.
>> 
>> Also a big part that you can’t see by staring at configs is what the
>> queries look like. You should examine the system interacting with Solr to
>> observe embedded comments/docs for insights.
>> 
>> On Thu, May 2, 2019 at 11:21 PM Doug Reeder 
>> wrote:
>> 
>>> The documentation for SOLR is good.  However it is oriented toward
>> setting
>>> up a new installation, with the data model known.
>>> 
>>> I have inherited an existing installation.  Aspects of the data model I
>>> know, but there's a lot of ways things could have been configured in
>> SOLR,
>>> and for some cases, I don't know what SOLR was supposed to do.
>>> 
>>> Can you reccomend any documentation on working out the configuration of
>> an
>>> existing installation?
>>> 
>> --
>> Sent from Gmail Mobile
>> 



Re: Solr long q values

2019-05-03 Thread Shawn Heisey

On 5/3/2019 1:37 PM, Erick Erickson wrote:

We already do warnings for ulimits, so memory seems reasonable. Along the same 
vein, does starting with 512M make sense either?

Feel free to, raise a JIRA, but I won’t have any time to work on it….


Done.

https://issues.apache.org/jira/browse/SOLR-13446

I think that for typical server systems, starting with a 512MB heap is a 
little bit nuts.


I think I know why such a low number was chosen.  Without a much smarter 
startup, a super low default is the only way to ensure that Solr will 
start on virtually any system that somebody tries it on, like the small 
AWS servers.


Thanks,
Shawn


Re: Solr long q values

2019-05-03 Thread Walter Underwood
512M was the default heap for Java 1.1. We never changed the default. So no 
size was “chosen”.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 3, 2019, at 10:11 PM, Shawn Heisey  wrote:
> 
> On 5/3/2019 1:37 PM, Erick Erickson wrote:
>> We already do warnings for ulimits, so memory seems reasonable. Along the 
>> same vein, does starting with 512M make sense either?
>> Feel free to, raise a JIRA, but I won’t have any time to work on it….
> 
> Done.
> 
> https://issues.apache.org/jira/browse/SOLR-13446
> 
> I think that for typical server systems, starting with a 512MB heap is a 
> little bit nuts.
> 
> I think I know why such a low number was chosen.  Without a much smarter 
> startup, a super low default is the only way to ensure that Solr will start 
> on virtually any system that somebody tries it on, like the small AWS servers.
> 
> Thanks,
> Shawn



Re: Reverse-engineering existing installation

2019-05-03 Thread Shawn Heisey

On 5/3/2019 1:44 PM, Erick Erickson wrote:

Then git will let you check out any previous branch. 4.2 is from before we 
switched to Git, co I’m not sure you can go that far back, but 4x is probably 
close enough for comparing configs.


Git has all of Lucene's history, and most of Solr's history, back to 
when Lucene and Solr were merged before the 3.1.0 release.  So the 4.x 
releases are there:



elyograg@smeagol:~/asf/lucene-solr$ git checkout releases/lucene-solr/4.2.1
Checking out files: 100% (13209/13209), done.
Note: checking out 'releases/lucene-solr/4.2.1'.

You are in 'detached HEAD' state. You can look around, make experimental 
changes and commit them, and you can discard any commits you make in 
this state without impacting any branches by performing another checkout.


If you want to create a new branch to retain commits you create, you may 
do so (now or later) by using -b with the checkout command again. Example:


   git checkout -b <new-branch-name>

HEAD is now at 50c41a3e5c Lucene Java 4.2.1 release.


Thanks,
Shawn


Solr RuleBasedAuthorizationPlugin question

2019-05-03 Thread Jérémy
Hi,

I hope that this question wasn't answered already, but I couldn't find what
I was looking for in the archives.

I'm having a hard time to use solr with the BasicAuth and
RoleBasedAuthorization plugins.
The auth part works well but I have issues with the RoleBasedAuthorization
part. I'd like to have an admin role and a readonly one. I have two users,
each having one role. However both of them can create cores, delete
documents etc...

Here's my security.json:
{
  "authentication": {
"blockUnknown": true,
"class": "solr.BasicAuthPlugin",
"credentials": {
  "adminuser": "adminpwd",
  "readuser": "readpwd"
}
  },
  "authorization": {
"class": "solr.RuleBasedAuthorizationPlugin",
"permissions": [
  {
"name": "read",
"role": "readonly"
  },
  {
"name": "security-edit",
"role": "admin"
  }
],
"user-role": {
  "readuser": "readonly",
  "adminuser": "admin"
}
  }
}

I tried that with Solr 7.7.0 and 8.0.0, in cloud and standalone mode. I
can't figure out why the readuser can delete documents.

Any help is appreciated!

Thanks,
Jeremy


Re: [collection create & delete] collection It is not created after several hundred times when it is repeatedly deleted and created. Resolved after restarting the service.

2019-05-03 Thread Shawn Heisey

On 4/30/2019 1:38 AM, 유정인 wrote:

2019-04-27 21:50:32.043 ERROR (OverseerThreadFactory-1184-thread-4-
processing-n:211.60.221.94:9080_) [   ]
o.a.s.c.a.c.OverseerCollectionMessageHandler [processResponse:880] Error
from shard: http://x.x.x.x:8080
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://x.x.x.x:8080





Caused by: java.net.SocketException: Connection reset


Java is saying that the TCP connection was reset.

This would most likely mean that at the TCP layer, a RST packet was 
received.  To my knowledge, Solr doesn't reset connections.  Solr 
doesn't contain ANY networking code.  On your install, all the 
networking is done in Tomcat code, which utilizes Java APIs, and the 
heavy lifting of TCP itself is handled by the operating system.


So when I said that I didn't think Tomcat could be responsible for your 
problem, I might have been wrong.  This seems to be a problem at the 
transport layer, and that is something that Tomcat probably can influence.


This could also be a lower level problem ... with the OS, or with your 
networking equipment.  If the Solr connections are traversing the 
Internet or a WAN, it could also be a networking problem you cannot 
control at all -- somewhere in somebody else's routing infrastructure.


Why did you choose the more difficult path of running under Tomcat? 
Have you tried running on the fully tested container that comes with Solr?


Thanks,
Shawn


Re: Reverse-engineering existing installation

2019-05-03 Thread Doug Reeder
Thanks! Diffs for solr.xml and zoo.cfg were easy, but it looks like we'll
need to strip the comments before we can get a useful diff of
solrconfig.xml or schema.xml.  Can you recommend tools to normalize XML
files?  XMLStarlet is hosted on SourceForge, which I no longer trust, and
hasn't been updated in years.
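
For the moment I'm considering a small script that drops comments and normalizes
indentation before diffing, something like this (untested; ET.indent needs
Python 3.9+):

    import sys
    import xml.etree.ElementTree as ET

    # ElementTree skips comments and processing instructions by default,
    # so parsing and re-serializing yields a comment-free, consistently
    # indented document.
    tree = ET.parse(sys.argv[1])
    ET.indent(tree)
    tree.write(sys.stdout, encoding="unicode")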


On Fri, May 3, 2019 at 4:24 PM Shawn Heisey  wrote:

> On 5/3/2019 1:44 PM, Erick Erickson wrote:
> > Then git will let you check out any previous branch. 4.2 is from before
> we switched to Git, co I’m not sure you can go that far back, but 4x is
> probably close enough for comparing configs.
>
> Git has all of Lucene's history, and most of Solr's history, back to
> when Lucene and Solr were merged before the 3.1.0 release.  So the 4.x
> releases are there:
>
> 
> elyograg@smeagol:~/asf/lucene-solr$ git checkout
> releases/lucene-solr/4.2.1
> Checking out files: 100% (13209/13209), done.
> Note: checking out 'releases/lucene-solr/4.2.1'.
>
> You are in 'detached HEAD' state. You can look around, make experimental
> changes and commit them, and you can discard any commits you make in
> this state without impacting any branches by performing another checkout.
>
> If you want to create a new branch to retain commits you create, you may
> do so (now or later) by using -b with the checkout command again. Example:
>
>    git checkout -b <new-branch-name>
>
> HEAD is now at 50c41a3e5c Lucene Java 4.2.1 release.
> 
>
> Thanks,
> Shawn
>


Re: Reverse-engineering existing installation

2019-05-03 Thread Erick Erickson
Wait. I was recommending you diff the 4.2.1 solrconfig and the solrconfig 
you’re using. Ditto with the schema. If you’re trying to diff the 7x or 8x ones 
they’ll be totally different.

But if you are getting massive differences between the 4.2.1 stock configs and what you’re 
using, then whoever set it up made the changes, and you’ll probably have to go 
through them by hand, noting all the differences in the non-commented parts.

Things that are _missing_ from the one you’re using .vs. the stock distro files 
you can pretty much ignore. They’ll be interesting in that you can delete the 
equivalent from the new distro, but…

I expect the schema will be the most different, solrconfig usually doesn’t 
change much.

FWIW,
Erick



> On May 3, 2019, at 7:30 PM, Doug Reeder  wrote:
> 
> Thanks! Diffs for solr.xml and zoo.cfg were easy, but it looks like we'll
> need to strip the comments before we can get a useful diff of
> solrconfig.xml or schema.xml.  Can you recommend tools to normalize XML
> files?  XMLStarlet is hosted on SourceForge, which I no longer trust, and
> hasn't been updated in years.
> 
> 
> On Fri, May 3, 2019 at 4:24 PM Shawn Heisey  wrote:
> 
>> On 5/3/2019 1:44 PM, Erick Erickson wrote:
>>> Then git will let you check out any previous branch. 4.2 is from before
>> we switched to Git, co I’m not sure you can go that far back, but 4x is
>> probably close enough for comparing configs.
>> 
>> Git has all of Lucene's history, and most of Solr's history, back to
>> when Lucene and Solr were merged before the 3.1.0 release.  So the 4.x
>> releases are there:
>> 
>> 
>> elyograg@smeagol:~/asf/lucene-solr$ git checkout
>> releases/lucene-solr/4.2.1
>> Checking out files: 100% (13209/13209), done.
>> Note: checking out 'releases/lucene-solr/4.2.1'.
>> 
>> You are in 'detached HEAD' state. You can look around, make experimental
>> changes and commit them, and you can discard any commits you make in
>> this state without impacting any branches by performing another checkout.
>> 
>> If you want to create a new branch to retain commits you create, you may
>> do so (now or later) by using -b with the checkout command again. Example:
>> 
>>   git checkout -b <new-branch-name>
>> 
>> HEAD is now at 50c41a3e5c Lucene Java 4.2.1 release.
>> 
>> 
>> Thanks,
>> Shawn
>>