A problem with deploying Solr 6.5: 404 error reported

2017-04-23 Thread David
Dear manager,


   I have a problem deploying Solr 6.5. My environment is Windows 7 + JDK 8u131 + 
Tomcat 9.0 + Solr 6.5. Java runs successfully, Tomcat runs successfully, and 
Solr 6.5 has been deployed. When I enter 
http://localhost:8080/solr/index.html in Firefox, a 404 error is reported. 
The detail is "The origin server did not find a current representation for the 
target resource or is not willing to disclose that one exists." Could you tell 
me why this happened and how to solve the problem? Thank you!


Sincerely yours,
David.Wu

server won't start using configs from Drupal

2009-07-23 Thread david
I've downloaded solr-2009-07-21.tgz and followed the instructions at http://drupal.org/node/343467 
including retrieving the solrconfig.xml and schema.xml files from the Drupal apachesolr module.


The server seems to start properly with the original solrconfig.xml and 
schema.xml files.

When I try to start up the server with the Drupal-supplied files, I get errors 
on the command line, and a 500 error from the server.


solrconfig.xml: http://pastebin.com/m23d14a2
schema.xml: http://pastebin.com/m2e79f304
output of http://localhost:8983/solr/admin/: http://pastebin.com/m410fa74d


The following looks to me like the important bits, but I'm not a Java coder, so I 
could easily be wrong.

command line extract:

22/07/2009 5:58:54 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: analyzer without class or tokenizer 
& filter list
(plus lots of WARN messages)

extract from browser at http://localhost:8983/solr/admin/

org.apache.solr.common.SolrException: Unknown fieldtype 'text' specified on 
field title
(snip lots of stuff)
org.apache.solr.common.SolrException: analyzer without class or tokenizer & 
filter list
(snip lots of stuff)
org.apache.solr.common.SolrException: Error loading class 
'solr.CharStreamAwareWhitespaceTokenizerFactory'

(snip lots of stuff)
Caused by: java.lang.ClassNotFoundException: 
solr.CharStreamAwareWhitespaceTokenizerFactory

Nothing in apache logs...

solr logs contain this:
127.0.0.1 - - [22/07/2009:08:01:10 +] "GET /solr/admin/ HTTP/1.1" 500 10292

Any help greatly appreciated.

David.


Re: server won't start using configs from Drupal

2009-07-24 Thread david



Otis Gospodnetic wrote:

I think the problem is CharStreamAwareWhitespaceTokenizerFactory, which used to 
live in Solr (when Drupal schema.xml for Solr was made), but has since moved to 
Lucene.  I'm half guessing. :)

 Otis
--


Thanks, but unfortunately I have no idea about Java. Do you know when that 
change was made?

regards,

David.



Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 

From: david 
To: solr-user@lucene.apache.org
Sent: Thursday, July 23, 2009 9:59:53 PM
Subject: server won't start using configs from Drupal 

(snip lots of stuff)




Re: server won't start using configs from Drupal

2009-07-24 Thread david



Koji Sekiguchi wrote:

David,

Try to change solr.CharStreamAwareWhitespaceTokenizerFactory to 
solr.WhitespaceTokenizerFactory

in your schema.xml and reboot Solr.
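Koji's one-line change can be scripted rather than applied by hand; a minimal sketch (the snippet and the idea of scripting it are illustrative additions, not part of the original thread — file reading/writing is omitted):

```python
import re

def fix_schema(schema_xml: str) -> str:
    """Replace the tokenizer factory that moved out of Solr's core
    with the plain whitespace tokenizer, as suggested above."""
    return re.sub(
        r"solr\.CharStreamAwareWhitespaceTokenizerFactory",
        "solr.WhitespaceTokenizerFactory",
        schema_xml,
    )

snippet = '<tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>'
print(fix_schema(snippet))
# <tokenizer class="solr.WhitespaceTokenizerFactory"/>
```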



That worked... thanks...

David.


Koji


david wrote:



(snip lots of stuff)








Changing schema without having to reindex

2010-05-28 Thread David

Hi,

Can anyone tell me if it is possible to change the schema without having 
to reindex? I want to change the stored fields specifically.  Any help 
would be appreciated, thanks.




Range query on long value

2010-06-04 Thread David

Hi,

I have an issue with range queries on a long value in our dataset (the 
dataset is fairly large, but I believe the problem still exists for 
smaller datasets).  When I query the index with a range such as id:[1 
TO 2000], I get values back that are well outside that range.  It's as if 
the range query is ignoring the values and doing something like id:[* TO 
*]. We are running Solr 1.3.  The value is set as the unique key for the 
index.


Our schema is similar to this:

  <field name="id" type="long" ... required="true" />
  <field name="..." ... required="false" />
  <field name="..." ... required="false" />
  .
  .
  .
  <field name="..." ... required="false" />

  <uniqueKey>id</uniqueKey>

Has anyone else had this problem?  If so, how did you correct it?  
Thanks in advance.


Re: Range query on long value

2010-06-04 Thread David

On 10-06-04 05:11 PM, Ahmet Arslan wrote:
   

I have an issue with range queries on a long value in our dataset
(snip lots of stuff)
 

You need to use the sortable long type in Solr 1.3.0 (type="slong") for range 
queries to work correctly. The default schema.xml has an explanation of the 
sortable types (sint, slong, etc.).



   
Thanks for the fast response Ahmet.  This fixed my issue, but I have a 
question: is there a performance hit if I change other fields to a sortable 
type, even if I'm not sure they will ever be used for range searches?
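For background on why the plain type misbehaves: Solr 1.3's non-sortable numeric types were indexed as ordinary strings, so range queries compared terms lexicographically, while the sortable types encode numbers so that string order matches numeric order. The effect is easy to reproduce (a Python sketch added for illustration, not Solr code):

```python
ids = [9, 150, 2000, 1000000]

# Lexicographic range check -- effectively what id:[1 TO 2000] did
# against a plain (non-sortable) long field in Solr 1.3:
lex = [n for n in ids if "1" <= str(n) <= "2000"]

# Numeric range check -- what the slong encoding achieves:
num = [n for n in ids if 1 <= n <= 2000]

print(lex)  # [150, 2000, 1000000] -- 9 is missing, 1000000 wrongly included
print(num)  # [9, 150, 2000]
```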


Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-23 Thread David Santamauro



On 12/23/2015 01:42 AM, William Bell wrote:

I agree that when using timeAllowed in the header info there should be an
entry that indicates timeAllowed triggered.


If I'm not mistaken, there is: partialResults:true appears in the response header:

  "responseHeader":{ "partialResults":true }




This is the only reason why we have not used timeAllowed. So this is a
great suggestion. Something like <bool name="partialResults">true</bool>?
That would be great.

  <lst name="responseHeader">
    <int name="status">0</int>
    <bool name="partialResults">true</bool>
    <int name="QTime">107</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="timeAllowed">1000</str>
    </lst>
  </lst>



On Tue, Dec 22, 2015 at 6:43 PM, Vincenzo D'Amore 
wrote:


Well... I can write everything, but really, all this just to understand
when the timeAllowed parameter triggers a partial answer? I mean, isn't 
there anything set in the response when it is partial?

On Wed, Dec 23, 2015 at 2:38 AM, Walter Underwood 
wrote:


We need to know a LOT more about your site. Number of documents, size of
index, frequency of updates, length of queries approximate size of server
(CPUs, RAM, type of disk), version of Solr, version of Java, and features
you are using (faceting, highlighting, etc.).

After that, we’ll have more questions.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Dec 22, 2015, at 4:58 PM, Vincenzo D'Amore 

wrote:


Hi All,

my website is under pressure, there is a big number of concurrent
searches.

When the connected users are too many, the searches become so slow that
in some cases users have to wait many seconds.
The queue of searches becomes so long that, in some cases, servers are
blocked trying to serve all these requests.
As far as I know this is because some searches are very expensive, and when
many expensive searches clog the queue the server becomes unresponsive.

In order to quickly work around this herd effect, I have added a
default timeAllowed of 15 seconds, and this seems to help a lot.

But during stress tests I'm unable to understand when and which requests
are affected by the timeAllowed parameter.

Just to be clear, I have configured the timeAllowed parameter in a SolrCloud
environment; given that partial results may be returned (if there are any),
how can I know when this happens? When does the timeAllowed parameter trigger
a partial answer?

Best regards,
Vincenzo



--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251






--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251







date difference faceting

2016-01-08 Thread David Santamauro


Hi,

I have two date fields, d_a and d_b, both of type solr.TrieDateField, 
that represent different events associated with a particular document. 
The interval between these dates is relevant for corner-case statistics. 
The interval is calculated as the difference: sub(d_b,d_a) and I've been 
able to


  stats=true&stats.field={!func}sub(d_b,d_a)

What I ultimately would like to report is the interval represented as a 
range, which could be seen as facet.query


(pseudo code)
  facet.query=sub(d_b,d_a)[ * TO 8640 ] // day
  facet.query=sub(d_b,d_a)[ 8641 TO 60480 ] // week
  facet.query=sub(d_b,d_a)[ 60481 TO 259200 ] // month
etc.

Aside from actually indexing the difference in a separate field, is 
there something obvious I'm missing? I'm on SOLR 5.2 in cloud mode.


thanks
David


Re: date difference faceting

2016-01-08 Thread David Santamauro


For anyone wanting to know an answer, I used

facet.query={!frange l=0 u=3110400}ms(d_b,d_a)
facet.query={!frange l=3110401 u=6220800}ms(d_b,d_a)
facet.query={!frange l=6220801 u=15552000}ms(d_b,d_a)

etc ...

Not the prettiest nor most efficient but accomplishes what I need 
without re-indexing TBs of data.
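The repeated facet.query parameters can be generated rather than written by hand; a small sketch (the function name is hypothetical, and the bucket bounds are copied from the queries above):

```python
def frange_facets(lower_field: str, upper_field: str, buckets):
    """Build one {!frange} facet.query per (low, high) millisecond bucket,
    faceting on the interval ms(upper_field, lower_field)."""
    return [
        f"{{!frange l={lo} u={hi}}}ms({upper_field},{lower_field})"
        for lo, hi in buckets
    ]

buckets = [(0, 3110400), (3110401, 6220800), (6220801, 15552000)]
for q in frange_facets("d_a", "d_b", buckets):
    print(q)
# {!frange l=0 u=3110400}ms(d_b,d_a)
# {!frange l=3110401 u=6220800}ms(d_b,d_a)
# {!frange l=6220801 u=15552000}ms(d_b,d_a)
```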


thanks.

On 01/08/2016 12:09 PM, Erick Erickson wrote:

I'm going to side-step your primary question and say that it's nearly
always best to do your calculations up-front during indexing to make
queries more efficient and thus serve more requests on the same
hardware. This assumes that the stat you're interested in is
predictable of course...

Best,
Erick

On Fri, Jan 8, 2016 at 2:23 AM, David Santamauro
 wrote:


(snip lots of stuff)


solr-5.3.1 admin console not showing properly

2016-01-13 Thread David Cao
I installed and started solr following instructions from solr wiki as this
... (on a Redhat server)

cd ~/
tar zxf /tmp/solr-5.3.1.tgz
cd solr-5.3.1/bin
./solr start -f


Solr starts fine. But when opening the console in a browser
(http://server-ip:8983/solr/admin.html), it shows a partially rendered page
with the highlighted message "*SolrCore Initialization Failures*" and a whole
bunch of WARN messages of this nature,

55724 WARN  (qtp1018134259-20) [   ] o.e.j.s.ServletHandler Error for
/solr/css/styles/common.css
java.lang.NoSuchMethodError:
javax/servlet/http/HttpServletRequest.isAsyncSupported()Z
at
org.eclipse.jetty.servlet.DefaultServlet.sendData(DefaultServlet.java:922)
at
org.eclipse.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:533)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
at
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:206)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:801)


There was also a line at the start of the console log,

1784 WARN  (main) [   ] o.e.j.s.SecurityHandler
ServletContext@o.e.j.w.WebAppContext@1c662fe5{/solr,file:/root/solr-5.3.1/server/solr-webapp/webapp/,STARTING}{/root/solr-5.3.1/server/solr-webapp/webapp}
has uncovered http methods for path: /


Any ideas? Is there anything I need to do to configure the classpath?

thanks a lot!
david


Fwd: solr-5.3.1 admin console not showing properly

2016-01-14 Thread David Cao
Hi there,

(snip lots of stuff)
david


Re: solr-5.3.1 admin console not showing properly

2016-01-14 Thread David Cao
Hi Jan,

The JVM is from IBM, based on JRE 1.7.

IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References
20141216_227497 (JIT enabled, AOT enabled)


The box I am using is just a dev vm box, using 'root' is temporary ...

Thanks
david

On Thu, Jan 14, 2016 at 6:53 AM, David Cao  wrote:

> Hi there,
>
> (snip lots of stuff)


SolrCloud replicas out of sync

2016-01-22 Thread David Smith
I have a SolrCloud v5.4 collection with 3 replicas that appear to have fallen 
permanently out of sync.  Users started to complain that the same search, 
executed twice, sometimes returned different result counts.  Sure enough, our 
replicas are not identical:

>> shard1_replica1:  89867 documents / version 1453479763194
>> shard1_replica2:  89866 documents / version 1453479763194
>> shard1_replica3:  89867 documents / version 1453479763191

I do not think this discrepancy is going to resolve itself.  The Solr Admin 
screen reports all 3 replicas as “Current”.  The last modification to this 
collection was 2 hours before I captured this information, and our auto commit 
time is 60 seconds.  

I have a lot of concerns here, but my first question is if anyone else has had 
problems with out of sync replicas, and if so, what they have done to correct 
this?

Kind Regards,

David



Re: Read time out exception - exactly 10 minutes after starting committing

2016-01-25 Thread David Andrews
I just got bit by this today.  I tracked it down to the default solr.xml file 
in ./server/solr/solr.xml with the following:

  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
  </shardHandlerFactory>

I’m on Solr 5.3.1 now, and I wasn’t having this problem with 4.10.3; sure 
enough, 4.10.3 has the values at 0 (i.e. no socket timeout).

-David

> On Jan 24, 2016, at 3:18 PM, Shawn Heisey  wrote:
> 
> On 1/23/2016 9:24 PM, adfel70 wrote:
>> 1. I am getting the "read time out" from the Solr Server.
>> Not from my client, but from the server client when it tries to reach other
>> instances while committing.
>> 
>> 2. I reduced the filter cache autowarmCount to 512, and seems to fix the
>> problem. It now takes only several seconds to commit!
> 
> Do you have any configuration for ShardHandler in your solrconfig.xml?
> 
> https://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searches
> 
> This is where the client built into Solr can be configured with a socket
> timeout.
> 
> Regarding your cache configuration, even an autowarmCount of 512 is
> quite high.  I have configured a value of *four* for my filterCache,
> because anything higher resulted in unacceptable commit times.  You may
> need to experiment with your configuration for best results.
> 
> Thanks,
> Shawn
> 



Re: SolrCloud replicas out of sync

2016-01-26 Thread David Smith
Thanks Jeff!  A few comments

>>
>> Although you could probably bounce a node and get your document counts back 
>> in sync (by provoking a check)
>>
 

If the check is a simple doc count, that will not work. We have found that 
replica1 and replica3, although they contain the same doc count, don’t have the 
SAME docs.  They each missed at least one update, but of different docs.  This 
also means none of our three replicas are complete.

>>
>>it’s interesting that you’re in this situation. It implies to me that at some 
>>point the leader couldn’t write a doc to one of the replicas,
>>

That is our belief as well. We experienced a datacenter-wide network disruption 
of a few seconds, and user complaints started the first workday after that 
event.  

The most interesting log entry during the outage is this:

"1/19/2016, 5:08:07 PM ERROR null DistributedUpdateProcessorRequest says it is 
coming from leader,​ but we are the leader: 
update.distrib=FROMLEADER&distrib.from=http://dot.dot.dot.dot:8983/solr/blah_blah_shard1_replica3/&wt=javabin&version=2";

>>
>> You might watch the achieved replication factor of your updates and see if 
>> it ever changes
>>

This is a good tip. I’m not sure I like the implication that any failure to 
write all 3 of our replicas must be retried at the app layer.  Is this really 
how SolrCloud applications must be built to survive network partitions without 
data loss? 

Regards,

David
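For reference, the watch-and-retry approach Jeff describes hinges on the achieved replication factor (rf) that Solr echoes back when an update request carries min_rf; a minimal decision helper (a sketch — the HTTP plumbing and retry loop are omitted, and the helper name is hypothetical):

```python
def achieved_rf_ok(update_response: dict, min_rf: int) -> bool:
    """True when the update reached at least min_rf replicas.
    Solr reports the achieved replication factor as 'rf' in the
    response header when the request included a min_rf parameter."""
    rf = update_response.get("responseHeader", {}).get("rf")
    return rf is not None and rf >= min_rf

# e.g. with 3 replicas, ask for min_rf=3 and retry or alert when rf < 3:
resp = {"responseHeader": {"status": 0, "rf": 2}}
print(achieved_rf_ok(resp, 3))  # False -> the write should be retried or logged
```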


On 1/26/16, 12:20 PM, "Jeff Wartes"  wrote:

>
>My understanding is that the "version" represents the timestamp the searcher 
>was opened, so it doesn’t really offer any assurances about your data.
>
>Although you could probably bounce a node and get your document counts back in 
>sync (by provoking a check), it’s interesting that you’re in this situation. 
>It implies to me that at some point the leader couldn’t write a doc to one of 
>the replicas, but that the replica didn’t consider itself down enough to check 
>itself.
>
>You might watch the achieved replication factor of your updates and see if it 
>ever changes:
>https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
> (See Achieved Replication Factor/min_rf)
>
>If it does, that might give you clues about how this is happening. Also, it 
>might allow you to work around the issue by trying the write again.
>
>
>
>
>
>
>On 1/22/16, 10:52 AM, "David Smith"  wrote:
>
>>(snip lots of stuff)


Re: SolrCloud replicas out of sync

2016-01-27 Thread David Smith
Jeff, again, very much appreciate your feedback.  

It is interesting — the article you linked to by Shalin is exactly why we 
picked SolrCloud over ES, because (eventual) consistency is critical for our 
application and we will sacrifice availability for it.  To be clear, after the 
outage, NONE of our three replicas are correct or complete.

So we definitely don’t have CP yet — our very first network outage resulted in 
multiple overlapped lost updates.  As a result, I can’t pick one replica and 
make it the new “master”.  I must rebuild this collection from scratch, which I 
can do, but that requires downtime which is a problem in our app (24/7 High 
Availability with few maintenance windows).


So, I definitely need to “fix” this somehow.  I wish I could outline a 
reproducible test case, but as the root cause is likely very tight timing 
issues and complicated interactions with Zookeeper, that is not really an 
option.  I’m happy to share the full logs of all 3 replicas though if that 
helps.

I am curious though if the thoughts have changed since 
https://issues.apache.org/jira/browse/SOLR-5468 of seriously considering a 
“majority quorum” model, with rollback?  Done properly, this should be free of 
all lost update problems, at the cost of availability.  Some SolrCloud users 
(like us!!!) would gladly accept that tradeoff.  

Regards

David


On 1/26/16, 4:32 PM, "Jeff Wartes"  wrote:

>
>Ah, perhaps you fell into something like this then? 
>https://issues.apache.org/jira/browse/SOLR-7844
>
>That says it’s fixed in 5.4, but that would be an example of a split-brain 
>type incident, where different documents were accepted by different replicas 
>who each thought they were the leader. If this is the case, and you actually 
>have different data on each replica, I’m not aware of any way to fix the 
>problem short of reindexing those documents. Before that, you’ll probably need 
>to choose a replica and just force the others to get in sync with it. I’d 
>choose the current leader, since that’s slightly easier.
>
>Typically, a leader writes an update to its transaction log, then sends the 
>request to all replicas, and when those all finish it acknowledges the update. 
>If a replica gets restarted, and is less than N documents behind, the leader 
>will only replay that transaction log. (Where N is the numRecordsToKeep 
>configured in the updateLog section of solrconfig.xml)
>
>What you want is to provoke the heavy-duty process normally invoked if a 
>replica has missed more than N docs, which essentially does a checksum and 
>file copy on all the raw index files. FetchIndex would probably work, but it’s 
>a replication handler API originally designed for master/slave replication, so 
>take care: https://wiki.apache.org/solr/SolrReplication#HTTP_API
>Probably a lot easier would be to just delete the replica and re-create it. 
>That will also trigger a full file copy of the index from the leader onto the 
>new replica.
>
>I think design decisions around Solr generally use CP as a goal. (I sometimes 
>wish I could get more AP behavior!) See posts like this: 
>http://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks/
> 
>So the fact that you encountered this sounds like a bug to me.
>That said, another general recommendation (of mine) is that you not use Solr 
>as your primary data source, so you can rebuild your index from scratch if you 
>really need to. 
>
>
>
>
>
>
>On 1/26/16, 1:10 PM, "David Smith"  wrote:
>
>>Thanks Jeff!  A few comments
>>
>>>>
>>>> Although you could probably bounce a node and get your document counts 
>>>> back in sync (by provoking a check)
>>>>
>> 
>>
>>If the check is a simple doc count, that will not work. We have found that 
>>replica1 and replica3, although they contain the same doc count, don’t have 
>>the SAME docs.  They each missed at least one update, but of different docs.  
>>This also means none of our three replicas are complete.
>>
>>>>
>>>>it’s interesting that you’re in this situation. It implies to me that at 
>>>>some point the leader couldn’t write a doc to one of the replicas,
>>>>
>>
>>That is our belief as well. We experienced a datacenter-wide network 
>>disruption of a few seconds, and user complaints started the first workday 
>>after that event.  
>>
>>The most interesting log entry during the outage is this:
>>
>>"1/19/2016, 5:08:07 PM ERROR null DistributedUpdateProcessor Request says it 
>>is coming from leader, but we are the leader: 
>>update.distrib=FROMLEADER&distrib.from=http://dot.dot.dot.dot:8983/solr/blah_blah_shard1_replica3/&wt=javabin&version
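[Editor's note] Deleting and re-creating a replica via the Collections API, as suggested above, could be sketched roughly as follows. The host, collection, shard, and replica names are hypothetical placeholders; the commands are echoed here rather than executed so you can inspect them first.

```shell
# Hypothetical names -- substitute your own cluster's values.
SOLR="http://localhost:8983/solr"
COLL="blah_blah"

# 1) Drop the out-of-sync replica...
DELETE_URL="${SOLR}/admin/collections?action=DELETEREPLICA&collection=${COLL}&shard=shard1&replica=core_node3"
# 2) ...then add a fresh one, which triggers a full index fetch (file copy)
#    from the current leader.
ADD_URL="${SOLR}/admin/collections?action=ADDREPLICA&collection=${COLL}&shard=shard1"

# When ready, pass each to curl, e.g.: curl "$DELETE_URL"
echo "$DELETE_URL"
echo "$ADD_URL"
```

Wait for the new replica to reach the "active" state in clusterstate before deleting and re-adding the next one.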

Re: SolrCloud replicas out of sync

2016-01-27 Thread David Smith
Sure.  Here is our SolrCloud cluster:

   + Three (3) instances of Zookeeper on three separate (physical) servers.  
The ZK servers are beefy and fairly recently built, with 2x10 GigE (bonded) 
Ethernet connectivity to the rest of the data center.  We recognize the 
importance of the stability and responsiveness of ZK to the stability of 
SolrCloud as a whole.

   + 364 collections, all with single shards and a replication factor of 3.  
Currently housing only 100,000,000 documents in aggregate.  Expected to grow to 
25 billion+.  The size of a single document would be considered “large”, by the 
standards of what I’ve seen posted elsewhere on this mailing list. 

We are always open to ZK recommendations from you or anyone else, particularly 
for running a SolrCloud cluster of this size.

Kind Regards,

David



On 1/27/16, 12:46 PM, "Jeff Wartes"  wrote:

>
>If you can identify the problem documents, you can just re-index those after 
>forcing a sync. Might save a full rebuild and downtime.
>
>You might describe your cluster setup, including ZK. It sounds like you’ve 
>done your research, but improper ZK node distribution could certainly 
>invalidate some of Solr’s assumptions.
>
>
>
>
>On 1/27/16, 7:59 AM, "David Smith"  wrote:
>
>>Jeff, again, very much appreciate your feedback.  
>>
>>It is interesting — the article you linked to by Shalin is exactly why we 
>>picked SolrCloud over ES, because (eventual) consistency is critical for our 
>>application and we will sacrifice availability for it.  To be clear, after 
>>the outage, NONE of our three replicas are correct or complete.
>>
>>So we definitely don’t have CP yet — our very first network outage resulted 
>>in multiple overlapped lost updates.  As a result, I can’t pick one replica 
>>and make it the new “master”.  I must rebuild this collection from scratch, 
>>which I can do, but that requires downtime which is a problem in our app 
>>(24/7 High Availability with few maintenance windows).
>>
>>
>>So, I definitely need to “fix” this somehow.  I wish I could outline a 
>>reproducible test case, but as the root cause is likely very tight timing 
>>issues and complicated interactions with Zookeeper, that is not really an 
>>option.  I’m happy to share the full logs of all 3 replicas though if that 
>>helps.
>>
>>I am curious though if the thoughts have changed since 
>>https://issues.apache.org/jira/browse/SOLR-5468 of seriously considering a 
>>“majority quorum” model, with rollback?  Done properly, this should be free 
>>of all lost update problems, at the cost of availability.  Some SolrCloud 
>>users (like us!!!) would gladly accept that tradeoff.  
>>
>>Regards
>>
>>David
>>
>>



Re: SolrCloud replicas out of sync

2016-01-29 Thread David Smith
Tomás,

Good find, but I don’t think the rate of updates was high enough during the 
network outage to create the overrun situation described in the ticket.

I did notice that one of the proposed fixes, 
https://issues.apache.org/jira/browse/SOLR-8586, is an entire-index consistency 
check between leader and replica.  I really hope they are able to get this to 
work.  Ideally, the replicas would never become (permanently) inconsistent, but 
given that they do, it is crucial that SolrCloud can internally detect and fix, 
no matter what caused it or how long ago it happened.


Regards,

David



On 1/28/16, 1:08 PM, "Tomás Fernández Löbbe"  wrote:

>Maybe you are hitting the reordering issue described in SOLR-8129?
>
>Tomás
>
>On Wed, Jan 27, 2016 at 11:32 AM, David Smith 
>wrote:
>
>> Sure.  Here is our SolrCloud cluster:
>>
>>+ Three (3) instances of Zookeeper on three separate (physical)
>> servers.  The ZK servers are beefy and fairly recently built, with 2x10
>> GigE (bonded) Ethernet connectivity to the rest of the data center.  We
>> recognize importance of the stability and responsiveness of ZK to the
>> stability of SolrCloud as a whole.
>>
>>+ 364 collections, all with single shards and a replication factor of
>> 3.  Currently housing only 100,000,000 documents in aggregate.  Expected to
>> grow to 25 billion+.  The size of a single document would be considered
>> “large”, by the standards of what I’ve seen posted elsewhere on this
>> mailing list.
>>
>> We are always open to ZK recommendations from you or anyone else,
>> particularly for running a SolrCloud cluster of this size.
>>
>> Kind Regards,
>>
>> David
>>
>>
>>
>> On 1/27/16, 12:46 PM, "Jeff Wartes"  wrote:
>>
>> >
>> >If you can identify the problem documents, you can just re-index those
>> after forcing a sync. Might save a full rebuild and downtime.
>> >
>> >You might describe your cluster setup, including ZK. it sounds like
>> you’ve done your research, but improper ZK node distribution could
>> certainly invalidate some of Solr’s assumptions.
>> >
>> >
>> >
>> >
>> >On 1/27/16, 7:59 AM, "David Smith"  wrote:
>> >
>> >>Jeff, again, very much appreciate your feedback.
>> >>
>> >>It is interesting — the article you linked to by Shalin is exactly why
>> we picked SolrCloud over ES, because (eventual) consistency is critical for
>> our application and we will sacrifice availability for it.  To be clear,
>> after the outage, NONE of our three replicas are correct or complete.
>> >>
>> >>So we definitely don’t have CP yet — our very first network outage
>> resulted in multiple overlapped lost updates.  As a result, I can’t pick
>> one replica and make it the new “master”.  I must rebuild this collection
>> from scratch, which I can do, but that requires downtime which is a problem
>> in our app (24/7 High Availability with few maintenance windows).
>> >>
>> >>
>> >>So, I definitely need to “fix” this somehow.  I wish I could outline a
>> reproducible test case, but as the root cause is likely very tight timing
>> issues and complicated interactions with Zookeeper, that is not really an
>> option.  I’m happy to share the full logs of all 3 replicas though if that
>> helps.
>> >>
>> >>I am curious though if the thoughts have changed since
>> https://issues.apache.org/jira/browse/SOLR-5468 of seriously considering
>> a “majority quorum” model, with rollback?  Done properly, this should be
>> free of all lost update problems, at the cost of availability.  Some
>> SolrCloud users (like us!!!) would gladly accept that tradeoff.
>> >>
>> >>Regards
>> >>
>> >>David
>> >>
>> >>
>>
>>



docValues error

2016-02-28 Thread David Santamauro


I'm porting a 4.8 schema to 5.3 and I came across this new error when I 
tried to group.field=f1:


unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). 
Use UninvertingReader or index with docvalues.


f1 is defined as

positionIncrementGap="100">

  



  


  required="true" />


Notice that I don't have docValues defined. I realize the field type 
doesn't allow docValues so why does this group request fail with a 
docValues error? It did work with 4.8


Any clue would be appreciated, thanks

David


Re: docValues error

2016-02-29 Thread David Santamauro


So I started over (deleted all documents), re-deployed configs to 
zookeeper and reloaded the collection.


This error still appears when I group.field=f1

unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). 
Use UninvertingReader or index with docvalues.


What exactly does this error mean and why am I getting it with a field 
that doesn't even have docValues defined?


Why is the DocValues code being used when docValues are not defined 
anywhere in my schema.xml?



null:java.lang.IllegalStateException: unexpected docvalues type 
SORTED_SET for field 'f1' (expected=SORTED). Use UninvertingReader or 
index with docvalues.

at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
	at 
org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector.doSetNextReader(TermFirstPassGroupingCollector.java:92)
	at 
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
	at 
org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:117)
	at 
org.apache.lucene.search.TimeLimitingCollector.getLeafCollector(TimeLimitingCollector.java:144)
	at 
org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:117)

at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:763)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
	at 
org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:233)
	at 
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:160)
	at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:398)


etc ...



On 02/28/2016 05:31 PM, David Santamauro wrote:


I'm porting a 4.8 schema to 5.3 and I came across this new error when I
tried to group.field=f1:

unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED).
Use UninvertingReader or index with docvalues.

f1 is defined as

 
   
 
 
 
   
 

   

Notice that I don't have docValues defined. I realize the field type
doesn't allow docValues so why does this group request fail with a
docValues error? It did work with 4.8

Any clue would be appreciated, thanks

David


Re: docValues error

2016-02-29 Thread David Santamauro




On 02/29/2016 06:05 AM, Mikhail Khludnev wrote:

On Mon, Feb 29, 2016 at 12:43 PM, David Santamauro <
david.santama...@gmail.com> wrote:


unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED). Use
UninvertingReader or index with docvalues.


  DocValues is the first-class API for accessing the forward view of the index,
i.e. it replaced FieldCache. The error is caused by an attempt to group by a
multivalued field, which the documentation explicitly states is unsupported.



You will have noticed below that the field definition does not contain 
multiValued="true"




On 02/28/2016 05:31 PM, David Santamauro wrote:



f1 is defined as

  

  
  
  

  





Re: docValues error

2016-02-29 Thread David Santamauro



On 02/29/2016 07:59 AM, Tom Evans wrote:

On Mon, Feb 29, 2016 at 11:43 AM, David Santamauro
 wrote:

You will have noticed below that the field definition does not contain
multiValued="true"


What version of the schema are you using? In pre 1.1 schemas,
multiValued="true" is the default if it is omitted.


1.5

Other single-valued fields (tint, string) group correctly. The move from 
4.8 to 5.3 has crippled grouping on populated, single-valued 
solr.TextField fields -- at least for me.


Re: docValues error

2016-02-29 Thread David Santamauro


thanks Shawn, that seems to be the error exactly.

On 02/29/2016 09:22 AM, Shawn Heisey wrote:

On 2/28/2016 3:31 PM, David Santamauro wrote:


I'm porting a 4.8 schema to 5.3 and I came across this new error when
I tried to group.field=f1:

unexpected docvalues type SORTED_SET for field 'f1' (expected=SORTED).
Use UninvertingReader or index with docvalues.

f1 is defined as

 
   
 
 
 
   
 

   

Notice that I don't have docValues defined. I realize the field type
doesn't allow docValues so why does this group request fail with a
docValues error? It did work with 4.8

Any clue would be appreciated, thanks


It sounds like you are running into pretty much exactly what I did with 5.x.

https://issues.apache.org/jira/browse/SOLR-8088

I had to create a copyField that's a string (StrField) type and include
docValues on that field.  I still can't use my tokenized field like I
want to, as I do in 4.x.

Thanks,
Shawn
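[Editor's note] The workaround Shawn describes in SOLR-8088 amounts to adding an untokenized copy of the field and grouping on that instead. A sketch, with hypothetical field names, might look like this in schema.xml:

```xml
<!-- Untokenized, docValues-enabled copy of f1, for grouping/sorting only -->
<field name="f1_str" type="string" indexed="true" stored="false" docValues="true"/>
<copyField source="f1" dest="f1_str"/>
<!-- then query with group.field=f1_str instead of group.field=f1 -->
```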



Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-15 Thread David Smiley
Hi Pradeep,

Are you seeing an error when it doesn't work?  I believe a shape
overlapping itself will cause an error from JTS.  If you do see that, then
you can ask Spatial4j (used by Lucene/Solr) to attempt to deal with it in a
number of ways.  See "validationRule":
https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/context/jts/JtsSpatialContextFactory.html
Probably try validationRule="repairBuffer0".

If it still doesn't work (and if you can't use what I say next), I
suggest debugging this at the JTS level.  You might then wind up
submitting a question to the JTS list.

Spatial4j extends the WKT syntax with a BUFFER() syntax which is possibly
easier/better than your approach of manually building up the buffered path
with your own code to produce a large polygon to send to Solr.  You would
do something like BUFFER(LINESTRING(...),0.001), where the second argument is
the distance in degrees if you have geo="true", otherwise whatever units your
data was put in.
a native BufferedLineString shape.  But FYI it doesn't support geo="true"
very well (i.e. working in degrees); the buffer will be skewed very much
away from the equator.  So you could set geo="false" and supply, say,
web-mercator bounding box and work in that Euclidean/2D projected space.

Another FYI, Lucene has a "Geo3d" package within the Spatial3d module that
has a native implementation of a buffered LineString as well, one that
works on the surface of the earth.  It hasn't yet been hooked into
Spatial4j, after which Solr would need no changes.  There's a user "Chris"
who is working on that; it's filed here:
https://github.com/locationtech/spatial4j/issues/134

Good luck.

~ David


On Tue, Mar 15, 2016 at 2:45 PM Pradeep Chandra <
pradeepchandra@gmail.com> wrote:

> Hi Sir,
>
> I want to draw a polyline along the route given by google maps (from one
> place to another place).
>
> I applied the logic of calculating parallel lines between the two markers
> on the route, on both sides of the route. Because of the non-linear nature
> of the route, in some cases the polyline overlaps itself.
>
> Finally what I am willing to do is by drawing that polyline along the
> route. I will give that polygon go Solr in order to get the results within
> the polygon. But where the problem I am getting is because of the
> overlapping nature of polyline, the Solr is not taking that shape.
>
> Can you suggest a way to draw a polyline along the route, or let me know
> whether there is any way to fetch data with that type of polyline in Solr?
>
> I constructed a polygon with 300 points, but Solr gives no results for it,
> whereas it does return results for polygons with fewer than 200 points. Can
> you tell me the maximum number of points allowed when constructing a polygon
> in Solr, or whether it is restricted to some number of points?
>
> I am sending some images of my final desired one & my applied one. Please
> find those attachments.
>
> Thanks and Regards
> M Pradeep Chandra
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
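[Editor's note] The BUFFER() form suggested above goes into an ordinary spatial filter query. A sketch follows; the field name "geo", the collection name, and the coordinates are hypothetical, and with geo="false" the 0.001 is in the units of your projected space. The filter is echoed rather than executed.

```shell
# Buffered-linestring filter using the Spatial4j BUFFER() WKT extension.
FQ='{!field f=geo}Intersects(BUFFER(LINESTRING(10 10, 20 20, 30 15), 0.001))'
# Run with:
#   curl -G "http://localhost:8983/solr/places/select" \
#        --data-urlencode "q=*:*" --data-urlencode "fq=$FQ"
echo "$FQ"
```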


Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-19 Thread David Smiley
JTS doesn't have any vertex limit on the geometries.  So I don't know why
your query isn't working.

On Wed, Mar 16, 2016 at 1:58 AM Pradeep Chandra <
pradeepchandra@gmail.com> wrote:

> Hi Sir,
>
> Let me give some clarification on the IsWithin(POLYGON(())) query... It is
> not giving any results for polygons beyond 213 points...
>
> Thanks
> M Pradeep Chandra
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Regarding-google-maps-polyline-to-use-IsWithin-POLYGON-in-solr-tp4263975p4264046.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Seasonal searches in SOLR 5.x

2016-03-22 Thread David Smiley
Hi,

I suggest having a "season" field (or whatever you might want to call it)
using DateRangeField but simply use a nominal year value.  So basically all
durations would be within this nominal year.  For some docs that span
new-years, this might mean 2 durations and that's okay.  Also it's okay if
you have multiple values and it's okay if your calculations result in some
that overlap; you needn't make them distinct; it'll all get coalesced in
the index.

If for some reason you wind up going the route of abusing point data for
durations, I recommend this link:
http://wiki.apache.org/solr/SpatialForTimeDurations
and it most definitely does not require polygons (and thus JTS); I'm not
sure what gave you that impression.  It's all rectangles & points.

~ David

On Mon, Mar 21, 2016 at 1:29 PM Ioannis Kirmitzoglou <
ioanniskirmitzog...@gmail.com> wrote:

> Hi all,
>
> I would like to implement seasonal date searches on date ranges. I’m using
> SOLR 5.4.1 and have indexed date ranges using a DateRangeField (let’s call
> this field date_ranges).
> Each document in SOLR corresponds to a biological sample and each sample
> was collected during a date range that can span from a single day to
> multiple years. For my application it makes sense to enable seasonal
> searches, ie find samples that were collected during a specific period of
> the year (e.g. summer, or February). In this type of search, the year that
> the sample was collected is not relevant, only the days of the year. I’ve
> been all over SOLR documentation and I haven’t been able to find anything
> that will enable do me that. The closest I got was a post with instructions
> on how to use a spatial field to do date searches (
> https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/).
> Using the logic in that post I was able to come up with a solution but it’s
> rather complex and needs polygon searches (which in turn means installing
> the JTS Topology suite).
> Before committing to that I would like to ask for your input and whether
> there’s an easier way to do these types of searches.
>
> Many thanks,
>
> Ioannis
>
> -
> Ioannis Kirmitzoglou, PhD
> Bioinformatician - Scientific Programmer
> Imperial College, London
> www.vectorbase.org
> www.vigilab.org
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
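[Editor's note] The nominal-year approach above can be sketched as: index every collection period as a DateRangeField range inside a single placeholder year (2000 here), then search with an Intersects filter on that same nominal year. The field name "season" and collection name are hypothetical; the filter is echoed rather than executed.

```shell
# "Summer" search against a nominal-year DateRangeField -- the real
# collection year never appears in the index, only days-of-year do.
FQ='{!field f=season op=Intersects}[2000-06-01 TO 2000-08-31]'
# Run with:
#   curl -G "http://localhost:8983/solr/samples/select" \
#        --data-urlencode "q=*:*" --data-urlencode "fq=$FQ"
echo "$FQ"
```

A sample spanning new-years would be indexed with two ranges, e.g. [2000-12-15 TO 2000-12-31] and [2000-01-01 TO 2000-01-10].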


Re: Deleted documents and expungeDeletes

2016-03-30 Thread David Santamauro



On 03/30/2016 08:23 AM, Jostein Elvaker Haande wrote:

On 30 March 2016 at 12:25, Markus Jelsma  wrote:

Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, and 
frequent updates, it is not uncommon to see a ratio of 25%. If you want deletes 
to be reclaimed more often, e.g. weight of 4.0, you will see very frequent 
merging of large segments, killing performance if you are on spinning disks.


Most of our installations are on spinning disks, so if I want a more
aggressive reclaim, this will impact performance. This is of course
something that I do not desire, so I'm wondering if scheduling a
commit with 'expungeDeletes' during off peak business hours is a
better approach than setting up a more aggressive merge policy.



As far as my experimentation with @expungeDeletes goes, if the data you 
indexed and committed with @expungeDeletes didn't touch any segments 
containing deleted documents, and wasn't enough data to trigger a merge 
with such a segment, then no deleted documents will be removed. Basically, 
@expungeDeletes expunges deletes only in segments affected by the commit. 
If you have a large update that touches many segments containing deleted 
documents and you use @expungeDeletes, it could be just as 
resource-intensive as an optimize.


My setting for reclaimDeletesWeight:
  <double name="reclaimDeletesWeight">5.0</double>

It keeps the deleted documents down to ~ 10% without any noticable 
impact on resources or performance. But I'm still in the testing phase 
with this setting.
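[Editor's note] The scheduled off-peak commit discussed above is a plain update call with expungeDeletes set. A sketch follows; the host and core name are placeholders, and the command is echoed rather than executed.

```shell
# Hard commit that also expunges deletes from the segments the commit touches.
URL="http://localhost:8983/solr/mycore/update?commit=true&expungeDeletes=true"
# Run with: curl "$URL"  (e.g. from an off-peak cron job)
echo "$URL"
```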




Re: Deleted documents and expungeDeletes

2016-04-01 Thread David Santamauro


The docs on reclaimDeletesWeight say:

"Controls how aggressively merges that reclaim more deletions are 
favored. Higher values favor selecting merges that reclaim deletions."


I can't imagine you would notice anything after only a few commits. I 
have many shards that size or larger and what I do occasionally is to 
loop an optimize, setting maxSegments with decremented values, e.g.,


for maxSegments in $( seq 40 -1 20 ); do
  # optimize down to $maxSegments segments (host/collection are placeholders)
  curl "http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=${maxSegments}"
done

It's definitely a poor-man's hack and is clearly not the most efficient 
way of optimizing, but it does remove deletes without requiring double 
or triple the disk space that a full optimize requires. I can usually 
reclaim 100-300GB of disk space in a collection that is currently ~ 2TB 
-- not inconsequential.


Seeing you only have 1.6M documents, perhaps an index rebuild isn't out 
of the question? I did just that on a test collection with 100M 
documents. Starting with 0 deleted docs, a reclaimDeletesWeight=5.0 and 
probably about 1-3% document turnover per week (updates) over the last 3 
months and my deleted percentage is staying below 10%.


If that's not an option, keeping reclaimDeletesWeight at 5.0 and using 
expungeDeletes=true on commit will get that percentage down over time.


//


On 04/01/2016 04:49 AM, Jostein Elvaker Haande wrote:

On 30 March 2016 at 17:46, Erick Erickson  wrote:

through a clever bit of reflection, you can set the
reclaimDeletesWeight variable from solrconfig by including something
like <double name="reclaimDeletesWeight">5</double> (going from memory
here, you'll get an error on startup if I've messed it up.)


I added the following to my solrconfig a couple of days ago:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">8</int>
  <int name="segmentsPerTier">8</int>
  <double name="reclaimDeletesWeight">5.0</double>
</mergePolicy>

There has been several commits and the core is current according to
SOLR admin, however I'm still seeing a lot of deleted docs. These are
my current core statistics.

Last Modified:4 minutes ago
Num Docs:1 675 255
Max Doc:2 353 476
Heap Memory Usage:208 464 267
Deleted Docs:678 221
Version:1 870 539
Segment Count:39

Index size is close to 149GB.

So at the moment, I'm seeing a deleted docs to max docs percentage
ratio of 28.81%. With 'reclaimDeletesWeight' set to 5, it doesn't seem to be
deleting away any deleted docs.

Anything obvious I'm missing?



Solr update fails with “Could not initialize class sun.nio.fs.LinuxNativeDispatcher”

2016-04-07 Thread David Moles
Hi folks,

New Solr user here, attempting to apply the following Solr update command via 
curl

curl 'my-solr-server:8983/solr/my-core/update?commit=true' \
  -H 'Content-type:application/json' -d \
  '[{"my_id_field":"some-id-value","my_other_field":{"set":"new-field-value"}}]'

I'm getting an error response with a stack trace that reduces to:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
sun.nio.fs.LinuxNativeDispatcher
at sun.nio.fs.LinuxFileSystem.getMountEntries(LinuxFileSystem.java:81)
at sun.nio.fs.LinuxFileStore.findMountEntry(LinuxFileStore.java:86)
at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:65)
at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:44)
at 
sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:51)
at 
sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:39)
at 
sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368)
at java.nio.file.Files.getFileStore(Files.java:1461)
at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:528)
at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:483)
at org.apache.lucene.util.IOUtils.spins(IOUtils.java:472)
at org.apache.lucene.util.IOUtils.spins(IOUtils.java:447)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.initDynamicDefaults(ConcurrentMergeScheduler.java:371)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:457)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1817)
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2761)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2866)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2833)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:586)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
... 22 more

It looks like sun.nio.fs can't find its own classes, which seems odd. Solr is 
running with OpenJDK 1.8.0_77 on Amazon Linux AMI release 2016.03.

Does anyone know what might be going on here? Is it an OpenJDK / Amazon Linux 
problem?

--
David Moles
UC Curation Center
California Digital Library




Re: Solr update fails with “Could not initialize class sun.nio.fs.LinuxNativeDispatcher”

2016-04-07 Thread David Moles
Hmm, I wonder whether I *am* using an SSD or spinning disk, in Apache. :) I 
guess I can try to find out.

I forgot to mention, this is with Solr 5.2.1 — is that likely to make much 
difference?

-- 
David Moles
UC Curation Center
California Digital Library










On 4/7/16, 4:19 PM, "Chris Hostetter"  wrote:

>
>That's a strange error to get.
>
>I can't explain why LinuxFileSystem can't load LinuxNativeDispatcher, but 
>you can probably bypass hte entire situation by explicitly configuring 
>ConcurrentMergeScheduler with defaults so that it doesn't try determine 
>wether you are using an SSD or "spinning" disk...
>
>http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/ConcurrentMergeScheduler.html
>https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments
>
>Something like this in your indexConfig settings...
>
><mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>  <int name="maxMergeCount">42</int>
>  <int name="maxThreadCount">7</int>
></mergeScheduler>
>
>...will force those specific settings, instead of trying to guess 
>defaults.
>
>I haven't tested this, but in theory you can also use something like this to 
>indicate definitively that you are using a spinning disk (or not) but let 
>it pick the appropriate default values for the merge count & 
>threads accordingly ...
>
>
>  true
>
>
>
>
>: Date: Thu, 7 Apr 2016 22:56:54 +
>: From: David Moles 
>: Reply-To: solr-user@lucene.apache.org
>: To: "solr-user@lucene.apache.org" 
>: Subject: Solr update fails with “Could not initialize class
>: sun.nio.fs.LinuxNativeDispatcher”
>: 
>: Hi folks,
>: 
>: New Solr user here, attempting to apply the following Solr update command 
>via curl
>: 
>: curl 'my-solr-server:8983/solr/my-core/update?commit=true' \
>:   -H 'Content-type:application/json' -d \
>:   
>'[{"my_id_field":"some-id-value","my_other_field":{"set":"new-field-value"}}]'
>: 
>: I'm getting an error response with a stack trace that reduces to:
>: 
>: Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
>sun.nio.fs.LinuxNativeDispatcher
>: at sun.nio.fs.LinuxFileSystem.getMountEntries(LinuxFileSystem.java:81)
>: at sun.nio.fs.LinuxFileStore.findMountEntry(LinuxFileStore.java:86)
: at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:65)
: at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:44)
>: at 
>sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:51)
>: at 
>sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:39)
>: at 
>sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368)
>: at java.nio.file.Files.getFileStore(Files.java:1461)
>: at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:528)
>: at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:483)
>: at org.apache.lucene.util.IOUtils.spins(IOUtils.java:472)
>: at org.apache.lucene.util.IOUtils.spins(IOUtils.java:447)
>: at 
>org.apache.lucene.index.ConcurrentMergeScheduler.initDynamicDefaults(ConcurrentMergeScheduler.java:371)
>: at 
>org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:457)
>: at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1817)
>: at 
>org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2761)
>: at 
>org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2866)
>: at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2833)
>: at 
>org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:586)
>: at 
>org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
>: at 
>org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>: at 
>org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
>: at 
>org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
>: at 
>org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
>: at 
>org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>: at 
>org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
>: at 
>org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>: at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>: at org.apache.solr.serv

Solr Support for BM25F

2016-04-14 Thread David Cawley
Hello,
I am developing an enterprise search engine for a project and I was hoping
to implement BM25F ranking algorithm to configure the tuning parameters on
a per field basis. I understand BM25 similarity is now supported in Solr
but I was hoping to be able to configure k1 and b for different fields such
as title, description, anchor etc, as they are structured documents.
I am fairly new to Solr, so any help would be appreciated. If this is
possible, any pointers on how to go about implementing it would be greatly
appreciated.

Regards,

David

Current Solr Version 5.4.1
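For reference, per-field k1/b tuning boils down to adjusting BM25's term-saturation (k1) and length-normalization (b) knobs independently per field; Solr's SchemaSimilarityFactory lets you do this by giving each field type its own BM25SimilarityFactory with its own k1 and b. A minimal sketch of the scoring math in Python (not Solr code; the field names and statistics below are made up):

```python
import math

def bm25_term_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 contribution of one term in one field.
    k1 controls term-frequency saturation, b controls length normalization."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm

# Hypothetical per-field tuning: short fields like title often get a lower b.
params = {"title": dict(k1=1.2, b=0.3), "description": dict(k1=1.4, b=0.75)}

def doc_score(field_stats):
    """field_stats: field -> (tf, df, n_docs, doc_len, avg_len)."""
    return sum(bm25_term_score(*stats, **params[field])
               for field, stats in field_stats.items())
```

Note that true BM25F sums field-weighted term frequencies into a single saturation step, whereas per-field similarities saturate each field separately and sum the scores; for many structured-document use cases the latter is close enough.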


Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-04-19 Thread David Smiley
Hi Anton,

Perhaps you should request a more detailed / high-res heatmap, and then
work with that, perhaps using some clustering technique?  I confess I don't
work on the UI end of things these days.

p.s. I'm on vacation this week; so I don't respond quickly

~ David

On Thu, Apr 7, 2016 at 3:43 PM Anton K.  wrote:

> I am working with a new Solr feature: facet heatmaps. It works great: I
> create clusters on my map with counts. When a user clicks on a cluster I
> zoom in to that area and may show more clusters or documents (based on the
> current zoom level).
>
> But all my cluster icons (i use round one, see screenshot below) placed
> straight in the center of cluster's rectangles:
>
> https://dl.dropboxusercontent.com/u/1999619/images/map_grid3.png
>
> Some clusters can end up in the sea, and so on. It also feels unnatural in
> my case to have the icons placed so orderly on the world map.
>
> I want to place each cluster's icon at the average coordinates of all the
> docs inside the cluster. Is there any way to achieve this? I tried to use
> the stats component with the facet heatmap, but that isn't implemented yet.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
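Until heatmap stats exist, one client-side workaround for the icon-placement problem is to request a higher-resolution heatmap and take the count-weighted centroid of the grid cells that fall inside each cluster rectangle. A sketch (the cell coordinates and counts are hypothetical):

```python
def weighted_centroid(cells):
    """cells: (lat, lon, count) triples for the heatmap grid-cell centers
    inside one cluster rectangle. Returns the count-weighted average
    position, so the icon lands where the documents concentrate rather
    than at the rectangle's geometric center (e.g. not in the sea)."""
    total = sum(count for _, _, count in cells)
    if total == 0:
        return None
    lat = sum(la * c for la, _, c in cells) / total
    lon = sum(lo * c for _, lo, c in cells) / total
    return lat, lon
```

Naive averaging of longitudes breaks for clusters straddling the antimeridian; for those, shift longitudes into a continuous range before averaging.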


Re: Replicas for same shard not in sync

2016-04-25 Thread David Smith
Erick,

So that my understanding is correct, let me ask: if one or more replicas are 
down, updates presented to the leader still succeed, right?  If so, tedsolr is 
correct that the Solr client app needs to re-issue updates, if it wants 
stronger guarantees on replica consistency than what Solr provides.

The “Write Fault Tolerance” section of the Solr Wiki makes what I believe is 
the same point:

"On the client side, if the achieved replication factor is less than the 
acceptable level, then the client application can take additional measures to 
handle the degraded state. For instance, a client application may want to keep 
a log of which update requests were sent while the state of the collection was 
degraded and then resend the updates once the problem has been resolved."


https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance


Kind Regards,

David




On 4/25/16, 11:57 AM, "Erick Erickson"  wrote:

>bq: I also read that it's up to the
>client to keep track of updates in case commits don't happen on all the
>replicas.
>
>This is not true. Or if it is it's a bug.
>
>The update cycle is this:
>1> updates get to the leader
>2> updates are sent to all followers and indexed on the leader as well
>3> each replica writes the updates to the local transaction log
>4> all the replicas ack back to the leader
>5> the leader responds to the client.
>
>At this point, all the replicas for the shard have the docs locally
>and can take over as leader.
>
>You may be confusing indexing in batches and having errors with
>updates getting to replicas. When you send a batch of docs to Solr,
>if one of them fails indexing some of the rest of the docs may not
>be indexed. See SOLR-445 for some work on this front.
>
>That said, bouncing servers willy-nilly during heavy indexing, especially
>if the indexer doesn't know enough to retry if an indexing attempt fails may
>be the root cause here. Have you verified that your indexing program
>retries in the event of failure?
>
>Best,
>Erick
>
>On Mon, Apr 25, 2016 at 6:13 AM, tedsolr  wrote:
>> I've done a bit of reading - found some other posts with similar questions.
>> So I gather "Optimizing" a collection is rarely a good idea. It does not
>> need to be condensed to a single segment. I also read that it's up to the
>> client to keep track of updates in case commits don't happen on all the
>> replicas. Solr will commit and return success as long as one replica gets
>> the update.
>>
>> I have a state where the two replicas for one collection are out of sync.
>> One has some updates that the other does not. And I don't have log data to
>> tell me what the differences are. This happened during a maintenance window
>> when the servers got restarted while a large index job was running. Normally
>> this doesn't cause a problem, but it did last Thursday.
>>
>> What I plan to do is select the replica I believe is incomplete and delete
>> it. Then add a new one. I was just hoping Solr had a solution for this -
>> maybe using the ZK transaction logs to replay some updates, or force a
>> resync between the replicas.
>>
>> I will also implement a fix to prevent Solr from restarting unless one of
>> its config files has changed. No need to bounce Solr just for kicks.
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Replicas-for-same-shard-not-in-sync-tp4272236p4272602.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



Re: issues doing a spatial query

2016-04-28 Thread David Smiley
Hi.
This makes sense to me.  The point 49.8,-97.1 is in your query box.  The
box is lower-left to upper-right, so your box is actually an almost
world-wrapping one grabbing all longitudes except  -93 to -92.  Maybe you
mean to switch your left & right.
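The wrap-around behavior is easy to check outside Solr: in a [lower-left TO upper-right] box, left > right means the box crosses the dateline. A sketch of the longitude test:

```python
def lon_in_box(lon, left, right):
    """Longitude containment for a lat,lon range query written
    lower-left TO upper-right. When left > right the box wraps the
    dateline and contains every longitude EXCEPT the (right, left) gap."""
    if left <= right:
        return left <= lon <= right
    return lon >= left or lon <= right

# The query [49,-92 TO 50,-93] has left=-92, right=-93, so it wraps:
assert lon_in_box(-97.1390697, -92, -93)  # the Winnipeg point matches
assert not lon_in_box(-92.5, -92, -93)    # only -93..-92 is excluded
```

Swapping to [49,-93 TO 50,-92] gives the intended narrow box that excludes -97.1.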

On Sun, Apr 24, 2016 at 8:03 PM GW  wrote:

> I was not getting the results I expected so I started testing with the solr
> webclient
>
> Maybe I don't understand things.
>
> simple test query
>
> q=*:*&fq=locations:[49,-92 TO 50,-93]
>
> I don't understand why I get a result set for longitude range -92 to -93
> when, as far as I understand, there should be zero results.
>
>
> 
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 2,
> "params": {
>   "q": "*:*",
>   "indent": "true",
>   "fq": "locations:[49,-92 TO 50,-93]",
>   "wt": "json",
>   "_": "1461541195102"
> }
>   },
>   "response": {
> "numFound": 85,
> "start": 0,
> "docs": [
>   {
> "id": "data.spidersilk.co!337",
> "entity_id": "337",
> "type_id": "simple",
> "gender": "Male",
> "name": "Aviator Sunglasses",
> "short_description": "A timeless accessory staple, the
> unmistakable teardrop lenses of our Aviator sunglasses appeal to
> everyone from suits to rock stars to citizens of the world.",
> "description": "Gunmetal frame with crystal gradient
> polycarbonate lenses in grey. ",
> "size": "",
> "color": "",
> "zdomain": "magento.spidersilk.co",
> "zurl":
> "
> http://magento.spidersilk.co/index.php/catalog/product/view/id/337/s/aviator-sunglasses/
> ",
> "main_image_url":
> "
> http://magento.spidersilk.co/media/catalog/product/cache/0/image/9df78eab33525d08d6e5fb8d27136e95/a/c/ace000a_1.jpg
> ",
> "keywords": "Eyewear  ",
> "data_size": "851,564",
> "category": "Eyewear",
> "final_price_without_tax": "295,USD",
> "image_url": [
>   "
> http://magento.spidersilk.co/media/catalog/product/a/c/ace000a_1.jpg";,
>   "
> http://magento.spidersilk.co/media/catalog/product/a/c/ace000b_1.jpg";
> ],
> "locations": [
>   "37.4463603,-122.1591775",
>   "42.5857514,-82.8873787",
>   "41.6942622,-86.2697108",
>   "49.8522263,-97.1390697"
> ],
> "_version_": 1532418847465799700
>   },
>
>
>
> Thanks,
>
> GW
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Solr - index polygons from csv

2016-04-28 Thread David Smiley
Hi.

To use polygons, you need to add JTS, otherwise you get an unsupported
shape error.  See
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
for details; it involves not only adding the JTS lib to your classpath (the
ideal spot is WEB-INF/lib) but also adding a spatialContextFactory
attribute.  Note that
the value of this attribute is different from 6.0 forward (as seen on the
live page), so get a PDF copy of the ref guide matching the Solr version
you are using if you are not on the latest.  Also, I recommend using
solr.RptWithGeometrySpatialField for indexing non-point data (and it'll
probably work fine for point data too).

When you use geo=false, there are no units or it might have an ignorable
value of degrees.  Essentially it's in whatever units your data is on the
Euclidean 2D plane.
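In other words, with geo=false a radius filter is plain Euclidean distance in whatever units the coordinates themselves use (the query point here looks like a projected CRS); distanceUnits no longer means kilometers. A sketch of the equivalent check:

```python
import math

def within_planar_distance(point, center, d):
    """geofilt-style check on a flat plane: d is in the same arbitrary
    units as the x,y coordinates themselves (e.g. meters if the data is
    in a metric projected CRS), not kilometers."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.hypot(dx, dy) <= d
```

Note also that bbox filters on the square circumscribing this circle, so I believe it can match slightly more than a true radius check.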

~ David

On Fri, Apr 22, 2016 at 4:33 AM Jan Nekuda  wrote:

> Hello guys,
> I use solr 6 for indexing data with points and polygons.
>
> I have a question about indexing polygons from csv file. I have configured
> type:
>  class="solr.SpatialRecursivePrefixTreeFieldType" geo="false"
> maxDistErr="0.001" worldBounds="ENVELOPE(-1,-1,-1,-1)"
> distErrPct="0.025" distanceUnits="kilometers"/>
>
> and field
>  stored="true"/>
>
> I have tried to import this csv:
>
> kod_adresa,nazev_ulice,cislo_orientacni,cislo_domovni,polygon_mapa,nazev_obec,Nazev_cast_obce,kod_ulice,kod_cast_obce,kod_obec,kod_momc,nazev_momc,Nazev,psc,nazev_vusc,kod_vusc,Nazev_okres,Kod_okres
> 9,,,4,"POLYGON ((-30 -10,-10 -20,-20 -40,-40 -40,-30
> -10))",Vacov,Javorník,,57843,550621,,,Stachy,38473,Jihočeský
> kraj,35,Prachatice,3306
>
> and result is:
>
> Posting files to [base] url http://localhost:8983/solr/ruian/update...
> Entering auto mode. File endings considered are
>
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file polygon.csv (text/csv) to [base]
> SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url:
> http://localhost:8983/solr/ruian/update
> SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int
> name="QTime">3</int></lst><lst name="error"><str
> name="error-class">org.apache.solr.common.SolrException</str><str
> name="root-error-class">java.lang.UnsupportedOperationException</str><str
> name="msg">Couldn't parse shape 'POLYGON ((-30 -10,-10 -20,-20 -40,-40
> -40,-30 -10))' because: java.lang.UnsupportedOperationException:
> Unsupported shape of this SpatialContext. Try JTS or Geo3D.</str><int
> name="code">400</int></lst>
> </response>
> SimplePostTool: WARNING: IOException while reading response:
> java.io.IOException: Server returned HTTP response code: 400 for URL:
> http://localhost:8983/solr/ruian/update
> 1 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/ruian/update.
> ..
> Time spent: 0:00:00.036
>
> Could someone give me some advice on how to solve this? Indexing points
> the same way works fine for me.
>
> and one more question:
> I have this field type:
>   class="solr.SpatialRecursivePrefixTreeFieldType"* geo="false*"
> maxDistErr="0.001"
> worldBounds="ENVELOPE(-1,-1,-1,-1)" distErrPct="0.025"
> distanceUnits="kilometers"/>
>
> if I use  geo=false for solr.SpatialRecursivePrefixTreeFieldType and I use
> this query:
>
> http://localhost:8983/solr/ruian/select?indent=on&q=*:*&fq={!bbox%20sfield=mapa}&pt=-818044.37%20-1069122.12&d=20
> <http://localhost:8983/solr/ruian/select?indent=on&q=*:*&fq=%7B!bbox%20sfield=mapa%7D&pt=-818044.37%20-1069122.12&d=20>
> <
> http://localhost:8983/solr/ruian/select?indent=on&q=*:*&fq=%7B!bbox%20sfield=mapa%7D&pt=-818044.37%20-1069122.12&d=20
> >
> for
> getting all object in distance. But I actually don't know in which units
> the distance is with this settings.
>
>
>
> Thank you very much
>
> Jan
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


11:12:25 ERROR SolrCore org.apache.solr.common.SolrException: undefined field 1948

2016-05-05 Thread Garfinkel, David
I'm new to administering Solr, but it is part of my DAM and I'd like to
have a better understanding. If I understand correctly I have a field in my
schema with uuid 1948 that is causing an issue right?

-- 
David Garfinkel
Digital Asset Management/Helpdesk/Systems Support
The Museum of Modern Art
212.708.9866
david_garfin...@moma.org


Re: 11:12:25 ERROR SolrCore org.apache.solr.common.SolrException: undefined field 1948

2016-05-05 Thread Garfinkel, David
Thanks Shawn!

On Thu, May 5, 2016 at 12:14 PM, Shawn Heisey  wrote:

> On 5/5/2016 9:52 AM, Garfinkel, David wrote:
> > I'm new to administering Solr, but it is part of my DAM and I'd like to
> > have a better understanding. If I understand correctly I have a field in
> my
> > schema with uuid 1948 that is causing an issue right?
>
> The data being indexed contains a field *named* 1948.  That is not the
> value of the field, it's the name.  Your schema does not contain a field
> named 1948, so Solr refuses to index the data.
>
> Thanks,
> Shawn
>
>


-- 
David Garfinkel
Digital Asset Management/Helpdesk/Systems Support
The Museum of Modern Art
212.708.9866
david_garfin...@moma.org


Re: relaxed vs. improved validation in solr.TrieDateField

2016-05-06 Thread David Smiley
Sorry to hear that Uwe Reh.

If this is just in your input/index data, then this could be handled with
a URP, maybe even an existing URP.
See ParseDateFieldUpdateProcessorFactory which uses the Joda-time API.  I
am not sure if that will work, I'm a little doubtful in fact since Solr now
uses the Java 8 time API which was taken, more or less, from Joda-time.
But it's worth a shot, any way.  If it doesn't work, let me know and I'll
give you a snippet of JavaScript you can use in your URP chain.
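The difference Uwe hit is strict versus lenient calendar resolution: a lenient parser typically rolls 1997-02-29 over to 1997-03-01, which is presumably why Solr 4.10 accepted these values, while java.time resolves strictly. If the original values must be preserved, one option is to pre-flag bad dates and store the raw string in a separate field; Python's strptime happens to apply the same strictness, so it makes a handy pre-check:

```python
from datetime import datetime

def is_strict_solr_date(value):
    """True only for dates that exist on the calendar; syntactically valid
    but impossible dates (Feb 29 in a non-leap year, month or day 00) are
    rejected, matching Solr 6's stricter java.time behavior."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False

assert not is_strict_solr_date("1997-02-29T00:00:00Z")  # 1997 is not a leap year
assert is_strict_solr_date("1996-02-29T00:00:00Z")      # 1996 is
```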

~ David

On Fri, Apr 29, 2016 at 4:07 AM Uwe Reh  wrote:

> Hi,
>
> doing some migration tests (4.10 to 6.0) I noticed an improved
> validation in TrieDateField.
> Syntactically correct but impossible dates are now rejected. (stack trace
> at the end of the mail)
>
> Examples:
> - '1997-02-29T00:00:00Z'
> - '2006-06-31T00:00:00Z'
> - '2000-00-00T00:00:00Z'
> The first two dates are formally OK, but the days do not exist. The
> third date is more suspicious, but it too was accepted by Solr 4.10.
>
> I appreciate this improvement in principle, but I have to respect the
> original data. The dates might be intentionally wrong.
>
> Is there an easy way to get the weaker validation back?
>
> Regards
> Uwe
>
>
> > Invalid Date in Date Math String:'1997-02-29T00:00:00Z'
> > at
> org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:254)
> > at
> org.apache.solr.schema.TrieField.createField(TrieField.java:726)
> > at
> org.apache.solr.schema.TrieField.createFields(TrieField.java:763)
> > at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47)
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Trouble getting "langid.map.individual" setting to work in Solr 5.0.x

2015-08-03 Thread David Smith
I am trying to use the “langid.map.individual” setting to allow field “a” to 
detect as, say, English, and be mapped to “a_en”, while in the same document, 
field “b” detects as, say, German and is mapped to “b_de”.

What happens in my tests is that the global language is detected (for example, 
German), but BOTH fields are mapped to “_de” as a result.  I cannot get 
individual detection or mapping to work.  Am I misunderstanding the purpose of 
this setting?

Here is the resulting document from my test:


  {
"id": "1005!22345",
"language": [
  "de"
],
"a_de": "A title that should be detected as English with high 
confidence",
"b_de": "Die Einführung einer anlasslosen Speicherung von 
Passagierdaten für alle Flüge aus einem Nicht-EU-Staat in die EU und umgekehrt 
ist näher gerückt. Der Ausschuss des EU-Parlaments für bürgerliche Freiheiten, 
Justiz und Inneres (LIBE) hat heute mit knapper Mehrheit für einen 
entsprechenden Richtlinien-Entwurf der EU-Kommission gestimmt. Bürgerrechtler, 
Grüne und Linke halten die geplante Richtlinie für eine andere Form der 
anlasslosen Vorratsdatenspeicherung, die alle Flugreisenden zu Verdächtigen 
mache.",
"_version_": 1508494723734569000
  }


I expected “a_de” to be “a_en”, and the “language” multi-valued field to have 
“en” and “de”.
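The expected per-field behavior can be sketched as follows (detect is a toy stand-in for the langdetect call, and the real processor works on the SolrInputDocument, not a dict):

```python
def map_individual(doc, detect, fields, lang_field="language"):
    """Expected langid.map.individual behavior: detect each listed field's
    language separately, rename the field with that language suffix, and
    collect every detected language into the language field."""
    out = {k: v for k, v in doc.items() if k not in fields}
    langs = []
    for f in fields:
        lang = detect(doc[f])
        out[f"{f}_{lang}"] = doc[f]
        if lang not in langs:
            langs.append(lang)
    out[lang_field] = langs
    return out

# Toy detector: real language identification would be used here.
detect = lambda text: "de" if "Speicherung" in text else "en"
doc = {"id": "1005!22345",
       "a": "A title that should be detected as English",
       "b": "Die Einführung einer anlasslosen Speicherung von Passagierdaten"}
result = map_individual(doc, detect, ["a", "b"])
assert "a_en" in result and "b_de" in result
assert result["language"] == ["en", "de"]
```

The debug log above instead shows both fields mapped with the globally detected language, which is the behavior in question.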

Here is my configuration in solrconfig.xml:





true
a,b
true
true
language
af:uns,ar:uns,bg:uns,bn:uns,cs:uns,da:uns,el:uns,et:uns,fa:uns,fi:uns,gu:uns,he:uns,hi:uns,hr:uns,hu:uns,id:uns,ja:uns,kn:uns,ko:uns,lt:uns,lv:uns,mk:uns,ml:uns,mr:uns,ne:uns,nl:uns,no:uns,pa:uns,pl:uns,ro:uns,ru:uns,sk:uns,sl:uns,so:uns,sq:uns,sv:uns,sw:uns,ta:uns,te:uns,th:uns,tl:uns,tr:uns,uk:uns,ur:uns,vi:uns,zh-cn:uns,zh-tw:uns
en








The debug output of lang detect, during indexing, is as follows:

---
DEBUG - 2015-08-03 14:37:54.450; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Language 
detected de with certainty 0.964723182276
DEBUG - 2015-08-03 14:37:54.450; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Detected 
main document language from fields [a, b]: de
DEBUG - 2015-08-03 14:37:54.450; 
org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; 
Appending field a
DEBUG - 2015-08-03 14:37:54.451; 
org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; 
Appending field b
DEBUG - 2015-08-03 14:37:54.453; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Language 
detected de with certainty 0.964723182276
DEBUG - 2015-08-03 14:37:54.453; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping 
field a using individually detected language de
DEBUG - 2015-08-03 14:37:54.454; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Doing 
mapping from a with language de to field a_de
DEBUG - 2015-08-03 14:37:54.454; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping 
field 1005!22345 to de
DEBUG - 2015-08-03 14:37:54.454; org.eclipse.jetty.webapp.WebAppClassLoader; 
loaded class org.apache.solr.common.SolrInputField from 
WebAppClassLoader=525571@80503
DEBUG - 2015-08-03 14:37:54.454; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Removing 
old field a
DEBUG - 2015-08-03 14:37:54.455; 
org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; 
Appending field a
DEBUG - 2015-08-03 14:37:54.455; 
org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; 
Appending field b
DEBUG - 2015-08-03 14:37:54.456; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Language 
detected de with certainty 0.980402022373
DEBUG - 2015-08-03 14:37:54.456; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping 
field b using individually detected language de
DEBUG - 2015-08-03 14:37:54.456; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Doing 
mapping from b with language de to field b_de
DEBUG - 2015-08-03 14:37:54.456; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping 
field 1005!22345 to de
DEBUG - 2015-08-03 14:37:54.456; 
org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Removing 
old field b
-

From this, my takeaway is that every time the 
LangDetectLanguageIdentifierUpdateProcessor is asked to detect the language, it 
is using field a AND b.  But I can’t quite tell from this output.

Any insight appreciated.

Regards,

David




collection mbeans: requests

2015-08-04 Thread David Santamauro


I have a question about how the stat 'requests' is calculated. I would 
really appreciate it if anyone could shed some light on the figures below.


Assumptions:
  version: 5.2.0
  layout: 8 node solrcloud, no replicas (node71-node78)
  collection: col1
  handler: /search
  stats request: /col1/admin/mbeans?stats=true&cat=QUERYHANDLER&wt=json'

I wrote a simple shell script that grabs the requests stats member from 
every node.


After collection reload
node 71 -- requests: 2
node 72 -- requests: 2
node 73 -- requests: 2
node 74 -- requests: 2
node 75 -- requests: 2
node 76 -- requests: 2
node 77 -- requests: 2
node 78 -- requests: 2
* I assume these are the auto-warm searches


After submitting 1 request (q=*:*)
node 71 -- requests: 4
node 72 -- requests: 3
node 73 -- requests: 3
node 74 -- requests: 3
node 75 -- requests: 3
node 76 -- requests: 4
node 77 -- requests: 3
node 78 -- requests: 3

After resubmitting the same request
node 71 -- requests: 6
node 72 -- requests: 4
node 73 -- requests: 4
node 74 -- requests: 4
node 75 -- requests: 4
node 76 -- requests: 5
node 77 -- requests: 5
node 78 -- requests: 4

If that wasn't strange enough, things get out of control if I add in 
facet.pivot parameter(s)


Fresh after reload (see above, 2 for every node)

Total after a facet.pivot on two fields
node 71 -- requests: 13
node 72 -- requests: 15
node 73 -- requests: 14
node 74 -- requests: 12
node 75 -- requests: 14
node 76 -- requests: 12
node 77 -- requests: 14
node 78 -- requests: 12

I imagine I'm seeing the internal cross-talk between nodes and if so, 
how can one reliably keep stats on the number of "real" requests?


thanks

David


Re: collection mbeans: requests

2015-08-04 Thread David Santamauro


I have your suggested shards.qt set up in another collection for another 
reason but I'll do that redirect here as well, thanks for the confirmation.


On 08/04/2015 10:45 AM, Shawn Heisey wrote:

On 8/4/2015 5:19 AM, David Santamauro wrote:


I have a question about how the stat 'requests' is calculated. I would
really appreciate it if anyone could shed some light on the figures below.

Assumptions:
   version: 5.2.0
   layout: 8 node solrcloud, no replicas (node71-node78)
   collection: col1
   handler: /search
   stats request: /col1/admin/mbeans?stats=true&cat=QUERYHANDLER&wt=json'

I wrote a simple shell script that grabs the requests stats member from
every node.

After collection reload
node 71 -- requests: 2
node 72 -- requests: 2
node 73 -- requests: 2
node 74 -- requests: 2
node 75 -- requests: 2
node 76 -- requests: 2
node 77 -- requests: 2
node 78 -- requests: 2
* I assume these are the auto-warm searches


After submitting 1 request (q=*:*)
node 71 -- requests: 4
node 72 -- requests: 3
node 73 -- requests: 3
node 74 -- requests: 3
node 75 -- requests: 3
node 76 -- requests: 4
node 77 -- requests: 3
node 78 -- requests: 3

After resubmitting the same request
node 71 -- requests: 6
node 72 -- requests: 4
node 73 -- requests: 4
node 74 -- requests: 4
node 75 -- requests: 4
node 76 -- requests: 5
node 77 -- requests: 5
node 78 -- requests: 4

If that wasn't strange enough, things get out of control if I add in
facet.pivot parameter(s)

Fresh after reload (see above, 2 for every node)

Total after a facet.pivot on two fields
node 71 -- requests: 13
node 72 -- requests: 15
node 73 -- requests: 14
node 74 -- requests: 12
node 75 -- requests: 14
node 76 -- requests: 12
node 77 -- requests: 14
node 78 -- requests: 12

I imagine I'm seeing the internal cross-talk between nodes and if so,
how can one reliably keep stats on the number of "real" requests?


Queries on distributed indexes change from the one request that you make
into a request to every shard, to check for relevant documents.  If
relevant documents are found, a second call to those specific shards is
made to retrieve those documents.  So if you have 5 shards in your
index, there could be up to 11 requests counted for a single query.  If
all the shards are on separate nodes, then for that 11-request query,
one of those nodes would count three requests and the others would count
two.
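Shawn's arithmetic can be written down directly, which also explains the uneven per-node numbers earlier in the thread: the node that happens to receive the top-level request counts one extra hit on its handler.

```python
def distributed_request_count(n_shards, shards_with_hits):
    """Total handler requests one client query generates on a distributed
    index: 1 top-level request, a phase-1 query to every shard, and a
    phase-2 document fetch only to the shards that returned relevant docs."""
    return 1 + n_shards + shards_with_hits

assert distributed_request_count(5, 5) == 11  # the 5-shard worst case above
```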

I know what I'm going to say next would work on an index that is
distributed but *not* SolrCloud, and I think it will work in SolrCloud too.

If you add a "shards.qt" parameter to defaults in your main request
handler (usually /select) that points at another, identically configured
handler (perhaps named "/shards") that is also in solrconfig.xml, then
that other handler should receive the distributed requests and the main
handler should only count the "real" requests.  You would be able to
track those numbers separately.

Thanks,
Shawn



Hash of solr documents

2015-08-26 Thread david . davila
Hi,

I have read in a post on the Internet that the hash Solr Cloud calculates 
over the key field, to route each document to a shard, is itself indexed. 
Is this true? If so, is there any way to show this hash for each document?

Thanks,

David

Re: Hash of solr documents

2015-08-26 Thread david . davila
Yes, it's an XY problem :)

We are making the first tests to split our shard (Solr 5.1)

The problem we have is this: the number of documents indexed in the new 
shards is lower than in the original one (19814 and 19653, vs. 61100), and 
it is always the same. We have no idea why Solr is doing this. A problem with 
some documents, or with the segment?

A long time after we changed from "normal" Solr to Solr Cloud, we found 
that the parameter "router" in clusterstate.json was incorrect, because we 
wanted to have "compositeId" and it was set as "explicit". The solution 
was deleting the clusterstate.json and restart Solr. And we are thinking 
that maybe the problem with the SPLIT is related with that: some documents 
are stored with the hash value and others not, and SPLIT needs that to 
distribute them. But I know that this likely has nothing to do with the 
SPLIT problem, it's only an idea. 
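For context, Solr routes each document to the sub-shard whose hash range contains the document's route hash, and SPLITSHARD cuts the parent range at its midpoint; the ranges in the log below (truncated by the archive, presumably 0x0-0x3fffffff and 0x40000000-0x7fffffff) reflect that. A sketch of the containment test, with the MurmurHash3 computation Solr actually performs over the compositeId route key left out:

```python
def sub_shard_for(route_hash, ranges):
    """Which sub-shard a document's 32-bit route hash lands in after a
    split. `ranges` are (lo, hi) inclusive hash bounds per sub-shard."""
    for i, (lo, hi) in enumerate(ranges):
        if lo <= route_hash <= hi:
            return i
    return None  # hash belongs outside this shard's range entirely

# The two halves from the split log, assuming the full 8-digit hex values:
shard2_sub_ranges = [(0x00000000, 0x3fffffff), (0x40000000, 0x7fffffff)]

assert sub_shard_for(0x12345678, shard2_sub_ranges) == 0
assert sub_shard_for(0x50000000, shard2_sub_ranges) == 1
```

If a document's hash falls outside both sub-ranges it ends up in neither sub-shard, which is one way a router mismatch could make documents disappear in a split.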

This is the log, all seem to be normal:

INFO  - 2015-08-26 09:13:47.654; org.apache.solr.handler.admin.CoreAdminHandler; Invoked split action for core: buscon
INFO  - 2015-08-26 09:13:47.656; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2015-08-26 09:13:47.656; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO  - 2015-08-26 09:13:47.657; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO  - 2015-08-26 09:13:47.657; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2015-08-26 09:13:47.658; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partitions=2 segments=1
INFO  - 2015-08-26 09:13:47.922; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #0 partitionCount=2 range=0-3fff
INFO  - 2015-08-26 09:13:47.922; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #0 partitionCount=2 range=0-3fff segment #0 segmentCount=1
INFO  - 2015-08-26 09:22:19.533; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #1 partitionCount=2 range=4000-7fff
INFO  - 2015-08-26 09:22:19.536; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #1 partitionCount=2 range=4000-7fff segment #0 segmentCount=1
INFO  - 2015-08-26 09:30:44.141; org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/cores params={targetCore=buscon_shard2_0_replica1&targetCore=buscon_shard2_1_replica1&action=SPLIT&core=buscon&wt=javabin&qt=/admin/cores&version=2} status=0 QTime=1016486 
INFO  - 2015-08-26 09:30:44.387; org.apache.solr.handler.admin.CoreAdminHandler; Applying buffered updates on core: buscon_shard2_0_replica1
INFO  - 2015-08-26 09:30:44.387; org.apache.solr.handler.admin.CoreAdminHandler; No buffered updates available. core=buscon_shard2_0_replica1
INFO  - 2015-08-26 09:30:44.388; org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/cores params={name=buscon_shard2_0_replica1&action=REQUESTAPPLYUPDATES&wt=javabin&qt=/admin/cores&version=2} status=0 QTime=2 
INFO  - 2015-08-26 09:30:44.441; org.apache.solr.handler.admin.CoreAdminHandler; Applying buffered updates on core: buscon_shard2_1_replica1
INFO  - 2015-08-26 09:30:44.441; org.apache.solr.handler.admin.CoreAdminHandler; No buffered updates available. core=buscon_shard2_1_replica1
INFO  - 2015-08-26 09:30:44.441; org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/cores params={name=buscon_shard2_1_replica1&action=REQUESTAPPLYUPDATES&wt=javabin&qt=/admin/cores&version=2} status=0 QTime=0 
INFO  - 2015-08-26 09:30:44.743; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)




Thanks,

David



From: Anshum Gupta 
To:   "solr-user@lucene.apache.org" , 
Date: 26/08/2015 10:27
Subject: Re: Hash of solr documents



Hi David,

The route key itself is indexed, but not the hash value. Why do you need 
to
know and display the hash value? This seems like an XY problem to me:
http://people.apache.org/~hossman/#xyproblem

On Wed, Aug 26, 2015 at 1:17 AM,  wrote:

> Hi,
>
> I have read in one post in the Internet that the hash Solr Cloud
> calculates over the key field to send each document to a different shard
> is indexed. Is this true? If true, is there any way to show this hash 
for
> each document?
>
> Thanks,
>
> David




-- 
Anshum Gupta



Re: collection API timeout

2015-11-04 Thread Julien David

I forgot to mention that we are using Solr 4.9.0 and zookeeper 3.4.6

Thanks

Julien

Le 04/11/2015 11:37, Julien DAVID - Decalog a écrit :

Hi all,

We have a production environment composed of 6 SolrCloud servers and 3 
ZooKeepers.

We've got around 30 collections, with 6 shards each.
We recently moved from 3 solr to 6, splitting the shards (3 to 6).

As the last weeks were a low-traffic period, we didn't notice any problem.
But since Monday, the Collections API calls systematically time out.
We use calls to CLUSTERSTATUS, but LIST or OVERSEERSTATUS give the same 
results, whatever the node.


We don't have any problem in the qualification environment, which is 
identical except for the load.


The error message is:
CLUSTERSTATUS the collection time out:180s
org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s
at 
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)
at 
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:320)
at 
org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:639)
at 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:220)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)

at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)

at java.lang.Thread.run(Thread.java:745)


Thanks for your help

--
Julien









Re: collection API timeout

2015-11-05 Thread Julien David

Seems I'll need to upgrade to 5.3.1.

Is it possible to upgrade directly from 4.9 to 5.3, or do I need to deploy
all intermediate versions?


Thanks











Re: Arabic analyser

2015-11-11 Thread David Murgatroyd
>So BasisTech works for the latest version of solr?

Yes, our latest Arabic analyzer supports up through 5.3.x. But since the
examples you give are names, it sounds like you might instead/also want our
fuzzy name matcher which will find "عبد الله" not only with "عبدالله" but
also with typos like "عبالله" or even translations into 'English' like
"abdollah". You can visit http://www.basistech.com/solutions/search/solr/
and fill out the form there to learn more (mentioning this thread). See
also http://www.slideshare.net/dmurga/simple-fuzzy-name-matching-in-solr
for a talk I gave at the San Francisco Solr Meet-up in April on how it
plugs in to Solr by creating a special field type you can query just like
any other; this was also presented at Lucene/Solr Revolution last month (
http://lucenerevolution.org/sessions/simple-fuzzy-name-matching-in-solr/).

Best,
David Murgatroyd
(VP, Engineering, Basis Technology)

On Wed, Nov 11, 2015 at 4:31 AM, Mahmoud Almokadem 
wrote:

> Thank Alex,
>
> So BasisTech works for the latest version of solr?
>
> Sincerely,
> Mahmoud
>
> On Tue, Nov 10, 2015 at 5:28 PM, Alexandre Rafalovitch  >
> wrote:
>
> > If this is for a significant project and you are ready to pay for it,
> > BasisTech has commercial solutions in this area I believe.
> >
> > Regards,
> >Alex.
> > 
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 10 November 2015 at 08:46, Mahmoud Almokadem 
> > wrote:
> > > Thanks Paul,
> > >
> > > The Arabic analyser applies normalisation and stemming filters only to
> > > single terms coming out of the standard tokenizer.
> > > Gathering all synonyms will be hard work. Should I customise my
> > > tokenizer to handle this case?
> > >
> > > Sincerely,
> > > Mahmoud
> > >
> > >
> > > On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht 
> > wrote:
> > >
> > >> Mahmoud,
> > >>
> > >> there is an arabic analyzer:
> > >>   https://wiki.apache.org/solr/LanguageAnalysis#Arabic
> > >> doesn't it do what you describe?
> > >> Synonyms probably work there too.
> > >>
> > >> Paul
> > >>
> > >> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > >> > 9 novembre 2015 17:47
> > >> > Thanks Jack,
> > >> >
> > >> > This is a good solution, but we have more combinations that I think
> > >> > can’t be handled as synonyms like every word starts with ‘عبد’ ‘Abd’
> > >> > and ‘أبو’ ‘Abo’. When using Standard tokenizer on ‘أبو بكر’ ‘Abo
> > >> > Bakr’, It’ll be tokenised to ‘أبو’ and ‘بكر’ and the filters will be
> > >> > applied for each separate term.
> > >> >
> > >> > Is there available tokeniser to tokenise ‘أبو *’ or ‘عبد *' as a
> > >> > single term?
> > >> >
> > >> > Thanks,
> > >> > Mahmoud
> > >> >
> > >> >
> > >> >
> > >> > Jack Krupansky <mailto:jack.krupan...@gmail.com>
> > >> > 9 novembre 2015 16:47
> > >> > Use an index-time (but not query time) synonym filter with a rule
> > like:
> > >> >
> > >> > Abd Allah,Abdallah
> > >> >
> > >> > This will index the combined word in addition to the separate words.
> > >> >
> > >> > -- Jack Krupansky
> > >> >
> > >> > On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <
> > >> prog.mahm...@gmail.com>
> > >> >
> > >> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> > >> > 9 novembre 2015 10:48
> > >> > Hello,
> > >> >
> > >> > We are indexing Arabic content and facing a problem for tokenizing
> > multi
> > >> > terms phrases like 'عبد الله' 'Abd Allah', so users will search for
> > >> > 'عبدالله' 'Abdallah' without space and need to get the results of
> 'عبد
> > >> > الله' with space. We are using StandardTokenizer.
> > >> >
> > >> >
> > >> > Is there any configurations to handle this case?
> > >> >
> > >> > Thank you,
> > >> > Mahmoud
> > >> >
> > >>
> > >>
> >
>
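Jack's index-time synonym suggestion quoted above can be sketched as follows. This is a toy model of what a synonym rule like "Abd Allah,Abdallah" does at index time; the function name and simplified matching are illustrative assumptions, not Solr's actual SynonymFilter code:

```python
# Toy sketch of index-time synonym expansion: emit the original tokens,
# plus the combined form whenever the multi-word phrase matches, so a
# query for either form finds the document.

def expand_synonyms(tokens, rules):
    """Emit each token, plus any synonym whose left-hand phrase matches."""
    out = list(tokens)
    for phrase, synonym in rules:
        parts = phrase.split()
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] == parts:
                out.append(synonym)  # combined word indexed alongside parts
    return out

rules = [("Abd Allah", "Abdallah")]
print(expand_synonyms(["Abd", "Allah", "ibn", "Umar"], rules))
# -> ['Abd', 'Allah', 'ibn', 'Umar', 'Abdallah']
```

Because the expansion happens only at index time, the query side needs no synonym filter at all, which is exactly why the suggestion says "index-time (but not query time)".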


Re: Boosting by calculated distance buckets

2015-02-14 Thread David Smiley
Hello,
You can totally boost by calculations that happen on-the-fly on a
per-document basis when you search.  These are called function queries in
Solr.

For your specific example… a solution that doesn’t involve writing a custom
so-called ValueSource in Java would likely mean calculating the distance
multiple times per document for each range.  Instead I suggest a continuous
function, like the reciprocal of the distance.  See the definition of the
formula here: 
https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-AvailableFunctions
  
For ‘m’ provide 1.0.  For ‘a’ and ‘b’ I suggest using the same value set to
roughly 1/10th the distance to the perimeter of the region of relevant
interest — perhaps 1/10th of say 200km.  You will of course fiddle with this
to your liking.  Assuming you use edismax, you could multiply the natural
score by something like:
&boost=recip(geodist(),1,20,20)
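To see how that boost behaves, here is a small sketch of the recip formula from the linked function-query docs, recip(x,m,a,b) = a/(m*x+b), evaluated at a few sample distances; the distances and rounding are illustrative only:

```python
# Sketch of Solr's recip(x, m, a, b) = a / (m*x + b), showing how
# boost=recip(geodist(),1,20,20) decays smoothly with distance (km):
# 1.0 at the search point, halved at 20 km, and tapering off beyond.

def recip(x, m, a, b):
    return a / (m * x + b)

for km in (0, 20, 100, 200):
    print(km, round(recip(km, 1, 20, 20), 3))
```

This is why a continuous function avoids the cost of re-evaluating geodist() once per bucket: one evaluation yields a boost for every distance.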

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


sraav wrote
> I hit a block when I ran into a use case where I had to boost on ranges of
> distances calculated at query time. This is the case when the distance is
> not present in the document initially but calculated based on the user
> entered lat/long values. 
> 
> 1. Is it required that all the boost parameters be searchable or can we
> boost on dynamic parameters which are calculated ?
> 2. Is there a way to boost on geodist() in a specific range – For example
> – Boost all the cars listed within 20-50kms range(from the search zip) by
> 100. And give a boost of 85 to all the cars listed within 51-80kms range 
> from the search zip. 
> 
> Please provide your feedback and let me know if there are any other
> options that i could try out.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-by-calculated-distance-buckets-tp4186504p4186587.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boosting by calculated distance buckets

2015-02-17 Thread David Smiley
Raav,

You may need to actually subscribe to the solr-user list.  Nabble seems to
not be working too well.
p.s. I’m on vacation this week so I can’t be very responsive

First of all... it's not clear you actually want to *boost* (since you seem
to not care about the relevancy score), it seems you want to *sort* based on
a function query.  So simply sort by the function query instead of using the
'bq' param.

Have you read about geodist() in the Solr Reference Guide?  It returns the
spatial distance.  With that and other function queries like map() you could
do something like sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)) and
you could put that into your main function query.  I purposefully overlapped
the map ranges so that I didn't have to deal with double-counting an edge. 
The only thing I don't like about this is that the distance is going to be
calculated as many times as you reference the function, and it's slow.  So
you may want to write your own function query (internally called a
ValueSource), which is relatively easy to do in Solr.
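As a sanity check of the overlapping map() trick described above, here is a small sketch assuming the documented semantics of Solr's five-argument map(x,min,max,target,default); the distances are illustrative:

```python
# Sketch of sum(map(geodist(),0,40,40,0), map(geodist(),0,20,10,0)):
# map(x, lo, hi, target, default) returns target when lo <= x <= hi,
# else default.  Because the 0-20 range overlaps the 0-40 range, nearby
# documents collect both contributions without explicit edge handling.

def solr_map(x, lo, hi, target, default):
    return target if lo <= x <= hi else default

def bucket_boost(dist_km):
    return solr_map(dist_km, 0, 40, 40, 0) + solr_map(dist_km, 0, 20, 10, 0)

for d in (10, 30, 50):
    print(d, bucket_boost(d))  # 0-20 km -> 50, 20-40 km -> 40, beyond -> 0
```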

~ David


sraav wrote
> David,
> 
> Thank you for your prompt response. I truly appreciate it. Also, My post
> was not accepted the first two times so I am posting it again one final
> time. 
> 
> In my case I want to turn off the dependency on scoring and let solr use
> just the boost values that I pass to each function to sort on. Here is a
> quick example of how I got that to work with non-geo fields which are
> present in the document and are not dynamically calculated. Using edismax
> ofcourse.
> 
> I was able to turn off the scoring (i mean remove the dependency on score)
> on the result set and drive the sort by the boost that I mentioned in the
> below query. In the below function For example - if the "document1"
> matches the date listed it gets a boost = 5. If the same document matches
> the owner AND product  - it will get an additional boost of 5 more. The
> total boost of this "document1" is 10. From whatever I have seen, it
> seems like I was able to turn off or negate the effects of the solr score.
> There was a query norm param that was affecting the boost but it seemed to
> be a constant around 0.70345...most of the time for any fq mentioned).  
> 
> bq = {!func}sum(if(query({!v='datelisted:[2015-01-22T00:00:00.000Z TO
> *]'}),5,0),if(and(query({!v='owner:*BRAVE*'}),query({!v='PRODUCT:*SWORD*'}),5,0))
> 
> What I am trying to do is to add additional boosting function to the
> custom boost that will eventually tie into the above function and boost
> value.
> 
> For example - if "document1" falls in 0-20 KM range i would like to add a
> boost of 50 making the final boost value to be 60. If it falls under
> 20-40KM - i would like to add a boost of 40 and so on.  
> 
> Is there a way we can do this?  Please let me know if I can provide better
> clarity on the use case that I am trying to solve. Thank you David.
> 
> Thanks,
> Raav





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-by-calculated-distance-buckets-tp4186504p4187112.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with queries that includes NOT

2015-02-25 Thread david . davila
Hello,

We have problems with some queries. All of them include the tag NOT, and 
in my opinion, the results don't make any sense.

First problem:

This query "NOT Proc:ID01" returns 95806 results; however, this one,
"NOT Proc:ID01 OR FileType:PDF_TEXT", returns 11484 results. But it should be 
impossible for a query to return fewer results after adding an OR clause.

Second problem. Here the problem is because of the brackets and the NOT 
tag:

 This query:

(NOT Proc:"ID01" AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE 
returns 0 documents.

But this query:

(NOT Proc:"ID01" AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE) 
returns 53 documents, which is correct. So, the problem is the position of 
the bracket. I have checked the same query without NOTs, and it works fine 
returning the same number of results in both cases.  So, I think the 
problem is the combination of the bracket positions and the NOT tag.

This second problem is less important, but the queries come from a web 
page that I'd have to change, so I need to know whether the problem is in 
Solr or not.



This is the part of the schema that applies:





Thank you very much,




David Dávila 

DIT - 915828763


Re: Problem with queries that includes NOT

2015-02-25 Thread david . davila
Hi Shawn,

thank you for your quick response. I will read your links and make some 
tests.

Regards,

David Dávila
DIT - 915828763




De: Shawn Heisey 
Para:   solr-user@lucene.apache.org, 
Fecha:  25/02/2015 13:23
Asunto: Re: Problem with queries that includes NOT



On 2/25/2015 4:04 AM, david.dav...@correo.aeat.es wrote:
> We have problems with some queries. All of them include the tag NOT, and
> in my opinion, the results don't make any sense.
> 
> First problem:
> 
> This query " NOT Proc:ID01 "   returns   95806 results, however this one 
"
> NOT Proc:ID01 OR FileType:PDF_TEXT" returns  11484 results. But it's 
> impossible that adding a tag OR the query has less number of results.
> 
> Second problem. Here the problem is because of the brackets and the NOT 
> tag:
> 
>  This query:
> 
> (NOT Proc:"ID01" AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE 
> returns 0 documents.
> 
> But this query:
> 
> (NOT Proc:"ID01" AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE) 
> returns 53 documents, which is correct. So, the problem is the position 
of 
> the bracket. I have checked the same query without NOTs, and it works 
fine 
> returning the same number of results in both cases.  So, I think the 
> problem is the combination of the bracket positions and the NOT tag.

For the first query, there is a difference between "NOT condition1 OR
condition2" and "NOT (condition1 OR condition2)" ... I can imagine the
first one increasing the document count compared to just "NOT
condition1" ... the second one wouldn't increase it.

Boolean queries in Solr (and very likely Lucene as well) do not always
do what people expect.

http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/
https://lucidworks.com/blog/why-not-and-or-and-not/

As mentioned in the second link above, you'll get better results if you
use the prefix operators with explicit parentheses.  One word of
warning, though -- the prefix operators do not work correctly if you
change the default operator to AND.

Thanks,
Shawn




Re: Problem with queries that includes NOT

2015-02-26 Thread david . davila
Hi,

I thought that we were using the edismax query parser, but it seems that 
we had configured the dismax parser.
I have made some tests with the edismax parser and it works fine, so I'll 
change it in our production Solr.

Regards,

David Dávila
DIT - 915828763




De: Alvaro Cabrerizo 
Para:   "solr-user@lucene.apache.org" , 
Fecha:  25/02/2015 16:41
Asunto: Re: Problem with queries that includes NOT



Hi,

The edismax parser should be able to handle the query you want to ask. I've
made a test and both of the following queries give me the right result (note
the parentheses):

   - {!edismax}(NOT id:7 AND NOT id:8  AND id:9)   (gives 1 
hit
   the id:9)
   - {!edismax}((NOT id:7 AND NOT id:8)  AND id:9) (gives 1 
hit
   the id:9)

In general, the issue appears when using the lucene query parser and mixing
different boolean clauses (including NOT). Thus, as you commented, the
following queries give different results:


   - NOT id:7 AND NOT id:8  AND id:9   (gives 1 hit the
   id:9)
   - (NOT id:7 AND NOT id:8)  AND id:9 (gives 0 hits when
   expecting 1 )

Since I read the chapter "Limitations of prohibited clauses in 
sub-queries"
from the "Apache Solr 3 Enterprise Search Server" many years ago,  I 
always
add the all-documents query clause *:* to the negative clauses to avoid
the problem you mentioned. Thus I would recommend rewriting the query you
showed us as:

   - (*:* AND NOT Proc:"ID01" AND NOT FileType:PDF_TEXT) AND
   sys_FileType:PROTOTIPE
   - (NOT id:7 AND NOT id:8 AND *:*)  AND id:9 (gives 1 hit
   as expected)

The above query can then be read as: give me all the documents having
PROTOTIPE, except those having ID01 or PDF_TEXT.

Regards.
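The reason the purely negative sub-query matches nothing can be sketched with set semantics. This toy model assumes, as the linked articles describe, that MUST_NOT clauses only subtract documents from a candidate set, so a clause list with no positive clause starts from nothing rather than from all documents:

```python
# Toy model of Lucene boolean evaluation: MUST clauses intersect,
# MUST_NOT clauses subtract.  A sub-query made only of MUST_NOT clauses
# has no starting set to subtract from, hence zero hits; adding *:*
# supplies the starting set.

all_docs = {7, 8, 9}

def boolean_query(positive_sets, negative_sets):
    result = None
    for s in positive_sets:
        result = s if result is None else result & s
    if result is None:        # purely negative: nothing to subtract from
        result = set()
    for s in negative_sets:
        result -= s
    return result

# (NOT id:7 AND NOT id:8) AND id:9  -- the sub-query is purely negative
sub = boolean_query([], [{7}, {8}])
print(sub & {9})                              # 0 hits, surprising

# (*:* AND NOT id:7 AND NOT id:8) AND id:9
sub = boolean_query([all_docs], [{7}, {8}])
print(sub & {9})                              # 1 hit, as expected
```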




On Wed, Feb 25, 2015 at 1:23 PM, Shawn Heisey  wrote:

> On 2/25/2015 4:04 AM, david.dav...@correo.aeat.es wrote:
> > We have problems with some queries. All of them include the tag NOT, and
> > in my opinion, the results don't make any sense.
> >
> > First problem:
> >
> > This query " NOT Proc:ID01 "   returns   95806 results, however this 
one
> "
> > NOT Proc:ID01 OR FileType:PDF_TEXT" returns  11484 results. But it's
> > impossible that adding a tag OR the query has less number of results.
> >
> > Second problem. Here the problem is because of the brackets and the 
NOT
> > tag:
> >
> >  This query:
> >
> > (NOT Proc:"ID01" AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE
> > returns 0 documents.
> >
> > But this query:
> >
> > (NOT Proc:"ID01" AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE)
> > returns 53 documents, which is correct. So, the problem is the 
position
> of
> > the bracket. I have checked the same query without NOTs, and it works
> fine
> > returning the same number of results in both cases.  So, I think the
> > problem is the combination of the bracket positions and the NOT tag.
>
> For the first query, there is a difference between "NOT condition1 OR
> condition2" and "NOT (condition1 OR condition2)" ... I can imagine the
> first one increasing the document count compared to just "NOT
> condition1" ... the second one wouldn't increase it.
>
> Boolean queries in Solr (and very likely Lucene as well) do not always
> do what people expect.
>
> http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/
> https://lucidworks.com/blog/why-not-and-or-and-not/
>
> As mentioned in the second link above, you'll get better results if you
> use the prefix operators with explicit parentheses.  One word of
> warning, though -- the prefix operators do not work correctly if you
> change the default operator to AND.
>
> Thanks,
> Shawn
>
>



Re: Solr join + Boost in single query

2015-03-03 Thread David Smiley
No, not without writing something custom anyway. It'd be difficult to make it
fast if there's a lot of documents to join on.


sraav wrote
> David,
> 
> Is it possible to write a query to join two cores and either bring back
> data from the two cores or to boost on the data coming back from either of
> the cores? Is that possible with Solr? 
> 
> Raavi





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-join-Boost-in-single-query-tp4190825p4190849.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread David Smiley
Another more modern option, very related to this, is to use DateRangeField in 
5.0.  You have full 64 bit precision.  More info is in the Solr Ref Guide.

If Alessandro sticks with RPT, then the best reference to give is this:
http://wiki.apache.org/solr/SpatialForTimeDurations

~ David
https://www.linkedin.com/in/davidwsmiley

> On May 21, 2015, at 11:49 AM, Holger Rieß  
> wrote:
> 
> Give geospatial search a chance. Use the 
> 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false.
> The date is located on the X-axis, prices on the Y axis.
> For every price you get a horizontal line between start and end date. Index a 
> rectangle with height 0.001(< 1 cent) and width 'end date - start date'.
> 
> Find all prices that are valid on a given day or in a given date range with 
> the 'geofilt' function.
> 
> The field type could look like (not tested):
> 
> <fieldType name="..." class="solr.SpatialRecursivePrefixTreeFieldType"
>   geo="false" distErrPct="0.025" maxDistErr="0.09" units="degrees"
>   worldBounds="1 0 366 1" />
> 
> Faceting possibly can be done with a facet query for every of your price 
> ranges.
> For example day 20, price range 0-5$, rectangle: 20.0 0.0 
> 21.0 5.0.
> 
> Regards Holger
> 
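Holger's date-by-price mapping quoted above can be sketched like this. The epsilon, sample prices, and rectangle-intersection helper are illustrative assumptions; in Solr the intersection test is what the spatial filter performs server-side:

```python
# Sketch of the non-geo 2-D trick: each (start_day, end_day, price)
# becomes a thin rectangle (x = day axis, y = price axis), and "prices
# valid on day d within range [p_lo, p_hi]" becomes a rectangle
# intersection test.

EPS = 0.001  # rectangle height < 1 cent

def to_rect(start_day, end_day, price):
    return (start_day, price, end_day, price + EPS)

def intersects(rect, day_lo, day_hi, p_lo, p_hi):
    x1, y1, x2, y2 = rect
    return x1 <= day_hi and x2 >= day_lo and y1 <= p_hi and y2 >= p_lo

prices = [to_rect(10, 25, 4.50), to_rect(30, 60, 7.00)]
# Example facet query: day 20, price range 0-5 dollars
print([r for r in prices if intersects(r, 20, 21, 0.0, 5.0)])
```

The facet-query-per-price-range idea then becomes one such rectangle test per bucket.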



fq and defType

2015-06-01 Thread david . davila
Hello,

I need to parse some complicated queries that only work properly with the 
edismax query parser, in both the q and fq parameters. I am testing with 
defType=edismax, but it seems that this setting only affects the q 
parameter. Is there any way to apply edismax to the fq parameter?

Thank you very much, 


David Dávila Atienza
DIT
Teléfono: 915828763
Extensión: 36763

Re: fq and defType

2015-06-01 Thread david . davila
Thank you!

David



De: Shawn Heisey 
Para:   solr-user@lucene.apache.org, 
Fecha:  01/06/2015 18:53
Asunto: Re: fq and defType



On 6/1/2015 10:44 AM, david.dav...@correo.aeat.es wrote:
> I need to parse some complicated queries that only works properly with 
the 
> edismax query parser, in q and fq parameters. I am testing with 
> defType=edismax, but it seems that this clause only affects to the q 
> parameter. Is there any way to set edismax to the fq parameter?

fq={!edismax}querystring

The other edismax parameters on your request (qf, etc) apply to those
filter queries just like they would for the q parameter.

Thanks,
Shawn
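For reference, the {!edismax} local-params prefix is just part of the fq value, so building such a request needs only ordinary URL encoding. The host, core name, and field names below are placeholders, not taken from the thread:

```python
# Sketch of assembling a request where defType applies to q while a
# per-parameter {!edismax} prefix switches the parser for one fq.

from urllib.parse import urlencode

params = {
    "q": "main query",
    "defType": "edismax",                              # applies to q
    "fq": "{!edismax}complicated (filter OR clause)",  # parser set per-fq
    "qf": "title^2 body",                              # shared by q and fq
}
print("http://localhost:8983/solr/core/select?" + urlencode(params))
```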




Looking for help in building a configuration that should be simple

2015-06-02 Thread David Patterson
I've been asked to build a sample configuration of SolrCloud using Solr
4.10.

I want to have two instances (virtual machines) each with two solr nodes.
Let's call the instances 1 and 2, and the nodes 1AO, 1BB, 2AB, and 2BO.  I
want 1AO to be the owner of that shard with 2AB as the backup, and 2BO to
be the owner of its data and have 1BB as its backup.

I also want to use an external ZooKeeper that we already have and trust for
all 4 solr nodes.

Is this something that is doable, and what does it take to make it so?

Thanks.

Dave Patterson


Could not find configName for collection client_active found:null

2015-06-03 Thread David McReynolds
I’m helping someone with this but my zookeeper experience is limited (as in
none). They have purportedly followed the instructions from the wiki.



https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble





Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState

INFO: Updating cloud state from ZooKeeper...

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController
createCollectionZkNode

INFO: Check for collection zkNode:client_active

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater
updateState

INFO: Update state numShards=null message={

  "operation":"state",

  "state":"down",

  "base_url":"http://10.10.1.178:8983/solr",

  "core":"client_active",

  "roles":null,

  "node_name":"10.10.1.178:8983_solr",

  "shard":null,

  "collection":"client_active",

  "numShards":null,

  "core_node_name":"10.10.1.178:8983_solr_client_active"}

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController
createCollectionZkNode

INFO: Creating collection in ZooKeeper:client_active

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater
updateState

INFO: shard=shard1 is already registered

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName

INFO: Looking for collection configName

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 1

Jun 02, 2015 2:40:37 PM
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process

INFO: LatchChildWatcher fired on path: /overseer/queue state: SyncConnected
type NodeChildrenChanged

Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader$2 process

INFO: A cluster state change: WatchedEvent state:SyncConnected
type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
(live nodes size: 1)

Jun 02, 2015 2:40:40 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 2

Jun 02, 2015 2:40:43 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 3

Jun 02, 2015 2:40:46 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 4

Jun 02, 2015 2:40:49 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 5

Jun 02, 2015 2:40:52 PM org.apache.solr.cloud.ZkController getConfName

SEVERE: Could not find configName for collection client_active

Jun 02, 2015 2:40:52 PM org.apache.solr.core.CoreContainer recordAndThrow

SEVERE: Unable to create core: client_active

org.apache.solr.common.cloud.ZooKeeperException: Could not find configName
for collection client_active found:null

-- 
--
*My hovercraft is full of eels.*


How important is the name of the data collection?

2015-06-08 Thread David Patterson
I'm trying to make two virtual machines, each with one Solr 4.10 SolrCloud
instance, connected to the same external Zookeeper site.

I want to create one data collection with one shard on each of these two
machines.

If I use the "start" method as described in the Apache Solr Reference Guide
for my release, will the two machines be connected if I declare the same
data collection name for both of them? If not, how do I connect them?

(I know the start method can make two solr-cloud instances on ONE virtual
machine, but I want to make one on each of two virtual machines.)

Thanks

Dave P


Re: Highlighting phone numbers

2016-05-18 Thread David Smiley
Perhaps an easy thing to try is to see if the FastVectorHighlighter yields any
different results.  There are some nuances to the highlighters -- it might.

Failing that, this is likely due to your analysis chain, and where exactly the
offsets point to, which you can see/debug in Solr's analysis screen.  You
might have to develop custom analysis components (e.g. custom TokenFilter)
if the offsets aren't what you want.

Good luck,
~ David

On Wed, May 18, 2016 at 9:07 AM marotosg  wrote:

> Hi,
>
> I have a solr multivalued field with a list of phone numbers with many
> different formats. Below field type.
> 
> 
> 
> 
>  pattern="([^0-9])"
> replacement="" replace="all"/>
>  minGramSize="5" maxGramSize="30"
> />
> 
> 
> 
> 
>  pattern="([^0-9])"
> replacement="" replace="all"/>
>  minGramSize="3" maxGramSize="30"
> />
> 
>  class="com.spencerstuart.similarities.SpencerStuartNoSimilarity"/>
> 
>
> I have a requirement to highlight the part of the number matched to explain
> to the user why this record is returned.
>
> If I search for "17573062033" I am able to match many results but the
> full number is highlighted.
>
> 
>   0
>   12
>   
> CoreID,PhoneListS
> true
> PhoneListS:17573062033
> 1463576646314
> 
> 
> PhoneListS
> xml
> true
> 3
>   
> 
> 
>   
> 
>   1757.306.2033
> 
> 10224838
>   
> 
>   1757.306.2033
> 
> 10224840
>   
> 
>   1757.306.2089
>   1757.306.7006
> 
> 10034811
> 
> 
>   
> 
>   1757.306.2033
> 
>   
>   
> 
>   1757.306.2033
> 
>   
>   
> 
>   1757.306.2089
> 
>   
> 
> 
>
> Would it be possible to get the piece of information which matches?
> Something like this 1757.306.2089
>
> thanks
> Sergio
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Highlighting-phone-numbers-tp4277491.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
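A custom offset-aware approach of the kind David describes might, in spirit, map the matched digit run back to positions in the stored value. This client-side sketch is illustrative only; the function name and the em tags are assumptions, not the highlighter's API:

```python
# Sketch: strip non-digits while remembering each digit's original
# position, find the query's digits inside the digit stream, then map
# the match back to the stored value so only the matching portion is
# wrapped in highlight tags.

def highlight_digits(stored, query):
    qdigits = "".join(c for c in query if c.isdigit())
    digits, offsets = [], []
    for i, ch in enumerate(stored):
        if ch.isdigit():
            digits.append(ch)
            offsets.append(i)
    pos = "".join(digits).find(qdigits)
    if pos < 0 or not qdigits:
        return stored
    start = offsets[pos]
    end = offsets[pos + len(qdigits) - 1] + 1
    return stored[:start] + "<em>" + stored[start:end] + "</em>" + stored[end:]

print(highlight_digits("1757.306.2033", "3062033"))
# -> 1757.<em>306.2033</em>
```

Inside Solr, producing this effect would mean a token filter whose tokens carry offsets pointing at the matched span rather than at the whole original token.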


Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-05-18 Thread David Smiley
Sorry for such a belated response; I don't monitor this list as much as I
used to.
My response is within...

On Wed, Apr 20, 2016 at 4:28 AM Anton K.  wrote:

> Thanks for your answer, David, and have a good vacation.
>
> It seems a more detailed heatmap is not a good solution in my case, because I
> need to display a cluster icon with the number of items inside each cluster. So
> if I get a very large number of cells on the map, some of them will overlap.
>

I did not mean to suggest you display one cluster for each non-zero heatmap
cell; I meant you funnel this as input to other client-side heatmap
renderers that do the clustering.  The point of this is to keep the number
of inputs to that renderer manageable instead of potentially a gazillion if
you have that many docs/points.

> I also think about the Stats component for the facet.heatmap feature. Maybe we
> can use the stats component to add average positions of documents in a cell?
>

I think I've seen hand-rolled heatmap capabilities added to Solr (i.e. no
custom Solr hacking) that went about it kinda like that.  stats.facet on
some geohash (or similar), then average lat & average lon.

~ David
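The "average lat and average lon per cell" idea can be sketched like this, using a coarse integer cell key as a stand-in for a real geohash; the cell size and sample points are illustrative:

```python
# Sketch of per-cell clustering: bucket points by a coarse cell key,
# then place each cluster icon at the mean position of its members
# (rather than the cell centre), keeping the count for the icon label.

from collections import defaultdict

def clusters(points, cell_deg=1.0):
    cells = defaultdict(list)
    for lat, lon in points:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        cells[key].append((lat, lon))
    return {
        key: (sum(p[0] for p in pts) / len(pts),   # mean lat
              sum(p[1] for p in pts) / len(pts),   # mean lon
              len(pts))                            # count for the icon
        for key, pts in cells.items()
    }

print(clusters([(51.2, 4.1), (51.4, 4.3), (48.8, 2.3)]))
```

With server-side stats per geohash bucket, the same averaging happens in Solr instead of the client.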


> 2016-04-20 4:28 GMT+03:00 David Smiley :
>
> > Hi Anton,
> >
> > Perhaps you should request a more detailed / high-res heatmap, and then
> > work with that, perhaps using some clustering technique?  I confess I
> don't
> > work on the UI end of things these days.
> >
> > p.s. I'm on vacation this week; so I don't respond quickly
> >
> > ~ David
> >
> > On Thu, Apr 7, 2016 at 3:43 PM Anton K.  wrote:
> >
> > > I am working with new solr feature: facet heatmaps. It works great, i
> > > create clusters on my map with counts. When user click on cluster i
> zoom
> > in
> > > that area and i might show him more clusters or documents (based on
> > current
> > > zoom level).
> > >
> > > But all my cluster icons (i use round one, see screenshot below) placed
> > > straight in the center of cluster's rectangles:
> > >
> > > https://dl.dropboxusercontent.com/u/1999619/images/map_grid3.png
> > >
> > > Some clusters can be in sea and so on. Also it feels not natural in my
> > case
> > > to have icons placed orderly on the world map.
> > >
> > > I want to place cluster's icons in average coords based on coordinates
> of
> > > all my docs inside cluster. Is there any way to achieve this? I am
> trying
> > > to use stats component for facet heatmap but it isn't implemented yet.
> > >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Issues with coordinates in Solr during updating of fields

2016-06-13 Thread David Smiley
Zheng,
There are a few Solr FieldTypes that are basically composite fields -- a
virtual field of other fields.  AFAIK they are all spatial related.  You
don't necessarily need to pay attention to the fact that gps_1_coordinate
exists under the hood unless you wish to customize the options on that
field type in the schema.  e.g. if you don't need it for filtering (perhaps
using RPT for that) then you can set indexed=false.
~ David

On Fri, Jun 10, 2016 at 8:43 PM Zheng Lin Edwin Yeo 
wrote:

> Would like to check, what is the use of the gps_0_coordinate and
> gps_1_coordinate
> field then? Is it just to store the data points, or does it have any other
> use?
>
> When I do the query, I found that we are only querying the gps_field, which
> is something like this:
> http://localhost:8983/solr/collection1/highlight?q=*:*&fq={!geofilt
> pt=1.5,100.0
> sfield=gps d=5}
>
>
> Regards,
> Edwin
>
> On 27 May 2016 at 08:48, Erick Erickson  wrote:
>
> > Should be fine. When the location field is
> > re-indexed (as it is with Atomic Updates)
> > the two fields will be filled back in.
> >
> > Best,
> > Erick
> >
> > On Thu, May 26, 2016 at 4:45 PM, Zheng Lin Edwin Yeo
> >  wrote:
> > > Thanks Erick for your reply.
> > >
> > > It works when I remove the 'stored="true" ' from the gps_0_coordinate
> and
> > > gps_1_coordinate.
> > >
> > > But will this affect the search functions of the gps coordinates in the
> > > future?
> > >
> > > Yes, I am referring to Atomic Updates.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 27 May 2016 at 02:02, Erick Erickson 
> wrote:
> > >
> > >> Try removing the 'stored="true" ' from the gps_0_coordinate and
> > >> gps_1_coordinate.
> > >>
> > >> When you say "...tried to do an update on any other fileds" I'm
> assuming
> > >> you're
> > >> talking about Atomic Updates, which require that the destinations of
> > >> copyFields are single valued. Under the covers the location type is
> > >> split and copied to the other two fields so I suspect that's what's
> > going
> > >> on.
> > >>
> > >> And you could also try one of the other types, see:
> > >> https://cwiki.apache.org/confluence/display/solr/Spatial+Search
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Thu, May 26, 2016 at 1:46 AM, Zheng Lin Edwin Yeo
> > >>  wrote:
> > >> > Anyone has any solutions to this problem?
> > >> >
> > >> > I tried to remove the gps_0_coordinate and gps_1_coordinate, but I
> > will
> > >> get
> > >> > the following error during indexing.
> > >> > ERROR: [doc=id1] unknown field 'gps_0_coordinate'
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >> >
> > >> > On 25 May 2016 at 11:37, Zheng Lin Edwin Yeo 
> > >> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I have an implementation of storing the coordinates in Solr during
> > >> >> indexing.
> > >> >> During indexing, I will only store the value in the field name
> > ="gps".
> > >> For
> > >> >> the field name = "gps_0_coordinate" and "gps_1_coordinate", the
> value
> > >> will
> > >> >> be auto filled and indexed from the "gps" field.
> > >> >>
> > >> >> > >> required="false"/>
> > >> >> > >> stored="true" required="false"/>
> > >> >> > >> stored="true" required="false"/>
> > >> >>
> > >> >> But when I tried to do an update on any other fields in the index,
> > Solr
> > >> >> will try to add another value in the "gps_0_coordinate" and
> > >> >> "gps_1_coordinate". However, as these 2 fields are not
> multi-Valued,
> > it
> > >> >> will lead to an error:
> > >> >> multiple values encountered for non multiValued field
> > gps_0_coordinate:
> > >> >> [1.0,1.0]
> > >> >>
> > >> >> Does anyone knows how we can solve this issue?
> > >> >>
> > >> >> I am using Solr 5.4.0
> > >> >>
> > >> >> Regards,
> > >> >> Edwin
> > >> >>
> > >>
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
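The atomic-update behavior discussed above can be sketched as a request body: only the changed fields are sent, and Solr re-reads the stored `gps` value to re-derive the two coordinate sub-fields, which is why those sub-fields must not themselves be stored. The `comment` field here is hypothetical, used only to stand in for "any other field".

```python
import json

# Atomic ("partial") update body: "set" replaces the value of one field.
# The gps field is NOT resent; Solr rebuilds gps_0_coordinate and
# gps_1_coordinate from the stored gps value during the update.
doc = {
    "id": "id1",                                       # uniqueKey from the thread
    "comment": {"set": "updated via atomic update"},   # hypothetical field
}
payload = json.dumps([doc])
print(payload)
# POST this to /solr/<collection>/update with Content-Type: application/json
```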


Re: json facet - date range & interval

2016-06-28 Thread David Santamauro


Have you tried %-escaping?

json.facet = {
  daterange : { type  : range,
field : datefield,
start : "NOW/DAY%2D10DAYS",
end   : "NOW/DAY",
gap   : "%2B1DAY"
  }
}


On 06/28/2016 01:19 PM, Jay Potharaju wrote:

json.facet={daterange : {type : range, field : datefield, start :
"NOW/DAY-10DAYS", end : "NOW/DAY",gap:"\+1DAY"} }

Escaping the plus sign also gives the same error. Any other suggestions how
can i make this work?
Thanks
Jay

On Mon, Jun 27, 2016 at 10:23 PM, Erick Erickson 
wrote:


First thing I'd do is escape the plus. It's probably being interpreted
as a space.

Best,
Erick

On Mon, Jun 27, 2016 at 9:24 AM, Jay Potharaju 
wrote:

Hi,
I am trying to use the json range facet with a tdate field. I tried the
following but get an error. Any suggestions on how to fix the following
error /examples for date range facets.

json.facet={daterange : {type : range, field : datefield, start
:"NOW-10DAYS", end : "NOW/DAY", gap : "+1DAY" } }

  msg": "Can't add gap 1DAY to value Fri Jun 17 15:49:36 UTC 2016 for

field:

datefield", "code": 400

--
Thanks
Jay
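The escaping problem above comes from URL decoding: in a query string a literal `+` decodes to a space, so the gap must reach Solr percent-encoded as `%2B`. Letting a URL-encoding routine build the query string avoids hand-escaping entirely; a small sketch:

```python
from urllib.parse import urlencode

# urlencode() percent-encodes the "+" in the gap (and the braces) so the
# json.facet parameter arrives at Solr intact.
facet = ('{daterange : {type : range, field : datefield, '
         'start : "NOW/DAY-10DAYS", end : "NOW/DAY", gap : "+1DAY"}}')
qs = urlencode({"q": "*:*", "json.facet": facet})
print("%2B1DAY" in qs)  # the "+" is sent as %2B, not as a space
```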








Re: error rendering solr spatial in geoserver

2016-06-29 Thread David Smiley
For polygons in 6.0 you need to set
spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
-- see
https://cwiki.apache.org/confluence/display/solr/Spatial+Search and the
example.  And of course as you probably already know, put the JTS jar on
Solr's classpath.  What likely tripped you up between 5x and 6x is the
change in value of the spatialContextFactory as a result in organizational
package moving "com.spatial4j.core" to "org.locationtech.spatial4j".

On Wed, Jun 29, 2016 at 12:44 PM tkg_cangkul  wrote:

> hi erick, thx for your reply.
>
> i've solve this problem.
> i got this error when i use solr 6.0.0
> so i try to downgrade my solr to version 5.5.0 and it's successfull
>
>
> On 29/06/16 22:39, Erick Erickson wrote:
> > There is not nearly enough information here to say anything very helpful.
> > What does your schema look like for this field?
> > What does the input look like?
> > How are you pulling data from geoserver?
> >
> > You might want to review:
> > http://wiki.apache.org/solr/UsingMailingLists
> >
> > Best,
> > Erick
> >
> > On Wed, Jun 29, 2016 at 2:31 AM, tkg_cangkul  > > wrote:
> >
> > hi, i try to load data spatial from solr with geoserver.
> > when i try to show the layer preview i've got this error message.
> >
> > error
> >
> >
> > anybody can help me pls?
> >
> >
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
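The fieldType change described above can also be made through the Schema API rather than by editing schema.xml. A sketch of the request body, using the 6.x factory class named in the reply; the field-type name and the numeric options are illustrative, and the JTS jar must still be on Solr's classpath:

```python
import json

# Schema API command defining an RPT field type that can index polygons,
# with the post-6.0 JTS spatialContextFactory package.
field_type = {
    "add-field-type": {
        "name": "location_rpt_jts",  # illustrative name
        "class": "solr.SpatialRecursivePrefixTreeFieldType",
        "spatialContextFactory":
            "org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory",
        "geo": "true",
        "distErrPct": "0.025",
        "maxDistErr": "0.001",
        "distanceUnits": "kilometers",
    }
}
body = json.dumps(field_type)
print(body)
# POST to /solr/<collection>/schema with Content-Type: application/json
```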


Re: error rendering solr spatial in geoserver

2016-07-01 Thread David Smiley
Sorry, good point Ere; I forgot about that.  I filed an issue:
https://issues.apache.org/jira/browse/SOLR-9270
When I work on that I'll add an upgrading note to the 6x section.

~ David

On Wed, Jun 29, 2016 at 6:31 AM Ere Maijala  wrote:

> It would have been _really_ nice if this had been in the release notes.
> Made me also scratch my head for a while when upgrading to Solr 6.
> Additionally, this makes a rolling upgrade from Solr 5.x a bit more
> scary since you have to update the collection schema to make the Solr 6
> nodes work while making sure that no Solr 5 node reloads the configuration.
>
> --Ere
>
> 30.6.2016, 3.46, David Smiley kirjoitti:
> > For polygons in 6.0 you need to set
> >
> spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
> > -- see
> > https://cwiki.apache.org/confluence/display/solr/Spatial+Search and the
> > example.  And of course as you probably already know, put the JTS jar on
> > Solr's classpath.  What likely tripped you up between 5x and 6x is the
> > change in value of the spatialContextFactory as a result in
> organizational
> > package moving "com.spatial4j.core" to "org.locationtech.spatial4j".
> >
> > On Wed, Jun 29, 2016 at 12:44 PM tkg_cangkul 
> wrote:
> >
> >> hi erick, thx for your reply.
> >>
> >> i've solve this problem.
> >> i got this error when i use solr 6.0.0
> >> so i try to downgrade my solr to version 5.5.0 and it's successfull
> >>
> >>
> >> On 29/06/16 22:39, Erick Erickson wrote:
> >>> There is not nearly enough information here to say anything very
> helpful.
> >>> What does your schema look like for this field?
> >>> What does the input look like?
> >>> How are you pulling data from geoserver?
> >>>
> >>> You might want to review:
> >>> http://wiki.apache.org/solr/UsingMailingLists
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Jun 29, 2016 at 2:31 AM, tkg_cangkul  >>> <mailto:yuza.ras...@gmail.com>> wrote:
> >>>
> >>> hi, i try to load data spatial from solr with geoserver.
> >>> when i try to show the layer preview i've got this error message.
> >>>
> >>> error
> >>>
> >>>
> >>> anybody can help me pls?
> >>>
> >>>
> >>
> >> --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: analyzer for _text_ field

2016-07-15 Thread David Santamauro


The opening and closing single quotes don't match

-data-binary '{ ... }’

it should be:

-data-binary '{ ... }'


On 07/15/2016 02:59 PM, Steve Rowe wrote:

Waldyr, maybe it got mangled by my email client or yours?

Here’s the same command:

   

--
Steve
www.lucidworks.com


On Jul 15, 2016, at 2:16 PM, Waldyr Neto  wrote:

Hy Steves, tks for the help
unfortunately i'm making some mistake

when i try to run



curl -X POST -H 'Content-type: application/json’ \
http://localhost:8983/solr/gettingstarted/schema --data-binary
'{"add-field-type": { "name": "my_new_field_type", "class":
"solr.TextField","analyzer": {"charFilters": [{"class":
"solr.HTMLStripCharFilterFactory"}], "tokenizer": {"class":
"solr.StandardTokenizerFactory"},"filters":[{"class":
"solr.WordDelimiterFilterFactory"}, {"class":
"solr.LowerCaseFilterFactory"}]}},"replace-field": { "name":
"_text_","type": "my_new_field_type", "multiValued": "true","indexed":
"true","stored": "false"}}’

i receave the folow error msg from curl program
:

curl: (3) [globbing] unmatched brace in column 1

curl: (6) Could not resolve host: name

curl: (6) Could not resolve host: my_new_field_type,

curl: (6) Could not resolve host: class

curl: (6) Could not resolve host: solr.TextField,analyzer

curl: (3) [globbing] unmatched brace in column 1

curl: (3) [globbing] bad range specification in column 2

curl: (3) [globbing] unmatched close brace/bracket in column 32

curl: (6) Could not resolve host: tokenizer

curl: (3) [globbing] unmatched brace in column 1

curl: (3) [globbing] unmatched close brace/bracket in column 30

curl: (3) [globbing] unmatched close brace/bracket in column 32

curl: (3) [globbing] unmatched brace in column 1

curl: (3) [globbing] unmatched close brace/bracket in column 28

curl: (3) [globbing] unmatched brace in column 1

curl: (6) Could not resolve host: name

curl: (6) Could not resolve host: _text_,type

curl: (6) Could not resolve host: my_new_field_type,

curl: (6) Could not resolve host: multiValued

curl: (6) Could not resolve host: true,indexed

curl: (6) Could not resolve host: true,stored

curl: (3) [globbing] unmatched close brace/bracket in column 6

cvs1:~ vvisionphp1$

On Fri, Jul 15, 2016 at 2:45 PM, Steve Rowe  wrote:


Hi Waldyr,

An example of changing the _text_ analyzer by first creating a new field
type, and then changing the _text_ field to use the new field type (after
starting Solr 6.1 with “bin/solr start -e schemaless”):

-
PROMPT$ curl -X POST -H 'Content-type: application/json’ \
http://localhost:8983/solr/gettingstarted/schema --data-binary '{
  "add-field-type": {
"name": "my_new_field_type",
"class": "solr.TextField",
"analyzer": {
  "charFilters": [{
"class": "solr.HTMLStripCharFilterFactory"
  }],
  "tokenizer": {
"class": "solr.StandardTokenizerFactory"
  },
  "filters":[{
  "class": "solr.WordDelimiterFilterFactory"
}, {
  "class": "solr.LowerCaseFilterFactory"
  }]}},
  "replace-field": {
"name": "_text_",
"type": "my_new_field_type",
"multiValued": "true",
"indexed": "true",
"stored": "false"
  }}’
-

PROMPT$ curl
http://localhost:8983/solr/gettingstarted/schema/fields/_text_

-
{
  "responseHeader”:{ […] },
  "field":{
"name":"_text_",
"type":"my_new_field_type",
"multiValued":true,
"indexed":true,
"stored":false}}
-

--
Steve
www.lucidworks.com


On Jul 15, 2016, at 12:54 PM, Waldyr Neto  wrote:

Hy, How can i configure the analyzer for the _text_ field?
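The curl failures in this thread are the classic symptom of a typographic "smart" quote pasted from rich text: the command opens with an ASCII `'` but closes with U+2019, so the shell never sees a closed string and splits the JSON into fragments. A quick, illustrative way to spot the offending character before running a pasted command:

```python
# Scan a pasted command for curly quotes that a shell will not treat
# as string delimiters (the cause of the "unmatched brace" curl errors).
def find_curly_quotes(command):
    suspects = {"\u2018", "\u2019", "\u201c", "\u201d"}
    return [(i, ch) for i, ch in enumerate(command) if ch in suspects]

cmd = "curl -X POST -H 'Content-type: application/json\u2019 http://localhost:8983/..."
print(find_curly_quotes(cmd))  # one hit: the closing quote is U+2019
```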







Re: error indexing spatial

2016-07-25 Thread David Smiley
Hi tig.  Most likely, you didn't repeat the first point as the last.  Even
though it's redundant, nonetheless this is what WKT (and some other spatial
formats) calls for.
~ David

On Wed, Jul 20, 2016 at 10:13 PM tkg_cangkul  wrote:

> hi i try to indexing spatial format to solr 5.5.0 but i've got this error
> message.
>
> [image: error1]
>
> [image: error2]
> anybody can help me to solve this pls?
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
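The closing-point rule above is easy to enforce client-side before indexing. A small illustrative helper (not a Solr API) that closes a ring and emits WKT, using the polygon shape from later in this archive as an example format:

```python
# WKT requires a polygon ring to end where it starts. close_ring() repeats
# the first point at the end when the input ring is left open.
def close_ring(points):
    """points: list of (x, y) tuples; returns a ring whose last point == first."""
    if points and points[0] != points[-1]:
        return points + [points[0]]
    return list(points)

def ring_to_wkt(points):
    ring = close_ring(points)
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"

print(ring_to_wkt([(-77.23, 38.922), (-77.23, 38.923),
                   (-77.228, 38.923), (-77.228, 38.922)]))
```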


Re: Need Help Resolving Unknown Shape Definition Error

2016-08-15 Thread David Smiley
Hello Jennifer,

The spatial documentation is largely this page:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search
(however note the online version is always for the latest Solr release. You
can download a PDF versioned against your Solr version).

To do polygon searches, you both need to add the JTS jar (which you already
did), and also to set the spatialContextFactory as the ref guide indicates
-- that you have yet to do and is I think why you see that error.

Another thing I see that looks like a problem is that you set geo=false,
yet didn't set the worldBounds.  Typically geo=true and you get the typical
decimal degree +/- 180, +/- 90 box.  But if you set false then the grid
system  needs to know the extent of your grid.

~ David

On Thu, Aug 11, 2016 at 4:04 PM Jennifer Coston <
jennifer.cos...@raytheon.com> wrote:

>
> Hello,
>
> I am trying to setup a local solr core so that I can perform Spatial
> searches on it. I am using version 5.2.1. I have updated my schema.xml file
> to include the location-rpt fieldType:
>
>  class="solr.SpatialRecursivePrefixTreeFieldType"
> geo="false" distErrPct="0.025" maxDistErr="0.001"
> distanceUnits="degrees" />
>
> And I have defined my field to use this type:
>
>  stored="true" />
>
> I also added the jts-1.4.0.jar file to C:\solr-5.2.1\server\solr-webapp
> \webapp\WEB-INF\lib.
>
> However when I try to add a document through the Solr Admin Console I am
> seeing this response:
>
> {
>   "responseHeader": {
> "status": 400,
> "QTime": 6
>   },
>   "error": {
> "msg": "Unknown Shape definition [POLYGON((-77.23 38.922, -77.23
> 38.923, -77.228 38.923, -77.228 38.922, -77.23 38.922))]",
> "code": 400
>   }
> }
>
> I can submit documents successfully if I remove the positionWkt field. Did
> I miss a configuration step?
>
> Here is the document I am trying to add:
>
> {
> "observationId": "8e09f47f",
> "observationType": "image",
> "startTime": "2015-09-19T21:03:51Z",
> "endTime": "2015-09-19T21:03:51Z",
> "receiptTime": "2016-07-29T15:49:49.328Z",
> "locationLat": 38.9225015078814,
> "locationLon": -77.22900299194423,
> "position": "38.9225015078814,-77.22900299194423",
> "positionWkt": "POLYGON((-77.23 38.922, -77.23 38.923, -77.228
> 38.923, -77.228 38.922, -77.23 38.922))",
> "provider": "a"
> }
>
> Here are the fields I added to the schema.xml file (I started with the
> template, please let me know if you need the whole thing):
>
> observationId
>
> 
> 
> 
>  required="true" multiValued="false"/>
> 
> 
> 
> 
> 
> 
> 
>  stored="true" />
>
> Thank you!
>
> Jennifer

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
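Both fixes suggested above (set the spatialContextFactory, and give a non-geo grid its extent via worldBounds) can be combined in one field-type definition. A sketch as a Schema API command, using the pre-6.0 `com.spatial4j.core` package appropriate to Solr 5.2.1; the type name and the worldBounds extent are illustrative assumptions:

```python
import json

# RPT field type for planar (geo="false") data: worldBounds tells the grid
# its extent, and spatialContextFactory enables polygon (JTS) support.
field_type = {
    "add-field-type": {
        "name": "location_rpt_planar",  # illustrative name
        "class": "solr.SpatialRecursivePrefixTreeFieldType",
        "spatialContextFactory":
            "com.spatial4j.core.context.jts.JtsSpatialContextFactory",  # 5.x package
        "geo": "false",
        "worldBounds": "ENVELOPE(-180, 180, 90, -90)",  # minX, maxX, maxY, minY (example)
        "distErrPct": "0.025",
        "maxDistErr": "0.001",
        "distanceUnits": "degrees",
    }
}
print(json.dumps(field_type, indent=2))
```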


Re: Sorting on DateRangeField?

2016-09-09 Thread David Smiley
Hi Alex,

DateRangeField extends some spatial stuff, which has that error message in
it, not in DateRangeField proper.  You cannot sort on a DateRangeField.  If
you want to... try adding either one plain docValues field if you just have
date instances, or a pair of them to hold a min & max and pick the right
one to sort on.

The "sorting by the query" in the context of spatial refers to doing a
score sorted sort, noting that the score of a spatial query can be the
distance or some formula involving the distance or possibly overlap of the
shape with something else.  e.g.  q={!geofilt score=distance ...}  This
is documented in the ref guide on the spatial page, including an example
for BBoxField.

&q={!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10))


I think that example could be simpler using {!bbox} but probably wants to
show different ways to skin this cat, so to speak.

~ David

On Wed, Sep 7, 2016 at 1:49 PM Alexandre Rafalovitch 
wrote:

> So, I tried sorting on a DateRangeField. And I got back:  "Sorting not
> supported on SpatialField: release_date, instead try sorting by
> query."
>
> Two questions:
> 1) Spatial is kind of super-internal info here, the message is rather
> confusing.
> 2) What's "sorting by query" in this case? Can I still sort on the
> field, but with a different syntax?
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: script to get core num docs

2016-09-19 Thread David Santamauro


https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API

wget -O- -q '/admin/cores?action=STATUS&core=coreName&wt=json&indent=true' \
  | grep numDocs

# e.g.
wget -O- -q '/admin/cores?action=STATUS&core=alexandria_shard2_replica1&wt=json&indent=1' \
  | grep numDocs | cut -f2 -d':'


On 09/19/2016 11:22 AM, KRIS MUSSHORN wrote:

How can i get the count of docs from a core with bash?
Seems like I have to call Admin/Luke but cant find any specifics.
Thanks
Kris
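Grep and cut work, but since the CoreAdmin STATUS response is JSON it is more robust to parse it. A sketch with the response shape trimmed to the relevant keys (the sample document below is illustrative):

```python
import json

# Sample mimicking the shape of a CoreAdmin STATUS (wt=json) response,
# trimmed to the keys needed for the document count.
sample = json.loads("""
{"status": {"collection1_shard1_replica1": {"index": {"numDocs": 42, "maxDoc": 45}}}}
""")

def num_docs(status_response, core):
    # numDocs lives under status -> <coreName> -> index
    return status_response["status"][core]["index"]["numDocs"]

print(num_docs(sample, "collection1_shard1_replica1"))
```

In a real script the `sample` dict would come from fetching `/admin/cores?action=STATUS&wt=json` and feeding the body to `json.loads`.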



Re: request SOLR - spatial field with Intersect and Contains functions

2016-09-19 Thread David Smiley
Hi Leo,

You should use two spatial fields for this -- one is for an indexed
Box/Envelope, and another for an indexed LineString.  The indexed box
should use either BBoxField or RptWithGeometrySpatialField, and the
LineString field should use RptWithGeometrySpatialField.   If you have an
older installation 5.x version, RptWithGeometrySpatialField may not be
available in which case settle
for solr.SpatialRecursivePrefixTreeFieldType.  When you do a search, it'd
be a search for one field OR the other with the requirements you have for
each.

~ David

On Mon, Sep 19, 2016 at 8:48 AM Leo BRUVRY-LAGADEC <
leo.bruvry.laga...@partenaire-exterieur.ifremer.fr> wrote:

> Hi,
>
> I am trying spatial search in SOLR 5.0 and I don't know how to implement
> a solution for the problem I will try to explain.
>
> On a SOLR server I have indexed a collection of objects that contains
> spacial field :
>
>  multiValued="true" />
>  class="solr.SpatialRecursivePrefixTreeFieldType"
> geo="true"
> distErrPct="0.025"
> maxDistErr="0.09"
> distanceUnits="degrees" />
>
> The spatial data indexed in the field named "geo" can be ENVELOPE or
> LINESTRING :
>
> LINESTRING(-4.6837 48.5792, -4.6835 48.5788, -4.684
> 48.5788, -4.6832 48.579, -4.6837 48.5792, -4.6188 48.6265, -4.6122
> 48.63, -4.615 48.6258, -4.6125 48.6215, -4.6112 48.6218)
>
> or
>
> ENVELOPE(-5.0, -4.0, 49.0, 48.0)
>
> Actually in my application, when I do a SOLR request to get objects that
> are in a spatial area, I do something like this :
>
> q=:&fq=(geo:"Intersects(ENVELOPE(-116.894531, 107.402344, 57.433227,
> -42.146973))")
>
> But I want to change how it work. Now, when the geo field contain an
> ENVELOPE I want to do an CONTAINS request and when it contain a
> LINESTRING I want to do an INTERSECTS request.
>
> example :
>
> If geo = ENVELOPE then q=*:*&fq=(geo:"Contains(ENVELOPE(-116.894531,
> 107.402344, 57.433227, -42.146973))")
>
> If geo = LINESTRING then q=*:*&fq=(geo:"Intersects(ENVELOPE(-116.894531,
> 107.402344, 57.433227, -42.146973))")
>
> How can my application know if the field contain ENVELOPE or LINESTRING ?
>
> Any idea can this be done ?
>
> Best reguards,
> Leo.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
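The two-field approach in the reply can be expressed as a single filter that ORs the two predicates, one per field, so the application never has to know which shape a given document holds. The field names `geo_box` and `geo_line` below are illustrative:

```python
# Build one fq combining Contains on the indexed-box field with Intersects
# on the indexed-linestring field, per the two-field design in the reply.
def spatial_fq(search_area_wkt):
    return ('(geo_box:"Contains({w})" OR geo_line:"Intersects({w})")'
            .format(w=search_area_wkt))

fq = spatial_fq("ENVELOPE(-116.894531, 107.402344, 57.433227, -42.146973)")
print(fq)
```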


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
It should, I think... what happens? Can you ascertain the nature of the
results?
~ David

On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
 wrote:

> For Solr 6.1.0
> This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
>
> This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> 2016-08-26T15:00:12Z]
>
>
> Why does this not work?-{!field f=schedule
> op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
>  SRK

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
OH!  Ok the moment the query no longer starts with "{!", the query is
parsed by defType (for 'q') and will default to lucene QParser.  So then it
appears we have a clause with a NOT operator.  In this parsing mode,
embedded "{!" terminates at the "}".  This means you can't put the
sub-query text after the "}", you instead need to put it in the special "v"
local-param.  e.g.:
-{!field f=schedule op=Contains v='[2016-08-26T12:00:12Z TO
2016-08-26T15:00:12Z]'}

On Tue, Sep 20, 2016 at 8:15 AM Sandeep Khanzode
 wrote:

> This is what I get ...
> { "responseHeader": { "status": 400, "QTime": 1, "params": { "q":
> "-{!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> 2016-08-26T15:00:12Z]", "indent": "true", "wt": "json", "_":
> "1474373612202" } }, "error": { "msg": "Invalid Date in Date Math
> String:'[2016-08-26T12:00:12Z'", "code": 400 }}
>  SRK
>
> On Tuesday, September 20, 2016 5:34 PM, David Smiley <
> david.w.smi...@gmail.com> wrote:
>
>
>  It should, I think... what happens? Can you ascertain the nature of the
> results?
> ~ David
>
> On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
>  wrote:
>
> > For Solr 6.1.0
> > This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
> >
> > This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> > 2016-08-26T15:00:12Z]
> >
> >
> > Why does this not work?-{!field f=schedule
> > op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
> >  SRK
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
>
>

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
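The fix above is mechanical enough to wrap in a helper: once a `{!...}` query is a clause (here, negated with `-`), its body must go in the special `v` local-param instead of trailing after the `}`. An illustrative string-assembly sketch:

```python
# Assemble a negated local-params clause with the sub-query text in the
# "v" local-param, as required when "{!" is not at the start of q.
def negated_field_query(field, op, body):
    return "-{{!field f={f} op={op} v='{v}'}}".format(f=field, op=op, v=body)

q = negated_field_query("schedule", "Contains",
                        "[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]")
print(q)
```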


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
Personally I learned this by poring over Solr's source code some time
ago.  I suppose the only official reference to this stuff is:
https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
But that page doesn't address the implications for when the syntax is a
clause of a larger query instead of being the whole query (i.e. has "{!"...
but but not at the first char).

On Tue, Sep 20, 2016 at 2:06 PM Sandeep Khanzode
 wrote:

> Wow. Simply awesome!
> Where can I read more about this? I am not sure whether I understand what
> is going on behind the scenes ... like which parser is invoked for !field,
> how can we know which all special local params exist, whether we should
> prefer edismax over others, when is the LuceneQParser invoked in other
> conditions, etc? Would appreciate if you could indicate some references to
> catch up.
> Thanks a lot ...  SRK
>
> On Tuesday, September 20, 2016 5:54 PM, David
> Smiley  wrote:
>
>
>  OH!  Ok the moment the query no longer starts with "{!", the query is
> parsed by defType (for 'q') and will default to lucene QParser.  So then it
> appears we have a clause with a NOT operator.  In this parsing mode,
> embedded "{!" terminates at the "}".  This means you can't put the
> sub-query text after the "}", you instead need to put it in the special "v"
> local-param.  e.g.:
> -{!field f=schedule op=Contains v='[2016-08-26T12:00:12Z TO
> 2016-08-26T15:00:12Z]'}
>
> On Tue, Sep 20, 2016 at 8:15 AM Sandeep Khanzode
>  wrote:
>
> > This is what I get ...
> > { "responseHeader": { "status": 400, "QTime": 1, "params": { "q":
> > "-{!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> > 2016-08-26T15:00:12Z]", "indent": "true", "wt": "json", "_":
> > "1474373612202" } }, "error": { "msg": "Invalid Date in Date Math
> > String:'[2016-08-26T12:00:12Z'", "code": 400 }}
> >  SRK
> >
> >On Tuesday, September 20, 2016 5:34 PM, David Smiley <
> > david.w.smi...@gmail.com> wrote:
> >
> >
> >  It should, I think... what happens? Can you ascertain the nature of the
> > results?
> > ~ David
> >
> > On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
> >  wrote:
> >
> > > For Solr 6.1.0
> > > This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
> > >
> > > This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> > > 2016-08-26T15:00:12Z]
> > >
> > >
> > > Why does this not work?-{!field f=schedule
> > > op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
> > >  SRK
> >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
> >
> >
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
>
>

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
So that page referenced describes local-params, and describes the special
"v" local-param.  But first, see a list of all query parsers (which lists
"field"): https://cwiki.apache.org/confluence/display/solr/Other+Parsers
and
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser for
the "lucene" one.

The "op" param is rather unique... it's not defined by any query parser.  A
trick is done in which a custom field type (DateRangeField in this case) is
able to inspect the local-params, and thus define and use params it needs.
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates "More
DateRangeField Details" mentions "op".  {!lucene df=dateRange
op=Contains}... would also work.  I don't know of any other local-param
used in this way.

On Tue, Sep 20, 2016 at 11:21 PM David Smiley 
wrote:

> Personally I learned this by poring over Solr's source code some time
> ago.  I suppose the only official reference to this stuff is:
>
> https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
> But that page doesn't address the implications for when the syntax is a
> clause of a larger query instead of being the whole query (i.e. has "{!"...
> but but not at the first char).
>
> On Tue, Sep 20, 2016 at 2:06 PM Sandeep Khanzode
>  wrote:
>
>> Wow. Simply awesome!
>> Where can I read more about this? I am not sure whether I understand what
>> is going on behind the scenes ... like which parser is invoked for !field,
>> how can we know which all special local params exist, whether we should
>> prefer edismax over others, when is the LuceneQParser invoked in other
>> conditions, etc? Would appreciate if you could indicate some references to
>> catch up.
>> Thanks a lot ...  SRK
>>
>> On Tuesday, September 20, 2016 5:54 PM, David
>> Smiley  wrote:
>>
>>
>>  OH!  Ok the moment the query no longer starts with "{!", the query is
>> parsed by defType (for 'q') and will default to lucene QParser.  So then
>> it
>> appears we have a clause with a NOT operator.  In this parsing mode,
>> embedded "{!" terminates at the "}".  This means you can't put the
>> sub-query text after the "}", you instead need to put it in the special
>> "v"
>> local-param.  e.g.:
>> -{!field f=schedule op=Contains v='[2016-08-26T12:00:12Z TO
>> 2016-08-26T15:00:12Z]'}
>>
>> On Tue, Sep 20, 2016 at 8:15 AM Sandeep Khanzode
>>  wrote:
>>
>> > This is what I get ...
>> > { "responseHeader": { "status": 400, "QTime": 1, "params": { "q":
>> > "-{!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
>> > 2016-08-26T15:00:12Z]", "indent": "true", "wt": "json", "_":
>> > "1474373612202" } }, "error": { "msg": "Invalid Date in Date Math
>> > String:'[2016-08-26T12:00:12Z'", "code": 400 }}
>> >  SRK
>> >
>> >On Tuesday, September 20, 2016 5:34 PM, David Smiley <
>> > david.w.smi...@gmail.com> wrote:
>> >
>> >
>> >  It should, I think... what happens? Can you ascertain the nature of the
>> > results?
>> > ~ David
>> >
>> > On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
>> >  wrote:
>> >
>> > > For Solr 6.1.0
>> > > This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
>> > >
>> > > This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
>> > > 2016-08-26T15:00:12Z]
>> > >
>> > >
>> > > Why does this not work?-{!field f=schedule
>> > > op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
>> > >  SRK
>> >
>> > --
>> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> > http://www.solrenterprisesearchserver.com
>> >
>> >
>> >
>>
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>>
>>
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Removing SOLR fields from schema

2016-09-22 Thread David Santamauro



On 09/22/2016 08:55 AM, Shawn Heisey wrote:

On 9/21/2016 11:46 PM, Selvam wrote:

We use SOLR 5.x in cloud mode and have huge set of fields. We now want
to remove some 50 fields from Index/schema itself so that indexing &
querying will be faster. Is there a way to do that without losing
existing data on other fields? We don't want to do full re-indexing.


When you remove fields from your schema, you can continue to use Solr
with no problems even without a reindex.  But you won't see any benefit
to your query performance until you DO reindex.  Until the reindex is
done (ideally wiping the index first), all the data from the removed
fields will remain in the index and affect your query speeds.


Will an optimize remove those fields and corresponding data?





Re: how to remove duplicate from search result

2016-09-27 Thread David Santamauro

Have a look at

https://cwiki.apache.org/confluence/display/solr/Result+Grouping


On 09/27/2016 11:03 AM, googoo wrote:

hi,

We want to provide remove duplicate from search result function.

like we have below documents.
id(uniqueKey)   guid
doc1G1
doc2G2
doc3G3
doc4G1

user run one query and hit doc1, doc2 and doc4.
user want to remove duplicate from search result based on guid field.
since doc1 and doc4 has same guid, one of them should be drop from search
result.

how we can address this requirement?

Thanks,
Yongtao





--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search-result-tp4298272.html
Sent from the Solr - User mailing list archive at Nabble.com.
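The Result Grouping page linked above covers the classic `group.*` parameters; the CollapsingQParser filter is another common way to keep one document per `guid`. A sketch of both parameter sets:

```python
from urllib.parse import urlencode

# Option 1: classic result grouping, one group per guid, one doc per group.
grouping = urlencode({
    "q": "*:*",
    "group": "true",
    "group.field": "guid",
    "group.limit": "1",
})

# Option 2: CollapsingQParser as a filter query; keeps one doc per guid
# and leaves the response in the normal flat format.
collapse = urlencode({
    "q": "*:*",
    "fq": "{!collapse field=guid}",
})

print(grouping)
print(collapse)
```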



Re: Migrating to Solr 6.1.0 from 5.5.0

2016-09-29 Thread David Smiley
Arjun,

Your input is a POLYGON -- as seen in the error message.  The "Try JTS" was
hopefully a clue -- on
https://cwiki.apache.org/confluence/display/solr/Spatial+Search search for
"JTS" and you should see how to set the spatialContextFactory to JTS, and a
mention of needing JTS jar.  I'll try and add a bit more info on suggesting
exactly where to put it and a download link.  I'll also mention a shortcut
so you don't have to type out the classname -- a recent feature in 6.2.
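For context, a field type along these lines is the usual JTS setup; the field type name and tuning values here are illustrative, not taken from the configuration being discussed:

```xml
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
           geo="true" distErrPct="0.025" maxDistErr="0.001"
           distanceUnits="kilometers"/>
<!-- Requires the JTS jar on Solr's classpath, e.g. under
     server/solr-webapp/webapp/WEB-INF/lib/ -->
```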

Since you said you were upgrading... presumably your spatialContextFactory
attribute was already set for this to work at all in 5.5?  The package
reference changed for this value -- I imagine you would have seen a
warning/error to this effect in Solr's logs.  Do you?

~ David

On Tue, Sep 27, 2016 at 10:29 AM William Bell  wrote:

> the documentation is not good on this. Not sure how to fix it either.
>
> On Tue, Sep 27, 2016 at 3:41 AM, M, Arjun (Nokia - IN/Bangalore) <
> arju...@nokia.com> wrote:
>
> > Hi,
> >
> > We are getting the below errors when migrating Solr from 5.5.0 to
> > 6.1.0. Could anyone help in resolving the issue, if you have come across
> > this?
> >
> >
>  org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> > Error from server at http://127.0.0.1:41569/solr/collection1: Unable to
> > parse shape given formats "lat,lon", "x y" or as WKT because
> > java.text.ParseException: java.lang.UnsupportedOperationException:
> > Unsupported shape of this SpatialContext. Try JTS or Geo3D. input:
> > POLYGON((-10 30, -40 40, -10 -20, 0 0, -10 30))
> >
> > Thanks in advance..
> >
> > Thanks & Regards,
> >Arjun M
> >
> >
> >
> >
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Heatmap in JSON facet API

2016-11-01 Thread David Smiley
I plan on adding this in the near future... hopefully for Solr 6.4.

On Mon, Oct 31, 2016 at 7:06 AM Никита Веневитин 
wrote:

> I've built a query as described in Heatmap Faceting
> (https://cwiki.apache.org/confluence/x/ZYDxAQ),
> but I would like to get the same results using the JSON facet API
>
> 2016-10-30 15:24 GMT+03:00 GW :
>
> > If we are talking about the same kind of heat maps you might want to look
> > at the TomTom map API for a quick and dirty yet solid solution. Just
> supply
> > a whack of coordinates and let TomTom do the work. The Heat maps will
> zoom
> > in and de-cluster.
> >
> > Example below.
> >
> > http://www.frogclassifieds.com/tomtom/markers-clustering.html
> >
> >
> > On 28 October 2016 at 09:05, Никита Веневитин  >
> > wrote:
> >
> > > Hi!
> > > Is it possible to use JSON facet API to get heatmaps?
> > >
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


How-To: Secure Solr by IP Address

2016-11-04 Thread David Smiley
I was just researching how to secure Solr by IP address and I finally
figured it out.  Perhaps this might go in the ref guide but I'd like to
share it here anyhow.  The scenario is where only "localhost" should have
full unfettered access to Solr, whereas everyone else (notably web clients)
can only access some whitelisted paths.  This setup is intended for a
single instance of Solr (not a member of a cluster); the particular config
below would probably need adaptations for a cluster of Solr instances.  The
technique here uses a utility with Jetty called IPAccessHandler --
http://download.eclipse.org/jetty/stable-9/apidocs/org/eclipse/jetty/server/handler/IPAccessHandler.html
For reasons I don't know (and I did search), it was recently deprecated and
there's another InetAccessHandler (not in Solr's current version of Jetty)
but it doesn't support constraints incorporating paths, so it's a
non-option for my needs.

First, Java must be told to insist on its IPv4 stack. This is because
Jetty's IPAccessHandler simply doesn't support IPv6 IP matching; it throws
NPEs in my experience. In recent versions of Solr, this can be easily done
just by adding -Djava.net.preferIPv4Stack=true at the Solr start
invocation.  Alternatively put it into SOLR_OPTS perhaps in solr.in.sh.
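The SOLR_OPTS route would be a line like this in solr.in.sh (variable name per a standard install; the path may differ on your setup):

```shell
# Append the IPv4-only flag to whatever options are already set.
SOLR_OPTS="$SOLR_OPTS -Djava.net.preferIPv4Stack=true"
```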

Edit server/etc/jetty.xml, and replace the line
mentioning ContextHandlerCollection with this:

<Set name="handler">
  <New id="IPAccessHandler" class="org.eclipse.jetty.server.handler.IPAccessHandler">
    <Set name="white">
      <Array type="String">
        <Item>127.0.0.1</Item>
        <Item>-.-.-.-|/solr/techproducts/select</Item>
      </Array>
    </Set>
    <Set name="whiteListByPath">false</Set>
    <Set name="handler">
      <New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
    </Set>
  </New>
</Set>

This mechanism wraps ContextHandlerCollection (which ultimately serves
Solr) with this handler that adds the constraints.  These constraints above
allow localhost to do anything; other IP addresses can only access
/solr/techproducts/select.  That line could be duplicated for other
white-listed paths -- I recommend creating request handlers for your use,
possibly with invariants to further constrain what someone can do.
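A whitelisted handler of that sort might look like this in solrconfig.xml; the handler name and invariant values are illustrative:

```xml
<requestHandler name="/public-select" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- Invariants cannot be overridden by request parameters -->
    <str name="fl">id,title,score</str>
    <str name="rows">10</str>
  </lst>
</requestHandler>
```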

note: I originally tried inserting the IPAccessHandler in
server/contexts/solr-jetty-context.xml but found that there's a bug in
IPAccessHandler that fails to consider when HttpServletRequest.getPathInfo
is null.  And it wound up letting everything through (if I recall).  But I
like it up in jetty.xml anyway as it intercepts everything.

~ David

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: How-To: Secure Solr by IP Address

2016-11-04 Thread David Smiley
Not to knock the other suggestions, but a benefit to securing Jetty like
this is that *everyone* can do this approach.

On Fri, Nov 4, 2016 at 9:54 AM john saylor  wrote:

> hi
>
> any firewall worth its name should be able to do this. in fact, that is
> one of several things that a firewall was designed to do.
>
> also, you are stopping this traffic at the application, which is good;
> but you'd prolly be better off stopping it at the network interface
> [using a firewall, for instance].
>
> of course, firewalls have their own complexity ...
>
> good luck!
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Aggregate Values Inside a Facet Range

2016-11-04 Thread David Santamauro


I believe your answer is in the subject
  => facet.range
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-RangeFaceting
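A range facet on its own only counts documents; summing the count field inside each bucket is what the JSON Facet API's nested aggregations do. A sketch for this case — field names from the example below, syntax assuming the JSON Facet API is available in your Solr version:

```json
{
  "days": {
    "type": "range",
    "field": "timestamp",
    "start": "NOW/DAY-3DAYS",
    "end": "NOW/DAY+1DAY",
    "gap": "+1DAY",
    "facet": { "total": "sum(count)" }
  }
}
```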

//

On 11/04/2016 02:25 PM, Furkan KAMACI wrote:

I have documents like that

id:5
timestamp:NOW //pseudo date representation
count:13

id:4
timestamp:NOW //pseudo date representation
count:3

id:3
timestamp:NOW-1DAY //pseudo date representation
count:21

id:2
timestamp:NOW-1DAY //pseudo date representation
count:29

id:1
timestamp:NOW-3DAY //pseudo date representation
count:4

When I want to facet the last 3 days' data by timestamp it's OK. However my need
is that:

facets:
 TODAY: 16 //pseudo representation
 TODAY - 1: 50 //pseudo date representation
 TODAY - 2: 0 //pseudo date representation
 TODAY - 3: 4 //pseudo date representation

I mean, I have to facet by dates and aggregate values inside that facet
range. Is it possible to do that without multiple queries at Solr?

Kind Regards,
Furkan KAMACI



Re: Overlapped Gap Facets

2016-11-17 Thread David Santamauro


I had a similar question a while back but it was regarding date 
differences. Perhaps that might give you some ideas.


http://lucene.472066.n3.nabble.com/date-difference-faceting-td4249364.html
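Overlapping buckets generally can't be expressed as a single facet.range (range gaps don't overlap), but a set of facet.query parameters can produce them — a sketch, with an illustrative field name:

```
facet=true
&facet.query={!key="Last 1 Day"}timestamp:[NOW/DAY-1DAY TO *]
&facet.query={!key="Last 1 Week"}timestamp:[NOW/DAY-7DAYS TO *]
&facet.query={!key="Last 1 Month"}timestamp:[NOW/DAY-1MONTH TO *]
&facet.query={!key="Last 6 Month"}timestamp:[NOW/DAY-6MONTHS TO *]
&facet.query={!key="Last 1 Year"}timestamp:[NOW/DAY-1YEAR TO *]
&facet.query={!key="Older than 1 Year"}timestamp:[* TO NOW/DAY-1YEAR]
```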

//



On 11/17/2016 09:49 AM, Furkan KAMACI wrote:

Is it possible to do such a facet on a date field:

  Last 1 Day
  Last 1 Week
  Last 1 Month
  Last 6 Month
  Last 1 Year
  Older than 1 Year

which has overlapped facet gaps?

Kind Regards,
Furkan KAMACI



Adding a Basic Authentication user fails with 404

2017-06-06 Thread David Parker
Hello,

I am running a stand-alone instance of Solr 6.5 (without ZooKeeper).  I am
attempting to implement Basic Authentication per the documentation, but
when I try to use the API to add a user, I get a 404 error.  It seems the
/admin/authentication API entry point isn't there:

$ curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication
-H 'Content-type:application/json' -d '{"set-user": {"myuser" :
"mypasswd"}}'



Error 404 Not Found

HTTP ERROR 404
Problem accessing /solr/admin/authentication. Reason:
Not Found



But according to the documentation, the API entry point is
admin/authentication, and it states the following:

"This endpoint is not collection-specific, so users are created for the
entire Solr cluster. If users need to be restricted to a specific
collection, that can be done with the authorization rules."

The only thing which stands out to me is "users are created for the entire
Solr cluster."  Is this entry point missing because I'm running Solr
stand-alone?
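For reference: the /admin/authentication endpoint only exists once the Basic Auth plugin is enabled, which in standalone mode is done by placing a security.json next to solr.xml in SOLR_HOME. A minimal sketch; the credential hash below is the stock solr/SolrRocks example from the ref guide, not a value to keep in production:

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  }
}
```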

Any help is greatly appreciated!

- Dave

-- 
Dave Parker
Database & Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177


Re: Score higher if multiple terms match

2017-06-07 Thread David Hastings
well, short answer, use the analyzer to see what's happening.
long answer
 theres a difference between
name:tv promotion   -->  name:tv default_field:promotion
name:"tv promotion"   -->  name:"tv promotion"
name:tv AND name:promotion --> name:tv AND name:promotion


since your default field most likely isn't name, it's going to search only
the default field for it.  You can alter this behavior using the qf parameter:



qf='name^5 text'


for example would apply a boost of 5 if it matched the field 'name', and
only 1 for 'text'
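Putting that together as a request — qf is an eDisMax parameter, so defType=edismax is assumed here:

```
q=tv promotion&defType=edismax&qf=name^5 text&fl=*,score
```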

On Wed, Jun 7, 2017 at 4:35 PM, OTH  wrote:

> Hello,
>
> I have what I would think to be a fairly simple problem to solve, however
> I'm not sure how it's done in Solr and couldn't find an answer on Google.
>
> Say I have two documents, "TV" and "TV promotion".  If the search query is
> "TV promotion", then, obviously, I would like the document "TV promotion"
> to score higher.  However, that is not the case right now.
>
> My syntax is something like this:
> http://localhost:8983/solr/sales/select?indent=on&wt=json&fl=*,score&q=name:tv
> promotion
> (I tried "q=name:tv+promotion" (added the '+'), but it made no difference.)
>
> It's not scoring the document "TV promotion" higher than "TV"; in fact it's
> scoring it lower.
>
> Thanks
>


Re: Score higher if multiple terms match

2017-06-07 Thread David Hastings
sorry, i meant debug query where you would get output like this:

"debug": {
"rawquerystring": "name:tv promotion",
"querystring": "name:tv promotion",
"parsedquery": "+name:tv +text:promotion",


On Wed, Jun 7, 2017 at 4:41 PM, David Hastings  wrote:

> well, short answer, use the analyzer to see whats happening.
> long answer
>  theres a difference between
> name:tv promotion   -->  name:tv default_field:promotion
> name:"tv promotion"   -->  name:"tv promotion"
> name:tv AND name:promotion --> name:tv AND name:promotion
>
>
> since your default field most likely isnt name, its going to search only
> the default field for it.  you can alter this behavior using qf parameters:
>
>
>
> qf='name^5 text'
>
>
> for example would apply a boost of 5 if it matched the field 'name', and
> only 1 for 'text'
>
> On Wed, Jun 7, 2017 at 4:35 PM, OTH  wrote:
>
>> Hello,
>>
>> I have what I would think to be a fairly simple problem to solve, however
>> I'm not sure how it's done in Solr and couldn't find an answer on Google.
>>
>> Say I have two documents, "TV" and "TV promotion".  If the search query is
>> "TV promotion", then, obviously, I would like the document "TV promotion"
>> to score higher.  However, that is not the case right now.
>>
>> My syntax is something like this:
>> http://localhost:8983/solr/sales/select?indent=on&wt=json&fl=*,score&q=name:tv
>> promotion
>> (I tried "q=name:tv+promotion" (added the '+'), but it made no difference.)
>>
>> It's not scoring the document "TV promotion" higher than "TV"; in fact
>> it's
>> scoring it lower.
>>
>> Thanks
>>
>
>


Re: Score higher if multiple terms match

2017-06-08 Thread David Hastings
Agreed, you need to show the debug query info from your original query:


My syntax is something like this:
>> >>> >> http://localhost:8983/solr/sales/select?indent=on&wt=json&fl=*,score&q=name:tv
>> >>> >> promotion

and could probably help you get the results you want


On Thu, Jun 8, 2017 at 10:54 AM, Erick Erickson 
wrote:

> bq: I hope that clears the confusion.
>
> Nope, doesn't clear it up at all. It's not clear which query you're
> talking about at least to me.
>
> If you're searching for
> name:tv AND name:promotion
>
> and getting back a document that has only "tv" in the name field
> that's simply wrong and you need to find out why.
>
> If you're saying that searching for
> name:tv OR name:promotion
>
> returns both and that docs with both terms score higher, that's likely
> true although it'll be fuzzy. I'm guessing that the name field is
> fairly short so the length norm will be the sam and this will be
> fairly reliable. If the field could have a widely varying number of
> terms it's less reliable.
>
> Best,
> Erick
>
> On Thu, Jun 8, 2017 at 1:41 AM, OTH  wrote:
> > Hi - Sorry it was very late at night for me and I think I didn't pick my
> > wordings right.
> > bq: it is indeed returning documents with only either one of the two
> query
> > terms
> > What I meant was:  Initially, I thought it was only returning documents
> > which contained both 'tv' and 'promotion'.  Then I realized I was
> mistaken;
> > it was also returning documents which contained either 'tv' or
> 'promotion'
> > (as well as documents which contained both, which were scored higher).
> > I hope that clears the confusion.
> > Thanks
> >
> > On Thu, Jun 8, 2017 at 9:04 AM, Erick Erickson 
> > wrote:
> >
> >> bq: it is indeed returning documents with only either one of the two
> query
> >> terms
> >>
> >> Uhm, this should not be true. What's the output of adding debug=query?
> >> And are you totally sure the above is true and you're just not seeing
> >> the other term in the return? Or that you have a synonyms file that is
> >> somehow making docs match? Or ???
> >>
> >> So you're saying you get the exact same number of hits for
> >> name:tv OR name:promotion
> >> and
> >> name:tv AND name:promotion
> >> ??? Definitely not expected unless all docs happen to have both these
> >> terms in the name field either through normal input or synonyms etc.
> >>
> >> You should need something like:
> >> name:tv OR name:promotion OR (name:tv AND name:promotion)^100
> >> to score all the docs with both terms in the name field higher than just
> >> one.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Jun 7, 2017 at 3:05 PM, OTH  wrote:
> >> > I'm sorry, there was a mistake.
> >> >
> >> > I previously wrote:
> >> >
> >> > However, these are returning only those documents which have both the
> >> terms
> >> >> 'tv promotion' in them (there are a few).  It's not returning any
> >> >> document which have only 'tv' or only 'promotion' in them.
> >> >
> >> >
> >> > That's not true at all; it is indeed returning documents with only
> either
> >> > one of the two query terms (so, documents with only 'tv' or only
> >> > 'promotion' in them).  Sorry.  You can disregard my question in the
> last
> >> > email.
> >> >
> >> > Thanks
> >> >
> >> > On Thu, Jun 8, 2017 at 2:03 AM, OTH  wrote:
> >> >
> >> >> Thanks.
> >> >> Both of these are working in my case:
> >> >> name:"tv promotion"   -->  name:"tv promotion"
> >> >> name:tv AND name:promotion --> name:tv AND name:promotion
> >> >> (Although I'm assuming, the first might not have worked if my
> document
> >> had
> >> >> been say 'promotion tv' or 'tv xyz promotion')
> >> >>
> >> >> However, these are returning only those documents which have both the
> >> >> terms 'tv promotion' in them (there are a few).  It's not returning
> any
> >> >> document which have only 'tv' or only 'promotion' in them.

Re: Highlighter not working on some documents

2017-06-11 Thread David Smiley
Probably the most common reason is the default hl.maxAnalyzedChars -- thus
your highlightable text might not be in the first 51200 chars of text.  The
first Solr release with the unified highlighter had an even lower default
of 10k chars.
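So the first thing to try is raising that limit on the request; the parameter values here are illustrative:

```
hl=true&hl.method=unified&hl.fl=_text_&hl.maxAnalyzedChars=1000000
```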

On Fri, Jun 9, 2017 at 9:58 PM Phil Scadden  wrote:

> Tried hard to find difference between pdfs returning no highlighter and
> ones that do for same search term.  Includes pdfs that have been OCRed and
> ones that were text to begin with. Head scratching to me.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, 10 June 2017 6:22 a.m.
> To: solr-user 
> Subject: Re: Highlighter not working on some documents
>
> Need lots more information. I.e. schema definitions, query you use,
> handler configuration and the like. Note that highlighted fields must have
> stored="true" set and likely the _text_ field doesn't. At least in the
> default schemas stored is set to false for the catch-all field.
> And you don't want to store that information anyway since it's usually the
> destination of copyField directives and you'd highlight _those_ fields.
>
> Best,
> Erick
>
> On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden  wrote:
> > Do a search with:
> > fl=id,title,datasource&hl=true&hl.method=unified&limit=50&page=1&q=pre
> > ssure+AND+testing&rows=50&start=0&wt=json
> >
> > and I get back a good list of documents. However, some documents are
> returning empty fields in the highlighter. Eg, in the highlight array have:
> > "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
> >
> > Getting this well up the list of results with good highlighted matchers
> above and below this entry. Why would the highlighter be failing?
> >
> > Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Swapping indexes on disk

2017-06-14 Thread David Hastings
I don't have an answer to why the folder got cleared; however, I am wondering
why you aren't using basic replication to do this exact same thing, since
solr will natively take care of all this for you with no interruption to
the user and no stop/start routines etc.

On Wed, Jun 14, 2017 at 2:26 PM, Mike Lissner <
mliss...@michaeljaylissner.com> wrote:

> We are replacing a drive mounted at /old with one mounted at /new. Our
> index currently lives on /old, and our plan was to:
>
> 1. Create a new index on /new
> 2. Reindex from our database so that the new index on /new is properly
> populated.
> 3. Stop solr.
> 4. Symlink /old to /new (Solr now looks for the index at /old/solr, which
> redirects to /new/solr)
> 5. Start solr
> 6. (Later) Stop solr, swap the drives (old for new), and start solr. (Solr
> now looks for the index at /old/solr again, and finds it there.)
> 7. Delete the index pointing to /new created in step 1.
>
> The idea was that this would create a new index for solr, would populate it
> with the right content, and would avoid having to touch our existing solr
> configurations aside from creating one new index, which we could soon
> delete.
>
> I just did steps 1-5, but I got null pointer exceptions when starting solr,
> and it appears that the index on /new has been almost completely deleted by
> Solr (this is a bummer, since it takes days to populate).
>
> Is this expected? Am I terribly crazy to try to swap indexes on disk? As
> far as I know, the only difference between the indexes is their name.
>
> We're using Solr version 4.10.4.
>
> Thank you,
>
> Mike
>


Re: Issue with highlighter

2017-06-14 Thread David Smiley
> Beware of NOT plus OR in a search. That will certainly produce no
highlights. (eg test -results when default op is OR)

Seems like a bug to me; the default operator shouldn't matter in that case
I think since there is only one clause that has no BooleanQuery.Occur
operator and thus the OR/AND shouldn't matter.  The end effect is "test" is
effectively required and should definitely be highlighted.

Note to Ali: Phil's comment implies use of hl.method=unified which is not
the default.

On Wed, Jun 14, 2017 at 10:22 PM Phil Scadden  wrote:

> Just had similar issue - works for some, not others. First thing to look
> at is hl.maxAnalyzedChars in the query. The default is quite small.
> Since many of my documents are large PDF files, I opted to use
> storeOffsetsWithPositions="true" termVectors="true" on the field I was
> searching on.
> This certainly did increase my index size but not too bad and certainly
> fast.
> https://cwiki.apache.org/confluence/display/solr/Highlighting
>
> Beware of NOT plus OR in a search. That will certainly produce no
> highlights. (eg test -results when default op is OR)
>
>
> -Original Message-
> From: Ali Husain [mailto:alihus...@outlook.com]
> Sent: Thursday, 15 June 2017 11:11 a.m.
> To: solr-user@lucene.apache.org
> Subject: Issue with highlighter
>
> Hi,
>
>
> I think I've found a bug with the highlighter. I search for the word
> "something" and I get an empty highlighting response for all the documents
> that are returned shown below. The fields that I am searching over are
> text_en, the highlighter works for a lot of queries. I have no
> stopwords.txt list that could be messing this up either.
>
>
>  "highlighting":{
> "310":{},
> "103":{},
> "406":{},
> "1189":{},
> "54":{},
> "292":{},
> "309":{}}}
>
>
> Just changing the search term to "something like" I get back this:
>
>
> "highlighting":{
> "310":{},
> "309":{
>   "content":["1949 Convention, like those"]},
> "103":{},
> "406":{},
> "1189":{},
> "54":{},
> "292":{},
> "286":{
>   "content":["persons in these classes are treated like
> combatants, but in other respects"]},
> "336":{
>   "content":["   be treated like engagement"]}}}
>
>
> So I know that I have it setup correctly, but I can't figure this out.
> I've searched through JIRA/Google and haven't been able to find a similar
> issue.
>
>
> Any ideas?
>
>
> Thanks,
>
> Ali
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


how to leave the mailing list? eof

2017-06-19 Thread david fernandes



Re: How are people using the ICUTokenizer?

2017-06-20 Thread David Hastings
Have you successfully used the shingles with the MoreLikeThis query?
Really curious about whether this would return the "interesting phrases"
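For context, the shingle-on-ICU setup Daniel describes below might look something like this in a schema — a sketch, with the field type name and shingle sizes as assumptions:

```xml
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Normalize to an ICU normal form before tokenizing -->
    <charFilter class="solr.ICUNormalizer2CharFilterFactory"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Emit 2- and 3-word shingles alongside single terms -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"
            outputUnigrams="true"/>
  </analyzer>
</fieldType>
```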

On Tue, Jun 20, 2017 at 12:01 PM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Joel,
>
> I think the issue is doing word-breaking according to ICU rules.   So, if
> you are trying to make sure your index breaks words properly on eastern
> languages, just use ICU Tokenizer.   Unless your text is already in an ICU
> normal form, you should always use the ICUNormalizer character filter along
> with this:
>
> https://cwiki.apache.org/confluence/display/solr/CharFilterFactories#
> CharFilterFactories-solr.ICUNormalizer2CharFilterFactory
>
> I think that this would be good with Shingles when you are not removing
> stop words, maybe in an alternate analysis of the same content.
>
> I'm using it in this way, with shingles for phrase recognition and only
> doc freq and term freq - my possibly naïve idea is that I do not need
> positions and offsets if I'm using shingles, and my main goal is to do a
> MoreLikeThis query using the shingled versions of fields.
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Tuesday, June 20, 2017 11:52 AM
> To: solr-user@lucene.apache.org
> Subject: How are people using the ICUTokenizer?
>
> It seems that there are some powerful capabilities in the ICUTokenizer. I
> was wondering how the community is making use of it.
>
> Does anyone have experience working with the ICUTokenizer that they can
> share?
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>


Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
Hi Puneeta,

So what does your field type definition look like?  I'd imagine you're using 
RptWithGeometrySpatialField.  And what is your Solr version?

BTW note the settings here 
https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/context/jts/JtsSpatialContextFactory.html
are reflected as attributes on the field type, thus you can set, say,
useJtsMulti="false" to change the 'multi' implementation.

~ David

> On Jun 28, 2017, at 6:44 AM, puneeta  wrote:
> 
> Hi,
> I am new to Solr Geospatial data and have set up JTS within solr. I have
> geo spatial data with Multipolygons. I am passing the coordinates and trying
> to find out which multipolygon contains those coordinates.However, The
> search query is working fine if I insert the data as a polygon. The same is
> not working if my data is inserted as a Multipolygon. I am unable to figure
> out what am I missing. Can anyone suggest where am I going wrong?
> 
> Data as Polygon:
> { "parcel_id":"6",
>"geo":["POLYGON((-86.452970463 32.449739005, 
>  -86.452889912 32.4494390510001, 
>  -86.453365379 32.44942802195, 
>  -86.453514854 32.44942453595))"]
> }
> 
> Data as Multipolygon:
> 
> { "parcel_id":"6",
>"geo":["MULTIPOLYGON(((-86.452970463 32.449739005, 
>  -86.452889912 32.4494390510001, 
>  -86.453365379 32.44942802195, 
>  -86.453514854 32.44942453595)))"]
> }
> 
> My search query:
> fq=geo:"Intersects(-86.453097892 32.449735102)"
> 
> This device surely lies between the polygon (My polygon coordinates are many
> more in the actual data. To reduce the size here I have omited few of the
> coordinates)
> 
> The query is returning only the polygon data. The multipolygon search is not
> happening.
> 
> Any help is highly appreciated.
> 
> Thanks in Advance,
> Puneeta
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
I suggest using RptWithGeometry field, and with that change remove distErrPct 
and maxDistErr.  See the ref guide, and note the geometry cache option.
BTW spatialContextFactory can simply be "jts".

If this fixes the issue, then the issue was related to grid approximation.

BTW you never quite said what it was about the results that was wrong.  Did you 
get hits you didn't expect (I'm guessing yes) or the inverse?

~ David

> On Jun 28, 2017, at 10:55 AM, puneeta  wrote:
> 
> Hi David,
> Thank you for the prompt reply. My field definition in schema.xml is :
> 
> I commented the existing location_rpt
> 
> 
> 
> And added:
> 
> <fieldType name="location_rpt"
> class="solr.SpatialRecursivePrefixTreeFieldType"
> spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
>   autoIndex="true"
>   validationRule="repairBuffer0"
>   distErrPct="0.025"
>   maxDistErr="0.001"
>   distanceUnits="kilometers" />
> 
> My Solr version is 6.2.1
> 
> Thanks,
> Puneeta
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143p4343162.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 5.5 - spatial intersects query returns results outside of search box

2017-06-28 Thread David Smiley

> On Jun 27, 2017, at 3:28 AM, Leila Gonzales  wrote:
> 
> {
> 
>"id": "5230",
> 
>"location_geo":
> ["ENVELOPE(-75.0,-75.939723,39.3597224,38.289722)"]
> 
>  }

This is an unusual rectangle.  Remember this is minX, maxX, maxY, minY.  Thus 
this rectangle wraps the entire globe except for nearly a degree.  It matches 
your query rectangle.
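So if the intent was a small rectangle near -75°, the corrected form — with minX before maxX — would be:

```
ENVELOPE(-75.939723, -75.0, 39.3597224, 38.289722)
```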

Re: Solr 5.5 - spatial intersects query returns results outside of search box

2017-06-28 Thread David Smiley
No prob.

BTW you may want to investigate use of BBoxField or 
RptWithGeometrySpatialField; both are also more accurate... but vanilla RPT may 
be just fine (fastest).


> On Jun 28, 2017, at 11:32 AM, Leila Gonzales  wrote:
> 
> Thanks David! I fixed the coordinates and put some error checking in my
> Solr indexing script to trap for this type of coordinate mismatch.
> 
> -Original Message-
> From: David Smiley [mailto:david.w.smi...@gmail.com]
> Sent: Wednesday, June 28, 2017 8:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 5.5 - spatial intersects query returns results outside
> of search box
> 
> 
>> On Jun 27, 2017, at 3:28 AM, Leila Gonzales  wrote:
>> 
>> {
>> 
>>   "id": "5230",
>> 
>>   "location_geo":
>> 
> ["ENVELOPE(-75.0,-75.939723,39.3597224,38.289722)"
> ]
>> 
>> }
> 
> This is an unusual rectangle.  Remember this is minX, maxX, maxY, minY.
> Thus this rectangle wraps the entire globe except for nearly a degree.  It
> matches your query rectangle.



Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
https://lucene.apache.org/solr/guide/6_6/spatial-search.html#SpatialSearch-RptWithGeometrySpatialField


> On Jun 28, 2017, at 11:32 AM, puneeta  wrote:
> 
> Hi David,
> I am sorry, I did not understand what you mean by "I suggest using
> RptWithGeometry field". Should I leave the existing location_rpt definition in
> schema.xml?
> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
>   geo="true" distErrPct="0.025" maxDistErr="0.001"
>   distanceUnits="kilometers" />
> This line I have commented. Should I uncomment it?
> 
> 1."remove distErrPct and maxDistErr" - 
> 2.Added usejtsMulti="false"
> 
> I will change the field definition as follows, try to execute and report
> back.
> <fieldType name="location_rpt"
>class="solr.SpatialRecursivePrefixTreeFieldType"
> spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
>  autoIndex="true"
>  validationRule="repairBuffer0"
>  distanceUnits="kilometers"
>  useJtsMulti="false" />
> 
> 
> The issue I am facing is that the I am not getting the search result for
> Multipolygon i.e I should get hits.Currently, the numFound = 0, It should
> find atleast 1 record as it does for a Polygon search.
> 
> Thanks,
> Puneeta
> 
> david.w.smi...@gmail.com wrote:
>> I suggest using RptWithGeometry field, and with that change remove
>> distErrPct and maxDistErr.  See the ref guide, and note the geometry cache
>> option.
>> BTW spatialContextFactory can simply be "jts".
>> 
>> If this fixes the issue, then the issue was related to grid approximation.
>> 
>> BTW you never quite said what it was about the results that was wrong. 
>> Did you get hits you didn't expect (I'm guessing yes) or the inverse?
>> 
>> ~ David
>> 
>>> On Jun 28, 2017, at 10:55 AM, puneeta <pverma@...> wrote:
>>> 
>>> Hi David,
>>> Thank you for the prompt reply. My field definition in schema.xml is :
>>> 
>>> I commented the existing location_rpt
>>> 
>>> 
>>> 
>>> And added:
>>> 
>>> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
>>>  spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
>>>  autoIndex="true"
>>>  validationRule="repairBuffer0"
>>>  distErrPct="0.025"
>>>  maxDistErr="0.001"
>>>  distanceUnits="kilometers" />
>>> 
>>> My Solr version is 6.2.1
>>> 
>>> Thanks,
>>> Puneeta
>>> 
>>> 
>>> 
>>> 
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143p4343162.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143p4343184.html
>  
> <http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143p4343184.html>
> Sent from the Solr - User mailing list archive at Nabble.com 
> <http://nabble.com/>.
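
[Editor's note: combining David's suggestions in this thread (switch the class to RptWithGeometrySpatialField, drop distErrPct and maxDistErr, and use the "JTS" shorthand for spatialContextFactory), the field type he is recommending would look roughly like the sketch below. The name and remaining attributes are carried over from Puneeta's existing definition; adjust to your schema.]

```xml
<!-- Sketch only: RptWithGeometrySpatialField keeps the original geometry
     for exact verification, so the grid-precision attributes
     (distErrPct/maxDistErr) are no longer needed. -->
<fieldType name="location_rpt" class="solr.RptWithGeometrySpatialField"
           spatialContextFactory="JTS"
           autoIndex="true"
           validationRule="repairBuffer0"
           distanceUnits="kilometers"/>
```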



Re: Spatial Search based on the amount of docs, not the distance

2017-06-28 Thread David Smiley
Deniz didn't mention document-to-document distance sorting, but didn't say it
wasn't the case either.

Anyway, FYI: at the Lucene level with LatLonPoint there is some sophisticated
BKD search code to efficiently return the top N distance ordered documents 
(where you supply N).  Although as far as I recall, it also has no filtering 
mechanism, so if you have any other filters (keyword/time/whatever), it 
wouldn't work.

I once did this feature on an RPT index for a client and got permission to
open-source it, but I haven't gotten around to properly adding it to Solr.  I might
approach it a bit differently now.

~ David

> On Jun 22, 2017, at 8:34 PM, Tim Casey  wrote:
> 
> deniz,
> 
> I was going to add something here.  The reason what you want is probably
> hard to do is because you are asking solr, which stores a document, to
> return documents using an attribute of document pairs.  As only a thought
> exercise, if you stored record pairs as a single document, you could
> probably query it directly.  That is, if you have d1 and d2 and you are
> querying  around d1 and ordering by distance, then you could get this
> directly from a document representing a record pair.  I don't think this is
> practical, because it is an n^2 store.
> 
> Since the n^2 problem is there, people are going to suggest some heuristic
> which avoids this problem.  What Erick is suggesting is down this path.
> Query around a point and sort by distance taking the top K results.  The
> result is taking a linear slice of the n^2 distance attribute.
> 
> tim
> 
> 
> 
> On Wed, Jun 21, 2017 at 7:50 PM, Erick Erickson 
> wrote:
> 
>> Would it serve to sort by distance? True, if you matched a zillion
>> documents within a 1km radius you'd still perform the distance calcs, but
>> the result would be a manageable number.
>> 
>> I have to ask "Why do you care?". Is this an efficiency question (i.e. you
>> want to keep Solr from having to do expensive work) or is it a question of
>> having to get hits at all? It's at least possible that the solution for one
>> is not the solution for the other.
>> 
>> Best,
>> Erick
>> 
>> On Wed, Jun 21, 2017 at 5:32 PM, deniz  wrote:
>> 
>>> it is for sure possible to use d value for limiting the distance,
>> however,
>>> it
>>> might not be very efficient, as some of the coords may not have any docs
>>> around for a large value of d... so it is hard to determine a default
>> value
>>> for d.
>>> 
>>> though it sounds like having a default d and gradual increments on its
>> value
>>> might be a work around for top K results...
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -
>>> Zeki ama calismiyor... Calissa yapar...
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Spatial-Search-based-on-the-amount-of-docs-not-the-distance-
>>> tp4342108p4342258.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
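
[Editor's note: Erick's suggestion above (filter to a radius, sort by distance, take the top K) can be sketched as a single request. The field name, point, radius, and row count below are placeholders, not values from the thread.]

```
q=*:*
&fq={!geofilt}
&sfield=geo
&pt=45.15,-93.85
&d=5
&sort=geodist() asc
&rows=10
```

If a radius of d returns fewer than K documents, deniz's workaround of re-issuing the query with a gradually larger d would apply.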



Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
This polygon is fairly rectangular, with one side having a ton of points.
Nonetheless the query point is clearly far from it (it's much lower, i.e. a
smaller 'y' dimension).

On Wed, Jun 28, 2017 at 10:17 PM puneeta  wrote:

> Hi David,
>   Actually my polygon had too many coordinates, so I just omitted some
> while
> posting my query. Here is my complete multipolygon, where the last point is
> the same as the first one:
>
> 
> MULTIPOLYGON (((-86.477551331 32.490605651,
> -86.477637350 32.4903921820001, -86.478257247 32.4905655910001,
> -86.478250466 32.4905802390001, -86.478243988 32.49059368096,
> -86.47823751 32.490607122, -86.478231749 32.49061910096, -86.478224637
> 32.4906340650001, -86.478218237 32.490647541, -86.478211847
> 32.49066103595, -86.478205478 32.4906745260001, -86.47820210799989
> 32.4906816669, -86.478199132 32.4906880240001, -86.478192825
> 32.490701523, -86.478186533 32.490715047, -86.478183209 32.4907222090001,
> -86.4781802789 32.4907285690001, -86.478174063 32.4907421250001,
> -86.478167851 32.4907556540001, -86.478162558 32.49076723696,
> -86.47815905399989 32.490774513000105, -86.477551331 32.490605651)))
> 
> 
>
> Thanks,
> Puneeta
>
>
>
>
> david.w.smi...@gmail.com wrote
> > I tried your data in the "JTS TestBuilder" GUI.  Firstly, your polygon
> > isn't "closed", but that was easily fixed by repeating the first point at
> > the end.  See the attached screenshot of the GUI for what these shapes
> > look like.  The red dot (the query point) is outside of this
> > triangular-ish shape, and thus not a match.
> >
> >
> >
> >
> >> On Jun 28, 2017, at 12:33 PM, puneeta <
>
> > pverma@
>
> > > wrote:
> >>
> >> Hi David,
> >>  I did the following changes:
> >>
> >> Changed in schema.xml:
> >>
> >  >>
> >
> >>
> spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
> >>   autoIndex="true"
> >>   validationRule="repairBuffer0"
> >>   distanceUnits="kilometers"
> >> useJtsMulti="false"
> >> />
> >>
> >>
> >> Added in solrconfig.xml:
> >>
> >  >>
> >class="solr.LRUCache"
> >>   size="256"
> >>   initialSize="0"
> >>   autowarmCount="100%"
> >>   regenerator="solr.NoOpRegenerator"/>
> >>
> >> My fields in the core as defined in the schema is:
> >> <http://lucene.472066.n3.nabble.com/file/n4343221/SolrGeoFieldDefinition.png>
> >>
> >> However, I still face the same issue. No results found for a
> multipolygon
> >> search.
> >>
> >> Not sure what's happening :(
> >>
> >> Puneeta
> >>
> >>
> >>
> >>
> >>
> >>
>
> > david.w.smiley@
>
> >  wrote
> >>> https://lucene.apache.org/solr/guide/6_6/spatial-search.html#SpatialSearch-RptWithGeometrySpatialField
> >>> <https://lucene.apache.org/solr/guide/6_6/spatial-search.html#SpatialSearch-RptWithGeometrySpatialField>
> >>>
> >>>
> >>>> On Jun 28, 2017, at 11:32 AM, puneeta <
> >>
> >>> pverma@
> >>
> >>> > wrote:
> >>>>
> >>>> Hi David,
> >>>> I am sorry ,I did not understand what do you mean by "I suggest using
> >>>> RptWithGeometry field". Should leave the existing location_rpt
> >>>> definition
> >>>> in
> >>>> schema.xml?
> >>>>
> >>>> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
> >>>>  geo="true" distErrPct="0.025" maxDistErr="0.001"
> >>>> distanceUnits="kilometers" />
> >>>> This line I have commented. Should I uncomment it?
> >>>>
> >>>> 1."remove distErrPct and maxDistErr" -
> >>>> 2.Added usejtsMulti="false"
> >>>>
> >>>> I will change the  field definition as follows, try to execute and
> >>>> report
> >>>> back.
> >>>>
> &g
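
[Editor's note: the cache entry quoted from solrconfig.xml earlier in this thread lost its opening tag in the archive. Per the ref guide, the cache name is "perSegSpatialFieldCache_" followed by the field name; assuming a field named location_rpt, a complete entry would look like the sketch below. The attribute values match the ones visible in the quoted snippet.]

```xml
<!-- Per-segment geometry cache used by RptWithGeometrySpatialField;
     the name suffix (here "location_rpt") must match the field name. -->
<cache name="perSegSpatialFieldCache_location_rpt"
       class="solr.LRUCache"
       size="256"
       initialSize="0"
       autowarmCount="100%"
       regenerator="solr.NoOpRegenerator"/>
```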

Re: Not highlighting "and" and "or"?

2017-06-28 Thread David Smiley
Hi Walter,
No they are not.  Does debug=query show that these words are in your parsed
query?

On Wed, Jun 28, 2017 at 5:13 PM Walter Underwood 
wrote:

> Is there some special casing in the highlighter to skip query syntax
> words? The words “and” and “or” don’t get highlighted.
>
> This is in 6.5.0.
>
>question
>html
>440
>fastVector
>1
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
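
[Editor's note: David's debug=query suggestion translates to a request like the following (core name and query are placeholders); the parsedquery section of the response shows whether "and" and "or" survived query parsing and analysis.]

```
http://localhost:8983/solr/mycore/select?q=cats+and+dogs&debug=query
```

One possibility worth checking: a StopFilterFactory in the field's analysis chain would remove "and" and "or" before the highlighter ever sees them.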

