Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-15 Thread David Smiley
Hi Pradeep,

Are you seeing an error when it doesn't work?  I believe a shape
overlapping itself will cause an error from JTS.  If you do see that, then
you can ask Spatial4j (used by Lucene/Solr) to attempt to deal with it in a
number of ways.  See "validationRule":
https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/context/jts/JtsSpatialContextFactory.html

Probably try validationRule="repairBuffer0".
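For example, a field type sketch with that rule applied (the type name and
the other attributes here are illustrative; the JTS factory class lives in
"com.spatial4j.core..." through Solr 5.x and "org.locationtech.spatial4j..."
from 6.0 on):

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    validationRule="repairBuffer0"
    geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>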

If it still doesn't work (and if you can't use what I say next), I
suggest debugging this at the JTS level.  You might then wind up
submitting a question to the JTS list.

Spatial4j extends the WKT syntax with a BUFFER() syntax which is possibly
easier/better than your approach of manually building up the buffered path
with your own code to produce a large polygon to send to Solr.  You would
do something like BUFFER(LINESTRING(...),0.001) where "0.001" is the
distance in degrees if you have geo="true", otherwise whatever units your
data was put in.  You can use that with or without JTS since Spatial4j has
a native BufferedLineString shape.  But FYI it doesn't support geo="true"
very well (i.e. working in degrees); the buffer will be skewed very much
away from the equator.  So you could set geo="false" and supply, say,
web-mercator bounding box and work in that Euclidean/2D projected space.
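For example, a query sketch (assuming an RPT field named "geo" with
geo="true", so the 0.001 buffer is in degrees):

fq=geo:"Intersects(BUFFER(LINESTRING(30 10, 10 30, 40 40), 0.001))"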

Another FYI, Lucene has a "Geo3d" package within the Spatial3d module that
has a native implementation of a buffered LineString as well, one that
works on the surface of the earth.  It hasn't yet been hooked into
Spatial4j, after which Solr would need no changes.  There's a user "Chris"
who is working on that; it's filed here:
https://github.com/locationtech/spatial4j/issues/134

Good luck.

~ David


On Tue, Mar 15, 2016 at 2:45 PM Pradeep Chandra <
pradeepchandra@gmail.com> wrote:

> Hi Sir,
>
> I want to draw a polyline along the route given by google maps (from one
> place to another place).
>
> I applied the logic of calculating parallel lines between the two markers
> on the route, on both sides of the route. Because of the non-linear nature
> of the route, in some cases the polyline overlaps itself.
>
> Finally, what I want to do is draw that polyline along the route and
> give the resulting polygon to Solr in order to get the results within
> the polygon. But the problem I am facing is that, because of the
> overlapping nature of the polyline, Solr is not accepting that shape.
>
> Can you suggest a logic to draw a polyline along the route, or let me know
> whether there is any way to fetch the data with that type of polyline in Solr.
>
> I constructed a polygon with 300 points, but for that Solr is not giving
> any results, whereas it does give results for polygons having fewer than
> 200 points. Can you tell me the max number of points for constructing a
> polygon in Solr, or whether it is restricted to that many points?
>
> I am sending some images of my final desired one & my applied one. Please
> find those attachments.
>
> Thanks and Regards
> M Pradeep Chandra
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-19 Thread David Smiley
JTS doesn't have any vertex limit on the geometries.  So I don't know why
your query isn't working.

On Wed, Mar 16, 2016 at 1:58 AM Pradeep Chandra <
pradeepchandra@gmail.com> wrote:

> Hi Sir,
>
> Let me give some clarification on the IsWithin(POLYGON(())) query... It is
> not giving any results for polygons beyond 213 points...
>
> Thanks
> M Pradeep Chandra
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Regarding-google-maps-polyline-to-use-IsWithin-POLYGON-in-solr-tp4263975p4264046.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Seasonal searches in SOLR 5.x

2016-03-22 Thread David Smiley
Hi,

I suggest having a "season" field (or whatever you might want to call it)
using DateRangeField but simply use a nominal year value.  So basically all
durations would be within this nominal year.  For some docs that span
new-years, this might mean 2 durations and that's okay.  Also it's okay if
you have multiple values and it's okay if your calculations result in some
that overlap; you needn't make them distinct; it'll all get coalesced in
the index.
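
For example (a sketch; the field name "season" and the nominal year 2000 are
arbitrary choices), a sample collected Nov 15 through Feb 10 could be indexed
as two ranges in the nominal year:

season: "[2000-11-15 TO 2000-12-31]", "[2000-01-01 TO 2000-02-10]"

and a "collected in February, any year" search is then just:

fq=season:[2000-02-01 TO 2000-02-29]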

If for some reason you wind up going the route of abusing point data for
durations, I recommend this link:
http://wiki.apache.org/solr/SpatialForTimeDurations
and it most definitely does not require polygons (and thus JTS); I'm not
sure what gave you that impression.  It's all rectangles & points.

~ David

On Mon, Mar 21, 2016 at 1:29 PM Ioannis Kirmitzoglou <
ioanniskirmitzog...@gmail.com> wrote:

> Hi all,
>
> I would like to implement seasonal date searches on date ranges. I’m using
> SOLR 5.4.1 and have indexed date ranges using a DateRangeField (let’s call
> this field date_ranges).
> Each document in SOLR corresponds to a biological sample and each sample
> was collected during a date range that can span from a single day to
> multiple years. For my application it makes sense to enable seasonal
> searches, ie find samples that were collected during a specific period of
> the year (e.g. summer, or February). In this type of search, the year that
> the sample was collected is not relevant, only the days of the year. I’ve
> been all over SOLR documentation and I haven’t been able to find anything
> that will enable me to do that. The closest I got was a post with instructions
> on how to use a spatial field to do date searches (
> https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/).
> Using the logic in that post I was able to come up with a solution but it’s
> rather complex and needs polygon searches (which in turn means installing
> the JTS Topology suite).
> Before committing to that I would like to ask for your input and whether
> there’s an easier way to do these types of searches.
>
> Many thanks,
>
> Ioannis
>
> -
> Ioannis Kirmitzoglou, PhD
> Bioinformatician - Scientific Programmer
> Imperial College, London
> www.vectorbase.org
> www.vigilab.org
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-04-19 Thread David Smiley
Hi Anton,

Perhaps you should request a more detailed / high-res heatmap, and then
work with that, perhaps using some clustering technique?  I confess I don't
work on the UI end of things these days.
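
For example, a request sketch (the field name "geo" and the numbers are
placeholders) that raises the heatmap resolution via distErrPct:

q=*:*&facet=true&facet.heatmap=geo
&facet.heatmap.geom=["-180 -90" TO "180 90"]
&facet.heatmap.distErrPct=0.05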

p.s. I'm on vacation this week; so I don't respond quickly

~ David

On Thu, Apr 7, 2016 at 3:43 PM Anton K.  wrote:

> I am working with the new solr feature: facet heatmaps. It works great; I
> create clusters on my map with counts. When a user clicks on a cluster I
> zoom in on that area and might show more clusters or documents (based on
> the current zoom level).
>
> But all my cluster icons (i use round one, see screenshot below) placed
> straight in the center of cluster's rectangles:
>
> https://dl.dropboxusercontent.com/u/1999619/images/map_grid3.png
>
> Some clusters can end up in the sea, and so on. It also feels unnatural in
> my case to have icons placed so orderly on the world map.
>
> I want to place the cluster icons at average coords based on the
> coordinates of all my docs inside the cluster. Is there any way to achieve
> this? I am trying to use the stats component for facet heatmap but it isn't
> implemented yet.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: issues doing a spatial query

2016-04-28 Thread David Smiley
Hi.
This makes sense to me.  The point 49.8,-97.1 is in your query box.  The
box is lower-left to upper-right, so your box is actually an almost
world-wrapping one grabbing all longitudes except  -93 to -92.  Maybe you
mean to switch your left & right.

On Sun, Apr 24, 2016 at 8:03 PM GW  wrote:

> I was not getting the results I expected so I started testing with the solr
> webclient
>
> Maybe I don't understand things.
>
> simple test query
>
> q=*:*&fq=locations:[49,-92 TO 50,-93]
>
> I don't understand why I get a result set for longitude range -92 to -93
> but should be zero results as far as I understand.
>
>
> 
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 2,
> "params": {
>   "q": "*:*",
>   "indent": "true",
>   "fq": "locations:[49,-92 TO 50,-93]",
>   "wt": "json",
>   "_": "1461541195102"
> }
>   },
>   "response": {
> "numFound": 85,
> "start": 0,
> "docs": [
>   {
> "id": "data.spidersilk.co!337",
> "entity_id": "337",
> "type_id": "simple",
> "gender": "Male",
> "name": "Aviator Sunglasses",
> "short_description": "A timeless accessory staple, the
> unmistakable teardrop lenses of our Aviator sunglasses appeal to
> everyone from suits to rock stars to citizens of the world.",
> "description": "Gunmetal frame with crystal gradient
> polycarbonate lenses in grey. ",
> "size": "",
> "color": "",
> "zdomain": "magento.spidersilk.co",
> "zurl":
> "
> http://magento.spidersilk.co/index.php/catalog/product/view/id/337/s/aviator-sunglasses/
> ",
> "main_image_url":
> "
> http://magento.spidersilk.co/media/catalog/product/cache/0/image/9df78eab33525d08d6e5fb8d27136e95/a/c/ace000a_1.jpg
> ",
> "keywords": "Eyewear  ",
> "data_size": "851,564",
> "category": "Eyewear",
> "final_price_without_tax": "295,USD",
> "image_url": [
>   "
> http://magento.spidersilk.co/media/catalog/product/a/c/ace000a_1.jpg";,
>   "
> http://magento.spidersilk.co/media/catalog/product/a/c/ace000b_1.jpg";
> ],
> "locations": [
>   "37.4463603,-122.1591775",
>   "42.5857514,-82.8873787",
>   "41.6942622,-86.2697108",
>   "49.8522263,-97.1390697"
> ],
> "_version_": 1532418847465799700
>   },
>
>
>
> Thanks,
>
> GW
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Solr - index polygons from csv

2016-04-28 Thread David Smiley
Hi.

To use polygons, you need to add JTS, otherwise you get an unsupported
shape error.  See
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
-- it involves not only adding the JTS lib to your classpath (the ideal spot
is WEB-INF/lib) but also adding a spatialContextFactory attribute.  Note that
the value of this attribute is different from 6.0 forward (as seen on the
live page), so get a PDF copy of the ref guide matching the Solr version
you are using if you are not on the latest.  Also, I recommend using
solr.RptWithGeometrySpatialField for indexing non-point data (and it'll
probably work fine for point data too).
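
For Solr 6.x that combination might look like this sketch (the type name is
arbitrary; note the "org.locationtech" package that applies from 6.0 on):

<fieldType name="geom_rpt" class="solr.RptWithGeometrySpatialField"
    spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
    geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>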

When you use geo=false, there are no units or it might have an ignorable
value of degrees.  Essentially it's in whatever units your data is on the
Euclidean 2D plane.

~ David

On Fri, Apr 22, 2016 at 4:33 AM Jan Nekuda  wrote:

> Hello guys,
> I use solr 6 for indexing data with points and polygons.
>
> I have a question about indexing polygons from csv file. I have configured
> type:
>  class="solr.SpatialRecursivePrefixTreeFieldType" geo="false"
> maxDistErr="0.001" worldBounds="ENVELOPE(-1,-1,-1,-1)"
> distErrPct="0.025" distanceUnits="kilometers"/>
>
> and field
> <field name="mapa" type="..." indexed="true" stored="true"/>
>
> I have tried to import this csv:
>
> kod_adresa,nazev_ulice,cislo_orientacni,cislo_domovni,polygon_mapa,nazev_obec,Nazev_cast_obce,kod_ulice,kod_cast_obce,kod_obec,kod_momc,nazev_momc,Nazev,psc,nazev_vusc,kod_vusc,Nazev_okres,Kod_okres
> 9,,,4,"POLYGON ((-30 -10,-10 -20,-20 -40,-40 -40,-30
> -10))",Vacov,Javorník,,57843,550621,,,Stachy,38473,Jihočeský
> kraj,35,Prachatice,3306
>
> and result is:
>
> Posting files to [base] url http://localhost:8983/solr/ruian/update...
> Entering auto mode. File endings considered are
>
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file polygon.csv (text/csv) to [base]
> SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url:
> http://localhost:8983/solr/ruian/update
> SimplePostTool: WARNING: Response:
> <response>
> <lst name="responseHeader"><int name="status">400</int><int
> name="QTime">3</int></lst><lst name="error"><lst name="metadata"><str
> name="error-class">org.apache.solr.common.SolrException</str><str
> name="root-error-class">java.lang.UnsupportedOperationException</str></lst><str
> name="msg">Couldn't parse shape 'POLYGON ((-30 -10,-10 -20,-20 -40,-40
> -40,-30 -10))' because: java.lang.UnsupportedOperationException:
> Unsupported shape of this SpatialContext. Try JTS or Geo3D.</str><int
> name="code">400</int></lst>
> </response>
> SimplePostTool: WARNING: IOException while reading response:
> java.io.IOException: Server returned HTTP response code: 400 for URL:
> http://localhost:8983/solr/ruian/update
> 1 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/ruian/update.
> ..
> Time spent: 0:00:00.036
>
> Could someone give me any advice on how to solve it? With indexing points
> in the same way I'm fine.
>
> and one more question:
> I have this field type:
> <fieldType name="..." class="solr.SpatialRecursivePrefixTreeFieldType" geo="false"
> maxDistErr="0.001"
> worldBounds="ENVELOPE(-1,-1,-1,-1)" distErrPct="0.025"
> distanceUnits="kilometers"/>
>
> if I use  geo=false for solr.SpatialRecursivePrefixTreeFieldType and I use
> this query:
>
> http://localhost:8983/solr/ruian/select?indent=on&q=*:*&fq={!bbox%20sfield=mapa}&pt=-818044.37%20-1069122.12&d=20
> 
> for
> getting all objects within a distance. But I actually don't know which
> units the distance is in with these settings.
>
>
>
> Thank you very much
>
> Jan
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: relaxed vs. improved validation in solr.TrieDateField

2016-05-06 Thread David Smiley
Sorry to hear that, Uwe Reh.

If this is just in your input/index data, then this could be handled with
a URP, maybe even an existing URP.
See ParseDateFieldUpdateProcessorFactory, which uses the Joda-time API.  I
am not sure if that will work; I'm a little doubtful in fact since Solr now
uses the Java 8 time API, which was taken, more or less, from Joda-time.
But it's worth a shot, anyway.  If it doesn't work, let me know and I'll
give you a snippet of JavaScript you can use in your URP chain.
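
A sketch of wiring that URP in (the chain name and the date format are
illustrative, not a tested recipe for accepting impossible dates; you'd
reference the chain from your update handler via the update.chain param):

<updateRequestProcessorChain name="parse-dates">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <str name="defaultTimeZone">UTC</str>
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss'Z'</str>
    </arr>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The JavaScript fallback would go through solr.StatelessScriptUpdateProcessorFactory.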

~ David

On Fri, Apr 29, 2016 at 4:07 AM Uwe Reh  wrote:

> Hi,
>
> doing some migration tests (4.10 to 6.0) I recognized an improved
> validation of TrieDateField.
> Syntactically correct but impossible days are rejected now. (stack trace
> at the end of the mail)
>
> Examples:
> - '1997-02-29T00:00:00Z'
> - '2006-06-31T00:00:00Z'
> - '2000-00-00T00:00:00Z'
> The first two dates are formally ok, but the dates do not exist. The
> third date is more suspicious, but was also accepted by Solr 4.10.
>
> I appreciate this improvement in principle, but I have to respect the
> original data. The dates might be intentionally wrong.
>
> Is there an easy way to get the weaker validation back?
>
> Regards
> Uwe
>
>
> > Invalid Date in Date Math String:'1997-02-29T00:00:00Z'
> > at
> org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:254)
> > at
> org.apache.solr.schema.TrieField.createField(TrieField.java:726)
> > at
> org.apache.solr.schema.TrieField.createFields(TrieField.java:763)
> > at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47)
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Boosting by calculated distance buckets

2015-02-14 Thread David Smiley
Hello,
You can totally boost by calculations that happen on-the-fly on a
per-document basis when you search.  These are called function queries in
Solr.

For your specific example… a solution that doesn’t involve writing a custom
so-called ValueSource in Java would likely mean calculating the distance
multiple times per document for each range.  Instead I suggest a continuous
function, like the reciprocal of the distance.  See the definition of the
formula here: 
https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-AvailableFunctions
  
For ‘m’ provide 1.0.  For ‘a’ and ‘b’ I suggest using the same value set to
roughly 1/10th the distance to the perimeter of the region of relevant
interest — perhaps 1/10th of say 200km.  You will of course fiddle with this
to your liking.  Assuming you use edismax, you could multiply the natural
score by something like:
&boost=recip(geodist(),1,20,20)
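
A fuller request sketch (the spatial field name "store" and the point are
placeholders; geodist() here reads the sfield and pt params):

q=cars&defType=edismax&sfield=store&pt=45.15,-93.85
&boost=recip(geodist(),1,20,20)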

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


sraav wrote
> I hit a block when I ran into a use case where I had to boost on ranges of
> distances calculated at query time. This is the case when the distance is
> not present in the document initially but calulated based on the user
> entered lat/long values. 
> 
> 1. Is it required that all the boost parameters be searchable or can we
> boost on dynamic parameters which are calculated ?
> 2. Is there a way to boost on geodist() in a specific range – For example
> – Boost all the cars listed within 20-50kms range(from the search zip) by
> 100. And give a boost of 85 to all the cars listed within 51-80kms range 
> from the search zip. 
> 
> Please provide your feedback and let me know if there are any other
> options that i could try out.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-by-calculated-distance-buckets-tp4186504p4186587.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boosting by calculated distance buckets

2015-02-17 Thread David Smiley
Raav,

You may need to actually subscribe to the solr-user list.  Nabble seems to
not be working too well.
p.s. I’m on vacation this week so I can’t be very responsive

First of all... it's not clear you actually want to *boost* (since you seem
to not care about the relevancy score), it seems you want to *sort* based on
a function query.  So simply sort by the function query instead of using the
'bq' param.

Have you read about geodist() in the Solr Reference Guide?  It returns the
spatial distance.  With that and other function queries like map() you could
do something like sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)) and
you could put that into your main function query.  I purposefully overlapped
the map ranges so that I didn't have to deal with double-counting an edge. 
The only thing I don't like about this is that the distance is going to be
calculated as many times as you reference the function, and it's slow.  So
you may want to write your own function query (internally called a
ValueSource), which is relatively easy to do in Solr.
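
For example, sorting by that bucketed function directly (a sketch; the
sfield/pt values are placeholders, and distances are in km per your
distanceUnits):

&sfield=store&pt=45.15,-93.85
&sort=sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)) desc

With the overlapped ranges, a car within 0-20km scores 50, one within
20-40km scores 40, and anything farther scores 0.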

~ David


sraav wrote
> David,
> 
> Thank you for your prompt response. I truly appreciate it. Also, my post
> was not accepted the first two times so I am posting it again one final
> time. 
> 
> In my case I want to turn off the dependency on scoring and let solr use
> just the boost values that I pass to each function to sort on. Here is a
> quick example of how I got that to work with non-geo fields which are
> present in the document and are not dynamically calculated. Using edismax
> of course.
> 
> I was able to turn off the scoring (I mean remove the dependency on score)
> on the result set and drive the sort by the boost that I mentioned in the
> below query. In the below function, for example - if "document1"
> matches the date listed it gets a boost = 5. If the same document matches
> the owner AND product - it will get an additional boost of 5 more. The
> total boost of this "document1" is 10. From whatever I have seen, it
> seems like I was able to turn off or negate the effects of the solr score.
> (There was a query norm param that was affecting the boost but it seemed to
> be a constant, around 0.70345... most of the time for any fq mentioned.)
> 
> bq = {!func}sum(if(query({!v='datelisted:[2015-01-22T00:00:00.000Z TO
> *]'}),5,0),if(and(query({!v='owner:*BRAVE*'}),query({!v='PRODUCT:*SWORD*'}),5,0))
> 
> What I am trying to do is to add additional boosting function to the
> custom boost that will eventually tie into the above function and boost
> value.
> 
> For example - if "document1" falls in 0-20 KM range i would like to add a
> boost of 50 making the final boost value to be 60. If it falls under
> 20-40KM - i would like to add a boost of 40 and so on.  
> 
> Is there a way we can do this?  Please let me know if I can provide better
> clarity on the use case that I am trying to solve. Thank you David.
> 
> Thanks,
> Raav





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-by-calculated-distance-buckets-tp4186504p4187112.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr join + Boost in single query

2015-03-03 Thread David Smiley
No, not without writing something custom anyway. It'd be difficult to make it
fast if there are a lot of documents to join on.


sraav wrote
> David,
> 
> Is it possible to write a query to join two cores and either bring back
> data from the two cores or to boost on the data coming back from either of
> the cores? Is that possible with Solr? 
> 
> Raavi





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-join-Boost-in-single-query-tp4190825p4190849.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread David Smiley
Another more modern option, very related to this, is to use DateRangeField in 
5.0.  You have full 64 bit precision.  More info is in the Solr Ref Guide.

If Alessandro sticks with RPT, then the best reference to give is this:
http://wiki.apache.org/solr/SpatialForTimeDurations

~ David
https://www.linkedin.com/in/davidwsmiley

> On May 21, 2015, at 11:49 AM, Holger Rieß  
> wrote:
> 
> Give geospatial search a chance. Use the 
> 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false.
> The date is located on the X-axis, prices on the Y axis.
> For every price you get a horizontal line between start and end date. Index a 
> rectangle with height 0.001 (< 1 cent) and width 'end date - start date'.
> 
> Find all prices that are valid on a given day or in a given date range with 
> the 'geofilt' function.
> 
> The field type could look like (not tested):
> 
> <fieldType name="..." class="solr.SpatialRecursivePrefixTreeFieldType"
>   geo="false" distErrPct="0.025" maxDistErr="0.09" units="degrees"
>   worldBounds="1 0 366 1" />
> 
> Faceting can possibly be done with a facet query for each of your price
> ranges.
> For example day 20, price range 0-5$, rectangle: 20.0 0.0 
> 21.0 5.0.
> 
> Regards Holger
> 



Re: Highlighting phone numbers

2016-05-18 Thread David Smiley
Perhaps an easy thing to try is to see if the FastVectorHighlighter yields any
different results.  There are some nuances to the highlighters -- it might.

Failing that, this is likely due to your analysis chain, and where exactly the
offsets point to, which you can see/debug in Solr's analysis screen.  You
might have to develop custom analysis components (e.g. custom TokenFilter)
if the offsets aren't what you want.

Good luck,
~ David

On Wed, May 18, 2016 at 9:07 AM marotosg  wrote:

> Hi,
>
> I have a solr multivalued field with a list of phone numbers with many
> different formats. Below field type.
> <fieldType name="..." class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="..." pattern="([^0-9])"
> replacement="" replace="all"/>
>     <filter class="..." minGramSize="5" maxGramSize="30"
> />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="..." pattern="([^0-9])"
> replacement="" replace="all"/>
>     <filter class="..." minGramSize="3" maxGramSize="30"
> />
>   </analyzer>
>   <similarity
> class="com.spencerstuart.similarities.SpencerStuartNoSimilarity"/>
> </fieldType>
>
> I have a requirement to highlight the part of the number matched to explain
> to the user why this record is returned.
>
> If I search for "17573062033" I am able to match many results but the
> fullnumber is highlighted.
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">12</int>
>     <lst name="params">
>       <str name="fl">CoreID,PhoneListS</str>
>       <str name="indent">true</str>
>       <str name="q">PhoneListS:17573062033</str>
>       <str name="_">1463576646314</str>
>       <str name="hl.fl">PhoneListS</str>
>       <str name="wt">xml</str>
>       <str name="hl">true</str>
>       <str name="rows">3</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="..." start="0">
>     <doc>
>       <arr name="PhoneListS"><str>1757.306.2033</str></arr>
>       <str name="CoreID">10224838</str>
>     </doc>
>     <doc>
>       <arr name="PhoneListS"><str>1757.306.2033</str></arr>
>       <str name="CoreID">10224840</str>
>     </doc>
>     <doc>
>       <arr name="PhoneListS">
>         <str>1757.306.2089</str>
>         <str>1757.306.7006</str>
>       </arr>
>       <str name="CoreID">10034811</str>
>     </doc>
>   </result>
>   <lst name="highlighting">
>     <lst name="10224838">
>       <arr name="PhoneListS"><str><em>1757.306.2033</em></str></arr>
>     </lst>
>     <lst name="10224840">
>       <arr name="PhoneListS"><str><em>1757.306.2033</em></str></arr>
>     </lst>
>     <lst name="10034811">
>       <arr name="PhoneListS"><str><em>1757.306.2089</em></str></arr>
>     </lst>
>   </lst>
> </response>
>
> Would it be possible to get only the piece of information which matches?
> Something like this: 1757.306.2089
>
> thanks
> Sergio
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Highlighting-phone-numbers-tp4277491.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-05-18 Thread David Smiley
Sorry for such a belated response; I don't monitor this list as much as I
used to.
My response is within...

On Wed, Apr 20, 2016 at 4:28 AM Anton K.  wrote:

> Thanks for your answer, David, and have a good vacation.
>
> It seems a more detailed heatmap is not a good solution in my case because
> I need to display a cluster icon with the number of items inside the
> cluster. So if I get a very large number of cells on the map, some of the
> cells will overlap.
>

I did not mean to suggest you display one cluster for each non-zero heatmap
cell; I meant you funnel this as input to other client-side heatmap
renderers that do the clustering.  The point of this is to keep the number
of inputs to that renderer manageable instead of potentially a gazillion if
you have that many docs/points.

> I also think about the stats component for the facet.heatmap feature. Maybe
> we can use the stats component to add average positions of documents in a
> cell?
>

I think I've seen hand-rolled heatmap capabilities added to Solr (i.e. no
custom Solr hacking) that went about it kinda like that.  stats.facet on
some geohash (or similar), then average lat & average lon.

~ David


> 2016-04-20 4:28 GMT+03:00 David Smiley :
>
> > Hi Anton,
> >
> > Perhaps you should request a more detailed / high-res heatmap, and then
> > work with that, perhaps using some clustering technique?  I confess I
> don't
> > work on the UI end of things these days.
> >
> > p.s. I'm on vacation this week; so I don't respond quickly
> >
> > ~ David
> >
> > On Thu, Apr 7, 2016 at 3:43 PM Anton K.  wrote:
> >
> > > I am working with new solr feature: facet heatmaps. It works great, i
> > > create clusters on my map with counts. When user click on cluster i
> zoom
> > in
> > > that area and i might show him more clusters or documents (based on
> > current
> > > zoom level).
> > >
> > > But all my cluster icons (i use round one, see screenshot below) placed
> > > straight in the center of cluster's rectangles:
> > >
> > > https://dl.dropboxusercontent.com/u/1999619/images/map_grid3.png
> > >
> > > Some clusters can be in sea and so on. Also it feels not natural in my
> > case
> > > to have icons placed orderly on the world map.
> > >
> > > I want to place cluster's icons in average coords based on coordinates
> of
> > > all my docs inside cluster. Is there any way to achieve this? I am
> trying
> > > to use stats component for facet heatmap but it isn't implemented yet.
> > >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Issues with coordinates in Solr during updating of fields

2016-06-13 Thread David Smiley
Zheng,
There are a few Solr FieldTypes that are basically composite fields -- a
virtual field of other fields.  AFAIK they are all spatial related.  You
don't necessarily need to pay attention to the fact that gps_1_coordinate
exists under the hood unless you wish to customize the options on that
field type in the schema.  e.g. if you don't need it for filtering (perhaps
using RPT for that) then you can set indexed=false.
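
For example (a sketch -- assuming the sub-fields are declared explicitly, as
in your schema, that the "tdouble" type name is what your schema uses, and
that any distance filtering happens via a separate RPT field):

<field name="gps_0_coordinate" type="tdouble" indexed="false" stored="false"/>
<field name="gps_1_coordinate" type="tdouble" indexed="false" stored="false"/>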
~ David

On Fri, Jun 10, 2016 at 8:43 PM Zheng Lin Edwin Yeo 
wrote:

> Would like to check, what is the use of the gps_0_coordinate and
> gps_1_coordinate
> field then? Is it just to store the data points, or does it have any other
> use?
>
> When I do the query, I found that we are only querying the gps_field, which
> is something like this:
> http://localhost:8983/solr/collection1/highlight?q=*:*&fq={!geofilt
> pt=1.5,100.0
> 
> sfield=gps d=5}
>
>
> Regards,
> Edwin
>
> On 27 May 2016 at 08:48, Erick Erickson  wrote:
>
> > Should be fine. When the location field is
> > re-indexed (as it is with Atomic Updates)
> > the two fields will be filled back in.
> >
> > Best,
> > Erick
> >
> > On Thu, May 26, 2016 at 4:45 PM, Zheng Lin Edwin Yeo
> >  wrote:
> > > Thanks Erick for your reply.
> > >
> > > It works when I remove the 'stored="true" ' from the gps_0_coordinate
> and
> > > gps_1_coordinate.
> > >
> > > But will this affect the search functions of the gps coordinates in the
> > > future?
> > >
> > > Yes, I am referring to Atomic Updates.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 27 May 2016 at 02:02, Erick Erickson 
> wrote:
> > >
> > >> Try removing the 'stored="true" ' from the gps_0_coordinate and
> > >> gps_1_coordinate.
> > >>
> > >> When you say "...tried to do an update on any other fileds" I'm
> assuming
> > >> you're
> > >> talking about Atomic Updates, which require that the destinations of
> > >> copyFields are single valued. Under the covers the location type is
> > >> split and copied to the other two fields so I suspect that's what's
> > going
> > >> on.
> > >>
> > >> And you could also try one of the other types, see:
> > >> https://cwiki.apache.org/confluence/display/solr/Spatial+Search
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Thu, May 26, 2016 at 1:46 AM, Zheng Lin Edwin Yeo
> > >>  wrote:
> > >> > Anyone has any solutions to this problem?
> > >> >
> > >> > I tried to remove the gps_0_coordinate and gps_1_coordinate, but I
> > will
> > >> get
> > >> > the following error during indexing.
> > >> > ERROR: [doc=id1] unknown field 'gps_0_coordinate'
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >> >
> > >> > On 25 May 2016 at 11:37, Zheng Lin Edwin Yeo 
> > >> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I have an implementation of storing the coordinates in Solr during
> > >> >> indexing.
> > >> >> During indexing, I will only store the value in the field name
> > ="gps".
> > >> For
> > >> >> the field name = "gps_0_coordinate" and "gps_1_coordinate", the
> value
> > >> will
> > >> >> be auto filled and indexed from the "gps" field.
> > >> >>
> > >> >> <field name="gps" type="..." indexed="true" stored="true"
> > >> >> required="false"/>
> > >> >> <field name="gps_0_coordinate" type="..." indexed="true"
> > >> >> stored="true" required="false"/>
> > >> >> <field name="gps_1_coordinate" type="..." indexed="true"
> > >> >> stored="true" required="false"/>
> > >> >>
> > >> >> But when I tried to do an update on any other fields in the index,
> > Solr
> > >> >> will try to add another value in the "gps_0_coordinate" and
> > >> >> "gps_1_coordinate". However, as these 2 fields are not
> multi-Valued,
> > it
> > >> >> will lead to an error:
> > >> >> multiple values encountered for non multiValued field
> > gps_0_coordinate:
> > >> >> [1.0,1.0]
> > >> >>
> > >> >> Does anyone knows how we can solve this issue?
> > >> >>
> > >> >> I am using Solr 5.4.0
> > >> >>
> > >> >> Regards,
> > >> >> Edwin
> > >> >>
> > >>
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: error rendering solr spatial in geoserver

2016-06-29 Thread David Smiley
For polygons in 6.0 you need to set
spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
-- see
https://cwiki.apache.org/confluence/display/solr/Spatial+Search and the
example.  And of course as you probably already know, put the JTS jar on
Solr's classpath.  What likely tripped you up between 5x and 6x is the
change in value of the spatialContextFactory as a result in organizational
package moving "com.spatial4j.core" to "org.locationtech.spatial4j".

On Wed, Jun 29, 2016 at 12:44 PM tkg_cangkul  wrote:

> hi erick, thx for your reply.
>
> i've solve this problem.
> i got this error when i use solr 6.0.0
> so i tried downgrading my solr to version 5.5.0 and it was successful
>
>
> On 29/06/16 22:39, Erick Erickson wrote:
> > There is not nearly enough information here to say anything very helpful.
> > What does your schema look like for this field?
> > What does the input look like?
> > How are you pulling data from geoserver?
> >
> > You might want to review:
> > http://wiki.apache.org/solr/UsingMailingLists
> >
> > Best,
> > Erick
> >
> > On Wed, Jun 29, 2016 at 2:31 AM, tkg_cangkul wrote:
> >
> > hi, i try to load data spatial from solr with geoserver.
> > when i try to show the layer preview i've got this error message.
> >
> > error
> >
> >
> > anybody can help me pls?
> >
> >
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: error rendering solr spatial in geoserver

2016-07-01 Thread David Smiley
Sorry, good point Ere; I forgot about that.  I filed an issue:
https://issues.apache.org/jira/browse/SOLR-9270
When I work on that I'll add an upgrading note to the 6x section.

~ David

On Wed, Jun 29, 2016 at 6:31 AM Ere Maijala  wrote:

> It would have been _really_ nice if this had been in the release notes.
> Made me also scratch my head for a while when upgrading to Solr 6.
> Additionally, this makes a rolling upgrade from Solr 5.x a bit more
> scary since you have to update the collection schema to make the Solr 6
> nodes work while making sure that no Solr 5 node reloads the configuration.
>
> --Ere
>
> 30.6.2016, 3.46, David Smiley kirjoitti:
> > For polygons in 6.0 you need to set
> >
> spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
> > -- see
> > https://cwiki.apache.org/confluence/display/solr/Spatial+Search and the
> > example.  And of course as you probably already know, put the JTS jar on
> > Solr's classpath.  What likely tripped you up between 5x and 6x is the
> > change in value of the spatialContextFactory as a result in
> organizational
> > package moving "com.spatial4j.core" to "org.locationtech.spatial4j".
> >
> > On Wed, Jun 29, 2016 at 12:44 PM tkg_cangkul 
> wrote:
> >
> >> hi erick, thx for your reply.
> >>
> >> i've solve this problem.
> >> i got this error when i use solr 6.0.0
> >> so i try to downgrade my solr to version 5.5.0 and it's successfull
> >>
> >>
> >> On 29/06/16 22:39, Erick Erickson wrote:
> >>> There is not nearly enough information here to say anything very
> helpful.
> >>> What does your schema look like for this field?
> >>> What does the input look like?
> >>> How are you pulling data from geoserver?
> >>>
> >>> You might want to review:
> >>> http://wiki.apache.org/solr/UsingMailingLists
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Jun 29, 2016 at 2:31 AM, tkg_cangkul <yuza.ras...@gmail.com> wrote:
> >>>
> >>> hi, i try to load data spatial from solr with geoserver.
> >>> when i try to show the layer preview i've got this error message.
> >>>
> >>> error
> >>>
> >>>
> >>> anybody can help me pls?
> >>>
> >>>
> >>
> >> --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: error indexing spatial

2016-07-25 Thread David Smiley
Hi tig.  Most likely, you didn't repeat the first point as the last.  Even
though it's redundant, nonetheless this is what WKT (and some other spatial
formats) calls for.
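
For example, a valid ring repeats its first coordinate pair at the end:

POLYGON((10 10, 40 10, 40 40, 10 40, 10 10))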
~ David

On Wed, Jul 20, 2016 at 10:13 PM tkg_cangkul  wrote:

> hi, i try to index a spatial format to solr 5.5.0 but i've got this error
> message.
>
> [image: error1]
>
> [image: error2]
> anybody can help me to solve this pls?
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Need Help Resolving Unknown Shape Definition Error

2016-08-15 Thread David Smiley
Hello Jennifer,

The spatial documentation is largely this page:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search
(however note the online version is always for the latest Solr release. You
can download a PDF versioned against your Solr version).

To do polygon searches, you both need to add the JTS jar (which you already
did), and also to set the spatialContextFactory as the ref guide indicates
-- that you have yet to do and is I think why you see that error.

Another thing I see that looks like a problem is that you set geo=false,
yet didn't set the worldBounds.  Typically geo=true and you get the typical
decimal degree +/- 180, +/- 90 box.  But if you set false then the grid
system needs to know the extent of your grid.
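
Putting both fixes together, a sketch of the type definition for 5.2.1 (note
the "com.spatial4j.core" package name, which applies through Solr 5.x):

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="degrees" />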

~ David

On Thu, Aug 11, 2016 at 4:04 PM Jennifer Coston <
jennifer.cos...@raytheon.com> wrote:

>
> Hello,
>
> I am trying to setup a local solr core so that I can perform Spatial
> searches on it. I am using version 5.2.1. I have updated my schema.xml file
> to include the location-rpt fieldType:
>
> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
> geo="false" distErrPct="0.025" maxDistErr="0.001"
> distanceUnits="degrees" />
>
> And I have defined my field to use this type:
>
> <field name="positionWkt" type="location_rpt" indexed="true" stored="true" />
>
> I also added the jts-1.4.0.jar file to C:\solr-5.2.1\server\solr-webapp
> \webapp\WEB-INF\lib.
>
> However when I try to add a document through the Solr Admin Console I am
> seeing this response:
>
> {
>   "responseHeader": {
> "status": 400,
> "QTime": 6
>   },
>   "error": {
> "msg": "Unknown Shape definition [POLYGON((-77.23 38.922, -77.23
> 38.923, -77.228 38.923, -77.228 38.922, -77.23 38.922))]",
> "code": 400
>   }
> }
>
> I can submit documents successfully if I remove the positionWkt field. Did
> I miss a configuration step?
>
> Here is the document I am trying to add:
>
> {
> "observationId": "8e09f47f",
> "observationType": "image",
> "startTime": "2015-09-19T21:03:51Z",
> "endTime": "2015-09-19T21:03:51Z",
> "receiptTime": "2016-07-29T15:49:49.328Z",
> "locationLat": 38.9225015078814,
> "locationLon": -77.22900299194423,
> "position": "38.9225015078814,-77.22900299194423",
> "positionWkt": "POLYGON((-77.23 38.922, -77.23 38.923, -77.228
> 38.923, -77.228 38.922, -77.23 38.922))",
> "provider": "a"
> }
>
> Here are the fields I added to the schema.xml file (I started with the
> template, please let me know if you need the whole thing):
>
> <uniqueKey>observationId</uniqueKey>
>
> <field name="observationId" ... required="true" multiValued="false"/>
> <field name="observationType" ... />
> <field name="startTime" ... />
> <field name="endTime" ... />
> <field name="receiptTime" ... />
> <field name="locationLat" ... />
> <field name="locationLon" ... />
> <field name="position" ... />
> <field name="provider" ... />
> <field name="positionWkt" type="location_rpt" indexed="true"
> stored="true" />
>
> Thank you!
>
> Jennifer

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Sorting on DateRangeField?

2016-09-09 Thread David Smiley
Hi Alex,

DateRangeField extends some spatial stuff, which has that error message in
it, not in DateRangeField proper.  You cannot sort on a DateRangeField.  If
you want to... try adding either one plain docValues field if you just have
date instances, or a pair of them to hold a min & max and pick the right
one to sort on.
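
A sketch of that schema approach (the field names are illustrative, and the
"date" type is assumed to be a plain Trie date type in your schema):

<field name="release_date_min" type="date" indexed="true" stored="true"
       docValues="true"/>
<field name="release_date_max" type="date" indexed="true" stored="true"
       docValues="true"/>

...and then sort=release_date_min asc (or release_date_max desc), whichever
matches your intent.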

The "sorting by the query" in the context of spatial refers to doing a
score sorted sort, noting that the score of a spatial query can be the
distance or some formula involving the distance or possibly overlap of the
shape with something else.  e.g.  q={!geofilt score=distance ...}  This
is documented in the ref guide on the spatial page, including an example
for BBoxField.

&q={!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10))


I think that example could be simpler using {!bbox} but probably wants to
show different ways to skin this cat, so to speak.

~ David

On Wed, Sep 7, 2016 at 1:49 PM Alexandre Rafalovitch 
wrote:

> So, I tried sorting on a DateRangeField. And I got back:  "Sorting not
> supported on SpatialField: release_date, instead try sorting by
> query."
>
> Two questions:
> 1) Spatial is kind of super-internal info here, the message is rather
> confusing.
> 2) What's "sorting by query" in this case? Can I still sort on the
> field, but with a different syntax?
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: request SOLR - spatial field with Intersect and Contains functions

2016-09-19 Thread David Smiley
Hi Leo,

You should use two spatial fields for this -- one is for an indexed
Box/Envelope, and another for an indexed LineString.  The indexed box
should use either BBoxField or RptWithGeometrySpatialField, and the
LineString field should use RptWithGeometrySpatialField.   If you have an
older installation 5.x version, RptWithGeometrySpatialField may not be
available in which case settle
for solr.SpatialRecursivePrefixTreeFieldType.  When you do a search, it'd
be a search for one field OR the other with the requirements you have for
each.
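
A sketch of the combined filter, assuming you name the BBoxField "geo_box"
and the RPT LineString field "geo_line":

fq=geo_box:"Contains(ENVELOPE(-116.894531, 107.402344, 57.433227, -42.146973))"
OR geo_line:"Intersects(ENVELOPE(-116.894531, 107.402344, 57.433227, -42.146973))"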

~ David

On Mon, Sep 19, 2016 at 8:48 AM Leo BRUVRY-LAGADEC <
leo.bruvry.laga...@partenaire-exterieur.ifremer.fr> wrote:

> Hi,
>
> I am trying spatial search in SOLR 5.0 and I don't know how to implement
> a solution for the problem I will try to explain.
>
> On a SOLR server I have indexed a collection of objects that contain a
> spatial field:
>
> <field name="geo" type="..." multiValued="true" />
> <fieldType name="..." class="solr.SpatialRecursivePrefixTreeFieldType"
> geo="true"
> distErrPct="0.025"
> maxDistErr="0.09"
> distanceUnits="degrees" />
>
> The spatial data indexed in the field named "geo" can be ENVELOPE or
> LINESTRING :
>
> LINESTRING(-4.6837 48.5792, -4.6835 48.5788, -4.684
> 48.5788, -4.6832 48.579, -4.6837 48.5792, -4.6188 48.6265, -4.6122
> 48.63, -4.615 48.6258, -4.6125 48.6215, -4.6112 48.6218)
>
> or
>
> ENVELOPE(-5.0, -4.0, 49.0, 48.0)
>
> Actually in my application, when I do a SOLR request to get objects that
> are in a spatial area, I do something like this :
>
> q=:&fq=(geo:"Intersects(ENVELOPE(-116.894531, 107.402344, 57.433227,
> -42.146973))")
>
> But I want to change how it work. Now, when the geo field contain an
> ENVELOPE I want to do an CONTAINS request and when it contain a
> LINESTRING I want to do an INTERSECTS request.
>
> example :
>
> If geo = ENVELOPE then q=*:*&fq=(geo:"Contains(ENVELOPE(-116.894531,
> 107.402344, 57.433227, -42.146973))")
>
> If geo = LINESTRING then q=*:*&fq=(geo:"Intersects(ENVELOPE(-116.894531,
> 107.402344, 57.433227, -42.146973))")
>
> How can my application know if the field contains an ENVELOPE or a LINESTRING?
>
> Any idea how this can be done?
>
> Best regards,
> Leo.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
It should, I think... what happens? Can you ascertain the nature of the
results?
~ David

On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
 wrote:

> For Solr 6.1.0
> This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
>
> This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> 2016-08-26T15:00:12Z]
>
>
> Why does this not work?-{!field f=schedule
> op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
>  SRK

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
OH!  Ok the moment the query no longer starts with "{!", the query is
parsed by defType (for 'q') and will default to lucene QParser.  So then it
appears we have a clause with a NOT operator.  In this parsing mode,
embedded "{!" terminates at the "}".  This means you can't put the
sub-query text after the "}", you instead need to put it in the special "v"
local-param.  e.g.:
-{!field f=schedule op=Contains v='[2016-08-26T12:00:12Z TO
2016-08-26T15:00:12Z]'}

On Tue, Sep 20, 2016 at 8:15 AM Sandeep Khanzode
 wrote:

> This is what I get ...
> { "responseHeader": { "status": 400, "QTime": 1, "params": { "q":
> "-{!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> 2016-08-26T15:00:12Z]", "indent": "true", "wt": "json", "_":
> "1474373612202" } }, "error": { "msg": "Invalid Date in Date Math
> String:'[2016-08-26T12:00:12Z'", "code": 400 }}
>  SRK
>
> On Tuesday, September 20, 2016 5:34 PM, David Smiley <
> david.w.smi...@gmail.com> wrote:
>
>
>  It should, I think... what happens? Can you ascertain the nature of the
> results?
> ~ David
>
> On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
>  wrote:
>
> > For Solr 6.1.0
> > This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
> >
> > This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> > 2016-08-26T15:00:12Z]
> >
> >
> > Why does this not work?-{!field f=schedule
> > op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
> >  SRK
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
>
>

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
Personally I learned this by poring over Solr's source code some time
ago.  I suppose the only official reference to this stuff is:
https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
But that page doesn't address the implications for when the syntax is a
clause of a larger query instead of being the whole query (i.e. has "{!"...
but but not at the first char).

On Tue, Sep 20, 2016 at 2:06 PM Sandeep Khanzode
 wrote:

> Wow. Simply awesome!
> Where can I read more about this? I am not sure whether I understand what
> is going on behind the scenes ... like which parser is invoked for !field,
> how can we know which all special local params exist, whether we should
> prefer edismax over others, when is the LuceneQParser invoked in other
> conditions, etc? Would appreciate if you could indicate some references to
> catch up.
> Thanks a lot ...  SRK
>
>   Show original message     On Tuesday, September 20, 2016 5:54 PM, David
> Smiley  wrote:
>
>
>  OH!  Ok the moment the query no longer starts with "{!", the query is
> parsed by defType (for 'q') and will default to lucene QParser.  So then it
> appears we have a clause with a NOT operator.  In this parsing mode,
> embedded "{!" terminates at the "}".  This means you can't put the
> sub-query text after the "}", you instead need to put it in the special "v"
> local-param.  e.g.:
> -{!field f=schedule op=Contains v='[2016-08-26T12:00:12Z TO
> 2016-08-26T15:00:12Z]'}
>
> On Tue, Sep 20, 2016 at 8:15 AM Sandeep Khanzode
>  wrote:
>
> > This is what I get ...
> > { "responseHeader": { "status": 400, "QTime": 1, "params": { "q":
> > "-{!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> > 2016-08-26T15:00:12Z]", "indent": "true", "wt": "json", "_":
> > "1474373612202" } }, "error": { "msg": "Invalid Date in Date Math
> > String:'[2016-08-26T12:00:12Z'", "code": 400 }}
> >  SRK
> >
> >On Tuesday, September 20, 2016 5:34 PM, David Smiley <
> > david.w.smi...@gmail.com> wrote:
> >
> >
> >  It should, I think... what happens? Can you ascertain the nature of the
> > results?
> > ~ David
> >
> > On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
> >  wrote:
> >
> > > For Solr 6.1.0
> > > This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
> > >
> > > This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
> > > 2016-08-26T15:00:12Z]
> > >
> > >
> > > Why does this not work?-{!field f=schedule
> > > op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
> > >  SRK
> >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
> >
> >
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
>
>

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Negative Date Query for Local Params in Solr

2016-09-20 Thread David Smiley
So that page referenced describes local-params, and describes the special
"v" local-param.  But first, see a list of all query parsers (which lists
"field"): https://cwiki.apache.org/confluence/display/solr/Other+Parsers
and
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser for
the "lucene" one.

The "op" param is rather unique... it's not defined by any query parser.  A
trick is done in which a custom field type (DateRangeField in this case) is
able to inspect the local-params, and thus define and use params it needs.
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates "More
DateRangeField Details" mentions "op".  {!lucene df=dateRange
op=Contains}... would also work.  I don't know of any other local-param
used in this way.

On Tue, Sep 20, 2016 at 11:21 PM David Smiley 
wrote:

> Personally I learned this by pouring over Solr's source code some time
> ago.  I suppose the only official reference to this stuff is:
>
> https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
> But that page doesn't address the implications for when the syntax is a
> clause of a larger query instead of being the whole query (i.e. has "{!"...
> but but not at the first char).
>
> On Tue, Sep 20, 2016 at 2:06 PM Sandeep Khanzode
>  wrote:
>
>> Wow. Simply awesome!
>> Where can I read more about this? I am not sure whether I understand what
>> is going on behind the scenes ... like which parser is invoked for !field,
>> how can we know which all special local params exist, whether we should
>> prefer edismax over others, when is the LuceneQParser invoked in other
>> conditions, etc? Would appreciate if you could indicate some references to
>> catch up.
>> Thanks a lot ...  SRK
>>
>>   Show original message On Tuesday, September 20, 2016 5:54 PM, David
>> Smiley  wrote:
>>
>>
>>  OH!  Ok the moment the query no longer starts with "{!", the query is
>> parsed by defType (for 'q') and will default to lucene QParser.  So then
>> it
>> appears we have a clause with a NOT operator.  In this parsing mode,
>> embedded "{!" terminates at the "}".  This means you can't put the
>> sub-query text after the "}", you instead need to put it in the special
>> "v"
>> local-param.  e.g.:
>> -{!field f=schedule op=Contains v='[2016-08-26T12:00:12Z TO
>> 2016-08-26T15:00:12Z]'}
>>
>> On Tue, Sep 20, 2016 at 8:15 AM Sandeep Khanzode
>>  wrote:
>>
>> > This is what I get ...
>> > { "responseHeader": { "status": 400, "QTime": 1, "params": { "q":
>> > "-{!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
>> > 2016-08-26T15:00:12Z]", "indent": "true", "wt": "json", "_":
>> > "1474373612202" } }, "error": { "msg": "Invalid Date in Date Math
>> > String:'[2016-08-26T12:00:12Z'", "code": 400 }}
>> >  SRK
>> >
>> >On Tuesday, September 20, 2016 5:34 PM, David Smiley <
>> > david.w.smi...@gmail.com> wrote:
>> >
>> >
>> >  It should, I think... what happens? Can you ascertain the nature of the
>> > results?
>> > ~ David
>> >
>> > On Tue, Sep 20, 2016 at 5:35 AM Sandeep Khanzode
>> >  wrote:
>> >
>> > > For Solr 6.1.0
>> > > This works .. -{!field f=schedule op=Intersects}2016-08-26T12:00:56Z
>> > >
>> > > This works .. {!field f=schedule op=Contains}[2016-08-26T12:00:12Z TO
>> > > 2016-08-26T15:00:12Z]
>> > >
>> > >
>> > > Why does this not work?-{!field f=schedule
>> > > op=Contains}[2016-08-26T12:00:12Z TO 2016-08-26T15:00:12Z]
>> > >  SRK
>> >
>> > --
>> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> > http://www.solrenterprisesearchserver.com
>> >
>> >
>> >
>>
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>>
>>
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Migrating to Solr 6.1.0 from 5.5.0

2016-09-29 Thread David Smiley
Arjun,

Your input is a POLYGON -- as seen in the error message.  The "Try JTS" was
hopefully a clue -- on
https://cwiki.apache.org/confluence/display/solr/Spatial+Search search for
"JTS" and you should see how to set the spatialContextFactory to JTS, and a
mention of needing the JTS jar.  I'll try to add a bit more info suggesting
exactly where to put it, and a download link.  I'll also mention a shortcut
so you don't have to type out the classname -- a recent feature in 6.2.

Since you said you were upgrading... presumably your spatialContextFactory
attribute was already set for this to work at all in 5.5?  The package
reference changed for this value -- I imagine you would have seen a
warning/error to this effect in Solr's logs.  Do you?

~ David

On Tue, Sep 27, 2016 at 10:29 AM William Bell  wrote:

> the documentation is not good on this. Not sure how to fix it either.
>
> On Tue, Sep 27, 2016 at 3:41 AM, M, Arjun (Nokia - IN/Bangalore) <
> arju...@nokia.com> wrote:
>
> > Hi,
> >
> > We are getting the below errors when migrating Solr from 5.5.0 to
> > 6.1.0. Could anyone help in resolving the issue, if you have come across
> > this?
> >
> >
>  org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> > Error from server at http://127.0.0.1:41569/solr/collection1: Unable to
> > parse shape given formats "lat,lon", "x y" or as WKT because
> > java.text.ParseException: java.lang.UnsupportedOperationException:
> > Unsupported shape of this SpatialContext. Try JTS or Geo3D. input:
> > POLYGON((-10 30, -40 40, -10 -20, 0 0, -10 30))
> >
> > Thanks in advance..
> >
> > Thanks & Regards,
> >Arjun M
> >
> >
> >
> >
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Heatmap in JSON facet API

2016-11-01 Thread David Smiley
I plan on adding this in the near future... hopefully for Solr 6.4.

On Mon, Oct 31, 2016 at 7:06 AM Никита Веневитин 
wrote:

> I've built a query as described in Heatmap Faceting (
> https://cwiki.apache.org/confluence/x/ZYDxAQ ),
> but I would like to get the same results using the JSON facet API.
>
> 2016-10-30 15:24 GMT+03:00 GW :
>
> > If we are talking about the same kind of heat maps you might want to look
> > at the TomTom map API for a quick and dirty yet solid solution. Just
> supply
> > a whack of coordinates and let TomTom do the work. The Heat maps will
> zoom
> > in and de-cluster.
> >
> > Example below.
> >
> > http://www.frogclassifieds.com/tomtom/markers-clustering.html
> >
> >
> > On 28 October 2016 at 09:05, Никита Веневитин  >
> > wrote:
> >
> > > Hi!
> > > Is it possible to use JSON facet API to get heatmaps?
> > >
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


How-To: Secure Solr by IP Address

2016-11-04 Thread David Smiley
I was just researching how to secure Solr by IP address and I finally
figured it out.  Perhaps this might go in the ref guide but I'd like to
share it here anyhow.  The scenario is where only "localhost" should have
full unfettered access to Solr, whereas everyone else (notably web clients)
can only access some whitelisted paths.  This setup is intended for a
single instance of Solr (not a member of a cluster); the particular config
below would probably need adaptations for a cluster of Solr instances.  The
technique here uses a utility with Jetty called IPAccessHandler --
http://download.eclipse.org/jetty/stable-9/apidocs/org/eclipse/jetty/server/handler/IPAccessHandler.html
For reasons I don't know (and I did search), it was recently deprecated and
there's another InetAccessHandler (not in Solr's current version of Jetty)
but it doesn't support constraints incorporating paths, so it's a
non-option for my needs.

First, Java must be told to insist on its IPv4 stack. This is because
Jetty's IPAccessHandler simply doesn't support IPv6 IP matching; it throws
NPEs in my experience. In recent versions of Solr, this can be easily done
just by adding -Djava.net.preferIPv4Stack=true at the Solr start
invocation.  Alternatively put it into SOLR_OPTS perhaps in solr.in.sh.

Edit server/etc/jetty.xml, and replace the line
mentioning ContextHandlerCollection with this:

   <Set name="handler">
     <New class="org.eclipse.jetty.server.handler.IPAccessHandler">
       <Set name="white">
         <Array type="String">
           <Item>127.0.0.1</Item>
           <Item>-.-.-.-|/solr/techproducts/select</Item>
         </Array>
       </Set>
       <Set name="whiteListByPath">false</Set>
       <Set name="handler">
         <New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
       </Set>
     </New>
   </Set>

This mechanism wraps ContextHandlerCollection (which ultimately serves
Solr) with this handler that adds the constraints.  These constraints above
allow localhost to do anything; other IP addresses can only access
/solr/techproducts/select.  That <Item> line could be duplicated for other
white-listed paths -- I recommend creating request handlers for your use,
possibly with invariants to further constrain what someone can do.

note: I originally tried inserting the IPAccessHandler in
server/contexts/solr-jetty-context.xml but found that there's a bug in
IPAccessHandler that fails to consider when HttpServletRequest.getPathInfo
is null.  And it wound up letting everything through (if I recall).  But I
like it up in jetty.xml anyway, as it intercepts everything.

~ David

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: How-To: Secure Solr by IP Address

2016-11-04 Thread David Smiley
Not to knock the other suggestions, but a benefit of securing Jetty like
this is that *everyone* can take this approach.

On Fri, Nov 4, 2016 at 9:54 AM john saylor  wrote:

> hi
>
> any firewall worth its name should be able to do this. in fact, that is
> one of several things that a firewall was designed to do.
>
> also, you are stopping this traffic at the application, which is good;
> but you'd prolly be better off stopping it at the network interface
> [using a firewall, for instance].
>
> of course, firewalls have their own complexity ...
>
> good luck!
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Highlighter not working on some documents

2017-06-11 Thread David Smiley
Probably the most common reason is the default hl.maxAnalyzedChars -- thus
your highlightable text might not be in the first 51200 chars of text.  The
first Solr release with the unified highlighter had an even lower default
of 10k chars.
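
If that's the cause, raising the limit on the request is a quick test, e.g.
(the value is just illustrative):

  &hl=true&hl.method=unified&hl.maxAnalyzedChars=1000000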

On Fri, Jun 9, 2017 at 9:58 PM Phil Scadden  wrote:

> Tried hard to find difference between pdfs returning no highlighter and
> ones that do for same search term.  Includes pdfs that have been OCRed and
> ones that were text to begin with. Head scratching to me.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, 10 June 2017 6:22 a.m.
> To: solr-user 
> Subject: Re: Highlighter not working on some documents
>
> Need lots more information. I.e. schema definitions, query you use,
> handler configuration and the like. Note that highlighted fields must have
> stored="true" set and likely the _text_ field doesn't. At least in the
> default schemas stored is set to false for the catch-all field.
> And you don't want to store that information anyway since it's usually the
> destination of copyField directives and you'd highlight _those_ fields.
>
> Best,
> Erick
>
> On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden  wrote:
> > Do a search with:
> > fl=id,title,datasource&hl=true&hl.method=unified&limit=50&page=1&q=pressure+AND+testing&rows=50&start=0&wt=json
> >
> > and I get back a good list of documents. However, some documents are
> returning empty fields in the highlighter. Eg, in the highlight array have:
> > "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
> >
> > Getting this well up the list of results with good highlighted matchers
> above and below this entry. Why would the highlighter be failing?
> >
> > Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Issue with highlighter

2017-06-14 Thread David Smiley
> Beware of NOT plus OR in a search. That will certainly produce no
> highlights. (e.g. test -results when default op is OR)

Seems like a bug to me; I think the default operator shouldn't matter in
that case, since there is only one clause that has no BooleanQuery.Occur
operator.  The end effect is that "test" is effectively required and should
definitely be highlighted.

Note to Ali: Phil's comment implies use of hl.method=unified which is not
the default.

On Wed, Jun 14, 2017 at 10:22 PM Phil Scadden  wrote:

> Just had similar issue - works for some, not others. First thing to look
> at is hl.maxAnalyzedChars is the query. The default is quite small.
> Since many of my documents are large PDF files, I opted to use
> storeOffsetsWithPositions="true" termVectors="true" on the field I was
> searching on.
> This certainly did increase my index size but not too bad and certainly
> fast.
> https://cwiki.apache.org/confluence/display/solr/Highlighting
>
> Beware of NOT plus OR in a search. That will certainly produce no
> highlights. (eg test -results when default op is OR)
>
>
> -Original Message-
> From: Ali Husain [mailto:alihus...@outlook.com]
> Sent: Thursday, 15 June 2017 11:11 a.m.
> To: solr-user@lucene.apache.org
> Subject: Issue with highlighter
>
> Hi,
>
>
> I think I've found a bug with the highlighter. I search for the word
> "something" and I get an empty highlighting response for all the documents
> that are returned shown below. The fields that I am searching over are
> text_en, the highlighter works for a lot of queries. I have no
> stopwords.txt list that could be messing this up either.
>
>
>  "highlighting":{
> "310":{},
> "103":{},
> "406":{},
> "1189":{},
> "54":{},
> "292":{},
> "309":{}}}
>
>
> Just changing the search term to "something like" I get back this:
>
>
> "highlighting":{
> "310":{},
> "309":{
>   "content":["1949 Convention, like those"]},
> "103":{},
> "406":{},
> "1189":{},
> "54":{},
> "292":{},
> "286":{
>   "content":["persons in these classes are treated like
> combatants, but in other respects"]},
> "336":{
>   "content":["   be treated like engagement"]}}}
>
>
> So I know that I have it setup correctly, but I can't figure this out.
> I've searched through JIRA/Google and haven't been able to find a similar
> issue.
>
>
> Any ideas?
>
>
> Thanks,
>
> Ali
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
Hi Puneeta,

So what does your field type definition look like?  I'd imagine you're using 
RptWithGeometrySpatialField.  And what is your Solr version?

BTW note the settings here 
https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/context/jts/JtsSpatialContextFactory.html
 

are reflected as attributes on the field type, so you can set, say,
useJtsMulti="false" to change the "multi" geometry implementation.

~ David

> On Jun 28, 2017, at 6:44 AM, puneeta  wrote:
> 
> Hi,
> I am new to Solr Geospatial data and have set up JTS within solr. I have
> geo spatial data with Multipolygons. I am passing the coordinates and trying
> to find out which multipolygon contains those coordinates. However, the
> search query is working fine if I insert the data as a polygon. The same is
> not working if my data is inserted as a Multipolygon. I am unable to figure
> out what I am missing. Can anyone suggest where I am going wrong?
> 
> Data as Polygon:
> { "parcel_id":"6",
>"geo":["POLYGON((-86.452970463 32.449739005, 
>  -86.452889912 32.4494390510001, 
>  -86.453365379 32.44942802195, 
>  -86.453514854 32.44942453595))"]
> }
> 
> Data as Multipolygon:
> 
> { "parcel_id":"6",
>"geo":["MULTIPOLYGON(((-86.452970463 32.449739005, 
>  -86.452889912 32.4494390510001, 
>  -86.453365379 32.44942802195, 
>  -86.453514854 32.44942453595)))"]
> }
> 
> My search query:
> fq=geo:"Intersects(-86.453097892 32.449735102)"
> 
> This device surely lies within the polygon. (My polygon coordinates are many
> more in the actual data; to reduce the size here I have omitted a few of the
> coordinates.)
> 
> The query is returning only the polygon data. The multipolygon search is not
> happening.
> 
> Any help is highly appreciated.
> 
> Thanks in Advance,
> Puneeta
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
I suggest using the RptWithGeometry field type, and with that change remove
distErrPct and maxDistErr.  See the ref guide, and note the geometry cache
option.  BTW spatialContextFactory can simply be "jts".
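
That amounts to something like this (untested; keeping the field type name
from your schema):

  <fieldType name="location_rpt" class="solr.RptWithGeometrySpatialField"
             spatialContextFactory="JTS"
             autoIndex="true"
             validationRule="repairBuffer0"
             distanceUnits="kilometers"/>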

If this fixes the issue, then the issue was related to grid approximation.

BTW you never quite said what it was about the results that was wrong.  Did you 
get hits you didn't expect (I'm guessing yes) or the inverse?

~ David

> On Jun 28, 2017, at 10:55 AM, puneeta  wrote:
> 
> Hi David,
> Thank you for the prompt reply. My field definition in schema.xml is:
> 
> I commented the existing location_rpt
> 
> 
> 
> And added:
> <fieldType name="location_rpt"
>   class="solr.SpatialRecursivePrefixTreeFieldType"
>   spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
>   autoIndex="true"
>   validationRule="repairBuffer0"
>   distErrPct="0.025"
>   maxDistErr="0.001"
>   distanceUnits="kilometers" />
> 
> My Solr version is 6.2.1
> 
> Thanks,
> Puneeta
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143p4343162.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 5.5 - spatial intersects query returns results outside of search box

2017-06-28 Thread David Smiley

> On Jun 27, 2017, at 3:28 AM, Leila Gonzales  wrote:
> 
> {
> 
>"id": "5230",
> 
>"location_geo":
> ["ENVELOPE(-75.0,-75.939723,39.3597224,38.289722)"]
> 
>  }

This is an unusual rectangle.  Remember this is minX, maxX, maxY, minY.  Thus 
this rectangle wraps the entire globe except for nearly a degree.  It matches 
your query rectangle.

Re: Solr 5.5 - spatial intersects query returns results outside of search box

2017-06-28 Thread David Smiley
No prob.

BTW you may want to investigate use of BBoxField or 
RptWithGeometrySpatialField; both are also more accurate... but vanilla RPT may 
be just fine (fastest).
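
For the BBoxField route, the sketch from the ref guide looks like this
(names illustrative); rectangles are then indexed as
ENVELOPE(minX, maxX, maxY, minY):

  <field name="bbox" type="bbox"/>
  <fieldType name="bbox" class="solr.BBoxField"
             geo="true" distanceUnits="kilometers" numberType="_bbox_coord"/>
  <fieldType name="_bbox_coord" class="solr.TrieDoubleField"
             precisionStep="8" docValues="true" stored="false"/>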


> On Jun 28, 2017, at 11:32 AM, Leila Gonzales  wrote:
> 
> Thanks David! I fixed the coordinates and put some error checking in my
> Solr indexing script to trap for this type of coordinate mismatch.
> 
> -Original Message-----
> From: David Smiley [mailto:david.w.smi...@gmail.com]
> Sent: Wednesday, June 28, 2017 8:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 5.5 - spatial intersects query returns results outside
> of search box
> 
> 
>> On Jun 27, 2017, at 3:28 AM, Leila Gonzales  wrote:
>> 
>> {
>> 
>>   "id": "5230",
>> 
>>   "location_geo":
>> 
> ["ENVELOPE(-75.0,-75.939723,39.3597224,38.289722)"
> ]
>> 
>> }
> 
> This is an unusual rectangle.  Remember this is minX, maxX, maxY, minY.
> Thus this rectangle wraps the entire globe except for nearly a degree.  It
> matches your query rectangle.



Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
https://lucene.apache.org/solr/guide/6_6/spatial-search.html#SpatialSearch-RptWithGeometrySpatialField
 



> On Jun 28, 2017, at 11:32 AM, puneeta  wrote:
> 
> Hi David,
> I am sorry, I did not understand what you mean by "I suggest using
> RptWithGeometry field". Should I leave the existing location_rpt definition in
> schema.xml?
> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
>   geo="true" distErrPct="0.025" maxDistErr="0.001"
>   distanceUnits="kilometers" />
> This line I have commented. Should I uncomment it?
> 
> 1."remove distErrPct and maxDistErr" - 
> 2. Added useJtsMulti="false"
> 
> I will change the field definition as follows, try to execute and report
> back.
> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
>   spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
>   autoIndex="true"
>   validationRule="repairBuffer0"
>   distanceUnits="kilometers"
>   useJtsMulti="false" />
> 
> 
> The issue I am facing is that I am not getting any search results for the
> Multipolygon, i.e. I should get hits. Currently numFound = 0; it should
> find at least 1 record, as it does for a Polygon search.
> 
> Thanks,
> Puneeta
> 
> david.w.smi...@gmail.com  wrote
>> I suggest using RptWithGeometry field, and with that change remove
>> distErrPct and maxDistErr.  See the ref guide, and note the geometry cache
>> option.
>> BTW spatialContextFactory can simply be "jts".
>> 
>> If this fixes the issue, then the issue was related to grid approximation.
>> 
>> BTW you never quite said what it was about the results that was wrong. 
>> Did you get hits you didn't expect (I'm guessing yes) or the inverse?
>> 
>> ~ David
>> 
>>> On Jun 28, 2017, at 10:55 AM, puneeta <
> 
>> pverma@
> 
>> > wrote:
>>> 
>>> Hi David,
>>> Thank you for the prompt reply. My field definition in schema.xml is :
>>> 
>>> I commented the existing location_rpt
>>> 
>>> 
>>> 
>>> And added:
>>> 
>> >> 
>> class="solr.SpatialRecursivePrefixTreeFieldType"
>>> 
>>> spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
>>>  autoIndex="true"
>>>  validationRule="repairBuffer0"
>>>  distErrPct="0.025"
>>>  maxDistErr="0.001"
>>>  distanceUnits="kilometers" />
>>> 
>>> My Solr version is 6.2.1
>>> 
>>> Thanks,
>>> Puneeta
>>> 
>>> 
>>> 
>>> 
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143p4343162.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Polygon-search-query-working-but-NOT-Multipolygon-tp4343143p4343184.html
>  
> 
> Sent from the Solr - User mailing list archive at Nabble.com 
> .



Re: Spatial Search based on the amount of docs, not the distance

2017-06-28 Thread David Smiley
Deniz didn't mention document-to-document distance sort, but he/she didn't
say it wasn't the case either.

Anyway, FYI: at the Lucene level, LatLonPoint has some sophisticated BKD
search code to efficiently return the top N distance-ordered documents
(where you supply N).  Although as far as I recall, it also has no filtering
mechanism, so if you have any other filters (keyword/time/whatever), it
wouldn't work.
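
At the Solr level, Erick's suggestion below (filter to a radius, sort by
distance, take the top K) looks roughly like this (field name and point are
illustrative):

  q=*:*&fq={!geofilt}&sfield=location&pt=45.52,-73.53&d=10&sort=geodist() asc&rows=10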

I once did this feature on an RPT index for a client and I got the open-source 
permission but I haven't gotten around to properly adding it to Solr.  I might 
approach it a bit differently now.

~ David

> On Jun 22, 2017, at 8:34 PM, Tim Casey  wrote:
> 
> deniz,
> 
> I was going to add something here.  The reason what you want is probably
> hard to do is because you are asking solr, which stores a document, to
> return documents using an attribute of document pairs.  As only a though
> exercise, if you stored record pairs as a single document, you could
> probably query it directly.  That is, if you have d1 and d2 and you are
> querying  around d1 and ordering by distance, then you could get this
> directly from a document representing a record pair.  I don't think this is
> practical, because it is an n^2 store.
> 
> Since the n^2 problem is there, people are going to suggest some heuristic
> which avoids this problem.  What Erick is suggesting is down this path.
> Query around a point and sort by distance taking the top K results.  The
> result is taking a linear slice of the n^2 distance attribute.
> 
> tim
> 
> 
> 
> On Wed, Jun 21, 2017 at 7:50 PM, Erick Erickson 
> wrote:
> 
>> Would it serve to sort by distance? True, if you matched a zillion
>> documents within a 1km radius you'd still perform the distance calcs, but
>> the result would be a manageable number.
>> 
>> I have to ask "Why do you care?". Is this an efficiency question (i.e. you
>> want to keep Solr from having to do expensive work) or is it a question of
>> having to get hits at all? It's at least possible that the solution for one
>> is not the solution for the other.
>> 
>> Best,
>> Erick
>> 
>> On Wed, Jun 21, 2017 at 5:32 PM, deniz  wrote:
>> 
>>> it is for sure possible to use d value for limiting the distance,
>> however,
>>> it
>>> might not be very efficient, as some of the coords may not have any docs
>>> around for a large value of d... so it is hard to determine a default
>> value
>>> for d.
>>> 
>>> though it sounds like having a default d and gradual increments on its
>> value
>>> might be a workaround for top K results...
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -
>>> Zeki ama calismiyor... Calissa yapar...
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Spatial-Search-based-on-the-amount-of-docs-not-the-distance-
>>> tp4342108p4342258.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 



Re: Polygon search query working but NOT Multipolygon

2017-06-28 Thread David Smiley
This polygon is fairly rectangular, with one side having a ton of points.
Nonetheless the query point is clearly far away from it -- it's much lower
(a smaller 'y' value).

On Wed, Jun 28, 2017 at 10:17 PM puneeta  wrote:

> Hi David,
>   Actually my polygon had too many coordinates, so I just omitted some
> while posting my query. Here is my complete multipolygon, where the last
> point is the same as the first one:
>
> 
> MULTIPOLYGON (((-86.477551331 32.490605651,
> -86.477637350 32.4903921820001, -86.478257247 32.4905655910001,
> -86.478250466 32.4905802390001, -86.478243988 32.49059368096,
> -86.47823751 32.490607122, -86.478231749 32.49061910096, -86.478224637
> 32.4906340650001, -86.478218237 32.490647541, -86.478211847
> 32.49066103595, -86.478205478 32.4906745260001, -86.47820210799989
> 32.4906816669, -86.478199132 32.4906880240001, -86.478192825
> 32.490701523, -86.478186533 32.490715047, -86.478183209 32.4907222090001,
> -86.4781802789 32.4907285690001, -86.478174063 32.4907421250001,
> -86.478167851 32.4907556540001, -86.478162558 32.49076723696,
> -86.47815905399989 32.490774513000105, -86.477551331 32.490605651)))
> 
> 
>
> Thanks,
> Puneeta
>
>
>
>
> david.w.smi...@gmail.com wrote
> > I tried your data in the "JTS TestBuilder" GUI.  Firstly, your polygon
> > isn't "closed", but that was easily fixed by repeating the first point at
> > the end.  See the attached screenshot of the GUI for what these shapes
> > look like.  The red dot (the query point) is outside of this
> > triangular-ish shape, and thus not a match.
> >
> >> On Jun 28, 2017, at 12:33 PM, puneeta <pverma@> wrote:
> >>
> >> Hi David,
> >>  I did the following changes:
> >>
> >> Changed in schema.xml:
> >>
> >> <fieldType name="location_rpt" class="solr.RptWithGeometrySpatialField"
> >>   spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
> >>   autoIndex="true"
> >>   validationRule="repairBuffer0"
> >>   distanceUnits="kilometers"
> >>   useJtsMulti="false"
> >> />
> >>
> >>
> >> Added in solrconfig.xml:
> >>
> >> <cache name="perSegSpatialFieldCache_geo"
> >>   class="solr.LRUCache"
> >>   size="256"
> >>   initialSize="0"
> >>   autowarmCount="100%"
> >>   regenerator="solr.NoOpRegenerator"/>
> >>
> >> My fields in the core, as defined in the schema, are shown here:
> >> <http://lucene.472066.n3.nabble.com/file/n4343221/SolrGeoFieldDefinition.png>
> >>
> >> However, I still face the same issue. No results found for a
> multipolygon
> >> search.
> >>
> >> Not sure what's happening :(
> >>
> >> Puneeta
> >>

Re: Not highlighting "and" and "or"?

2017-06-28 Thread David Smiley
Hi Walter,
No they are not.  Does debug=query show that these words are in your parsed
query?

On Wed, Jun 28, 2017 at 5:13 PM Walter Underwood 
wrote:

> Is there some special casing in the highlighter to skip query syntax
> words? The words “and” and “or” don’t get highlighted.
>
> This is in 6.5.0.
>
>question
>html
>440
>fastVector
>1
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Issue: Hit Highlighting Working Inconsistently in Solr 6.6

2017-07-14 Thread David Smiley
Does hl.method=unified help any?

Perhaps you need to set hl.fl?  or hl.requireFieldMatch=false? (although it
should default to false already)
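
In request form that's roughly (field names are illustrative):

  hl=true&hl.method=unified&hl.fl=subject,content&hl.requireFieldMatch=false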

On Fri, Jul 14, 2017 at 6:52 PM Vikram Oberoi  wrote:

> Hi!
>
> Just wanted to close the loop here.
>
> I'm pretty sure this has something to do with the default _text_ "catchall"
> field being a slightly different type ('text_general') from all my
> textual fields ('text_en'). A few things I tried support that hypothesis:
>
> - Specifying fields for terms correctly yields highlights consistently
> (e.g. "hello" doesn't work but "subject:hello" always does).
> - Creating a different catchall field with same type as all my textual
> fields ('text_en') and making that the default field yields highlighting
> results that work properly and consistently.
> - Finally -- I need to use a friendlier parser anyway. Using edismax for
> all my queries -- and eliminating my catchall field -- yields highlighting
> results properly and consistently.
>
> I've got this working, but I'm curious to know if this is really what's
> happening, and more precisely why. If anyone more knowledgeable has thoughts or
> pointers to writing on how highlighting works internally, I'd really
> appreciate it!
>
> Cheers,
> Vikram
>
> On Thu, Jul 13, 2017 at 5:51 PM, Vikram Oberoi  wrote:
>
> > Hi there,
> >
> > I'm seeing inconsistent highlighting behavior using a default, fresh Solr
> > 6.6 install and it's unclear to me why or how to go about debugging it.
> >
> > Hit highlights either show entirely correct highlights or none at all
> when
> > there should be highlights.
> >
> >- Some queries show highlights out of the box, some do not.
> >   - e.g. "hello" yields no highlights, but "goodbye" correctly yields
> >   highlights
> >- Some queries that do not show highlights suddenly work when
> >specifying fields
> >   - e.g. "subject:hello" yields highlights, but "hello" does not
> >- When queries that yield highlights and queries that do not are
> >combined, only those that work are highlighted.
> >   - e.g. "hello goodbye" yields highlights correctly for "goodbye",
> >   but not for "hello"
> >
> > I've thrown specific details and examples in a Gist here:
> >
> > Full Gist: https://gist.github.com/voberoi/a7a8a679390fc4f27422e7
> > 0600cfb338
> >
> >- Problem description:
> >   - https://gist.github.com/voberoi/a7a8a679390fc4f27422e70600cf
> >   b338#file-problem-details-md
> >- Solr install, my schema, solrconfig details:
> >   - https://gist.github.com/voberoi/a7a8a679390fc4f27422e70600cf
> >   b338#file-solr-details-md
> >
> > Does anyone here have any hypotheses for why this might be happening?
> >
> > Thanks!
> > Vikram
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: The unified highlighter html escaping. Seems rather extreme...

2017-07-20 Thread David Smiley
The escaping does appear excessive.  Please file a bug against the Lucene
project in Apache JIRA.

On Fri, May 26, 2017 at 11:26 AM Michael Joyner  wrote:

> Isn't the unified html escaper rather extreme in its escaping?
>
> It makes it hard to deal with for simple post-processing.
>
> The original html escaper seems to do minimal escaping, not every
> non-alphabetical character it can find.
>
> Also, is there a way to control how much text is returned as context
> around the highlighted frag?
>
> Compare:
>
>
> Unified Snippet:
> [
large escaped snippet omitted: hundreds of names, e-mail addresses, and
phone numbers from the highlighted document; the message is truncated in
the archive]

Re: Spatial search with arbitrary rectangle?

2017-08-29 Thread David Smiley
Hi,

The "rectangular area" refers to a hypothetical map UI.  In this scenario,
the UI ought to communicate the lat-lon of each corner.  The geofilt and
bbox query parsers don't handle that; they only take a point and distance.
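
For that, filter on the corners directly, e.g. either of these (corner
values illustrative; ENVELOPE order is minX, maxX, maxY, minY):

  fq=geo:[38.28,-75.94 TO 39.36,-75.0]
  fq=geo:"Intersects(ENVELOPE(-75.94, -75.0, 39.36, 38.28))"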

RE projections: You may or may not need to care depending on exactly what
you're doing.  Most people by far don't need to care, I've found.
Basically:  If geo="true" on the spatial field (the default), then you work
in decimal degrees latitude,longitude.  Point-distance queries (i.e.
circles) use spherical geometry.  When geo="false", the units are whatever
you want them to be (there is no transformation; it's up to you to
transform them if needed), and a point-distance (circle) query is on the 2D
plane.  Other shapes (rectangles, line strings, polygons) use 2D Euclidean
geometry no matter if geo=true or false.

BTW sorry for my delayed response; I was on vacation.

~ David

On Wed, Aug 23, 2017 at 11:21 AM Paweł Kordek 
wrote:

> Hi All
>
>
> I've been skimming through the spatial search docs and came across this
> section:
>
>
>
> https://lucene.apache.org/solr/guide/6_6/spatial-search.html#SpatialSearch-Filteringbyanarbitraryrectangle
>
>
> "Sometimes the spatial search requirement calls for finding everything in
> a rectangular area, such as the area covered by a map the user is looking
> at. For this case, geofilt and bbox won’t cut it. "
>
>
> I can't understand what is meant here by the "rectangular area". What is
> the coordinate system of this rectangle? If we talk about the map, don't we
> have to consider what is the projection? Any help will be much appreciated.
>
>
> Best regards
>
> Paweł
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Sorting by distance resources with WKT polygon data

2017-09-19 Thread David Smiley
Hello,

Sorry for the belated response.

Solr only supports distance sorting on points or rectangles in the index.  For
rectangles use BBoxField.  For points, ideally use the new
LatLonPointSpatialField; failing that use LatLonType.  You can use RPT for
point data but I don't recommend sorting with it; use one of the others
just mentioned.
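
Concretely, for points that might look like (untested; names illustrative):

  <fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
  <field name="position" type="location"/>

and then sort with e.g. &sfield=position&pt=45.52,-73.53&sort=geodist() asc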

~ David

On Tue, Sep 12, 2017 at 5:09 PM Grondin Luc 
wrote:

> Hello,
>
> I am having difficulties sorting resources indexed with WKT geolocation
> data by distance. I have tried different field configurations and query
> parameters and did not get working results.
>
> I am using SOLR 6.6 and JTS-core 1.14. My test sample includes resources
> with point coordinates plus one associated with a polygon. I tried using
> both fieldtypes "solr.SpatialRecursivePrefixTreeFieldType" and
> "solr.RptWithGeometrySpatialField". In both cases, I get good results if I
> do not care about sorting. The problem arises when I include sorting.
>
> With SpatialRecursivePrefixTreeFieldType:
>
> The best request I used, based on the documentation I could find, was:
>
> select?fl=*,score&q={!geofilt%20sfield=PositionGeo%20pt=45.52,-73.53%20d=10%20score=distance}&sort=score%20asc
>
> The distance appears to be correctly evaluated for resources indexed with
> point coordinates. However, it is wrong for the resource with a polygon
>
> 
>   2.3913236
>   4.3242383
>   4.671504
>   4.806902
>   20015.115
> 
>
> (Please note that I have verified the polygon externally and it is correct)
>
> With solr.RptWithGeometrySpatialField:
>
> I get an exception triggered by the presence of « score=distance » in the
> request «
> q={!geofilt%20sfield=PositionGeo%20pt=45.52,-73.53%20d=10%20score=distance}
> »
>
> java.lang.UnsupportedOperationException
> at
> org.apache.lucene.spatial.composite.CompositeSpatialStrategy.makeDistanceValueSource(CompositeSpatialStrategy.java:92)
> at
> org.apache.solr.schema.AbstractSpatialFieldType.getValueSourceFromSpatialArgs(AbstractSpatialFieldType.java:412)
> at
> org.apache.solr.schema.AbstractSpatialFieldType.getQueryFromSpatialArgs(AbstractSpatialFieldType.java:359)
> at
> org.apache.solr.schema.AbstractSpatialFieldType.createSpatialQuery(AbstractSpatialFieldType.java:308)
> at
> org.apache.solr.search.SpatialFilterQParser.parse(SpatialFilterQParser.java:80)
>
> From there, I am rather stuck with no ideas on how to resolve these
> problems. So advises in that regards would be much appreciated. I can
> provide more details if necessary.
>
> Thank you in advance,
>
>
>  ---
>   Luc Grondin
>   Analyste en gestion de l'information numérique
>   Centre d'expertise numérique pour la recherche - Université de Montréal
>   téléphone: 514-343-6111 <(514)%20343-6111> p. 3988  --
> luc.gron...@umontreal.ca
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Solr Spatial Query Problem Hk.

2017-10-04 Thread David Smiley
Hi,

Firstly, if Solr returns an error referencing an exception then you can
look in Solr's logs for the stack trace, which helps debugging problems a
ton (at least for Solr devs).

I suspect that the problem here is that your schema might have a dynamic
field where *coordinates is defined to be a number.  The error suggests
this, at least.

On Wed, Sep 27, 2017 at 6:42 AM Can Ezgi Aydemir 
wrote:

> 1-
> http://localhost:8983/solr/nh/select?fq=geometry.coordinates:%22IsWithin(POLYGON((-80%2029,%20-90%2050,%20-60%2070,%200%200,%20-80%2029)))%20distErrPct=0%22


missing q=*:*


>
> 2-
> http://localhost:8983/solr/nh/select?q={!field%20f=geometry.coordinates}Intersects(POLYGON((-80%2029,%20-90%2050,%20-60%2070,%200%200,%20-80%2029)))
> 
> 3-
> http://localhost:8983/solr/nh/select?q=*:*&fq={!field%20f=geometry.coordinates}Intersects(POLYGON((-80%2029,%20-90%2050,%20-60%2070,%200%200,%20-80%2029)))
> 
>
> 
>  
>   400
>   1
>   
>geometry.coordinates:"IsWithin(POLYGON((-80 29, -90 50,
> -60 70, 0 0, -80 29))) distErrPct=0"
>
>   
>  
>  
>   
>org.apache.solr.common.SolrException
>org.apache.solr.common.SolrException
>   
>   Invalid Number: IsWithin(POLYGON((-80 29, -90 50, -60
> 70, 0 0, -80 29))) distErrPct=0
>   400
>  
> 
>
>
>
> Can Ezgi AYDEMİR
> Oracle Veri Tabanı Yöneticisi
>
> İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
> 2024.Cadde No:14, Beysukent 06800, Ankara, Türkiye
> T : 0 312 233 50 00 .:. F : 0312 235 56 82
> E : cayde...@islem.com.tr .:. W : http://www.islem.com.tr/
>
> This message may contain confidential information and is intended only for
> the named recipient. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately if you have received this e-mail by mistake and delete it from
> your system, and check this email and any attachments for the presence of
> viruses. İŞLEM GIS® accepts no liability for any damage caused by any virus
> transmitted by this email. For information: b...@islem.com.tr
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Retrieve DocIdSet from Query in lucene 5.x

2017-10-24 Thread David Smiley
See SolrIndexSearcher.getDocSet.  It may not be identical to what you want
but following what it does on through to DocSetUtil.createDocSet may be
enlightening.
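
A rough sketch of the Solr-side equivalent, assuming you're in a custom
component with a SolrQueryRequest "req" in hand:

  SolrIndexSearcher searcher = req.getSearcher();
  DocSet docSet = searcher.getDocSet(query);  // filter-cache aware
  DocIterator iter = docSet.iterator();
  while (iter.hasNext()) {
    int docId = iter.nextDoc();  // internal Lucene docid
    // ...
  }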

On Fri, Oct 20, 2017 at 5:10 PM Jamie Johnson  wrote:

> I am trying to migrate some old code that used to retrieve DocIdSets from
> filters, but with Filters being deprecated in Lucene 5.x I am trying to
> move away from those classes but I'm not sure the right way to do this
> now.  Are there any examples of doing this?
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Sum area polygon solr

2017-11-01 Thread David Smiley
Hi,

Ah, no -- sorry.  If you want to roll up your sleeves and write a Solr
plugin (a ValueSource in this case, perhaps) then you could lookup the
index polygon and then call out to JTS to compute the intersection and then
ask it for the area.  But that's going to be a very heavyweight computation
to score/sort on!  Instead, perhaps you can use BBoxField's overlapRatio to
compare bounding boxes which is relatively fast.
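
The overlapRatio scoring on BBoxField looks like this (field name and
target proportion illustrative):

  q={!field f=bbox score=overlapRatio queryTargetProportion=0.25}Intersects(ENVELOPE(-10,20,15,10))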

~ David

On Tue, Oct 31, 2017 at 8:45 AM Samur Araujo  wrote:

> Hi all, is it possible to sum the area of a polygon in solr?
>
> Suppose I do an polygon intersect and I want to retrieve the total area of
> the resulting polygon.
>
> Is it possible?
>
> Best,
>
> --
> Head of Data
> Geophy
> www.geophy.com
>
> Nieuwe Plantage 54
> -55
> 2611XK  Delft
> +31 (0)70 7640725 <+31%2070%20764%200725>
>
> 1 Fore Street
> EC2Y 9DT  London
> +44 (0)20 37690760 <+44%2020%203769%200760>
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Search opening hours

2016-11-24 Thread David Smiley
I just saw this conversation now.  I didn't read every word but I have to
ask immediately: does DateRangeField address your needs?
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates  It was
introduced in 5.0.
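
For opening hours, the idea would be to index each open interval as a range
and query with a point in time, e.g. (untested; names illustrative):

  <fieldType name="dateRange" class="solr.DateRangeField"/>
  <field name="open_hours" type="dateRange" multiValued="true"/>

  indexed value:  [2016-11-24T09:00 TO 2016-11-24T17:00]
  query:          fq={!field f=open_hours op=Contains}2016-11-24T12:00:00Z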

On Wed, Nov 16, 2016 at 4:59 AM O. Klein  wrote:

> Above implementation was too slow, so wondering if Solr 6 with all its new
> features provides a better solution to tackle operating hours. Especially
> dealing with different timezones.
>
> Any thoughts?
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Search-opening-hours-tp4225250p4306073.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Search opening hours

2016-11-28 Thread David Smiley
Let's say you wanted to do ranges over some integer.  Simply convert those
integers to dates, such as
java.time.Instant.ofEpochSecond(myInteger).toString().  It's more efficient
to use seconds (as in this example) as the base instead of milliseconds,
because the internal date-oriented tree has 1000 leaves at the millisecond
level to aggregate up to the next higher level (seconds).  Also keep in mind
you have to work within a signed Long space.
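
For example (values illustrative):

  java.time.Instant.ofEpochSecond(0).toString()      // "1970-01-01T00:00:00Z"
  java.time.Instant.ofEpochSecond(86400).toString()  // "1970-01-02T00:00:00Z"

so the integer range 0..86400 would be indexed as
[1970-01-01T00:00:00Z TO 1970-01-02T00:00:00Z].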

Longer term, hopefully someone will add a Solr adapter to Lucene's
new IntRangeField (and *RangeField variants) which is for general use.  I'm
not sure if LongRangeField would be faster than DateRangeField as the
approaches are internally quite different.  It probably would be.  The
other factor is index size, and I think those new range fields would
generally be leaner.

~ David

On Fri, Nov 25, 2016 at 4:18 PM O. Klein  wrote:

> Thank you for your reply David.
>
> Yes, I ended up using a DateRangeField. The downside is that it needs frequent
> updates. Luckily not an issue for my use case.
>
> BTW how could I abuse DateRangeField for non-date data?
>
>
>
>
> david.w.smi...@gmail.com wrote
> > I just saw this conversation now.  I didn't read every word but I have to
> > ask immediately: does DateRangeField address your needs?
> > https://cwiki.apache.org/confluence/display/solr/Working+with+Dates  It
> > was
> > introduced in 5.0.
> >
> > On Wed, Nov 16, 2016 at 4:59 AM O. Klein <
>
> > klein@
>
> > > wrote:
> >
> >> Above implementation was too slow, so wondering if Solr 6 with all its
> >> new
> >> features provides a better solution to tackle operating hours.
> Especially
> >> dealing with different timezones.
> >>
> >> Any thoughts?
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Search-opening-hours-tp4225250p4306073.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Search-opening-hours-tp4225250p4307463.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: How to identify documents failed in a batch request?

2016-12-17 Thread David Smiley
If you enable the "TolerantUpdateProcessor" Solr-side, you can add
documents in bulk allowing some to fail and know which did:

http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/TolerantUpdateProcessorFactory.html
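
It's enabled in solrconfig.xml; a minimal sketch (chain name and maxErrors
value are illustrative):

  <updateRequestProcessorChain name="tolerant" default="true">
    <processor class="solr.TolerantUpdateProcessorFactory">
      <int name="maxErrors">10</int>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The per-document failures are then reported in the response header, which
SolrJ exposes via UpdateResponse.getResponseHeader().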

On Sat, Dec 17, 2016 at 5:05 PM S G  wrote:

> Hi,
>
> I am using the following code to send documents to Solr:
>
> final UpdateRequest request = new UpdateRequest();
> request.setAction(UpdateRequest.ACTION.COMMIT, false, false);
> request.add(docsList);
> UpdateResponse response = request.process(solrClient);
>
> The response returned from the last line does not seem to be very helpful
> in determining how I can identify documents failed in a batch request.
>
> Does anyone know how this can be done?
>
> Thanks
> SG
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-03 Thread David Smiley
Solr _does_ have a query parser that doesn't suffer from this problem --
SimpleQParser, chosen as the string "simple".
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser
In this case, see the "WHITESPACE" operator feature, which can be toggled.
Configure it to _not_ be an operator so that whitespace is processed by the
underlying Analyzer to get proper multi-word handling.  This is a very fine
query parser, IMO; much simpler than any other that has its feature set.
Though you still might need dismax/edismax.
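
For example, something like this keeps the usual operators but leaves
whitespace to the analyzer (operator list and df are illustrative):

  q={!simple q.operators="AND,OR,NOT,PHRASE,PRECEDENCE" df=text}United States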

On Thu, Feb 2, 2017 at 1:17 PM Cliff Dickinson 
wrote:

> Steve and Shawn, thanks for your replies/explanations!
>
> I eagerly await the completion of the Solr JIRA ticket referenced above in
> a future release.  Many thanks for addressing this challenge that has had
> me banging my head against my desk off and on for the last couple years!
>
> Cliff
>
> On Thu, Feb 2, 2017 at 1:01 PM, Steve Rowe  wrote:
>
> > Hi Cliff,
> >
> > The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a
> > problem that prevents SynonymGraphFilter from working: the text fed to
> your
> > query analyzer is first split on whitespace.  So e.g. a query containing
> > “United States” will never match multi-word synonym “United
> States”->”US”,
> > since the analyzer will first see “United” and then, separately, “States”.
> >
> > I fixed the whitespace splitting problem in the classic Lucene query
> > parser in .  (Note
> > that this is *not* the same as Solr’s standard/“Lucene” query parser,
> which
> > is actually a fork of Lucene’s query parser with added functionality.)
> >
> > There is a Solr JIRA I’m working on to fix the whitespace splitting
> > problem: .  I hope to
> > get it committed in time for inclusion in Solr 6.5.
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On Feb 2, 2017, at 9:50 AM, Shawn Heisey  wrote:
> > >
> > > On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> > >> The SynonymGraphFilter API documentation contains the following
> > statement
> > >> at the end:
> > >>
> > >> "To get fully correct positional queries when your synonym
> replacements
> > are
> > >> multiple tokens, you should instead apply synonyms using this
> > TokenFilter
> > >> at query time and translate the resulting graph to a
> TermAutomatonQuery
> > >> e.g. using TokenStreamToTermAutomatonQuery."
> > >
> > > Lucene is a programming API for search.  That documentation is intended
> > > for people who are writing Lucene programs.  Those users would be
> > > constructing query objects in their own code, so they would most likely
> > > know exactly which object needs to be changed to TermAutomatonQuery.
> > >
> > > Solr is a Lucene program ... and an immensely complicated one.  Many
> > > Lucene improvements require changes in the end program for full
> > > support.  I suspect that Solr's capability has not been updated to use
> > > this new feature in Lucene.  I cannot say for sure, I hope someone who
> > > is familiar with this Lucene change and Solr internals can comment.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Boolean expression for spatial query

2017-03-02 Thread David Smiley
I recommend the MULTIPOINT approach.

BTW if you go the route of multiple OR'ed sub-clauses, I recommend avoiding
the _query_ syntax, which predates Solr 4.x's (4.2?) ability to embed the
sub-clauses more naturally; though you need to beware of the gotcha of
needing to add a leading space.  If Solr had this feature from the start
then that _query_ hack never would have been added.  For example:
fq=   {!field f=regionGeometry v="Intersects(POINT(x1 y1))"} OR
      {!field f=regionGeometry v="Intersects(POINT(x2 y2))"}

Anyway, MULTIPOINT is probably going to be much faster, plus it's more
intuitive to understand.  And avoid the "Contains" predicate when point
data is involved, as it's slower yet semantically equivalent to "Intersects"
(for a single non-multi point, anyway).

On Mon, Feb 27, 2017 at 4:12 AM Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Michael,
>
> I haven't been playing with spatial for a while, but if it fully
> supports WKT, you could use Intersects instead of Contains and
> MULTIPOINT instead of POINT. Something like:
>
> fq={!field f=regionGeometry}Intersects(MULTIPOINT((x1 y1), (x2 y2)))
>
> In any case you can use OR-ed _query_:
>
> fq=_query_:"{!field f=regionGeometry}Contains(POINT(x1 y1))" OR
> _query_:"{!field f=regionGeometry}Contains(POINT(x2 y2))"
>
>
> HTH
> Emir
>
>
> On 26.02.2017 07:08, Michael Dürr wrote:
> > Hi all,
> >
> > I index documents containing a spatial field (rpt) that holds a wkt
> > multipolygon. In order to retrieve all documents for which a certain
> point
> > is contained within a polygon I issue the following query:
> >
> > q=*:*&fq={!field f=regionGeometry}Contains(POINT(x y))
> >
> > This works pretty good.
> >
> > My question is: Is there any syntax to issue this query for multiple
> points
> > (i.e. return all documents for which at least one of the points is within
> > the document's polygon)?
> >
> > E.g. something like this:
> >
> > q=*:*&fq={!field f=regionGeometry}ContainsOR(POINT(x1 y1), POINT(x2 y2), ...)
> >
> > If not - what other efficient options do you suggest to do such a query?
> >
> > Best regards,
> > Michael
> >
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Error with polygon search

2017-03-21 Thread David Smiley
Hello Hank,

The online version of the reference guide is always for the latest Solr 
release.  I think your configuration would work in the latest release.  Prior 
to Solr 6, the Spatial4J library had a different Java package location: replace 
"org.locationtech.spatial4j" with "com.spatial4j.core".  The only JTS jar file 
you need is "jts-1.14.jar".

~ David

> On Mar 21, 2017, at 4:31 PM, hank  wrote:
> 
> Hello,
> 
> 
> I'm having problems with a polygon search on location data. I've tried to 
> enable the JTS and Polygons from 
> https://cwiki.apache.org/confluence/display/solr/Spatial+Search but I get the 
> following error when I load solr
> 
> 
>   java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: Unable to create core [jordan]
>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>  at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:496)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Unable to create core 
> [jordan]
>  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:827)
>  at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
>  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
>  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
>  ... 5 more
> Caused by: org.apache.solr.common.SolrException: Could not load conf for core 
> stats: Can't load schema 
> /opt/solr/solr-5.5.2/server/solr/stats/conf/managed-schema: Plugin 
> Initializing failure for [schema.xml] fieldType
>  at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:84)
>  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
>  ... 8 more
> Caused by: org.apache.solr.common.SolrException: Can't load schema 
> /opt/solr/solr-5.5.2/server/solr/stats/conf/managed-schema: Plugin 
> Initializing failure for [schema.xml] fieldType
>  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:577)
>  at org.apache.solr.schema.IndexSchema.(IndexSchema.java:159)
>  at 
> org.apache.solr.schema.ManagedIndexSchema.(ManagedIndexSchema.java:104)
>  at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:173)
>  at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:47)
>  at 
> org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:70)
>  at 
> org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:108)
>  at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:79)
>  ... 9 more
> Caused by: org.apache.solr.common.SolrException: Plugin Initializing failure 
> for [schema.xml] fieldType
>  at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:194)
>  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
>  ... 16 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory
>  at 
> com.spatial4j.core.context.SpatialContextFactory.makeSpatialContext(SpatialContextFactory.java:100)
>  at 
> org.apache.solr.schema.AbstractSpatialFieldType.init(AbstractSpatialFieldType.java:119)
>  at 
> org.apache.solr.schema.AbstractSpatialPrefixTreeFieldType.init(AbstractSpatialPrefixTreeFieldType.java:55)
>  at 
> org.apache.solr.schema.SpatialRecursivePrefixTreeFieldType.init(SpatialRecursivePrefixTreeFieldType.java:37)
>  at org.apache.solr.schema.FieldType.setArgs(FieldType.java:174)
>  at 
> org.apache.solr.schema.FieldTypePluginLoader.init(FieldTypePluginLoader.java:150)
>  at 
> org.apache.solr.schema.FieldTypePluginLoader.init(FieldTypePluginLoader.java:53)
>  at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:191)
>  ... 17 more
> Caused by: java.lang.ClassNotFoundException: 
> org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>  at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>  at 
> com.spatial4j.core.context.SpatialContextFactory.makeSpatialContext(SpatialContextFactory.java:97)
>  ... 24 more
> 
> 
> 
> My field looks like
> 
> 
> <fieldType ... class="solr.SpatialRecursivePrefixTreeFieldType"
>   spatialContextFactory="org.loca

Re: DateRangeField and Faceting

2017-04-26 Thread David Smiley
Hi Stephen,

I agree that it would be nice if the JSON faceting module worked with
DateRangeField.  Sadly Solr has several faceting engines (classic, JSON
Facets, analytics contrib) and there has not yet been any effort to corral
them.  My sense is that JSON Faceting is where effort should go, and as you
see there are some gaps.
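
In the meantime, classic range faceting (which you report does work) would
look something like this (a sketch using your field name):

  facet=true&facet.range=sku_history.date_range
  &facet.range.start=2016-01-01T00:00:00Z
  &facet.range.end=2018-01-01T00:00:00Z
  &facet.range.gap=%2B1MONTH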

~ David

On Fri, Apr 21, 2017 at 2:47 PM Stephen Weiss  wrote:

> One small detail - I just realized I've been doing JSON faceting and the
> wiki refers to old-school faceting.  Old-school faceting indeed does work
> but the problem is the facet is ultimately one of a whole tree of stats I'm
> collecting, so JSON facet is far more convenient for my use case (I don't
> think I can even do what I'm doing with old facets).  Why would
> daterangefield work with the old faceting system and not the new?
>
> --
> Steve
>
> On Fri, Apr 21, 2017 at 11:50 AM, Stephen Weiss  > wrote:
> Hi everyone,
>
> Just trying to do a sense check on this.  I'm trying to do a facet based
> off a DateRangeField and I'm hitting this error:
>
> Error from server at
> http://172.20.141.150:8983/solr/instock_au_shard1_replica0: Unable to
> range facet on
> field:sku_history.date_range{type=daterange,properties=indexed,stored,omitTermFreqAndPositions}
>
>
> Now I read through FacetRange.java and it seems like only TrieFields are
> accepted, while DateRangeField is a spatial type, so I suppose that makes
> sense. However, elsewhere in the codebase under DateCalc (which is
> essentially the same set of restrictions) it says:
>
> if (! (field.getType() instanceof TrieDateField) ) { throw new
> IllegalArgumentException("SchemaField must use field type extending
> TrieDateField or DateRangeField"); }
>
> Is this for some reason assuming that DateRangeField is a subclass of
> TrieDateField (it isn't)? I also see a mention here of someone doing a
> facet on a DateRangeField and having it work:
>
> https://wiki.apache.org/solr/DateRangeField
>
> In his case, his daterangefield is multivalued and it seemed to work for
> him - mine is simpler than that yet it doesn't work. I don't really
> understand what we're doing differently that matters, and reading the
> codebase, it really doesn't seem like this was ever possible - but that
> comment under DateCalc makes me wonder.
>
> If we could facet on the daterangefield, it would be very helpful, so any
> pointers on how to do that would be welcome.
>
> --
> Steve
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Spatial Search: can not use FieldCache on a field which is neither indexed nor has doc values: latitudeLongitude_0_coordinate

2017-04-30 Thread David Smiley
Frederick,

RE LatLonType: Weird. Is the dynamic field "_coordinate" defined?  Ensure
it has indexed=true on it.  I forget whether indexed needs to be set on that
or on the LatLonType field that refers to it, so to be sure, set it on both.

RE LatLonPointSpatialField: You should use this for sure assuming you are
using the latest Solr release (6.5.x).  You said "Solr version 6.1.0" which
doesn't have this field type though.
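
On 6.5.x the declaration might look like this (a minimal sketch; the field
and type names are placeholders):

  <fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
  <field name="latitudeLongitude" type="location" indexed="true" stored="true"/>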

~ David

On Thu, Apr 27, 2017 at 8:26 AM freddy79 
wrote:

> Hi,
>
> when doing a query with spatial search i get the error: can not use
> FieldCache on a field which is neither indexed nor has doc values:
> latitudeLongitude_0_coordinate
>
> *SOLR Version:* 6.1.0
> *schema.xml:*
>
> <fieldType ... subFieldSuffix="_coordinate" />
> <field ... stored="false" multiValued="false" />
>
> *Query:*
>
> http://localhost:8983/solr/career_educationVacancyLocation/select?q=*:*&fq={!geofilt}&sfield=latitudeLongitude&pt=48.15,16.23&d=10
>
> *Error Message:*
> can not use FieldCache on a field which is neither indexed nor has doc
> values: latitudeLongitude_0_coordinate
>
> What is wrong? Thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spatial-Search-can-not-use-FieldCache-on-a-field-which-is-neither-indexed-nor-has-doc-values-latitude-tp4332185.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: why MULTILINESTRING can contains polygon in solr spatial search

2017-06-02 Thread David Smiley
Hi,
Solr 4.7 is old but is probably okay.  Is it easy to try a 6.x version?
(Note the Spatial4j Java package names have changed.)  There are also
multiple new options pertinent to your scenario:
https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/context/jts/JtsSpatialContextFactory.html
* "useJtsMulti":"false" (defaults to true)
* "useJtsLineString":"false" (defaults to true)

Anyway, this could be due to the validationRule="repairBuffer0" logic if,
by chance, the indexed shape isn't considered "valid" (by JTS).

If flipping these options and using a recent Solr/Lucene/Spatial4j release
don't fix the issue, please file a JIRA issue to the Lucene project.

On Fri, Jun 2, 2017 at 5:53 AM kjdong  wrote:

> solr-version:4.7.0
>
> field spec as follows:
> <fieldType ... class="solr.SpatialRecursivePrefixTreeFieldType"
>
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> validationRule="repairBuffer0" geo="true" distErrPct="0.025"
> maxDistErr="0.09" units="degrees" />
>
> <field ... multiValued="true"/>
>
> And i index some MULTILINESTRING (wkt formatted  shape, the road data), and
> i query use "Intersects" spatial predicates like
> fq=geom:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))
> distErrPct=0".
>
> In fact, i want to query the shape(multiline) which is intersect with the
> query polygon, but the searched return document has nothing to do with the
> query polygon(aka, isDisjointTo), then i test it use JTS api ,it indeed
> return false, but solr think the line intersects with the polygon ,even
> contains. is this a bug? or repair it in advanced version?
>
> Geometry line = new WKTReader().read(lineWktString);
> Geometry polygon = new WKTReader().read(polygonWktString);
> line.intersects(polygon); // returns false
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/why-MULTILINESTRING-can-contains-polygon-in-solr-spatial-search-tp4338593.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Spatial maxDistErr changes

2014-04-02 Thread David Smiley
Good question Steve,

You'll have to re-index right off.

~ David
p.s. Sorry I didn't reply sooner; I just switched jobs and reconfigured my
mailing list subscriptions



Steven Bower wrote
> If I am only indexing point shapes and I want to change the maxDistErr from
> 0.09 (1m res) to 0.00045 will this "break" as in searches stop working
> or will search work but any performance gain won't be seen until all docs
> are reindexed? Or will I have to reindex right off?
> 
> thanks,
> 
> steve





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-maxDistErr-changes-tp4124836p4128620.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Rectangle with rotation in Solr

2018-09-13 Thread David Smiley
Polygon is the only way.
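
For example (made-up coordinates), a square rotated 45 degrees is simply
indexed as a closed WKT ring:

  POLYGON((10 0, 20 10, 10 20, 0 10, 10 0))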

On Wed, Aug 29, 2018 at 7:46 AM Zahra Aminolroaya 
wrote:

> I have locations with 4-tuple (longitude,latitude) which are like
> rectangles
> and I want to index them. Solr BBoxField with minX, maxX, maxY and minY,
> only considers rectangles which does not have rotations. suppose my
> rectangle is rotated  45 degree  clockwise based on axis, how can I define
> rotation in bbox? Is using RPT (polygon) the only way?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Geofilt and distance measurement problems using SpatialRecursivePrefixTreeFieldType field type

2018-12-20 Thread David Smiley
Hi Peter,

Use of an RPT field for distance sorting/boosting is to be avoided where
possible because it's very inefficient at this specific use-case.  Simply
use LatLonType for this task, and continue to use RPT for the filter/search
use-case.

Also I see you putting a space between the coordinates instead of a
comma...   yet you have geo (latitude & longitude data) so this is a bit
confusing.  Do "lat,lon".  I think a space will be interpreted as "x y"
(thus reversed).  Perhaps you've mixed up the coordinates and this explains
the error?  A quick lookup of your sample coordinates suggests to me this
is likely the problem.  It's a common mistake.
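
For example, using the point from your query, the comma form would be:

  pt=53.409490,-2.979677
  fq={!geofilt sfield=LatLonRPT__location_rpt pt=53.409490,-2.979677 d=25}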

BTW this:
maxDistErr="0.2" distanceUnits="kilometers"
means 200m accuracy (or better).  Is this what you want?  Just checking.

~ David

On Thu, Dec 13, 2018 at 6:38 AM Peter Lancaster <
peter.lancas...@findmypast.com> wrote:

> I am currently using Solr 5.5.2 and implementing a GeoSpatial search that
> returns results within a radius in Km of a specified LatLon. Using a field
> of type solr.LatLonType and a geofilt query this gives good results but is
> much slower than our regular queries. Using a bbox query is faster but of
> course less accurate.
>
> I then attempted to use a field of type
> solr.SpatialRecursivePrefixTreeFieldType to check performance and because I
> want to be able to do searches within a polygon eventually. The field is
> defined as follows
>
> <fieldType ... class="solr.SpatialRecursivePrefixTreeFieldType"
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> geo="true" distErrPct="0.05" maxDistErr="0.2"
> distanceUnits="kilometers" autoIndex="true"/>
>
> <field ... stored="true" multiValued="false" omitNorms="true" />
>
> I'm just using it to index single points right now. The problem is that
> the distance calculation is not working correctly. It seems to overstate
> the distances for differences in longitude.
>
> For example a query for
> &fl=Id,LatLonRPT__location_rpt,_dist_:geodist()&sfield=LatLonRPT__location_rpt&pt=53.409490
> -2.979677&query={!geofilt sfield=LatLonRPT__location_rpt pt="53.409490
> -2.979677" d=25} returns
>
> {
> "Id": "HAR/CH1/80763270",
> "LatLonRPT__location_rpt": "53.2 -2.91",
> "_dist_": 24.295607
> },
> {
> "Id": "HAR/CH42/1918283949",
> "LatLonRPT__location_rpt": "53.393239 -3.028859",
> "_dist_": 5.7587695
> }
>
> The true distances for these results are 23.67 and 3.73 km and other
> results at a true distance of 17 km aren't returned within the 25 km radius.
>
> The explain has the following
>
> +IntersectsPrefixTreeQuery(IntersectsPrefixTreeQuery(fieldName=LatLonRPT__location_rpt,queryShape=Circle(Pt(x=53.40949,y=-2.979677),
> d=0.2° 25.00km),detailLevel=6,prefixGridScanLevel=7))
>
> Is my set up incorrect in some way or is the
> SpatialRecursivePrefixTreeFieldType not suitable for doing radius searches
> on points in this way?
>
> Thanks in anticipation for any suggestions.
>
> Peter Lancaster.
>
> 
> This message is confidential and may contain privileged information. You
> should not disclose its contents to any other person. If you are not the
> intended recipient, please notify the sender named above immediately. It is
> expressly declared that this e-mail does not constitute nor form part of a
> contract or unilateral obligation. Opinions, conclusions and other
> information in this message that do not relate to the official business of
> findmypast shall be understood as neither given nor endorsed by it.
> 
>
> __
>
> This email has been checked for virus and other malicious content prior to
> leaving our network.
> __

-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Geofilt and distance measurement problems using SpatialRecursivePrefixTreeFieldType field type

2018-12-23 Thread David Smiley
For latitude and longitude data, I recommend "lat,lon" and never use "x
y".  Perhaps the latter should be an error when geo=true (and inverse when
false) but it isn't.  Yes the documentation could be better!

On Fri, Dec 21, 2018 at 4:31 AM Peter Lancaster <
peter.lancas...@findmypast.com> wrote:

> Hi David,
>
> Ignore my previous reply.
>
> I think you've supplied the answer. Yes we do need to use a space to index
> points in an rpt field, but when we do that the order is flipped from
> Lat,Lon to Lon Lat, so we need to re-index our data. In my defence that is
> far from obvious in the documentation.
>
> Thanks again for your help.
>
> Cheers,
> Peter.
>
> -Original Message-
> From: David Smiley [mailto:david.w.smi...@gmail.com]
> Sent: 21 December 2018 04:44
> To: solr-user@lucene.apache.org
> Subject: Re: Geofilt and distance measurement problems using
> SpatialRecursivePrefixTreeFieldType field type
>
> Hi Peter,
>
> Use of an RPT field for distance sorting/boosting is to be avoided where
> possible because it's very inefficient at this specific use-case.  Simply
> use LatLonType for this task, and continue to use RPT for the filter/search
> use-case.
>
> Also I see you putting a space between the coordinates instead of a
> comma...   yet you have geo (latitude & longitude data) so this is a bit
> confusing.  Do "lat,lon".  I think a space will be interpreted as "x y"
> (thus reversed).  Perhaps you've mixed up the coordinates and this
> explains the error?  A quick lookup of your sample coordinates suggests to
> me this is likely the problem.  It's a common mistake.
>
> BTW this:
> maxDistErr="0.2" distanceUnits="kilometers"
> means 200m accuracy (or better).  Is this what you want?  Just checking.
>
> ~ David
>
> On Thu, Dec 13, 2018 at 6:38 AM Peter Lancaster <
> peter.lancas...@findmypast.com> wrote:
>
> > I am currently using Solr 5.5.2 and implementing a GeoSpatial search
> > that returns results within a radius in Km of a specified LatLon.
> > Using a field of type solr.LatLonType and a geofilt query this gives
> > good results but is much slower than our regular queries. Using a bbox
> > query is faster but of course less accurate.
> >
> > I then attempted to use a field of type
> > solr.SpatialRecursivePrefixTreeFieldType to check performance and
> > because I want to be able to do searches within a polygon eventually.
> > The field is defined as follows
> >
> >  >  class="solr.SpatialRecursivePrefixTreeFieldType"
> >
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> > geo="true" distErrPct="0.05" maxDistErr="0.2"
> > distanceUnits="kilometers" autoIndex="true"/>
> >
> >  > stored="true" multiValued="false" omitNorms="true" />
> >
> > I'm just using it to index single points right now. The problem is
> > that the distance calculation is not working correctly. It seems to
> > overstate the distances for differences in longitude.
> >
> > For example a query for
> > &fl=Id,LatLonRPT__location_rpt,_dist_:geodist()&sfield=LatLonRPT__loca
> > tion_rpt&pt=53.409490 -2.979677&query={!geofilt
> > sfield=LatLonRPT__location_rpt pt="53.409490 -2.979677" d=25} returns
> >
> > {
> > "Id": "HAR/CH1/80763270",
> > "LatLonRPT__location_rpt": "53.2 -2.91",
> > "_dist_": 24.295607
> > },
> > {
> > "Id": "HAR/CH42/1918283949",
> > "LatLonRPT__location_rpt": "53.393239 -3.028859",
> > "_dist_": 5.7587695
> > }
> >
> > The true distances for these results are 23.67 and 3.73 km and other
> > results at a true distance of 17 km aren't returned within the 25 km
> radius.
> >
> > The explain has the following
> >
> > +IntersectsPrefixTreeQuery(IntersectsPrefixTreeQuery(fieldName=LatLonR
> > +PT__location_rpt,queryShape=Circle(Pt(x=53.40949,y=-2.979677),
> > d=0.2° 25.00km),detailLevel=6,prefixGridScanLevel=7))
> >
> > Is my set up incorrect in some way or is the
> > SpatialRecursivePrefixTreeFieldType not suitable for doing radius
> > searches on points in this way?
> >
> > Thanks in anticipation for any suggestions.
> >
> > Peter Lancaster

Re: Solr 7.2.1 Stream API throws null pointer execption when used with collapse filter query

2019-01-03 Thread David Smiley
File a JIRA issue please

On Thu, Jan 3, 2019 at 5:20 PM gopikannan  wrote:

> Hi,
>I am getting null pointer exception when streaming search is done with
> collapse filter query. When debugged the last element in FixedBitSet array
> is null. Please let me know if I can raise an issue.
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java#L232
>
>
> http://localhost:8983/stream/?expr=search(coll_a ,sort="field_a
>
> asc",fl="field_a,field_b,field_c,field_d",qt="/export",q="*:*",fq="(filed_b:x)",fq="{!collapse
> field=field_c sort='field_d desc'}")
>
> org.apache.solr.servlet.HttpSolrCall null:java.lang.NullPointerException
> at org.apache.lucene.util.BitSetIterator.<init>(BitSetIterator.java:61)
> at org.apache.solr.handler.ExportWriter.writeDocs(ExportWriter.java:243)
> at
> org.apache.solr.handler.ExportWriter.lambda$null$1(ExportWriter.java:222)
> at
>
> org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
> at
>
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:180)
> at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> at
> org.apache.solr.handler.ExportWriter.lambda$null$2(ExportWriter.java:222)
> at
> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> at
>
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:198)
> at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> at
> org.apache.solr.handler.ExportWriter.lambda$write$3(ExportWriter.java:220)
> at
> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> at org.apache.solr.handler.ExportWriter.write(ExportWriter.java:218)
> at org.apache.solr.core.SolrCore$3.write(SolrCore.java:2627)
> at
>
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49)
>
-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: regarding debugging solr in eclipse

2019-01-18 Thread David Smiley
On Fri, Jan 18, 2019 at 9:20 AM Scott Stults <
sstu...@opensourceconnections.com> wrote:

> This blog article might help:
>
> https://opensourceconnections.com/blog/2013/04/13/how-to-debug-solr-with-eclipse/
>
>
I don't use Eclipse but I believe things are better now than the
instructions given.  The setups for both Eclipse and IntelliJ have a "run
configuration" (or whatever it's called in Eclipse), so you needn't run
things manually at the CLI, nor do you need to set up a new run config
with the ports set.

~ David


>
>
> On Fri, Jan 18, 2019 at 6:53 AM SAGAR INGALE 
> wrote:
>
> > Can anybody tell me how to debug solr in eclipse, if possible how can I
> > build a maven project and launch the jetty server in debug mode?
> > Thanks. Regards
> >
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780 <(434)%20409-2780>
> http://www.opensourceconnections.com
>
-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Nested geofilt query for LTR feature

2019-03-20 Thread David Smiley
Hi,

I've never used the LTR module, but I suspect I might know what the error
is.  I think that the "query" Function Query has parsing limitations on
what you pass to it.  At least it used to.  Try to put the embedded query
onto another parameter and then refer to it with a dollar-sign.  See the
examples here:
https://builds.apache.org/job/Solr-reference-guide-master/javadoc/function-queries.html#query-function
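
Applied to your feature definition, the indirection might look something
like this (a sketch; whether SolrFeature's params map passes the extra
"distQ" parameter through is an assumption you'd need to verify):

  "params":{
    "q":"{!func}product(2,query($distQ,0.0))",
    "distQ":"{!geofilt sfield=latlon score=kilometers filter=false pt=${ltrpt} d=5000}"
  }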

Also, I think it's a bit inefficient to wrap a query function query around
a geofilt query that exposes a distance as a score.  If you want the
distance then call the "geodist" function query.

Additionally if you dump the full stack trace here, it might be helpful.
Getting a RuntimeException suggests we need to do a better of job
wrapping/cleaning errors internally.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Mar 14, 2019 at 11:43 PM Kamuela Lau  wrote:

> Hello,
>
> I'm currently using Solr 7.2.2 and trying to use the LTR contrib module to
> rerank queries.
> For my LTR model, I would like to use a feature that is essentially a
> "normalized distance," a value between 0 and 1 which is based on distance.
>
> When using geodist() to define a feature in the feature store, I received a
> "failed to parse feature query" error, and thus I am using the below
> geofilt query for distance.
>
> {
>   "name":"dist",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"q":"{!geofilt sfield=latlon score=kilometers filter=false
> pt=${ltrpt} d=5000}"},
>   "store":"ltrFeatureStore"
> }
>
> This feature correctly returns the distance between ltrpt and the sfield
> latlon (LatLonPointSpatialField).
> As I mentioned previously, I would like a feature which uses this distance
> in another function. To test this functionality, I tried to define a
> feature which multiplies the distance by two:
>
> {
>   "name":"twoDist",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"q":"{!func}product(2,query({!geofilt v= sfield=latlon
> score=kilometers filter=false pt=${ltrpt} d=5000},0.0))"},
>   "store":"ltrFeatureStore"
> }
>
> When trying to extract this feature, I receive the following error:
>
> java.lang.RuntimeException: Exception from createWeight for SolrFeature
> [name=multDist, params={q={!func}product(2,query({!geofilt v= sfield=latlon
> score=kilometers filter=false pt=${ltrpt} d=5000},0.0))}]  missing sfield
> for spatial request
>
> However, when I define the following in fl for a regular, non-reranked
> query, I find that it is correctly parsed and I receive the correct value,
> which is twice the value of geodist() (pt2 is defined in a different part
> of the query):
> fl=score,geodist(),{!func}product(2,query({!geofilt v= sfield=latlon
> score=kilometers filter=false pt=${pt2} d=5},0.0))
>
> For reference, below is what I have defined in my schema:
>
>
> <field ... docValues="true"/>
>
> Is this the correct, intended behavior? If so, is my query for this
> correct, or should I go about extracting this sort of feature a different
> way?
>


Re: Range query syntax on a polygon field is returning all documents

2019-03-20 Thread David Smiley
Hi Mitchell,

Seems like there's a bug based on what you've shown.
* Can you please try RptWithGeometrySpatialField instead
of SpatialRecursivePrefixTreeFieldType to see if the problem goes away?
This could point to a precision issue; though still what you've seen is
suspicious.
* Can you try one other query syntax e.g. bbox query parser to see if the
problem goes away?  I doubt this is it but you seem to point to the syntax
being related.
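
The bbox query parser variant would be something like this (a sketch; the
field name, point, and distance are illustrative):

  fq={!bbox sfield=geometryField pt=45.15,-93.85 d=5}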

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 18, 2019 at 12:24 AM Mitchell Bösecke <
mitchell.bose...@forcorp.com> wrote:

> Hi everyone,
>
> I'm trying to index geodetic polygons and then query them out using an
> arbitrary rectangle. When using the Geo3D spatial context factory, the data
> indexes just fine but using a range query (as per the solr documentation)
> does not seem to filter the results appropriately (I get all documents
> back).
>
> When I switch it to JTS, everything works as expected. However, it
> significantly slowed down the initial indexing time. A sample size of 3000
> documents took 3 seconds with Geo3D and 50 seconds with JTS.
>
> I've documented my journey in detail on stack overflow:
> https://stackoverflow.com/q/55212622/1017571
>
>1. Can I not use the range query syntax with Geo3D? I.e. am I
>misreading the documentation?
>2. Is it expected that using JTS will *significantly* slow down the
>indexing time?
>
> Thanks for any insight.
>
> --
> Mitchell Bosecke, B.Sc.
> Senior Application Developer
>
> FORCORP
> Suite 200, 15015 - 123 Ave NW,
> Edmonton, AB, T5V 1J7
> www.forcorp.com
> (d) 780.733.0494
> (o) 780.452.5878 ext. 263
> (f) 780.453.3986
>


Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
What/where is this benchmark?  I recall Ishan was once working with a
volunteer to set up something like Lucene has, but sadly it was not
successful.

On Wed, Apr 3, 2019 at 6:04 AM Đạt Cao Mạnh  wrote:

> Hi guys,
>
> I'm seeing the same problems with Shalin nightly indexing benchmark. This
> happen around this period
> git log --before=2018-12-07 --after=2018-11-21
>
> On Wed, Apr 3, 2019 at 8:45 AM Toke Eskildsen  wrote:
>
>> On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote:
>> > Yes, I am using DocValues for most of my fields.
>>
>> So that's a culprit. Thank you.
>>
>> > Currently we can't share the test data yet as some of the records are
>> > sensitive. Do you have any data from CSV file that you can test?
>>
>> Not really. I asked because it was a relatively easy way to do testing
>> (replicate your indexing flow with both Solr 7 & 8 as end-points,
>> attach JVisualVM to the Solrs and compare the profiles).
>>
>>
>> I'll put on my to-do to create a test or two with the scenario
>> "indexing from CSV with many DocValues fields". I'll try and generate
>> some test data and see if I can reproduce with them. If this is to be a
>> JIRA, that's needed anyway. Can't promise when I'll get to it, sorry.
>>
>> If this does turn out to be the cause of your performance regression,
>> the fix (if possible) will be for a later Solr version. Currently it is
>> not possible to tweak the docValues indexing parameters outside of code
>> changes.
>>
>>
>> Do note that we're still operating on guesses here. The cause for your
>> regression might easily be elsewhere.
>>
>> - Toke Eskildsen, Royal Danish Library
>>
>>
>>
>
> --
> *Best regards,*
> *Cao Mạnh Đạt*
>
>
> D.O.B: 31-07-1991 | Cell: (+84) 946.328.329 | E-mail: caomanhdat...@gmail.com
>
-- 
Sent from Gmail Mobile


Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
Hi Edwin,

I'd like to rule something out.  Does your schema define a field "_root_"?
If you don't have nested documents then remove it.  Its presence adds
indexing weight in 8.0 that was not there previously.  I'm not sure how
much, though; I hope it's small, but who knows.
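
For reference, the default schema defines it roughly like this (a sketch;
check your schema's actual definition before deleting):

  <field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>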

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Apr 2, 2019 at 10:17 PM Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I am setting up the latest Solr 8.0.0, and I am re-indexing the data from
> scratch in Solr 8.0.0
>
> However, I found that the indexing speed is slower in Solr 8.0.0, as
> compared to the earlier version like Solr 7.7.1. I have not changed the
> schema.xml and solrconfig.xml yet, just did a change of the
> luceneMatchVersion in solrconfig.xml to 8.0.0:
> <luceneMatchVersion>8.0.0</luceneMatchVersion>
>
> On average, the speed is about 40% to 50% slower. For example, the indexing
> speed was about 17 mins in Solr 7.7.1, but now it takes about 25 mins to
> index the same set of data.
>
> What could be the reason that causes the indexing to be slower in Solr
> 8.0.0?
>
> Regards,
> Edwin
>


Re: Spatial Search using two separate fields for lat and long

2019-04-13 Thread David Smiley
Hi,

I think your requirement of exporting back to CSV is fine but it's quite
normal for there to be some transformation steps on input and/or output...
and that such steps you mostly do yourself (not Solr).  That said, one
straight-forward solution is to have your spatial field be redundant with
the lat & lon separately.  Your spatial field could be stored=false, and
the separate fields would be stored but otherwise not be indexed or have
other characteristics that add weight.  The result is efficient; no
redundancies.
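
A sketch of that arrangement (field and type names assumed; the client
sends all three values, the combined one as "lat,lon"):

  <fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
  <field name="latlon" type="location" indexed="true" stored="false"/>
  <field name="lat" type="pdouble" indexed="false" stored="true" docValues="false"/>
  <field name="lon" type="pdouble" indexed="false" stored="true" docValues="false"/>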

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Apr 3, 2019 at 1:54 AM Tim Hedlund  wrote:

> Hi all,
>
> I'm importing documents (rows in excel file) that includes latitude and
> longitude fields. I want to use those two separate fields for searching
> with a bounding box. Is this possible (not using deprecated LatLonType) or
> do I need to combine them into one single field when indexing? The reason I
> want to keep the fields as two separate ones is that I want to be able to
> export from solr back to exact same excel file structure, i.e. solr fields
> maps exactly to excel columns.
>
> I'm using solr 7. Any thoughts or suggestions would be appreciated.
>
> Regards
> Tim
>
>


Re: Sorting results for spatial search

2018-02-01 Thread David Smiley
quote: "The problem is that this includes children that DON’T touch the
search area in the sum. How can I only include the shapes from the first
query above in my sort?"

Unless I'm misunderstanding your intent, I think this is a simple matter of
adding the spatial filter to the parent join query you are sorting on.  So
something like this (not tested):

&sort=query($sortQ) desc
&sortQ={!parent which=is_parent:true score=total}
  +is_parent:false
  +{!func}density
  +gridcell_rpt:"Intersects(POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70)))"

Separately from your question, you state that these are grid cells and thus
rectangles.  For rectangles, I recommend using BBoxField, which will
probably overall perform better (smaller index, faster queries).  If you
need an RPT field nonetheless (heatmaps?) then you could use the more
concise ENVELOPE syntax but it shouldn't matter since a polygon that is a
rectangle will internally be optimized to be one.
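
For example (illustrative numbers), a rectangular grid cell in ENVELOPE form
would be written as:

  ENVELOPE(10, 40, 40, 10)

where the argument order is minX, maxX, maxY, minY.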

On Wed, Jan 31, 2018 at 3:33 PM Leila Deljkovic <
leila.deljko...@koordinates.com> wrote:

> Hiya,
>
> So I have some nested documents in my index with this kind of structure:
> {
> "id": “parent",
> "gridcell_rpt": "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))",
> "density": “30"
>
> "_childDocuments_" : [
> {
> "id":"child1",
> "gridcell_rpt":"MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)))",
> "density":"25"
> },
> {
> "id":"child2",
> "gridcell_rpt":"MULTIPOLYGON(((15 5, 40 10, 10 20, 5 10, 15
> 5)))",
> "density":"5"
> }
> ]
> }
>
> The parent document is a WKT shape, and its children are “grid cells”,
> which are just divisions of the main shape (ie; cutting up the parent shape
> to get children shapes). The “density" is the feature count in each shape.
> When I query (through the Solr UI) I use “Intersects” to return parents
> which touch the search area (note that if a child is touching, the parent
> must also be touching).
>
> eg; fq={!field f=gridcell_rpt}Intersects(POLYGON((-20 70, -50 80,
> -20 20, 30 60, -10 40, -20 70)))
>
> and I want to sort the results by the sum of the densities of all the
> children touching the search area (so which parent has children that touch
> the search area, and how big the sum of these children’s densities is)
> something like {!parent which=is_parent:true score=total
> v='+is_parent:false +{!func}density'} desc
>
> The problem is that this includes children that DON’T touch the search
> area in the sum. How can I only include the shapes from the first query
> above in my sort?
>
> Cheers :)

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: InetAddressPoint support in Solr or other IP type?

2018-03-23 Thread David Smiley
Hi,

For IPv4, use TrieIntField with precisionStep=8
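
A sketch of that (names illustrative; the client packs each IPv4 address
into a signed 32-bit int before indexing):

  <fieldType name="ip4" class="solr.TrieIntField" precisionStep="8"/>
  <field name="ip" type="ip4" indexed="true" stored="true"/>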

For IPv6 https://issues.apache.org/jira/browse/SOLR-6741   There's nothing
there yet; you could help out if you are familiar with the codebase.  Or
you might try something relatively simple involving edge ngrams.

~ David

On Thu, Mar 22, 2018 at 1:09 PM Mike Cooper  wrote:

> I have scoured the web and cannot find any discussion of having the Lucene
> InetAddressPoint type exposed in Solr. Is there a reason this is omitted
> from the Solr supported types? Is it on the roadmap? Is there an
> alternative recommended way to index and store Ipv4 and Ipv6 addresses for
> optimal range searches and subnet searches? Thanks for your help.
>
>
>
> *Michael Cooper*
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: InetAddressPoint support in Solr or other IP type?

2018-03-27 Thread David Smiley
(I overlooked your reply; sorry to leave you hanging)

From a simplicity standpoint, just use InetAddressPoint.  Solr has no
rules/restrictions as to which Lucene module it's in.

That said, I *suspect* a Terms PrefixTree aligned to each byte would offer
better query performance, presuming that typical range queries are
byte-to-byte (as they would be for IPs?).  The Points API internally makes
the splitting decision, and it's not customizable.  It's blind to how
people will realistically query the data; it just wants a balanced tree.
For the same reason, I *suspect* (but have not benchmarked to see) that
DateRangeField has better query performance than DatePointField.  That
said, a Points index is probably going to be leaner & faster to index.

~ David

On Fri, Mar 23, 2018 at 7:51 PM Mike Cooper  wrote:

> Thanks David. Is there a reason we wouldn't want to base the Solr
> implementation on the InetAddressPoint class?
>
>
> https://lucene.apache.org/core/7_2_1/misc/org/apache/lucene/document/InetAddressPoint.html
>
> I realize that is in the "misc" package for now, so it's not part of core
> Lucene. But it is nice in that it has one class for both ipv4 and ipv6 and
> it's based on point numerics rather than trie numerics which seem to be
> deprecated. I'm pretty familiar with the code base, I could take a stab at
> implementing this. I just wanted to make sure there wasn't something I was
> missing since I couldn't find any discussion on this.
>
> Michael Cooper
>
> -Original Message-
> From: David Smiley [mailto:david.w.smi...@gmail.com]
> Sent: Friday, March 23, 2018 5:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: InetAddressPoint support in Solr or other IP type?
>
> Hi,
>
> For IPv4, use TrieIntField with precisionStep=8
>
> For IPv6 https://issues.apache.org/jira/browse/SOLR-6741   There's nothing
> there yet; you could help out if you are familiar with the codebase.  Or
> you
> might try something relatively simple involving edge ngrams.
>
> ~ David
>
> On Thu, Mar 22, 2018 at 1:09 PM Mike Cooper 
> wrote:
>
> > I have scoured the web and cannot find any discussion of having the
> > Lucene InetAddressPoint type exposed in Solr. Is there a reason this
> > is omitted from the Solr supported types? Is it on the roadmap? Is
> > there an alternative recommended way to index and store Ipv4 and Ipv6
> > addresses for optimal range searches and subnet searches? Thanks for your
> > help.
> >
> >
> >
> > *Michael Cooper*
> >
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Copying a SolrCloud collection to other hosts

2018-03-27 Thread David Smiley
The backup/restore API is intended to address this.
https://builds.apache.org/job/Solr-reference-guide-master/javadoc/making-and-restoring-backups.html

Erick's advice is good (and I once drafted docs for the same scheme years
ago as well), but I consider it dated -- it's what people had to do before
the backup/restore API existed.  Internally, backup/restore is doing
similar stuff.  It's easy to give backup/restore a try; surely you have by
now?
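
For example (collection and path names illustrative; "location" must be a
filesystem path reachable by all nodes):

  /admin/collections?action=BACKUP&name=mybackup&collection=main_index&location=/mnt/shared/backups
  /admin/collections?action=RESTORE&name=mybackup&collection=main_index_copy&location=/mnt/shared/backups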

~ David

On Tue, Mar 6, 2018 at 9:47 AM Patrick Schemitz  wrote:

> Hi List,
>
> so I'm running a bunch of SolrCloud clusters (each cluster is: 8 shards
> on 2 servers, with 4 instances per server, no replicas, i.e. 1 shard per
> instance).
>
> Building the index afresh takes 15+ hours, so when I have to deploy a new
> index, I build it once, on one cluster, and then copy (scp) over the
> data//index directories (shutting down the Solr instances
> first).
>
> I could get Solr 6.5.1 to number the shard/replica directories nicely via
> the createNodeSet and createNodeSet.shuffle options:
>
> Solr 6.5.1 /var/lib/solr:
>
> Server node 1:
> instance00/data/main_index_shard1_replica1
> instance01/data/main_index_shard2_replica1
> instance02/data/main_index_shard3_replica1
> instance03/data/main_index_shard4_replica1
>
> Server node 2:
> instance00/data/main_index_shard5_replica1
> instance01/data/main_index_shard6_replica1
> instance02/data/main_index_shard7_replica1
> instance03/data/main_index_shard8_replica1
>
> However, while attempting to upgrade to 7.2.1, this numbering has changed:
>
> Solr 7.2.1 /var/lib/solr:
>
> Server node 1:
> instance00/data/main_index_shard1_replica_n1
> instance01/data/main_index_shard2_replica_n2
> instance02/data/main_index_shard3_replica_n4
> instance03/data/main_index_shard4_replica_n6
>
> Server node 2:
> instance00/data/main_index_shard5_replica_n8
> instance01/data/main_index_shard6_replica_n10
> instance02/data/main_index_shard7_replica_n12
> instance03/data/main_index_shard8_replica_n14
>
> This new numbering breaks my copy script, and furthermode, I'm worried
> as to what happens when the numbering is different among target clusters.
>
> How can I switch this back to the old numbering scheme?
>
> Side note: is there a recommended way of doing this? Is the
> backup/restore mechanism suitable for this? The ref guide is kind of terse
> here.
>
> Thanks in advance,
>
> Ciao, Patrick
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread David Smiley
Right, there is a shared filesystem requirement.  It would be nice if this
Solr feature could be enhanced to have more options like backing up
directly to another SolrCloud using replication/fetchIndex like your cool
solrcloud_manager thing.

On Wed, Mar 28, 2018 at 12:34 PM Jeff Wartes  wrote:

> The backup/restore still requires setting up a shared filesystem on all
> your nodes though right?
>
> I've been using the fetchindex trick in my solrcloud_manager tool for ages
> now: https://github.com/whitepages/solrcloud_manager#cluster-commands
> Some of the original features in that tool have been incorporated into
> Solr itself these days, but I still use clonecollection/copycollection
> regularly. (most recently with Solr 7.2)
>
>
> On 3/27/18, 9:55 PM, "David Smiley"  wrote:
>
> The backup/restore API is intended to address this.
>
> https://builds.apache.org/job/Solr-reference-guide-master/javadoc/making-and-restoring-backups.html
>
> Erick's advice is good (and I once drafted docs for the same scheme
> years
> ago as well), but I consider it dated -- it's what people had to do
> before
> the backup/restore API existed.  Internally, backup/restore is doing
> similar stuff.  It's easy to give backup/restore a try; surely you
> have by
> now?
>
> ~ David
>
> On Tue, Mar 6, 2018 at 9:47 AM Patrick Schemitz  wrote:
>
> > Hi List,
> >
> > so I'm running a bunch of SolrCloud clusters (each cluster is: 8
> shards
> > on 2 servers, with 4 instances per server, no replicas, i.e. 1 shard
> per
> > instance).
> >
> > Building the index afresh takes 15+ hours, so when I have to deploy
> a new
> > index, I build it once, on one cluster, and then copy (scp) over the
> > data//index directories (shutting down the Solr instances
> > first).
> >
> > I could get Solr 6.5.1 to number the shard/replica directories
> nicely via
> > the createNodeSet and createNodeSet.shuffle options:
> >
> > Solr 6.5.1 /var/lib/solr:
> >
> > Server node 1:
> > instance00/data/main_index_shard1_replica1
> > instance01/data/main_index_shard2_replica1
> > instance02/data/main_index_shard3_replica1
> > instance03/data/main_index_shard4_replica1
> >
> > Server node 2:
> > instance00/data/main_index_shard5_replica1
> > instance01/data/main_index_shard6_replica1
> > instance02/data/main_index_shard7_replica1
> > instance03/data/main_index_shard8_replica1
> >
> > However, while attempting to upgrade to 7.2.1, this numbering has
> changed:
> >
> > Solr 7.2.1 /var/lib/solr:
> >
> > Server node 1:
> > instance00/data/main_index_shard1_replica_n1
> > instance01/data/main_index_shard2_replica_n2
> > instance02/data/main_index_shard3_replica_n4
> > instance03/data/main_index_shard4_replica_n6
> >
> > Server node 2:
> > instance00/data/main_index_shard5_replica_n8
> > instance01/data/main_index_shard6_replica_n10
> > instance02/data/main_index_shard7_replica_n12
> > instance03/data/main_index_shard8_replica_n14
> >
> > This new numbering breaks my copy script, and furthermode, I'm
> worried
> > as to what happens when the numbering is different among target
> clusters.
> >
> > How can I switch this back to the old numbering scheme?
> >
> > Side note: is there a recommended way of doing this? Is the
> > backup/restore mechanism suitable for this? The ref guide is kind of
> terse
> > here.
> >
> > Thanks in advance,
> >
> > Ciao, Patrick
> >
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: querying vs. highlighting: complete freedom?

2018-04-02 Thread David Smiley
Hi Arturas,

Both Erick and I had a go at improving the documentation here.  I hope it's
clearer.
https://builds.apache.org/job/Solr-reference-guide-master/javadoc/highlighting.html
The docs for hl.fl, hl.q, hl.qparser were all updated.  The meat of the
change was a new note in hl.fl including an example.  It's kinda hard to
document the problem you found but I hope the note will be somewhat
illustrative.

~ David

On Mon, Mar 26, 2018 at 3:12 AM Arturas Mazeika  wrote:

> Hi Erick,
>
> Adding a field qualifier to the hl.q parameter solved the issue. My
> excitement is steaming over the roof! What a thorough answer: the
> explanation about the behavior of solr, how it tries to interpret what I
> mean when I supply a keyword without the field-qualifier. Very impressive.
> Would you care (re)posting this answer to stackoverflow? If that is too
> much of a hassle, I'll do this in a couple of days myself on your behalf.
>
> I am impressed by how well, thoroughly, quickly and fully the question was
> answered.
>
> Steven's hint pushed me in this direction further: he suggested using the
> query part of solr to filter and sort out the relevant answers in the 1st
> step and in the 2nd step he'd highlight all the keywords using CTR+F (in
> the browser or some alternative viewer). This brought me to the next
> question:
>
> How can one match query terms with the analyze-chained documents in an
> efficient and distributed manner? My current understanding how to achieve
> this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the query
> 2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the
> document and the query
> 3. Use the matching of the substrings from the original text to last
> filter/tokenizer/analyzer in the analyze-chain to map the terms of the
> query
> 4. Emulate CTRL+F highlighting
>
> Web Interface of Solr offers quite a bit to advance towards this goal. If
> one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
> German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside quantum
> mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above url, I need to
> specify a specific core. Is it possible to generalize it, so the core that
> receives the request is not necessarily the one that processes it? Or is
> this already distributed, in the sense that the receiving core and the
> processing core are never the same?
>
> 2. The document was already analyze-chained. Is it possible to store this
> information so one does not need to re-analyze-chain it once more?
>
> Cheers
> Arturas
>
> On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson 
> wrote:
>
> > Arturas:
> >
> > Try to field-qualify your hl.q parameter. That looks like:
> >
> > hl.q=trans:Kundigung
> > or
> > hl.q=trans:Kündigung
> >
> > I saw the exact behavior you describe when I did _not_ specify the
> > field in the hl.q parameter, i.e.
> >
> > hl.q=Kundigung
> > or
> > hl.q=Kündigung
> >
> > didn't show all highlights.
> >
> > But when I did specify the field, it worked.
> >
> > Here's what I think is happening: Solr uses the default search
> > field when parsing an un-field-qualified query. I.e.
> >
> > q=something
> >
> > is parsed as
> >
> > q=default_search_field:something.
> >
> > The default field is controlled in solrconfig.xml with the "df"
> > parameter, you'll see entries like:
> > <str name="df">my_field</str>
> >
> > Also when I changed the "df" parameter to the field I was highlighting
> > on, I didn't need to specify the field on the hl.q parameter.
> >
> > hl.q=Kundigung
> > or
> > hl.q=Kündigung
> >
> > The default  field is usually "text", which knows nothing about
> > the German-specific filters you've applied unless you changed it.
> >
> > So in the absence of a field-qualification for the hl.q parameter Solr
> > was parsing the query according to the analysis chain specifed
> > in your default field, and probably passed ü through without
> > transforming it. Since your indexing analysis chain for that field
> > folded ü to just plain u, it wasn't found or highlighted.
> >
> > On the surface, this does seem like something that should be
> > changed, I'll go ahead and ping the dev list.
> >
> > NOTE: I was trying this on Solr 7.1
> >
> > Best,
> > Erick
> >
> > On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika 
> > wrote:
> > 

Re: PreAnalyzed FieldType, and simultaneously importing JSON

2018-04-02 Thread David Smiley
Hello Markus,

It appears you are not familiar with PreAnalyzedUpdateProcessor?  Using
that is much more flexible -- you could have different URP chains for your
use-cases. IMO PreAnalyzedField ought to go away.  I argued for the URP
version and thus it's superiority to the FieldType here:
https://issues.apache.org/jira/browse/SOLR-4619?focusedCommentId=13611191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13611191
Sadly, the FieldType is the one that is documented in the ref guide, but
not the URP :-(
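
A minimal chain might look like this (a sketch; check the URP's javadocs for
the exact parameter names, and the field list here is hypothetical):

  <updateRequestProcessorChain name="pre-analyzed">
    <processor class="solr.PreAnalyzedUpdateProcessorFactory">
      <str name="fieldName">title,body</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>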

~ David

On Thu, Mar 29, 2018 at 5:06 PM Markus Jelsma 
wrote:

> Hello,
>
> We want to move to PreAnalyzed FieldType to offload our very heavy
> analysis chain away from the search cluster, so we have to configure our
> fields to accept pre-analyzed tokens in production.
>
> But we use the same schema in development environments too, and that is
> where we use JSON files, or stream (export/import) data directly from
> production servers into a development environment, again via JSON. And in
> case of disaster recovery, we can import the daily exported JSON bzipped
> files back into our production servers.
>
> But this JSON loading does not work with PreAnalyzed FieldType. So to load
> JSON we must reset all fields back to their respective language specific
> FieldTypes on-the-fly, we could automate, but it is a hassle we like to
> avoid.
>
> Have i overlooked any configuration parameters that can help? Must we
> automate the on-the-fly schema reconfiguration and reset to PreAnalyzed
> after JSON loading is finished?
>
> Many thanks!
> Markus
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: querying vs. highlighting: complete freedom?

2018-04-03 Thread David Smiley
Thanks for your review!

On Tue, Apr 3, 2018 at 6:56 AM Arturas Mazeika  wrote:
...

> What I missed at the beginning of the documentation is the minimal set of
> requirements that is required to have highlighting be sensible: somehow I
> have a feeling that one needs some of the information stored in schema in
> some form. This of course is mentioned later on in the corresponding
> section, but I'd write this explicitly.
>

Explicitly say what up front?  "Requirements" are somewhat loose/minimal.
We ought to clearly say that hl.fl fields need to be "stored".

...

> Is there a way to "load-balance" analyze-query-chain for the purpose of
> highlighting matches? In the url below, I need to specify a specific core.

...

I doubt it.  You'll have to do this yourself.  Why do you want to use this
for highlighting?  Is it to get the offsets returned to you?  There's a
JIRA or two for that already; someone ought to make that happen.
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: querying vs. highlighting: complete freedom?

2018-04-03 Thread David Smiley
On Tue, Apr 3, 2018 at 10:51 AM Arturas Mazeika  wrote:
...

>  Similarly, there's the
> hl.qparser parameter, but the documentation of that parameter is not as
> rich (the documentation says, that the default value is lucene). I am
> wondering are there other alternatives available? In case you are referring
> to other components, can you add a reference to those?
>

I'll make these links to other parts of the ref guide to make it easier to
investigate.


> With respect to your question, why I'd like to use the analysis-chain for
> highlighting. That is a very good question: our end users cannot yet
> distinguish between highlighting capability of solr/information retrieval
> and search of the occurrences of the query terms in the documents. It is a
> rather difficult situation I am in. It is cool that there's a JIRA or two
> on the the load-balancing side.
>

No, I mean that there's a JIRA issue pertaining to exposing offsets from
highlighting output.  And I think there's a JIRA issue pertaining to being
able to post arbitrary text and highlight it.

Cheers,
  David
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: PreAnalyzed URP and SchemaRequest API

2018-04-05 Thread David Smiley
Is this really a problem when you could easily enough create a TextField
and call setTokenStream?
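
Something along these lines (a sketch, assuming a client-side Analyzer that
mirrors the server's analysis chain):

  // Analyze locally, attach the TokenStream to a Field, then serialize it
  // with solr-core's JsonPreAnalyzedParser (throws IOException).
  Analyzer analyzer = new StandardAnalyzer(); // stand-in for the real chain
  Field field = new TextField("body", "raw text to analyze", Field.Store.YES);
  field.setTokenStream(analyzer.tokenStream("body", "raw text to analyze"));
  String json = new JsonPreAnalyzedParser().toFormattedString(field);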

Does your remote client have Solr-core and all its dependencies on the
classpath?   That's one way to do it... and presumably the direction you
are going because you're asking how to work with PreAnalyzedParser which is
in solr-core.  *Alternatively*, only bring in Lucene core and construct
things yourself in the right format.  You could copy PreAnalyzedParser into
your codebase so that you don't have to reinvent any wheels, even though
that's awkward.  Perhaps that ought to be in SolrJ?  But no, we don't want
SolrJ depending on Lucene-core, though it'd make a fine "optional"
dependency.
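
For illustration, a rough sketch of the TextField + setTokenStream idea
(field name and 'text' variable are placeholders; assumes Lucene core and
the copied parser on the classpath):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

// 'analyzer' is whatever index-time Analyzer you reconstructed client-side
TokenStream tokens = analyzer.tokenStream("body", text);
Field field = new TextField("body", text, Field.Store.YES);
field.setTokenStream(tokens);  // the parser serializes this stream plus the stored value
String json = new JsonPreAnalyzedParser().toFormattedString(field);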

On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma 
wrote:

> Hello,
>
> We intend to move to PreAnalyzed URP for analysis offloading. Browsing the
> Javadocs i came across the SchemaRequest API looking for a way to get a
> Field object remotely, which i seem to need for
> JsonPreAnalyzedParser.toFormattedString(Field f). But all i can get from
> SchemaRequest API is FieldTypeRepresentation, which offers me
> getIndexAnalyzer() but won't allow me to construct a Field object.
>
> So, to analyze remotely i do need an index-time analyzer. I can get it,
> but not turn it into a Field object, which the PreAnalyzedParser for some
> reason wants.
>
> Any hints here? I must be looking the wrong way.
>
> Many thanks!
> Markus
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: PreAnalyzed URP and SchemaRequest API

2018-04-12 Thread David Smiley
Ah ok.
I've wondered how much value there is in pre-analysis.  The serialization
of the analyzed form in JSON is bulky.  If you can share any results, I'd
be interested to hear how it went.  It's an optimization so you should be
able to know how much better it is.  Of course it isn't for everybody --
only when the analysis chain is sufficiently complex.

On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma 
wrote:

> Hello David,
>
> The remote client has everything on the class path but just calling
> setTokenStream is not going to work. Remotely, all i get from SchemaRequest
> API is an AnalyzerDefinition. I haven't found any Solr code that allows me
> to transform that directly into an analyzer. If i had that, it would make
> things easy.
>
> As far as i see it, i need to reconstruct a real Analyzer using
> AnalyzerDefinition's information. It won't be a problem, but it is
> cumbersome.
>
> Thanks anyway,
> Markus
>
> -Original message-
> > From:David Smiley 
> > Sent: Thursday 5th April 2018 19:38
> > To: solr-user@lucene.apache.org
> > Subject: Re: PreAnalyzed URP and SchemaRequest API
> >
> > Is this really a problem when you could easily enough create a TextField
> > and call setTokenStream?
> >
> > Does your remote client have Solr-core and all its dependencies on the
> > classpath?   That's one way to do it... and presumably the direction you
> > are going because you're asking how to work with PreAnalyzedParser which
> is
> > in solr-core.  *Alternatively*, only bring in Lucene core and construct
> > things yourself in the right format.  You could copy PreAnalyzedParser
> into
> > your codebase so that you don't have to reinvent any wheels, even though
> > that's awkward.  Perhaps that ought to be in Solrj?  But no we don't want
> > SolrJ depending on Lucene-core, though it'd make a fine "optional"
> > dependency.
> >
> > On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma  >
> > wrote:
> >
> > > Hello,
> > >
> > > We intend to move to PreAnalyzed URP for analysis offloading. Browsing
> the
> > > Javadocs i came across the SchemaRequest API looking for a way to get a
> > > Field object remotely, which i seem to need for
> > > JsonPreAnalyzedParser.toFormattedString(Field f). But all i can get
> from
> > > SchemaRequest API is FieldTypeRepresentation, which offers me
> > > getIndexAnalyzer() but won't allow me to construct a Field object.
> > >
> > > So, to analyze remotely i do need an index-time analyzer. I can get it,
> > > but not turn it into a Field object, which the PreAnalyzedParser for
> some
> > > reason wants.
> > >
> > > Any hints here? I must be looking the wrong way.
> > >
> > > Many thanks!
> > > Markus
> > >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: PreAnalyzed URP and SchemaRequest API

2018-04-13 Thread David Smiley
Yes I could imagine big gains from this strategy if OpenNLP is in the
analysis chain ;-)

On Fri, Apr 13, 2018 at 5:01 PM Markus Jelsma 
wrote:

> Hello David,
>
> If JSON serialization is too bulky, we could also opt for
> SimplePreAnalyzed right? At least as a FieldType it is possible, if not
> with URP, it just needs some work.
>
> Regarding results; we haven't done it yet, and won't for some time, but we
> will when we reintroduce OpenNLP in the analysis chain. We tried to
> introduce POS-tagging on our own two years ago, but it wasn't suited for
> production because it was too heavy on the CPU. Indexing data suddenly took
> eight to ten times longer in a SolrCloud environment with three replicas.
>
> If we offload our current chains without OpenNLP, it will only benefit
> when large fields pass through a regex, and for decompounding the Germanic
> languages we ingest. Offloading just this cost is a micro optimization,
> offloading the various OpenNLP char and token filters is really beneficial.
>
> Regarding a dependency on Lucene core and analysis-common, it would be
> helpful, but we'll manage.
>
> Thanks again,
> Markus
>
> -Original message-
> > From:David Smiley 
> > Sent: Thursday 12th April 2018 19:16
> > To: solr-user@lucene.apache.org
> > Subject: Re: PreAnalyzed URP and SchemaRequest API
> >
> > Ah ok.
> > I've wondered how much value there is in pre-analysis.  The serialization
> > of the analyzed form in JSON is bulky.  If you can share any results, I'd
> > be interested to hear how it went.  It's an optimization so you should be
> > able to know how much better it is.  Of course it isn't for everybody --
> > only when the analysis chain is sufficiently complex.
> >
> > On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma  >
> > wrote:
> >
> > > Hello David,
> > >
> > > The remote client has everything on the class path but just calling
> > > setTokenStream is not going to work. Remotely, all i get from
> SchemaRequest
> > > API is an AnalyzerDefinition. I haven't found any Solr code that allows
> me
> > > to transform that directly into an analyzer. If i had that, it would
> make
> > > things easy.
> > >
> > > As far as i see it, i need to reconstruct a real Analyzer using
> > > AnalyzerDefinition's information. It won't be a problem, but it is
> > > cumbersome.
> > >
> > > Thanks anyway,
> > > Markus
> > >
> > > -Original message-
> > > > From:David Smiley 
> > > > Sent: Thursday 5th April 2018 19:38
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: PreAnalyzed URP and SchemaRequest API
> > > >
> > > > Is this really a problem when you could easily enough create a
> TextField
> > > > and call setTokenStream?
> > > >
> > > > Does your remote client have Solr-core and all its dependencies on
> the
> > > > classpath?   That's one way to do it... and presumably the direction
> you
> > > > are going because you're asking how to work with PreAnalyzedParser
> which
> > > is
> > > > in solr-core.  *Alternatively*, only bring in Lucene core and
> construct
> > > > things yourself in the right format.  You could copy
> PreAnalyzedParser
> > > into
> > > > your codebase so that you don't have to reinvent any wheels, even
> though
> > > > that's awkward.  Perhaps that ought to be in Solrj?  But no we don't
> want
> > > > SolrJ depending on Lucene-core, though it'd make a fine "optional"
> > > > dependency.
> > > >
> > > > On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma <
> markus.jel...@openindex.io
> > > >
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We intend to move to PreAnalyzed URP for analysis offloading.
> Browsing
> > > the
> > > > > Javadocs i came across the SchemaRequest API looking for a way to
> get a
> > > > > Field object remotely, which i seem to need for
> > > > > JsonPreAnalyzedParser.toFormattedString(Field f). But all i can get
> > > from
> > > > > SchemaRequest API is FieldTypeRepresentation, which offers me
> > > > > getIndexAnalyzer() but won't allow me to construct a Field object.
> > > > >
> > > > > So, to analyze remotely i do need an index-time analyzer. I can
> get it,
> > > > > but not turn it into a Field object, which the PreAnalyzedParser
> for
> > > some
> > > > > reason wants.
> > > > >
> > > > > Any hints here? I must be looking the wrong way.
> > > > >
> > > > > Many thanks!
> > > > > Markus
> > > > >
> > > > --
> > > > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > > > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > > > http://www.solrenterprisesearchserver.com
> > > >
> > >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: ClassCastException: o.a.l.d.Field cannot be cast to o.a.l.d.StoredField

2018-04-26 Thread David Smiley
I'm not sure but I wonder why you would want to cast it in the first
place.  Field is the base class; all its subclasses are in one way or
another utilities/conveniences.  In other words, if you ever see code
casting Field to some subclass, there's a good chance it's fundamentally
wrong or making assumptions that aren't necessarily true.
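
For example, a DocumentTransformer can read the value through the common
supertype instead (field name hypothetical):

import org.apache.lucene.index.IndexableField;

Object value = solrDocument.getFieldValue("myField");  // may be a Field or a plain value
if (value instanceof IndexableField) {
  // stringValue() works for StoredField, TextField, and any other subclass
  // (it returns null for binary/numeric fields)
  String text = ((IndexableField) value).stringValue();
}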

If the problem you saw appears sporadic, there's a good chance it is in
some way related to updateLog replay.

On Tue, Apr 24, 2018 at 7:13 AM Markus Jelsma 
wrote:

> Hello,
>
> We have a DocumentTransformer that gets a Field from the SolrDocument and
> casts it to StoredField (although apparently we don't need to cast). This
> works well in tests and fine in production, except for some curious,
> unknown and unreproducible, cases, throwing the ClassCastException.
>
> I can, and will, just remove the cast to fix the rare exception, but in
> what cases could the exception get thrown?
>
> Many thanks,
> Markus
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-04-26 Thread David Smiley
Yay!  I'm glad the UnifiedHighlighter is serving you well.  I was about to
suggest it.  If you think the fragmentation/snippeting could be improved in
a general way then post a JIRA for consideration.  Note: identical results
with the original Highlighter is a non-goal.

On Mon, Apr 23, 2018 at 10:14 PM howed  wrote:

> Finally got back to looking at this, and found that the solution was to
> switch to the  unified
> <
> https://lucene.apache.org/solr/guide/7_2/highlighting.html#choosing-a-highlighter>
>
> highlighter which doesn't seem to have the same problem with my complex
> synonyms.  This required some tweaking of the highlighting parameters and
> my
> code as it doesn't highlight exactly the same as the default highlighter,
> but all is working now.
>
> Thanks again for the assistance.
>
> David
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: ClassCastException: o.a.l.d.Field cannot be cast to o.a.l.d.StoredField

2018-04-26 Thread David Smiley
> but how would a DocumentTransformer affect UpdateLog replay?

Oh right; nevermind that silly theory ;-)

On Thu, Apr 26, 2018 at 10:42 AM Markus Jelsma 
wrote:

> Hello David,
>
> Yes it was sporadic indeed, but how would a DocumentTransformer affect
> UpdateLog replay?
>
> We removed the cast, no idea how it got there.
>
> Thanks,
> Markus
>
> -Original message-
> > From:David Smiley 
> > Sent: Thursday 26th April 2018 16:31
> > To: solr-user@lucene.apache.org
> > Subject: Re: ClassCastException: o.a.l.d.Field cannot be cast to
> o.a.l.d.StoredField
> >
> > I'm not sure but I wonder why you would want to cast it in the first
> > place.  Field is the base class; all it's subclasses are in one way or
> > another utilities/conveniences.  In other words, if you ever see code
> > casting Field to some subclass, there's a good chance it's fundamentally
> > wrong or making assumptions that aren't necessarily true.
> >
> > If the problem you saw appears sporadic, there's a good chance it is in
> > some way related to updateLog replay.
> >
> > On Tue, Apr 24, 2018 at 7:13 AM Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello,
> > >
> > > We have a DocumentTransformer that gets a Field from the SolrDocument
> and
> > > casts it to StoredField (although apparently we don't need to cast).
> This
> > > works well in tests and fine in production, except for some curious,
> > > unknown and unreproducible, cases, throwing the ClassCastException.
> > >
> > > I can, and will, just remove the cast to fix the rare exception, but in
> > > what cases could the exception get thrown?
> > >
> > > Many thanks,
> > > Markus
> > >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Impact/Performance of maxDistErr

2018-05-29 Thread David Smiley
Hello Jens,
With solr.RptWithGeometrySpatialField, you always get an accurate result
thanks to the "WithGeometry" part.  The "Rpt" part is a grid index, and
most of the parameters pertain to that.  maxDistErr controls the highest
resolution grid.  No shape will be indexed to higher resolutions than this,
though it may be at coarser resolutions depending on distErrPct.  The
configuration you chose initially (that turned out to be slow for you) was
a meter, and then you changed it to a kilometer and got fast indexing
results.  I figure your indexed shapes are on average a kilometer in size
(give or take an order of magnitude).  It's hard to guess
how your query shapes compare to your indexed shapes as there are multiple
possibilities that could yield similar query performance when changing
maxDistErr so much.

The bottom line is that you should dial up maxDistErr as much as you can
get away with -- that is, as long as query performance stays good.  So you
did the right thing :-).  That number will probably be a distance somewhat
less than the average indexed shape diameter, or average query shape
diameter, whichever is greater.  Perhaps 1/10th smaller, if I had to pick.
The default setting, I think a meter, is probably not a good default for
this field type.

Note you could also try increasing distErrPct some, maybe to as much as
.25, though I wouldn't go much higher, as it may yield gridded shapes that
are so coarse as to not have interior cells.  Depending on what your query
shapes typically look like and indexed shapes relative to each other, that
may be significant or may not be.  If the indexed shapes are often much
larger than your query shape then it's significant.

~ David

On Fri, May 25, 2018 at 6:59 AM Jens Viebig  wrote:

> Hello,
>
> we are indexing a polygon with 4 points (non-rectangular, field-of-view of
> a camera) in a RptWithGeometrySpatialField alongside some more fields, to
> perform searches that check if a point is within this polygon
>
> We started using the default configuration found in several examples
> online:
>
> <fieldType class="solr.RptWithGeometrySpatialField"
>    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>    geo="true" distErrPct="0.15" maxDistErr="0.001"
>    distanceUnits="kilometers" />
>
> We discovered that with this setting the indexing (soft commit) speed is
> very slow
> For 1 documents it takes several minutes to finish the commit
>
> If we disable this field, indexing+soft commit is only 3 seconds for 1
> docs,
> if we set maxDistErr to 1, indexing speed is at around 5 seconds, so a
> huge performance gain against the several minutes we had before
>
> I tried to find out via the documentation what's the impact of "maxDistErr"
> on search results but didn't quite find an in-depth explanation.
> From the tests we did, the search results still seem to be very accurate
> even if the covered space of the polygon is less than 1km and search speed
> did not suffer.
>
> So i would love to learn more about the differences on having
> maxDistErr="0.001" vs maxDistErr="1" on a RptWithGeometrySpatialField and
> what problems could we run into with the bigger value
>
> Thanks
> Jens
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Impact/Performance of maxDistErr

2018-05-30 Thread David Smiley
I suggest using the "Intersects" spatial predicate when either the data is
all points or if the query is a point.  It's semantically equivalent and
the algorithm is much faster.
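
For example, instead of Contains, something like this (field name yours;
note that WKT separates coordinates with a space, not a comma):

fq=fieldOfView:"Intersects(POINT(x y))"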

On Wed, May 30, 2018 at 3:25 AM Jens Viebig  wrote:

> Thanks for the detailed answer David, that helps a lot to understand!
> Best Regards
>
> Jens
>
> P.S. Currently the only search we are doing on the polygon is
> Contains(POINT(x,y))
>
>
> Am 29.05.2018 um 13:30 schrieb David Smiley:
>
> Hello Jens,
> With solr.RptWithGeometrySpatialField, you always get an accurate result
> thanks to the "WithGeometry" part.  The "Rpt" part is a grid index, and
> most of the parameters pertain to that.  maxDistErr controls the highest
> resolution grid.  No shape will be indexed to higher resolutions than this,
> though it may be at coarser resolutions depending on distErrPct.  The
> configuration you chose initially (that turned out to be slow for you) was
> a meter, and then you changed it to a kilometer and got fast indexing
> results.  I figure your indexed shapes are on average a kilometer in size
> (give or take an order of magnitude).  It's hard to guess
> how your query shapes compare to your indexed shapes as there are multiple
> possibilities that could yield similar query performance when changing
> maxDistErr so much.
>
> The bottom line is that you should dial up maxDistErr as much as you can
> get away with -- that is, as long as query performance stays good.  So you
> did the right thing :-).  That number will probably be a distance somewhat
> less than the average indexed shape diameter, or average query shape
> diameter, whichever is greater.  Perhaps 1/10th smaller, if I had to pick.
> The default setting, I think a meter, is probably not a good default for
> this field type.
>
> Note you could also try increasing distErrPct some, maybe to as much as
> .25, though I wouldn't go much higher, as it may yield gridded shapes that
> are so coarse as to not have interior cells.  Depending on what your query
> shapes typically look like and indexed shapes relative to each other, that
> may be significant or may not be.  If the indexed shapes are often much
> larger than your query shape then it's significant.
>
> ~ David
>
> On Fri, May 25, 2018 at 6:59 AM Jens Viebig  wrote:
>
>> Hello,
>>
>> we are indexing a polygon with 4 points (non-rectangular, field-of-view
>> of a camera) in a RptWithGeometrySpatialField alongside some more fields,
>> to perform searches that check if a point is within this polygon
>>
>> We started using the default configuration found in several examples
>> online:
>>
>> <fieldType class="solr.RptWithGeometrySpatialField"
>>    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>>    geo="true" distErrPct="0.15" maxDistErr="0.001"
>>    distanceUnits="kilometers" />
>>
>> We discovered that with this setting the indexing (soft commit) speed is
>> very slow
>> For 1 documents it takes several minutes to finish the commit
>>
>> If we disable this field, indexing+soft commit is only 3 seconds for
>> 1 docs,
>> if we set maxDistErr to 1, indexing speed is at around 5 seconds, so a
>> huge performance gain against the several minutes we had before
>>
>> I tried to find out via the documentation what's the impact of
>> "maxDistErr" on search results but didn't quite find an in-depth explanation.
>> From the tests we did, the search results still seem to be very accurate
>> even if the covered space of the polygon is less than 1km and search speed
>> did not suffer.
>>
>> So i would love to learn more about the differences on having
>> maxDistErr="0.001" vs maxDistErr="1" on a RptWithGeometrySpatialField and
>> what problems could we run into with the bigger value
>>
>> Thanks
>> Jens
>>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Syntax error while parsing Spatial Query as string

2020-02-14 Thread David Smiley
You are asking on solr-user but your scenario seems pure Lucene.

For Lucene and indexing point-data, I strongly recommend LatLonPoint.  For
Solr, same scenario, the Solr adaptation of the same functionality is
LatLonPointSpatialField.  I know this doesn't directly address your
question.  Just looking at your email and reported error, it seems you are
supplying some custom syntax.  If you wish to proceed with the
SpatialStrategy/Spatial4j based framework, then see SpatialExample.java in
the tests, which serves as documentation by example.  FYI PointVectorStrategy
is slated for removal in 9.0 as it's obsolete.
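
For the LatLonPoint route, a minimal sketch using the coordinates from your
query (0.25 miles is roughly 402 meters; in Lucene 6.x the class lives in
the sandbox module):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.LatLonPoint;
import org.apache.lucene.search.Query;

// index time: one point per document
Document doc = new Document();
doc.add(new LatLonPoint("location", 45.5099231, -122.8515139));

// query time: match everything within ~0.25 miles of the center
Query query = LatLonPoint.newDistanceQuery("location", 45.5099231, -122.8515139, 402.336);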

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Feb 14, 2020 at 6:47 AM vas aj  wrote:

> Hi team,
>
> I am using Lucene 6.6.2, Spatial4j 0.7, lucene-spatial-extras 6.6.2. I am
> trying to create a Spatial Query string for a given longitude, latitude &
> radius in miles.
>
> The query string generated using SpatialHelper (code as attached ) for
> long: -122.8515139 & lat: 45.5099231 in .25 miles radius  is as follow :
>
> #(+location__x:[-122.85667708964212 TO -122.84635071035788]
> +location__y:[45.50630481040378 TO 45.51354138959622])
> #frange(DistanceValueSource(PointVectorStrategy field:location
> ctx=SpatialContext.GEO, Pt(x=-122.8515139,y=45.5099231))):[0.0 TO
> 0.00361828959621958]
>
> My lucene index is as follows:
> create lucene index --name=myLuceneIndex --region=stations --field=title
> --analyzer=org.apache.lucene.analysis.en.EnglishAnalyzer
>
> I get error
> Syntax Error, cannot parse
> ConstantScore(#(+location__x:[-122.85667708964212 TO -122.84635071035788]
> +location__y:[45.50630481040378 TO 45.51354138959622])
> #frange(DistanceValueSource(PointVectorStrategy field:location
> ctx=SpatialContext.GEO, Pt(x=-122.8515139,y=45.5099231))):[0.0 TO
> 0.00361828959621958]):
>
> What am I doing wrong ?
>
> Regards,
> Aj
>


Re: Unified highlighter- unable to get results - can get results with original and termvector highlighters

2020-05-22 Thread David Smiley
Hello,

Did you get it to work eventually?

Try setting hl.weightMatches=false and see if that helps.  Whether this
helps or not, I'd like to have a deeper understanding of the internal
structure of the Query (not the original query string).  What query parser
are you using?  If you pass debug=query to Solr then you'll get a parsed
version of the query that would be helpful to me.
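
For example, append &debug=query to your /select request; the parsed form
shows up under the "parsedquery" key in the debug section of the response.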

~ David


On Mon, May 11, 2020 at 10:46 AM Warren, David [USA] 
wrote:

> I am running Solr 8.4 and am attempting to use its highlighting feature.
> It appears to work well when I use the original highlighter or the term
> vector highlighter, but when I try to use the unified highlighter, I get no
> results returned.  My Google searches so far have not revealed anybody
> having this same problem (perhaps user error on my part), hence why I’m
> asking a question to the Solr mailing list.
>
> I am running a query which searches the “title_text” field for a term and
> highlights it.
> The configuration for title_text is this:
> <field name="title_text" ... multiValued="true" termVectors="true"/>
>
> The query looks like this:
>
> https://solr-server/index/c1/select?hl.fl=title_text&hl.method=unified&hl=true&q=
> title_text%3Azelda
>
> If hl.method=original or hl.method=termvector, I get back results in the
> highlighting section with "Zelda" surrounded by <em> tags.
> If hl.method=unified, all results in the highlighting section are blank.
>
> I’ve attached a remote debugger to my Solr server and verified that the
> unified highlighter class
> (org/apache/solr/highlight/UnifiedSolrHighlighter.java) is being invoked
> when I set hl.method=unified.  And I do not see any errors in the Solr logs.
>
> Any idea what I’m doing wrong? In looking at the Solr highlighting
> documentation, I didn’t see any additional configuration which needs to be
> done to get the unified highlighter to work.
>
> I realize I have not provided a bunch of information here, but obviously
> can provide more if needed.
>
> Thank you,
> David Warren
> Booz | Allen | Hamilton
> 703-625-0311 mobile
>
>


Re: Highlighting Solr 8

2020-05-22 Thread David Smiley
What did you end up doing, Eric?  Did you migrate to the Unified
Highlighter?
~ David


On Wed, Oct 16, 2019 at 4:36 PM Eric Allen 
wrote:

> Thanks for the reply.
>
> Currently we are migrating from Solr 4 to Solr 8. Under Solr 4 we wrote our
> own highlighter because the provided one was too slow for our documents.
>
> We deal with many large documents, but we have full term vectors already.
> So as I understand it from my reading of the code the unified highlighter
> should be fast even on these large documents.
>
> The concern about alternate fields was if the highlighter was slow we
> could just return highlights from one field if they existed and if not then
> highlight the other fields.
>
> From my research I'm leaning towards returning highlights from all the
> fields we are interested in because I feel it will be fast.
>
> Eric Allen - Software Devloper, NetDocuments
> eric.al...@netdocuments.com | O: 801.989.9691 | C: 801.989.9691
>
> -Original Message-
> From: sasarun 
> Sent: Wednesday, October 16, 2019 2:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting Solr 8
>
> Hi Eric,
>
> Unified highlighter does not have an option to provide alternate field
> when highlighting. That option is available with Original and fast vector
> highlighter. As indicated in the Solr documentation, Unified is the
> recommended method for highlighting to meet most of the use cases. Please
> do share more details in case you are facing any specific issue with
> highlighting.
>
> Thanks,
>
> Arun
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Re: Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

2020-05-22 Thread David Smiley
FWIW I tried this on the techproducts schema with a modification to the
name field, but did not see the issue.

I suspect you did not re-index after making these schema changes.  If you
did, then also check that the collection (or core) truly started fresh
(never had any previous schema) because if you tried it one way then merely
deleted/replaced the documents after changing the schema, then some
internal metadata in the underlying index data tends to persist.  I suspect
some of the options flipped here might stay sticky.

If that really isn't it, then you might suggest to me exactly how to
reproduce this from what Solr ships with, like the techproducts example
schema and dataset.

~ David


On Sun, Jul 21, 2019 at 10:07 PM Richard Walker 
wrote:

> On 22 Jul 2019, at 11:32 am, Richard Walker 
> wrote:
> > I'm trying out the advice in the user guide
> > (
> https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations
> )
> > for using the unified highlighter.
> >
> > ...
> > * "set storeOffsetsWithPositions to true"
> > * "set termVectors to true but no other term vector
> >  related options on the field being highlighted"
> ...
>
> I completely forgot to mention that I also tried _just_:
>
> > * "set storeOffsetsWithPositions to true"
>
> i.e., without _also_ setting termVectors, and this _doesn't_
> give the exception.
>
> So it seems to be the _combination_ of:
> * unified highlighter
> * storeOffsetsWithPositions
> * termVectors
>
> that seems to be giving the exception.
>
>


Re: unified highlighter methods works unexpected

2020-05-22 Thread David Smiley
Hi Roland,

I was not able to reproduce this.  I modified the techproducts sample config
to change the name field to use a new field type that had a trivial
edgengram config.  Then I composed this query, based a little on some of
your parameters, and it did find highlights:
http://localhost:8983/solr/techproducts/select?defType=edismax&fl=id%2Cname&hl.fl=name&hl.method=unified&hl=on&mm=3%3C74%25&q=%22hard%20dri%22&qf=name%20text&stopwords=true&tie=0.1

If you could give me instructions to reproduce this with techproducts, then
I can help diagnose the underlying problem and possibly fix it.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Apr 2, 2020 at 9:02 AM Szűcs Roland 
wrote:

> Hi All,
>
> I use Solr 8.4.1 and implement suggester functionality. As part of the
> suggestions I would like to show product info so I had to implement this
> functionality with normal query parsers instead of suggester component. I
> applied an edgengramm filter without stemming to fasten the analysis of the
> query which is crucial for the suggester functionality.
> I could use the Highlight component with edismax query parser without any
> problem. This is a typical output if hl.method=original (this is the
> default):
> { "responseHeader":{ "status":0, "QTime":4, "params":{ "mm":"3<74%",
> "q":"Arany
> Já", "tie":"0.1", "defType":"edismax", "hl":"true", "echoParams":"all", "qf
> ":"author_ngram^5 title_ngram^10", "fl":"id,imageUrl,title,price",
> "pf":"author_ngram^15
> title_ngram^30", "hl.fl":"title", "hl.method":"original", "_":
> "1585830768672"}}, "response":{"numFound":2,"start":0,"docs":[ {
> "id":"369",
> "title":"Arany János összes költeményei", "price":185.0, "imageUrl":"
> https://cdn.bknw.net/prd/covers_big/369.jpg"}, { "id":"26321",
> "title":"Arany
> János összes költeményei", "price":1400.0, "imageUrl":"
> https://cdn.bknw.net/prd/covers_big/26321.jpg"}] }, "highlighting":{
> "369":{
> "title":["\n \n Arany\n \n János összes költeményei"]}, "
> 26321":{ "title":["\n \n Arany\n \n János összes
> költeményei"]}}}
>
> If I change the method to unified, I get unexpected result:
> { "responseHeader":{ "status":0, "QTime":5, "params":{ "mm":"3<74%",
> "q":"Arany
> Já", "tie":"0.1", "defType":"edismax", "hl":"true", "echoParams":"all", "qf
> ":"author_ngram^5 title_ngram^10", "fl":"id,imageUrl,title,price",
> "pf":"author_ngram^15
> title_ngram^30", "hl.fl":"title", "hl.method":"unified",
> "_":"1585830768672"
> }}, "response":{"numFound":2,"start":0,"docs":[ { "id":"369",
> "title":"Arany
> János összes költeményei", "price":185.0, "imageUrl":"
> https://cdn.bknw.net/prd/covers_big/369.jpg"}, { "id":"26321",
> "title":"Arany
> János összes költeményei", "price":1400.0, "imageUrl":"
> https://cdn.bknw.net/prd/covers_big/26321.jpg"}] }, "highlighting":{
> "369":{
> "title":[]}, "26321":{ "title":[]}}}
>
> Any idea why the newest method fails to deliver the same results?
>
> Thanks,
> Roland
>


Re: Alternate Fields for Unified Highlighter

2020-05-22 Thread David Smiley
Feel free to file an issue; I know it's not supported.  I also don't think
it's a big deal because you can just ask Solr to return the
"alternateField", thus letting the client side choose when to use that.  I
suppose it might be large, so I can imagine a concern there.  It'd be nice
if Solr had a DocTransformer to accomplish that.
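
For example, a rough SolrJ sketch of that client-side fallback (field names
taken from your parameters):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("some query");
q.setHighlight(true);
q.set("hl.method", "unified");
q.set("hl.fl", "content_*");
q.addField("content");  // the would-be alternateField, returned with each doc
// After executing: for any document whose highlighting entry is empty,
// show a truncated "content" value (say, the first 300 chars) instead.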

I know it's been a while; I'm curious how the UH has been working for you,
assuming you are using it.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 2, 2019 at 6:47 AM Furkan KAMACI  wrote:

> Hi All,
>
> I want to switch to Unified Highlighter due to performance reasons for my
> Solr 7.6 I was using these fields
>
> solrQuery.addHighlightField("content_*")
> .set("f.content_en.hl.alternateField", "content")
> .set("f.content_es.hl.alternateField", "content")
> .set("hl.useFastVectorHighlighter", "true");
> .set("hl.maxAlternateFieldLength", 300);
>
> As far as I see, there is no definitions for alternate fields for unified
> highlighter. How can I configure such a configuration?
>
> Kind Regards,
> Furkan KAMACI
>


Re: hl.preserveMulti in Unified highlighter?

2020-05-22 Thread David Smiley
Hi Walter,

No, the UnifiedHighlighter does not behave as if this setting were true.

The docs say:

`hl.preserveMulti`::
If `true`, multi-valued fields will return all values in the order they
were saved in the index. If `false`, the default, only values that match
the highlight request will be returned.


The first sentence there is the essence of it.  Notice it's not conditional
on whether there are highlights or not.  The UH won't return values lacking
a highlight. Even hl.defaultSummary isn't triggered because *some* of the
values have a highlight.

As I look at the pertinent code right now, I imagine a solution would be to
provide a custom PassageFormatter.  If we can assume for this use-case that
you can use hl.bs.type=WHOLE as well, then a simpler PassageFormatter
could basically ignore the passage starts & ends and merely mark up the
original content in entirety, which is a null concatenated sequence of all
the values for this field for a document.
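
Roughly, as an untested sketch:

import org.apache.lucene.search.uhighlight.Passage;
import org.apache.lucene.search.uhighlight.PassageFormatter;

// Marks up every match across the whole content, ignoring passage boundaries.
public class WholeContentFormatter extends PassageFormatter {
  @Override
  public Object format(Passage[] passages, String content) {
    StringBuilder sb = new StringBuilder(content.length() + 64);
    int pos = 0;
    for (Passage passage : passages) {
      for (int i = 0; i < passage.getNumMatches(); i++) {
        int start = passage.getMatchStarts()[i];
        int end = passage.getMatchEnds()[i];
        if (start < pos) continue;  // skip overlapping matches
        sb.append(content, pos, start);
        sb.append("<em>").append(content, start, end).append("</em>");
        pos = end;
      }
    }
    sb.append(content, pos, content.length());  // trailing un-highlighted text
    return sb.toString();
  }
}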

~ David


On Fri, Mar 29, 2019 at 2:02 PM Walter Underwood 
wrote:

> We are testing 6.6.1.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Mar 29, 2019, at 11:02 AM, Walter Underwood 
> wrote:
> >
> > In testing, hl.preserveMulti=true works with the unified highlighter.
> But the documentation says that the parameter is only implemented in the
> original highlighter.
> >
> > Is the documentation wrong? Can we trust this to keep working with
> unified?
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Mar 26, 2019, at 12:08 PM, Walter Underwood 
> wrote:
> >>
> >> It looks like hl.preserveMulti is only implemented in the Original
> highlighter. Has anyone looked at doing this for the Unified highlighter?
> >>
> >> We need to preserve order in the highlights for a multi-valued field.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org 
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >
>
>


Re: Creating custom PassageFormatter

2020-05-22 Thread David Smiley
You've probably gotten your answer by now but "no".  Basically, you'd need to
specify your own subclass of UnifiedSolrHighlighter in solrconfig.xml like
this (roughly):

  <searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting class="com.example.MyUnifiedSolrHighlighter"/>
  </searchComponent>
> ... "Error loading class 'solr.highlight.CustomPassageFormatter'".
>
> Example from solrconfig.xml:
> <formatter ... class="solr.highlight.CustomPassageFormatter">
> </formatter>
>
> I'm asking if this is still the right way? Is the "formatter" tag in XML
> valid option for Unified Highlighter?
>
> Thank you.
>
> Kind regards,
>   Damjan
>


Re: hl.preserveMulti in Unified highlighter?

2020-05-23 Thread David Smiley
Better late than never?  I added some new mail filters to bring topics of
interest to my attention.

Anyway, this seems like an important use-case.

Anthony:  You'd probably benefit from also setting hl.bs.type=WHOLE since
clearly you want whole values (no snippets/fragments of values).  If I get
around to implementing hl.preserveMulti for the UH, I'll have it make this
assumption likewise.

~ David


On Sat, May 23, 2020 at 1:48 PM Walter Underwood 
wrote:

> I’m a little amused that this thread has become active after almost two
> months of silence.
>
> I think we just used the old highlighter. I don’t even remember now.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 23, 2020, at 9:14 AM, Anthony Groves  wrote:
> >
> > Hi Walter,
> >
> > I did something very similar to what David is suggesting when switching
> > from the PostingsHighlighter to the UnifiedHighlighter in Solr 7.
> >
> > In order to include non-highlighted items (exact ordering) when using
> > preserveMulti, we used a custom PassageFormatter that ignored the start
> and
> > end offsets:
> >
> https://github.com/oreillymedia/ifpress-solr-plugin/blob/bf3b07c5be32fbcfa7b6fdfd439d511ef60dab68/src/main/java/com/ifactory/press/db/solr/highlight/HighlightFormatter.java#L35
> >
> > I was actually surprised to see not much of a performance hit from
> > essentially removing the offset usage, but our highlighted fields aren't
> > extremely large :-)
> >
> > Hope that helps!
> > Anthony
> >
> > *Anthony Groves*  | Technical Lead, Search
> >
> > O'Reilly Media, Inc.  | https://www.linkedin.com/in/anthonygroves/
> >
> >
> > On Fri, May 22, 2020 at 4:59 PM David Smiley 
> > wrote:
> >
> >> Hi Walter,
> >>
> >> No, the UnifiedHighlighter does not behave as if this setting were true.
> >>
> >> The docs say:
> >>
> >> `hl.preserveMulti`::
> >> If `true`, multi-valued fields will return all values in the order they
> >> were saved in the index. If `false`, the default, only values that match
> >> the highlight request will be returned.
> >>
> >>
> >> The first sentence there is the essence of it.  Notice it's not
> conditional
> >> on whether there are highlights or not.  The UH won't return values
> lacking
> >> a highlight. Even hl.defaultSummary isn't triggered because *some* of
> the
> >> values have a highlight.
> >>
> >> As I look at the pertinent code right now, I imagine a solution would
> be to
> >> provide a custom PassageFormatter.  If we can assume for this use-case
> that
> >> you can use hl.bs.type=WHOLE as well, then a a simpler PassageFormatter
> >> could basically ignore the passage starts & ends and merely mark up the
> >> original content in entirety, which is a null concatenated sequence of
> all
> >> the values for this field for a document.
> >>
> >> ~ David
> >>
> >>
> >> On Fri, Mar 29, 2019 at 2:02 PM Walter Underwood  >
> >> wrote:
> >>
> >>> We are testing 6.6.1.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>> On Mar 29, 2019, at 11:02 AM, Walter Underwood  >
> >>> wrote:
> >>>>
> >>>> In testing, hl.preserveMulti=true works with the unified highlighter.
> >>> But the documentation says that the parameter is only implemented in
> the
> >>> original highlighter.
> >>>>
> >>>> Is the documentation wrong? Can we trust this to keep working with
> >>> unified?
> >>>>
> >>>> wunder
> >>>> Walter Underwood
> >>>> wun...@wunderwood.org
> >>>> http://observer.wunderwood.org/  (my blog)
> >>>>
> >>>>> On Mar 26, 2019, at 12:08 PM, Walter Underwood <
> wun...@wunderwood.org
> >>>
> >>> wrote:
> >>>>>
> >>>>> It looks like hl.preserveMulti is only implemented in the Original
> >>> highlighter. Has anyone looked at doing this for the Unified
> highlighter?
> >>>>>
> >>>>> We need to preserve order in the highlights for a multi-valued field.
> >>>>>
> >>>>> wunder
> >>>>> Walter Underwood
> >>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
> >>>>> http://observer.wunderwood.org/  (my blog)
> >>>>>
> >>>>
> >>>
> >>>
> >>
>
>


Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread David Smiley
Instead of stripping the HTML for the stored value, leave it be and remove
it during the analysis stage with solr.HTMLStripCharFilterFactory
<https://builds.apache.org/job/Solr-reference-guide-master/javadoc/charfilterfactories.html#solr-htmlstripcharfilterfactory>
This means the searchable text will only be the visible text, basically.
And the highlighter will only highlight what's searchable.
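
For a feel of what that does, here's a rough Lucene-level equivalent of such
an analysis chain (the real field type would declare the same factories in
the schema):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilterFactory;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;

// HTML is stripped before tokenizing, so only visible text becomes tokens;
// e.g. "<b>Zelda</b> rules" analyzes to [zelda, rules].
Analyzer analyzer = CustomAnalyzer.builder()
    .addCharFilter(HTMLStripCharFilterFactory.class)
    .withTokenizer(StandardTokenizerFactory.class)
    .addTokenFilter(LowerCaseFilterFactory.class)
    .build();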

I suggest doing some experimentation for searching for words that you know
are directly adjacent (no spaces) to opening and closing tags to make sure
that the inserted HTML markup for the highlight balances correctly.  Use a
"phrase query" (quoted) as well, and see if you can highlight around markup
like "phrasequery" to see what happens.  You might need to set
hl.weightMatches=false to ensure the words separately are highlighted.  I
suspect you will find there is a problem, and the root cause is here:
LUCENE-5734 <https://issues.apache.org/jira/browse/LUCENE-5734>   It's on
my long TODO list but hasn't bitten me lately so I've neglected it.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, May 24, 2020 at 7:20 AM Serkan KAZANCI 
wrote:

> Thanks Jörn for the answer,
>
> I use post tool to index html documents, so the html tags are stripped
> when indexed and stored. The remaining text is mapped to the field content
> by default.
>
> hl.fragsize=0 works perfect for the indexed document, but I can only
> display highlighted text-only version of html document because the html
> tags are stripped.
>
> So is it possible to index and store the html document without stripping
> the html tags, so that when the document is displayed with hl.fragsize=0
> parameter, it is displayed as original html document?
>
> Or
>
> Is it possible to give a whole html document as a parameter to the Unified
> highlighter so that output is also a highlighted html document?
>
> Or
>
> Do you have a better idea to highlight the keywords of the whole html
> document?
>
>  Thanks,
>
>  Serkan
>
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Sunday, May 24, 2020 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting a whole html document using Unified highlighter
>
> hl.fragsize=0
>
> https://lucene.apache.org/solr/guide/8_5/highlighting.html
>
>
>
> > On 24.05.2020 at 11:49, Serkan KAZANCI wrote:
> >
> > Hi,
> >
> >
> >
> > I use Solr to search over a million html documents. When a document is
> > searched and displayed, I want to highlight the keywords that are used to
> > find and access the document.
> >
> >
> >
> > Unified highlighter is fast, accurate and supports different languages
> but
> > only highlights passages with given parameters.
> >
> >
> >
> > How can I highlight a whole html document using Unified highlighter? I
> have
> > written PHP code but it cannot do the complex word stemming functions.
> >
> >
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Serkan
> >
>
>


Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread David Smiley
These strategies are not mutually exclusive.  Yes I do suggest having the
HTML in whole go into one searchable field to satisfy your highlighting
use-case.  But I can imagine you will also want some document metadata in
separate fields.  It's up to you to parse that out somehow and add it.  You
mentioned you are using bin/post but, IMO, that capability is more for
quick experimentation / tutorials, some POCs, or very simple use-cases.  I
doubt you can do what I suggest while still using bin/post.  You might be
able to use "SolrCell" AKA ExtractingRequestHandler directly, which is what
bin/post does with HTML.

Good luck!

~ David


On Sun, May 24, 2020 at 10:52 AM Serkan KAZANCI 
wrote:

> Hi David,
>
> I have many meta-tags in html documents like <meta name="..."
> content="2019-10-15T23:59:59Z"> which matches the field descriptions in
> schema file.
>
> As I understand, you propose to index the whole html document as one text
> file and map it to a search field (do you?). That would take care of the
> html highlight issue; however, I would lose the field information coming
> from meta-tags.
>
> So is it possible to index the html document as html document ?
> (preserving the field data coming from meta-tags and not strip the html
> tags)
>
> Then I could use solr.HTMLStripCharFilterFactory for analysis.
>
> Thank You,
>
> Serkan,
>
>
>
>
> -Original Message-
> From: David Smiley [mailto:dsmi...@apache.org]
> Sent: Sunday, May 24, 2020 5:26 PM
> To: solr-user
> Subject: Re: highlighting a whole html document using Unified highlighter
>
> Instead of stripping the HTML for the stored value, leave it be and remove
> it during the analysis stage with solr.HTMLStripCharFilterFactory
> <
> https://builds.apache.org/job/Solr-reference-guide-master/javadoc/charfilterfactories.html#solr-htmlstripcharfilterfactory
> >
> This means the searchable text will only be the visible text, basically.
> And the highlighter will only highlight what's searchable.
>
> I suggest doing some experimentation for searching for words that you know
> are directly adjacent (no spaces) to opening and closing tags to make sure
> that the inserted HTML markup for the highlight balances correctly.  Use a
> "phrase query" (quoted) as well, and see if you can highlight around markup
> like "phrasequery" to see what happens.  You might need to set
> hl.weightMatches=false to ensure the words separately are highlighted.  I
> suspect you will find there is a problem, and the root cause is here:
> LUCENE-5734 <https://issues.apache.org/jira/browse/LUCENE-5734>   It's on
> my long TODO list but hasn't bitten me lately so I've neglected it.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, May 24, 2020 at 7:20 AM Serkan KAZANCI 
> wrote:
>
> > Thanks Jörn for the answer,
> >
> > I use post tool to index html documents, so the html tags are stripped
> > when indexed and stored. The remaining text is mapped to the field
> content
> > by default.
> >
> > hl.fragsize=0 works perfect for the indexed document, but I can only
> > display highlighted text-only version of html document because the html
> > tags are stripped.
> >
> > So is it possible to index and store the html document without stripping
> > the html tags, so that when the document is displayed with hl.fragsize=0
> > parameter, it is displayed as original html document?
> >
> > Or
> >
> > Is it possible to give a whole html document as a parameter to the
> Unified
> > highlighter so that output is also a highlighted html document?
> >
> > Or
> >
> > Do you have a better idea to highlight the keywords of the whole html
> > document?
> >
> >  Thanks,
> >
> >  Serkan
> >
> > -Original Message-
> > From: Jörn Franke [mailto:jornfra...@gmail.com]
> > Sent: Sunday, May 24, 2020 1:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: highlighting a whole html document using Unified highlighter
> >
> > hl.fragsize=0
> >
> > https://lucene.apache.org/solr/guide/8_5/highlighting.html
> >
> >
> >
> > > Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI :
> > >
> > > Hi,
> > >
> > >
> > >
> > > I use solr to search over a million html documents, when a document is
> > > searched and displayed, I want to highlight the keywords that are used
> to
> > > find and access the document.
> > >
> > >
> > >
> > > Unified highlighter is fast, accurate and supports different languages
> > but
> > > only highlights passages with given parameters.
> > >
> > >
> > >
> > > How can I highlight a whole html document using Unified highlighter? I
> > have
> > > written a php code but it cannot do the complex word stemming
> functions.
> > >
> > >
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > Serkan
> > >
> >
> >
>
>


Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread David Smiley
Wow that's terrible!
So this problem is for SENTENCE in particular, and it's a regression in
8.5?  I'll see if I can reproduce this with the Lucene benchmark module.

I figure you have some meaty text, like "page" size or longer?

~ David


On Mon, May 25, 2020 at 10:38 AM Michal Hlavac  wrote:

> I did same test on solr 8.4.1 and response times are same for both
> hl.bs.type=SENTENCE and hl.bs.type=WORD
>
> m.
>
> On Monday 25 May 2020 15:28:24 CEST Michal Hlavac wrote:
>
>
> Hi,
>
> I have field:
> <field name="content_txt_sk_highlight" ... stored="true" indexed="false" storeOffsetsWithPositions="true"/>
>
> and configuration:
> true
> unified
> true
> content_txt_sk_highlight
> 2
> true
>
> Doing query with hl.bs.type=SENTENCE it takes around 1000 - 1300 ms which
> is really slow.
> Same query with hl.bs.type=WORD takes from 8 - 45 ms
>
> is this normal behaviour or should I create issue?
>
> thanks, m.
>
>
>


Re: unified highlighter performance in solr 8.5.1

2020-05-26 Thread David Smiley
Please create an issue.  I haven't reproduced it yet but it seems unlikely
to be user-error.

~ David


On Mon, May 25, 2020 at 9:28 AM Michal Hlavac  wrote:

> Hi,
>
> I have field:
> <field name="content_txt_sk_highlight" ... stored="true" indexed="false" storeOffsetsWithPositions="true"/>
>
> and configuration:
> true
> unified
> true
> content_txt_sk_highlight
> 2
> true
>
> Doing query with hl.bs.type=SENTENCE it takes around 1000 - 1300 ms which
> is really slow.
> Same query with hl.bs.type=WORD takes from 8 - 45 ms
>
> is this normal behaviour or should I create issue?
>
> thanks, m.
>


Re: unified highlighter performance in solr 8.5.1

2020-05-27 Thread David Smiley
Try setting hl.fragsizeIsMinimum=true.
I did some benchmarking and found that this helps quite a bit.


BTW I used the highlights.alg benchmark file, with some changes to make it
more reflective of your scenario -- offsets in postings, and used "enwiki"
(english wikipedia) docs which are larger than the Reuters ones (so it
appears, anyway).  I had to do a bit of hacking to use the
LengthGoalBreakIterator, which wasn't previously used by this framework.

~ David


On Tue, May 26, 2020 at 4:42 PM Michal Hlavac  wrote:

> fine, I'll try to write a simple test, thanks
>
>
>
> On Tuesday 26 May 2020 17:44:52 CEST David Smiley wrote:
>
> > Please create an issue.  I haven't reproduced it yet but it seems
> unlikely
>
> > to be user-error.
>
> >
>
> > ~ David
>
> >
>
> >
>
> > On Mon, May 25, 2020 at 9:28 AM Michal Hlavac  wrote:
>
> >
>
> > > Hi,
>
> > >
>
> > > I have field:
>
> > > 
> > > stored="true" indexed="false" storeOffsetsWithPositions="true"/>
>
> > >
>
> > > and configuration:
>
> > > true
>
> > > unified
>
> > > true
>
> > > content_txt_sk_highlight
>
> > > 2
>
> > > true
>
> > >
>
> > > Doing query with hl.bs.type=SENTENCE it takes around 1000 - 1300 ms
> which
>
> > > is really slow.
>
> > > Same query with hl.bs.type=WORD takes from 8 - 45 ms
>
> > >
>
> > > is this normal behaviour or should I create issue?
>
> > >
>
> > > thanks, m.
>
> > >
>
> >
>
>


Re: Why Did It Match?

2020-05-29 Thread David Smiley
I've used the highlighter in the past for this but it has to do a lot more
work than "explain".  Typically that extra work is analysis of the fields'
text again.  Still, the highlighter can make sense when the individual
fields aren't otherwise searchable because you are searching on an
aggregate catch-all field.
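
For example (parameter values illustrative):
hl=true&hl.fl=*&hl.requireFieldMatch=true -- a field that comes back with a
highlighted snippet is one the query actually matched on.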

~ David


On Thu, May 28, 2020 at 6:40 PM Walter Underwood 
wrote:

> Are you sure they will wonder? I’d try it without that and see if the
> simpler UI is easier to use. Simple almost always wins the A/B test.
>
> You can use the highlighter to see if a field matched a term. Only use
> explain if you need all the scores.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 28, 2020, at 3:37 PM, Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
> >
> > Thank you.
> >
> > The problem is that Endeca just provided this information. The website
> users see how each search result matched the query.
> > For example this is displayed for a hit:
> > 1 Product Result
> >
> > |  Match Criteria: Material, Product Number
> >
> > The business users will wonder why we cannot provide this information
> with the new system.
> >
> > -Original Message-
> > From: Erick Erickson 
> > Sent: Thursday, May 28, 2020 4:38 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Why Did It Match?
> >
> > Yes, debug=explain is expensive. Expensive in the sense that I’d never
> add it to every query. But if your business users are trying to understand
> why query X came back the way it did by examining individual queries, then
> I wouldn’t worry.
> >
> > You can easily see how expensive it is in your situation by looking at
> the timings returned. Debug is just a component just like facet etc and the
> time it takes is listed separately in the timings section of debug output…
> >
> > Best,
> > Erick
> >
> >> On May 28, 2020, at 4:52 PM, Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
> >>
> >> My concern was that I thought that explain is resource heavy, and was
> only used for debugging queries.
> >>
> >> -Original Message-
> >> From: Doug Turnbull 
> >> Sent: Thursday, May 21, 2020 4:06 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Why Did It Match?
> >>
> >> Is your concern that the Solr explain functionality is slower than
> Endecas?
> >> Or harder to understand/interpret?
> >>
> >> If the latter, I might recommend http://splainer.io as one solution
> >>
> >> On Thu, May 21, 2020 at 4:52 PM Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
> >>
> >>> My company is working on a new website. The old/current site is
> >>> powered by Endeca. The site under development is powered by Solr
> >>> (currently 7.7.2)
> >>>
> >>> Out of the box, Endeca provides the capability to show how a query
> >>> was matched in the search. The business users like this
> >>> functionality, in solr this functionality is an expensive debug
> >>> option. Is there another way to get this information from a query?
> >>>
> >>> Webster Homer
> >>>
> >>>
> >>>
> >>> This message and any attachment are confidential and may be
> >>> privileged or otherwise protected from disclosure. If you are not the
> >>> intended recipient, you must not copy this message or attachment or
> >>> disclose the contents to any other person. If you have received this
> >>> transmission in error, please notify the sender immediately and
> >>> delete the message and any attachment from your system. Merck KGaA,
> >>> Darmstadt, Germany and any of its subsidiaries do not accept
> >>> liability for any omissions or errors in this message which may arise
> >>> as a result of E-Mail-transmission or for damages resulting from any
> >>> unauthorized changes of the content of this message and any
> >>> attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> >>> subsidiaries do not guarantee that this message is free of viruses
> >>> and does not accept liability for any damages caused by any virus
> transmitted therewith.
> >>>
> >>>
> >>>
> >>> Click http://www.merckgroup.com/disclaimer to access the German,
> >>> French, Spanish and Portuguese versions of this disclaimer.
> >>>
> >>
> >>
> >> --
> >> *Doug Turnbull **| CTO* | OpenSource Connections
> >> , LLC | 240.476.9983
> >> Author: Relevant Search ; Contributor:
> *AI Powered Search * This e-mail and all
> contents, including attachments, is considered to be Company Confidential
> unless explicitly stated otherwise, regardless of whether attachments are
> marked as such.
> >>

Re: Facet Performance

2020-06-17 Thread David Smiley
I strongly recommend setting indexed=true on a field you facet on for the
purposes of efficient refinement (fq=field:value).  But strictly speaking it
isn't required, as you have discovered.
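
As a sketch (the field name is taken from the thread below; the type and
stored setting are assumptions, not from this thread), that looks like:

  <field name="D_Destination" type="string" indexed="true"
         docValues="true" stored="false"/>

so refinement's fq=D_Destination:value queries hit the indexed terms while
the counts themselves come from docValues.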

~ David


On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney 
wrote:

> facet.method=enum works by executing a query (against indexed values)
> for each indexed value in a given field (which, for indexed=false, is
> "no values"). So that explains why facet.method=enum no longer works.
> I was going to suggest that you might not want to set indexed=false on
> the docValues facet fields anyway, since the indexed values are still
> used for facet refinement (assuming your index is distributed).
>
> What's the number of unique values in the relevant fields? If it's low
> enough, setting docValues=false and indexed=true and using
> facet.method=enum (with a sufficiently large filterCache) is
> definitely a viable option, and will almost certainly be faster than
> docValues-based faceting. (As an aside, noting for future reference:
> high-cardinality facets over high-cardinality DocSet domains might be
> able to benefit from a term facet count cache:
> https://issues.apache.org/jira/browse/SOLR-13807)
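>
> As a sketch (the cache class and sizes are illustrative, not a
> recommendation), that combination is:
>
>   facet=true&facet.field=D_Destination&facet.method=enum
>
> plus a filterCache in solrconfig.xml sized to hold roughly one entry per
> unique value enumerated, e.g.:
>
>   <filterCache class="solr.FastLRUCache" size="16384"
>                initialSize="4096" autowarmCount="0"/>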
>
> I think you didn't specifically mention whether you acted on Erick's
> suggestion of setting "uninvertible=false" (I think Erick accidentally
> said "uninvertible=true") to fail fast. I'd also recommend doing that,
> perhaps even above all else -- it shouldn't actually *do* anything,
> but will help ensure that things are behaving as you expect them to!
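>
> In schema terms that is just adding the attribute to the fields in
> question (a sketch; field name from the thread below, other attributes
> as you already have them):
>
>   <field name="D_DepartureAirport" type="string" docValues="true"
>          uninvertible="false"/>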
>
> Michael
>
> On Wed, Jun 17, 2020 at 4:31 AM James Bodkin
>  wrote:
> >
> > Thanks, I've implemented some queries that improve the first-hit
> > execution for faceting.
> >
> > Since turning off indexed on those fields, we've noticed that
> > facet.method=enum no longer returns the facets when used.
> > Using facet.method=fc/fcs is significantly slower than facet.method=enum
> > for us. Why do these two methods behave so differently?
> >
> > On 16/06/2020, 17:52, "Erick Erickson"  wrote:
> >
> > Ok, I see the disconnect... Necessary parts of the index are read from
> > disk lazily. So your newSearcher or firstSearcher query needs to do
> > whatever operation causes the relevant parts of the index to be read. In
> > this case, probably just facet on all the fields you care about. I'd add
> > sorting too if you sort on different fields.
> >
> > The *:* query without facets or sorting does virtually nothing due to
> > some special handling...
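> >
> > Concretely, something like this in solrconfig.xml (a sketch; the field
> > names are from your Query 1 below, and you'd add the sorts you actually
> > use as further entries):
> >
> >   <listener event="newSearcher" class="solr.QuerySenderListener">
> >     <arr name="queries">
> >       <lst>
> >         <str name="q">*:*</str>
> >         <str name="facet">true</str>
> >         <str name="facet.field">D_DepartureAirport</str>
> >         <str name="facet.field">D_Destination</str>
> >         <str name="rows">0</str>
> >       </lst>
> >     </arr>
> >   </listener>
> >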
> >
> > On Tue, Jun 16, 2020, 10:48 James Bodkin <
> james.bod...@loveholidays.com>
> > wrote:
> >
> > > I've been trying to build a query that I can use in newSearcher based
> > > off the information in your previous e-mail. I thought you meant to
> > > build a *:* query as per Query 1 in my previous e-mail, but I'm still
> > > seeing the first-hit execution penalty.
> > > Now I'm wondering if you meant to create a *:* query with each of the
> > > fields as part of the fl query parameters, or a *:* query with each of
> > > the fields and values as part of the fq query parameters.
> > >
> > > At the moment I've been running these manually, as I expected the
> > > first-execution penalty to disappear by the time I got to query 4,
> > > since I thought this would replicate the actions of the newSearcher.
> > > Unfortunately we can't use the autowarm count that is available as
> > > part of the filterCache due to the custom deployment mechanism we use
> > > to update our index.
> > >
> > > Kind Regards,
> > >
> > > James Bodkin
> > >
> > > On 16/06/2020, 15:30, "Erick Erickson" 
> wrote:
> > >
> > > Did you try the autowarming like I mentioned in my previous
> e-mail?
> > >
> > > > On Jun 16, 2020, at 10:18 AM, James Bodkin <
> > > james.bod...@loveholidays.com> wrote:
> > > >
> > > > We've changed the schema to enable docValues for these fields, and
> > > > this led to an improvement in the response time. We found a further
> > > > improvement by also switching off indexed, as these fields are used
> > > > for faceting and filtering only.
> > > > Since those changes, we've found that the first-execution penalty
> > > > for queries is really noticeable. I thought this would be the
> > > > filterCache, based on what I saw in NewRelic; however, it is probably
> > > > trying to read the docValues from disk. How can we use the
> > > > autowarming to improve this?
> > > >
> > > > For example, I've run the following queries in sequence and each
> > > > query has a first-execution penalty.
> > > >
> > > > Query 1:
> > > >
> > > > q=*:*
> > > > facet=true
> > > > facet.field=D_DepartureAirport
> > > > facet.field=D_Destination
> > > > facet.limit=-1
> > > > rows=0
> > >

Re: Master Slave Terminology

2020-06-17 Thread David Smiley
The discussion was on priv...@lucene.apache.org, but it should have been
public; expect it to spill out to the dev list today.

~ David


On Wed, Jun 17, 2020 at 11:14 AM Mike Drob  wrote:

> Hi Jan,
>
> Can you link to the discussion? I searched the dev list and didn’t see
> anything, is it on slack or a jira or somewhere else?
>
> Mike
>
> On Wed, Jun 17, 2020 at 1:51 AM Jan Høydahl  wrote:
>
> > Hi Kaya,
> >
> > Thanks for bringing it up. The topic is already being discussed by
> > developers, so expect to see some change in this area; not overnight,
> > but incremental.
> > Also, if you want to lend a helping hand, patches are more than welcome
> as
> > always.
> >
> > Jan
> >
> > > On 17 Jun 2020, at 04:22, Kayak28 wrote:
> > >
> > > Hello, Community:
> > >
> > > As GitHub and Python are replacing terminology related to slavery,
> > > why don't we replace master-slave in Solr as well?
> > >
> > > https://developers.srad.jp/story/18/09/14/0935201/
> > >
> >
> https://developer-tech.com/news/2020/jun/15/github-replace-slavery-terms-master-whitelist/
> > >
> > > --
> > >
> > > Sincerely,
> > > Kaya
> > > github: https://github.com/28kayak
> >
> >
>


Re: unified highlighter performance in solr 8.5.1

2020-07-03 Thread David Smiley
I think we should flip the default of hl.fragsizeIsMinimum to 'true', and
thus get behavior close to what preceded 8.5.
(a) it was very recently (<= 8.4) the previous behavior and so may require
less tuning for users in 8.6 henceforth
(b) it's significantly faster for long text -- seems to be 2x to 5x for
long documents (assuming no change in hl.fragAlignRatio).  If the user
additionally configures hl.fragAlignRatio to 0 (also the previous behavior;
0.5 is the new default), I saw another 6x on top of that for "doc3" in the
test data Michal prepared.

Although I like that the sizing looks nicer, I think that is more from the
introduction and new default of hl.fragAlignRatio=0.5 than it is
hl.fragsizeIsMinimum=false.  We might even consider lowering
hl.fragAlignRatio to say 0.3 and retain pretty reasonable highlights
(avoids the extreme cases occurring with '0') and additional performance
benefit from that.
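
(For anyone who wants the pre-8.5 behavior today, without waiting on a
default change, the request-time settings discussed above would be:

  hl.method=unified&hl.fragsizeIsMinimum=true&hl.fragAlignRatio=0

or keep hl.fragAlignRatio at something like 0.3 for nicer fragment
centering at a smaller cost.)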

What do you think Nandor, Michal?

I'm hoping a change in settings (+ some better notes/docs on this) could
slip into an 8.6, all done by myself ASAP.

~ David


On Fri, Jun 19, 2020 at 2:32 PM Nándor Mátravölgyi 
wrote:

> Hi!
>
> With the provided test I've profiled the preceding() and following()
> calls on the base Java iterators in the different options.
>
> === default highlighter arguments ===
> Calling the test query with SENTENCE base iterator:
> - from LengthGoalBreakIterator.following(): 1130 calls of
> baseIter.preceding() took 1.039629 seconds in total
> - from LengthGoalBreakIterator.following(): 1140 calls of
> baseIter.following() took 0.340679 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1150 calls of
> baseIter.preceding() took 0.099344 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1100 calls of
> baseIter.following() took 0.015156 seconds in total
>
> Calling the test query with WORD base iterator:
> - from LengthGoalBreakIterator.following(): 1200 calls of
> baseIter.preceding() took 0.001006 seconds in total
> - from LengthGoalBreakIterator.following(): 1700 calls of
> baseIter.following() took 0.006278 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1710 calls of
> baseIter.preceding() took 0.016320 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1090 calls of
> baseIter.following() took 0.000527 seconds in total
>
> === hl.fragsizeIsMinimum=true&hl.fragAlignRatio=0 ===
> Calling the test query with SENTENCE base iterator:
> - from LengthGoalBreakIterator.following(): 860 calls of
> baseIter.following() took 0.012593 seconds in total
> - from LengthGoalBreakIterator.preceding(): 870 calls of
> baseIter.preceding() took 0.022208 seconds in total
>
> Calling the test query with WORD base iterator:
> - from LengthGoalBreakIterator.following(): 1360 calls of
> baseIter.following() took 0.004789 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1370 calls of
> baseIter.preceding() took 0.015983 seconds in total
>
> === hl.fragsizeIsMinimum=true ===
> Calling the test query with SENTENCE base iterator:
> - from LengthGoalBreakIterator.following(): 980 calls of
> baseIter.following() took 0.010253 seconds in total
> - from LengthGoalBreakIterator.preceding(): 980 calls of
> baseIter.preceding() took 0.341997 seconds in total
>
> Calling the test query with WORD base iterator:
> - from LengthGoalBreakIterator.following(): 1670 calls of
> baseIter.following() took 0.005150 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1680 calls of
> baseIter.preceding() took 0.013657 seconds in total
>
> === hl.fragAlignRatio=0 ===
> Calling the test query with SENTENCE base iterator:
> - from LengthGoalBreakIterator.following(): 1070 calls of
> baseIter.preceding() took 1.312056 seconds in total
> - from LengthGoalBreakIterator.following(): 1080 calls of
> baseIter.following() took 0.678575 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1080 calls of
> baseIter.preceding() took 0.020507 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1080 calls of
> baseIter.following() took 0.006977 seconds in total
>
> Calling the test query with WORD base iterator:
> - from LengthGoalBreakIterator.following(): 880 calls of
> baseIter.preceding() took 0.000706 seconds in total
> - from LengthGoalBreakIterator.following(): 1370 calls of
> baseIter.following() took 0.004110 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1380 calls of
> baseIter.preceding() took 0.014752 seconds in total
> - from LengthGoalBreakIterator.preceding(): 1380 calls of
> baseIter.following() took 0.000106 seconds in total
>
> There is definitely a big difference between SENTENCE and WORD. I'm
> not sure how we can improve the logic on our side while keeping the
> features as is. Since the number of calls is roughly the same for when
> the performance is good and bad, it seems to depend on what the text
> is that the iterator is traversing.
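>
> (For anyone reproducing this: the base iterator here is what hl.bs.type
> selects, i.e. hl.bs.type=SENTENCE versus hl.bs.type=WORD on the request.)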
>

