Re: storing the analyzed value

2017-04-01 Thread Rick Leir

stored="true"  (the default)

https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties
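
A minimal schema.xml example for reference (field and type names here are
placeholders, not from your schema):

  <field name="title" type="text_general" indexed="true" stored="true"/>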

On 2017-03-31 01:55 PM, John Blythe wrote:

hey all

i'm wanting to store one of my fields' analyzed tokens for retrieval. is
there any way to do this? the preliminary googling i'd done turned up
discussions from 2007-2010; i didn't notice anything very recent touching on
the concept.

thanks-





Re: Suggestions with EdgeNGramFilterFactory and FuzzyLookupFactory

2017-04-01 Thread Alexandre Rafalovitch
Why do you think you need that filter when you are already using the
suggester component?

What specific case is it supposed to solve?

Regards,
   Alex

On 31 Mar 2017 11:30 PM, "Alexis Aravena Silva" 
wrote:

> Hello All,
>
>
> I'm using the suggester component in Solr 6.4 with FuzzyLookupFactory and
> AnalyzingInfixLookupFactory. Everything was OK until I added
> EdgeNGramFilterFactory to my field type definition. After loading 8
> documents (I index manually), the indexing process consumes 16GB of my
> hard disk, which is very strange. This happens only with the
> FuzzyLookupFactory. During indexing I noticed that Solr creates a temp
> file in "solr-6.4.0\server\tmp". This is my configuration:
>
> solrconfig.xml:
>
> 
> 
>   fuzzySuggester
>   FuzzyLookupFactory
>   fuzzy_suggestions
>   DocumentDictionaryFactory
>   _sugerencia_
>   idTipoRegistro
>   text_suggestion
>   false
>   false
> 
> 
>   infixSuggester
>   AnalyzingInfixLookupFactory
>   infix_suggestions
>   DocumentDictionaryFactory
>   _sugerencia_
>   idTipoRegistro
>   text_suggestion
>   false
>   false
> 
>   
>startup="lazy" >
> 
>   true
>   infixSuggester
>   fuzzySuggester
>   true
>   10
>   true
> 
> 
>   suggest
> 
>   
>
>
>
> shema.xml
>
>
>  stored="true" multiValued="false" />
>
>
>  positionIncrementGap="100" multiValued="true">
>   
> 
>  words="stopwords.txt" />
> 
>  maxGramSize="50" />
> 
>   
>   
> 
>  words="stopwords.txt" />
>  ignoreCase="true" expand="true"/>
> 
> 
>   
> 
>
>
> If I remove EdgeNGramFilterFactory everything works ok, but I require this
> filter for the suggestions.
>
>
> What is the problem?
>
>
> Regards,
>
> Alexis Aravena S.
>
> Scrum Master & Agile Coach
>
> Celular: +569 69080134
>
> Correo: aarav...@itsofteg.com
>
>


Re: Phrase Fields performance

2017-04-01 Thread Shawn Heisey
On 3/31/2017 1:55 PM, David Hastings wrote:
> So I un-commented out the line, to enable it to go against 6 important
> fields. Afterwards through monitoring performance I noticed that my
> searches were taking roughly 50% to 100% (2x!) longer, and it started
> at the exact time I committed that change, 1:40 pm, qtimes below in a
> 15 minute average cycle with the start time listed. 

That is fully expected.  Using both pf and qf basically has Solr doing
the exact same queries twice, once as specified on fields in qf, then
again as a phrase query on fields in pf.  If you add pf2 and/or pf3, you
can expect further speed drops.

If you're sorting by relevancy, using pf with higher boosts than qf
generally will make your results better, but it comes at a cost in
performance.
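
For reference, the relevant edismax parameters look roughly like this
(field names and boosts are placeholders, not your actual config):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">title^2 body author</str>
      <str name="pf">title^4 body^2</str>
    </lst>
  </requestHandler>

Every field listed in pf adds a phrase query on top of the per-term queries
built from qf, which is where the extra work comes from.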

Thanks,
Shawn



Re: SOLr 6.2.1, dealing with the redirected SOLr web admin

2017-04-01 Thread Shawn Heisey
On 3/31/2017 1:42 PM, Stewart, Scott A. CTR OSD/DoDEA wrote:
> It seems to be working once I created a dummy core... 

As you may have already figured out, and Alexandre discussed:

The admin UI does not run inside the Solr server.  It runs in your
browser.  When you use a URL in a browser with the # character, that
character and everything that follows it are *NOT* sent to the web
server.  Those are used by the admin UI javascript that is running in
your browser.

URLs with # in them should never be used in a program context.  You need
to talk to API endpoints that do not contain that character.

Possible global URLs you could use to verify that the server is at least
UP (but do not necessarily guarantee full functionality):

http://server:port/solr/admin/info/system
http://server:port/solr/admin/cores
http://server:port/solr/admin/collections (only if running in cloud mode)

What you'll probably want is to check full functionality of one core:

http://server:port/solr/mycollection/admin/ping

The ping handler can be configured in solrconfig.xml, and can include a
healthcheck file, which allows the handler to be disabled so that the check
fails even when the core itself is perfectly fine.  I am using the ping
handler as the
healthcheck URL in my load balancer (haproxy).  The core that I'm using
for my healthcheck is an aggregator core for a distributed (sharded)
index, so it actually checks the health of multiple cores on multiple
servers with a single URL call.
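
For reference, a minimal ping handler definition with a healthcheck file
looks something like this (the file name is just an example):

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">*:*</str>
    </lst>
    <str name="healthcheckFile">server-enabled.txt</str>
  </requestHandler>

With healthcheckFile set, the ping fails whenever that file is absent, so a
node can be taken out of load balancer rotation without stopping Solr.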

Thanks,
Shawn



Re: storing the analyzed value

2017-04-01 Thread John Blythe
Hi Rick. I should explain further. I'm not looking to have the input stored
but rather the final product, specifically the synonym that an input may be
mapped to.

If I have McDonald, McD's, and Mac Donald all mapped to "McDonald's" I'd
like to be able to not only access which one was sent to solr for search
(e.g. "McD's") but _also_ the synonym it mapped to: "McDonald's"

Does this make more sense?

Thanks for any continued discussion

On Sat, Apr 1, 2017 at 8:50 AM Rick Leir  wrote:

> stored="true"  (the default)
>
>
> https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties
>
> On 2017-03-31 01:55 PM, John Blythe wrote:
> > hey all
> >
> > i'm wanting to store one of my field's analyzed token for retrieval. is
> > there any way to do this? the preliminary googling i'd done had
> discussions
> > from 2007-2010, i didn't notice anything very recent touching on the
> > concept.
> >
> > thanks-
> >
>
> --
-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713


Re: Fieldtype json supported in SOLR 5.4.0 or 5.4.1

2017-04-01 Thread Zheng Lin Edwin Yeo
Did you upgrade your solrconfig.xml to the Solr 6.0 version too?
There are some differences in the Solr 6.0 version, which requires a setting
to determine whether to use the managed schema or the classic schema (the
physical schema.xml file).
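
For example, to keep using the physical schema.xml you would declare the
classic schema factory in solrconfig.xml (a minimal sketch):

  <schemaFactory class="ClassicIndexSchemaFactory"/>

whereas the managed schema is selected with:

  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>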

Regards,
Edwin

On 1 April 2017 at 01:27, Abhijit Pawar  wrote:

> Hi Rick,
>
> I tried installing SOLR 6.0 since SOLR 6.0 has managed-schema and tried
> index the data from mongoDB :
>
>
> 
>  driver="com.mongodb.jdbc.MongoDriver" url="mongodb://
> ​<<>IP-Address>​
> :27017/
> ​<>
> "/>
> 
>  dataSource="mongod"
> transformer="TemplateTransformer,ProdsCatsFieldTransformer"
> onError="continue"
> pk="uuid"
> query="SELECT
> orgidStr,idStr,name,code,description,price,images,
> categoriesStr,enddate_solar,begin_date_solar,status_solar,
> current_stock_solar,retprice_solar,distprice_solar,
> listprice_solar,mfgprice_solar,out_of_stock_solar,hide_
> product_solar,saleprice_solar,metakey_solar,sales_enabled,
> new_product,has_sku,configurable,rating,updatedAt,comparable,hide_price
> FROM products"
> deltaImportQuery="SELECT
> orgidStr,idStr,name,code,description,price,images,
> categoriesStr,enddate_solar,begin_date_solar,status_solar,
> current_stock_solar,retprice_solar,distprice_solar,
> listprice_solar,mfgprice_solar,out_of_stock_solar,hide_
> product_solar,saleprice_solar,metakey_solar,sales_enabled,
> new_product,has_sku,configurable,rating,updatedAt,comparable,hide_price
> FROM products WHERE orgidStr = '${dataimporter.request.orgid}' AND idStr =
> '${dataimporter.delta.idStr}'"
> deltaQuery="SELECT idStr FROM products WHERE idStr =
> '${dataimporter.request.prodidStr}' AND orgidStr =
> '${dataimporter.request.orgid}'"
> >
> 
> 
>  template="org-${products.orgidStr}-prod-${products.idStr}"/>
> 
> 
> 
> 
>
>
> This is the error I get:
>
> getNext() failed for query 'SELECT
> orgidStr,idStr,name,code,description,price,images,
> categoriesStr,enddate_solar,begin_date_solar,status_solar,
> current_stock_solar,retprice_solar,distprice_solar,
> listprice_solar,mfgprice_solar,out_of_stock_solar,hide_
> product_solar,saleprice_solar,metakey_solar,sales_enabled,
> new_product,has_sku,configurable,rating,updatedAt,comparable,hide_price
> FROM products'
>
> :com.mongodb.MongoException$Network: can't call something : /
> ​<>
> :27017/
> ​<>
>
>
>
> Caused by: java.io.IOException: couldn't connect to [/
> ​
> ​<>:27017] bc:java.net.SocketTimeoutException: connect timed
> out
>
> Has anyone else gone through this kind of issue?
>
>
>
>
> On Tue, Mar 28, 2017 at 6:20 PM, Rick Leir  wrote:
>
> > Abhijit
> > In Mongo you probably have one JSON record per document. You can post
> that
> > JSON record to Solr, and the JSON fields get indexed. The github project
> > you mention does just that. If you use the Solr managed schema then Solr
> > will automatically define fields based on what it receives. Otherwise you
> > will need to carefully design a schema.xml.
> > Cheers -- Rick
> >
> > On March 28, 2017 6:08:40 PM EDT, Abhijit Pawar <
> > abhijit.ibizs...@gmail.com> wrote:
> > >Hello All,
> > >
> > >I am working on a requirement to index field of type JSON (in mongoDB
> > >collection) in SOLR 5.4.0.
> > >
> > >I am using mongo-jdbc-dih which I found on GitHub :
> > >
> > >https://github.com/hrishik/solr-mongodb-dih
> > >
> > >However I could not find a fieldtype on Apache SOLR wiki page which
> > >would
> > >support JSON datatype in mongoDB.
> > >
> > >Can someone please recommend a way to include datatype / fieldtype in
> > >SOLR
> > >schema to support or index JSON data field from mongoDB.
> > >Thanks.
> > >
> > >Regards,
> > >
> > >Abhijit
> >
> > --
> > Sent from my Android device with K-9 Mail. Please excuse my brevity.
>


Re: Phrase Fields performance

2017-04-01 Thread Dave
Maybe CommonGrams could help with this, but it boils down to
speed/quality/cheap: choose two. Thanks
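
A rough sketch of what that looks like in the index analyzer (the words file
is a placeholder):

  <filter class="solr.CommonGramsFilterFactory" words="commongrams.txt" ignoreCase="true"/>

with solr.CommonGramsQueryFilterFactory on the query side, at the cost of a
larger index.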

> On Apr 1, 2017, at 10:28 AM, Shawn Heisey  wrote:
> 
>> On 3/31/2017 1:55 PM, David Hastings wrote:
>> So I un-commented out the line, to enable it to go against 6 important
>> fields. Afterwards through monitoring performance I noticed that my
>> searches were taking roughly 50% to 100% (2x!) longer, and it started
>> at the exact time I committed that change, 1:40 pm, qtimes below in a
>> 15 minute average cycle with the start time listed. 
> 
> That is fully expected.  Using both pf and qf basically has Solr doing
> the exact same queries twice, once as specified on fields in qf, then
> again as a phrase query on fields in pf.  If you add pf2 and/or pf3, you
> can expect further speed drops.
> 
> If you're sorting by relevancy, using pf with higher boosts than qf
> generally will make your results better, but it comes at a cost in
> performance.
> 
> Thanks,
> Shawn
> 


Re: SOLr 6.2.1, dealing with the redirected SOLr web admin

2017-04-01 Thread Alexandre Rafalovitch
Actually I think the ping handler is now one of the implicit handlers and
does not need configuration.

Regards,
Alex

On 1 Apr 2017 10:35 AM, "Shawn Heisey"  wrote:

> On 3/31/2017 1:42 PM, Stewart, Scott A. CTR OSD/DoDEA wrote:
> > It seems to be working once I created a dummy core...
>
> As you may have already figured out, and Alexandre discussed:
>
> The admin UI does not run inside the Solr server.  It runs in your
> browser.  When you use a URL in a browser with the # character, that
> character and everything that follows it are *NOT* sent to the web
> server.  Those are used by the admin UI javascript that is running in
> your browser.
>
> URLs with # in them should never be used in a program context.  You need
> to talk to API endpoints that do not contain that character.
>
> Possible global URLs you could use to verify that the server is at least
> UP (but do not necessarily guarantee full functionality):
>
> http://server:port/solr/admin/info/system
> http://server:port/solr/admin/cores
> http://server:port/solr/admin/collections (only if running in cloud mode)
>
> What you'll probably want is to check full functionality of one core:
>
> http://server:port/solr/mycollection/admin/ping
>
> The ping handler can be configured in solrconfig.xml, and can include a
> healthcheck file which allows the handler to be disabled so it fails the
> check even if it's perfectly fine.  I am using the ping handler as the
> healthcheck URL in my load balancer (haproxy).  The core that I'm using
> for my healthcheck is an aggregator core for a distributed (sharded)
> index, so it actually checks the health of multiple cores on multiple
> servers with a single URL call.
>
> Thanks,
> Shawn
>
>


Re: SOLr 6.2.1, dealing with the redirected SOLr web admin

2017-04-01 Thread Shawn Heisey
On 4/1/2017 9:24 AM, Alexandre Rafalovitch wrote:
> Actually I think the ping handler is now one of the implicit handlers
> and does not need configuration.

This is true.  I was saying that they could configure it beyond the
defaults, which I believe is required if the healthcheck file is desired.  I
have not tried it, though.

Thanks,
Shawn



Re: storing the analyzed value

2017-04-01 Thread Shawn Heisey
On 4/1/2017 8:51 AM, John Blythe wrote:
> Hi Rick. I should explain further. I'm not looking to have the input stored
> but rather the final product, specifically the synonym that an input may be
> mapped to.
>
> If I have McDonald, McD's, and Mac Donald all mapped to "McDonald's" I'd
> like to be able to not only access which one was sent to solr for search
> (e.g. "McD's") but _also_ the synonym it mapped to: "McDonald's"
>
> Does this make more sense?
>
> Thanks for any continued discussion

Generally speaking, search results contain EXACTLY the information that
was submitted for indexing, assuming that the value was stored.  In
newer versions, docValues can be used instead of stored in order to have
the field appear in results.  Index analysis only affects the search
index; what happens there will never show up in search results.  This is
a fundamental aspect of Solr's operation.
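
For illustration only (the field name is made up), either of these will
return the original value in results:

  <field name="vendor" type="string" indexed="true" stored="true"/>
  <field name="vendor" type="string" indexed="true" stored="false"
         docValues="true" useDocValuesAsStored="true"/>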

You *can* use a custom UpdateRequestProcessor.  For what you're talking
about, that would probably be a custom processor that you write yourself
and inject into Solr as a plugin.  Update processors are executed on the
indexed data, *before* it reaches the actual indexing part of Solr ...
so stored/docValues data is also affected.

https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
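
The wiring in solrconfig.xml would look roughly like this; the first
processor class is hypothetical, something you would have to write:

  <updateRequestProcessorChain name="capture-synonym">
    <!-- hypothetical custom plugin that copies the mapped synonym into another field -->
    <processor class="com.example.CaptureSynonymUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

and you would point your update handler defaults (or the update.chain
request parameter) at that chain.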

Thanks,
Shawn



Re: storing the analyzed value

2017-04-01 Thread Shawn Heisey
On 4/1/2017 8:51 AM, John Blythe wrote:
> Hi Rick. I should explain further. I'm not looking to have the input stored
> but rather the final product, specifically the synonym that an input may be
> mapped to.
>
> If I have McDonald, McD's, and Mac Donald all mapped to "McDonald's" I'd
> like to be able to not only access which one was sent to solr for search
> (e.g. "McD's") but _also_ the synonym it mapped to: "McDonald's"

Thought of an alternate interpretation of what you asked about -- info
on the *query* side, not the *index* side.  This addresses that
alternate interpretation:

If you add "&debugQuery=true" to the URL, you'll see a lot of debug
information.  One piece of useful information there is the original
query, and the *parsed* query.  By comparing those, you would be able to
detect the synonyms applied at query time.

There may also be a "debug=query" option that would only generate the
parsed query without spending the resources to generate the rest of the
debug info, but I am not familiar with all the debug options.  I know
that debugQuery=true will include that information, but the full debug
might cause your queries to become unacceptably slow.  There is a LOT of
info generated by the generic debugQuery.
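
For example (collection and field names are placeholders):

  http://server:port/solr/mycollection/select?q=vendor:"McD's"&debugQuery=true

then compare the "querystring" and "parsedquery" entries in the debug
section of the response to see which synonyms were expanded.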

Thanks,
Shawn



Re: Pagination bug? when sorting by a field (not unique field)

2017-04-01 Thread Pablo Anzorena
Excellent, guys, thank you very much!

On Mar 29, 2017 18:09, "Erick Erickson"  wrote:

> You might be helped by "distributed IDF".
> see: SOLR-1632
>
> On Wed, Mar 29, 2017 at 1:56 PM, Chris Hostetter
>  wrote:
> >
> > The thing to keep in mind is that w/o a fully deterministic sort,
> > the underlying problem statement "doc may appear on multiple pages" can
> > exist even in a single node solr index, even if no documents are
> > added/deleted between page requests: because background merges /
> > searcher re-opening may happen in between those page requests.
> >
> > The best practice, if you really care about ensuring no (non-updated) doc
> > is ever returned twice in subsequent pages, is to use a fully
> > deterministic sort, with a "tie breaker" clause that is unique to every
> > document (ie: uniqueKey field)
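> >
> > For example, with price as the primary sort (field names here are only
> > placeholders, id being the uniqueKey):
> >
> >   sort=price desc, id asc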
> >
> >
> >
> > : Date: Wed, 29 Mar 2017 23:14:22 +0300
> > : From: Mikhail Khludnev 
> > : Reply-To: solr-user@lucene.apache.org
> > : To: solr-user 
> > : Subject: Re: Pagination bug? when sorting by a field (not unique field)
> > :
> > : Great explanation, Alessandro!
> > :
> > : Let me briefly explain my experience. I have a tiny test with 2 shards
> and
> > : 2 replicas, index about a hundred of docs. And then when I fully
> paginate
> > : search results with score ranking, I've got duplicates across pages.
> And
> > : the reason is deletes, which occur probably due to update/failover.
> Every
> > : paging request lands to the different replica. There are a few
> workarounds:
> > : lands consequent requests to the same replicas; also  fixes
> > : duplicates; but tie-breaking is the best way for sure.
> > :
> > : On Wed, Mar 29, 2017 at 7:10 PM, alessandro.benedetti <
> a.benede...@sease.io>
> > : wrote:
> > :
> > : > The reason Mikhail mentioned that, is probably related to :
> > : >
> > : > *The way how number of document calculated is changed (LUCENE-6711)*
> > : > /The number of documents (docCount) is used to calculate term
> specificity
> > : > (idf) and average document length (avdl). Prior to LUCENE-6711,
> > : > collectionStats.maxDoc() was used for the statistics. Now,
> > : > collectionStats.docCount() is used whenever possible, if not
> maxDocs() is
> > : > used.
> > : > Assume that a collection contains 100 documents, and 50 of them have
> > : > "keywords" field. In this example, maxDocs is 100 while docCount is
> 50 for
> > : > the "keywords" field. The total number of tokens for "keywords"
> field is
> > : > divided by docCount to obtain avdl. Therefore, docCount which is the
> total
> > : > number of documents that have at least one term for the field, is a
> more
> > : > precise metric for optional fields.
> > : > DefaultSimilarity does not leverage avdl, so this change would have
> > : > relatively minor change in the result list. Because relative idf
> values of
> > : > terms will remain same. However, when combined with other factors
> such as
> > : > term frequency, relative ranking of documents could change. Some
> Similarity
> > : > implementations (such as the ones instantiated with NormalizationH2
> and
> > : > BM25) take account into avdl and would have notable change in ranked
> list.
> > : > Especially if you have a collection of documents with varying
> lengths.
> > : > Because NormalizationH2 tends to punish documents longer than avdl./
> > : >
> > : > This means that if you are load balancing, the page 2 query could go
> to
> > : > another replica, where the doc is scored differently, ending up on a
> > : > different position ( and maybe appearing again as a final effect).
> > : > This scenario is referred to scored ranking, so it will not affect
> sorting
> > : > (
> > : > and I believe in your initial mail you were referring not to sorting)
> > : >
> > : > Cheers
> > : >
> > : >
> > : > Pablo wrote
> > : > > Mikhall,
> > : > >
> > : > > effectively maxDocs are different and also deletedDocs, but
> numDocs are
> > : > > ok.
> > : > >
> > : > > I don't really get it, but can that be the problem?
> > : >
> > : >
> > : >
> > : >
> > : >
> > : > -
> > : > ---
> > : > Alessandro Benedetti
> > : > Search Consultant, R&D Software Engineer, Director
> > : > Sease Ltd. - www.sease.io
> > : > --
> > : > View this message in context: http://lucene.472066.n3.
> > : > nabble.com/Pagination-bug-when-sorting-by-a-field-not-unique-field-
> > : > tp4327408p4327461.html
> > : > Sent from the Solr - User mailing list archive at Nabble.com.
> > : >
> > :
> > :
> > :
> > : --
> > : Sincerely yours
> > : Mikhail Khludnev
> > :
> >
> > -Hoss
> > http://www.lucidworks.com/
>


Re: storing the analyzed value

2017-04-01 Thread Rick Leir

On 2017-04-01 10:51 AM, John Blythe wrote:

Hi Rick. I should explain further. I'm not looking to have the input stored
but rather the final product, specifically the synonym that an input may be
mapped to.

If I have McDonald, McD's, and Mac Donald all mapped to "McDonald's" I'd
like to be able to not only access which one was sent to solr for search
(e.g. "McD's") but _also_ the synonym it mapped to: "McDonald's"

Does this make more sense?

Thanks for any continued discussion

Shawn answered well, as always.
Here is possibly another way. You can enable highlighting, and scan the 
output to find the passage which is highlighted. But it can be a 
challenge to get highlighting working correctly (I have not yet used the 
new highlighter).
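
A hedged sketch of the request parameters (field name is a placeholder;
hl.method=unified selects the newer highlighter):

  q=vendor:"McD's"&hl=true&hl.fl=vendor&hl.method=unified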

cheers -- Rick


DataImportHandler OutOfMemory Mysql

2017-04-01 Thread marotosg
Hi,

I am trying to load a big table into Solr using DataImportHandler and MySQL.
I am getting an OutOfMemory error because Solr is trying to load the full
table. I have been reading different posts and tried batchSize="-1". 
https://wiki.apache.org/solr/DataImportHandlerFaq

Do you have any idea what could be the issue?
Completely lost here.

Solr.6.4.1
mysql-connector-java-5.1.41-bin.jar

data-config 



  
 
 
 
 
   
 
 

  
   
   

 
 
 
  



Thanks
Sergio



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-OutOfMemory-Mysql-tp4327982.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: storing the analyzed value

2017-04-01 Thread Joel Bernstein
You may find this blog interesting.
http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html

It deals with how the analyzer chain can now be applied in Streaming
Expressions. It will be part of the 6.6 release and is in master and
branch_6x already.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Apr 1, 2017 at 5:23 PM, Rick Leir  wrote:

> On 2017-04-01 10:51 AM, John Blythe wrote:
>
>> Hi Rick. I should explain further. I'm not looking to have the input
>> stored
>> but rather the final product, specifically the synonym that an input may
>> be
>> mapped to.
>>
>> If I have McDonald, McD's, and Mac Donald all mapped to "McDonald's" I'd
>> like to be able to not only access which one was sent to solr for search
>> (e.g. "McD's") but _also_ the synonym it mapped to: "McDonald's"
>>
>> Does this make more sense?
>>
>> Thanks for any continued discussion
>>
> Shawn answered well, as always.
> Here is possibly another way. You can enable highlighting, and scan the
> output to find the passage which is highlighted. But it can be a challenge
> to get highlighting working correctly (I have not yet used the new
> highlighter).
> cheers -- Rick
>


Re: storing the analyzed value

2017-04-01 Thread John Blythe
All good info guys. Appreciate the responses.

Some more on the use case: I'm wanting to display the platonic ideal of
sorts for a given set of potential vendor names.

In some cases I can get away with a lookup in our database based on the term
in the document (not its other synonyms) brought back from the search (in
the analytics-related query we do, for instance, leveraging the stats
component). But when we show the user the constituent data I'd like to show
the same platonic ideal. I could do the same lookup mentioned above for
display purposes, but if they want to sort on that column then my tricks
are suddenly exposed and I'm at a loss.

Does that make sense?

On Sat, Apr 1, 2017 at 6:28 PM Joel Bernstein  wrote:

> You may find this blog interesting.
>
> http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html
>
> It deals with how the analyzer chain can now be applied in Streaming
> Expressions. It will be part of the 6.6 release and is in master and
> branch_6x already.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, Apr 1, 2017 at 5:23 PM, Rick Leir  wrote:
>
> > On 2017-04-01 10:51 AM, John Blythe wrote:
> >
> >> Hi Rick. I should explain further. I'm not looking to have the input
> >> stored
> >> but rather the final product, specifically the synonym that an input may
> >> be
> >> mapped to.
> >>
> >> If I have McDonald, McD's, and Mac Donald all mapped to "McDonald's" I'd
> >> like to be able to not only access which one was sent to solr for search
> >> (e.g. "McD's") but _also_ the synonym it mapped to: "McDonald's"
> >>
> >> Does this make more sense?
> >>
> >> Thanks for any continued discussion
> >>
> > Shawn answered well, as always.
> > Here is possibly another way. You can enable highlighting, and scan the
> > output to find the passage which is highlighted. But it can be a
> challenge
> > to get highlighting working correctly (I have not yet used the new
> > highlighter).
> > cheers -- Rick
> >
>
-- 
-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713


Re: Suggestions with EdgeNGramFilterFactory and FuzzyLookupFactory

2017-04-01 Thread Alexis Aravena Silva
Hi Alexandre,


Using only the suggester, when I query the word "ferrada" I don't always get
results, and I don't know why. For example, if I query:


ferr: I get a result

ferra: I don't get a result

ferrad: I don't get a result

ferrada: I get a result


Then I thought that by typing letter by letter I would get results; that's why
I added the filter. I read that EdgeNGramFilterFactory allows suggestions
letter by letter.


Do you have any suggestion? I need every partially typed word to return a
result, please.



Regards.



From: Alexandre Rafalovitch 
Sent: Saturday, April 1, 2017 10:17:13 AM
To: solr-user
Subject: Re: Suggestions with EdgeNGramFilterFactory and FuzzyLookupFactory

Why do you think you need that filter when you are already using the
suggester component?

What specific case is it supposed to solve?

Regards,
   Alex

On 31 Mar 2017 11:30 PM, "Alexis Aravena Silva" 
wrote:

> Hello All,
>
>
> I'm using the suggester component in Solr 6.4 with FuzzyLookupFactory and
> AnalyzingInfixLookupFactory. Everything was OK until I added
> EdgeNGramFilterFactory to my field type definition. After loading 8
> documents (I index manually), the indexing process consumes 16GB of my
> hard disk, which is very strange. This happens only with the
> FuzzyLookupFactory. During indexing I noticed that Solr creates a temp
> file in "solr-6.4.0\server\tmp". This is my configuration:
>
> solrconfig.xml:
>
> 
> 
>   fuzzySuggester
>   FuzzyLookupFactory
>   fuzzy_suggestions
>   DocumentDictionaryFactory
>   _sugerencia_
>   idTipoRegistro
>   text_suggestion
>   false
>   false
> 
> 
>   infixSuggester
>   AnalyzingInfixLookupFactory
>   infix_suggestions
>   DocumentDictionaryFactory
>   _sugerencia_
>   idTipoRegistro
>   text_suggestion
>   false
>   false
> 
>   
>startup="lazy" >
> 
>   true
>   infixSuggester
>   fuzzySuggester
>   true
>   10
>   true
> 
> 
>   suggest
> 
>   
>
>
>
> shema.xml
>
>
>  stored="true" multiValued="false" />
>
>
>  positionIncrementGap="100" multiValued="true">
>   
> 
>  words="stopwords.txt" />
> 
>  maxGramSize="50" />
> 
>   
>   
> 
>  words="stopwords.txt" />
>  ignoreCase="true" expand="true"/>
> 
> 
>   
> 
>
>
> If I remove EdgeNGramFilterFactory everything works ok, but I require this
> filter for the suggestions.
>
>
> What is the problem?
>
>
> Regards,
>
> Alexis Aravena S.
>
> Scrum Master & Agile Coach
>
> Celular: +569 69080134
>
> Correo: aarav...@itsofteg.com
>
>


Re: DataImportHandler OutOfMemory Mysql

2017-04-01 Thread Mikhail Khludnev
Hello, Sergio.

Have you tried Integer.MIN_VALUE (-2147483648)? See
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
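
Something along these lines in data-config.xml (an untested sketch; URL and
credentials are placeholders):

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host:3306/jobsdb"
              batchSize="-1"
              user="..." password="..."/>

Note that batchSize is read from the dataSource element, not the entity;
with -1 it should translate to setFetchSize(Integer.MIN_VALUE), which is
what the MySQL driver needs to stream rows instead of buffering the whole
result set. If that does not take effect, passing -2147483648 directly is
worth a try.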


On Sun, Apr 2, 2017 at 1:17 AM, marotosg  wrote:

> Hi,
>
> I am trying to load a big table into Solr using DataImportHandler and
> MySQL.
> I am getting an OutOfMemory error because Solr is trying to load the full
> table. I have been reading different posts and tried batchSize="-1".
> https://wiki.apache.org/solr/DataImportHandlerFaq
>
> Do you have any idea what could be the issue?
> Completely lost here.
>
> Solr.6.4.1
> mysql-connector-java-5.1.41-bin.jar
>
> data-config
>
>  driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://188.68.190.85:3306/jobsdb"
> user="suer"
> password="passowrd"/>
> 
>pk="id"
> batchSize="-1"
> query="select * from job"
> deltaImportQuery="SELECT * from job WHERE id='${dih.delta.id}'"
> deltaQuery="SELECT id FROM job  WHERE updated_at >
> '${dih.last_index_time}'"
> >
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>   
> 
> 
>
> Thanks
> Sergio
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/DataImportHandler-OutOfMemory-Mysql-tp4327982.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev