Solr upgrade to latest version

2014-09-22 Thread Danesh Kuruppu
Hi all,

I am currently working on upgrading Solr 1.4.1 to the latest stable Solr release.

What is the latest stable release I can use?
Are there specific things I need to look at when upgrading?

Need help
Thanks

Danesh


[ANNOUNCE] Apache Solr 4.9.1 released

2014-09-22 Thread Michael McCandless
September 2014, Apache Solr™ 4.9.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.9.1

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.9.1 is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Solr 4.9.1 includes 2 bug fixes, as well as Lucene 4.9.1 and its 7 bug fixes.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Mike McCandless

http://blog.mikemccandless.com


Search multiple cores, one result

2014-09-22 Thread Clemens Wyss DEV
As mentioned in another post, we already have a (Lucene-based) generic 
indexing framework which allows any source/entity to provide 
indexable/searchable data.
Sources may be:
pages
events
products
customers
...
As their names imply, they have nothing in common ;) Nevertheless we'd like to 
search across them, getting one result set with the top hits 
(searching across sources is also required for (auto)suggesting search terms).

In our current Lucene approach we create a Lucene index per source (and 
language) and then search across the indexes with a MultiIndexReader.
Switching to Solr, we'd like to rethink the design decision whether to 
a) put all data into one core (Lucene index) 
or to
b) split them into separate cores

If b), how can I search across the cores (in SolrJ)?
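
To sketch what I have in mind -- a minimal SolrJ 4.x example, assuming 
distributed search via the shards parameter is the right mechanism; the 
core names and URLs are made up, and the cores would presumably need 
compatible schemas:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CrossCoreSearch {
    public static void main(String[] args) throws Exception {
        // Send the query to one core and fan it out to the others via "shards"
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/pages");
        SolrQuery q = new SolrQuery("some terms");
        q.set("shards", "localhost:8983/solr/pages,localhost:8983/solr/events");
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " hits across cores");
    }
}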

Thx
Clemens


Re: Solr upgrade to latest version

2014-09-22 Thread Alexandre Rafalovitch
4.10.1, due out shortly, is a good bet.

No idea about the upgrade specifics, but I would probably read through a
recent solrconfig.xml to get a hint of the new features. Also, schema.xml
has a version number at the top; several field defaults are controlled by
that version number and have changed between releases, so it is something
to keep in mind.
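
For example, the attribute in question sits on the root element of
schema.xml (the name and value here are just illustrative):

<schema name="example" version="1.5">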

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 22 September 2014 05:29, Danesh Kuruppu  wrote:
> Hi all,
>
> I am currently working on upgrading Solr 1.4.1 to the latest stable Solr release.
>
> What is the latest stable release I can use?
> Are there specific things I need to look at when upgrading?
>
> Need help
> Thanks
>
> Danesh


Re: <copyField> with wildcard-source?

2014-09-22 Thread Alexandre Rafalovitch
On 22 September 2014 01:04, Clemens Wyss DEV  wrote:
> All I have at hand is "Solr in Action" which doesn't (didn't) mention the 
> copyField-wildcards...

Well, unless your implementation is also fully theoretical, you also
have all the various examples in the Solr distribution. They
demonstrate many of the features.
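
For instance, the example schemas include wildcard copyField declarations
along these lines (field names illustrative):

<copyField source="*_t" dest="text"/>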

Regards,
Alex

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: [ANNOUNCE] Apache Solr 4.9.1 released

2014-09-22 Thread Bernd Fehling
This confuses me a bit, aren't we already at 4.10.0?

But CHANGES.txt of 4.10.0 doesn't know anything about 4.9.1.

Is this an interim version or something about backward compatibility?

Regards


On 22.09.2014 11:36, Michael McCandless wrote:
> September 2014, Apache Solr™ 4.9.1 available
> 
> The Lucene PMC is pleased to announce the release of Apache Solr 4.9.1
> 
> Solr is the popular, blazing fast, open source NoSQL search platform
> from the Apache Lucene project. Its major features include powerful
> full-text search, hit highlighting, faceted search, dynamic
> clustering, database integration, rich document (e.g., Word, PDF)
> handling, and geospatial search. Solr is highly scalable, providing
> fault tolerant distributed search and indexing, and powers the search
> and navigation features of many of the world's largest internet sites.
> 
> Solr 4.9.1 is available for immediate download at:
> 
> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
> 
> Solr 4.9.1 includes 2 bug fixes, as well as Lucene 4.9.1 and its 7 bug fixes.
> 
> See the CHANGES.txt file included with the release for a full list of
> changes and further details.
> 
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/solr/discussion.html)
> 
> Note: The Apache Software Foundation uses an extensive mirroring
> network for distributing releases. It is possible that the mirror you
> are using may not have replicated the release yet. If that is the
> case, please try another mirror. This also goes for Maven access.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 


Re: [ANNOUNCE] Apache Solr 4.9.1 released

2014-09-22 Thread Shalin Shekhar Mangar
This is a bug fix release on top of 4.9. Only some important fixes from
4.10 and beyond were back-ported to the 4.9 branch. There may be a 4.10.1
release too very soon.

On Mon, Sep 22, 2014 at 5:54 PM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> This confuses me a bit, aren't we already at 4.10.0?
>
> But CHANGES.txt of 4.10.0 doesn't know anything about 4.9.1.
>
> Is this an interim version or something about backward compatibility?
>
> Regards
>
>
> On 22.09.2014 11:36, Michael McCandless wrote:
> > September 2014, Apache Solr™ 4.9.1 available
> >
> > The Lucene PMC is pleased to announce the release of Apache Solr 4.9.1
> >
> > Solr is the popular, blazing fast, open source NoSQL search platform
> > from the Apache Lucene project. Its major features include powerful
> > full-text search, hit highlighting, faceted search, dynamic
> > clustering, database integration, rich document (e.g., Word, PDF)
> > handling, and geospatial search. Solr is highly scalable, providing
> > fault tolerant distributed search and indexing, and powers the search
> > and navigation features of many of the world's largest internet sites.
> >
> > Solr 4.9.1 is available for immediate download at:
> >
> > http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
> >
> > Solr 4.9.1 includes 2 bug fixes, as well as Lucene 4.9.1 and its 7 bug
> fixes.
> >
> > See the CHANGES.txt file included with the release for a full list of
> > changes and further details.
> >
> > Please report any feedback to the mailing lists
> > (http://lucene.apache.org/solr/discussion.html)
> >
> > Note: The Apache Software Foundation uses an extensive mirroring
> > network for distributing releases. It is possible that the mirror you
> > are using may not have replicated the release yet. If that is the
> > case, please try another mirror. This also goes for Maven access.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Nathaniel Rudavsky-Brody

Hello,

I'm trying to find the best way to "fake" the terms component for fuzzy 
queries. That is, I need the full set of index terms for each of the 
two queries "quidam~1" and "quidam~2".


I tried defining two suggesters with FuzzyLookupFactory, with 
maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
suffixes like "quodammodo", which makes sense for a suggester but isn't 
what I want here.


Now I'm trying with the spell-checker. As far as I can see, 
IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
to distinguish between my two queries. DirectSolrSpellChecker seems 
like it should work, i.e.:

<searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">fuzzy1</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="maxEdits">1</int>
    ...
  </lst>
  <lst name="spellchecker">
    <str name="name">fuzzy2</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="maxEdits">2</int>
    ...
  </lst>
</searchComponent>

However the parameter spellcheck.alternativeTermCount has no effect, so 
the query "spellcheck.q=quidam" gives no results, but 
"spellcheck.q=quiam" (which doesn't exist in the index) gives the 
expected terms.


Am I missing something? Or is there a better way to do this?

Many thanks for any help and ideas,

Nathaniel


Re: [ANNOUNCE] Apache Solr 4.9.1 released

2014-09-22 Thread Shawn Heisey
On 9/22/2014 6:24 AM, Bernd Fehling wrote:
> This confuses me a bit, aren't we already at 4.10.0?
> 
> But CHANGES.txt of 4.10.0 doesn't know anything about 4.9.1.
> 
> Is this an interim version or something about backward compatibility?

It's a bugfix release, fixing some showstopper bugs in a recent release
that is critical to the RM (Michael McCandless) and/or an organization
where he has influence or liability.  Apparently this was a more
expedient path than completely validating a 4.10 upgrade and waiting for
the 4.10.1 bugfix release.  Validating the 4.10 upgrade probably would
have taken considerably longer than simply backporting some critical
fixes to the 4.9 release that they're actually using.

The two bug fixes for Solr are a license issue and a security
vulnerability.  The bugfix list for Lucene includes fixes for some major
problems that can cause index corruption or incorrect operation.

I had thought that the CHANGES.txt list would remain the same for trunk
and the stable branch because some of those bugfixes skipped the 4.10.0
release, but it looks like that's not the case for LUCENE-5919 (the only
one that I actually investigated).  If these issues all got updated to
the 4.9.1 section of CHANGES.txt in places other than the 4.9 branch and
the 4.9.1 tag, there might be a small amount of confusion in the distant
future.  That confusion would be cleared up by looking at CHANGES.txt
for the 4.10.0 release, though.

Looks like the 4.10.1 release has been delayed a little.  I hope that
this collection of fixes makes it in there too, so that 4.10.0 is the
only release where that confusion might impact users.

Thanks,
Shawn



Re: [ANNOUNCE] Apache Solr 4.9.1 released

2014-09-22 Thread Michael McCandless
I'll merge back the 4.9.1 CHANGES entries so when we do a 4.10.1,
they'll be there ... and I'll also make sure any fix we backported for
4.9.1, we also backport for 4.10.1.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Sep 22, 2014 at 9:11 AM, Shawn Heisey  wrote:
> On 9/22/2014 6:24 AM, Bernd Fehling wrote:
>> This confuses me a bit, aren't we already at 4.10.0?
>>
>> But CHANGES.txt of 4.10.0 doesn't know anything about 4.9.1.
>>
>> Is this an interim version or something about backward compatibility?
>
> It's a bugfix release, fixing some showstopper bugs in a recent release
> that is critical to the RM (Michael McCandless) and/or an organization
> where he has influence or liability.  Apparently this was a more
> expedient path than completely validating a 4.10 upgrade and waiting for
> the 4.10.1 bugfix release.  Validating the 4.10 upgrade probably would
> have taken considerably longer than simply backporting some critical
> fixes to the 4.9 release that they're actually using.
>
> The two bug fixes for Solr are a license issue and a security
> vulnerability.  The bugfix list for Lucene includes fixes for some major
> problems that can cause index corruption or incorrect operation.
>
> I had thought that the CHANGES.txt list would remain the same for trunk
> and the stable branch because some of those bugfixes skipped the 4.10.0
> release, but it looks like that's not the case for LUCENE-5919 (the only
> one that I actually investigated).  If these issues all got updated to
> the 4.9.1 section of CHANGES.txt in places other than the 4.9 branch and
> the 4.9.1 tag, there might be a small amount of confusion in the distant
> future.  That confusion would be cleared up by looking at CHANGES.txt
> for the 4.10.0 release, though.
>
> Looks like the 4.10.1 release has been delayed a little.  I hope that
> this collection of fixes makes it in there too, so that 4.10.0 is the
> only release where that confusion might impact users.
>
> Thanks,
> Shawn
>


Re: Issue Adding Filter Query

2014-09-22 Thread aaguilar
Hello Erick.

Below is the information you requested.   Thanks for your help!


On Fri, Sep 19, 2014 at 7:36 PM, Erick Erickson [via Lucene] <
ml-node+s472066n4160122...@n3.nabble.com> wrote:

> Hmmm, I'd have to see the schema definition for your description
> field. For this, the admin/analysis page is very helpful. Here's my
> guess:
>
> Your analysis chain doesn't break the incoming tokens up quite like
> you think it is. Thus you have the tokens in your index like
> 'protein,' (notice the comma) and 'protein-like' rather than just
> 'protein'. However, I can't quite reconcile this with your statement:
> "Another weird thing is that if I used description:"fatty
> acid-binding" AND description:"protein"
>
> so I'm at something of a loss. If you paste in your schema definition
> for the 'description' field _and_ the corresponding <fieldType>
> definition I can give it a quick whirl.
>
> Best,
> Erick
>
> On Fri, Sep 19, 2014 at 11:53 AM, aaguilar <[hidden email]
> > wrote:
>
> > Hello Erick,
> >
> > Thanks for the response.  I tried adding the debug=True to the query,
> but I
> > do not know exactly what I am looking for in the output.  Would it be
> > possible for you to look at the results?  I would really appreciate it.
> I
> > attached two files, one of them is with the filter query
> description:"fatty
> > acid-binding" and the other is with the filter query description:"fatty
> > acid-binding protein".  If you see the file that has the results for
> > description:"fatty acid-binding" , you can see that the hits do have
> "fatty
> > acid-binding protein" and nothing in between.  I really appreciate any
> help
> > you can provide.
> >
> > Thanks you
> >
> > On Fri, Sep 19, 2014 at 2:03 PM, Erick Erickson [via Lucene] <
> > [hidden email] >
> wrote:
> >
> >> Your very best friend here is attaching &debug=query to the URL and
> >> looking at the parsed query results. Upon occasion there's something
> >> surprising in there.
> >>
> >> One possible explanation is that description field has something like
> >> "fatty acid-binding some words protein" in which case your query
> >> "fatty acid-binding protein" would fail, but "fatty acid-binding
> >> protein"~4 would succeed.
> >>
> >> The other possibility is that your query parsing isn't quite doing
> >> what you think, but adding &debug=query should help there.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Sep 19, 2014 at 8:10 AM, aaguilar <[hidden email]
> >> > wrote:
> >>
> >> > Hello All,
> >> >
> >> > I recently came across a problem when I tried using
> description:"fatty
> >> > acid-binding protein" as a filter query when doing a query through
> the
> >> query
> >> > interface for Solr in the Tomcat server.  Using that filter query did
> >> not
> >> > give me any results at all, however if I used description:"fatty
> >> > acid-binding" as the filter query, it would give me the results I
> >> wanted.
> >> >
> >> > The thing is that some of the results I got back from Solr, did have
> the
> >> > words "fatty acid-binding protein" in the description field.  So I
> >> really do
> >> > not know what might be causing the issue of Solr not being able to
> find
> >> > those hits.
> >> >
> >> > Another weird thing is that if I used description:"fatty
> acid-binding"
> >> AND
> >> > description:"protein" as the filter query when doing a query, it gave
> me
> >> the
> >> > results I anticipated (with some extra results that did not have the
> >> exact
> >> > phrase "fatty acid-binding protein").  Does anyone have an idea as to
> >> what
> >> > might be happening?  Just in case this is helpful, the version of
> Solr
> >> we
> >> > are using is 4.0.0.2012.10.06.03.04.33.  I appreciate any help anyone
> >> can
> >> > provide.
> >> >
> >> > Thanks!
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Issue-Adding-Filter-Query-tp4159990.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > fatty_acid-binding_protein.xml (1K) <
> http://lucene.472066.n3.nabble.com/attachment/4160048/0/fatty_acid-binding_protein.xml>
>
> > fatty_acid-binding.xml (63K)

[ANN] Lucidworks Fusion 1.0.0

2014-09-22 Thread Grant Ingersoll
Hi All,

We at Lucidworks are pleased to announce the release of Lucidworks Fusion 1.0.  
 Fusion is built to overlay on top of Solr (in fact, you can manage multiple 
Solr clusters -- think QA, staging and production -- all from our Admin). In 
other words, if you already have Solr, simply point Fusion at your instance and 
get all kinds of goodies like Banana (https://github.com/LucidWorks/Banana -- 
our port of Kibana to Solr + a number of extensions that Kibana doesn't have), 
collaborative filtering style recommendations (without the need for Hadoop or 
Mahout!), a modern signal capture framework, analytics, NLP integration, 
Boosting/Blocking and other relevance tools, flexible index and query time 
pipelines as well as a myriad of connectors ranging from Twitter to web 
crawling to Sharepoint.  The best part of all this?  It all leverages the 
infrastructure that you know and love: Solr.  Want recommendations?  Deploy 
more Solr.  Want log analytics?  Deploy more Solr.  Want to track important 
system metrics?  Deploy more Solr.

Fusion represents our commitment as a company to continue to contribute a large 
quantity of enhancements to the core of Solr while complementing and extending 
those capabilities with value adds that integrate a number of 3rd party (e.g., 
connectors) and home grown capabilities like an all new, responsive UI built in 
AngularJS.  Fusion is not a fork of Solr.  We do not hide Solr in any way.  In 
fact, our goal is that your existing applications will work out of the box with 
Fusion, allowing you to take advantage of new capabilities without overhauling your 
existing application.

If you want to learn more, please feel free to join our technical webinar on 
October 2: http://lucidworks.com/blog/say-hello-to-lucidworks-fusion/.  If 
you'd like to download: http://lucidworks.com/product/fusion/. 

Cheers,
Grant Ingersoll


Grant Ingersoll | CTO 
gr...@lucidworks.com | @gsingers
http://www.lucidworks.com



RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
Nathaniel,

Can you show us all of the parameters you are sending to the spellchecker?  
When you specify "alternativeTermCount" with "spellcheck.q=quidam", what are 
the terms you expect to get back?  Also, are you getting any query results 
back?  If you are using a "q" that returns results, or more results than you 
specify for "spellcheck.maxResultsForSuggest", spellcheck won't give you 
anything regardless of what you put for "spellcheck.q".

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 8:08 AM
To: solr-user@lucene.apache.org
Subject: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hello,

I'm trying to find the best way to "fake" the terms component for fuzzy 
queries. That is, I need the full set of index terms for each of the 
two queries "quidam~1" and "quidam~2".

I tried defining two suggesters with FuzzyLookupFactory, with 
maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
suffixes like "quodammodo", which makes sense for a suggester but isn't 
what I want here.

Now I'm trying with the spell-checker. As far as I can see, 
IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
to distinguish between my two queries. DirectSolrSpellChecker seems 
like it should work, i.e.:

<searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">fuzzy1</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="maxEdits">1</int>
    ...
  </lst>
  <lst name="spellchecker">
    <str name="name">fuzzy2</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="maxEdits">2</int>
    ...
  </lst>
</searchComponent>

However the parameter spellcheck.alternativeTermCount has no effect, so 
the query "spellcheck.q=quidam" gives no results, but 
"spellcheck.q=quiam" (which doesn't exist in the index) gives the 
expected terms.

Am I missing something? Or is there a better way to do this?

Many thanks for any help and ideas,

Nathaniel


RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Nathaniel Rudavsky-Brody

Hi James,

The request 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
returns


quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
quis, quae, quas, quem, quid, quin, qui, qua


Replacing quiam (not in the index) by quidam (in the index) returns 
nothing at all, but I want it to return


quidam, quam, quia, quidem, quadam, quodam, quedam, ...

When I was using the same parameters with IndexBasedSpellChecker, by 
setting a high alternativeTermCount, I got results for both. But as I 
said, then I can't differentiate the different maxEdits.


The request handler is:

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">fuzzy1</str>
    <str name="spellcheck.count">20</str>
    <str name="spellcheck.alternativeTermCount">100</str>
  </lst>
  <arr name="last-components">
    <str>fuzzyterms</str>
  </arr>
</requestHandler>
Thanks!

Nathaniel

On Mon, Sep 22, 2014 at 4:08 , Dyer, James 
 wrote:

Nathaniel,

Can you show us all of the parameters you are sending to the 
spellchecker?  When you specify "alternativeTermCount" with 
"spellcheck.q=quidam", what are the terms you expect to get back?  
Also, are you getting any query results back?  If you are using a "q" 
that returns results, or more results than you specify for 
"spellcheck.maxResultsForSuggest", spellcheck won't give you anything 
regardless of what you put for "spellcheck.q".


James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 8:08 AM

To: solr-user@lucene.apache.org
Subject: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hello,

I'm trying to find the best way to "fake" the terms component for fuzzy 
queries. That is, I need the full set of index terms for each of the 
two queries "quidam~1" and "quidam~2".


I tried defining two suggesters with FuzzyLookupFactory, with 
maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
suffixes like "quodammodo", which makes sense for a suggester but isn't 
what I want here.


Now I'm trying with the spell-checker. As far as I can see, 
IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
to distinguish between my two queries. DirectSolrSpellChecker seems 
like it should work, i.e.:

<searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">fuzzy1</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="maxEdits">1</int>
    ...
  </lst>
  <lst name="spellchecker">
    <str name="name">fuzzy2</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="maxEdits">2</int>
    ...
  </lst>
</searchComponent>

However the parameter spellcheck.alternativeTermCount has no effect, so 
the query "spellcheck.q=quidam" gives no results, but 
"spellcheck.q=quiam" (which doesn't exist in the index) gives the 
expected terms.


Am I missing something? Or is there a better way to do this?

Many thanks for any help and ideas,

Nathaniel


RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
Did you try "spellcheck.alternativeTermCount" with DirectSolrSpellChecker?  You 
can set it to whatever low value you actually want it to return back to you 
(perhaps 20 suggestions max?).

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 9:36 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hi James,

The request 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
returns

quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
quis, quae, quas, quem, quid, quin, qui, qua

Replacing quiam (not in the index) by quidam (in the index) returns 
nothing at all, but I want it to return

quidam, quam, quia, quidem, quadam, quodam, quedam, ...

When I was using the same parameters with IndexBasedSpellChecker, by 
setting a high alternativeTermCount, I got results for both. But as I 
said, then I can't differentiate the different maxEdits.

The request handler is:

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">fuzzy1</str>
    <str name="spellcheck.count">20</str>
    <str name="spellcheck.alternativeTermCount">100</str>
  </lst>
  <arr name="last-components">
    <str>fuzzyterms</str>
  </arr>
</requestHandler>

Thanks!

Nathaniel

On Mon, Sep 22, 2014 at 4:08 , Dyer, James 
 wrote:
> Nathaniel,
> 
> Can you show us all of the parameters you are sending to the 
> spellchecker?  When you specify "alternativeTermCount" with 
> "spellcheck.q=quidam", what are the terms you expect to get back?  
> Also, are you getting any query results back?  If you are using a "q" 
> that returns results, or more results than you specify for 
> "spellcheck.maxResultsForSuggest", spellcheck won't give you anything 
> regardless of what you put for "spellcheck.q".
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> 
> -Original Message-
> From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
> Sent: Monday, September 22, 2014 8:08 AM
> To: solr-user@lucene.apache.org
> Subject: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount
> 
> Hello,
> 
> I'm trying to find the best way to "fake" the terms component for fuzzy 
> queries. That is, I need the full set of index terms for each of the 
> two queries "quidam~1" and "quidam~2".
> 
> I tried defining two suggesters with FuzzyLookupFactory, with 
> maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
> suffixes like "quodammodo", which makes sense for a suggester but isn't 
> what I want here.
> 
> Now I'm trying with the spell-checker. As far as I can see, 
> IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
> to distinguish between my two queries. DirectSolrSpellChecker seems 
> like it should work, i.e.:
> 
> <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">fuzzy1</str>
>     <str name="classname">solr.DirectSolrSpellChecker</str>
>     <int name="maxEdits">1</int>
>     ...
>   </lst>
>   <lst name="spellchecker">
>     <str name="name">fuzzy2</str>
>     <str name="classname">solr.DirectSolrSpellChecker</str>
>     <int name="maxEdits">2</int>
>     ...
>   </lst>
> </searchComponent>
> 
> However the parameter spellcheck.alternativeTermCount has no effect, so 
> the query "spellcheck.q=quidam" gives no results, but 
> "spellcheck.q=quiam" (which doesn't exist in the index) gives the 
> expected terms.
> 
> Am I missing something? Or is there a better way to do this?
> 
> Many thanks for any help and ideas,
> 
> Nathaniel


RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Nathaniel Rudavsky-Brody
Yep, I tried it both as a default param in the request handler (as in 
the config I sent), and in the request, but with no effect... That's 
what surprised me, since it seems it should work.


On Mon, Sep 22, 2014 at 4:38 , Dyer, James 
 wrote:
Did you try "spellcheck.alternativeTermCount" with 
DirectSolrSpellChecker?  You can set it to whatever low value you 
actually want it to return back to you (perhaps 20 suggestions max?).


James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 9:36 AM

To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and 
alternativeTermCount


Hi James,

The request 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
returns


quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
quis, quae, quas, quem, quid, quin, qui, qua


Replacing quiam (not in the index) by quidam (in the index) returns 
nothing at all, but I want it to return


quidam, quam, quia, quidem, quadam, quodam, quedam, ...

When I was using the same parameters with IndexBasedSpellChecker, by 
setting a high alternativeTermCount, I got results for both. But as I 
said, then I can't differentiate the different maxEdits.


The request handler is:

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">fuzzy1</str>
    <str name="spellcheck.count">20</str>
    <str name="spellcheck.alternativeTermCount">100</str>
  </lst>
  <arr name="last-components">
    <str>fuzzyterms</str>
  </arr>
</requestHandler>

Thanks!

Nathaniel

On Mon, Sep 22, 2014 at 4:08 , Dyer, James 
 wrote:

 Nathaniel,
 
 Can you show us all of the parameters you are sending to the 
 spellchecker?  When you specify "alternativeTermCount" with 
 "spellcheck.q=quidam", what are the terms you expect to get back?  
 Also, are you getting any query results back?  If you are using a "q" 
 that returns results, or more results than you specify for 
 "spellcheck.maxResultsForSuggest", spellcheck won't give you anything 
 regardless of what you put for "spellcheck.q".
 
 James Dyer

 Ingram Content Group
 (615) 213-4311
 
 
 -Original Message-
 From: Nathaniel Rudavsky-Brody 
[mailto:nathaniel.rudav...@gmail.com] 
 Sent: Monday, September 22, 2014 8:08 AM

 To: solr-user@lucene.apache.org
 Subject: fuzzy terms, DirectSolrSpellChecker and 
alternativeTermCount
 
 Hello,
 
 I'm trying to find the best way to "fake" the terms component for fuzzy 
 queries. That is, I need the full set of index terms for each of the 
 two queries "quidam~1" and "quidam~2".
 
 I tried defining two suggesters with FuzzyLookupFactory, with 
 maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
 suffixes like "quodammodo", which makes sense for a suggester but 
 isn't what I want here.
 
 Now I'm trying with the spell-checker. As far as I can see, 
 IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
 to distinguish between my two queries. DirectSolrSpellChecker seems 
 like it should work, i.e.:
 
 <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
   <lst name="spellchecker">
     <str name="name">fuzzy1</str>
     <str name="classname">solr.DirectSolrSpellChecker</str>
     <int name="maxEdits">1</int>
     ...
   </lst>
   <lst name="spellchecker">
     <str name="name">fuzzy2</str>
     <str name="classname">solr.DirectSolrSpellChecker</str>
     <int name="maxEdits">2</int>
     ...
   </lst>
 </searchComponent>
 
 However the parameter spellcheck.alternativeTermCount has no effect, so 
 the query "spellcheck.q=quidam" gives no results, but 
 "spellcheck.q=quiam" (which doesn't exist in the index) gives the 
 expected terms.
 
 Am I missing something? Or is there a better way to do this?
 
 Many thanks for any help and ideas,
 
 Nathaniel


Re: Problems for indexing large documents on SolrCloud

2014-09-22 Thread Olivier
Hi,

First, thanks for your advice.
I ran several tests and finally managed to index all the data on my
SolrCloud cluster.
The error was client-side; it's documented in this post:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3ccfc09ae1.94f8%25rebecca.t...@ucsf.edu%3E

"EofException from Jetty means one specific thing:  The client software
disconnected before Solr was finished with the request and sent its
response.  Chances are good that this is because of a configured socket
timeout on your SolrJ client or its HttpClient.  This might have been
done with the setSoTimeout method on the server object."
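
For anyone hitting the same limit from SolrJ rather than Solarium, the
setSoTimeout knob mentioned above would look roughly like this -- a
sketch, with URL and values illustrative:

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ClientTimeoutSetup {
    public static void main(String[] args) {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        server.setSoTimeout(60000);        // socket read timeout, in ms
        server.setConnectionTimeout(5000); // connect timeout, in ms
    }
}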

So I increased the Solarium timeout from 5 to 60 seconds, and all the data
is indexed correctly now. The error was not reproducible on my development
PC because the database and Solr were on the same local virtual
machine with plenty of available resources, so indexing was faster
there than on the SolrCloud cluster.

Thanks,

Olivier


2014-09-11 0:21 GMT+02:00 Shawn Heisey :

> On 9/10/2014 2:05 PM, Erick Erickson wrote:
> > bq: org.apache.solr.common.SolrException: Unexpected end of input
> > block; expected an identifier
> >
> > This is very often an indication that your packets are being
> > truncated by "something in the chain". In your case, make sure
> > that Tomcat is configured to handle inputs of the size that you're
> sending.
> >
> > This may be happening before things get to Solr, in which case your
> settings
> > in solrconfig.xml aren't germane, the problem is earlier than than.
> >
> > A "semi-smoking-gun" here is that there's a size of your multivalued
> > field that seems to break things... That doesn't rule out time problems
> > of course.
> >
> > But I'd look at the Tomcat settings for maximum packet size first.
>
> The maximum HTTP request size is actually is controlled by Solr itself
> since 4.1, with changes committed for SOLR-4265.  Changing the setting
> on Tomcat probably will not help.
>
> An example from my own config which sets this to 32MB - the default is
> 2048, or 2MB:
>
>   <requestParsers multipartUploadLimitInKB="32768" formdataUploadLimitInKB="32768"/>
>
> Thanks,
> Shawn
>
>


using facet enum et fc in the same query.

2014-09-22 Thread jerome . dupont
Hello, 

I have a Solr index (12M docs, 45GB) with facets, and I'm trying to 
improve facet query performance.
1/ I tried to use docValues on the facet fields; it didn't work well.
2/ I tried facet.threads=-1 in my query, and it worked perfectly (from more 
than 15s down to 2s for the longest queries).

3/ I'm trying to use facet.method=enum. It's supposed to improve 
performance for facet fields with few distinct values (type of 
document, things like that).

My problem is that I don't know if there is a way to specify the enum method 
for some facets (3 to 5000 distinct values) and the fc method for some 
others (up to 12M distinct values) in the same query.

Is it possible with something like MyFacet.facet.method=enum

?

Thanks in advance for the answer.

---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---


Participate in the acquisition of a national treasure - the royal manuscript of 
Francis I. Before printing, think of the environment. 

Re: max across documents?

2014-09-22 Thread Shawn Heisey
On 9/22/2014 12:05 AM, William Bell wrote:
> Is there an easy way to get max() across documents?

I think the stats component is probably what you want.  That component
seems to be enabled by default.

http://wiki.apache.org/solr/StatsComponent
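
For example, a request along these lines (field name illustrative) returns
min/max/sum and friends for a numeric field over the whole result set:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&stats=true&stats.field=price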

Thanks,
Shawn



Re: using facet enum et fc in the same query.

2014-09-22 Thread Alan Woodward
You should be able to use f.<fieldname>.method=enum

Alan Woodward
www.flax.co.uk


On 22 Sep 2014, at 16:21, jerome.dup...@bnf.fr wrote:

> Hello, 
> 
> I have a Solr index (12M docs, 45GB) with facets, and I'm trying to 
> improve facet query performance.
> 1/ I tried to use docValues on the facet fields; it didn't work well.
> 2/ I tried facet.threads=-1 in my query, and it worked perfectly (from more 
> than 15s down to 2s for the longest queries).
> 
> 3/ I'm trying to use facet.method=enum. It's supposed to improve 
> performance for facet fields with few distinct values (type of 
> document, things like that).
> 
> My problem is that I don't know if there is a way to specify the enum method 
> for some facets (3 to 5000 distinct values) and the fc method for some 
> others (up to 12M distinct values) in the same query.
> 
> Is it possible with something like MyFacet.facet.method=enum
> 
> ?
> 
> Thanks in advance for the answer.
> 
> ---
> Jérôme Dupont
> Bibliothèque Nationale de France
> Département des Systèmes d'Information
> Tour T3 - Quai François Mauriac
> 75706 Paris Cedex 13
> téléphone: 33 (0)1 53 79 45 40
> e-mail: jerome.dup...@bnf.fr
> ---
> 
> 
> Participate in the acquisition of a national treasure - the royal manuscript of 
> Francis I. Before printing, think of the environment.



RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
DirectSpellChecker defaults to not suggest anything for terms that occur in 1% 
or more of the total documents in the index.  You can set this higher in 
solrconfig.xml either with a fractional percent or a whole-number absolute 
number of documents.

See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMaxQueryFrequency%28float%29
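
For example, a spellchecker definition along these lines would raise the
threshold (the 0.99 value is illustrative):

  <lst name="spellchecker">
    <str name="name">fuzzy2</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="maxEdits">2</int>
    <float name="maxQueryFrequency">0.99</float>
  </lst>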
 

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 9:41 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Yep, I tried it both as a default param in the request handler (as in 
the config I sent), and in the request, but with no effect... That's 
what surprised me, since it seems it should work.

On Mon, Sep 22, 2014 at 4:38 , Dyer, James 
 wrote:
> Did you try "spellcheck.alternativeTermCount" with 
> DirectSolrSpellChecker?  You can set it to whatever low value you 
> actually want it to return back to you (perhaps 20 suggestions max?).
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> 
> -Original Message-
> From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
> Sent: Monday, September 22, 2014 9:36 AM
> To: solr-user@lucene.apache.org
> Subject: RE: fuzzy terms, DirectSolrSpellChecker and 
> alternativeTermCount
> 
> Hi James,
> 
> The request 
> /spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
> returns
> 
> quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
> quis, quae, quas, quem, quid, quin, qui, qua
> 
> Replacing quiam (not in the index) by quidam (in the index) returns 
> nothing at all, but I want it to return
> 
> quidam, quam, quia, quidem, quadam, quodam, quedam, ...
> 
> When I was using the same parameters with IndexBasedSpellChecker, by 
> setting a high alternativeTermCount, I got results for both. But as I 
> said, then I can't differentiate the different maxEdits.
> 
> The request handler is:
> 
> <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
>   <lst name="defaults">
>     <str name="spellcheck.dictionary">fuzzy1</str>
>     <str name="spellcheck.count">20</str>
>     <str name="spellcheck.alternativeTermCount">100</str>
>   </lst>
>   <arr name="last-components">
>     <str>fuzzyterms</str>
>   </arr>
> </requestHandler>
> 
> Thanks!
> 
> Nathaniel
> 
> On Mon, Sep 22, 2014 at 4:08 , Dyer, James 
>  wrote:
>>  Nathaniel,
>>  
>>  Can you show us all of the parameters you are sending to the 
>>  spellchecker?  When you specify "alternativeTermCount" with 
>>  "spellcheck.q=quidam", what are the terms you expect to get back?  
>>  Also, are you getting any query results back?  If you are using a "q" 
>>  that returns results, or more results than you specify for 
>>  "spellcheck.maxResultsForSuggest", spellcheck won't give you anything 
>>  regardless of what you put for "spellcheck.q".
>>  
>>  James Dyer
>>  Ingram Content Group
>>  (615) 213-4311
>>  
>>  
>>  -Original Message-
>>  From: Nathaniel Rudavsky-Brody 
>> [mailto:nathaniel.rudav...@gmail.com] 
>>  Sent: Monday, September 22, 2014 8:08 AM
>>  To: solr-user@lucene.apache.org
>>  Subject: fuzzy terms, DirectSolrSpellChecker and 
>> alternativeTermCount
>>  
>>  Hello,
>>  
>>  I'm trying to find the best way to "fake" the terms component for fuzzy 
>>  queries. That is, I need the full set of index terms for each of the 
>>  two queries "quidam~1" and "quidam~2".
>>  
>>  I tried defining two suggesters with FuzzyLookupFactory, with 
>>  maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
>>  suffixes like "quodammodo", which makes sense for a suggester but 
>>  isn't what I want here.
>>  
>>  Now I'm trying with the spell-checker. As far as I can see, 
>>  IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
>>  to distinguish between my two queries. DirectSolrSpellChecker seems 
>>  like it should work, i.e.:
>>  
>>  <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
>>    <lst name="spellchecker">
>>      <str name="name">fuzzy1</str>
>>      <str name="classname">solr.DirectSolrSpellChecker</str>
>>      <int name="maxEdits">1</int>
>>      ...
>>    </lst>
>>    <lst name="spellchecker">
>>      <str name="name">fuzzy2</str>
>>      <str name="classname">solr.DirectSolrSpellChecker</str>
>>      <int name="maxEdits">2</int>
>>      ...
>>    </lst>
>>  </searchComponent>
>>  
>>  However the parameter spellcheck.alternativeTermCount has no effect, so 
>>  the query "spellcheck.q=quidam" gives no results, but 
>>  "spellcheck.q=quiam" (which doesn't exist in the index) gives the 
>>  expected terms.
>>  
>>  Am I missing something? Or is there a better way to do this?
>>  
>>  Many thanks for any help and ideas,
>>  
>>  Nathaniel


RE: Help on custom sort

2014-09-22 Thread Scott Smith
I'll take a look at that.  Thanks

-Original Message-
From: Apoorva Gaurav [mailto:apoorva.gau...@myntra.com] 
Sent: Sunday, September 21, 2014 11:32 PM
To: solr-user
Subject: Re: Help on custom sort

Try using a custom value source parser and pass the "formula" of computing the 
price to solr; something like this 
http://java.dzone.com/articles/connecting-redis-solr-boosting

On Mon, Sep 22, 2014 at 1:38 AM, Scott Smith 
wrote:

> There are likely several hundred groups.  Also, new groups will be 
> added and some groups will be deleted.  So, I don't think putting a 
> field in the docs works.  Having to add a new group price into 100 
> million+ documents doesn't seem reasonable.
>
> Right now I'm looking at
> http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html.
> This reference a much older version of solr (the blog is from 2011) 
> and so I will need to update the classes referenced.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, September 20, 2014 11:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Help on custom sort
>
> How many different groups are there? And can user A ever be part of 
> more than one group?
> If
> 1> there are a reasonably small number of groups (< 100 or so as a
> place to start)
> and
> 2> a user is always part of a single group
>
> then you could store separate prices in each document by group, thus 
> you'd have some fields like
> price_group_a: $100
> price_group_b: $101
>
> then sorting becomes trivial: you just sort on price_group_a for 
> users in group A, etc. If the number of groups is unknown but not huge, 
> dynamic fields could be used.
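>
> For example (field names illustrative), a request for a user in group A
> would then simply add:
>
>   &sort=price_group_a asc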
>
> If that's not the case, then you might be able to get clever with 
> sorting by function, here's a place to start:
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
> These can be arbitrarily complex, but I'm thinking something where the 
> price returned by the function respects the group the user is in, 
> perhaps even the min/max of all the groups the user is in. I admit I 
> haven't really thought that through well though...
>
> Best,
> Erick
>
> On Sat, Sep 20, 2014 at 9:26 AM, Scott Smith 
> 
> wrote:
> > I need to provide a custom sort option for sorting by price and I 
> > would
> like some suggestions.  It's not the straightforward "just sort by a 
> price field in the document" scenario or I wouldn't be asking for 
> help.  Here's the scenario I'm dealing with.
> >
> > I have 100 million+ documents (so multi-sharded).  Users search for
> documents they are interested in using a standard keyword search.  
> They then purchase documents they are interested in.  So far, nothing hard.
> >
> > Here's where things get "interesting".  The documents come from 
> > multiple
> suppliers.  Each supplier sets a price for his documents and different 
> suppliers will provide different pricing.
> >
> > That wouldn't be difficult except that *users* are divided up into
> different groups and depending on which group they are in, the 
> supplier will charge the user a different price.  So, user A may pay 
> one price for a document and user B may pay a different price for the 
> same document just because user A and user B are in different groups.  
> I don't even know if the relative order or pricing is the same between 
> different groups (e.g., if document X is more expensive than document 
> Y for a user in group M, it may not be more expensive for a user in 
> group N).  The one thing that may make this doable is that supplier A 
> will likely have the same price for all of his documents for each of 
> the user groups.  So, a user in group A will pay the same price 
> regardless of which document he buys from supplier 1.  A user in group 
> B will also pay the same price for any document from supplier 1; it's 
> just that a user in group B will likely pay a different price than a 
> user in group A.  So, within a supplier, the price varies based on user 
> group, not the document.
> >
> > To summarize, one of the requirements for the system is that we 
> > provide
> the ability to sort search results based on price.  This would be easy 
> except that the price a user pays not only depends on what he wants to 
> buy, but on what group the he is in.
> >
> > I suspect there is some kind of custom solr module I'm going to have 
> > to
> write.  I'm thinking that the user group gets passed in as a custom 
> solr parameter (I'm assuming that's possible??).  Then I'm thinking 
> that there has to be some kind of in memory database that tracks 
> pricing based on user group and document supplier).
> >
> > I'm happy to go read code, documents, links, etc if someone can 
> > point me
> in the right direction.  What kind of solr module am I likely going to 
> write (extend) and are there some examples somewhere?  Maybe there's a 
> way to do this without having to extend a solr module??
> >
> > Hope this makes sense.  Any he

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Nathaniel Rudavsky-Brody

Thank you, that works!

I'd already tried several values for maxQueryFrequency, but apparently 
without properly understanding it. I was confused by the line "A lower 
threshold is better for small indexes" when in fact I need a high value 
like 0.99, so every term returns suggestions. (Is it possible to set it 
to 100%? Because 1 gets interpreted as an absolute value.)


Nathaniel

On Mon, Sep 22, 2014 at 6:17 , Dyer, James 
 wrote:
DirectSpellChecker defaults to not suggest anything for terms that 
occur in 1% or more of the total documents in the index.  You can set 
this higher in solrconfig.xml either with a fractional percent or a 
whole-number absolute number of documents.


See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMaxQueryFrequency%28float%29 


James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 9:41 AM

To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and 
alternativeTermCount


Yep, I tried it both as a default param in the request handler (as in 
the config I sent), and in the request, but with no effect... That's 
what surprised me, since it seems it should work.


On Mon, Sep 22, 2014 at 4:38 , Dyer, James 
 wrote:
 Did you try "spellcheck.alternativeTermCount" with 
 DirectSolrSpellChecker?  You can set it to whatever low value you 
 actually want it to return back to you (perhaps 20 suggestions 
max?).
 
 James Dyer

 Ingram Content Group
 (615) 213-4311
 
 
 -Original Message-
 From: Nathaniel Rudavsky-Brody 
[mailto:nathaniel.rudav...@gmail.com] 
 Sent: Monday, September 22, 2014 9:36 AM

 To: solr-user@lucene.apache.org
 Subject: RE: fuzzy terms, DirectSolrSpellChecker and 
 alternativeTermCount
 
 Hi James,
 
 The request 
 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
 returns
 
 quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
 quis, quae, quas, quem, quid, quin, qui, qua
 
 Replacing quiam (not in the index) by quidam (in the index) returns 
 nothing at all, but I want it to return
 
 quidam, quam, quia, quidem, quadam, quodam, quedam, ...
 
 When I was using the same parameters with IndexBasedSpellChecker, by 
 setting a high alternativeTermCount, I got results for both. But as I 
 said, then I can't differentiate the different maxEdits.
 
 The request handler is:
 
 <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
   <lst name="defaults">
     <str name="spellcheck.dictionary">fuzzy1</str>
     <str name="spellcheck.count">20</str>
     <str name="spellcheck.alternativeTermCount">100</str>
   </lst>
   <arr name="last-components">
     <str>fuzzyterms</str>
   </arr>
 </requestHandler>
 
 Thanks!
 
 Nathaniel
 
 On Mon, Sep 22, 2014 at 4:08 , Dyer, James 
  wrote:

  Nathaniel,
  
  Can you show us all of the parameters you are sending to the 
  spellchecker?  When you specify "alternativeTermCount" with 
  "spellcheck.q=quidam", what are the terms you expect to get back?  
  Also, are you getting any query results back?  If you are using a "q" 
  that returns results, or more results than you specify for 
  "spellcheck.maxResultsForSuggest", spellcheck won't give you anything 
  regardless of what you put for "spellcheck.q".
  
  James Dyer

  Ingram Content Group
  (615) 213-4311
  
  
  -Original Message-
  From: Nathaniel Rudavsky-Brody 
 [mailto:nathaniel.rudav...@gmail.com] 
  Sent: Monday, September 22, 2014 8:08 AM

  To: solr-user@lucene.apache.org
  Subject: fuzzy terms, DirectSolrSpellChecker and 
 alternativeTermCount
  
  Hello,
  
  I'm trying to find the best way to "fake" the terms component for fuzzy
  queries. That is, I need the full set of index terms for each of the
  two queries "quidam~1" and "quidam~2".
  
  I tried defining two suggesters with FuzzyLookupFactory, with
  maxEdits=1 and 2 respectively, but the results for "quidam~1" include
  suffixes like "quodammodo", which makes sense for a suggester but
  isn't what I want here.
  
  Now I'm trying with the spell-checker. As far as I can see,
  IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it
  to distinguish between my two queries. DirectSolrSpellChecker seems
  like it should work, i.e.:
  
  <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">fuzzy1</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">1</int>
      ...
    </lst>
    <lst name="spellchecker">
      <str name="name">fuzzy2</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">2</int>
      ...
    </lst>
  </searchComponent>
  
  However the parameter spellcheck.alternativeTermCount has no effect, so
  the query "spellcheck.q=quidam" gives no results, but
  "spellcheck.q=quiam" (which doesn't exist in the index) gives the
  expected terms.
  
  Am I missing something? Or is there a better way to do this?
  
  Many thanks for any help and ideas,
  
  Nathaniel


Schema Parsing Failed: unknown field 'id' [Zookeeper, SolrCloud]

2014-09-22 Thread paulparsons
Hi,

I'm trying to set up a multicore SolrCloud on HDFS. I am getting the
following error for all my cores when trying to start the server:

ERROR org.apache.solr.core.CoreContainer  – Unable to create core: 
org.apache.solr.common.SolrException: Schema Parsing Failed: unknown field
'id'. Schema file is solr/<core>/schema.xml
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:618)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:243)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:853)
Caused by: java.lang.RuntimeException: unknown field 'id'
at 
org.apache.solr.schema.IndexSchema.getIndexedField(IndexSchema.java:340)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:536)
... 13 more


There is nothing wrong with the declaration of the 'id' field, and I have it
working fine when it's not using SolrCloud/HDFS. One odd thing is the part
that says "Schema file is solr/<core>/schema.xml", because there is no
schema file there. I have no idea where it is getting that path from. All of
the schema files are in solr/<core>/conf/schema.xml. I'm not sure if
this is the problem--it must be finding the schema, otherwise how does it
know about the 'id' field?

I am running it with the following command (with <> fields filled in
appropriately):

java -DnumShards=2 -Dbootstrap_conf=true -DzkHost=<zkhost>:2181 -Dhost=<hostname>
-DSTOP.PORT=7983 -DSTOP.KEY=key -Dsolr.directoryFactory=HdfsDirectoryFactory
-Dsolr.hdfs.confdir=/hadoop-conf -Dsolr.lock.type=hdfs
-Dsolr.hdfs.home=hdfs:///user/pparsons/solrcloud -jar start.jar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Parsing-Failed-unknown-field-id-Zookeeper-SolrCloud-tp4160478.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
You cannot use 100% because, as you say, 1 is interpreted as "1 document".  But 
you can do something like 99.9%.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 11:39 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Thank you, that works!

I'd already tried several values for maxQueryFrequency, but apparently 
without properly understanding it. I was confused by the line "A lower 
threshold is better for small indexes" when in fact I need a high value 
like 0.99, so every term returns suggestions. (Is it possible to set it 
to 100%? Because 1 gets interpreted as an absolute value.)

Nathaniel

On Mon, Sep 22, 2014 at 6:17 , Dyer, James 
 wrote:
> DirectSpellChecker defaults to not suggest anything for terms that 
> occur in 1% or more of the total documents in the index.  You can set 
> this higher in solrconfig.xml either with a fractional percent or a 
> whole-number absolute number of documents.
> 
> See 
> http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMaxQueryFrequency%28float%29
>  
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> 
> -Original Message-
> From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
> Sent: Monday, September 22, 2014 9:41 AM
> To: solr-user@lucene.apache.org
> Subject: RE: fuzzy terms, DirectSolrSpellChecker and 
> alternativeTermCount
> 
> Yep, I tried it both as a default param in the request handler (as in 
> the config I sent), and in the request, but with no effect... That's 
> what surprised me, since it seems it should work.
> 
> On Mon, Sep 22, 2014 at 4:38 , Dyer, James 
>  wrote:
>>  Did you try "spellcheck.alternativeTermCount" with 
>>  DirectSolrSpellChecker?  You can set it to whatever low value you 
>>  actually want it to return back to you (perhaps 20 suggestions max?).
>>  
>>  James Dyer
>>  Ingram Content Group
>>  (615) 213-4311
>>  
>>  
>>  -Original Message-
>>  From: Nathaniel Rudavsky-Brody 
>> [mailto:nathaniel.rudav...@gmail.com] 
>>  Sent: Monday, September 22, 2014 9:36 AM
>>  To: solr-user@lucene.apache.org
>>  Subject: RE: fuzzy terms, DirectSolrSpellChecker and 
>>  alternativeTermCount
>>  
>>  Hi James,
>>  
>>  The request 
>>  
>> /spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
>>  returns
>>  
>>  quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
>>  quis, quae, quas, quem, quid, quin, qui, qua
>>  
>>  Replacing quiam (not in the index) by quidam (in the index) returns 
>>  nothing at all, but I want it to return
>>  
>>  quidam, quam, quia, quidem, quadam, quodam, quedam, ...
>>  
>>  When I was using the same parameters with IndexBasedSpellChecker, by 
>>  setting a high alternativeTermCount, I got results for both. But as I 
>>  said, then I can't differentiate the different maxEdits.
>>  
>>  The request handler is:
>>  
>>  <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
>>    <lst name="defaults">
>>      <str name="spellcheck.dictionary">fuzzy1</str>
>>      <str name="spellcheck.count">20</str>
>>      <str name="spellcheck.alternativeTermCount">100</str>
>>    </lst>
>>    <arr name="last-components">
>>      <str>fuzzyterms</str>
>>    </arr>
>>  </requestHandler>
>>  
>>  Thanks!
>>  
>>  Nathaniel
>>  
>>  On Mon, Sep 22, 2014 at 4:08 , Dyer, James 
>>   wrote:
>>>   Nathaniel,
>>>   
>>>   Can you show us all of the parameters you are sending to the 
>>>   spellchecker?  When you specify "alternativeTermCount" with 
>>>   "spellcheck.q=quidam", what are the terms you expect to get back?  
>>>   Also, are you getting any query results back?  If you are using a "q" 
>>>   that returns results, or more results than you specify for 
>>>   "spellcheck.maxResultsForSuggest", spellcheck won't give you anything 
>>>   regardless of what you put for "spellcheck.q".
>>>   
>>>   James Dyer
>>>   Ingram Content Group
>>>   (615) 213-4311
>>>   
>>>   
>>>   -Original Message-
>>>   From: Nathaniel Rudavsky-Brody 
>>>  [mailto:nathaniel.rudav...@gmail.com] 
>>>   Sent: Monday, September 22, 2014 8:08 AM
>>>   To: solr-user@lucene.apache.org
>>>   Subject: fuzzy terms, DirectSolrSpellChecker and 
>>>  alternativeTermCount
>>>   
>>>   Hello,
>>>   
>>>   I'm trying find the best way to "fake" the terms component for 
>>>  fuzzy 
>>>   queries. That is, I need the full set of index terms for each of 
>>>  the 
>>>   two queries "quidam~1" and "quidam~2".
>>>   
>>>   I tried defining two suggesters with FuzzyLookupFactory, with 
>>>   maxEdits=1 and 2 respectively, but the results for "quidam~1" 
>>>  include 
>>>   suffixes like "quodammodo", which makes sense for a suggester but 
>>>   isn't 
>>>   what I want here.
>>>   
>>>   Now I'm trying with the spell-checker. As far as I can see, 
>>>   IndexBasedSpellChecker doesn't let me set maxEdits, so I can't 
>>> use 
>>>  it 
>>>   to distinguish between my two queries. DirectSolrSpellCh

RE: using facet enum and fc in the same query.

2014-09-22 Thread Toke Eskildsen
jerome.dup...@bnf.fr [jerome.dup...@bnf.fr] wrote:
> I have a solr index (12M docs, 45GB) with facets, and I'm trying to
> improve facet query performance.
> 1/ I tried to use docValues on facet fields, it didn't work well

That was surprising, as the normal result of switching to DocValues is 
positive. Can you elaborate on what you did and how it failed?

> 2/ I tried facet.threads=-1 in my query, and it worked perfectly (from more
> than 15s down to 2s for the longest queries)

That tells us that your primary problem is not IO. If your usage is normally 
single-threaded that can work, but it also means that you have a lot of CPU 
cores standing idle most of the time. How many fields are you using for 
faceting and how many of them are large (more unique values than the 5000 you 
mention)?

> 3/ I'm trying to use facet.method=enum. It's supposed to improve the
> performance for facet fields with few distinct values (type of
> documents, things like that).

Having a mix of facet methods seems like a fine idea, although in my personal 
experience enum gets slower than fc well before the 5000-unique-values mark. 
As Alan states, the call is f.myfacetfield.facet.method=enum 
(remember the 'facet.' part; see 
https://wiki.apache.org/solr/SimpleFacetParameters#Parameters for details).
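For instance, a single request mixing methods per field could look like this (field names invented for illustration):

/select?q=*:*&facet=true&facet.field=doctype&facet.field=title&f.doctype.facet.method=enum&facet.threads=-1

where doctype (few unique values) uses enum and title falls back to the default fc method.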

Or you could try Sparse Faceting (Disclaimer: I am the author), which seems to 
fit your setup very well: http://tokee.github.io/lucene-solr/

- Toke Eskildsen


Solr Boosting Unique Values

2014-09-22 Thread O. Olson
I use Solr to index some products that have an ImageUrl field. Obviously some
of the images are duplicates. I would like to boost the rankings of products
that have unique images (i.e. more specifically, unique ImageUrl field
values, because I don't deal with the image binary). 

By this I mean, if a certain product has a value in the ImageUrl not used by
any other product, it would be boosted more than another product which has a
value in the ImageUrl used by 3 other products. 

I hope I have explained that correctly. If not, please ask and I will try
again. 

For e.g. if I want to boost the products with quantity, I can add 

&bf=log(qty) 

to the request. Is there some similar function I can add to the ImageUrl
field to boost unique values?

Thank you in advance,
O. O.






Re: Schema Parsing Failed: unknown field 'id' [Zookeeper, SolrCloud]

2014-09-22 Thread Chris Hostetter

: There is nothing wrong with the declaration of the 'id' field, and I have it
: working fine when it's not using SolrCloud/HDFS. One odd thing is the part

...I can't explain that, but as far as this...

: that says "Schema file is solr//schema.xml", because there is no
: schema file there. I have no idea where it is getting that path from. All of

It's a bug in the error message itself...

https://issues.apache.org/jira/browse/SOLR-5814

: the schema files are in solr//conf/schema.xml. I'm not sure if
: this is the problem--it must be finding the schema, otherwise how does it
: know about the 'id' field?

The error message is indicating an inconsistency in your schema.xml -- 
most likely related to the <uniqueKey> declaration ... i suspect 
it's saying that your schema.xml declares <uniqueKey>id</uniqueKey> but you 
have no <field name="id" ... /> anywhere in your schema (hard to be sure 
w/o knowing exactly what version of solr you are running, and w/o seeing a 
full copy of your schema.xml).
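
i.e. the uniqueKey has to point at a declared field, roughly like this 
(the type and attributes here are just an example):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  ...
  <uniqueKey>id</uniqueKey>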

none of which explains why you *only* see this with SolrCloud on HDFS.

Full details would help: solr.xml, solrconfig.xml, schema.xml, etc


-Hoss
http://www.lucidworks.com/


Re: Static Fields Performance vs Dynamic Fields Performance

2014-09-22 Thread mark12345
Thanks for that link.  From what I read the performance difference is
negligible, especially if I would just be replacing one static field with a
dynamic one.


Erick Erickson wrote
> Sep 14, 2014; 12:06pm Re: Solr Dynamic Field Performance
> 
> 
> Dynamic fields, once they are actually _in_ a document, aren't any
> different than statically defined fields. Literally, there's no place
> in the search code that I know of that _ever_ has to check
> whether a field was dynamically or statically defined.
> 
> AFAIK, the only additional cost would be figuring out which pattern
> matched at index time, which is such a tiny portion of the cost of
> indexing that I doubt you could measure it.
> 
> Best,
> Erick

This leads me to my next question. Does anyone know why Solr doesn't come
out of the box with dynamic fields for every field type (simple example
below)? Also, is there a better template (best practice) than
"solr-4.10.0/example/solr/collection1/conf/schema.xml"?


>
> 
>
>  multiValued="true"/>
>
> 
>
>  multiValued="true"/>
>
> 
>
>  multiValued="true"/>
>
> 
>
>  multiValued="true"/>
>
> 
>
>  multiValued="true"/>
>
> 
>
>  multiValued="true"/>
>
>  stored="true"/>
>
>  stored="true" multiValued="true"/>
>
>  stored="true" multiValued="true"/>
> 
>
>  stored="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>
>  stored="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>
>  stored="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>
>  stored="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>
>  stored="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>
>  stored="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>
>  stored="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>
>  stored="true"  multiValued="true"  omitNorms="true"/>
>

>
>  stored="false" />
>
> 
>
>  multiValued="true"/>
>
> 
>
>  stored="false" omitNorms="true"/>
>
>  stored="true" omitNorms="true"/>
>
>  stored="true" multiValued="true" omitNorms="true"/>
>
>  stored="true" omitNorms="true"/>
> 

> 
>  indexed="true"  stored="true" />
>
>  indexed="true"  stored="true" multiValued="true"/>
>
>  indexed="true"  stored="true" />
>
>  indexed="true"  stored="true" multiValued="true"/>
>
>  indexed="true"  stored="true"/>
>
>  indexed="true"  stored="true" multiValued="true"/>
>
>  indexed="true"  stored="true"/>
>
>  indexed="true"  stored="true" multiValued="true"/>
>

>
>  stored="true"/>
>
>  stored="true" multiValued="true"/>
> 







Re: Schema Parsing Failed: unknown field 'id' [Zookeeper, SolrCloud]

2014-09-22 Thread paulparsons
Thanks. There is definitely a <field name="id"> in each of the schemas.

I am using 4.7.2.

Here is one of the *schema.xml* (the others are similar):



   
   

   
   
   
   
   
   
   
   
  
  
   
   
   
   

 id

   



  
  
   
  


















  



  
  




  



  





  



  
  




  



  




  


  


 




Here is the corresponding *solrconfig.xml*: 





  4.7

  
  
  

  

  
  
  

  ${solr.medline-citations.data.dir:}

  

   

  

  

  
${solr.lock.type:native}
true
  

  

  
  


  ${solr.ulog.dir:}


  
   ${solr.autoCommit.maxTime:15000} 
   false 
 

  
   ${solr.autoSoftCommit.maxTime:-1} 
 

  

  

1024







 


true

   20

   200


  

  


  

  static firstSearcher warming in solrconfig.xml

  


false

2

  

  
 




  

  

 
   explicit
   100
   catchall
   
 



  
  
 
   explicit
   json
   true
   text
 
  

  
 
   true
   json
   true
 
  

  
 
   explicit

   
   velocity
   browse
   layout
   Solritas

   
   edismax
   
  catchall^0.5 medline_abstract_text^1.0 medline_journal_title^1.2
medline_article_title^1.2 id^10.0 
   
   catchall
   100%
   *:*
   10
   *,score

   
 catchall^0.5 medline_abstract_text^1.0 medline_journal_title^1.2
medline_article_title^1.2 id^10.0
   
   catchall,medline_article_title,medline_journal_title
   3

   on
   medline_author_lastname
   medline_journal_title

   on
   medline_abstract_text
   html
   
   
   0
   medline_abstract_text

   on
   false   
   5
   2
   5   
   true
   true  
   5
   3   
 

 
 
   spellcheck
 
  


  

 application/json
   
  
  

 application/csv
   
  


  

  true
  ignored_

  
  true
  links
  ignored_

  

  

  

  

  
  

  solrpingquery


  all


  

  
  

 explicit 
 true

  

   
  



  

  ./medline-citations_DIHconfig.xml
  uima

  

  

text_general


  default
  catchall
  solr.DirectSolrSpellChecker
  
  internal
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker  
  name
  true
  true
  10


  

  

  catchall
  
  default
  wordbreak
  on
  true   
  10
  5
  5   
  true
  true  
  10
  5 


  spellcheck

  

  

  suggest
  org.apache.solr.spelling.suggest.Suggester
  org.apache.solr.spelling.suggest.tst.TSTLookupFactory
  
  suggestions  
  
  true


  

  

  true
  suggest
  
  10
  


  suggest

  


  
  

  

  text
  true


  tvComponent

  

  

  lingo

  org.carrot2.clustering.lingo.LingoClusteringAlgorithm

  clustering/carrot2




  stc
  org.carrot2.clustering.stc.STCClusteringAlgorithm




  kmeans
  org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

  
  
  
  

  
  
 
  true
  false
 

  terms

  


  
  

string
elevate.xml
  

  
  

  explicit
  text


  elevator

  

  
  

  
  
  

  100

  

  
  

  
  70
  
  0.5
  
  [-\w ,/\n\"']{20,200}

  

  
  

  
  

  

  
  

  
  
  
  
  
  
  
  
  
  
  

  

  
  

  
  

  
  
  

  10
  .,!? 	


  
  
  

  
  WORD
  
  
  en
  US

  

  

   



  


  
  /org/apache/uima/desc/AggregateGeneAE.xml
  
  
  true
  
  
false

   medline_abstract_text

Re: Schema Parsing Failed: unknown field 'id' [Zookeeper, SolrCloud]

2014-09-22 Thread Chris Hostetter

: Thanks. There is definitely a <field name="id"> in each of the schemas.
: 
: I am using 4.7.2.

if this config is working for you when you don't use zookeeper/hdfs then 
you must be using a newer version of Solr than when you test w/ zk/hdfs


4.8.0 is when the <fields> and <types> section tags were deprecated.

in 4.7.x and earlier you *must* enclose all of your <field> and 
<fieldType> tags in the appropriate sections.

See the 4.7.2 example configs and compare to yours.
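
in outline, a 4.7-era schema.xml needs this skeleton (trimmed here; see the 
shipped example for the real thing):

<schema name="yourschema" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- all other <field>/<dynamicField> declarations -->
  </fields>
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <!-- all other <fieldType> declarations -->
  </types>
  <uniqueKey>id</uniqueKey>
</schema>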


From the 4.8.0 upgrade notes...

<fields> and <types> tags have been deprecated. There is no longer any 
reason to keep them in the schema file, they may be safely removed. This 
allows intermixing of <field>, <dynamicField> and <fieldType> definitions if 
desired. Currently, these tags are supported so either style may be 
implemented. TBD is whether they'll be deprecated formally for 5.0 

https://issues.apache.org/jira/browse/SOLR-5228



: 
: Here is one of the *schema.xml* (the others are similar):
: 
: 
: 
:
:
: 
:
:
:
:
:
:
:
:
:   
:   
:
:
:
:
: 
:  id
: 
:
: 
: 
: 
:   
:   
:
:   
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
:   
: 
: 
: 
:   
:   
: 
: 
: 
: 
:   
: 
: 
: 
:   
: 
: 
: 
: 
: 
: 
: 
:   
: 
:   
:   
: 
: 
: 
: 
: 
: 
:   
: 
:   
: 
: 
: 
: 
:   
: 
: 
:   
: 
: 
:  
: 
: 
: 
: 
: Here is the corresponding *solrconfig.xml*: 
: 
: 
: 
: 
: 
:   4.7
: 
:   
:   
:   
: 
:   
: 
:   
:   
:   
: 
:   ${solr.medline-citations.data.dir:}
: 
:   
: 
:
: 
:   
: 
:   
: 
:   
: ${solr.lock.type:native}
: true
:   
: 
:   
: 
:   
:   
: 
: 
:   ${solr.ulog.dir:}
: 
: 
:   
:${solr.autoCommit.maxTime:15000} 
:false 
:  
: 
:   
:${solr.autoSoftCommit.maxTime:-1} 
:  
: 
:   
: 
:   
: 
: 1024
: 
: 
: 
: 
: 
: 
: 
:  
: 
: 
: true
: 
:20
: 
:200
: 
: 
:   
: 
:   
: 
: 
:   
: 
:   static firstSearcher warming in solrconfig.xml
: 
:   
: 
: 
: false
: 
: 2
: 
:   
: 
:   
:  
: 
: 
: 
: 
:   
: 
:   
: 
:  
:explicit
:100
:catchall
:
:  
: 
: 
: 
:   
:   
:  
:explicit
:json
:true
:text
:  
:   
: 
:   
:  
:true
:json
:true
:  
:   
: 
:   
:  
:explicit
: 
:
:velocity
:browse
:layout
:Solritas
: 
:
:edismax
:
:   catchall^0.5 medline_abstract_text^1.0 medline_journal_title^1.2
: medline_article_title^1.2 id^10.0 
:
:catchall
:100%
:*:*
:10
:*,score
: 
:
:  catchall^0.5 medline_abstract_text^1.0 medline_journal_title^1.2
: medline_article_title^1.2 id^10.0
:
:catchall,medline_article_title,medline_journal_title
:3
: 
:on
:medline_author_lastname
:medline_journal_title
: 
:on
:medline_abstract_text
:html
:
:
:0
:medline_abstract_text
: 
:on
:false   
:5
:2
:5   
:true
:true  
:5
:3   
:  
: 
:  
:  
:spellcheck
:  
:   
: 
: 
:   
: 
:  application/json
:
:   
:   
: 
:  application/csv
:
:   
: 
: 
:   
: 
:   true
:   ignored_
: 
:   
:   true
:   links
:   ignored_
: 
:   
: 
:   
: 
:   
: 
:   
: 
:   
:   
: 
:   solrpingquery
: 
: 
:   all
: 
: 
:   
: 
:   
:   
: 
:  explicit 
:  true
: 
:   
: 
:
:   
: 
: 
: 
:   
: 
:   ./medline-citations_DIHconfig.xml
:   uima
: 
:   
: 
:   
: 
: text_general
: 
: 
:   default
:   catchall
:   solr.DirectSolrSpellChecker
:   
:   internal
:   
:   0.5
:   
:   2
:   
:   1
:   
:   5
:   
:   4
:   
:   0.01
:   
: 
: 
: 
: 
:   wordbreak
:   solr.WordBreakSolrSpellChecker  
:   name
:   true
:   true
:   10
: 
: 
:   
: 
:   
: 
:   catchall
:   
:   default
:   wordbreak
:   on
:   true   
:   10
:   5
:   5   
:   true
:   true  
:   10
:   5 
: 
: 
:   spellcheck
: 
:   
: 
:   
: 
:   suggest
:   org.apache.solr.spelling.suggest.Suggester
:   org.apache.solr.spelling.suggest.tst.TSTLookupFactory
:   

Re: running solr in debug through eclipse

2014-09-22 Thread Erick Erickson
NP, glad to help someone dig into the Solr code.

We'll wait for the patch ;)...

Erick

On Sun, Sep 21, 2014 at 8:52 PM, Anurag Sharma  wrote:
> Hey Eric,
>
> It works like a charm :).
> Thanks a lot for pinpointing the issue. My bad, I was using the suspend=y
> option blindly.
>
> Thanks again,
> Anurag
>
> On Sun, Sep 21, 2014 at 10:03 PM, Erick Erickson 
> wrote:
>
>> It's doing exactly what you tell it to:
>>
>> java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7666
>> -jar start.jar
>>
>> Specifically, 'suspend=y' means it will sit there, very patiently, until
>> you connect to it with a debugger and tell it to go. This is _very_
>> useful to debug initialization errors, but can sometimes be a bit
>> puzzling.
>>
>> I'd recommend you actually attach with the debugger (i.e. a "remote"
>> session). In IntelliJ, (I'm sure there are analogous ways in Eclipse),
>> you create, quite literally, a "remote session" that you give the URL
>> of the server you started above and the port. You start your server as
>> above and then start your remote session in your IDE and you'll be in
>> the debugger, attached to the running Solr instance. You can set
>> breakpoints or just hit the "go" button and the server should start
>> up. My setup usually just has "localhost" and 7666 for the URL/port.
>>
>> You do not have to attach a debugger first, just specify 'suspend=n'
>> instead. But starting with 'suspend=y' ensures you have actually
>> attached to the server and have all the parts in place.
>>
>> Best,
>> Erick
>>
>> On Sun, Sep 21, 2014 at 3:51 AM, Anurag Sharma  wrote:
>> > Hi All,
>> >
>> > Thanks a lot for your suggestions. Shalin, your direction quickly took me
>> > to the issue, it was very insightful and helpful.
>> > Finally am able to understand the issue I was working on and run
>> particular
>> > unit test class AtomicUpdatesTest around it.
>> >
>> > On running the Solr in debug mode, I am still not able to start solr in
>> > debug mode using:
>> > java -Xdebug
>> -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7666
>> > -jar start.jar
>> > (Ref: http://wiki.apache.org/solr/HowToConfigureEclipse)
>> > The command wait for hours and the server never comes up on windows
>> without
>> > giving any error/info message.
>> >
>> > Please suggest if someone faced this issue. I tried restarting windows,
>> > ensured no process running on 7666 port. In previous trials to start, I
>> > used to get msg "Debugger failed to attach: handshake failed - received
>> >>GET /solr/ HTT< - expect" but it stopped coming now.
>> >
>> > Thanks
>> > Anurag
>> >
>> >
>> >
>> >
>> > On Fri, Sep 19, 2014 at 8:21 PM, Erick Erickson > >
>> > wrote:
>> >
>> >> Yeah, it's usually pretty daunting to know where to start, the
>> >> codebase is kinda big. Even "start from junit test" is often daunting,
>> >> there are a lot of them too.
>> >>
>> >> Others have given you good places to start, good luck!
>> >>
>> >> Erick
>> >>
>> >> On Fri, Sep 19, 2014 at 12:23 AM, Bernd Fehling
>> >>  wrote:
>> >> > Just start at the UpdateHandler and follow it down the line.
>> >> >
>> >> > I would start at org/apache/solr/update/UpdateHandler.java
>> >> >
>> >> > If you already know if it is add, delete or update then start with
>> >> > AddUpdateCommand.java, DeleteUpdateCommand.java or UpdateCommand.java.
>> >> >
>> >> > Just follow the red line :-)
>> >> >
>> >> > Regards
>> >> > Bernd
>> >> >
>> >> >
>> >> > Am 19.09.2014 um 08:47 schrieb Anurag Sharma:
>> >> >> Thanks Bernd for your insight.
>> >> >> As of now, I am focussing to fix the issue in the updater but not
>> able
>> >> to
>> >> >> localize which code to look in for it.
>> >> >>
>> >> >> Regards,
>> >> >> Anurag
>> >> >>
>> >> >> On Fri, Sep 19, 2014 at 12:09 PM, Bernd Fehling <
>> >> >> bernd.fehl...@uni-bielefeld.de> wrote:
>> >> >>
>> >> >>> It depends on what you are going to do.
>> >> >>>
>> >> >>> If you are adding/modifying code and Junit tests use Junit test
>> cases.
>> >> >>> If you are debugging runtime problems under load use remote
>> debugging.
>> >> >>> If you are going for in deep debugging (even into Jetty and Java)
>> use
>> >> >>> RunJettyRun for Eclipse.
>> >> >>>
>> >> >>> Regards
>> >> >>> Bernd
>> >> >>>
>> >> >>>
>> >> >>> Am 18.09.2014 um 20:50 schrieb Anurag Sharma:
>> >>  Dear Solr users,
>> >> 
>> >>  I am new to Solr dev community and trying to setup eclipse to
>> debug a
>> >>  running solr server. Please suggest if anyone of you have tried
>> doing
>> >> the
>> >>  same.
>> >> 
>> >>  Once above is done. Also suggest the entry point in code where
>> >> breakpoint
>> >>  can be placed.
>> >> 
>> >>  Thanks
>> >>  Anurag
>> >> 
>> >> >>>
>> >> >>>
>> >> >>
>> >>
>>


Re: Search multiple cores, one result

2014-09-22 Thread Erick Erickson
Depending on the size, I'd go for (a). IOW, I wouldn't change the
sharding just to use (a), but if you'd have the same shard setup in
either case, (a) is easier.

You'd index a type field with each doc indicating the source of your
document. Then use the grouping feature to return the top N from each
of the groups you care about.

Then a single request will return some docs from each of your doc
types, and it's again up to the application layer to combine them
intelligently. I'm sure you're aware that the scores aren't comparable
in this scenario, so...

Of course you can use filter query (fq) clauses to restrict to a
single type of doc as appropriate.
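
In SolrJ the grouped request is just a couple of extra parameters. A minimal
sketch, assuming option (a) with a single core (the core name "allsources"
and the "type" field are placeholders for whatever you use):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedSearch {
  public static void main(String[] args) throws SolrServerException {
    // one core holding every source, each doc tagged with a "type" field
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/allsources");

    SolrQuery q = new SolrQuery("the user's search terms");
    q.set("group", true);
    q.set("group.field", "type"); // one group per source (pages, events, ...)
    q.set("group.limit", 5);      // top 5 hits from each source

    QueryResponse rsp = solr.query(q);
    // rsp.getGroupResponse() holds one group per distinct "type" value
  }
}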

Best,
Erick

On Mon, Sep 22, 2014 at 4:54 AM, Clemens Wyss DEV  wrote:
> As mentioned in another post we (already) have a (Lucene-based) generic 
> indexing framework which allows any source/entity to provide 
> indexable/searchable data.
> Sources may be:
> pages
> events
> products
> customers
> ...
> As their names imply they have nothing in common ;) Nevertheless we'd like 
> to search across them, getting one result set with the top hits
> (searching across sources is also required for (auto)suggesting search terms)
>
> In our current Lucene-approach we create a Lucene index per source (and 
> language) and then search across the indexes with a MultiIndexReader.
> Switching to Solr we'd like to rethink the design decision whether to
> a) put all data into one core(Lucene index)
> or to
> b) split them into separate cores
>
> if  b) how can I search across the cores (in SolrJ)?
>
> Thx
> Clemens


Solr cloud setup question

2014-09-22 Thread Susmit Shukla
Hi solr experts,

I am building out a solr cluster with this configuration

3 external zookeeprs
15 solr instances (nodes)
3 shards

I need to start out with 3 nodes, and the remaining 12 nodes would be added to
the cluster later. I am able to create a collection with 3 shards. This process
works fine using the collections create API.
The core directory is automatically created by solr -
multishard_shard1_replica1, multishard_shard2_replica1 etc with
core.properties file containing shard and replica info

However, when I add new machines running solr and pointing to this zk
cluster, they do not get added as replicas. The new machines have solr.home
directory available but no core specific directories beneath it since I
want solr to auto add as replica.
If I create the directories manually beneath solr.home named
multishard_shard2_replica2, multishard_shard2_replica3 and so on and
provide core.properties file, they are correctly added to the cloud as
replicas.

Is there a way to do it automatically? The solr documentation seems to say so:

https://cwiki.apache.org/confluence/display/solr/Nodes%2C+Cores%2C+Clusters+and+Leaders
in Leaders and Replicas section

Thanks,
Susmit


Re: Solr upgrade to latest version

2014-09-22 Thread Erick Erickson
Probably go for 4.9.1. There'll be a 4.10.1 out in the not-too-distant
future that you can upgrade to if you wish. 4.9.1 -> 4.10.1 should be
quite painless.

But do _not_ copy your schema.xml and solrconfig.xml files over from
1.4 to 4.x. There are some fairly easy ways to shoot yourself in the
foot there. Take the stock distribution configuration files and copy
_parts_ of your schema.xml and solrconfig.xml you care about.

If you're using multiple cores, read about core discovery here:
https://wiki.apache.org/solr/Core%20Discovery%20(4.4%20and%20beyond)

And be very aware that you should _not_ remove any of the _field_
entries in schema.xml. In particular _version_ and _root_ should be
left alone. As well as the "id" field.

And you'll have to re-index everything; Solr 4.x will not read Solr
1.4 indexes. If that's impossible, you'll have to upgrade from 1.4 to
3.x, optimize your index, then upgrade from 3.x to 4.x, add some
documents, and optimize/force_merge again.

HTH
Erick

On Mon, Sep 22, 2014 at 2:29 AM, Danesh Kuruppu  wrote:
> Hi all,
>
> I currently working on upgrade sorl 1.4.1 to sorl latest stable release.
>
> What is the latest stable release I can use?
> Is there specfic things I need to look at when upgrade.
>
> Need help
> Thanks
>
> Danesh


Re: Issue Adding Filter Query

2014-09-22 Thread Erick Erickson
You have your index and query time analysis chains defined much
differently. Omitting the WordDelimiterFilterFactory from the
query-time analysis chain will lead to endless problems.

With the definition you have, here are the terms in the index and
their term positions as  below. This is available from the
admin/analysis page if you click the "verbose" checkbox, although I
admit it's kind of hard to read:
pos:   1      2             3        4
       fatty  acid-binding  binding  protein
              acid

But at query time, this is how they're being analyzed:

pos:   1      2             3
       fatty  acid-binding  protein

So searching for "fatty acid-binding protein" requires that the tokens
"fatty" "acid-binding" and "protein" appear in term positions 1, 2, 3
rather than where they actually are (1, 2, 4). Searching for "fatty
acid-binding protein"~1 would actually find this, the "~1" means allow
one gap in there.

HOWEVER, that's the least of your problems. WordDelimiterFilterFactory
will _also_ "split on intra-word delimiters (all non alpha-numeric
characters)". While that doesn't really say so explicitly, that will
have the effect of removing puncutation. So searching for "fatty
acid-binding protein."~1 (note the period) will fail since the token
will include the period.

I'd _really_ advise you to use the stock WordDelimiterFilterFactory
settings in both analysis and query times included in the stock Solr
release for, say, text_en_splitting or even a single analyzer like
text_en_splitting_tight.
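
For reference, a trimmed version of that stock type (stopwords/stemming
omitted; this is a sketch, check the shipped schema.xml for the full
definition). The key point is that the WordDelimiterFilter sits in _both_
chains so index and query tokens line up:

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- same chain, but catenation turned off at query time -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>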

Best,
Erick

On Mon, Sep 22, 2014 at 6:33 AM, aaguilar  wrote:
> Hello Erick.
>
> Below is the information you requested.   Thanks for your help!
>
>  "100">   "solr.WhitespaceTokenizerFactory"/>  "solr.WordDelimiterFilterFactory" splitOnNumerics="0" splitOnCaseChange="0"
> generateWordParts="1" generateNumberParts="0" catenateWords="0"
> catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>  "solr.StopFilterFactory"/>   analyzer>   "solr.WhitespaceTokenizerFactory"/>  "solr.LowerCaseFilterFactory"/>  
>
>
>  />
>
> On Fri, Sep 19, 2014 at 7:36 PM, Erick Erickson [via Lucene] <
> ml-node+s472066n4160122...@n3.nabble.com> wrote:
>
>> Hmmm, I'd have to see the schema definition for your description
>> field. For this, the admin/analysis page is very helpful. Here's my
>> guess:
>>
>> Your analysis chain doesn't break the incoming tokens up quite like
>> you think it is. Thus you have the tokens in your index like
>> 'protein,' (notice the comma) and 'protein-like' rather than just
>> 'protein'. However, I can't quite reconcile this with your statement:
>> "Another weird thing is that if I used description:"fatty
>> acid-binding" AND description:"protein"
>>
>> so I'm at something of a loss. If you paste in your schema definition
>> for the 'description' field _and_ the corresponding 
>> definition I can give it a quick whirl.
>>
>> Best,
>> Erick
>>
>> On Fri, Sep 19, 2014 at 11:53 AM, aaguilar <[hidden email]
>> > wrote:
>>
>> > Hello Erick,
>> >
>> > Thanks for the response.  I tried adding the debug=True to the query,
>> but I
>> > do not know exactly what I am looking for in the output.  Would it be
>> > possible for you to look at the results?  I would really appreciate it.
>> I
>> > attached two files, one of them is with the filter query
>> description:"fatty
>> > acid-binding" and the other is with the filter query description:"fatty
>> > acid-binding protein".  If you see the file that has the results for
>> > description:"fatty acid-binding" , you can see that the hits do have
>> "fatty
>> > acid-binding protein" and nothing in between.  I really appreciate any
>> help
>> > you can provide.
>> >
>> > Thanks you
>> >
>> > On Fri, Sep 19, 2014 at 2:03 PM, Erick Erickson [via Lucene] <
>> > [hidden email] >
>> wrote:
>> >
>> >> Your very best friend here is attaching &debug=query to the URL and
>> >> looking at the parsed query results. Upon occasion there's some
>> >>
>> >> One possible explanation is that description field has something like
>> >> "fatty acid-binding some words protein" in which case your query
>> >> "fatty acid-binding protein" would fail, but "fatty acid-binding
>> >> protein"~4 would succeed.
>> >>
>> >> The other possibility is that your query parsing isn't quite doing
>> >> what you think, but adding &debug=query should help there.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Sep 19, 2014 at 8:10 AM, aaguilar <[hidden email]
>> >> > wrote:
>> >>
>> >> > Hello All,
>> >> >
>> >> > I recently came across a problem when I tried using
>> description:"fatty
>> >> > acid-binding protein" as a filter query when doing a query through
>> the
>> >> query
>> >> > interface for Solr in the Tomcat server.  Using that filter query did
>> >> not
>> >> > give me any results at all, however if I used description:"fatt

Formatting dates

2014-09-22 Thread Manohar Kanuri
Hello,

I am a non-techie who decided to download and install Solr 5.0 to parse data 
for my community activism. I got it installed and running, and updated the example 
schema and installation with a bunch of CSV data. Then I went back to deal with 
the first of two fields I had deferred till later - dates and location data. 

The CSV data file for Jan - August 2014 is about 650mb with about 1.25 million 
records/rows. I split it into 5 pieces and changed MM/DD/YYYY HH:MM:SS 
AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using TextWrangler, 
which is what I know, and a step up from trying to use the Mac Numbers spreadsheet, 
which does it very easily but would force me to break the file into pieces smaller 
than 25-30mb. Random fields can get updated months after the record was created, so 
I have to find an easier way than breaking the CSV file into smaller bits and 
reformatting manually. Each record/row has 4 date fields, so potentially there are 
up to 5 million fields to be reformatted in 8 months worth of data. 

I did a Google search (didn't see a Solr search page) on the mailing list 
archives and the internet, but it seems my question is either too simple 
and/or it's staring me in the face and I'm just missing it: Is there a simple 
way to reformat the dates to Solr style in a 650mb-1gig CSV file? Or, ideally, 
to have the dates and times automatically reformatted as the Solr index gets 
updated with the latest data (I recall reading this was not possible). Is there a 
widget/gadget/gizmo/script that would do this? 

thanks,
manohar

Re: Solr Boosting Unique Values

2014-09-22 Thread Erick Erickson
This should be happening automatically via the tf/idf
calculations, which weigh terms that are rare in the
index more heavily than ones that are more common.

That said, at very low numbers this may be invisible;
I'm not sure the relevance calculations for 3 occurrences as opposed
to 1 are very consequential.

However, you _do_ have access to the tf in the Function Queries,
see: https://cwiki.apache.org/confluence/display/solr/Function+Queries

You could manipulate the scores of your docs by getting
creative with these I think for your particular case.
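
For example, if at index time you can precompute how many products share
each image into a numeric field -- call it image_dup_count, a made-up
name -- then something like

&bf=recip(field(image_dup_count),1,1,0)

gives a product with a unique image (count=1) a boost of 1.0, while one
whose image is shared by 4 products gets 0.25.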

Best,
Erick

On Mon, Sep 22, 2014 at 11:12 AM, O. Olson  wrote:
> I use Solr to index some products that have an ImageUrl field. Obviously some
> of the images are duplicates. I would like to boost the rankings of products
> that have unique images (i.e. more specifically, unique ImageUrl field
> values, because I don't deal with the image binary).
>
> By this I mean, if a certain product has a value in the ImageUrl not used by
> any other product, it would be boosted more than another product which has a
> value in the ImageUrl used by 3 other products.
>
> I hope I have explained that correctly. If not, please ask and I would try
> again.
>
> For e.g. if I want to boost the products with quantity, I can add
>
> &bf=log(qty)
>
> to the request. Is there some similar function I can add to the ImageUrl
> field to boost unique values?
>
> Thank you in advance,
> O. O.
>
>
>
>


Re: Static Fields Performance vs Dynamic Fields Performance

2014-09-22 Thread Erick Erickson
The example schema and solrconfig are intended to
show you a large number of possibilities; they are
not necessarily intended to be "best practices". I would
argue that if you do _not_ want to have dynamic fields
defined, you should take them all out. And you should
take all of the other field definitions out that you don't want
in your specific installation. Ditto with the solrconfig stuff.
Do you use QueryElevationComponent? Clustering?
Autosuggest? No? Then rip them out. Ditto with
unused field types. Ditto with

I admit this is a personal preference, but the next person
to try to maintain your system will thank you for not
having to wonder if s/he needs all that stuff.

HOWEVER
Do not remove _version_, _root_ or "id" fields from your
schema.xml file unless you're absolutely certain you know
the consequences. And given your first question, at this
stage I suspect you don't know those consequences ;). The
same for any schema.xml field that starts and ends with
an underscore (_)...

And watch out for the "text" field. If you remove it blindly
from your schema, the core won't initialize, since the solrconfig.xml file
looks for the "text" field in several handlers to define the
default search field if no field is explicitly given in the search.

This is not hard to fix, just be prepared to spend some time
chasing stuff like that down if you aggressively clean up
stuff you don't think you need.

Best,
Erick

On Mon, Sep 22, 2014 at 11:24 AM, mark12345
 wrote:
> Thanks for that link.  From what I read the performance difference is
> negligible, especially if I would just be replacing one static field with a
> dynamic one.
>
>
> Erick Erickson wrote
>> Sep 14, 2014; 12:06pm Re: Solr Dynamic Field Performance
>>
>>
>> Dynamic fields, once they are actually _in_ a document, aren't any
>> different than statically defined fields. Literally, there's no place
>> in the search code that I know of that _ever_ has to check
>> whether a field was dynamically or statically defined.
>>
>> AFAIK, the only additional cost would be figuring out which pattern
>> matched at index time, which is such a tiny portion of the cost of
>> indexing that I doubt you could measure it.
>>
>> Best,
>> Erick
>
> This leads me to my next question. Does anyone know why Solr doesn't come
> out of the box with dynamic fields for every field type (simple example
> below)? Also, is there a better template (best practice) than
> "solr-4.10.0/example/solr/collection1/conf/schema.xml"?
>
>
>>
>> 
>>
>> > multiValued="true"/>
>>
>> 
>>
>> > multiValued="true"/>
>>
>> 
>>
>> > multiValued="true"/>
>>
>> 
>>
>> > multiValued="true"/>
>>
>> 
>>
>> > multiValued="true"/>
>>
>> 
>>
>> > multiValued="true"/>
>>
>> > stored="true"/>
>>
>> > stored="true" multiValued="true"/>
>>
>> > stored="true" multiValued="true"/>
>>
>>
>> > stored="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>> > stored="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>> > stored="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>> > stored="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>> > stored="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>> > stored="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>> > stored="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>> > stored="true"  multiValued="true"  omitNorms="true"/>
>>
>
>>
>> > stored="false" />
>>
>> 
>>
>> > multiValued="true"/>
>>
>> 
>>
>> > stored="false" omitNorms="true"/>
>>
>> > stored="true" omitNorms="true"/>
>>
>> > stored="true" multiValued="true" omitNorms="true"/>
>>
>> > stored="true" omitNorms="true"/>
>>
>
>>
>> > indexed="true"  stored="true" />
>>
>> > indexed="true"  stored="true" multiValued="true"/>
>>
>> > indexed="true"  stored="true" />
>>
>> > indexed="true"  stored="true" multiValued="true"/>
>>
>> > indexed="true"  stored="true"/>
>>
>> > indexed="true"  stored="true" multiValued="true"/>
>>
>> > indexed="true"  stored="true"/>
>>
>> > indexed="true"  stored="true" multiValued="true"/>
>>
>
>>
>> > stored="true"/>
>>
>> > stored="true" multiValued="true"/>
>>
>
>
>
>
>


Re: Schema Parsing Failed: unknown field 'id' [Zookeeper, SolrCloud]

2014-09-22 Thread Erick Erickson
One other possibility in addition to Hoss' comments.
Did you load a version of your configs to ZooKeeper
sometime that didn't have these fields? I don't quite
know where the schema and solrconfig files came
from, but the fact that they're on a local disk says
nothing about what's in ZooKeeper. When you
follow the SolrCloud tutorial, the -Dbootstrap_confdir
loads the Solr configuration files up to ZK, and
thereafter any Solr starting up will use them
(assuming they're linked). So it's possible that you
uploaded something some time ago that you're using
every time you try to start SolrCloud.

P.S. You can pull them down from ZK with the
scripts in the example/scripts/cloud-scripts directory.
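
For example (zkhost and confname are placeholders):

example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd downconfig -confname myconf -confdir /tmp/conf-copy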

FWIW,
Erick

On Mon, Sep 22, 2014 at 12:19 PM, Chris Hostetter
 wrote:
>
> : Thanks. There is definitely a <field name="id"> in each of the schemas.
> :
> : I am using 4.7.2.
>
> if this config is working for you when you don't use zookeeper/hdfs then
> you must be using a newer version of Solr than when you test w/ zk/hdfs
>
>
> 4.8.0 is when the <fields> and <types> section tags were deprecated.
>
> in 4.7.x and earlier you *must* enclose all of your <field> and
> <fieldType> tags in the appropriate sections.
>
> See the 4.7.2 example configs and compare to yours.
>
>
> From the 4.8.0 upgrade notes...
>
> <fields> and <types> tags have been deprecated. There is no longer any
> reason to keep them in the schema file, they may be safely removed. This
> allows intermixing of <field>, <dynamicField> and <fieldType> definitions if
> desired. Currently, these tags are supported so either style may be
> implemented. TBD is whether they'll be deprecated formally for 5.0
>
> https://issues.apache.org/jira/browse/SOLR-5228
>
>
>
> :
> : Here is one of the *schema.xml* (the others are similar):
> :
> :
> : 
> :
> :
> : : required="true" multiValued="false" />
> : : stored="true" />
> : : stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
> :
> : : stored="true" multiValued="true"/>
> : : stored="true" multiValued="true"/>
> : : stored="true" multiValued="true" docValues="true"/>
> : : stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
> :
> :: multiValued="true" required="false" />
> :: multiValued="true" required="false" />
> :: multiValued="true" required="false" />
> : : multiValued="true"/>
> : : stored="true" type="textSpell"/>
> : : type="string" multiValued="true" docValues="true"/>
> :
> :  id
> :
> : : dest="medline_journal_title_facets"/>
> :
> :  : positionIncrementGap="100">
> : 
> :   
> :   
> :
> :   
> :
> : 
> :
> :  : sortMissingLast="true"/>
> :
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> :  : positionIncrementGap="0"/>
> : 
> : 
> :  : positionIncrementGap="100">
> :   
> : 
> :  : words="stopwords.txt" />
> : 
> :   
> :   
> : 
> :  : words="stopwords.txt" />
> :  : ignoreCase="true" expand="true"/>
> : 
> :   
> : 
> :
> :  : positionIncrementGap="100">
> :   
> : 
> :
> :
> :  : ignoreCase="true"
> : words="lang/stopwords_en.txt"
> : />
> : 
> : 
> :  : protected="protwords.txt"/>
> :
> : 
> :   
> :   
> : 
> :  : ignoreCase="true" expand="true"/>
> :  : ignoreCase="true"
> : words="lang/stopwords_en.txt"
> : />
> : 
> : 
> :  : protected="protwords.txt"/>
> :
> : 
> :   
> : 
> :
> :
> :  : positionIncrementGap="100">
> :   
> : 
> : 
> :   
> : 
> :
> :
> :  : multiValued="true" class="solr.StrField" />
> :
> : 
> :
> : Here is the corresponding *solrconfig.xml*:
> :
> :
> : 
> : 
> :
> :   4.7
> :
> :   
> :   
> :: />
> :
> :   
> :
> :   
> :   
> :   
> :
> :   ${solr.medline-citations.data.dir:}
> :
> ::
> : class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
> :
> :   
> :
> :   
> :
> :   
> :
> :   
> : ${solr.lock.type:native}
> : true
> :   
> :
> :   
> :
> :
> :   
> :
> : 
> :   ${solr.ulog.dir:}
> : 
> :
> :  
> :${solr.autoCommit.maxTime:15000}
> :false
> :  
> :
> :  
> :${solr.autoSoftCommit.maxTime:-1}
> :  
> :
> :   
> :
> :   
> :
> : 1024
> :
> :  :  size="512"
> :  initialSize="512"
> :  autowarmCount="0"/>
> :
> :  :  size="512"
> :  initialSize="512"
> :  autowarmCount="0"/>
> :
> :  :  

Re: Solr cloud setup question

2014-09-22 Thread Erick Erickson
That page is talking about leaders/followers coming up
and going down, but pretty much after they've been
assigned in the first place. Your problem is just the
"assigned in the first place" bit.

Since Solr 4.8, there's the addreplica collections API
command that is what you want I think, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
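
For example, to put a replica of shard2 on a particular new node (host and
node name here are placeholders; any live node name from /live_nodes works):

http://anyhost:8983/solr/admin/collections?action=ADDREPLICA&collection=multishard&shard=shard2&node=192.168.1.21:8983_solr

If you leave off the node parameter, Solr picks a node for you.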

Best,
Erick

On Mon, Sep 22, 2014 at 12:45 PM, Susmit Shukla  wrote:
> Hi solr experts,
>
> I am building out a solr cluster with this configuration
>
> 3 external zookeeprs
> 15 solr instances (nodes)
> 3 shards
>
> I need to start out with 3 nodes and remaining 12 nodes would be added to
> cluster. I am able to create a collection with 3 shards. This process works
> fine using collections create API.
> The core directory is automatically created by solr -
> multishard_shard1_replica1, multishard_shard2_replica1 etc with
> core.properties file containing shard and replica info
>
> However, when I add new machines running solr and pointing to this zk
> cluster, they do not get added as replicas. The new machines have solr.home
> directory available but no core specific directories beneath it since I
> want solr to auto add as replica.
> If I create the directories manually beneath solr.home named
> multishard_shard2_replica2, multishard_shard2_replica3 and so on and
> provide core.properties file, they are correctly added to the cloud as
> replicas.
>
> Is there a way to automatically do it? since solr documentation says so..
>
> https://cwiki.apache.org/confluence/display/solr/Nodes%2C+Cores%2C+Clusters+and+Leaders
> in Leaders and Replicas section
>
> Thanks,
> Susmit


Re: Formatting dates

2014-09-22 Thread Erick Erickson
I think this'll help:

http://wiki.apache.org/solr/ScriptUpdateProcessor

Essentially, each time a document comes in to Solr,
this will get invoked on it. You'll have to do some
fiddling to get it right: you have to remove the field from
the doc, transform it, then put it back. None of this
is hard, but it'll require a bit of programming. Fortunately
not too much.
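
The solrconfig.xml side is just a chain entry along these lines (a sketch;
update-script.js is a file you write yourself):

<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">update-script.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The JavaScript in update-script.js is where you'd rewrite the four date fields.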

Best,
Erick

On Mon, Sep 22, 2014 at 1:16 PM, Manohar Kanuri  wrote:
> Hello,
>
> I am a non-techie who decided to download and install Solr 5.0 to parse data  
> for my community activism. Got it installed and running, updated the example 
> schema and installation with a bunch of CSV data. And went back to deal with 
> the first of two fields I deferred till later - dates and location data.
>
> The CSV data file for Jan - August 2014 is about 650mb with about 1.25 
> million records/rows. I split it into 5 pieces and changed MM/DD/YYYY 
> HH:MM:SS AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using 
> TextWrangler. Which is what I know and a step up from trying to use Mac 
> Numbers spreadsheet which does it very easily but I will have to break it 
> into pieces smaller than 25-30mb. Random fields can get updated months after 
> the record was created so I have to find an easier way than break the CSV 
> file into smaller bits and reformat manually. Each record/row has 4 date 
> fields so potentially there are upto 5 million fields to be reformatted in 8 
> months worth of data..
>
> I did a Google search (didn't see a Solr search page) on the mailing list 
> archives and the internet, but seems like my question is either too simple 
> and/or it's staring me in the face and I'm just missing it:  Is there a 
> simple way to reformat the dates to Solr-style in a 650mb-1gig CSV file? Or, 
> ideally, have the dates and times automatically reformatted as the Solr index 
> gets updated the latest data (I recall reading this was not possible). Is 
> there a widget/gadget/gizmo/script that would do this?
>
> thanks,
> manohar


Re: Issue Adding Filter Query

2014-09-22 Thread aaguilar
Hello Erick,

Thank you so much for your help. That makes perfect sense. I will make the
changes you suggest and let you know how it goes.

Thanks!

On Mon, Sep 22, 2014 at 4:12 PM, Erick Erickson [via Lucene] <
ml-node+s472066n4160547...@n3.nabble.com> wrote:

> You have your index and query time analysis chains defined much
> differently. Omitting the WordDelimiterFilterFactory from the
> query-time analysis chain will lead to endless problems.
>
> With the definition you have, here are the terms in the index and
> their term positions as  below. This is available from the
> admin/analysis page if you click the "verbose" checkbox, although I
> admit it's kind of hard to read:
> pos:   1      2             3        4
>        fatty  acid-binding  binding  protein
>               acid
>
> But at query time, this is how they're being analyzed:
>
> pos:   1      2             3
>        fatty  acid-binding  protein
>
> So searching for "fatty acid-binding protein" requires that the tokens
> "fatty" "acid-binding" and "protein" appear in term positions 1, 2, 3
> rather than where they actually are (1, 2, 4). Searching for "fatty
> acid-binding protein"~1 would actually find this, the "~1" means allow
> one gap in there.
>
> HOWEVER, that's the least of your problems. WordDelimiterFilterFactory
> will _also_ "split on intra-word delimiters (all non alpha-numeric
> characters)". While that doesn't really say so explicitly, that will
> have the effect of removing puncutation. So searching for "fatty
> acid-binding protein."~1 (note the period) will fail since the token
> will include the period.
>
> I'd _really_ advise you to use the stock WordDelimiterFilterFactory
> settings in both analysis and query times included in the stock Solr
> release for, say, text_en_splitting or even a single analyzer like
> text_en_splitting_tight.
>
> Best,
> Erick
>
> On Mon, Sep 22, 2014 at 6:33 AM, aaguilar <[hidden email]
> > wrote:
>
> > Hello Erick.
> >
> > Below is the information you requested.   Thanks for your help!
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" splitOnCaseChange="0"
> >             generateWordParts="1" generateNumberParts="0" catenateWords="0"
> >             catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
> >     <filter class="solr.StopFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > <field name="description" type="..." indexed="true" stored="true" />
> >
> > On Fri, Sep 19, 2014 at 7:36 PM, Erick Erickson [via Lucene] <
> > [hidden email] >
> wrote:
> >
> >> Hmmm, I'd have to see the schema definition for your description
> >> field. For this, the admin/analysis page is very helpful. Here's my
> >> guess:
> >>
> >> Your analysis chain doesn't break the incoming tokens up quite like
> >> you think it is. Thus you have the tokens in your index like
> >> 'protein,' (notice the comma) and 'protein-like' rather than just
> >> 'protein'. However, I can't quite reconcile this with your statement:
> >> "Another weird thing is that if I used description:"fatty
> >> acid-binding" AND description:"protein"
> >>
> >> so I'm at something of a loss. If you paste in your schema definition
> >> for the 'description' field _and_ the corresponding 
> >> definition I can give it a quick whirl.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Sep 19, 2014 at 11:53 AM, aaguilar <[hidden email]
> >> > wrote:
> >>
> >> > Hello Erick,
> >> >
> >> > Thanks for the response.  I tried adding the debug=True to the query,
> >> but I
> >> > do not know exactly what I am looking for in the output.  Would it be
> >> > possible for you to look at the results?  I would really appreciate
> it.
> >> I
> >> > attached two files, one of them is with the filter query
> >> description:"fatty
> >> > acid-binding" and the other is with the filter query
> description:"fatty
> >> > acid-binding protein".  If you see the file that has the results for
> >> > description:"fatty acid-binding" , you can see that the hits do have
> >> "fatty
> >> > acid-binding protein" and nothing in between.  I really appreciate
> any
> >> help
> >> > you can provide.
> >> >
> >> > Thanks you
> >> >
> >> > On Fri, Sep 19, 2014 at 2:03 PM, Erick Erickson [via Lucene] <
> >> > [hidden email] >
>
> >> wrote:
> >> >
> >> >> Your very best friend here is attaching &debug=query to the URL and
> >> >> looking at the parsed query results. Upon occasion there's some
> >> >>
> >> >> One possible explanation is that description field has something
> like
> >> >> "fatty acid-binding some words protein" in which case your query
> >> >> "fatty acid-binding protein" would fail, but "fatty acid-binding
> >> >> protein"~4 would succeed.
> >> >>
> >> >> The other possibili

Re: Formatting dates

2014-09-22 Thread Manohar Kanuri
Thanks Erick,

I expected to hear the dreaded word "programming" at some point, and I guess 
that point has arrived. Now, at least, I know where and what to tinker with. 

And I should have said 4.10 below, not 5.0.

On Sep 22, 2014, at 4:44 PM, Erick Erickson  wrote:

> I think this'll help:
> 
> http://wiki.apache.org/solr/ScriptUpdateProcessor
> 
> Essentially, each time a document comes in to Solr,
> this will get invoked on it. You'll have to do some
> fiddling to get it right, you have to remove the field from
> the doc and transform it then put it back. None of this
> is hard, but it'll require a bit of programming. Fortunately
> not too much.
> 
> Best,
> Erick
> 
> On Mon, Sep 22, 2014 at 1:16 PM, Manohar Kanuri  wrote:
>> Hello,
>> 
>> I am a non-techie who decided to download and install Solr 5.0 to parse data 
>>  for my community activism. Got it installed and running, updated the 
>> example schema and installation with a bunch of CSV data. And went back to 
>> deal with the first of two fields I deferred till later - dates and location 
>> data.
>> 
>> The CSV data file for Jan - August 2014 is about 650mb with about 1.25 
>> million records/rows. I split it into 5 pieces and changed MM/DD/YYYY 
>> HH:MM:SS AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using 
>> TextWrangler. Which is what I know and a step up from trying to use Mac 
>> Numbers spreadsheet which does it very easily but I will have to break it 
>> into pieces smaller than 25-30mb. Random fields can get updated months after 
>> the record was created so I have to find an easier way than break the CSV 
>> file into smaller bits and reformat manually. Each record/row has 4 date 
>> fields so potentially there are upto 5 million fields to be reformatted in 8 
>> months worth of data..
>> 
>> I did a Google search (didn't see a Solr search page) on the mailing list 
>> archives and the internet, but seems like my question is either too simple 
>> and/or it's staring me in the face and I'm just missing it:  Is there a 
>> simple way to reformat the dates to Solr-style in a 650mb-1gig CSV file? Or, 
>> ideally, have the dates and times automatically reformatted as the Solr 
>> index gets updated the latest data (I recall reading this was not possible). 
>> Is there a widget/gadget/gizmo/script that would do this?
>> 
>> thanks,
>> manohar



Re: Schema Parsing Failed: unknown field 'id' [Zookeeper, SolrCloud]

2014-09-22 Thread paulparsons
Thanks for the suggestions. I actually had both problems. I couldn't figure
out how to remove the configs from zookeeper through the cloud scripts, so I
just manually removed the files in the zookeeper data directory.





Re: Formatting dates

2014-09-22 Thread Alexandre Rafalovitch
You could try - for your ideal scenario - creating an
UpdateRequestProcessor (URP) chain that includes
ParseDateFieldUpdateProcessorFactory:
https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html

Notice that it has been designed for the dynamic-field scenario, so by
default it looks at every field and tries to make it a date. But its
parent class has some parameters to restrict it to specific fields:
https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html

You can see an example in the schemaless config example:
https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1584

Just remember that when you are creating a URP chain:
1) You need to keep two (or three) of the standard update request processors in
the chain, not just your date one. The details are here:
https://wiki.apache.org/solr/UpdateRequestProcessor . The example
above uses three, to deal with the cloud situation.
2) You need to refer to that chain in the request handler to make sure
it is actually used:
https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1014

I THINK this should work, and it would classify under configuration, not
customization, and definitely not programming. A rough sketch:
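
Untested, but along these lines (the chain name and field names are
placeholders; the format pattern matches your MM/DD/YYYY HH:MM:SS AM/PM input):

<updateRequestProcessorChain name="parse-dates">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <!-- list your four date fields here (names made up) -->
    <str name="fieldName">created_date</str>
    <str name="fieldName">updated_date</str>
    <arr name="format">
      <str>MM/dd/yyyy hh:mm:ss a</str>
    </arr>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

plus <str name="update.chain">parse-dates</str> in the defaults of your
/update handler.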

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 22 September 2014 16:16, Manohar Kanuri  wrote:
> Hello,
>
> I am a non-techie who decided to download and install Solr 5.0 to parse data  
> for my community activism. Got it installed and running, updated the example 
> schema and installation with a bunch of CSV data. And went back to deal with 
> the first of two fields I deferred till later - dates and location data.
>
> The CSV data file for Jan - August 2014 is about 650mb with about 1.25 
> million records/rows. I split it into 5 pieces and changed MM/DD/YYYY 
> HH:MM:SS AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using 
> TextWrangler. Which is what I know and a step up from trying to use Mac 
> Numbers spreadsheet which does it very easily but I will have to break it 
> into pieces smaller than 25-30mb. Random fields can get updated months after 
> the record was created so I have to find an easier way than break the CSV 
> file into smaller bits and reformat manually. Each record/row has 4 date 
> fields so potentially there are upto 5 million fields to be reformatted in 8 
> months worth of data..
>
> I did a Google search (didn't see a Solr search page) on the mailing list 
> archives and the internet, but seems like my question is either too simple 
> and/or it's staring me in the face and I'm just missing it:  Is there a 
> simple way to reformat the dates to Solr-style in a 650mb-1gig CSV file? Or, 
> ideally, have the dates and times automatically reformatted as the Solr index 
> gets updated the latest data (I recall reading this was not possible). Is 
> there a widget/gadget/gizmo/script that would do this?
>
> thanks,
> manohar


Re: Schema Parsing Failed: unknown field 'id' [Zookeeper, SolrCloud]

2014-09-22 Thread Chris Hostetter

: out how to remove the configs from zookeeper through the cloud scripts, so I
: just manually removed the files in the zookeeper data directory.

https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities

Solr's zkcli.sh with "-cmd putfile" will replace a single file, or you can 
use "-cmd upconfig" to completely upload a new configset.



-Hoss
http://www.lucidworks.com/


Performance with fast vector highlighter in solr 4.x

2014-09-22 Thread lei
Hi there,

I'm using Solr 4.7 and find the fast vector highlighter is not as fast as
it used to be in Solr 3.x. It seems the results are not cached; even after
several hits of the same query, it still takes dozens of milliseconds to
return. Any idea or solution is appreciated. Thanks.


Re: Formatting dates

2014-09-22 Thread Erick Erickson
Alexandre:

Honest, I looked for that but was in a rush and couldn't find it and
thought I was remembering something _else_.

That's definitely a better approach, thanks! Perhaps this time I'll
remember.

Erick

On Mon, Sep 22, 2014 at 3:23 PM, Alexandre Rafalovitch 
wrote:

> You could try - for your ideal scenario - creating an
> UpdateRequestProcessor (URP) chain that
> includes: ParseDateFieldUpdateProcessorFactory
>
> https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
>
> Notice that it has been designed for the dynamic-field scenario, so by
> default it looks at everything and tries to make it a date. But its
> parent class has some parameters to specify specific fields to use:
>
> https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
>
> You can see an example in the schemaless config example:
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1584
>
> Just remember that when you are creating a URP chain:
> 1) You need to keep two (or three) of the update request processors in
> the chain, not just your date one. The details are here:
> https://wiki.apache.org/solr/UpdateRequestProcessor . The example
> above uses three, to deal with the cloud situation.
> 2) You need to refer to that chain in the request handler to make sure
> it is actually used:
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1014
>
> I THINK this should work, and it would classify under configuration, not
> customization, and definitely not programming.
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 22 September 2014 16:16, Manohar Kanuri  wrote:
> > Hello,
> >
> > I am a non-techie who decided to download and install Solr 5.0 to parse
> > data for my community activism. Got it installed and running, updated the
> > example schema, and loaded the installation with a bunch of CSV data. Then
> > I went back to deal with the first of two fields I deferred till later -
> > dates and location data.
> >
> > The CSV data file for Jan - August 2014 is about 650 MB, with about 1.25
> > million records/rows. I split it into 5 pieces and changed MM/DD/YYYY
> > HH:MM:SS AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using
> > TextWrangler. That is what I know, and a step up from trying to use the
> > Mac Numbers spreadsheet, which does it very easily but would force me to
> > break the file into pieces smaller than 25-30 MB. Random fields can get
> > updated months after a record was created, so I need an easier way than
> > breaking the CSV file into smaller bits and reformatting manually. Each
> > record/row has 4 date fields, so potentially there are up to 5 million
> > fields to be reformatted in 8 months' worth of data.
> >
> > I did a Google search (didn't see a Solr search page) on the mailing
> > list archives and the internet, but it seems my question is either too
> > simple and/or staring me in the face and I'm just missing it: Is there a
> > simple way to reformat the dates to Solr-style in a 650 MB-1 GB CSV file?
> > Or, ideally, to have the dates and times automatically reformatted as the
> > Solr index gets updated with the latest data (I recall reading this was
> > not possible)? Is there a widget/gadget/gizmo/script that would do this?
> >
> > thanks,
> > manohar
>


Re: Solr upgrade to latest version

2014-09-22 Thread Danesh Kuruppu
Thanks Alex and Erick for quick response,
This is really helpful.

On Tue, Sep 23, 2014 at 1:19 AM, Erick Erickson 
wrote:

> Probably go for 4.9.1. There'll be a 4.10.1 out in the not-too-distant
> future that you can upgrade to if you wish. 4.9.1 -> 4.10.1 should be
> quite painless.
>
> But do _not_ copy your schema.xml and solrconfig.xml files over from
> 1.4 to 4.x. There are some fairly easy ways to shoot yourself in the
> foot there. Take the stock distribution configuration files and copy
> _parts_ of your schema.xml and solrconfig.xml you care about.
>
> If you're using multiple cores, read about core discovery here:
> https://wiki.apache.org/solr/Core%20Discovery%20(4.4%20and%20beyond)
>
> And be very aware that you should _not_ remove any of the _field_
> entries in schema.xml. In particular, _version_ and _root_ should be
> left alone, as well as the "id" field.
>
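
For reference, a sketch of how the stock 4.x example schema.xml declares
those fields (verify against the copy that ships with your release):

  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="_root_" type="string" indexed="true" stored="false"/>
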
> And you'll have to re-index everything; Solr 4.x will not read Solr
> 1.4 indexes. If that's impossible, you'll have to upgrade from 1.4 to
> 3.x, optimize your index, then upgrade from 3.x to 4.x, add some
> documents, and optimize/force_merge again.
>
> HTH
> Erick
>
> On Mon, Sep 22, 2014 at 2:29 AM, Danesh Kuruppu 
> wrote:
> > Hi all,
> >
> > I'm currently working on upgrading Solr 1.4.1 to the latest stable release.
> >
> > What is the latest stable release I can use?
> > Are there specific things I need to look at when upgrading?
> >
> > Need help
> > Thanks
> >
> > Danesh
>


Re: [ANN] Lucidworks Fusion 1.0.0

2014-09-22 Thread Thomas Egense
Hi Grant.
Will there be a Fusion demonstration/presentation at Lucene/Solr Revolution
DC? (Not listed in the program yet).


Thomas Egense

On Mon, Sep 22, 2014 at 3:45 PM, Grant Ingersoll 
wrote:

> Hi All,
>
> We at Lucidworks are pleased to announce the release of Lucidworks Fusion
> 1.0.   Fusion is built to overlay on top of Solr (in fact, you can manage
> multiple Solr clusters -- think QA, staging and production -- all from our
> Admin). In other words, if you already have Solr, simply point Fusion at
> your instance and get all kinds of goodies like Banana (
> https://github.com/LucidWorks/Banana -- our port of Kibana to Solr + a
> number of extensions that Kibana doesn't have), collaborative filtering
> style recommendations (without the need for Hadoop or Mahout!), a modern
> signal capture framework, analytics, NLP integration, Boosting/Blocking and
> other relevance tools, flexible index and query time pipelines as well as a
> myriad of connectors ranging from Twitter to web crawling to SharePoint.
> The best part of all this?  It all leverages the infrastructure that you
> know and love: Solr.  Want recommendations?  Deploy more Solr.  Want log
> analytics?  Deploy more Solr.  Want to track important system metrics?
> Deploy more Solr.
>
> Fusion represents our commitment as a company to continue to contribute a
> large quantity of enhancements to the core of Solr while complementing and
> extending those capabilities with value adds that integrate a number of 3rd
> party (e.g connectors) and home grown capabilities like an all new,
> responsive UI built in AngularJS.  Fusion is not a fork of Solr.  We do not
> hide Solr in any way.  In fact, our goal is that your existing applications
> will work out of the box with Fusion, allowing you to take advantage of new
> capabilities w/o overhauling your existing application.
>
> If you want to learn more, please feel free to join our technical webinar
> on October 2: http://lucidworks.com/blog/say-hello-to-lucidworks-fusion/.
> If you'd like to download: http://lucidworks.com/product/fusion/.
>
> Cheers,
> Grant Ingersoll
>
> 
> Grant Ingersoll | CTO
> gr...@lucidworks.com | @gsingers
> http://www.lucidworks.com
>
>