Re: Collations are not working fine.

2015-02-17 Thread Nitin Solanki
Hey James Dyer,
 Sorry for the late response; I was away for a couple of
days. I have tried Rajesh Hazari's configuration, which he pasted in his
mail, and it seems to be working. I believe it works because reducing the
*25* to *5* yields fewer collations, so
spellcheck.maxCollationTries is able to evaluate the collation
"gone with the wind".
But the problem now is that "gone with the wind" gets fewer hits (only 53)
*{see collations.png}*, while there are 394 hits when I query the correct
phrase directly with q="gone with the wind"; I get numFound=394 in the
response. *{see response.png}*
Any idea why?
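
For anyone following along, the two parameters interact on the /spell
request roughly like this (the values below are illustrative, not the exact
ones from the configs quoted in this thread):

```
http://localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"
    &wt=json&indent=true&shards.qt=/spell
    &spellcheck=true
    &spellcheck.collate=true
    &spellcheck.maxCollations=5              # how many verified collations to return
    &spellcheck.maxCollationTries=50         # how many candidates to test against the index
    &spellcheck.collateExtendedResults=true  # report per-collation hit counts
```

With a smaller maxCollations, the tries budget is spent verifying fewer
candidates, which is consistent with the behavior described above.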


On Fri, Feb 13, 2015 at 11:31 PM, Dyer, James 
wrote:

> Nitin,
>
> Can you post the full spellcheck response when you query:
>
> q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Friday, February 13, 2015 1:05 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Collations are not working fine.
>
> Hi James Dyer,
>   I did the same as you told me. Used
> WordBreakSolrSpellChecker instead of shingles. But still collations are not
> coming or working.
> For instance, I tried to get collation of "gone with the wind" by searching
> "gone wthh thes wint" on field=gram_ci but didn't succeed. Even, I am
> getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*.
> Also I have documents which contains "gone with the wind" having 167 times
> in the documents. I don't know that I am missing something or not.
> Please check my below solr configuration:
>
> *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes
> wint"&wt=json&indent=true&shards.qt=/spell
>
> *solrconfig.xml:*
>
> 
> textSpellCi
> 
>   default
>   gram_ci
>   solr.DirectSolrSpellChecker
>   internal
>   0.5
>   2
>   0
>   5
>   2
>   0.9
>   freq
> 
> 
>   wordbreak
>   solr.WordBreakSolrSpellChecker
>   gram
>   true
>   true
>   5
> 
> 
>
> 
> 
>   gram_ci
>   default
>   on
>   true
>   25
>   true
>   1
>   25
>   true
>   50
>   50
>   true
> 
> 
>   spellcheck
> 
>   
>
> *Schema.xml: *
>
>  multiValued="false"/>
>
>  positionIncrementGap="100">
>
> 
> 
> 
> 
> 
> 
> 
> 
>


Re: Release date for Solr 5

2015-02-17 Thread CKReddy Bhimavarapu
Hi,
 Can I get a development version to test and run for now?

On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta 
wrote:

> There's a vote going on for the 3rd release candidate of Solr / Lucene 5.0.
> If everything goes smooth and the vote passes, the release should happen in
> about 4-5 days.
>
> On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu  >
> wrote:
>
> > What is the anticipated release date for Solr 5?
> >
> > --
> > ckreddybh. 
> >
>
>
>
> --
> Anshum Gupta
> http://about.me/anshumgupta
>



-- 
ckreddybh. 


Re: Collations are not working fine.

2015-02-17 Thread Nitin Solanki
Hey Rajesh,
 Sorry for the late response; I was away for a couple of
days. I have tried the configuration you sent me. Thanks a lot. It seems
to be working. I believe it works because reducing the *25* to *5* yields
fewer collations, so spellcheck.maxCollationTries is able to evaluate the
collation "gone with the wind".
But the problem now is that "gone with the wind" gets fewer hits (only 53)
*{see collations.png}*, while there are 394 hits when I query the correct
phrase directly with q="gone with the wind"; I get numFound=394 in the
response. *{see response.png}*
Any idea why?

One more thing: you used
spellcheck.collateParam.mm set to 100%
and spellcheck.collateParam.q.op set to AND,
but they don't seem to be working. I tried removing those two lines and it
didn't affect the results. I also changed the value of
spellcheck.collateParam.mm to 0% and spellcheck.collateParam.q.op to "OR".
Even that didn't affect the results. Even after googling, I am unable to
understand what spellcheck.collateParam.mm and spellcheck.collateParam.q.op
do. Will you please assist me?
Thanks.
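
For context on those two parameters: as far as I can tell from the Solr
spellcheck documentation, anything prefixed with spellcheck.collateParam.
is not applied to your own query at all; it only overrides parameters on
the internal test queries Solr issues to verify each candidate collation.
A sketch, with illustrative values:

```
spellcheck.collate=true
spellcheck.maxCollationTries=10
# The two lines below affect only the hidden verification queries:
spellcheck.collateParam.mm=100%    # minimum-match for the test query
spellcheck.collateParam.q.op=AND   # default operator for the test query
```

That would explain why changing them does not alter the main result list;
mm in particular only has an effect if the verification query is parsed by
(e)dismax.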



On Sat, Feb 14, 2015 at 2:18 AM, Rajesh Hazari 
wrote:

> Hi Nitin,
>
> Can you try the config below? It seems to be working
> for us.
>
> 
>
>  text_general
>
>
>   
> wordbreak
> solr.WordBreakSolrSpellChecker
> textSpell
> true
> false
> 5
>   
>
>
> default
> textSpell
> solr.IndexBasedSpellChecker
> ./spellchecker
> 0.75
> 0.01
> true
> 5
>  
>
>
>   
>
>
>
> true
> default
> wordbreak
> 5
> 15
> true
> false
> true
> 100
> 100%
> AND
> 1000
>
>
> *Rajesh.*
>
> On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James  >
> wrote:
>
> > Nitin,
> >
> > Can you post the full spellcheck response when you query:
> >
> > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> >
> > James Dyer
> > Ingram Content Group
> >
> >
> > -Original Message-
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Friday, February 13, 2015 1:05 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi James Dyer,
> >   I did the same as you told me. Used
> > WordBreakSolrSpellChecker instead of shingles. But still collations are
> not
> > coming or working.
> > For instance, I tried to get collation of "gone with the wind" by
> searching
> > "gone wthh thes wint" on field=gram_ci but didn't succeed. Even, I am
> > getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*.
> > Also I have documents which contains "gone with the wind" having 167
> times
> > in the documents. I don't know that I am missing something or not.
> > Please check my below solr configuration:
> >
> > *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes
> > wint"&wt=json&indent=true&shards.qt=/spell
> >
> > *solrconfig.xml:*
> >
> > 
> > textSpellCi
> > 
> >   default
> >   gram_ci
> >   solr.DirectSolrSpellChecker
> >   internal
> >   0.5
> >   2
> >   0
> >   5
> >   2
> >   0.9
> >   freq
> > 
> > 
> >   wordbreak
> >   solr.WordBreakSolrSpellChecker
> >   gram
> >   true
> >   true
> >   5
> > 
> > 
> >
> > 
> > 
> >   gram_ci
> >   default
> >   on
> >   true
> >   25
> >   true
> >   1
> >   25
> >   true
> >   50
> >   50
> >   true
> > 
> > 
> >   spellcheck
> > 
> >   
> >
> > *Schema.xml: *
> >
> >  > multiValued="false"/>
> >
> >  > positionIncrementGap="100">
> >
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >
>


Re: Collations are not working fine.

2015-02-17 Thread Nitin Solanki
Hi Charles,
 Will you please send the configuration which you tried? It
will help me solve my problem. Have you sorted the collations on hits or
on frequencies of suggestions? If you did, then please assist me.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles <
charles.reit...@tiaa-cref.org> wrote:

> I have been working with collations the last couple of days, and I kept
> adding the collation-related parameters until it started working for me.
> It seems I needed a value of 50.
>
> But, I am using the Suggester with the WFSTLookupFactory.
>
> Also, I needed to patch the suggester to get frequency information in the
> spellcheck response.
>
> -Original Message-
> From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
> Sent: Friday, February 13, 2015 3:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Collations are not working fine.
>
> Hi Nitin,
>
> Can you try the config below? It seems to be working
> for us.
>
> 
>
>  text_general
>
>
>   
> wordbreak
> solr.WordBreakSolrSpellChecker
> textSpell
> true
> false
> 5
>   
>
>
> default
> textSpell
> solr.IndexBasedSpellChecker
> ./spellchecker
> 0.75
> 0.01
> true
> 5
>  
>
>
>   
>
>
>
> true
> default
> wordbreak
> 5
> 15
> true
> false
> true
> 100
> 100%
> AND
> 1000
>
>
> *Rajesh.*
>
> On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James  >
> wrote:
>
> > Nitin,
> >
> > Can you post the full spellcheck response when you query:
> >
> > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> >
> > James Dyer
> > Ingram Content Group
> >
> >
> > -Original Message-
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Friday, February 13, 2015 1:05 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi James Dyer,
> >   I did the same as you told me. Used
> > WordBreakSolrSpellChecker instead of shingles. But still collations
> > are not coming or working.
> > For instance, I tried to get collation of "gone with the wind" by
> > searching "gone wthh thes wint" on field=gram_ci but didn't succeed.
> > Even, I am getting the suggestions of wtth as *with*, thes as *the*,
> wint as *wind*.
> > Also I have documents which contains "gone with the wind" having 167
> > times in the documents. I don't know that I am missing something or not.
> > Please check my below solr configuration:
> >
> > *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes
> > wint"&wt=json&indent=true&shards.qt=/spell
> >
> > *solrconfig.xml:*
> >
> > 
> > textSpellCi
> > 
> >   default
> >   gram_ci
> >   solr.DirectSolrSpellChecker
> >   internal
> >   0.5
> >   2
> >   0
> >   5
> >   2
> >   0.9
> >   freq
> > 
> > 
> >   wordbreak
> >   solr.WordBreakSolrSpellChecker
> >   gram
> >   true
> >   true
> >   5
> > 
> > 
> >
> > 
> > 
> >   gram_ci
> >   default
> >   on
> >   true
> >   25
> >   true
> >   1
> >   25
> >   true
> >   50
> >   50
> >   true
> > 
> > 
> >   spellcheck
> > 
> >   
> >
> > *Schema.xml: *
> >
> >  > multiValued="false"/>
> >
> >  > positionIncrementGap="100">
> >
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >
>
>


Re: Release date for Solr 5

2015-02-17 Thread Anshum Gupta
You can either checkout the release branch and build it yourself from:
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_0

or download it from the RC here:
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC3-rev1659987

You should remember that this is a release candidate and not a release at
this point.


On Tue, Feb 17, 2015 at 12:13 AM, CKReddy Bhimavarapu 
wrote:

> Hi,
>  Can i get any developer version to test and run for now.
>
> On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta 
> wrote:
>
> > There's a vote going on for the 3rd release candidate of Solr / Lucene
> 5.0.
> > If everything goes smooth and the vote passes, the release should happen
> in
> > about 4-5 days.
> >
> > On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu <
> chaitu...@gmail.com
> > >
> > wrote:
> >
> > > What is the anticipated release date for Solr 5?
> > >
> > > --
> > > ckreddybh. 
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> > http://about.me/anshumgupta
> >
>
>
>
> --
> ckreddybh. 
>



-- 
Anshum Gupta
http://about.me/anshumgupta


Re: Release date for Solr 5

2015-02-17 Thread Shalin Shekhar Mangar
You can help by testing out the release candidate available from:
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC3-rev1659987

Note that this is *NOT* an official release.

On Tue, Feb 17, 2015 at 1:43 PM, CKReddy Bhimavarapu 
wrote:

> Hi,
>  Can i get any developer version to test and run for now.
>
> On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta 
> wrote:
>
> > There's a vote going on for the 3rd release candidate of Solr / Lucene
> 5.0.
> > If everything goes smooth and the vote passes, the release should happen
> in
> > about 4-5 days.
> >
> > On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu <
> chaitu...@gmail.com
> > >
> > wrote:
> >
> > > What is the anticipated release date for Solr 5?
> > >
> > > --
> > > ckreddybh. 
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> > http://about.me/anshumgupta
> >
>
>
>
> --
> ckreddybh. 
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Weird Solr Replication Slave out of sync

2015-02-17 Thread Dmitry Kan
Hi,
This sounds quite strange. Do you see any error messages, either on the
Solr admin's replication page or in the master's or the slave's logs?

When we had issues with the slave replicating from the master, they were
related to the slave running out of disk space. I'm sure there could be a
bunch of other reasons for failed replication, but those should generally
be evident in the logs.

On Tue, Feb 17, 2015 at 7:46 AM, Summer Shire  wrote:

> Hi All,
>
> My master and slave index version and generation is the same
> yet the index is not in sync because when I execute the same query
> on both master and slave I see old docs on slave which should not be there.
>
> I also tried to fetch a specific indexversion on slave using
> command=fetchindex&indexversion=
>
> This is very spooky because I do not get any errors on master or slave.
> Also I see in the logs that the slave is polling the master every 15 mins
> I was able to find this issue only because I was looking at the specific
> old document.
>
> Now I can manually delete the index folder on slave and restart my slave.
> But I really want to find out what could be going on. Because these type
> of issues are going to
> be hard to find especially when there are on errors.
>
> What could be happening. and how can I avoid this from happening ?
>
>
> Thanks,
> Summer
>
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Sort collation on hits.

2015-02-17 Thread Nitin Solanki
Hi,
All I want is to sort the collations by hits in descending order. How
can I do that?


Re: Solr suggest is related to second letter, not to initial letter

2015-02-17 Thread Volkan Altan
First of all, thank you for your answer.

Example Url:
doc 1 suggest_field: galaxy samsung s5 phone
doc 2 suggest_field: shoe adidas 2 hiking 


http://localhost:8983/solr/solr/suggest?q=galaxy+s

The result I am expecting is like the one indicated below. The "galaxy
shoe" suggestion isn't supposed to appear, but unfortunately it appears
now.



galaxy samsung
0

galaxy
samsung



galaxy s5
0

galaxy
s5




I don’t want to use KeywordTokenizer, because as long as the compound words
written by the user are available in some document, I am able to get a
result. I just don’t want "q=galaxy + samsung" to appear, because it is an
inappropriate suggestion and it doesn’t work.

Many Thanks Ahead of Time!


My settings;



default
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.tst.TSTLookup  
  
suggestions 
0.1
true

suggest_term




true
false
default
true
10
true
true
10
100


suggest

 


























> On 16 Şub 2015, at 03:52, Michael Sokolov  
> wrote:
> 
> StandardTokenizer splits your text into tokens, and the suggester suggests 
> tokens independently.  It sounds as if you want the suggestions to be based 
> on the entire text (not just the current word), and that only adjacent words 
> in the original should appear as suggestions.  Assuming that's what you are 
> after (it's a little hard to tell from your e-mail -- you might want to 
> clarify by providing a few example of how you *do* want it to work instead of 
> just examples of how you *don't* want it to work), you have a couple of 
> choices:
> 
> 1) don't use StandardTokenizer, use KeywordTokenizer instead - this will 
> preserve the entire original text and suggest complete texts, rather than 
> words
> 2) maybe consider using a shingle filter along with standard tokenizer, so 
> that your tokens include multi-word shingles
> 3) Use a suggester with better support for a statistical language model, like 
> this one: 
> http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html,
>  but to do this you will probably need to do some java programming since it 
> isn't well integrated into solr
> 
> -Mike
> 
> On 2/14/2015 3:44 AM, Volkan Altan wrote:
>> Any idea?
>> 
>> 
>>> On 12 Şub 2015, at 11:12, Volkan Altan  wrote:
>>> 
>>> Hello Everyone,
>>> 
>>> What I want from the Solr suggester is that the suggestions offered for
>>> the second word, typed after the initial word, are actually related to
>>> the initial word itself. But, just like the initial words, the second
>>> words are suggested independently as well.
>>> 
>>> 
>>> Example;
>>> http://localhost:8983/solr/solr/suggest?q=facet_suggest_data:”adidas+s"; 
>>> 
>>> 
>>> adidas s
>>> 
>>> response>
>>> 
>>> 0
>>> 4
>>> 
>>> 
>>> 
>>> 
>>> 1
>>> 27
>>> 28
>>> 
>>> samsung
>>> 
>>> 
>>> 
>>> facet_suggest_data:"adidas samsung"
>>> 0
>>> 
>>> adidas
>>> samsung
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> The terms ‘’Adidas’’ and ‘’Samsung’’ are only available within separate
>>> documents. A common document in which both of them are available cannot
>>> be found.
>>> 
>>> How can I solve that problem?
>>> 
>>> 
>>> 
>>> schema.xml
>>> 
>>> >> positionIncrementGap="100">
>>> 
>>> 
>>> 
>>> 
>>> >> synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>>> >> words="stopwords.txt" enablePositionIncrements="true" />
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> >> multiValued="true" stored="false" omitNorms="true"/>
>>> 
>>> 
>>> Best
>>> 
>> 
> 
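
For completeness, option 2 above (StandardTokenizer plus a shingle filter)
could look roughly like the schema fragment below; the type name and
shingle sizes are illustrative, not from the thread:

```xml
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit 2- and 3-word shingles alongside the single tokens -->
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2" maxShingleSize="3"
            outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

A suggester built on such a field can then propose adjacent-word phrases
("galaxy samsung", "samsung s5") instead of isolated tokens.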



Possibility of Indexing without feeding again in Solr 4.10.2

2015-02-17 Thread dinesh naik
Hi all,
How can I re-index in Solr without importing the data again?
Is there a way to re-index only a few documents?
-- 
Best Regards,
Dinesh Naik


Better way of copying/backup of index in Solr 4.10.2

2015-02-17 Thread dinesh naik
What is the best way for copying/backup of index in Solr 4.10.2?
-- 
Best Regards,
Dinesh Naik


unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
Hi,

We are currently comparing the RAM consumption of two parallel Solr
clusters with different solr versions: 4.10.2 and 4.3.1.

For comparable index sizes of a shard (20G and 26G), we observed 9G vs 5.6G
RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.

We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
reindexed data from scratch. The commits are all controlled on the client,
i.e. not auto-commits.

Solr: 4.10.2 (high load, mass indexing)
Java: 1.7.0_76 (Oracle)
-Xmx25600m


Solr: 4.3.1 (normal load, no mass indexing)
Java: 1.7.0_11 (Oracle)
-Xmx25600m

The RAM consumption remained the same after the load stopped on the
4.10.2 cluster. Manually triggering garbage collection on a 4.10.2 shard
via jvisualvm dropped the used heap from 8.5G to 0.5G, but the reserved
RAM as seen by top remained at the 9G level.

This unusual spike happened during mass data indexing.

What else could explain such a difference -- Solr or the JVM? Can it only
be explained by the mass indexing? What is worrisome is that the 4.10.2
shard reserves 8x what it uses.

What can be done about this?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
We have seen an increase between 4.8.1 and 4.10. 
 
-Original message-
> From:Dmitry Kan 
> Sent: Tuesday 17th February 2015 11:06
> To: solr-user@lucene.apache.org
> Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
> 
> Hi,
> 
> We are currently comparing the RAM consumption of two parallel Solr
> clusters with different solr versions: 4.10.2 and 4.3.1.
> 
> For comparable index sizes of a shard (20G and 26G), we observed 9G vs 5.6G
> RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
> 
> We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
> reindexed data from scratch. The commits are all controlled on the client,
> i.e. not auto-commits.
> 
> Solr: 4.10.2 (high load, mass indexing)
> Java: 1.7.0_76 (Oracle)
> -Xmx25600m
> 
> 
> Solr: 4.3.1 (normal load, no mass indexing)
> Java: 1.7.0_11 (Oracle)
> -Xmx25600m
> 
> The RAM consumption remained the same after the load has stopped on the
> 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
> jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
> seen by top remained at 9G level.
> 
> This unusual spike happened during mass data indexing.
> 
> What else could be the artifact of such a difference -- Solr or JVM? Can it
> only be explained by the mass indexing? What is worrisome is that the
> 4.10.2 shard reserves 8x times it uses.
> 
> What can be done about this?
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Re: Possibility of Indexing without feeding again in Solr 4.10.2

2015-02-17 Thread Gora Mohanty
On 17 February 2015 at 15:18, dinesh naik  wrote:

> Hi all,
> How to can do re-indexing in Solr without importing the data again?
> Is there a way to do re-indexing only for few documents ?
>

If you have a unique ID for your documents, updating the index with that ID
will update just that document. Other than that, you need to import all
your data again if you want to change the Solr index.

Regards,
Gora
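
To illustrate the unique-ID approach: re-posting a document whose
uniqueKey already exists replaces (re-indexes) just that document. The id
and field names below are made up for the example:

```xml
<add>
  <doc>
    <field name="id">DOC-42</field>
    <field name="title">Corrected title</field>
    <!-- supply all fields again; the old version of the document is replaced -->
  </doc>
</add>
```

POST this to the /update handler and follow it with a commit.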


Re: Better way of copying/backup of index in Solr 4.10.2

2015-02-17 Thread Gora Mohanty
On 17 February 2015 at 15:19, dinesh naik  wrote:
>
> What is the best way for copying/backup of index in Solr 4.10.2?

Please take a look at
https://cwiki.apache.org/confluence/display/solr/Backing+Up

Regards,
Gora
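
As a concrete starting point, the replication handler can also trigger a
backup over HTTP; the core name and path below are placeholders:

```
http://localhost:8983/solr/<core>/replication?command=backup&location=/backups/solr&numberToKeep=2
```

This snapshots the current index without taking the core offline.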


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
Have you found an explanation to that?

On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma 
wrote:

> We have seen an increase between 4.8.1 and 4.10.
>
> -Original message-
> > From:Dmitry Kan 
> > Sent: Tuesday 17th February 2015 11:06
> > To: solr-user@lucene.apache.org
> > Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
> >
> > Hi,
> >
> > We are currently comparing the RAM consumption of two parallel Solr
> > clusters with different solr versions: 4.10.2 and 4.3.1.
> >
> > For comparable index sizes of a shard (20G and 26G), we observed 9G vs
> 5.6G
> > RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
> >
> > We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
> > reindexed data from scratch. The commits are all controlled on the
> client,
> > i.e. not auto-commits.
> >
> > Solr: 4.10.2 (high load, mass indexing)
> > Java: 1.7.0_76 (Oracle)
> > -Xmx25600m
> >
> >
> > Solr: 4.3.1 (normal load, no mass indexing)
> > Java: 1.7.0_11 (Oracle)
> > -Xmx25600m
> >
> > The RAM consumption remained the same after the load has stopped on the
> > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
> > jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
> > seen by top remained at 9G level.
> >
> > This unusual spike happened during mass data indexing.
> >
> > What else could be the artifact of such a difference -- Solr or JVM? Can
> it
> > only be explained by the mass indexing? What is worrisome is that the
> > 4.10.2 shard reserves 8x times it uses.
> >
> > What can be done about this?
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
I would have shared it if I had one :)
 
-Original message-
> From:Dmitry Kan 
> Sent: Tuesday 17th February 2015 11:40
> To: solr-user@lucene.apache.org
> Subject: Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
> 
> Have you found an explanation to that?
> 
> On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma 
> wrote:
> 
> > We have seen an increase between 4.8.1 and 4.10.
> >
> > -Original message-
> > > From:Dmitry Kan 
> > > Sent: Tuesday 17th February 2015 11:06
> > > To: solr-user@lucene.apache.org
> > > Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
> > >
> > > Hi,
> > >
> > > We are currently comparing the RAM consumption of two parallel Solr
> > > clusters with different solr versions: 4.10.2 and 4.3.1.
> > >
> > > For comparable index sizes of a shard (20G and 26G), we observed 9G vs
> > 5.6G
> > > RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
> > >
> > > We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
> > > reindexed data from scratch. The commits are all controlled on the
> > client,
> > > i.e. not auto-commits.
> > >
> > > Solr: 4.10.2 (high load, mass indexing)
> > > Java: 1.7.0_76 (Oracle)
> > > -Xmx25600m
> > >
> > >
> > > Solr: 4.3.1 (normal load, no mass indexing)
> > > Java: 1.7.0_11 (Oracle)
> > > -Xmx25600m
> > >
> > > The RAM consumption remained the same after the load has stopped on the
> > > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
> > > jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
> > > seen by top remained at 9G level.
> > >
> > > This unusual spike happened during mass data indexing.
> > >
> > > What else could be the artifact of such a difference -- Solr or JVM? Can
> > it
> > > only be explained by the mass indexing? What is worrisome is that the
> > > 4.10.2 shard reserves 8x times it uses.
> > >
> > > What can be done about this?
> > >
> > > --
> > > Dmitry Kan
> > > Luke Toolbox: http://github.com/DmitryKey/luke
> > > Blog: http://dmitrykan.blogspot.com
> > > Twitter: http://twitter.com/dmitrykan
> > > SemanticAnalyzer: www.semanticanalyzer.info
> > >
> >
> 
> 
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
;) ok. Currently I'm trying parallel GC options, mentioned here:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/101377

At least the saw-tooth RAM chart is starting to shape up.

On Tue, Feb 17, 2015 at 12:55 PM, Markus Jelsma 
wrote:

> I would have shared it if i had one :)
>
> -Original message-
> > From:Dmitry Kan 
> > Sent: Tuesday 17th February 2015 11:40
> > To: solr-user@lucene.apache.org
> > Subject: Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
> >
> > Have you found an explanation to that?
> >
> > On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > We have seen an increase between 4.8.1 and 4.10.
> > >
> > > -Original message-
> > > > From:Dmitry Kan 
> > > > Sent: Tuesday 17th February 2015 11:06
> > > > To: solr-user@lucene.apache.org
> > > > Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
> > > >
> > > > Hi,
> > > >
> > > > We are currently comparing the RAM consumption of two parallel Solr
> > > > clusters with different solr versions: 4.10.2 and 4.3.1.
> > > >
> > > > For comparable index sizes of a shard (20G and 26G), we observed 9G
> vs
> > > 5.6G
> > > > RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
> > > >
> > > > We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
> > > > reindexed data from scratch. The commits are all controlled on the
> > > client,
> > > > i.e. not auto-commits.
> > > >
> > > > Solr: 4.10.2 (high load, mass indexing)
> > > > Java: 1.7.0_76 (Oracle)
> > > > -Xmx25600m
> > > >
> > > >
> > > > Solr: 4.3.1 (normal load, no mass indexing)
> > > > Java: 1.7.0_11 (Oracle)
> > > > -Xmx25600m
> > > >
> > > > The RAM consumption remained the same after the load has stopped on
> the
> > > > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
> > > > jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved
> RAM as
> > > > seen by top remained at 9G level.
> > > >
> > > > This unusual spike happened during mass data indexing.
> > > >
> > > > What else could be the artifact of such a difference -- Solr or JVM?
> Can
> > > it
> > > > only be explained by the mass indexing? What is worrisome is that the
> > > > 4.10.2 shard reserves 8x times it uses.
> > > >
> > > > What can be done about this?
> > > >
> > > > --
> > > > Dmitry Kan
> > > > Luke Toolbox: http://github.com/DmitryKey/luke
> > > > Blog: http://dmitrykan.blogspot.com
> > > > Twitter: http://twitter.com/dmitrykan
> > > > SemanticAnalyzer: www.semanticanalyzer.info
> > > >
> > >
> >
> >
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Nitin Solanki
Hello Everyone,
  I am confused about the difference between spellcheck.count and
spellcheck.alternativeTermCount in Solr. Can anyone explain them in detail?


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Toke Eskildsen
On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
> Solr: 4.10.2 (high load, mass indexing)
> Java: 1.7.0_76 (Oracle)
> -Xmx25600m
> 
> 
> Solr: 4.3.1 (normal load, no mass indexing)
> Java: 1.7.0_11 (Oracle)
> -Xmx25600m
> 
> The RAM consumption remained the same after the load has stopped on the
> 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
> jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
> seen by top remained at 9G level.

As the JVM does not free OS memory once allocated, top just shows
whatever peak it reached at some point. When you tell the JVM that it is
free to use 25GB, it makes a lot of sense to allocate a fair chunk of
that instead of garbage collecting if there is a period of high usage
(mass indexing for example). 

> What else could be the artifact of such a difference -- Solr or JVM? Can it
> only be explained by the mass indexing? What is worrisome is that the
> 4.10.2 shard reserves 8x times it uses.

If you set your Xmx to a lot less, the JVM will probably favour more
frequent garbage collections over extra heap allocation.

- Toke Eskildsen, State and University Library, Denmark
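
The used-versus-reserved distinction Toke describes can be observed from
inside the JVM with the standard Runtime API; this small standalone sketch
(not from the thread) prints both numbers:

```java
public class HeapDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long reserved = rt.totalMemory();        // heap the JVM has claimed from the OS
        long used = reserved - rt.freeMemory();  // portion of that heap actually in use
        long max = rt.maxMemory();               // the -Xmx ceiling
        System.out.printf("used=%dM reserved=%dM max=%dM%n",
                used >> 20, reserved >> 20, max >> 20);
        // 'reserved' is roughly what top reports and is not handed back to the
        // OS after GC; 'used' can drop far below it, as seen with jvisualvm.
    }
}
```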




Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
Thanks Toke!

Now I consistently see the saw-tooth pattern on two shards with new GC
parameters, next I will try your suggestion.

The current params are:

-Xmx25600m -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent
-XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=8
-XX:CMSInitiatingOccupancyFraction=40

Dmitry

On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen 
wrote:

> On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
> > Solr: 4.10.2 (high load, mass indexing)
> > Java: 1.7.0_76 (Oracle)
> > -Xmx25600m
> >
> >
> > Solr: 4.3.1 (normal load, no mass indexing)
> > Java: 1.7.0_11 (Oracle)
> > -Xmx25600m
> >
> > The RAM consumption remained the same after the load has stopped on the
> > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
> > jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
> > seen by top remained at 9G level.
>
> As the JVM does not free OS memory once allocated, top just shows
> whatever peak it reached at some point. When you tell the JVM that it is
> free to use 25GB, it makes a lot of sense to allocate a fair chunk of
> that instead of garbage collecting if there is a period of high usage
> (mass indexing for example).
>
> > What else could be the artifact of such a difference -- Solr or JVM? Can
> it
> > only be explained by the mass indexing? What is worrisome is that the
> > 4.10.2 shard reserves 8x times it uses.
>
> If you set your Xmx to a lot less, the JVM will probably favour more
> frequent garbage collections over extra heap allocation.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Sankalp Gupta
Hi

I need a query that selects only those parent docs none of whose children
have the specified field value, i.e. something like this:
http://localhost:8983/solr/core1/select?q={!parent
which=contentType:parent}childField:NOT value1

The problem is that the NOT operator is not supported in the Block Join
Query Parsers. Could anyone please suggest a way to work around this problem?
Have also added the problem on *stackoverflow*:
http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature

Regards
Sankalp Gupta


Re: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Nitin Solanki
Any help please?

On Tue, Feb 17, 2015 at 4:57 PM, Nitin Solanki  wrote:

> Hello Everyone,
>   I got confusion between spellcheck.count and
> spellcheck.alternativeTermCount in Solr. Any help in details?
>


Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Mikhail Khludnev
Try searching all children and excluding those that have value1 with a dash
(negation), then join the remaining children to their parents:
q={!parent which=contentType:parent}contentType:child -contentType:value1
If the space in the underlying query causes a problem, try escaping it or
wrapping it in v=$subq.



On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta 
wrote:

> Hi
>
> I need to have a query in which I need to choose only those parent docs
> none of whose children's field is having the specified value.
> i.e. I need something like this:
> http://localhost:8983/solr/core1/select?*q={!parent
> which=contentType:parent}childField:NOT value1*
>
> The problem is* NOT operator is not being supported* in the Block Join
> Query Parsers. Could anyone please suggest a way to workaround this
> problem?
> Have also added the problem on *stackoverflow*:
>
> http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature
>
> Regards
> Sankalp Gupta
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Too many merges, stalling...

2015-02-17 Thread Shawn Heisey
On 2/16/2015 8:12 PM, ralph tice wrote:
> Recently I turned on INFO level logging in order to get better insight
> as to what our Solr cluster is doing.  Sometimes as frequently as
> almost 3 times a second we get messages like:
> [CMS][qtp896644936-33133]: too many merges; stalling...
> 
> Less frequently we get:
> [TMP][commitScheduler-8-thread-1]:
> seg=_5dy(4.10.3):C13520226/1044084:delGen=318 size=2784.291 MB [skip:
> too large]
> 
> where size is 2500-4900MB.

I've trimmed most of your original message, but I will refer to things
you have mentioned in the unquoted portion.

The first message simply indicates that you have reached more
simultaneous merges than CMS is configured to allow (3 by default), so
it will stall all of them except one.  The javadocs say that the one
allowed to run will be the smallest, but I have observed the opposite --
the one that is allowed to run is always the largest.

The second message indicates that the merge under consideration would
have exceeded the maximum size, which defaults to 5GB, so it refused to
do that merge.

The mergeFactor setting is deprecated, but still works for now in 4.x
releases.  The reason your merges are happening so frequently is that
you have set this to a low value - 5.  Setting it to a larger value will
make merges less frequent.

The mergeFactor value is used to set maxMergeAtOnce and segmentsPerTier.
 A proper TieredMergePolicy config will have those two settings
(normally set to the same value) as well as maxMergeAtOnceExplicit,
which should be set to three times the value of the other two.  My
config uses 35, 35, and 105 for each of those values, respectively.
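The convention just described can be captured in a small helper (a sketch; Solr does not derive maxMergeAtOnceExplicit automatically, the 3x rule is simply the recommendation above, and the function name is mine):

```python
def tiered_merge_settings(merge_factor):
    """Derive explicit TieredMergePolicy values from a legacy mergeFactor,
    following the convention described above: maxMergeAtOnce and
    segmentsPerTier take the mergeFactor value, and maxMergeAtOnceExplicit
    is three times that."""
    return {
        "maxMergeAtOnce": merge_factor,
        "segmentsPerTier": merge_factor,
        "maxMergeAtOnceExplicit": 3 * merge_factor,
    }

# The 35/35/105 config mentioned above:
print(tiered_merge_settings(35))
```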

You can also allow more simultaneous merges in the CMS config.  I use a
value of 6 here, to avoid lengthy indexing stalls that will kill the DIH
connection to MySQL.  If the disks are standard spinning magnetic disks,
the number of CMS threads should be one.  If it's SSD, you can use more
threads.

Thanks,
Shawn



Discrepancy between Full import and Delta import query

2015-02-17 Thread Aniket Bhoi
Hi Folks,

I am running Solr 3.4 and using DIH for importing data from a SQL server
backend.

The query for Full import and Delta import is the same, i.e. both pull the
same data.

Full and Delta import query:

SELECT KB_ENTRY.ADDITIONAL_INFO ,KB_ENTRY.KNOWLEDGE_REF
ID,SU_ENTITY_TYPE.REF ENTRY_TYPE_REF,KB_ENTRY.PROFILE_REF,
KB_ENTRY.ITEM_REF, KB_ENTRY.TITLE, KB_ENTRY.ABSTRACT, KB_ENTRY.SOLUTION,
KB_ENTRY.SOLUTION_HTML, KB_ENTRY.FREE_TEXT, KB_ENTRY.DATE_UPDATED,
KB_ENTRY.STATUS_REF, KB_ENTRY.CALL_NUMBER, SU_ENTITY_TYPE.DISPLAY
ENTRY_TYPE, KB_PROFILE.NAME PROFILE_TYPE, AR_PRIMARY_ASSET.ASSET_REF
SERVICE_TYPE, AR_PERSON.FULL_NAME CONTRIBUTOR, IN_SYS_SOURCE.NAME SOURCE,
KB_ENTRY_STATUS.NAME STATUS,(SELECT COUNT (CL_KB_REFER.CALL_NUMBER) FROM
CL_KB_REFER WHERE CL_KB_REFER.ARTICLE_REF = KB_ENTRY.KNOWLEDGE_REF)
LINK_RATE FROM KB_ENTRY, SU_ENTITY_TYPE, KB_PROFILE, AR_PRIMARY_ASSET,
AR_PERSON, IN_SYS_SOURCE, KB_ENTRY_STATUS WHERE KB_ENTRY.PARTITION = 1 AND
KB_ENTRY.STATUS = 'A' AND AR_PERSON.OFFICER_IND = 1 AND
KB_ENTRY.CREATED_BY_REF = AR_PERSON.REF AND KB_ENTRY.SOURCE =
IN_SYS_SOURCE.REF AND KB_ENTRY.STATUS_REF = KB_ENTRY_STATUS.REF AND
KB_ENTRY_STATUS.STATUS = 'A' AND KB_ENTRY.PROFILE_REF = KB_PROFILE.REF AND
KB_ENTRY.ITEM_REF = AR_PRIMARY_ASSET.ITEM_REF AND KB_ENTRY.ENTITY_TYPE =
SU_ENTITY_TYPE.REF AND KB_ENTRY.KNOWLEDGE_REF='${dataimporter.delta.ID}'"


Delta query:select KNOWLEDGE_REF as ID from KB_ENTRY where (DATE_UPDATED
> '${dataimporter.last_index_time}' OR DATE_CREATED >
'${dataimporter.last_index_time}')">


The problem here is that when I run the full import, everything works fine
and all the fields/data are displayed fine in the search.

However, when I run the delta import, for some records the ENTRY_TYPE field is
not returned from the database.

Let me illustrate it with an example:

Search result After running Full Import:

Record Name:John Doe
Entry ID:500
Entry Type:Worker

Search result after running Delta import:

Record Name:John Doe
Entry ID:500
Entry Type:


FYI: I have run the full and delta import queries (though both are the same)
in the SQL Server IDE and both return the Entry Type field correctly.

Not sure why the Entry Type field vanishes from Solr when the delta import is
run.

Any idea why this would happen?

Thanks,

Aniket


Collations are not using suggestions to build collations

2015-02-17 Thread Nitin Solanki
Hi,
  I want to build collations using the suggestions for the query. But
collations are being built without using those suggestions; they use their own
corrections (*misspellingsAndCorrections*), and I don't know where these
suggestions are coming from.

You can see the result by seeing below response for the query
*URL :*
http://localhost:8983/solr/wikingram/spell?q=gram_ci:%22kuchi%20kucch%20hota%22&wt=json&indent=true&shards.qt=/spell&shards.tolerant=true&rows=1

You can see that the term "kuch" does not appear in the suggestions for
either "kuchi" or "kucch". But "kuch" is coming into

misspellingsAndCorrections",[
  "kuchi","kuch",
  "kucch","kuch",
  "hota","hota"]]]}}

. How is this happening?


*Response:*

{
  "responseHeader":{
"status":0,
"QTime":3440},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
"suggestions":[
  "kuchi",{
"numFound":5,
"startOffset":9,
"endOffset":14,
"origFreq":40,
"suggestion":[{
"word":"kochi",
"freq":976},
  {
"word":"k chi",
"freq":442},
  {
"word":"yuchi",
"freq":71},
  {
"word":"kucha",
"freq":32},
  {
"word":"kichi",
"freq":17}]},
  "kucch",{
"numFound":2,
"startOffset":15,
"endOffset":20,
"origFreq":9,
"suggestion":[{
"word":"kutch",
"freq":231},
  {
"word":"kusch",
"freq":67}]},
  "correctlySpelled",false,
  "collation",[
"collationQuery","gram_ci:\"kuch kuch hota\"",
"hits",22,
"misspellingsAndCorrections",[
  "kuchi","kuch",
  "kucch","kuch",
  "hota","hota"]]]}}


Re: Having a spot of trouble setting up /browse

2015-02-17 Thread Erik Hatcher
And FYI, out of the box with Solr 5.0, using the data driven config (the 
default when creating a collection with `bin/solr create -c …`), /browse is 
wired in by default, with no templates explicit in the configuration as they are 
baked into the VelocityResponseWriter (VrW) library itself.

But yeah, what Alexandre said - you need the <lib>'s included like in Solr's 
4.10.3 example collection1 configuration, as well as the conf/velocity files.

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




> On Feb 16, 2015, at 8:44 PM, Alexandre Rafalovitch  wrote:
> 
> Velocity libraries and .vm templates as a first step! Did you get those setup?
> 
> Regards,
>   Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
> 
> 
> On 16 February 2015 at 19:33, Benson Margulies  wrote:
>> So, I had set up a solr core modelled on the 'multicore' example in 4.10.3,
>> which has no /browse.
>> 
>> Upon request, I went to set up /browse.
>> 
>> I copied in a minimal version. When I go there, I just get some XML back:
>> 
>> 
>> 
>> 0
>> 4
>> 
>> 
>> 
>> 
>> 
>> What else does /browse depend upon?



Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Shu-Wai Chow
Hi, all.  I’m trying to insert a field into Solr called last_modified, which 
holds a timestamp of the update. Since this is a cloud setup, I'm using the 
TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.

solrconfig.xml:



last_modified





last_modified






In schema.xml, I have:



This is the command I'm using to index:

curl 
"http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW";
 -F "sc=@1234.txt"
However, after indexing, the last_modified field is still not showing up on 
queries. Is there something else I should be doing?  Thanks.

RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the 
following section, for details.

Briefly, "count" is the # of suggestions it will return for terms that are 
*not* in your index/dictionary.  "alternativeTermCount" are the # of 
alternatives you want returned for terms that *are* in your dictionary.  You 
can set them to the same value, unless you want fewer suggestions when the 
terms is in the dictionary.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 5:27 AM
To: solr-user@lucene.apache.org
Subject: spellcheck.count v/s spellcheck.alternativeTermCount

Hello Everyone,
  I got confusion between spellcheck.count and
spellcheck.alternativeTermCount in Solr. Any help in details?


Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Sankalp Gupta
Hi Mikhail,

It won't solve my problem.
For ex:
Suppose my docs are like this:

city1


   city2




city2


   city3



Now, if I want a query to return all the users not having any address
related to city1 (i.e. only userid=2 should be in the result), and I query:
q={!parent which=userid:*}*:* -address:city1
this returns two results, i.e. userid=2 and userid=1 (as userid=1
also has a child whose address is city2); the desired output was
userid=2 only.

On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> try to search all children remove those who has a value1 by dash, then join
> remaining
> q={!parent which=contentType:parent}contentType:child -contentType:value1
> if the space in underneath query causes the problem try to escape it or
> wrap to v=$subq
>
>
>
> On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta  >
> wrote:
>
> > Hi
> >
> > I need to have a query in which I need to choose only those parent docs
> > none of whose children's field is having the specified value.
> > i.e. I need something like this:
> > http://localhost:8983/solr/core1/select?*q={!parent
> > which=contentType:parent}childField:NOT value1*
> >
> > The problem is* NOT operator is not being supported* in the Block Join
> > Query Parsers. Could anyone please suggest a way to workaround this
> > problem?
> > Have also added the problem on *stackoverflow*:
> >
> >
> http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature
> >
> > Regards
> > Sankalp Gupta
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Ahmet Arslan
Hi,

You are using "/update" when registering, but using "/update/extract" when 
invoking.

Ahmet



On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow  
wrote:
Hi, all.  I’m trying to insert a field into Solr called last_modified, which 
holds a timestamp of the update. Since this is a cloud setup, I'm using the 
TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.

solrconfig.xml:



last_modified





last_modified






In schema.xml, I have:



This is the command I'm using to index:

curl 
"http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW";
 -F "sc=@1234.txt"
However, after indexing, the last_modified field is still not showing up on 
queries. Is there something else I should be doing?  Thanks.


Re: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Nitin Solanki
Hi James,
If "count" applies to terms that are not in the
index/dictionary, then where do those suggestions come from?

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
wrote:

> See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
> the following section, for details.
>
> Briefly, "count" is the # of suggestions it will return for terms that are
> *not* in your index/dictionary.  "alternativeTermCount" are the # of
> alternatives you want returned for terms that *are* in your dictionary.
> You can set them to the same value, unless you want fewer suggestions when
> the terms is in the dictionary.
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 5:27 AM
> To: solr-user@lucene.apache.org
> Subject: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hello Everyone,
>   I got confusion between spellcheck.count and
> spellcheck.alternativeTermCount in Solr. Any help in details?
>


Re: Too many merges, stalling...

2015-02-17 Thread Shawn Heisey
On 2/17/2015 7:47 AM, Shawn Heisey wrote:
> The first message simply indicates that you have reached more
> simultaneous merges than CMS is configured to allow (3 by default), so
> it will stall all of them except one.  The javadocs say that the one
> allowed to run will be the smallest, but I have observed the opposite --
> the one that is allowed to run is always the largest.

I have stated some things incorrectly here.  The gist of what I wrote is
correct, but the details are not.  These details are important,
especially for those who read this in the archives later.

As long as you are below maxMergeCount (default 3) for the number of
merges that have been scheduled, the system will simultaneously run up
to maxThreads (default 1) merges from that list, and it will ALSO allow
the incoming thread (indexing new data) to run.

Once you reach maxMergeCount, the incoming thread is stalled until you
are back below maxMergeCount, and up to maxThreads merges will be
running while the incoming thread is stalled.
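The scheduling rule described above can be modeled in a few lines (a toy sketch, not the actual ConcurrentMergeScheduler code; the function and parameter names are mine):

```python
def merge_scheduler_state(scheduled_merges, max_merge_count=3, max_threads=1):
    """Model the CMS stall rule as described above: up to max_threads merges
    run concurrently, and the incoming indexing thread is stalled once the
    number of scheduled merges reaches max_merge_count."""
    running = min(scheduled_merges, max_threads)
    indexing_stalled = scheduled_merges >= max_merge_count
    return running, indexing_stalled

# Below the limit: one merge runs (default maxThreads=1), indexing proceeds.
print(merge_scheduler_state(2))   # (1, False)
# At the limit: indexing stalls until the backlog drains below maxMergeCount.
print(merge_scheduler_state(3))   # (1, True)
```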

Thanks,
Shawn



Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Chris Hostetter
: Hi,
: 
: You are using "/update" when registering, but using "/update/extract" when 
invoking.
: 
: Ahmet

if your goal is that *every* doc will get a last_modified, regardless of 
how it is indexed, then you don't need to set the "update.chain" default 
on every requestHandler -- instead just mark your 
updateRequestProcessorChain as the default...

   <updateRequestProcessorChain default="true">
     <processor class="solr.TimestampUpdateProcessorFactory">
       <str name="fieldName">last_modified</str>
     </processor>
     ...

: 
: On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow 
 wrote:
: Hi, all.  I’m trying to insert a field into Solr called last_modified, which 
holds a timestamp of the update. Since this is a cloud setup, I'm using the 
TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.
: 
: solrconfig.xml:
: 
: 
: 
: last_modified
: 
: 
: 
: 
: 
: last_modified
: 
: 
: 
: 
: 
: 
: In schema.xml, I have:
: 
: 
: 
: This is the command I'm using to index:
: 
: curl 
"http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW";
 -F "sc=@1234.txt"
: However, after indexing, the last_modified field is still not showing up on 
queries. Is there something else I should be doing?  Thanks.
: 

-Hoss
http://www.lucidworks.com/

Re: Solr 4.8.1 : Response Code 500 when creating the new request handler

2015-02-17 Thread Chris Hostetter

: 1. Look further down in the stack trace for the "caused by" that details
: > the specific cause of the exception.

: I am still not able to find the cause of this.

Jack is referring to the log file from your server ... sometimes there 
are more details there.

: Sorry i but don't know it is non-standard approach. please guide me here.

I'm not sure what Jack was referring to -- I don't see anything "non 
standard" about how you have your handler configured.

: We are trying to find all the results so we are using q.alt=*:*.
: There are some products in our company who wants of find all the results 
*whose
: type is garments* and i forgot to mention we are trying to find only 6
: rows. So using this request handler we are providing the 6 rows.

Jack's point here is that you have specified a q.alt in your "invariants" 
but you have also specified it in the query params -- where it will be 
totally ignored.  What specifically is your goal in having that query 
param in the sample query you tried?

As a general debugging tip: did you try bypassing your custom 
requestHandler and just running a simple /select query with all of those 
params specified in the URL?  ... it can help to narrow down the 
problem -- in this case, I'm pretty sure you would have gotten the same 
error, and then the distraction of the "invariants" question would have 
been irrelevant.


Looking at the source code for 4.8.1, it appears that the error you are 
seeing is edismax doing a really bad job of reporting an error in parsing 
the "qf" param -- which you haven't specified at all in your params:

  try {
queryFields = DisMaxQParser.parseQueryFields(req.getSchema(), 
solrParams);  // req.getSearcher() here causes searcher refcount imbalance
  } catch (SyntaxError e) {
throw new RuntimeException();
  }

..if you add a "qf" param with the list of fields you want to search (or 
a "df" param to specify a default field), I suspect this error will go away.
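For example, a plain /select request with an explicit qf might be built like this (the field names are hypothetical; the point is only that edismax gets a qf or df to parse):

```python
from urllib.parse import urlencode

# Hypothetical field names; supply whichever fields you actually search.
params = {
    "defType": "edismax",
    "q.alt": "*:*",
    "fq": "type:garments",
    "qf": "name description",
    "rows": 6,
}
print("/select?" + urlencode(params))
```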


I filed a bug to fix this terrible code to give a useful error msg in the 
future...

https://issues.apache.org/jira/browse/SOLR-7120




: > 3. You have q.alt in invariants, but also in the actual request, which is a
: > contradiction in terms - what is your actual intent? This isn't the cause
: > of the exception, but does raise questions of what you are trying to do.
: > 4. Why don't you have a q parameter for the actual query?
: >
: >
: > -- Jack Krupansky
: >
: > On Sat, Feb 14, 2015 at 1:57 AM, Aman Tandon 
: > wrote:
: >
: > > Hi,
: > >
: > > I am using Solr 4.8.1 and when i am creating the new request handler i am
: > > getting the following error:
: > >
: > > *Request Handler config:*
: > >
: > > 
: > > 
: > > edismax
: > > on
: > > *:*
: > >
: > > 0.01
: > > 
: > >
: > > 
: > > type:garments
: > > 
: > > 
: > >
: > > *Error:*
: > >
: > > java.lang.RuntimeException at
: > > >
: > >
: > 
org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.(ExtendedDismaxQParser.java:1455)
: > > > at
: > > >
: > >
: > 
org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
: > > > at
: > > >
: > >
: > 
org.apache.solr.search.ExtendedDismaxQParser.(ExtendedDismaxQParser.java:108)
: > > > at
: > > >
: > >
: > 
org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
: > > > at org.apache.solr.search.QParser.getParser(QParser.java:315) at
: > > >
: > >
: > 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
: > > > at
: > > >
: > >
: > 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
: > > > at
: > > >
: > >
: > 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
: > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at
: > > >
: > >
: > 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
: > > > at
: > > >
: > >
: > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
: > > > at
: > > >
: > >
: > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
: > > > at
: > > >
: > >
: > 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
: > > > at
: > > >
: > >
: > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
: > > > at
: > > >
: > >
: > 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
: > > > at
: > > >
: > >
: > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
: > > > at
: > > >
: > >
: > 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
: > > > at
: > > >
: > >
: > 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
: > > > at
: > > >
: > org

RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
Here is an example to illustrate what I mean...

- query q=text:(life AND 
hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
- suppose at least one document in your dictionary field has "life" in it
- also suppose zero documents in your dictionary field have "hope" in them
- The spellchecker will try to return you up to 10 suggestions for "hope", but 
only up to 5 suggestions for "life"
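The rule in this example can be sketched in a few lines (a toy model of the parameter semantics, not Solr's actual spellchecker code; names are mine):

```python
def suggestion_limit(term, dictionary_terms, count=10, alternative_term_count=5):
    """Return how many suggestions the spellchecker will offer for a term,
    per the rule above: 'count' applies to terms absent from the
    index/dictionary, 'alternativeTermCount' to terms present in it."""
    return alternative_term_count if term in dictionary_terms else count

dictionary = {"life"}  # "life" occurs in the dictionary field; "hope" does not
print(suggestion_limit("hope", dictionary))  # 10: "hope" is not in the index
print(suggestion_limit("life", dictionary))  # 5: "life" is in the index
```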

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Hi James,
How can you say that "count" doesn't use
index/dictionary then from where suggestions come.

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
wrote:

> See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
> the following section, for details.
>
> Briefly, "count" is the # of suggestions it will return for terms that are
> *not* in your index/dictionary.  "alternativeTermCount" are the # of
> alternatives you want returned for terms that *are* in your dictionary.
> You can set them to the same value, unless you want fewer suggestions when
> the terms is in the dictionary.
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 5:27 AM
> To: solr-user@lucene.apache.org
> Subject: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hello Everyone,
>   I got confusion between spellcheck.count and
> spellcheck.alternativeTermCount in Solr. Any help in details?
>


Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Kydryavtsev Andrey
How about finding all parents which have at least one child with address:city1,
and then negating that?
Like (not sure about the syntax at all):
q=-{!parent which=userid:*}address:city1
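The difference between the two formulations can be checked with plain set logic over the example docs from this thread (ordinary Python standing in for Solr's block-join semantics; this is an illustration, not Solr code):

```python
# Each parent user and its child addresses, from the example in this thread.
users = {
    1: ["city1", "city2"],
    2: ["city2", "city3"],
}

# First form: parents of children whose address is not city1.
# userid=1 still qualifies because it has a city2 child.
parents_of_non_city1_children = {
    uid for uid, cities in users.items()
    if any(c != "city1" for c in cities)
}

# Negated form: all parents, minus those having ANY city1 child.
parents_without_any_city1_child = {
    uid for uid, cities in users.items()
    if all(c != "city1" for c in cities)
}

print(sorted(parents_of_non_city1_children))   # [1, 2] -- not what we want
print(sorted(parents_without_any_city1_child)) # [2]    -- the desired result
```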

17.02.2015, 20:21, "Sankalp Gupta" :
> Hi Mikhail,
>
> It won't solve my problem.
> For ex:
> Suppose my docs are like this:
> 
> city1
> 
> 
>    city2
> 
> 
>
> 
> city2
> 
> 
>    city3
> 
> 
>
> Now if I want* a query to return me all the users not having any address*
> related to *city1* (i.e. only userid=2 should be in the result)and then if
> i query:
> *q={!parent which=userid:*}*:* -address:city1*
> This will return me two* results i.e.** userid=2 and userid=1 *(as userid=1
> is also having a child whose address is city2)  , *desired output was
> userid=2 only.*
>
> On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>>  try to search all children remove those who has a value1 by dash, then join
>>  remaining
>>  q={!parent which=contentType:parent}contentType:child -contentType:value1
>>  if the space in underneath query causes the problem try to escape it or
>>  wrap to v=$subq
>>
>>  On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta >  wrote:
>>>  Hi
>>>
>>>  I need to have a query in which I need to choose only those parent docs
>>>  none of whose children's field is having the specified value.
>>>  i.e. I need something like this:
>>>  http://localhost:8983/solr/core1/select?*q={!parent
>>>  which=contentType:parent}childField:NOT value1*
>>>
>>>  The problem is* NOT operator is not being supported* in the Block Join
>>>  Query Parsers. Could anyone please suggest a way to workaround this
>>>  problem?
>>>  Have also added the problem on *stackoverflow*:
>>  
>> http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature
>>>  Regards
>>>  Sankalp Gupta
>>  --
>>  Sincerely yours
>>  Mikhail Khludnev
>>  Principal Engineer,
>>  Grid Dynamics
>>
>>  
>>  


Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Mikhail Khludnev
Sankalp,
would you mind posting the debugQuery=on output? Without it, it's hard to
tell what the problem is.

However, it's worth mentioning that Andrey's suggestion seems really
promising.


On Tue, Feb 17, 2015 at 8:19 PM, Sankalp Gupta 
wrote:

> Hi Mikhail,
>
> It won't solve my problem.
> For ex:
> Suppose my docs are like this:
> 
> city1
> 
> 
>city2
> 
> 
>
> 
> city2
> 
> 
>city3
> 
> 
>
> Now if I want* a query to return me all the users not having any address*
> related to *city1* (i.e. only userid=2 should be in the result)and then if
> i query:
> *q={!parent which=userid:*}*:* -address:city1*
> This will return me two* results i.e.** userid=2 and userid=1 *(as userid=1
> is also having a child whose address is city2)  , *desired output was
> userid=2 only.*
>
> On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > try to search all children remove those who has a value1 by dash, then
> join
> > remaining
> > q={!parent which=contentType:parent}contentType:child -contentType:value1
> > if the space in underneath query causes the problem try to escape it or
> > wrap to v=$subq
> >
> >
> >
> > On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta <
> sankalp.gu...@snapdeal.com
> > >
> > wrote:
> >
> > > Hi
> > >
> > > I need to have a query in which I need to choose only those parent docs
> > > none of whose children's field is having the specified value.
> > > i.e. I need something like this:
> > > http://localhost:8983/solr/core1/select?*q={!parent
> > > which=contentType:parent}childField:NOT value1*
> > >
> > > The problem is* NOT operator is not being supported* in the Block Join
> > > Query Parsers. Could anyone please suggest a way to workaround this
> > > problem?
> > > Have also added the problem on *stackoverflow*:
> > >
> > >
> >
> http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature
> > >
> > > Regards
> > > Sankalp Gupta
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
At this time the latest released version of Solr is 4.10.3. Is there any way
we can get the source code for this release version?

I tried to check out the Solr code from
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In the
commit log, I see a number of revisions, but nothing mentions which one is the
release version. The latest revision, 1657441, is from Feb 4. Does this
correspond to 4.10.3? If not, how do I go about getting the source code
of 4.10.3?

I'm also curious where the version number is embedded, i.e. is it in a file
somewhere?

I want to ensure I am using the released version, and not one with bug fixes
applied after the version was released.

Thank you in anticipation.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Collations are not working fine.

2015-02-17 Thread Reitzel, Charles
Hi Nitin,

I was trying many different options for a couple different queries.   In fact, 
I have collations working ok now with the Suggester and WFSTLookup.   The 
problem may have been due to a different dictionary and/or lookup 
implementation and the specific options I was sending.

In general, we're using spellcheck for search suggestions.  The Suggester 
component (vs. the Suggester spellcheck implementation) doesn't handle all of our 
cases, but we can get things working using the spellcheck interface.  What 
gives us particular trouble are the cases where a term may be valid by itself 
but also be the start of longer words.

The specific terms are acronyms specific to our business.   But I'll attempt to 
show generic examples.

E.g. a partial term like "fo" can expand to fox, fog, etc. and a full term like 
brown can also expand to something like brownstone.   And, yes, the collation 
"brownstone fox" is nonsense.  But assume, for the sake of argument, it appears 
in our documents somewhere.

For a multiple-term query with a spelling error (or partially typed term): brown 
fo

We get collations in order of hits, descending like ...
"brown fox",
"brown fog",
"brownstone fox".

So far, so good.  

For a single-term query, brown, we get a single suggestion, brownstone, and no 
collations.

So we don't know whether to keep the term brown!

At this point, we need spellcheck.extendedResults=true and look at the origFreq 
value in the suggested corrections.  Unfortunately, the Suggester (spellcheck 
dictionary) does not populate the original frequency information.  And, without 
this information, the SpellCheckComponent cannot format the extended results.

However, with a simple change to Suggester.java, it was easy to get the needed 
frequency information and use it to make a sound decision to keep or drop the 
input term.  But I'd be much obliged if there is a better way to go about it.
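With origFreq available in extended results, the keep-or-drop decision described above reduces to a few lines (a sketch of the client-side logic under my own naming, not the Suggester.java patch itself):

```python
def keep_or_replace(term, orig_freq, suggestions):
    """If the input term itself occurs in the index (origFreq > 0), keep it
    even though longer expansions exist; otherwise fall back to the top
    suggestion. 'suggestions' is a list of (word, freq) pairs."""
    if orig_freq > 0:
        return term
    return suggestions[0][0] if suggestions else term

# "brown" occurs in documents, so keep it despite "brownstone" being suggested.
print(keep_or_replace("brown", 120, [("brownstone", 40)]))     # brown
# A pure prefix like "fo" never occurs as a term, so take the suggestion.
print(keep_or_replace("fo", 0, [("fox", 300), ("fog", 180)]))  # fox
```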

Configs below.

Thanks,
Charlie


  

  suggestDictionary
  org.apache.solr.spelling.suggest.Suggester
  org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
  text_all
  0.0001
  true
  true

  



  
Search Suggestions (spellcheck)
explicit
json
0
edismax
text_all
id,name,ticker,entityType,transactionType,accountType
true
5
suggestDictionary
5
true
true
10
5
  
  
suggestSC
  


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Charles,
 Will you please send the configuration you tried? It 
will help solve my problem. Have you sorted the collations on hits or on 
the frequencies of suggestions? If so, please assist me.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles < 
charles.reit...@tiaa-cref.org> wrote:

> I have been working with collations the last couple days and I kept adding
> the collation-related parameters until it started working for me.   It
> seems I needed 50.
>
> But, I am using the Suggester with the WFSTLookupFactory.
>
> Also, I needed to patch the suggester to get frequency information in 
> the spellcheck response.
>
> -Original Message-
> From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
> Sent: Friday, February 13, 2015 3:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Collations are not working fine.
>
> Hi Nitin,
>
> Can you try with the below config? It seems to be 
> working for us.
>
> 
>
>  text_general
>
>
>   
> wordbreak
> solr.WordBreakSolrSpellChecker
> textSpell
> true
> false
> 5
>   
>
>
> default
> textSpell
> solr.IndexBasedSpellChecker
> ./spellchecker
> 0.75
> 0.01
> true
> 5
>  
>
>
>   
>
>
>
> true
> default
> wordbreak
> 5
> 15
> true
> false
> true
> 100
> 100%
> AND
> 1000
>
>
> *Rajesh.*
>
> On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James 
>  >
> wrote:
>
> > Nitin,
> >
> > Can you post the full spellcheck response when you query:
> >
> > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> >
> > James Dyer
> > Ingram Content Group
> >
> >
> > -Original Message-
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Friday, February 13, 2015 1:05 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi James Dyer,
> >   I did the same as you told me. Used 
> > WordBreakSolrSpellChecker instead of shingles. But still collations 
> > are not coming or working.
> > For instance, I tried to get collation of "gone with the wind" by 
> > searching "gone wthh thes wint" on field=gram_ci but didn't succeed.
> > Even, I am getting the suggestions of wtth as *with*, thes as *the*,
> wint as *wind*.
> > Also I have documents which contains "gone with the wind" having 167 
> > times in the documents. I don't know that I am missing something or not.
> > Please check my b

Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Hrishikesh Gadre
Hi,

You can get the released code base here

https://github.com/apache/lucene-solr/releases

Thanks
Hrishikesh

On Tue, Feb 17, 2015 at 2:20 PM, O. Olson  wrote:

> At this time the latest released version of Solr is 4.10.3. Is there anyway
> we can get the source code for this release version?
>
> I tried to checkout the Solr code from
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In
> the
> commit log, I see a number of revisions but nothing mention which is the
> release version. The latest revision being 1657441 on Feb 4. Does this
> correspond to 4.10.3? If no, then how do I go about getting the source code
> of 4.10.3.
>
> I'm also curious where the version number is embedded i.e. is it in a file
> somewhere?
>
> I want to ensure I am using the released version, and not some bug fixes
> after the version got released.
>
> Thank you in anticipation.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Hrishikesh Gadre
Also the version number is encoded (at least) in the build file

https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32

Hope this helps.

Thanks
Hrishikesh

On Tue, Feb 17, 2015 at 2:25 PM, Hrishikesh Gadre 
wrote:

> Hi,
>
> You can get the released code base here
>
> https://github.com/apache/lucene-solr/releases
>
> Thanks
> Hrishikesh
>
> On Tue, Feb 17, 2015 at 2:20 PM, O. Olson  wrote:
>
>> At this time the latest released version of Solr is 4.10.3. Is there
>> anyway
>> we can get the source code for this release version?
>>
>> I tried to checkout the Solr code from
>> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In
>> the
>> commit log, I see a number of revisions but nothing mention which is the
>> release version. The latest revision being 1657441 on Feb 4. Does this
>> correspond to 4.10.3? If no, then how do I go about getting the source
>> code
>> of 4.10.3.
>>
>> I'm also curious where the version number is embedded i.e. is it in a file
>> somewhere?
>>
>> I want to ensure I am using the released version, and not some bug fixes
>> after the version got released.
>>
>> Thank you in anticipation.
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
Thank you Hrishikesh. Funny how GitHub is not mentioned on
http://lucene.apache.org/solr/resources.html

I think common-build.xml is what I was looking for. Thank you



Hrishikesh Gadre-3 wrote
> Also the version number is encoded (at least) in the build file
> 
> https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32
> 
> Hope this helps.
> 
> Thanks
> Hrishikesh


Hrishikesh Gadre-3 wrote
> Hi,
> 
> You can get the released code base here
> 
> https://github.com/apache/lucene-solr/releases
> 
> Thanks
> Hrishikesh





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187048.html
Sent from the Solr - User mailing list archive at Nabble.com.


CSV entry as multiple documents

2015-02-17 Thread Henrique Oliveira
Hi all,

I was wondering if there is a way to tell Solr to treat a CSV entry as multiple 
documents instead of one document. For instance, suppose that a CSV file has 4 
fields and a single entry:
t1,v1,v2,v3
2015-01-01T01:00:59Z,0.3,0.5,0.7

I want Solr to update its index as if they were 3 different documents:
t1,v
2015-01-01T01:00:59Z,0.3
2015-01-01T01:00:59Z,0.5
2015-01-01T01:00:59Z,0.7

Is that possible, or do I have to create a different CSV for it?

Many thanks,
Henrique.

Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Shawn Heisey
On 2/17/2015 3:20 PM, O. Olson wrote:
> At this time the latest released version of Solr is 4.10.3. Is there anyway
> we can get the source code for this release version?
>
> I tried to checkout the Solr code from
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In the
> commit log, I see a number of revisions but nothing mention which is the
> release version. The latest revision being 1657441 on Feb 4. Does this
> correspond to 4.10.3? If no, then how do I go about getting the source code
> of 4.10.3.

That is the current development branch for 4.10.x.  There are some
changes in that branch that are not in any released version yet.  If a
4.10.4 is ever released, it will come from that branch.  There is no
guarantee that a 4.10.4 will ever be released.

It is likely that the 5.0.0 release will be announced in the next few
days.  A problem could still be found, but the current release candidate
is looking good so far.

> I'm also curious where the version number is embedded i.e. is it in a file
> somewhere?

Yes.  You can find it in lucene/version.properties in a typical checkout.

> I want to ensure I am using the released version, and not some bug fixes
> after the version got released. 

For that exact version, you want to use this URL for your svn checkout:

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/

I don't see lucene/version.properties in that tag, but the 4.10.3
version does show up in lucene/common-build.xml.

Thanks,
Shawn



Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Mike Drob
The SVN source is under tags, not branches.

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/

On Tue, Feb 17, 2015 at 4:39 PM, O. Olson  wrote:

> Thank you Hrishikesh. Funny how GitHub is not mentioned  on
> http://lucene.apache.org/solr/resources.html
>
> I think common-build.xml is what I was looking for. Thank you
>
>
>
> Hrishikesh Gadre-3 wrote
> > Also the version number is encoded (at least) in the build file
> >
> >
> https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32
> >
> > Hope this helps.
> >
> > Thanks
> > Hrishikesh
>
>
> Hrishikesh Gadre-3 wrote
> > Hi,
> >
> > You can get the released code base here
> >
> > https://github.com/apache/lucene-solr/releases
> >
> > Thanks
> > Hrishikesh
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187048.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
Thank you Mike. This is what I was looking for. I apparently did not
understand what tags were.


Mike Drob wrote
> The SVN source is under tags, not branches.
> 
> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
Thank you Shawn. I have not updated my version in a while, so I prefer to do
it to 4.10 first, rather than go directly to 5.0. I'd be working on it
towards the end of this week.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solrcloud sizing

2015-02-17 Thread Dominique Bejean
One of our customers needs to index 15 billion documents in a collection.
As this volume is not usual for me, I need some advice about SolrCloud
sizing (how many servers, nodes, shards, replicas, how much memory, ...)

Some inputs :

   - Collection size : 15 billion documents
   - Collection update : 8 million new documents / day + 8 million
   deleted documents / day
   - Updates occur during the night without queries
   - Queries occur during the day without updates
   - Document size is nearly 300 bytes
   - Document fields are mainly string including one date field
   - The same term will occur several times for a given field (from 10 to
   100,000)
   - Query will use a date period and a filter query on one or more fields
   - 10,000 queries / minute
   - expected response time < 500ms
   - 1 billion documents indexed = 5 GB index size
   - no ssd drives

So, what is you advice about :

# of shards : 15 billion documents -> 16 shards ?
# of replicas ?
# of nodes = # of shards ?
heap memory per node ?
direct memory per node ?

Thank you in advance for your advice.

Dominique
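For a rough starting point before any prototyping, the figures above can be worked through like this (back-of-envelope arithmetic only, using only the numbers stated in this message):

```python
total_docs = 15_000_000_000          # collection size
gb_per_billion_docs = 5              # stated: 1 billion docs -> 5 GB index
shards = 16                          # proposed shard count
queries_per_minute = 10_000

total_index_gb = total_docs / 1_000_000_000 * gb_per_billion_docs
gb_per_shard = total_index_gb / shards
docs_per_shard = total_docs // shards
qps = queries_per_minute / 60

print(f"total index : {total_index_gb:.0f} GB")
print(f"per shard   : {gb_per_shard:.2f} GB / {docs_per_shard:,} docs")
print(f"query load  : {qps:.0f} queries/sec across the cluster")
```

At roughly 75 GB of index and about 167 queries/sec overall, the open question is how much of each shard's index must sit in the OS page cache to hold the 500 ms target, which is exactly what a prototype would have to measure.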


Re: CSV entry as multiple documents

2015-02-17 Thread Anshum Gupta
Hi Henrique,

Solr supports posting a csv with multiple rows. Have a look at the
documentation in the ref. guide here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates



On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira 
wrote:

> Hi all,
>
> I was wondering if there is a way to tell Solr to treat a CSV entry as
> multiple documents instead of one document. For instance, suppose that a
> CSV file has 4 fields and a single entry:
> t1,v1,v2,v3
> 2015-01-01T01:00:59Z,0.3,0.5,0.7
>
> I want Solr to update its index like it were 3 different documents:
> t1,v
> 2015-01-01T01:00:59Z,0.3
> 2015-01-01T01:00:59Z,0.5
> 2015-01-01T01:00:59Z,0.7
>
> Is that possible, or do I have to create a different CSV for it?
>
> Many thanks,
> Henrique.




-- 
Anshum Gupta
http://about.me/anshumgupta


Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi,

I don't know whether it is my setup or something else, but the fact is
that a very simple sort is not working in my Solr 4.7 environment.

The query is very simple :
http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true

And the output is NOT sorted according to title :



0
1

title asc
id,author,title
true
0
author:soros
xml




9018

Soros, George, 1930-


The alchemy of finance : reading the mind of the market / George Soros



15785

Soros, George, 1930-
Soros Foundations

Bosnia / by George Soros


16281

Soros, George, 1930-
Soros Foundations


Prospect for European disintegration / by George Soros



25807

Soros, George


Open society : reforming global capitalism / George Soros



27440
George Soros on globalization

Soros, George, 1930-



22254

Soros, George, 1930-


The crisis of global capitalism : open society endangered / George Soros



16914

Soros, George, 1930-
Soros Fund Management

The theory of reflexivity / by George Soros


17343

Financial turmoil in Europe and the United States : essays / George Soros


Soros, George, 1930-



15542

Soros, George, 1930-
Harvard Club of New York City


Nationalist dictatorships versus open society / by George Soros



15891

Soros, George


The new paradigm for financial markets : the credit crisis of 2008 and what
it means / George Soros





Thank you for the help in advance,
Simon.


Re: CSV entry as multiple documents

2015-02-17 Thread Alexandre Rafalovitch
I think the question asked was a bit different. It was about having
one row/document split into multiple with some fields replicated and
some mapped.

JSON (single-document format) has a split command which might be
similar to what's being asked. CSV has a split command as well, but I
think it is more about creating a multiValued field.

Or did I miss a different parameter?

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 17 February 2015 at 19:41, Anshum Gupta  wrote:
> Hi Henrique,
>
> Solr supports posting a csv with multiple rows. Have a look at the
> documentation in the ref. guide here:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
>
>
>
> On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira 
> wrote:
>
>> Hi all,
>>
>> I was wondering if there is a way to tell Solr to treat a CSV entry as
>> multiple documents instead of one document. For instance, suppose that a
>> CSV file has 4 fields and a single entry:
>> t1,v1,v2,v3
>> 2015-01-01T01:00:59Z,0.3,0.5,0.7
>>
>> I want Solr to update its index like it were 3 different documents:
>> t1,v
>> 2015-01-01T01:00:59Z,0.3
>> 2015-01-01T01:00:59Z,0.5
>> 2015-01-01T01:00:59Z,0.7
>>
>> Is that possible, or do I have to create a different CSV for it?
>>
>> Many thanks,
>> Henrique.
>
>
>
>
> --
> Anshum Gupta
> http://about.me/anshumgupta


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Alexandre Rafalovitch
What's the field definition for your "title" field? Is it just string
or are you doing some tokenizing?

It should be a string or a single token cleaned up (e.g. lower-cased)
using KeywordTokenizer. In the example schema, you will normally see
the original field tokenized and the sort field separately with
copyField connection. In latest Solr, docValues are also recommended
for sort fields.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 17 February 2015 at 19:52, Simon Cheng  wrote:
> I don't know whether it is my setup or any other reasons. But the fact is
> that a very simple sort is not working in my Solr 4.7 environment.
>
> The query is very simple :
> http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true
>
> And the output is NOT sorted according to title :


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi Alex,

It's simply defined like this in the schema.xml :

   

and it is cloned to the other multi-valued field o_title :

   

Should I simply change the type to be "string" instead?

Thanks again,
Simon.


On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch 
wrote:

> What's the field definition for your "title" field? Is it just string
> or are you doing some tokenizing?
>
> It should be a string or a single token cleaned up (e.g. lower-cased)
> using KeywordTokenizer. In the example schema, you will normally see
> the original field tokenized and the sort field separately with
> copyField connection. In latest Solr, docValues are also recommended
> for sort fields.
>
> Regards,
>Alex.
>


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Alexandre Rafalovitch
If you are not searching against the "title" field directly, you can
change it to string. If you do, create a separate one, specifically
for sorting. You should be able to use docValues with that field even
in Solr 4.7.

Remember to re-index.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 17 February 2015 at 20:16, Simon Cheng  wrote:
> Hi Alex,
>
> It's simply defined like this in the schema.xml :
>
> multiValued="false"/>
>
> and it is cloned to the other multi-valued field o_title :
>
>
>
> Should I simply change the type to be "string" instead?
>
> Thanks again,
> Simon.
>
>
> On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch 
> wrote:
>
>> What's the field definition for your "title" field? Is it just string
>> or are you doing some tokenizing?
>>
>> It should be a string or a single token cleaned up (e.g. lower-cased)
>> using KeywordTokenizer. In the example schema, you will normally see
>> the original field tokenized and the sort field separately with
>> copyField connection. In latest Solr, docValues are also recommended
>> for sort fields.
>>
>> Regards,
>>Alex.
>>


Re: CSV entry as multiple documents

2015-02-17 Thread Henrique Oliveira
Yes, Alexandre is right about my question. To make it clear, a CSV that looks 
like:
t1,v1,v2,v3
2015-01-01T01:59:00Z,0.3,0.5,0.7
2015-01-01T02:00:00Z,0.4,0.5,0.8

would be the same as indexing
t1,v
2015-01-01T01:59:00Z,0.3
2015-01-01T01:59:00Z,0.5
2015-01-01T01:59:00Z,0.7
2015-01-01T02:00:00Z,0.4
2015-01-01T02:00:00Z,0.5
2015-01-01T02:00:00Z,0.8

I don’t know if a multiValued field would do the trick. Do you have more info on 
that split command?

Henrique

> On Feb 17, 2015, at 7:57 PM, Alexandre Rafalovitch  wrote:
> 
> I think the question asked was a bit different. It was about having
> one row/document split into multiple with some fields replicated and
> some mapped.
> 
> JSON (single-document format) has a split command which might be
> similar to what's being asked. CSV has a split command as well, but I
> think it is more about creating a multiValued field.
> 
> Or did I miss a different parameter?
> 
> Regards,
>   Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
> 
> 
> On 17 February 2015 at 19:41, Anshum Gupta  wrote:
>> Hi Henrique,
>> 
>> Solr supports posting a csv with multiple rows. Have a look at the
>> documentation in the ref. guide here:
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
>> 
>> 
>> 
>> On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira 
>> wrote:
>> 
>>> Hi all,
>>> 
>>> I was wondering if there is a way to tell Solr to treat a CSV entry as
>>> multiple documents instead of one document. For instance, suppose that a
>>> CSV file has 4 fields and a single entry:
>>> t1,v1,v2,v3
>>> 2015-01-01T01:00:59Z,0.3,0.5,0.7
>>> 
>>> I want Solr to update its index like it were 3 different documents:
>>> t1,v
>>> 2015-01-01T01:00:59Z,0.3
>>> 2015-01-01T01:00:59Z,0.5
>>> 2015-01-01T01:00:59Z,0.7
>>> 
>>> Is that possible, or do I have to create a different CSV for it?
>>> 
>>> Many thanks,
>>> Henrique.
>> 
>> 
>> 
>> 
>> --
>> Anshum Gupta
>> http://about.me/anshumgupta



Re: CSV entry as multiple documents

2015-02-17 Thread Alexandre Rafalovitch
What's your business use case? You don't need the split command, as
you already have those values in separate fields. You could copyField
them to a single multiValued field, but you would still have one
document per original CSV line.

Why do you need multiple documents out of one big CSV entry?

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 17 February 2015 at 20:37, Henrique Oliveira  wrote:
> Yes, Alexandre is right about my question. To make it clear, a CSV that look 
> like:
> t1,v1,v2,v2
> 2015-01-01T01:59:00Z,0.3,0.5,0.7
> 2015-01-01T02:00:00Z,0.4,0.5,0.8
>
> would be the same of indexing
> t1,v
> 2015-01-01T01:59:00Z,0.3
> 2015-01-01T01:59:00Z,0.5
> 2015-01-01T01:59:00Z,0.7
> 2015-01-01T02:00:00Z,0.4
> 2015-01-01T02:00:00Z,0.5
> 2015-01-01T02:00:00Z,0.8
>
> I don’t know if multiValued field would do the trick. Do you have more info 
> on that split command?
>
> Henrique
>
>> On Feb 17, 2015, at 7:57 PM, Alexandre Rafalovitch  
>> wrote:
>>
>> I think the question asked was a bit different. It was about having
>> one row/document split into multiple with some fields replicated and
>> some mapped.
>>
>> JSON (single-document format) has a split command which might be
>> similar to what's being asked. CSV has a split command as well, but I
>> think it is more about creating a multiValued field.
>>
>> Or did I miss a different parameter?
>>
>> Regards,
>>   Alex.
>> 
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>>
>> On 17 February 2015 at 19:41, Anshum Gupta  wrote:
>>> Hi Henrique,
>>>
>>> Solr supports posting a csv with multiple rows. Have a look at the
>>> documentation in the ref. guide here:
>>> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
>>>
>>>
>>>
>>> On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira 
>>> wrote:
>>>
 Hi all,

 I was wondering if there is a way to tell Solr to treat a CSV entry as
 multiple documents instead of one document. For instance, suppose that a
 CSV file has 4 fields and a single entry:
 t1,v1,v2,v3
 2015-01-01T01:00:59Z,0.3,0.5,0.7

 I want Solr to update its index like it were 3 different documents:
 t1,v
 2015-01-01T01:00:59Z,0.3
 2015-01-01T01:00:59Z,0.5
 2015-01-01T01:00:59Z,0.7

 Is that possible, or do I have to create a different CSV for it?

 Many thanks,
 Henrique.
>>>
>>>
>>>
>>>
>>> --
>>> Anshum Gupta
>>> http://about.me/anshumgupta
>
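For what it's worth, since Solr's CSV handler won't fan one row out into several documents, one option is to reshape the file client-side before posting it. A minimal sketch (the helper below is hypothetical; column names are taken from the example in this thread):

```python
import csv
import io

def wide_to_long(csv_text, key_field="t1", value_field="v"):
    """Turn each wide row (t1,v1,v2,...) into one row per value column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([key_field, value_field])
    for row in reader:
        key = row.pop(key_field)
        # Emit one long row per remaining column, repeating the key.
        for column in reader.fieldnames[1:]:
            writer.writerow([key, row[column]])
    return out.getvalue()

wide = "t1,v1,v2,v3\n2015-01-01T01:00:59Z,0.3,0.5,0.7\n"
print(wide_to_long(wide))
```

Note that if the schema has a uniqueKey, each long row would also need its own id, since the repeated timestamp alone would make later rows overwrite earlier ones.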


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi Alex,

It's okay after I added in a new field "s_title" in the schema and
re-indexed.

   
   

But how can I ignore the articles ("A", "An", "The") in the sorting? As you
can see from the example below:

http://localhost:8983/solr/bibs/select?q=singapore&fl=id,title&sort=s_title+asc&wt=xml&start=0&rows=20&indent=true



0
0

singapore
true
id,title
0
s_title asc
20
xml




36

5th SEACEN-Toronto Centre Leadership Seminar for Senior Management of
Central Banks on Financial System Oversight, 16-21 Oct 2005, Singapore



70

Anti-money laundering & counter-terrorism financing / Commercial Affairs
Dept



15

China's anti-secession law : a legal perspective / Zou, Keyuan



12

China's currency peg : firm in the eye of the storm / Calla Wiemer



22

China's politics in 2004 : dawn of the Hu Jintao era / Zheng Yongnian & Lye
Liang Fook



92

Goods and Services Tax Act [2005 ed.] (Chapter 117A)



13

Governing capacity in China : creating a contingent of qualified personnel
/ Kjeld Erik Brodsgaard



21
Health care marketization in urban China / Gu Xin


85
Lianhe Zaobao, Sunday


84

Singapore : vision of a global city / Jones Lang LaSalle



7

Singapore real estate investment trusts : leveraged value / Tony Darwell



96

Singapore's success : engineering economic growth / Henri Ghesquiere



23

The Chen-Soong meeting : the beginning of inter-party rapprochement in
Taiwan? / Raymond R. Wu



17

The Haw Par saga in the 1970s / project sponsor, Low Kwok Mun; team leader,
Sandy Ho; team members, Audrey Low ... et al



78
The New paper on Sunday


95

The little Red Dot : reflections by Singapore's diplomats / editors, Tommy
Koh, Chang Li Lin



52

[Press releases and articles on policy changes affecting the Singapore
property market] / compiled by the Information Resource Centre, Monetary
Authority of Singapore



dataq

Simon is testing Solr - This one is in English. Color of the Wind. 我是中国人 ,
БOΛbШ OЙ PYCCKO-KИTAЙCKИЙ CΛOBAPb , Français-Chinois






Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Shu-Wai Chow
> if your goal is that *every* doc will get a last_modified, regarldess of 
> how it is indexed, then you don't need to set the "update.chain" default 
> on every requestHandler -- instead just mark your 
> updateRequestProcessorChain as the default...
> 
>   
> 
>   last_modified
> 
>…

Thanks for this.  There was some confusion between my coworker and me about which 
requestHandler to set it on, but setting it as the default should solve the problem. 
 Unfortunately, I’m still not getting it back.  I’m now wondering whether it’s the 
schema I’m screwing up or how I’m sending the index command.


Schema.xml:

> : 
> :  positionIncrementGap="0”/>

And the update command:

> : curl 
> "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW";
>  -F "sc=@1234.txt"

--


> On Feb 17, 2015, at 10:26 AM, Chris Hostetter  
> wrote:
> 
> : Hi,
> : 
> : You are using "/update" when registering, but using "/update/extract" when 
> invoking.
> : 
> : Ahmet
> 
> if your goal is that *every* doc will get a last_modified, regarldess of 
> how it is indexed, then you don't need to set the "update.chain" default 
> on every requestHandler -- instead just mark your 
> updateRequestProcessorChain as the default...
> 
>   
> 
>   last_modified
> 
>...
> 
> : 
> : On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow 
>  wrote:
> : Hi, all.  I’m trying to insert a field into Solr called last_modified, 
> which holds a timestamp of the update. Since this is a cloud setup, I'm using 
> the TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.
> : 
> : solrconfig.xml:
> : 
> : 
> : 
> : last_modified
> : 
> : 
> : 
> : 
> : 
> : last_modified
> : 
> : 
> : 
> : 
> : 
> : 
> : In schema.xml, I have:
> : 
> : 
> :  positionIncrementGap="0"/>
> : This is the command I'm using to index:
> : 
> : curl 
> "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW";
>  -F "sc=@1234.txt"
> : However, after indexing, the last_modified field is still not showing up on 
> queries. Is there something else I should be doing?  Thanks.
> : 
> 
> -Hoss
> http://www.lucidworks.com/



Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Alexandre Rafalovitch
Like I mentioned before, you could use the string type if you just want
the title as it is. Or you can use a custom type to normalize the indexed
value, as long as you end up with a single token.

So, if you want to strip leading A/An/The, you can use
KeywordTokenizer, combined with whatever post-processing you need. I
would suggest LowerCase filter and perhaps Regex filter to strip off
those leading articles. You may need to iterate a couple of times on
that specific chain.

The good news is that you can just make a couple of type definitions
with different values/order, reload the index (from Cores screen of
the Web Admin UI) and run some of your sample titles through those
different definitions without having to reindex in the Analysis
screen.

Regards,
  Alex.


Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 17 February 2015 at 22:36, Simon Cheng  wrote:
> Hi Alex,
>
> It's okay after I added in a new field "s_title" in the schema and
> re-indexed.
>
> multiValued="false"/>
>
>
> But how can I ignore the articles ("A", "An", "The") in the sorting. As you
> can see from the below example :
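The chain Alex describes (KeywordTokenizer, lower-casing, and a regex filter for the leading article) boils down to a sort-key normalization like this (a sketch of the logic only, not an actual schema fragment):

```python
import re

# Strip exactly one leading English article followed by whitespace.
LEADING_ARTICLE = re.compile(r"^(a|an|the)\s+", re.IGNORECASE)

def sort_key(title):
    # KeywordTokenizer keeps the whole value as one token; then we
    # lower-case it and drop a leading article, as the filters would.
    return LEADING_ARTICLE.sub("", title.strip().lower())

titles = ["The New paper on Sunday", "Singapore's success", "A brief history"]
print(sorted(titles, key=sort_key))
# Sorts as: brief history, new paper on sunday, singapore's success
```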


Re: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Nitin Solanki
Thanks James,
  I tried the same thing
(spellcheck.count=10&spellcheck.alternativeTermCount=5), and I got 5
suggestions for both "life" and "hope", not the behavior you described (*the
spellchecker will try to return up to 10 suggestions for "hope", but only up
to 5 suggestions for "life"*).


On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James 
wrote:

> Here is an example to illustrate what I mean...
>
> - query q=text:(life AND
> hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
> - suppose at least one document in your dictionary field has "life" in it
> - also suppose zero documents in your dictionary field have "hope" in them
> - The spellchecker will try to return you up to 10 suggestions for "hope",
> but only up to 5 suggestions for "life"
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 11:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hi James,
> How can you say that "count" doesn't use
> index/dictionary then from where suggestions come.
>
> On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James <
> james.d...@ingramcontent.com>
> wrote:
>
> > See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
> > the following section, for details.
> >
> > Briefly, "count" is the # of suggestions it will return for terms that
> are
> > *not* in your index/dictionary.  "alternativeTermCount" are the # of
> > alternatives you want returned for terms that *are* in your dictionary.
> > You can set them to the same value, unless you want fewer suggestions
> when
> > the terms is in the dictionary.
> >
> > James Dyer
> > Ingram Content Group
> >
> > -Original Message-
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Tuesday, February 17, 2015 5:27 AM
> > To: solr-user@lucene.apache.org
> > Subject: spellcheck.count v/s spellcheck.alternativeTermCount
> >
> > Hello Everyone,
> >   I got confusion between spellcheck.count and
> > spellcheck.alternativeTermCount in Solr. Any help in details?
> >
>
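To restate James's rule in code form, the cap that applies depends only on whether the query term itself occurs in the dictionary field (an illustration of the documented behavior, not Solr's actual implementation):

```python
def suggestion_cap(term_in_index, count=10, alternative_term_count=5):
    # spellcheck.count caps suggestions for terms absent from the index;
    # spellcheck.alternativeTermCount caps them for terms present in it.
    return alternative_term_count if term_in_index else count

# q=text:(life AND hope) with count=10, alternativeTermCount=5:
print(suggestion_cap(term_in_index=True))   # "life" is in the index -> up to 5
print(suggestion_cap(term_in_index=False))  # "hope" is not -> up to 10
```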


Re: Solrcloud sizing

2015-02-17 Thread Erick Erickson
Well, it's really impossible to say, you have to prototype. Here's something
explaining this a bit:
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

This is a major undertaking. Your question is simply impossible to
answer without prototyping as in the link above; anything else is
guesswork. And at this scale, being wrong is expensive.

So my advice would be to test on a small "cluster", say a 2-shard
system, see what kind of performance you can get, and extrapolate from
there, with your data, your queries, etc. Perhaps work with your client
on a limited-scope proof of concept. Plan on spending some time tuning
even the small cluster to get enough answers to form a go/no-go decision.

Best,
Erick


On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
 wrote:
> One of our customers needs to index 15 billion documents in a collection.
> As this volume is not usual for me, I need some advice about SolrCloud
> sizing (how many servers, nodes, shards, replicas, how much memory, ...)
>
> Some inputs :
>
>- Collection size : 15 billion documents
>- Collection updates : 8 million new documents / day + 8 million
>deleted documents / day
>- Updates occur during the night, without queries
>- Queries occur during the day, without updates
>- Document size is nearly 300 bytes
>- Document fields are mainly strings, including one date field
>- The same terms will occur several times for a given field (from 10 to
>100,000)
>- Queries will use a date period and a filter query on one or more fields
>- 10,000 queries / minute
>- Expected response time < 500 ms
>- 1 billion documents indexed = 5 GB index size
>- No SSD drives
>
> So, what is your advice about :
>
> # of shards : 15 billion documents -> 16 shards ?
> # of replicas ?
> # of nodes = # of shards ?
> heap memory per node ?
> direct memory per node ?
>
> Thanks for your advice.
>
> Dominique
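
For a rough sense of scale, the figures quoted in this thread combine into a back-of-envelope estimate (a sketch only; it assumes the 5 GB per billion docs ratio scales linearly and says nothing about heap, GC, or latency, which is exactly why prototyping is still needed):

```python
# Back-of-envelope arithmetic from the figures quoted in this thread.
# Assumes "1 billion docs = 5 GB" scales linearly; real sizing still
# requires the prototyping described in the linked article.
DOCS_TOTAL = 15_000_000_000
GB_PER_BILLION_DOCS = 5          # "1 billion documents indexed = 5 GB"
SHARDS = 16
QUERIES_PER_MINUTE = 10_000

index_gb_total = DOCS_TOTAL / 1_000_000_000 * GB_PER_BILLION_DOCS
gb_per_shard = index_gb_total / SHARDS
docs_per_shard = DOCS_TOTAL // SHARDS
qps = QUERIES_PER_MINUTE / 60

print(f"total index : ~{index_gb_total:.0f} GB")            # ~75 GB
print(f"per shard   : ~{gb_per_shard:.1f} GB, {docs_per_shard:,} docs")
print(f"query load  : ~{qps:.0f} queries/sec sustained")    # ~167 qps
```

At ~940 million docs per shard, 16 shards stay under Lucene's ~2.1 billion docs-per-index hard limit, and the raw index size is modest; whether 167 sustained queries/sec can meet the 500 ms target on that document count per shard is what the small prototype cluster has to answer.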


Confirm Solr index corruption

2015-02-17 Thread Thomas Mathew
Hi All,

I use Solr 4.4.0 in a master-slave configuration. Last week, the master
server ran out of disk space (logs got too big too quickly due to a bug in our
system). Because of this, we weren't able to add new docs to an index. The
first thing I did was to delete a few old log files to free up disk space
(later I moved the other logs to free up disk). The index is working fine
even after this fiasco.

The next day, a colleague of mine pointed out that we may be missing a few
documents in the index. I suspect the above scenario may have broken the
index. I ran CheckIndex against this index, but it didn't report any
corruption.

Right now, the index has about 25k docs. I haven't optimized this index in
a while, and there are about 4000 deleted-docs. How can I confirm if we
lost anything? If we've lost docs, is there a way to recover it?

Thanks in advance!!

Regards
Thomas
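
For anyone repeating the exercise, Lucene's CheckIndex tool can be invoked directly against the core's index directory; a typical invocation for Solr 4.4 might look like the following (the jar and index paths are placeholders for your installation):

```shell
# Run Lucene's CheckIndex against a Solr core's index directory.
# Paths below are examples; adjust them to your installation.
java -cp /path/to/solr/lib/lucene-core-4.4.0.jar \
  org.apache.lucene.index.CheckIndex \
  /var/solr/collection1/data/index

# Add -fix ONLY on a backup copy: it "repairs" the index by dropping
# any segment it cannot read, permanently losing those documents.
```

Note that CheckIndex only detects structural corruption. If the index is structurally sound but documents were never added (e.g. because update requests failed while the disk was full), the only way to confirm is to compare document counts or IDs against the system of record and re-index the difference.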


Re: Solrcloud sizing

2015-02-17 Thread Dominique Bejean
Thank you Erick.

This was also my own opinion.

2015-02-18 7:12 GMT+01:00 Erick Erickson :

> Well, it's really impossible to say; you have to prototype. Here's
> something explaining this a bit:
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> This is a major undertaking. Your question is simply impossible to
> answer without prototyping as in the link above; anything else is
> guesswork. And at this scale, being wrong is expensive.
>
> So my advice would be to test on a small "cluster", say a 2 shard
> system and see what kind of
> performance you can get and extrapolate from there, with your data,
> your queries etc. Perhaps
> work with your client on a limited-scope proof-of-concept. Plan on
> spending some time tuning
> even the small cluster to get enough answers to form a go/no-go decision.
>
> Best,
> Erick
>
>
> On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
>  wrote:
> > One of our customers needs to index 15 billion documents in a collection.
> > As this volume is not usual for me, I need some advice about SolrCloud
> > sizing (how many servers, nodes, shards, replicas, how much memory, ...)
> >
> > Some inputs :
> >
> >- Collection size : 15 billion documents
> >- Collection updates : 8 million new documents / day + 8 million
> >deleted documents / day
> >- Updates occur during the night, without queries
> >- Queries occur during the day, without updates
> >- Document size is nearly 300 bytes
> >- Document fields are mainly strings, including one date field
> >- The same terms will occur several times for a given field (from 10 to
> >100,000)
> >- Queries will use a date period and a filter query on one or more fields
> >- 10,000 queries / minute
> >- Expected response time < 500 ms
> >- 1 billion documents indexed = 5 GB index size
> >- No SSD drives
> >
> > So, what is your advice about :
> >
> > # of shards : 15 billion documents -> 16 shards ?
> > # of replicas ?
> > # of nodes = # of shards ?
> > heap memory per node ?
> > direct memory per node ?
> >
> > Thanks for your advice.
> >
> > Dominique
>


Re: Boosting by calculated distance buckets

2015-02-17 Thread David Smiley
Raav,

You may need to actually subscribe to the solr-user list.  Nabble seems to
not be working too well.
p.s. I’m on vacation this week so I can’t be very responsive

First of all... it's not clear you actually want to *boost* (since you seem
to not care about the relevancy score); it seems you want to *sort* based on
a function query.  So simply sort by the function query instead of using the
'bq' param.

Have you read about geodist() in the Solr Reference Guide?  It returns the
spatial distance.  With that and other function queries like map() you could
do something like sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)) and
you could put that into your main function query.  I purposefully overlapped
the map ranges so that I didn't have to deal with double-counting an edge. 
The only thing I don't like about this is that the distance is going to be
calculated as many times as you reference the function, and it's slow.  So
you may want to write your own function query (internally called a
ValueSource), which is relatively easy to do in Solr.
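
The bucket arithmetic of that function query can be sanity-checked with a small sketch that mimics Solr's map(x,min,max,target,default) semantics (a hypothetical distance value stands in for geodist(); this is for intuition, not Solr code):

```python
# Sketch of the bucket boost above, mimicking Solr's
# map(x, min, max, target, default) function-query semantics.
def solr_map(x, lo, hi, target, default):
    """Return target if lo <= x <= hi (inclusive), else default."""
    return target if lo <= x <= hi else default

def bucket_boost(dist_km):
    # sum(map(geodist(),0,40,40,0), map(geodist(),0,20,10,0))
    return solr_map(dist_km, 0, 40, 40, 0) + solr_map(dist_km, 0, 20, 10, 0)

print(bucket_boost(15))   # 50 : within 20 km, both ranges match (40 + 10)
print(bucket_boost(30))   # 40 : only the 0-40 km range matches
print(bucket_boost(55))   # 0  : outside both ranges
```

Because the ranges overlap, documents inside 20 km score 50 and those in the 20-40 km band score 40, giving tiered values without any special handling of the shared 20 km edge.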

~ David


sraav wrote
> David,
> 
> Thank you for your prompt response. I truly appreciate it. Also, my post
> was not accepted the first two times, so I am posting it again one final
> time. 
> 
> In my case I want to turn off the dependency on scoring and let Solr use
> just the boost values that I pass to each function to sort on. Here is a
> quick example of how I got that to work with non-geo fields which are
> present in the document and are not dynamically calculated. Using edismax,
> of course.
> 
> I was able to turn off the scoring (I mean, remove the dependency on score)
> on the result set and drive the sort by the boost that I mentioned in the
> below query. In the function below, for example, if "document1"
> matches the date listed it gets a boost = 5. If the same document matches
> the owner AND product, it will get an additional boost of 5 more. The
> total boost of this "document1" is 10. From whatever I have seen, it
> seems like I was able to turn off or negate the effects of the Solr score.
> (There was a queryNorm param that was affecting the boost, but it seemed to
> be a constant around 0.70345... most of the time for any fq mentioned.)
> 
> bq = {!func}sum(if(query({!v='datelisted:[2015-01-22T00:00:00.000Z TO
> *]'}),5,0),if(and(query({!v='owner:*BRAVE*'}),query({!v='PRODUCT:*SWORD*'}),5,0))
> 
> What I am trying to do is to add additional boosting function to the
> custom boost that will eventually tie into the above function and boost
> value.
> 
> For example, if "document1" falls in the 0-20 km range I would like to add
> a boost of 50, making the final boost value 60. If it falls in the
> 20-40 km range, I would like to add a boost of 40, and so on.  
> 
> Is there a way we can do this?  Please let me know if I can provide better
> clarity on the use case that I am trying to solve. Thank you David.
> 
> Thanks,
> Raav





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-by-calculated-distance-buckets-tp4186504p4187112.html
Sent from the Solr - User mailing list archive at Nabble.com.


Collations problem even term is available in documents.

2015-02-17 Thread Nitin Solanki
Hi,
I am misspelling the query "hota hai" as "hota hain". In the
collations, "hota hai" is not coming back; instead, collations like "hot
main", "home have", etc. are coming. I have 37 documents where "hota hai" is
present.

*URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"hota
hain"&wt=json&indent=true&shards.qt=/spell

*Configuration:*
*solrconfig.xml:*


textSpellCi

  default
  gram_ci
  solr.DirectSolrSpellChecker
  internal
  0.5
  2
  0
  5
  2
  0.9
  freq





  gram_ci
  default
  on
  true
  15
  true
  15
  true
  1000
  3000
  true


  spellcheck

  

*Schema.xml: *