Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Thanks Erick and Upayavira for your inputs.

Is there a way I can associate this with the unique id of a document, either
using the schema browser or the TermsComponent?

Best Regards,
Dinesh Naik

On Tue, Jun 30, 2015 at 2:55 AM, Upayavira  wrote:

> Use the schema browser on the admin UI, and click the "load term info"
> button. It'll show you the terms in your index.
>
> You can also use the analysis tab which will show you how it would
> tokenise stuff for a specific field.
>
> Upayavira
>
> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
> > Hi Eric,
> > By compressed value I meant value of a field after removing special
> > characters . In my example its "-". Compressed form of red-apple is
> > redapple .
> >
> > I wanted to know if we can see the analyzed version of fields .
> >
> > For example if I use ngram on a field , how do I see the analyzed values
> > in index ?
> >
> >
> >
> >
> > -Original Message-
> > From: "Erick Erickson" 
> > Sent: ‎29-‎06-‎2015 18:12
> > To: "solr-user@lucene.apache.org" 
> > Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?
> >
> > Not quite sure what you mean by "compressed values". admin/luke
> > doesn't show the results of the compression of the stored values, there's
> > no way I know of to do that.
> >
> > Best,
> > Erick
> >
> > On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik 
> > wrote:
> > > Hi all,
> > >
> > > Is there a way to read the indexed data for field on which the
> > > analysis/processing  has been done ?
> > >
> > > I know using admin GUI we can see field wise analysis But how can i get
> > > hold on the complete document using admin/luke? or any other way?
> > >
> > > For example, if i have 2 fields called name and compressedname.
> > >
> > > name has values like apple, green-apple,red-apple
> > > compressedname has values like apple,greenapple,redapple
> > >
> > > Even though i make both these field indexed=true and stored=true
> > >
> > > I am not able to see the compressed values using
> admin/luke?id=
> > >
> > > in response i see something like this-
> > >
> > >
> > > 
> > > string
> > > ITS--
> > > ITS--
> > > GREEN-APPLE
> > > GREEN-APPLE
> > > 1.0
> > > 0
> > > 
> > > 
> > > string
> > > ITS--
> > > ITS--
> > > GREEN-APPLE
> > > GREEN-APPLE
> > > 1.0
> > > 0
> > > 
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Dinesh Naik
>



-- 
Best Regards,
Dinesh Naik
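
For reference, the TermsComponent mentioned above lists the analyzed terms as
they actually sit in the index for a field, though not tied to a document id
(whether that association is possible is answered later in the thread). A
sketch of such a request, assuming a local Solr and the field names from this
thread; host, port and core name are placeholders:

  http://localhost:8983/solr/corename/terms?terms=true&terms.fl=compressedname&terms.limit=50

terms.fl and terms.limit are standard TermsComponent parameters.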


Re: Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Toke Eskildsen
On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
> We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same
> time increased the physical RAM from 24Gb to 96Gb. We run multiple
> cores on this one server, approximately 20 in total, but primarily we
> have one that is huge in comparison to all of the others. This very
> large core consists of nearly 62 million documents, and the index is
> around 45Gb in size.(Is that index unreasonably large, should it be
> sharded?) 

The size itself sounds fine, but your performance numbers below are
worrying. As always it is hard to give advice on setups:
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

> I'm really unfamiliar with how we should be configuring our JVM.
> Currently we have it set to a maximum of 48Gb, up until yesterday it
> was set to 24Gb and we've been seeing the dreaded OOME messages from
> time to time.

There is a shift in pointer size when one passes the 32GB mark for JVM
memory. Your 48GB allocation gives you about the same amount of heap as
a 32GB allocation would:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
Consider running two Solrs on the same machine instead. Maybe one for
the large collection and one for the rest?

Anyway, OOMs with ~32GB of heap for 62M documents indicate that you are
doing heavy sorting, grouping or faceting on fields that do not have
DocValues enabled. Could you describe what you do in that regard?

> The type of queries that are run can return anything from
> 1 million to 9.5 million documents, and typically run for anything from
> 20 to 45 minutes. 

Such response times are a thousand times higher than what most people
are seeing. There might be a perfectly fine reason for those response
times, but I suggest we sanity check them: Could you show us a typical
query and tell us how many concurrent queries you normally serve?

- Toke Eskildsen, State and University Library, Denmark
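
For reference, enabling DocValues as Toke suggests is a schema.xml change. A
minimal sketch, assuming a hypothetical date field used for sorting or
faceting (the field name and type are illustrative):

  <field name="publish_date" type="tdate" indexed="true" stored="true" docValues="true"/>

A full reindex is required after adding docValues="true" to an existing field.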




Solr DIH from MySQL with unique ID

2015-06-30 Thread kurt
Hello. 

I have a question about the Solr Data Import Handler. I'm using Solr 5.2.1
on a Linux server with 32G ram.

I have five different collections, and for each collection, I'm trying to
import data from a MySQL database. All of the MySQL queries work properly in
MySQL, and previously I was able to use all of these queries building an
index with Lucid Search 2.9 (Solr 4.7). 

The problem is that Solr will not finish starting: it gives no error, but
the admin GUI never appears. If I try to start with only one collection, it
works okay.

I am assuming that the problem has to do either with my incorrect setup of
the data-config file, or with my use of the unique IDs.

Here is what I have. 

In the data config file (example here, all other data config files are
similar)





 
 
 








 


In schema.xml file (I use the same schema.xml for each collection)



id

(all fields are added, but I'm not showing here for the sake of brevity)

When Solr is starting, this is as far as it goes:

INFO  - org.apache.solr.core.SolrConfig; Loaded SolrConfig: solrconfig.xml
INFO  - org.apache.solr.schema.IndexSchema; Reading Solr Schema from
/usr/lib/solr-5.2.1/server/solr/Books/conf/schema.xml
INFO  - org.apache.solr.core.SolrConfig; Loaded SolrConfig: solrconfig.xml
INFO  - org.apache.solr.schema.IndexSchema; Reading Solr Schema from
/usr/lib/solr-5.2.1/server/solr/BookStores/conf/schema.xml
INFO  - org.apache.solr.core.SolrConfig; Loaded SolrConfig: solrconfig.xml
INFO  - org.apache.solr.schema.IndexSchema; Reading Solr Schema from
/usr/lib/solr-5.2.1/server/solr/BookSales/conf/schema.xml
INFO  - org.apache.solr.schema.IndexSchema; [Books] Schema name=Schema521
INFO  - org.apache.solr.schema.IndexSchema; [BookStores] Schema name=
Schema521
INFO  - org.apache.solr.schema.IndexSchema; [BookSales] Schema name=
Schema521
INFO  - org.apache.solr.schema.IndexSchema; unique key field: id
INFO  - org.apache.solr.schema.IndexSchema; unique key field: id
INFO  - org.apache.solr.schema.IndexSchema; unique key field: id


I've tried many different alternatives in the above dataconfig, such as

in the query select:
book_id AS id
book_id AS 'id'

and adding pk="book_id" to the entity.

While I'm trying to fix this problem, I also do not understand 

1. Does Solr require that the unique key for a collection be named "id", or
can it be any name, such as "book_id"?

Any help or guidance would be appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-DIH-from-MySQL-with-unique-ID-tp4214872.html
Sent from the Solr - User mailing list archive at Nabble.com.
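
On question 1: the uniqueKey does not have to be called "id"; it can be any
suitable single-valued, required field declared in schema.xml. A minimal
sketch, using the book_id column from this thread:

  <field name="book_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <uniqueKey>book_id</uniqueKey>

Alternatively, if the schema keeps a field named "id", the DIH entity can map
the SQL column onto it explicitly (the entity name, query and title column
here are placeholders):

  <entity name="book" query="SELECT book_id, title FROM books">
    <field column="book_id" name="id"/>
  </entity>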


Re: Solr Suggester not working.

2015-06-30 Thread ssharma7...@gmail.com
davidphilip cherian & Alessandro Benedetti,
Thanks for your feedback & links; I was able to get the suggestions from
the suggester component.

Thanks & Regards,
Sachin Vyas.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Suggester-not-working-tp4214086p4214873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq versus q

2015-06-30 Thread Esther Goldbraich
Thank you Erick.
This solution fits part of our queries; we will adopt it for those. Yet we
have use-cases in which the results cannot be cached.

Everyone,
What do you think about our assumptions and conclusions?

 As a general rule of thumb, at least in our case, would you please
 comment on the following assumptions/conclusions (note, all assuming that
 we don't want to cache filters, and the 'fq' part is only used to avoid
 scoring):

 1) If the query sorts by any other field than score (e.g. date), we can
 put the 'fq' part in 'q'. Scoring won't be done, and we won't pay the
 cost of building the filter, and then discarding it when the query
 completes.

 2) In fact, if we don't intend to cache the filter, we might as well just
 use only 'q'. At least, on our dataset (this may definitely *not* be a
 general statement).

 3) If we sort by relevance, but want to avoid scoring of the 'filter'
 clauses, is there anything we can do on 4.7?
 3.1) The ^= operator is only available in 5.1, which seems exactly what
 we need.
 3.2) Adding the filter clauses to the query w/ boost 0 will still compute
 their score, only they won't affect the overall document score, correct?

 4) A more general question -- with the addition of ^= to query clauses in
 5.1 (resolved to a ConstantScoreQuery downstream), what is the use case
 for using fq w/ !cache=false? As we understand it, users who use this
 want to compute a filter but not cache it. As we see, there is some added
 cost to building a filter, so if you pay this cost over and over, would
 it not be better to just use ^=?

Have a good day,
Esther



From:
Erick Erickson 
To:
solr-user@lucene.apache.org
Date:
25/06/2015 03:27 PM
Subject:
Re: fq versus q



Side note on dates and fqs. If you're using NOW in your date
expressions you may be able to re-use fqs by using "date math",
see:
https://lucidworks.com/blog/date-math-now-and-filter-queries/
Of course this may not be applicable in your situation...

FWIW,
Erick

On Thu, Jun 25, 2015 at 8:03 AM, Shai Erera  wrote:
> The tables came across corrupt, here they are (times in ms):
>
> Caches enabled:
>
>                      q     fq    delta
> original query       28    295   267
> w/o grouping         58    325   267
> w/o sort on date     28    293   265
>
> Caches disabled:
>
>                      q     fq    delta
> original query       4113  4381  268
> w/o grouping         131   407   276
> w/o sort on date     4217  4400  183
>
> Shai
>
> On Thu, Jun 25, 2015 at 2:04 PM, Esther Goldbraich 

> wrote:
>
>> Thank you all for collaborative thinking!
>>
>> Ran additional benchmarks as proposed. Some results:
>>
>> All solr caches are enabled (queryResultCache hit ratio = 0.02):
>>
>>
>>                      q      fq {!cache=false}   delta
>> original query       28     295                 267
>> w/o grouping         58     325                 267
>> w/o sort on date     28     293                 265
>>
>> All solr caches are disabled (except built in lucene field cache):
>>
>>                      q      fq {!cache=false}   delta
>> original query       4113   4381                268
>> w/o grouping         131    407                 276
>> w/o sort on date     4217   4400                183
>>
>> *median runtime in ms
>>
>> As you can see, disabling grouping and/or sorting does not affect the
>> results much. That is, the difference between running with
>> 'fq{!cache=false}' or with 'q' is the same, while 'fq' performs slower
>> in all cases.
>>
>> Is it correct to assume then that the performance difference comes from
>> computing the filter (traversing the posting lists and building the
>> bitset)?
>> Does it also mean that not caching the filter does not affect grouping?
>> I.e. perhaps the second pass of grouping uses the already computed
>> filter, and does not attempt to fetch it from the cache?
>>
>> As a general rule of thumb, at least in our case, would you please
>> comment on the following assumptions/conclusions (note, all assuming
>> that we don't want to cache filters, and the 'fq' part is only used to
>> avoid scoring):
>>
>> 1) If the query sorts by any other field than score (e.g. date), we can
>> put the 'fq' part in 'q'. Scoring won't be done, and we won't pay the
>> cost of building the filter, and then discarding it when the query
>> completes.
>>
>> 2) In fact, if we don't intend to cache the filter, we might as well
>> just use only 'q'. At least, on our dataset (this may definitely *not*
>> be a general statement).
>>
>> 3) If we sort by relevance, but want to avoid scoring of the 'filter'
>> clauses, is there anything we can do on 4.7?
>> 3.1) The ^= operator is only available in 5.1, which seems exactly what
>> we need.
>> 3.2) Adding the filter clauses to the query w/ boost 0 will still
>> compute their score, only they won't affect the overall document score
>> correct?
>>
>> 4) A more general question -- with the addition of ^= to query clauses
>> in 5.1 (resolved to ConstantScoreQuery down stream), what is the use
>> case for using fq w/ !cache=false? As we understand it, users who use
>> this want to compute a filter but not cac
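
On points 3.1 and 4 above: on Solr 5.1+ the constant-score operator can
express a non-scoring clause directly in q. A sketch with hypothetical field
names, where the goal is a clause that must match but contributes nothing to
relevance:

  q=text:(solr lucene) AND status:published^=0
  fq={!cache=false}status:published

Both forms restrict the result set. The ^=0 clause is resolved to a
constant-score query contributing 0 to each document's score, while the fq
form still builds the filter bitset even when it is not cached -- which is
exactly the trade-off question 4 raises.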

Re: Solr Suggester not working.

2015-06-30 Thread Vincenzo D'Amore
Hi, can you post your final configuration?

On Tue, Jun 30, 2015 at 9:57 AM, ssharma7...@gmail.com <
ssharma7...@gmail.com> wrote:

> davidphilip cherian & Alessandro Benedetti,
> Thanks for you feedback & links, I was able to get the suggestions from
> suggester component.
>
> Thanks & Regards,
> Sachin Vyas.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Suggester-not-working-tp4214086p4214873.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-30 Thread Thomas Michael Engelke
 God damn. Thank you.

*ashamed*

Am 30.06.2015 00:21 schrieb Erick Erickson: 

> Try not putting it in double quotes?
> 
> Best,
> Erick
> 
> On Mon, Jun 29, 2015 at 12:22 PM, Thomas Michael Engelke
>  wrote:
> 
>> A friend and I are trying to develop some software using Solr in the 
>> background, and with that comes alot of changes. We're used to older 
>> versions (4.3 and below). We especially have problems with the autosuggest 
>> feature. This is the field definition (schema.xml) for our autosuggest 
>> field: > stored="true" required="false" multiValued="true" /> ... > source="name" dest="autosuggest" /> ... > class="solr.TextField" positionIncrementGap="100">  
>>  > class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" 
>> splitOnNumerics="1" generateWordParts="1" generateNumberParts="1" 
>> catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="0"/> 
>>  > class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" 
>> enablePositionIncrements="true"
format="snowball"/>  
   
Afterwards, we defined an autosuggest component to use this field, like this 
(solrconfig.xml):   mySuggester FuzzyLookupFactory 
suggester_fuzzy_dir DocumentDictionaryFactory suggest "autosuggest" false false  
 And add a requesthandler to test out the functionality: 
  true 10 mySuggester 
  suggest   
However, trying to start the core that has this configuration, a long exception 
occurs, telling us this: Error in configuration: "autosuggest" is not defined 
in the schema Now, that seems to be wrong. Any idea how to fix that?
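
The cause of the error is visible in the mangled configuration above: the
suggestAnalyzerFieldType value was written as "autosuggest" with literal
double quotes, so Solr looked for a field type whose name includes the quote
characters. A corrected component -- a sketch reconstructed from the values
that survived the archive, with the element structure assumed from standard
SuggestComponent configuration -- would look roughly like:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="storeDir">suggester_fuzzy_dir</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">suggest</str>
      <str name="suggestAnalyzerFieldType">autosuggest</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>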
 

Re: Solr Suggester not working.

2015-06-30 Thread ssharma7...@gmail.com
Vincenzo D'Amore,
The following is my (CURRENT) Working Final Configuration:

*Scheme.xml*

.
.


.
.



.
.































.
.



*solrconfig.xml*
..
..

   
  textSuggester
  FreeTextLookupFactory
  DocumentDictionaryFactory
  text
  c_text
  true
   
   
  docNameSuggester
  FreeTextLookupFactory
  DocumentDictionaryFactory
  document_name
  c_document_name
  true
   


  

  json
  true
  5
  false-->

  textSuggester
  docNameSuggester


  suggest

  
..
..

*Solr Query URL*
http://localhost:8983/solr/collection1/suggestHandler?&wt=xml&suggest.q=document

*Suggester Output*




  0
  16


  

  5
  

  document
  512409557603043072
  


  document1
  512409557603043072
  


  document2
  512409557603043072
  


  document3
  512409557603043072
  


  document4
  512409557603043072
  

  

  
  

  3
  

  document
  10933347601771902
  


  documents
  4373339040708760
  


  documenting
  2186669520354380
  

  

  







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Suggester-not-working-tp4214086p4214929.html
Sent from the Solr - User mailing list archive at Nabble.com.
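
Much of the XML above was eaten by the archive. A sketch of what the first
suggester block appears to contain, assuming the standard SuggestComponent
element structure; mapping c_text to storeDir and the trailing true to
buildOnCommit is a guess based on the values' positions and the buildOnCommit
discussion later in the digest:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">textSuggester</str>
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">text</str>
      <str name="storeDir">c_text</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>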


Re: optimize status

2015-06-30 Thread Erick Erickson
I've actually seen this happen right in front of my eyes "in the
field". However, that was a very high-performance environment. My
assumption was that fragmented index files were causing more disk
seeks especially for the first-pass query response in distributed
mode. So, if the problem is similar, it should go away if you test
requesting fewer docs. Note: This is not a cure for your problem, but
would be useful for identifying if it's similar to what I saw.

NOTE: the symptom was a significant disparity between the QTime (which
does not measure assembling the document) and the response time. _If_
that's the case and _if_ my theory that disk access is the culprit,
then SOLR-5478 and SOLR-6810 should be a big help as they remove the
first-pass decompression for distributed searches.

If that hypothesis has any validity, I'd expect you're running on
spinning-disks rather than SSDs, is that so?

Best,
Erick

On Tue, Jun 30, 2015 at 2:07 AM, Upayavira  wrote:
> We need to work out why your performance is bad without optimise. What
> version of Solr are you using? Can you confirm that your config is using
> the TieredMergePolicy?
>
> Upayavira
>
> On Tue, Jun 30, 2015, at 04:48 AM, Summer Shire wrote:
>> Hi Upayavira and Erick,
>>
>> There are two things we are talking about here.
>>
>> First: Why am I optimizing? If I don't, our SEARCH (NOT INDEXING)
>> performance is 100% worse.
>> The problem lies in the number of total segments. We have to have max
>> segments 1 or 2.
>> I have done intensive performance related tests around number of
>> segments, merge factor or changing the Merge policy.
>>
>> Second: Solr does not perform better for me without an optimize. So now
>> that I have to optimize the second issue
>> is updating concurrently during an optimize. If I update when an optimize
>> is happening the optimize takes 5 times as long as
>> the normal optimize.
>>
>> So is there any way other than creating a postOptimize hook and writing
>> the status in a file and somehow making it available to the indexer.
>> All of this just sounds traumatic :)
>>
>> Thanks
>> Summer
>>
>>
>> > On Jun 29, 2015, at 5:40 AM, Erick Erickson  
>> > wrote:
>> >
>> > Steven:
>> >
>> > Yes, but
>> >
>> > First, here's Mike McCandles' excellent blog on segment merging:
>> > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>> >
>> > I think the third animation is the TieredMergePolicy. In short, yes an
>> > optimize will reclaim disk space. But as you update, this is done for
>> > you anyway. About the only time optimizing is at all beneficial is
>> > when you have a relatively static index. If you're continually
>> > updating documents, and by that I mean replacing some existing
>> > documents, then you'll immediately start generating "holes" in your
>> > index.
>> >
>> > And if you _do_ optimize, you wind up with a huge segment. And since
>> > the default policy tries to merge segments of roughly the same size,
>> > it accumulates deletes for quite a while before they merged away.
>> >
>> > And if you don't update existing docs or delete docs, then there's no
>> > wasted space anyway.
>> >
>> > Summer:
>> >
>> > First off, why do you care about not updating during optimizing?
>> > There's no good reason you have to worry about that, you can freely
>> > update while optimizing.
>> >
>> > But frankly I have to agree with Upayavira that on the face of it
>> > you're doing a lot of extra work. See above, but you optimize while
>> > indexing, so immediately you're rather defeating the purpose.
>> > Personally I'd only optimize relatively static indexes and, by
>> > definition, your index isn't static since the second process is just
>> > waiting to modify it.
>> >
>> > Best,
>> > Erick
>> >
>> > On Mon, Jun 29, 2015 at 8:15 AM, Steven White  wrote:
>> >> Hi Upayavira,
>> >>
>> >> This is news to me that we should not optimize an index.
>> >>
>> >> What about disk space saving, isn't optimization to reclaim disk space or
>> >> is Solr somehow does that?  Where can I read more about this?
>> >>
>> >> I'm on Solr 5.1.0 (may switch to 5.2.1)
>> >>
>> >> Thanks
>> >>
>> >> Steve
>> >>
>> >> On Mon, Jun 29, 2015 at 4:16 AM, Upayavira  wrote:
>> >>
>> >>> I'm afraid I don't understand. You're saying that optimising is causing
>> >>> performance issues?
>> >>>
>> >>> Simple solution: DO NOT OPTIMIZE!
>> >>>
>> >>> Optimisation is very badly named. What it does is squashes all segments
>> >>> in your index into one segment, removing all deleted documents. It is
>> >>> good to get rid of deletes - in that sense the index is "optimized".
>> >>> However, future merges become very expensive. The best way to handle
>> >>> this topic is to leave it to Lucene/Solr to do it for you. Pretend the
>> >>> "optimize" option never existed.
>> >>>
>> >>> This is, of course, assuming you are using something like Solr 3.5+.
>> >>>
>> >>> Upayavira
>> >>>
>> >>> On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
>> 
>> 
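
Since this thread turns on segment counts and the TieredMergePolicy: the
merge policy can be tuned to keep the segment count low without a full
optimize. A sketch for the Solr 5.x indexConfig, with illustrative values
(fewer segments per tier means more aggressive merging); this is not a
drop-in answer to Summer's setup, just the knobs the discussion refers to:

  <indexConfig>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">4</int>
      <double name="maxMergedSegmentMB">20480.0</double>
    </mergePolicy>
  </indexConfig>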

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Erick Erickson
In short, not unless you want to get into low-level Lucene coding.
Inverted indexes are, well, inverted so their very structure makes
this difficult. It looks like this:

But I'm not convinced yet that this isn't an XY problem. What is the
high-level problem you're trying to solve here? Maybe there's another
way to go about it.

Best,
Erick

On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik  wrote:
> Thanks Eric and Upayavira for your inputs.
>
> Is there a way i can associate this to a unique id of document, either
> using schema browser or TermsComponent?
>
> Best Regards,
> Dinesh Naik
>
> On Tue, Jun 30, 2015 at 2:55 AM, Upayavira  wrote:
>
>> Use the schema browser on the admin UI, and click the "load term info"
>> button. It'll show you the terms in your index.
>>
>> You can also use the analysis tab which will show you how it would
>> tokenise stuff for a specific field.
>>
>> Upayavira
>>
>> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
>> > Hi Eric,
>> > By compressed value I meant value of a field after removing special
>> > characters . In my example its "-". Compressed form of red-apple is
>> > redapple .
>> >
>> > I wanted to know if we can see the analyzed version of fields .
>> >
>> > For example if I use ngram on a field , how do I see the analyzed values
>> > in index ?
>> >
>> >
>> >
>> >
>> > -Original Message-
>> > From: "Erick Erickson" 
>> > Sent: ‎29-‎06-‎2015 18:12
>> > To: "solr-user@lucene.apache.org" 
>> > Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?
>> >
>> > Not quite sure what you mean by "compressed values". admin/luke
>> > doesn't show the results of the compression of the stored values, there's
>> > no way I know of to do that.
>> >
>> > Best,
>> > Erick
>> >
>> > On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik 
>> > wrote:
>> > > Hi all,
>> > >
>> > > Is there a way to read the indexed data for field on which the
>> > > analysis/processing  has been done ?
>> > >
>> > > I know using admin GUI we can see field wise analysis But how can i get
>> > > hold on the complete document using admin/luke? or any other way?
>> > >
>> > > For example, if i have 2 fields called name and compressedname.
>> > >
>> > > name has values like apple, green-apple,red-apple
>> > > compressedname has values like apple,greenapple,redapple
>> > >
>> > > Even though i make both these field indexed=true and stored=true
>> > >
>> > > I am not able to see the compressed values using
>> admin/luke?id=
>> > >
>> > > in response i see something like this-
>> > >
>> > >
>> > > 
>> > > string
>> > > ITS--
>> > > ITS--
>> > > GREEN-APPLE
>> > > GREEN-APPLE
>> > > 1.0
>> > > 0
>> > > 
>> > > 
>> > > string
>> > > ITS--
>> > > ITS--
>> > > GREEN-APPLE
>> > > GREEN-APPLE
>> > > 1.0
>> > > 0
>> > > 
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards,
>> > > Dinesh Naik
>>
>
>
>
> --
> Best Regards,
> Dinesh Naik


Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Hi Erick,
This is mainly for debugging purposes. If I have 20M records and a few
fields in some of the documents are not indexed as expected, or something
went wrong during indexing, how do we pinpoint the exact issue and fix the
problem?


Best Regards,
Dinesh Naik

On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson 
wrote:

> In short, not unless you want to get into low-level Lucene coding.
> Inverted indexes are, well, inverted so their very structure makes
> this difficult. It looks like this:
>
> But I'm not convinced yet that this isn't an XY problem. What is the
> high-level problem you're trying to solve here? Maybe there's another
> way to go about it.
>
> Best,
> Erick
>
> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik 
> wrote:
> > Thanks Eric and Upayavira for your inputs.
> >
> > Is there a way i can associate this to a unique id of document, either
> > using schema browser or TermsComponent?
> >
> > Best Regards,
> > Dinesh Naik
> >
> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira  wrote:
> >
> >> Use the schema browser on the admin UI, and click the "load term info"
> >> button. It'll show you the terms in your index.
> >>
> >> You can also use the analysis tab which will show you how it would
> >> tokenise stuff for a specific field.
> >>
> >> Upayavira
> >>
> >> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
> >> > Hi Eric,
> >> > By compressed value I meant value of a field after removing special
> >> > characters . In my example its "-". Compressed form of red-apple is
> >> > redapple .
> >> >
> >> > I wanted to know if we can see the analyzed version of fields .
> >> >
> >> > For example if I use ngram on a field , how do I see the analyzed
> values
> >> > in index ?
> >> >
> >> >
> >> >
> >> >
> >> > -Original Message-
> >> > From: "Erick Erickson" 
> >> > Sent: ‎29-‎06-‎2015 18:12
> >> > To: "solr-user@lucene.apache.org" 
> >> > Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?
> >> >
> >> > Not quite sure what you mean by "compressed values". admin/luke
> >> > doesn't show the results of the compression of the stored values,
> there's
> >> > no way I know of to do that.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik <
> dineshkumarn...@gmail.com>
> >> > wrote:
> >> > > Hi all,
> >> > >
> >> > > Is there a way to read the indexed data for field on which the
> >> > > analysis/processing  has been done ?
> >> > >
> >> > > I know using admin GUI we can see field wise analysis But how can i
> get
> >> > > hold on the complete document using admin/luke? or any other way?
> >> > >
> >> > > For example, if i have 2 fields called name and compressedname.
> >> > >
> >> > > name has values like apple, green-apple,red-apple
> >> > > compressedname has values like apple,greenapple,redapple
> >> > >
> >> > > Even though i make both these field indexed=true and stored=true
> >> > >
> >> > > I am not able to see the compressed values using
> >> admin/luke?id=
> >> > >
> >> > > in response i see something like this-
> >> > >
> >> > >
> >> > > 
> >> > > string
> >> > > ITS--
> >> > > ITS--
> >> > > GREEN-APPLE
> >> > > GREEN-APPLE
> >> > > 1.0
> >> > > 0
> >> > > 
> >> > > 
> >> > > string
> >> > > ITS--
> >> > > ITS--
> >> > > GREEN-APPLE
> >> > > GREEN-APPLE
> >> > > 1.0
> >> > > 0
> >> > > 
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Best Regards,
> >> > > Dinesh Naik
> >>
> >
> >
> >
> > --
> > Best Regards,
> > Dinesh Naik
>



-- 
Best Regards,
Dinesh Naik


Re: Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Erick Erickson
bq: The type of queries that are run can return anything from 1
million to 9.5 million documents, and typically run for anything from
20 to 45 minutes.

Uhhh, are you literally setting the &rows parameter to over 9.5M and
getting that many docs all at once? Or is that just numFound and
you're _really_ returning just a relatively few docs? Because if
you're returning 9.5M rows, that's really an anti-pattern for Solr.
There are other ways to do some of this (cursor mark, streaming
aggregation,  export). But before we go there I want to be sure I'm
understanding the use-case.

Because I agree with Toke, the performance numbers you give are way
out of what I would expect, so clearly I don't get something about
your setup.

Best,
Erick

On Tue, Jun 30, 2015 at 3:43 AM, Toke Eskildsen  
wrote:
> On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
>> We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same
>> time increased the physical RAM from 24Gb to 96Gb. We run multiple
>> cores on this one server, approximately 20 in total, but primarily we
>> have one that is huge in comparison to all of the others. This very
>> large core consists of nearly 62 million documents, and the index is
>> around 45Gb in size.(Is that index unreasonably large, should it be
>> sharded?)
>
> The size itself sounds fine, but your performance numbers below are
> worrying. As always it is hard to give advice on setups:
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
>> I'm really unfamiliar with how we should be configuring our JVM.
>> Currently we have it set to a maximum of 48Gb, up until yesterday it
>> was set to 24Gb and we've been seeing the dreaded OOME messages from
>> time to time.
>
> There is a shift in pointer size when one passes the 32GB mark for JVM
> memory. Your 48GB allocation gives you about the same amount of heap as
> a 32GB allocation would:
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> Consider running two Solrs on the same machine instead. Maybe one for
> the large collection and one for the rest?
>
> Anyway, OOMs with ~32GB of heap for 62M documents indicates that you are
> doing heavy sorting, grouping or faceting on fields that does not have
> DocValues enabled. Could you describe what you do in that regard?
>
>> The type of queries that are run can return anything from
>> 1 million to 9.5 million documents, and typically run for anything from
>> 20 to 45 minutes.
>
> Such response times are a thousand times higher than what most people
> are seeing. There might be a perfectly fine reason for those response
> times, but I suggest we sanity check them: Could you show us a typical
> query and tell us how many concurrent queries you normally serve?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
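
For the deep-paging alternatives Erick mentions: cursorMark (available since
Solr 4.7) replaces one huge rows value with repeated small fetches. A sketch
of the request parameters, with a hypothetical sort field; the sort must end
on the uniqueKey field, and each response carries a nextCursorMark to pass
into the following request:

  q=*:*&sort=timestamp asc,id asc&rows=1000&cursorMark=*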


SolrCloud 5.2.1 upgrade

2015-06-30 Thread Vincenzo D'Amore
Hi All,

I have a bunch of Java clients connecting to a SolrCloud 4.8.1 cluster with
SolrJ 4.8.0.
The question is: do I have to switch clients and cluster to the new version
at the same time? Or could I upgrade the cluster and then upgrade the
clients in the following months?

BTW, looking at SolrJ 5.2.1 I have seen that a few class names have changed,
e.g. CloudSolrServer is now CloudSolrClient, but the interface looks the
same.
Is there a changelog or documentation that explains the main differences?

Best regards,
Vincenzo


Re: Solr DIH from MySQL with unique ID

2015-06-30 Thread Erick Erickson
Two very quick questions:

1> how big is your transaction log? Well, do you even have one? If
Solr is abnormally terminated, it'll replay the tlog on startup. The
scenario here would be something like you were running DIH without any
kind of hard commit specified and killed Solr for some reason. Then,
every time it starts up it'll try to replay the log. This is actually
unlikely since you should be seeing a message in your Solr logs, but
perhaps it's not in the fragment you pasted.

2> Do you have autosuggest configured with buildOnStartup set to true?
Also unlikely since you should see a message in the solr log as well.

Does the query run in a reasonable time outside of Solr (i.e. if you
just submit it to SQL through a command line or some such)?

Best,
Erick

On Tue, Jun 30, 2015 at 3:52 AM, kurt  wrote:
> Hello.
>
> I have a question about the Solr Data Import Handler. I'm using Solr 5.2.1
> on a Linux server with 32G ram.
>
> I have five different collections, and for each collection, I'm trying to
> import data from a MySQL database. All of the MySQL queries work properly in
> MySQL, and previously I was able to use all of these queries building an
> index with Lucid Search 2.9 (Solr 4.7).
>
> The problem is that when starting Solr it will not finish starting, but does
> not give an error, and the admin GUI does not show. If I try to start with
> only one collection, it works okay.
>
> I am assuming that the problem has to do either with my incorrect execution
> of the data-config file, or use of the unique IDs.
>
> Here is what I have.
>
> In the data config file (example here, all other data config files are
> similar)
>
> 
>  url="jdbc:mysql://localhost:3306/Books" user="myuser" password="12345" />
> 
> 
>  
>  
>  
> 
> 
>  name="full_name_solr" />
> 
> 
>  name="publication_type" />
> 
> 
>  
> 
>
> In schema.xml file (I use the same schema.xml for each collection)
>
>  multiValued="false" required="true" />
>  multiValued="false" required="false" />
> id
>
> (all fields are added, but I'm not showing here for the sake of brevity)
>
> When Solr is starting, this is as far as it goes:
>
> INFO  - org.apache.solr.core.SolrConfig; Loaded SolrConfig: solrconfig.xml
> INFO  - org.apache.solr.schema.IndexSchema; Reading Solr Schema from
> /usr/lib/solr-5.2.1/server/solr/Books/conf/schema.xml
> INFO  - org.apache.solr.core.SolrConfig; Loaded SolrConfig: solrconfig.xml
> INFO  - org.apache.solr.schema.IndexSchema; Reading Solr Schema from
> /usr/lib/solr-5.2.1/server/solr/BookStores/conf/schema.xml
> INFO  - org.apache.solr.core.SolrConfig; Loaded SolrConfig: solrconfig.xml
> INFO  - org.apache.solr.schema.IndexSchema; Reading Solr Schema from
> /usr/lib/solr-5.2.1/server/solr/BookSales/conf/schema.xml
> INFO  - org.apache.solr.schema.IndexSchema; [Books] Schema name=Schema521
> INFO  - org.apache.solr.schema.IndexSchema; [BookStores] Schema name=
> Schema521
> INFO  - org.apache.solr.schema.IndexSchema; [BookSales] Schema name=
> Schema521
> INFO  - org.apache.solr.schema.IndexSchema; unique key field: id
> INFO  - org.apache.solr.schema.IndexSchema; unique key field: id
> INFO  - org.apache.solr.schema.IndexSchema; unique key field: id
>
>
> I've tried many different alternatives in the above dataconfig, such as
>
> in the query select:
> book_id AS id
> book_id AS 'id'
>
> and adding pk="book_id" to the entity.
>
> While I'm trying to fix this problem, I also do not understand
>
> 1. Does Solr require that the unique key for a collection must be " id ", or
> can it be any name, such as "book_id"?
>
> Any help or guidance would be appreciated.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-DIH-from-MySQL-with-unique-ID-tp4214872.html
> Sent from the Solr - User mailing list archive at Nabble.com.
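
On Erick's first point: a hard autoCommit keeps the transaction log bounded
during long DIH runs. A minimal solrconfig.xml sketch, with illustrative
values; openSearcher=false makes the commit purely durable (it truncates the
tlog without opening a new searcher):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
  </updateHandler>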


Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-30 Thread Erick Erickson
Pesky computers, they keep doing exactly what I tell 'em to do, not
what I mean ;)

I'll open a JIRA for making Solr DWIM-compliant, Do What I Mean ;) ;)

On Tue, Jun 30, 2015 at 4:17 AM, Thomas Michael Engelke
 wrote:
>  God damn. Thank you.
>
> *ashamed*
>
> Am 30.06.2015 00:21 schrieb Erick Erickson:
>
>> Try not putting it in double quotes?
>>
>> Best,
>> Erick
>>
>> On Mon, Jun 29, 2015 at 12:22 PM, Thomas Michael Engelke
>>  wrote:
>>
>>> A friend and I are trying to develop some software using Solr in the 
>>> background, and with that comes alot of changes. We're used to older 
>>> versions (4.3 and below). We especially have problems with the autosuggest 
>>> feature. This is the field definition (schema.xml) for our autosuggest 
>>> field: >> stored="true" required="false" multiValued="true" /> ... >> source="name" dest="autosuggest" /> ... >> class="solr.TextField" positionIncrementGap="100">  
>>>  >> class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" 
>>> splitOnNumerics="1" generateWordParts="1" generateNumberParts="1" 
>>> catenateWords="1" catenateNumbers="0" catenateAll="0" 
>>> preserveOriginal="0"/>  
>>> >> ignoreCase="true" enablePositionIncrements="true"
> format="snowball"/>  class="solr.DictionaryCompoundWordTokenFilterFactory" 
> dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" 
> maxSubwordSize="30" onlyLongestMatch="false"/>  class="solr.GermanNormalizationFilterFactory"/>  class="solr.SnowballPorterFilterFactory" language="German2" 
> protected="protwords.txt"/>  minGramSize="2" maxGramSize="30"/>  class="solr.RemoveDuplicatesTokenFilterFactory"/>   type="query">   class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" 
> splitOnNumerics="1" generateWordParts="1" generateNumberParts="1" 
> catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="0"/> 
>   class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" 
> enablePositionIncrements="true" format="snowball"/>  class="solr.GermanNormalizationFilterFactory"/>  class="solr.SnowballPorterFilterFactory" language="German2" 
> protected="protwords.txt"/>  class="solr.RemoveDuplicatesTokenFilterFactory"/>   
> Afterwards, we defined an autosuggest component to use this field, like this 
> (solrconfig.xml):  class="solr.SuggestComponent">   name="name">mySuggester FuzzyLookupFactory 
> suggester_fuzzy_dir  name="dictionaryImpl">DocumentDictionaryFactory  name="field">suggest  name="suggestAnalyzerFieldType">"autosuggest"  name="buildOnStartup">false false 
>   And add a requesthandler to test out the 
> functionality:  class="solr.SearchHandler" startup="lazy" >   name="suggest">true  name="suggest.count">10  name="suggest.dictionary">mySuggester   
> suggest   However, trying to start the core 
> that has this configuration, a long exception occurs, telling us this: Error 
> in configuration: "autosuggest" is not defined in the schema Now, that seems 
> to be wrong. Any idea how to fix that?
>


Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-30 Thread Alessandro Benedetti
I would like to add a consideration if possible.
I find the field type very heavily analysed; are you sure this is OK for
your suggestion requirements?
Usually it is better to keep the suggestion field as lightly analysed as
possible and then play with the different types of suggesters.
If you notice any additional problems, we can discuss them; if not, it is
OK!

Cheers

2015-06-30 13:48 GMT+01:00 Erick Erickson :

> Pesky computers, they keep doing exactly what I tell 'em to do, not
> what I mean ;)
>
> I'll open a JIRA for making Solr DWIM-compliant, Do What I Mean ;) ;)
>
> On Tue, Jun 30, 2015 at 4:17 AM, Thomas Michael Engelke
>  wrote:
> >  God damn. Thank you.
> >
> > *ashamed*
> >
> > Am 30.06.2015 00:21 schrieb Erick Erickson:
> >
> >> Try not putting it in double quotes?
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Jun 29, 2015 at 12:22 PM, Thomas Michael Engelke
> >>  wrote:
> >>
> >>> A friend and I are trying to develop some software using Solr in the
> background, and with that comes alot of changes. We're used to older
> versions (4.3 and below). We especially have problems with the autosuggest
> feature. This is the field definition (schema.xml) for our autosuggest
> field:  stored="true" required="false" multiValued="true" /> ...  source="name" dest="autosuggest" /> ...  class="solr.TextField" positionIncrementGap="100"> 
>   class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
> splitOnNumerics="1" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="0" catenateAll="0"
> preserveOriginal="0"/> 
>  ignoreCase="true" enablePositionIncrements="true"
> > format="snowball"/>  class="solr.DictionaryCompoundWordTokenFilterFactory"
> dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3"
> maxSubwordSize="30" onlyLongestMatch="false"/>  class="solr.GermanNormalizationFilterFactory"/>  class="solr.SnowballPorterFilterFactory" language="German2"
> protected="protwords.txt"/>  minGramSize="2" maxGramSize="30"/>  class="solr.RemoveDuplicatesTokenFilterFactory"/>   type="query">   class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
> splitOnNumerics="1" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="0" catenateAll="0"
> preserveOriginal="0"/> 
>  ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
>  > class="solr.GermanNormalizationFilterFactory"/>  class="solr.SnowballPorterFilterFactory" language="German2"
> protected="protwords.txt"/>  class="solr.RemoveDuplicatesTokenFilterFactory"/>  
> Afterwards, we defined an autosuggest component to use this field, like
> this (solrconfig.xml):  class="solr.SuggestComponent">   name="name">mySuggester  name="lookupImpl">FuzzyLookupFactory  name="storeDir">suggester_fuzzy_dir  name="dictionaryImpl">DocumentDictionaryFactory  name="field">suggest  name="suggestAnalyzerFieldType">"autosuggest"  name="buildOnStartup">false false
>   And add a requesthandler to test out the
> functionality:  class="solr.SearchHandler" startup="lazy" >   name="suggest">true  > name="suggest.count">10  name="suggest.dictionary">mySuggester  
> suggest   However, trying to start the
> core that has this configuration, a long exception occurs, telling us this:
> Error in configuration: "autosuggest" is not defined in the schema Now,
> that seems to be wrong. Any idea how to fix that?
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: SolrCloud 5.2.1 upgrade

2015-06-30 Thread Vincenzo D'Amore
Update: regarding the SolrJ changelog, I found this:

-
https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5
and this:
-
https://issues.apache.org/jira/browse/SOLR/component/12324331?selectedTab=com.atlassian.jira.jira-projects-plugin:component-changelog-panel

On Tue, Jun 30, 2015 at 2:40 PM, Vincenzo D'Amore 
wrote:

> Hi All,
>
> I have a bunch of java clients connecting to a solrcloud cluster 4.8.1
> with Solrj 4.8.0.
> The question is, I have to switch clients and cluster to the new version
> at same time?
> Could I upgrade the cluster and in the following months upgrade clients?
>
> BTW, looking at Solrj 5.2.1 I have seen few class names has changed, e.g.
> CloudSolrServer has changed in CloudSolrClient, but interface look like the
> same.
> Well, is there a changelog or a documentation that explain what are the
> main differences?
>
> Best regards,
> Vincenzo
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solr Suggester not working.

2015-06-30 Thread Vincenzo D'Amore
Thanks Sachin Vyas.

Maybe I have found a typo: there is a comment close marker "-->" alone at
the end of a tag:

false -->



On Tue, Jun 30, 2015 at 2:09 PM, ssharma7...@gmail.com <
ssharma7...@gmail.com> wrote:

> Vincenzo D'Amore,
> The following is my (CURRENT) Working Final Configuration:
>
> *Scheme.xml*
> 
> .
> .
>  termVectors="true" termPositions="true" termOffsets="true" />
>  stored="true" required="true" multiValued="false" />
> .
> .
> 
>
> 
> .
> .
>
>  positionIncrementGap="100">
> 
>  class="solr.UAX29URLEmailTokenizerFactory"/>
>  ignoreCase="true"
> words="lang/stopwords_en.txt" />
>  class="solr.ASCIIFoldingFilterFactory"/>
>  class="solr.EnglishPossessiveFilterFactory"/>
>  class="solr.RemoveDuplicatesTokenFilterFactory"/>
> 
>  class="solr.LowerCaseFilterFactory"/>
> 
> 
>  class="solr.UAX29URLEmailTokenizerFactory"/>
>  ignoreCase="true"
> words="lang/stopwords_en.txt" />
>  class="solr.ASCIIFoldingFilterFactory"/>
>  class="solr.EnglishPossessiveFilterFactory"/>
>  class="solr.LowerCaseFilterFactory"/>
> 
> 
>
>  positionIncrementGap="100">
> 
>  class="solr.KeywordTokenizerFactory"/>
>  class="solr.LowerCaseFilterFactory"/>
> 
> 
>  class="solr.KeywordTokenizerFactory"/>
>  class="solr.LowerCaseFilterFactory"/>
> 
> 
>
> .
> .
> 
>
>
> *solrconfig.xml*
> ..
> ..
> 
>
>   textSuggester
>   FreeTextLookupFactory
>   DocumentDictionaryFactory
>   text
>   c_text
>   true
>
>
>   docNameSuggester
>   FreeTextLookupFactory
>   DocumentDictionaryFactory
>   document_name
>   c_document_name
>   true
>
> 
>
>  startup="lazy" >
> 
>   json
>   true
>   5
>   false-->
>
>   textSuggester
>   docNameSuggester
> 
> 
>   suggest
> 
>   
> ..
> ..
>
> *Solr Query URL*
>
> http://localhost:8983/solr/collection1/suggestHandler?&wt=xml&suggest.q=document
>
> *Suggester Output*
> 
> 
>
> 
>   0
>   16
> 
> 
>   
> 
>   5
>   
> 
>   document
>   512409557603043072
>   
> 
> 
>   document1
>   512409557603043072
>   
> 
> 
>   document2
>   512409557603043072
>   
> 
> 
>   document3
>   512409557603043072
>   
> 
> 
>   document4
>   512409557603043072
>   
> 
>   
> 
>   
>   
> 
>   3
>   
> 
>   document
>   10933347601771902
>   
> 
> 
>   documents
>   4373339040708760
>   
> 
> 
>   documenting
>   2186669520354380
>   
> 
>   
> 
>   
> 
> 
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Suggester-not-working-tp4214086p4214929.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solr Suggester not working.

2015-06-30 Thread ssharma7...@gmail.com
Vincenzo D'Amore,
Yes, you are right, it's a typo; I missed it while cleaning the XML to put
on the Solr-User list.
Please *REMOVE* the following line; it was not used in my Solr 5.1
configuration:
* false--> *


Regards,
Sachin Vyas.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Suggester-not-working-tp4214086p4214945.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr DIH from MySQL with unique ID

2015-06-30 Thread kurt
Erick,

Many thanks for your reply.

1. The file solr.log does not show any errors; however, there is a file
solr.log.8 which is 5MB and has a ton of text from documents it was trying
to index, but there was an invalid date error. I fixed that. Is it possible
that Solr keeps trying to use that log? Can I simply delete these logs?

2. No I don't have autosuggest configured for that. I use terms component
for autocomplete.

3. The MySQL query runs very fast, and if I index the largest collection (2
million documents) alone, it runs in 15 minutes.

Thanks

K



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-DIH-from-MySQL-with-unique-ID-tp4214872p4214946.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Alessandro Benedetti
Am I wrong, or has the default directory factory been the
"NRTCachingDirectoryFactory" since Solr 4.x?
If I remember correctly, this factory creates a Directory implementation
built on top of an "MMapDirectory".
In that case we rely on the operating system's memory-mapping feature to
manage the index in memory.
This means we give the Solr JVM little memory and leave the bulk of RAM to
the operating system, which loads and swaps parts of the index in and out
of memory as needed.

If my assumptions are correct and the user is running a recent Solr, it
should be bad practice to assign such a large heap to the Solr JVM (I
assume garbage collection will face long pauses and other problems as
well).

One thing I read that could invalidate my assumptions is the OOME
messages, but actually, as Toke wisely said, these can derive from massive
usage of sorting, field grouping and field caching (field faceting on
non-DocValues fields?).
On the other hand, Erick's point is quite important, and I hope there is
no usage of such a monster rows parameter (which would make no sense).
To do deep paging the suggested approach is to use cursorMark (as Erick
suggested), and if we need to stream all the results it can be a good
option to study the streaming component and the export feature.

Please correct me if I said anything wrong!

Cheers

2015-06-30 13:37 GMT+01:00 Erick Erickson :

> bq: The type of queries that are run can return anything from 1
> million to 9.5 million documents, and typically run for anything from
> 20 to 45 minutes.
>
> Uhhh, are you literally setting the &rows parameter to over 9.5M and
> getting that many docs all at once? Or is that just numFound and
> you're _really_ returning just a relatively few docs? Because if
> you're returning 9.5M rows, that's really an anti-pattern for Solr.
> There are other ways to do some of this (cursor mark, streaming
> aggregation,  export). But before we go there I want to be sure I'm
> understanding the use-case.
>
> Because I agree with Toke, the performance numbers you give are wy
> out of what I would expect, so clearly I don't get something about
> your setup.
>
> Best,
> Erick
>
> On Tue, Jun 30, 2015 at 3:43 AM, Toke Eskildsen 
> wrote:
> > On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
> >> We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same
> >> time increased the physical RAM from 24Gb to 96Gb. We run multiple
> >> cores on this one server, approximately 20 in total, but primarily we
> >> have one that is huge in comparison to all of the others. This very
> >> large core consists of nearly 62 million documents, and the index is
> >> around 45Gb in size.(Is that index unreasonably large, should it be
> >> sharded?)
> >
> > The size itself sounds fine, but your performance numbers below are
> > worrying. As always it is hard to give advice on setups:
> >
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> >> I'm really unfamiliar with how we should be configuring our JVM.
> >> Currently we have it set to a maximum of 48Gb, up until yesterday it
> >> was set to 24Gb and we've been seeing the dreaded OOME messages from
> >> time to time.
> >
> > There is a shift in pointer size when one passes the 32GB mark for JVM
> > memory. Your 48GB allocation gives you about the same amount of heap as
> > a 32GB allocation would:
> >
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> > Consider running two Solrs on the same machine instead. Maybe one for
> > the large collection and one for the rest?
> >
> > Anyway, OOMs with ~32GB of heap for 62M documents indicates that you are
> > doing heavy sorting, grouping or faceting on fields that does not have
> > DocValues enabled. Could you describe what you do in that regard?
> >
> >> The type of queries that are run can return anything from
> >> 1 million to 9.5 million documents, and typically run for anything from
> >> 20 to 45 minutes.
> >
> > Such response times are a thousand times higher than what most people
> > are seeing. There might be a perfectly fine reason for those response
> > times, but I suggest we sanity check them: Could you show us a typical
> > query and tell us how many concurrent queries you normally serve?
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Restricting fields returned by Suggester result.

2015-06-30 Thread ssharma7...@gmail.com
Hi,
Is it possible to restrict the result returned by the Suggester to
"selected" fields only?
i.e. Currently, the Suggester returns data in the following structure (XML).
Can I restrict the Solr (5.1) Suggester to return ONLY "term" & EXCLUDE
"weight" &
"payload", as per the Suggester result XML below?






  0
  16


  

  5
  

*document*
  512409557603043072
  


  *document1*
  512409557603043072
  


  *document2*
  512409557603043072
  


  *document3*
  512409557603043072
  


  *document4*
  512409557603043072
  

  

  
  

  3
  

  *document*
  10933347601771902
  


  *documents*
  4373339040708760
  


  *documenting*
  2186669520354380
  

  

  




Regards,
Sachin Vyas.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restricting-fields-returned-by-Suggester-reult-tp4214948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Suggester Result Exception in specific scenario

2015-06-30 Thread ssharma7...@gmail.com
Hi,
I have the following Solr 5.1 configuration:

*schema.xml*

. 
. 


. 
. 



. 
. 





























. 
. 



*solrconfig.xml*
.. 
.. 

   
  textSuggester
  FreeTextLookupFactory
  DocumentDictionaryFactory
  text
  c_text
  true
   
   
  docNameSuggester
  FreeTextLookupFactory
  DocumentDictionaryFactory
  document_name
  c_document_name
  true
   


  

  json
  true
  5

  textSuggester
  docNameSuggester


  suggest

  
.. 
.. 


1) I get the required results when I enter the Suggester URL as below:
search text = copy of
http://localhost:8983/solr/collection1/suggestHandler?&wt=xml&suggest.q=copy%20of

2) When I enter the search text as "copy of s", "copy of sc" or "copy of sch"

http://localhost:8983/solr/collection1/suggestHandler?&wt=xml&suggest.q=copy%20of%20s
http://localhost:8983/solr/portal_documents/suggestHandler?&wt=xml&suggest.q=copy%20of%20sc
http://localhost:8983/solr/portal_documents/suggestHandler?&wt=xml&suggest.q=copy%20of%20sch

I get the following exception in the Suggester result XML:

5000java.lang.NullPointerException
at
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.decodeWeight(FreeTextSuggester.java:749)
at
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:612)
at
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:444)
at
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:433)
at
org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:216)
at
org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:258)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpPa

Suggester configuration queries.

2015-06-30 Thread ssharma7...@gmail.com
Hi,
I have the following Solr 5.1 configuration:

*schema.xml*

.
.
<field ... termVectors="true" termPositions="true" termOffsets="true" />
<field ... stored="true" required="true" multiValued="false" />
.
.

<fieldType name="c_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter .../>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="c_document_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
.
.

*solrconfig.xml*
..
..
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">textSuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">text</str>
    <str name="suggestAnalyzerFieldType">c_text</str>
    <str name="buildOnCommit">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">docNameSuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">document_name</str>
    <str name="suggestAnalyzerFieldType">c_document_name</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggestHandler" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="suggest">true</str>
    <str name="suggest.count">5</str>
    <str name="suggest.dictionary">textSuggester</str>
    <str name="suggest.dictionary">docNameSuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
..
..


*Query:*
1) With respect to the above configuration, is it OK to auto-commit on save?

I came across a link
http://www.signaldump.org/solr/qpod/33101/solr-suggester 
which mentions:

"The index-based spellcheck/suggest just reads terms from the indexed
fields which takes no time to build but suffers from reading indexed
terms, i.e. terms that have gone through the analysis process that may
have been stemmed, lowercased, all that."

So, if the above is correct, the time consumed is for reading data (SELECT).

P.S. I need buildOnCommit to get the latest tokens into the Suggester. Any
better ideas or suggestions for achieving this?

Regards,
Sachin Vyas.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggester-configuration-queries-tp4214950.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Erick Erickson
Dinesh:

This is what the admin/analysis page is for. It shows you exactly
what tokens are produced by what steps in the analysis chain.
That would be far better than trying to analyze the indexed
terms.
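
If you want to script the same check, the field analysis handler exposes it
over HTTP (a sketch; the core name is illustrative, the field and value are
from your example):

curl "http://localhost:8983/solr/collection1/analysis/field?analysis.fieldname=compressedname&analysis.fieldvalue=green-apple&wt=json&indent=true"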

Best,
Erick

On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik  wrote:
> Hi Erick,
> This is mainly for debugging purpose. If i have 20M records and few fields
> in some of the documents are not indexed as expected or something went
> wrong during indexing then how do we pin point the exact issue and fix the
> problem?
>
>
> Best Regards,
> Dinesh Naik
>
> On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson 
> wrote:
>
>> In short, not unless you want to get into low-level Lucene coding.
>> Inverted indexes are, well, inverted so their very structure makes
>> this difficult. It looks like this:
>>
>> But I'm not convinced yet that this isn't an XY problem. What is the
>> high-level problem you're trying to solve here? Maybe there's another
>> way to go about it.
>>
>> Best,
>> Erick
>>
>> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik 
>> wrote:
>> > Thanks Eric and Upayavira for your inputs.
>> >
>> > Is there a way i can associate this to a unique id of document, either
>> > using schema browser or TermsComponent?
>> >
>> > Best Regards,
>> > Dinesh Naik
>> >
>> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira  wrote:
>> >
>> >> Use the schema browser on the admin UI, and click the "load term info"
>> >> button. It'll show you the terms in your index.
>> >>
>> >> You can also use the analysis tab which will show you how it would
>> >> tokenise stuff for a specific field.
>> >>
>> >> Upayavira
>> >>
>> >> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
>> >> > Hi Eric,
>> >> > By compressed value I meant value of a field after removing special
>> >> > characters . In my example its "-". Compressed form of red-apple is
>> >> > redapple .
>> >> >
>> >> > I wanted to know if we can see the analyzed version of fields .
>> >> >
>> >> > For example if I use ngram on a field , how do I see the analyzed
>> values
>> >> > in index ?
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > -Original Message-
>> >> > From: "Erick Erickson" 
>> >> > Sent: ‎29-‎06-‎2015 18:12
>> >> > To: "solr-user@lucene.apache.org" 
>> >> > Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?
>> >> >
>> >> > Not quite sure what you mean by "compressed values". admin/luke
>> >> > doesn't show the results of the compression of the stored values,
>> there's
>> >> > no way I know of to do that.
>> >> >
>> >> > Best,
>> >> > Erick
>> >> >
>> >> > On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik <
>> dineshkumarn...@gmail.com>
>> >> > wrote:
>> >> > > Hi all,
>> >> > >
>> >> > > Is there a way to read the indexed data for field on which the
>> >> > > analysis/processing  has been done ?
>> >> > >
>> >> > > I know using admin GUI we can see field wise analysis But how can i
>> get
>> >> > > hold on the complete document using admin/luke? or any other way?
>> >> > >
>> >> > > For example, if i have 2 fields called name and compressedname.
>> >> > >
>> >> > > name has values like apple, green-apple,red-apple
>> >> > > compressedname has values like apple,greenapple,redapple
>> >> > >
>> >> > > Even though i make both these field indexed=true and stored=true
>> >> > >
>> >> > > I am not able to see the compressed values using
>> >> admin/luke?id=
>> >> > >
>> >> > > in response i see something like this-
>> >> > >
>> >> > >
>> >> > > 
>> >> > > string
>> >> > > ITS--
>> >> > > ITS--
>> >> > > GREEN-APPLE
>> >> > > GREEN-APPLE
>> >> > > 1.0
>> >> > > 0
>> >> > > 
>> >> > > 
>> >> > > string
>> >> > > ITS--
>> >> > > ITS--
>> >> > > GREEN-APPLE
>> >> > > GREEN-APPLE
>> >> > > 1.0
>> >> > > 0
>> >> > > 
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Best Regards,
>> >> > > Dinesh Naik
>> >>
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Dinesh Naik
>>
>
>
>
> --
> Best Regards,
> Dinesh Naik


Re: Restricting fields returned by Suggester result.

2015-06-30 Thread Alessandro Benedetti
Actually what you are asking does not make sense.
The Solr response returns that data structure because it must return as
much as possible.
It is the responsibility of the client to get what it needs from the response.

Talking about the Java client, I contributed the SolrJ code to parse the
Suggester response [1].
I will provide a final patch soon.
With my implementation it is possible to use SolrJ to get a simple
List of Strings for the suggested terms.

[1] https://issues.apache.org/jira/browse/SOLR-7719
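
Until the patch lands, a hand-rolled version looks roughly like this (a
sketch against SolrJ 5.x; the /suggestHandler and textSuggester names are
taken from the configuration earlier in this digest, and the container
casts may need adjusting for your exact version):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class SuggestTerms {
  @SuppressWarnings("unchecked")
  public static List<String> suggest(String q) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery query = new SolrQuery();
    query.setRequestHandler("/suggestHandler");
    query.set("suggest.q", q);
    QueryResponse rsp = client.query(query);

    // Shape: suggest -> suggester name -> query -> {numFound, suggestions}
    Map<String, NamedList<Object>> suggest =
        (Map<String, NamedList<Object>>) rsp.getResponse().get("suggest");
    NamedList<Object> result =
        (NamedList<Object>) suggest.get("textSuggester").get(q);
    List<NamedList<Object>> suggestions =
        (List<NamedList<Object>>) result.get("suggestions");

    List<String> terms = new ArrayList<>();
    for (NamedList<Object> s : suggestions) {
      terms.add((String) s.get("term"));  // keep term, ignore weight/payload
    }
    client.close();
    return terms;
  }
}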

Cheers

2015-06-30 14:28 GMT+01:00 ssharma7...@gmail.com :

> Hi,
> Is it possible to restrict the result returned by the Suggester to
> "selected" fields only?
> i.e. Currently, the Suggester returns data in the following structure (XML).
> Can I restrict the Solr (5.1) Suggester to return ONLY "term" and EXCLUDE
> "weight" & "payload", as per the Suggester result XML below?
>
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">16</int>
>   </lst>
>   <lst name="suggest">
>     <lst name="..."> <!-- first suggester (name stripped in the archive) -->
>       <lst name="..."> <!-- the suggest.q value -->
>         <int name="numFound">5</int>
>         <arr name="suggestions">
>           <lst>
>             <str name="term">*document*</str>
>             <long name="weight">512409557603043072</long>
>             <str name="payload"/>
>           </lst>
>           <lst>
>             <str name="term">*document1*</str>
>             <long name="weight">512409557603043072</long>
>             <str name="payload"/>
>           </lst>
>           <lst>
>             <str name="term">*document2*</str>
>             <long name="weight">512409557603043072</long>
>             <str name="payload"/>
>           </lst>
>           <lst>
>             <str name="term">*document3*</str>
>             <long name="weight">512409557603043072</long>
>             <str name="payload"/>
>           </lst>
>           <lst>
>             <str name="term">*document4*</str>
>             <long name="weight">512409557603043072</long>
>             <str name="payload"/>
>           </lst>
>         </arr>
>       </lst>
>     </lst>
>     <lst name="..."> <!-- second suggester -->
>       <lst name="...">
>         <int name="numFound">3</int>
>         <arr name="suggestions">
>           <lst>
>             <str name="term">*document*</str>
>             <long name="weight">10933347601771902</long>
>             <str name="payload"/>
>           </lst>
>           <lst>
>             <str name="term">*documents*</str>
>             <long name="weight">4373339040708760</long>
>             <str name="payload"/>
>           </lst>
>           <lst>
>             <str name="term">*documenting*</str>
>             <long name="weight">2186669520354380</long>
>             <str name="payload"/>
>           </lst>
>         </arr>
>       </lst>
>     </lst>
>   </lst>
> </response>
>
>
> Regards,
> Sachin Vyas.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Restricting-fields-returned-by-Suggester-reult-tp4214948.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr DIH from MySQL with unique ID

2015-06-30 Thread Erick Erickson
1> Not the Solr log. The transaction log. If it is present it'll be a
child directory of your data directory called "tlog", a sibling to your
index directory. And "big" here is gigabytes. And yes, you can just nuke
it if you want. You get one automatically if you are using SolrCloud.
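
A quick way to check its size (the path is illustrative):

du -sh /path/to/core/data/tlog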

2> OK, it was a long shot

3> Hmmm, not sure what to say. The obvious test is to take the DIH
config out and see if
Solr starts right up, just to be sure it isn't something _else_. But
otherwise the full Solr log should contain a better clue about what's
going on.

bq: Does Solr require that the unique key for a collection must be "id", or
can it be any name, such as "book_id"?

There's no a-priori name it has to be; just be sure it's defined as your
uniqueKey in schema.xml.

But this is one of the reasons that, when things get complicated, I
prefer SolrJ to DIH: it allows much easier debugging. Here's a start:
http://searchhub.org/2012/02/14/indexing-with-solrj/
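
For example, the bare bones of a SolrJ indexer look like this (a sketch;
the URL, core and field names are illustrative, with book_id standing in
for whatever you pick as the uniqueKey):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BookIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/books");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("book_id", "42");           // the uniqueKey field
    doc.addField("title", "Example title");
    client.add(doc);                          // send the document
    client.commit();                          // make it searchable
    client.close();
  }
}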

Best,
Erick

On Tue, Jun 30, 2015 at 9:20 AM, kurt  wrote:
> Erick,
>
> Many thanks for your reply.
>
> 1. The file solr.log does not show any errors, however, there is a file
> solr.log.8 which is 5MB and has a ton of text that was trying to index, but
> there was an invalid date error. I fixed that. Is it possible that Solr
> keeps trying to use that log? Can I simply delete these logs?
>
> 2. No I don't have autosuggest configured for that. I use terms component
> for autocomplete.
>
> 3. The MySQL query runs very fast, and if I index the largest collection (2
> million documents) alone, it runs in 15 minutes.
>
> Thanks
>
> K
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-DIH-from-MySQL-with-unique-ID-tp4214872p4214946.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Suggester configuration queries.

2015-06-30 Thread Erick Erickson
This will be pretty much unworkable for any large corpus. The
DocumentDictionaryFactory builds its index by reading the stored value
from every document in your index to put into a sidecar Solr index (for
the free-text suggester).

This can take many minutes, so doing this on every commit is an
anti-pattern. The suggester framework is very powerful, but not to be
used casually.

So for your situation, I'd use a copyField to a minimally-analyzed field
and use the index-based suggesters.
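
Something along these lines (a rough sketch: the text_suggest field is
hypothetical and must be declared in schema.xml, and I'm showing
FuzzyLookupFactory since a plain term dictionary has no phrase context for
the free-text lookup):

<copyField source="text" dest="text_suggest"/>

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">textSuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <!-- no dictionaryImpl: defaults to the high-frequency dictionary,
         which reads indexed terms rather than every stored document -->
    <str name="field">text_suggest</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>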

Best,
Erick

On Tue, Jun 30, 2015 at 9:35 AM, ssharma7...@gmail.com
 wrote:
> Hi,
> I have the following Solr 5.1 configuration:
>
> *schema.xml*
>
> .
> .
> <field ... termVectors="true" termPositions="true" termOffsets="true" />
> <field ... stored="true" required="true" multiValued="false" />
> .
> .
>
> <fieldType name="c_text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter .../>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="c_document_name" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> .
> .
>
>
> *solrconfig.xml*
> ..
> ..
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">textSuggester</str>
>     <str name="lookupImpl">FreeTextLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">text</str>
>     <str name="suggestAnalyzerFieldType">c_text</str>
>     <str name="buildOnCommit">true</str>
>   </lst>
>   <lst name="suggester">
>     <str name="name">docNameSuggester</str>
>     <str name="lookupImpl">FreeTextLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">document_name</str>
>     <str name="suggestAnalyzerFieldType">c_document_name</str>
>     <str name="buildOnCommit">true</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggestHandler" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="wt">json</str>
>     <str name="suggest">true</str>
>     <str name="suggest.count">5</str>
>     <str name="suggest.dictionary">textSuggester</str>
>     <str name="suggest.dictionary">docNameSuggester</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
> ..
> ..
>
>
> *Query:*
> 1) With respect to the above configuration, is it OK to auto-commit on save?
>
> I came across a link
> http://www.signaldump.org/solr/qpod/33101/solr-suggester
> which mentions:
>
> "The index-based spellcheck/suggest just reads terms from the indexed
> fields which takes no time to build but suffers from reading indexed
> terms, i.e. terms that have gone through the analysis process that may
> have been stemmed, lowercased, all that."
>
> So, if the above is correct, the time consumed is for reading data (SELECT).
>
> P.S. I need buildOnCommit to get the latest tokens into the Suggester. Any
> better ideas or suggestions for achieving this?
>
> Regards,
> Sachin Vyas.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Suggester-configuration-queries-tp4214950.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Hi Erick,

I agree with you.

But I was checking whether we could get hold of the whole document (to see
all analyzed field values).

There is a chance that a field value is common to multiple documents. In
such cases it will be difficult to backtrack which document has the issue,
because admin/analysis can only be used for field-level analysis.



Best Regards,
Dinesh Naik

On Tue, Jun 30, 2015 at 7:08 PM, Erick Erickson 
wrote:

> Dinesh:
>
> This is what the admin/analysis page is for. It shows you exactly
> what tokens are produced by what steps in the analysis chain.
> That would be far better than trying to analyze the indexed
> terms.
>
> Best,
> Erick
>
> On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik 
> wrote:
> > Hi Erick,
> > This is mainly for debugging purpose. If i have 20M records and few
> fields
> > in some of the documents are not indexed as expected or something went
> > wrong during indexing then how do we pin point the exact issue and fix
> the
> > problem?
> >
> >
> > Best Regards,
> > Dinesh Naik
> >
> > On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson  >
> > wrote:
> >
> >> In short, not unless you want to get into low-level Lucene coding.
> >> Inverted indexes are, well, inverted so their very structure makes
> >> this difficult. It looks like this:
> >>
> >> But I'm not convinced yet that this isn't an XY problem. What is the
> >> high-level problem you're trying to solve here? Maybe there's another
> >> way to go about it.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik  >
> >> wrote:
> >> > Thanks Eric and Upayavira for your inputs.
> >> >
> >> > Is there a way i can associate this to a unique id of document, either
> >> > using schema browser or TermsComponent?
> >> >
> >> > Best Regards,
> >> > Dinesh Naik
> >> >
> >> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira  wrote:
> >> >
> >> >> Use the schema browser on the admin UI, and click the "load term
> info"
> >> >> button. It'll show you the terms in your index.
> >> >>
> >> >> You can also use the analysis tab which will show you how it would
> >> >> tokenise stuff for a specific field.
> >> >>
> >> >> Upayavira
> >> >>
> >> >> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
> >> >> > Hi Eric,
> >> >> > By compressed value I meant value of a field after removing special
> >> >> > characters . In my example its "-". Compressed form of red-apple is
> >> >> > redapple .
> >> >> >
> >> >> > I wanted to know if we can see the analyzed version of fields .
> >> >> >
> >> >> > For example if I use ngram on a field , how do I see the analyzed
> >> values
> >> >> > in index ?
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > -Original Message-
> >> >> > From: "Erick Erickson" 
> >> >> > Sent: ‎29-‎06-‎2015 18:12
> >> >> > To: "solr-user@lucene.apache.org" 
> >> >> > Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?
> >> >> >
> >> >> > Not quite sure what you mean by "compressed values". admin/luke
> >> >> > doesn't show the results of the compression of the stored values,
> >> there's
> >> >> > no way I know of to do that.
> >> >> >
> >> >> > Best,
> >> >> > Erick
> >> >> >
> >> >> > On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik <
> >> dineshkumarn...@gmail.com>
> >> >> > wrote:
> >> >> > > Hi all,
> >> >> > >
> >> >> > > Is there a way to read the indexed data for field on which the
> >> >> > > analysis/processing  has been done ?
> >> >> > >
> >> >> > > I know using admin GUI we can see field wise analysis But how
> can i
> >> get
> >> >> > > hold on the complete document using admin/luke? or any other way?
> >> >> > >
> >> >> > > For example, if i have 2 fields called name and compressedname.
> >> >> > >
> >> >> > > name has values like apple, green-apple,red-apple
> >> >> > > compressedname has values like apple,greenapple,redapple
> >> >> > >
> >> >> > > Even though i make both these field indexed=true and stored=true
> >> >> > >
> >> >> > > I am not able to see the compressed values using
> >> >> admin/luke?id=
> >> >> > >
> >> >> > > in response i see something like this-
> >> >> > >
> >> >> > >
> >> >> > > 
> >> >> > > string
> >> >> > > ITS--
> >> >> > > ITS--
> >> >> > > GREEN-APPLE
> >> >> > > GREEN-APPLE
> >> >> > > 1.0
> >> >> > > 0
> >> >> > > 
> >> >> > > 
> >> >> > > string
> >> >> > > ITS--
> >> >> > > ITS--
> >> >> > > GREEN-APPLE
> >> >> > > GREEN-APPLE
> >> >> > > 1.0
> >> >> > > 0
> >> >> > > 
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > > Best Regards,
> >> >> > > Dinesh Naik
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards,
> >> > Dinesh Naik
> >>
> >
> >
> >
> > --
> > Best Regards,
> > Dinesh Naik
>



-- 
Best Regards,
Dinesh Naik


Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Thanks to all for the help - it's now storing text and I can search and get
results just as before in 4.6, but I cannot get snippets to appear when I
ask for highlighting.


when I add documents, here is the URL my script generates:

http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt


And here is my schema:

[schema.xml elided: the list archiver stripped the XML tags, leaving only
blank lines; partial copies survive in the quoted replies below. The two
recoverable elements are:]

<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>


On Sat, Jun 27, 2015 at 11:27 AM, Erick Erickson 
wrote:

> This should be no different in 5.2 than 4.6.
>
> My first guess is a typo somewhere or some similar forehead-slapper.
> Are you sure you're specifying the field in the "fl" list?
>
> Take a look at the index files, the *.fdt files are where the stored data
> goes. You can't look into them, but for the same documents they should
> be roughly the same aggregate size as they are in 4.6
> 'du -hc *.fdt' will sum them all up for you (*nix).
>
> Second thing I'd do for sanity check is tail out the Solr log while
> indexing and querying, just to see "stuff" go by and see if any
> errors are thrown, although it sounds like you wouldn't see
> any search results at all if there was something wrong with
> indexing.
>
> And if none of that sheds any light, let's see the schema file?
> Maybe the results of adding &debug=all to the query?
>
> Best,
> Erick
>
> On Fri, Jun 26, 2015 at 8:05 AM, Mark Ehle  wrote:
> > In my schema from 4.6, the text was in the 'text' field, and the "stored"
> > attrib was set to "true" as it is in the 5.2 schema. I am ingesting the
> > text from files on the server , and it used to work just fine with 4.6. I
> > am using the same schema except I had to get rid the field types pint,
> > plong, pfloat, pdouble and pdate. Otherwise, the schema is identical.
> >
> > How do I tell SOLR 5.2 to store the text from a file to a certain field?
> >
> > Thanks!
> >
> >
> > On Fri, Jun 26, 2015 at 7:29 AM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> >> Actually storing or not storing a field is a simple schema.xml
> >> configuration.
> >> This suggestion can be obvious, but … have you checked you have your
> >> "stored" attribute set "true" for the field you are interested ?
> >>
> >> I am talking about the 5.2 schema.
> >>
> >> Cheers
> >>
> >> 2015-06-26 12:24 GMT+01:00 Mark Ehle :
> >>
> >> > Folks -
> >> >
> >> > I am using SOLR 4.6 to run a newspaper indexing site we have at the
> >> library
> >> > I work at. I would like to update to

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Alessandro Benedetti
Do you have the original document available? Or stored in the field of
interest?
It should be quite an easy test to reproduce the analysis simply using the
analysis tool Upayavira and Erick suggested.
Just use your real document content and you will see how it is exactly
analysed.

Cheers

2015-06-30 15:03 GMT+01:00 dinesh naik :

> Hi Erick,
>
> I agree with you.
>
> But i was checking if we could  get  hold on the whole document (to see all
> analyzed field values) .
>
> There might be chances that field value is common for multiple documents .
> In such cases it will be difficult to backtrack which document has the
> issue . Because admin/analysis can be used to see for field level analysis
> only.
>
>
>
> Best Regards,
> Dinesh Naik
>
> On Tue, Jun 30, 2015 at 7:08 PM, Erick Erickson 
> wrote:
>
> > Dinesh:
> >
> > This is what the admin/analysis page is for. It shows you exactly
> > what tokens are produced by what steps in the analysis chain.
> > That would be far better than trying to analyze the indexed
> > terms.
> >
> > Best,
> > Erick
> >
> > On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik 
> > wrote:
> > > Hi Erick,
> > > This is mainly for debugging purpose. If i have 20M records and few
> > fields
> > > in some of the documents are not indexed as expected or something went
> > > wrong during indexing then how do we pin point the exact issue and fix
> > the
> > > problem?
> > >
> > >
> > > Best Regards,
> > > Dinesh Naik
> > >
> > > On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > >> In short, not unless you want to get into low-level Lucene coding.
> > >> Inverted indexes are, well, inverted so their very structure makes
> > >> this difficult. It looks like this:
> > >>
> > >> But I'm not convinced yet that this isn't an XY problem. What is the
> > >> high-level problem you're trying to solve here? Maybe there's another
> > >> way to go about it.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik <
> dineshkumarn...@gmail.com
> > >
> > >> wrote:
> > >> > Thanks Eric and Upayavira for your inputs.
> > >> >
> > >> > Is there a way i can associate this to a unique id of document,
> either
> > >> > using schema browser or TermsComponent?
> > >> >
> > >> > Best Regards,
> > >> > Dinesh Naik
> > >> >
> > >> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira  wrote:
> > >> >
> > >> >> Use the schema browser on the admin UI, and click the "load term
> > info"
> > >> >> button. It'll show you the terms in your index.
> > >> >>
> > >> >> You can also use the analysis tab which will show you how it would
> > >> >> tokenise stuff for a specific field.
> > >> >>
> > >> >> Upayavira
> > >> >>
> > >> >> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
> > >> >> > Hi Eric,
> > >> >> > By compressed value I meant value of a field after removing
> special
> > >> >> > characters . In my example its "-". Compressed form of red-apple
> is
> > >> >> > redapple .
> > >> >> >
> > >> >> > I wanted to know if we can see the analyzed version of fields .
> > >> >> >
> > >> >> > For example if I use ngram on a field , how do I see the analyzed
> > >> values
> > >> >> > in index ?
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > -Original Message-
> > >> >> > From: "Erick Erickson" 
> > >> >> > Sent: ‎29-‎06-‎2015 18:12
> > >> >> > To: "solr-user@lucene.apache.org" 
> > >> >> > Subject: Re: Reading indexed data from solr 5.1.0 using
> admin/luke?
> > >> >> >
> > >> >> > Not quite sure what you mean by "compressed values". admin/luke
> > >> >> > doesn't show the results of the compression of the stored values,
> > >> there's
> > >> >> > no way I know of to do that.
> > >> >> >
> > >> >> > Best,
> > >> >> > Erick
> > >> >> >
> > >> >> > On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik <
> > >> dineshkumarn...@gmail.com>
> > >> >> > wrote:
> > >> >> > > Hi all,
> > >> >> > >
> > >> >> > > Is there a way to read the indexed data for field on which the
> > >> >> > > analysis/processing  has been done ?
> > >> >> > >
> > >> >> > > I know using admin GUI we can see field wise analysis But how
> > can i
> > >> get
> > >> >> > > hold on the complete document using admin/luke? or any other
> way?
> > >> >> > >
> > >> >> > > For example, if i have 2 fields called name and compressedname.
> > >> >> > >
> > >> >> > > name has values like apple, green-apple,red-apple
> > >> >> > > compressedname has values like apple,greenapple,redapple
> > >> >> > >
> > >> >> > > Even though i make both these field indexed=true and
> stored=true
> > >> >> > >
> > >> >> > > I am not able to see the compressed values using
> > >> >> admin/luke?id=
> > >> >> > >
> > >> >> > > in response i see something like this-
> > >> >> > >
> > >> >> > >
> > >> >> > > 
> > >> >> > > string
> > >> >> > > ITS--
> > >> >> > > ITS--
> > >> >> > > GREEN-APPLE
> > >> >> > > GREEN-APPLE
> > >> >> > > 1.0
> > >> >> > > 0
> > >> >> > > 
> > >> >> > > 

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi,

I am currently investigating the queries with a much smaller index (1M
documents) to see the effect of grouping and faceting on the performance
degradation. This will allow me to do a lot of tests in a short period of
time.

However, it looks like the query executes much faster the second time.
This was tested after re-indexing, and the query was not executed again
immediately. It looks like it may be due to auto-warming during or after
re-indexing?

I would like to get the response profile (query, faceting, etc.) for the
same query in two separate requests, without any cache or warming, so
that I get a good average number and not much fluctuation. What are the
settings that I need to disable (temporarily) just for the purpose of this
investigation? In solrconfig.xml, I can see filterCache, queryResultCache,
documentCache, etc. I am not sure what needs to be disabled to facilitate
my work.
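
For reference, this is the kind of change I am considering in
solrconfig.xml (a sketch; the cache classes and attributes are as in the
stock configuration):

<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<!-- plus commenting out any newSearcher/firstSearcher warming queries -->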

I understand that the cache and warming settings will be very helpful in
load tests later on. However, if I can optimize the query in a
single-request scenario, performance will be in much better shape with all
the cache and warming settings enabled during a load test.

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4214968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud 5.2.1 upgrade

2015-06-30 Thread Shawn Heisey
On 6/30/2015 6:40 AM, Vincenzo D'Amore wrote:
> I have a bunch of java clients connecting to a solrcloud cluster 4.8.1 with
> Solrj 4.8.0.
> The question is, I have to switch clients and cluster to the new version at
> same time?
> Could I upgrade the cluster and in the following months upgrade clients?
>
> BTW, looking at Solrj 5.2.1 I have seen few class names has changed, e.g.
> CloudSolrServer has changed in CloudSolrClient, but interface look like the
> same.
> Well, is there a changelog or a documentation that explain what are the
> main differences?

The CHANGES.txt for 4.8.1 does mention one bugfix specific to the cloud
client - SOLR-6035.  It would be a good idea to upgrade the client as
well as the server.  Since you are running SolrCloud, it is advisable to
keep your Solr and SolrJ versions the same, because SolrCloud is
changing very rapidly and cross-version compatibility is not very good. 
If you were not running SolrCloud, then you would not need to be as
careful with version numbers.  I'm using the 5.1.0 SolrJ client in my
dev index build program with Solr 4.9.1 and 4.7.2, and it all works
perfectly because my Solr servers are not in cloud mode.

When changes are committed to Solr, it is the CHANGES.txt file that gets
updated to track what was done, so that file has the best information. 
The section in that file for 5.0.0 has a VERY comprehensive list of
changes from version 4 to version 5.  Here is a version of that file
that you can browse easily:

https://lucene.apache.org/solr/5_2_1/changes/Changes.html#v5.0.0

The change from CloudSolrServer to CloudSolrClient is mentioned in
"Other Changes" as SOLR-6895 -- all of the "SolrServer" classes were
renamed by this issue, except EmbeddedSolrServer.  That one did not get
renamed since it is actually a server, not a client.
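
In code the rename is mechanical (a sketch; the ZooKeeper hosts and
collection name are illustrative):

import org.apache.solr.client.solrj.impl.CloudSolrClient;

// SolrJ 4.x:
//   CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181/solr");
// SolrJ 5.x, essentially the same interface:
CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
client.setDefaultCollection("collection1");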

Note that the URL above is the CHANGES from 5.2.1, with the 5.0.0
section opened.  You should also read the sections for each version
*after* 5.0.0, and you may wish to read the sections for the 4.8.1,
4.9.x, and 4.10.x versions so you will know everything that has changed
since 4.8.0.

Thanks,
Shawn



Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Alessandro Benedetti
Instead of your immense schema, can you give us the details of the
highlighting you are trying to use?
And how are you trying to use it?
Which client? Direct API calls?

let us know!

Cheers

2015-06-30 15:10 GMT+01:00 Mark Ehle :

> Thanks to all for the help - it's now storing text and I can search and get
> results just before in 4.6, but I cannot get snippets to appear when I ask
> for highlighting.
>
>
> when I add documents, here is the URL my script generates:
>
>
> http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
>
>
> And here is my schema:
>
> 
>
> 
>
> 
>
> 
>   
>
>   
> 
>
> 
>  omitNorms="true"/>
>
> 
>  omitNorms="true"/>
> 
> 
>
> 
>
> 
>  omitNorms="true" positionIncrementGap="0"/>
>  omitNorms="true" positionIncrementGap="0"/>
>  omitNorms="true" positionIncrementGap="0"/>
>  omitNorms="true" positionIncrementGap="0"/>
>
> 
>  omitNorms="true" positionIncrementGap="0"/>
>  omitNorms="true" positionIncrementGap="0"/>
>  omitNorms="true" positionIncrementGap="0"/>
>  precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>
> 
>  precisionStep="0" positionIncrementGap="0"/>
>
> 
>  precisionStep="6" positionIncrementGap="0"/>
>
>
> 
>
>
> 
>  sortMissingLast="true" omitNorms="true"/>
>  sortMissingLast="true" omitNorms="true"/>
>  sortMissingLast="true" omitNorms="true"/>
>  sortMissingLast="true" omitNorms="true"/>
>
>
> 
> 
>
> 
>
> 
>
> 
>  positionIncrementGap="100">
>   
> 
>   
> 
>
> 
>  positionIncrementGap="100">
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
> 
> 
>   
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
>  ignoreCase="true" expand="true"/>
> 
>   
> 
>
> 
>  positionIncrementGap="100">
>   
> 
> 
> 
>  ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>  protected="protwords.txt"/>
> 
> 
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>  protected="protwords.txt"/>
> 
> 
>   
> 
>
> 
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   
> 
> 
> 
>  ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
>   
> 
>
> 
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   
> 
>  ignoreCase="true" expand="false"/>
>  words="stopwords_en.txt"/>
>  generateWordParts="0" generateNumberParts="0" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
>   
> 
>
> 
>  positionIncrementGap="100">
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
> 
>  withOriginal="true"
>maxPosAsterisk="3" maxPosQuestion="2"
> maxFractionAsterisk="0.33"/>
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  words="stopwords.txt" enablePositionIncrements="true" />
> 
>  

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread dinesh naik
Hi Alessandro,
I am able to check the field-wise analyzed results.

I was interested in getting the complete document.

As Erick mentioned: reconstructing the doc from the postings lists is
actually quite tedious. The Luke program (not the request handler) has a
function that does this; it's not fast though, more for troubleshooting
than for anything in a production environment.

I'll try looking into the Luke program to see if I can get this done.

Thanks and Best Regards,
Dinesh Naik

On Tue, Jun 30, 2015 at 7:42 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Do you have the original document available ? Or stored in the field of
> interest ?
> Should be quite an easy test to reproduce the Analysis simply using the
> analysis tool Upaya and Erick suggested.
> Just use your real document content and you will see how it is exactly
> analysed.
>
> Cheers
>
> 2015-06-30 15:03 GMT+01:00 dinesh naik :
>
> > Hi Erick,
> >
> > I agree with you.
> >
> > But i was checking if we could  get  hold on the whole document (to see
> all
> > analyzed field values) .
> >
> > There might be chances that field value is common for multiple documents
> .
> > In such cases it will be difficult to backtrack which document has the
> > issue . Because admin/analysis can be used to see for field level
> analysis
> > only.
> >
> >
> >
> > Best Regards,
> > Dinesh Naik
> >
> > On Tue, Jun 30, 2015 at 7:08 PM, Erick Erickson  >
> > wrote:
> >
> > > Dinesh:
> > >
> > > This is what the admin/analysis page is for. It shows you exactly
> > > what tokens are produced by what steps in the analysis chain.
> > > That would be far better than trying to analyze the indexed
> > > terms.
> > >
> > > Best,
> > > Erick
> > >
> > > On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik <
> dineshkumarn...@gmail.com>
> > > wrote:
> > > > Hi Erick,
> > > > This is mainly for debugging purpose. If i have 20M records and few
> > > fields
> > > > in some of the documents are not indexed as expected or something
> went
> > > > wrong during indexing then how do we pin point the exact issue and
> fix
> > > the
> > > > problem?
> > > >
> > > >
> > > > Best Regards,
> > > > Dinesh Naik
> > > >
> > > > On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson <
> > erickerick...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> In short, not unless you want to get into low-level Lucene coding.
> > > >> Inverted indexes are, well, inverted so their very structure makes
> > > >> this difficult. It looks like this:
> > > >>
> > > >> But I'm not convinced yet that this isn't an XY problem. What is the
> > > >> high-level problem you're trying to solve here? Maybe there's
> another
> > > >> way to go about it.
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik <
> > dineshkumarn...@gmail.com
> > > >
> > > >> wrote:
> > > >> > Thanks Eric and Upayavira for your inputs.
> > > >> >
> > > >> > Is there a way i can associate this to a unique id of document,
> > either
> > > >> > using schema browser or TermsComponent?
> > > >> >
> > > >> > Best Regards,
> > > >> > Dinesh Naik
> > > >> >
> > > >> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira 
> wrote:
> > > >> >
> > > >> >> Use the schema browser on the admin UI, and click the "load term
> > > info"
> > > >> >> button. It'll show you the terms in your index.
> > > >> >>
> > > >> >> You can also use the analysis tab which will show you how it
> would
> > > >> >> tokenise stuff for a specific field.
> > > >> >>
> > > >> >> Upayavira
> > > >> >>
> > > >> >> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
> > > >> >> > Hi Eric,
> > > >> >> > By compressed value I meant value of a field after removing
> > special
> > > >> >> > characters . In my example its "-". Compressed form of
> red-apple
> > is
> > > >> >> > redapple .
> > > >> >> >
> > > >> >> > I wanted to know if we can see the analyzed version of fields .
> > > >> >> >
> > > >> >> > For example if I use ngram on a field , how do I see the
> analyzed
> > > >> values
> > > >> >> > in index ?
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > -Original Message-
> > > >> >> > From: "Erick Erickson" 
> > > >> >> > Sent: ‎29-‎06-‎2015 18:12
> > > >> >> > To: "solr-user@lucene.apache.org"  >
> > > >> >> > Subject: Re: Reading indexed data from solr 5.1.0 using
> > admin/luke?
> > > >> >> >
> > > >> >> > Not quite sure what you mean by "compressed values". admin/luke
> > > >> >> > doesn't show the results of the compression of the stored
> values,
> > > >> there's
> > > >> >> > no way I know of to do that.
> > > >> >> >
> > > >> >> > Best,
> > > >> >> > Erick
> > > >> >> >
> > > >> >> > On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik <
> > > >> dineshkumarn...@gmail.com>
> > > >> >> > wrote:
> > > >> >> > > Hi all,
> > > >> >> > >
> > > >> >> > > Is there a way to read the indexed data for field on which
> the
> > > >> >> > > analysis/processing  has been done ?
> > > >> >> > >
> > > >> >> > > I kn

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Alessandro -

Someone asked to see the schema, so I posted it. Should I have just
attached it? Does this mailing list support that?

I am by no means a SOLR expert. I am a PHP coder who wrote a
(very-much-loved by our library staff and patrons) newspaper indexing tool
that I am trying to update. I only know enough about SOLR to install it,
and index and query. All I did to the 5.2 schema was add the
newspaper-specific fields that was in the old schema.

I cannot answer most of your questions. I just know that this URL:
http://127.0.0.1:8080/solr/newspapers/select?q=%22JOHN+GRAP%22&fl=year&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

used to produce snippets of highlited text in 4.6. In 5.2 it does not.


Thanks -

Mark Ehle
Computer Support Librarian
Willard Library
Battle Creek, MI


On Tue, Jun 30, 2015 at 10:50 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Instead of your immense schema, can you give us the details of the
> Highlight you are trying to use ?
> And how you are trying to use it ?
> Which client ? Direct APi calls ?
>
> let us know!
>
> Cheers
>
> 2015-06-30 15:10 GMT+01:00 Mark Ehle :
>
> > Thanks to all for the help - it's now storing text and I can search and
> get
> > results just before in 4.6, but I cannot get snippets to appear when I
> ask
> > for highlighting.
> >
> >
> > when I add documents, here is the URL my script generates:
> >
> >
> >
> http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
> >
> >
> > And here is my schema:
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >   
> >
> >   
> > 
> >
> > 
> >  > omitNorms="true"/>
> >
> > 
> >  sortMissingLast="true"
> > omitNorms="true"/>
> > 
> > 
> >
> > 
> >
> > 
> >  > omitNorms="true" positionIncrementGap="0"/>
> >  > omitNorms="true" positionIncrementGap="0"/>
> >  > omitNorms="true" positionIncrementGap="0"/>
> >  precisionStep="0"
> > omitNorms="true" positionIncrementGap="0"/>
> >
> > 
> >  > omitNorms="true" positionIncrementGap="0"/>
> >  precisionStep="8"
> > omitNorms="true" positionIncrementGap="0"/>
> >  > omitNorms="true" positionIncrementGap="0"/>
> >  > precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
> >
> > 
> >  > precisionStep="0" positionIncrementGap="0"/>
> >
> > 
> >  > precisionStep="6" positionIncrementGap="0"/>
> >
> >
> > 
> >
> >
> > 
> >  > sortMissingLast="true" omitNorms="true"/>
> >  > sortMissingLast="true" omitNorms="true"/>
> >  > sortMissingLast="true" omitNorms="true"/>
> >  > sortMissingLast="true" omitNorms="true"/>
> >
> >
> > 
> >  />
> >
> > 
> >
> > 
> >
> > 
> >  > positionIncrementGap="100">
> >   
> > 
> >   
> > 
> >
> > 
> >  > positionIncrementGap="100">
> >   
> > 
> >  > words="stopwords.txt" enablePositionIncrements="true" />
> > 
> > 
> >   
> >   
> > 
> >  > words="stopwords.txt" enablePositionIncrements="true" />
> >  > ignoreCase="true" expand="true"/>
> > 
> >   
> > 
> >
> > 
> >  > positionIncrementGap="100">
> >   
> > 
> > 
> > 
> >  > ignoreCase="true"
> > words="stopwords_en.txt"
> > enablePositionIncrements="true"
> > />
> > 
> > 
> >  > protected="protwords.txt"/>
> > 
> > 
> >   
> >   
> > 
> >  > ignoreCase="true" expand="true"/>
> >  > ignoreCase="true"
> > words="stopwords_en.txt"
> > enablePositionIncrements="true"
> > />
> > 
> > 
> >  > protected="protwords.txt"/>
> > 
> > 
> >   
> > 
> >
> > 
> >  > positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >   
> > 
> > 
> > 
> >  > ignoreCase="true"
> > words="stopwords_en.txt"
> > enablePositionIncrements="true"
> > />
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" ca

Re: optimize status

2015-06-30 Thread Shawn Heisey
On 6/29/2015 2:48 PM, Reitzel, Charles wrote:
> I take your point about shards and segments being different things.  I 
> understand that the hash ranges per segment are not kept in ZK.   I guess I 
> wish they were.
>
> In this regard, I liked Mongodb, uses a 2-level sharding scheme.   Each shard 
> manages a list of  "chunks", each has its own hash range which is kept in the 
> cluster state.   If data needs to be balanced across nodes, it works at the 
> chunk level.  No record/doc level I/O is necessary.   Much more targeted and 
> only the data that needs to move is touched.  Solr does most things better 
> than Mongo, imo.  But this is one area where the Mongo got it right.

Segment detail would not only lead to a data explosion in the
clusterstate, it would be crossing abstraction boundaries, and would
potentially require updating the clusterstate just because a single
document was inserted into the index.  That one tiny update could (and
probably would) create a new segment on one shard.  Due to the way
SolrCloud replicates data during normal operation, every replica for a
given shard might have a different set of segments, which means segments
would need to be tracked at the replica level, not the shard level.

Also, Solr cannot control which hash ranges end up in each segment. 
Solr only knows about the index as a whole ... implementation details
like segments are left entirely up to Lucene, and although I admit to
not knowing Lucene internals very well, I don't think Lucene offers any
way to control that either.  You mention that MongoDB dictates which
hash ranges end up in each chunk.  That implies that MongoDB can control
each chunk.  If we move the analogy to Solr, it breaks down because Solr
cannot control segments.  Although Solr does have several configuration
knobs that affect how segments are created, those configurations are
simply passed through to Lucene, Solr itself does not use that information.
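
For example, knobs like these in the indexConfig section of solrconfig.xml
(values illustrative) are handed straight to Lucene's IndexWriter and merge
policy:

<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>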

Thanks,
Shawn



Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Alessandro Benedetti
No worries, it is not a big deal that you shared the schema.xml; I said
that only because it made the mail a little hard to read. Anyway, in my
opinion the query is correct, so the problem should reside elsewhere.

Can you share the solrconfig.xml piece for your select request handler?
Probably it is not the problem, but it can give us more info.
I find text to be stored, so highlighting should work.

From the official documentation:

"The standard highlighter (AKA the default highlighter) doesn't require any
special indexing parameters on the fields to highlight.  However you can
optionally turn on termVectors, termPositions, and termOffsets for any
field to be highlighted. This will avoid having to run documents through
the analysis chain at query-time and will make highlighting significantly
faster and use less memory, particularly for large text fields, and even
more so when hl.usePhraseHighlighter is enabled."

So you should be ok.
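
If you do want that optional speed-up, it is just extra attributes on the
highlighted field in schema.xml (a sketch; keep whatever type attribute
your field already has):

<field name="text" type="..." indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>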

Keep us posted


2015-06-30 16:00 GMT+01:00 Mark Ehle :

> Alessandro -
>
> Someone asked to see the schema, I posted it. Should I have just attached
> it? Does this mailing list support that?
>
> I am by no means a SOLR expert. I am a PHP coder who wrote a
> (very-much-loved by our library staff and patrons) newspaper indexing tool
> that I am trying to update. I only know enough about SOLR to install it,
> and index and query. All I did to the 5.2 schema was add the
> newspaper-specific fields that was in the old schema.
>
> I cannot answer most of your questions. I just know that this url:
>
> http://127.0.0.1:8080/solr/newspapers/select?q=%22JOHN+GRAP%22&fl=year&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> used to produce snippets of highlited text in 4.6. In 5.2 it does not.
>
>
> Thanks -
>
> Mark Ehle
> Computer Support Librarian
> Willard Library
> Battle Creek, MI
>
>
> On Tue, Jun 30, 2015 at 10:50 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Instead of your immense schema, can you give us the details of the
> > Highlight you are trying to use ?
> > And how you are trying to use it ?
> > Which client ? Direct APi calls ?
> >
> > let us know!
> >
> > Cheers
> >
> > 2015-06-30 15:10 GMT+01:00 Mark Ehle :
> >
> > > Thanks to all for the help - it's now storing text and I can search and
> > get
> > > results just before in 4.6, but I cannot get snippets to appear when I
> > ask
> > > for highlighting.
> > >
> > >
> > > when I add documents, here is the URL my script generates:
> > >
> > >
> > >
> >
> http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
> > >
> > >
> > > And here is my schema:
> > >
> > > 
> > >
> > > 
> > >
> > > 
> > >
> > > 
> > >   
> > >
> > >   
> > > 
> > >
> > > 
> > >  sortMissingLast="true"
> > > omitNorms="true"/>
> > >
> > > 
> > >  > sortMissingLast="true"
> > > omitNorms="true"/>
> > > 
> > > 
> > >
> > > 
> > >
> > > 
> > >  > > omitNorms="true" positionIncrementGap="0"/>
> > >  precisionStep="0"
> > > omitNorms="true" positionIncrementGap="0"/>
> > >  > > omitNorms="true" positionIncrementGap="0"/>
> > >  > precisionStep="0"
> > > omitNorms="true" positionIncrementGap="0"/>
> > >
> > > 
> > >  > > omitNorms="true" positionIncrementGap="0"/>
> > >  > precisionStep="8"
> > > omitNorms="true" positionIncrementGap="0"/>
> > >  precisionStep="8"
> > > omitNorms="true" positionIncrementGap="0"/>
> > >  > > precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
> > >
> > > 
> > >  > > precisionStep="0" positionIncrementGap="0"/>
> > >
> > > 
> > >  > > precisionStep="6" positionIncrementGap="0"/>
> > >
> > >
> > > 
> > >
> > >
> > > 
> > >  > > sortMissingLast="true" omitNorms="true"/>
> > >  > > sortMissingLast="true" omitNorms="true"/>
> > >  > > sortMissingLast="true" omitNorms="true"/>
> > >  > > sortMissingLast="true" omitNorms="true"/>
> > >
> > >
> > > 
> > >  indexed="true"
> > />
> > >
> > > 
> > >
> > > 
> > >
> > > 
> > >  > > positionIncrementGap="100">
> > >   
> > > 
> > >   
> > > 
> > >
> > > 
> > >  > > positionIncrementGap="100">
> > >   
> > > 
> > >  > > words="stopwords.txt" enablePositionIncre

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Alessandro Benedetti
But what do you mean by the complete document? Is it not available
anymore?
So you have lost your original document and you want to try to reconstruct
it from the index?

2015-06-30 16:05 GMT+01:00 dinesh naik :

> Hi Alessandro,
> I am able to check the field wise analyzed results.
>
> I was interested in getting the complete document.
>
> As Erick mentioned -
> Reconstructing the doc from the
> postings lists isactually quite tedious. The Luke program (not request
> handler) has a
> function that
> does this, it's not fast though, more for troubleshooting than trying to do
> anything in a production environment.
>
> I ll try looking into the Luke program if i can get this done.
>
> Thanks and Best Regards,
> Dinesh Naik
>
> On Tue, Jun 30, 2015 at 7:42 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Do you have the original document available ? Or stored in the field of
> > interest ?
> > Should be quite an easy test to reproduce the Analysis simply using the
> > analysis tool Upaya and Erick suggested.
> > Just use your real document content and you will see how it is exactly
> > analysed.
> >
> > Cheers
> >
> > 2015-06-30 15:03 GMT+01:00 dinesh naik :
> >
> > > Hi Erick,
> > >
> > > I agree with you.
> > >
> > > But i was checking if we could  get  hold on the whole document (to see
> > all
> > > analyzed field values) .
> > >
> > > There might be chances that field value is common for multiple
> documents
> > .
> > > In such cases it will be difficult to backtrack which document has the
> > > issue . Because admin/analysis can be used to see for field level
> > analysis
> > > only.
> > >
> > >
> > >
> > > Best Regards,
> > > Dinesh Naik
> > >
> > > On Tue, Jun 30, 2015 at 7:08 PM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > > > Dinesh:
> > > >
> > > > This is what the admin/analysis page is for. It shows you exactly
> > > > what tokens are produced by what steps in the analysis chain.
> > > > That would be far better than trying to analyze the indexed
> > > > terms.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik <
> > dineshkumarn...@gmail.com>
> > > > wrote:
> > > > > Hi Erick,
> > > > > This is mainly for debugging purpose. If i have 20M records and few
> > > > fields
> > > > > in some of the documents are not indexed as expected or something
> > went
> > > > > wrong during indexing then how do we pin point the exact issue and
> > fix
> > > > the
> > > > > problem?
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Dinesh Naik
> > > > >
> > > > > On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson <
> > > erickerick...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > >> In short, not unless you want to get into low-level Lucene coding.
> > > > >> Inverted indexes are, well, inverted so their very structure makes
> > > > >> this difficult. It looks like this:
> > > > >>
> > > > >> But I'm not convinced yet that this isn't an XY problem. What is
> the
> > > > >> high-level problem you're trying to solve here? Maybe there's
> > another
> > > > >> way to go about it.
> > > > >>
> > > > >> Best,
> > > > >> Erick
> > > > >>
> > > > >> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik <
> > > dineshkumarn...@gmail.com
> > > > >
> > > > >> wrote:
> > > > >> > Thanks Eric and Upayavira for your inputs.
> > > > >> >
> > > > >> > Is there a way i can associate this to a unique id of document,
> > > either
> > > > >> > using schema browser or TermsComponent?
> > > > >> >
> > > > >> > Best Regards,
> > > > >> > Dinesh Naik
> > > > >> >
> > > > >> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira 
> > wrote:
> > > > >> >
> > > > >> >> Use the schema browser on the admin UI, and click the "load
> term
> > > > info"
> > > > >> >> button. It'll show you the terms in your index.
> > > > >> >>
> > > > >> >> You can also use the analysis tab which will show you how it
> > would
> > > > >> >> tokenise stuff for a specific field.
> > > > >> >>
> > > > >> >> Upayavira
> > > > >> >>
> > > > >> >> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
> > > > >> >> > Hi Eric,
> > > > >> >> > By compressed value I meant value of a field after removing
> > > special
> > > > >> >> > characters . In my example its "-". Compressed form of
> > red-apple
> > > is
> > > > >> >> > redapple .
> > > > >> >> >
> > > > >> >> > I wanted to know if we can see the analyzed version of
> fields .
> > > > >> >> >
> > > > >> >> > For example if I use ngram on a field , how do I see the
> > analyzed
> > > > >> values
> > > > >> >> > in index ?
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > -Original Message-
> > > > >> >> > From: "Erick Erickson" 
> > > > >> >> > Sent: ‎29-‎06-‎2015 18:12
> > > > >> >> > To: "solr-user@lucene.apache.org" <
> solr-user@lucene.apache.org
> > >
> > > > >> >> > Subject: Re: Reading indexed data from solr 5.1.0 using
> > > admin/luke?
> > > > >> >> >
> > > > >> >> > Not quite sure 

Re: DIH deletes cause opening of searchers

2015-06-30 Thread Shawn Heisey

On 6/25/2015 2:20 AM, Mikhail Khludnev wrote:
> On Tue, Jun 23, 2015 at 9:23 AM, Rudolf Grigeľ  wrote:
>> How can I prevent opening new searcher after
>> every delete statement ?
>
> comment the <updateLog/> tag in solrconfig.xml (it always helps)


The presence or absence of the updateLog should not affect whether new 
searchers are opened.  If this change actually works, I'm pretty sure 
that's a bug.


IMHO the updateLog should always be enabled on Solr 4.x and up, and 
autoCommit with openSearcher set to false should be configured so the 
transaction logs do not get huge.
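
A sketch of that recommended configuration (the 15-second maxTime is
illustrative):

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog/>
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>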


Thanks,
Shawn



Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Do you mean this?:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    ...
  </lst>
</requestHandler>

On Tue, Jun 30, 2015 at 12:11 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> No worries, it is not a big deal you shared the schema.xml, I said that
> only because it turned the mail a little hard to read, anyway, in my
> opinion the query is correct, so the problem should reside elsewhere.
>
> Can you share the solrconfig.xml piece for your select request handler ?
> Probably it is not the problem, but can give us more info.
> I find text to be stored, so highlighting should work.
>
> From official documentation :
>
> "The standard highlighter (AKA the default highlighter) doesn't require any
> special indexing parameters on the fields to highlight.  However you can
> optionally turn on termVectors, termPositions, and termOffsets for any
> field to be highlighted. This will avoid having to run documents through
> the analysis chain at query-time and will make highlighting significantly
> faster and use less memory, particularly for large text fields, and even
> more so when hl.usePhraseHighlighter is enabled."
>
> So you should be ok.
>
> Keep us posted
>
>
> 2015-06-30 16:00 GMT+01:00 Mark Ehle :
>
> > Alessandro -
> >
> > Someone asked to see the schema, I posted it. Should I have just attached
> > it? Does this mailing list support that?
> >
> > I am by no means a SOLR expert. I am a PHP coder who wrote a
> > (very-much-loved by our library staff and patrons) newspaper indexing
> tool
> > that I am trying to update. I only know enough about SOLR to install it,
> > and index and query. All I did to the 5.2 schema was add the
> > newspaper-specific fields that was in the old schema.
> >
> > I cannot answer most of your questions. I just know that this url:
> >
> >
> http://127.0.0.1:8080/solr/newspapers/select?q=%22JOHN+GRAP%22&fl=year&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >
> > used to produce snippets of highlighted text in 4.6. In 5.2 it does not.
> >
> >
> > Thanks -
> >
> > Mark Ehle
> > Computer Support Librarian
> > Willard Library
> > Battle Creek, MI
> >
> >
> > On Tue, Jun 30, 2015 at 10:50 AM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Instead of your immense schema, can you give us the details of the
> > > highlighting you are trying to use?
> > > And how are you trying to use it?
> > > Which client? Direct API calls?
> > >
> > > let us know!
> > >
> > > Cheers
> > >
> > > 2015-06-30 15:10 GMT+01:00 Mark Ehle :
> > >
> > > > Thanks to all for the help - it's now storing text and I can search
> and
> > > get
> > > > results just as before in 4.6, but I cannot get snippets to appear when
> I
> > > ask
> > > > for highlighting.
> > > >
> > > >
> > > > when I add documents, here is the URL my script generates:
> > > >
> > > >
> > > >
> > >
> >
> http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
> > > >
> > > >
> > > > And here is my schema:
> > > >
> > > > [The schema.xml pasted here was mangled by the list archive; what
> > > > survives shows the stock string/boolean types (sortMissingLast,
> > > > omitNorms) and the Trie int/float/long/double and date types with
> > > > varying precisionStep values, and the message is truncated
> > > > mid-schema.]

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
I'd set filterCache and queryResultCache to zero (size and autowarm count).

Leave documentCache alone IMO, as it's used to hold documents read from disk
as they pass through various query components, and it doesn't autowarm anyway.
I'd think taking it out would skew your results because of multiple
decompressions.
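
For example, a sketch of the relevant solrconfig.xml entries while you
test (cache classes as in the stock config; restore real sizes afterwards):

<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>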

Best,
Erick

On Tue, Jun 30, 2015 at 10:29 AM, wwang525  wrote:
> Hi,
>
> I am currently investigating the queries with a much smaller index size (1M)
> to see the effect of grouping and faceting on performance degradation. This will
> allow me to do a lot of tests in a short period of time.
>
> However, it looks like the query is executed much faster the second time.
> This was tested after re-indexing, not by immediately executing the query again. It
> looks like it may be due to auto warming during or after re-indexing?
>
> I would like to get the response profile (query, faceting etc) for the same
> query in two separate requests without any cache or warming so that I get a
> good average number and not much fluctuation. What are the settings that I
> need to disable (temporarily) just for the purpose of the investigation? In
> the solrconfig.xml, I can see filterCache, queryResultCache, documentCache
> etc. I am not sure what needs to be disabled to facilitate my work.
>
> I understand that cache and warming settings will be very helpful in load
> tests later on. However, if I can optimize the query in a single-request
> scenario, the performance will be in much better shape with all the cache
> and warming settings during a load test scenario.
>
> Thanks,
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4214968.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH deletes cause opening of searchers

2015-06-30 Thread Erick Erickson
From the log fragment it's at least worth further investigation.

You've had 4 searchers open in less than 1/2 second. That's
horribly fast, but you already know that...

Let's see the DIH configs, perhaps there's something
innocent-seeming there that's causing this. Or, there's
a bug somewhere.

Best,
Erick

On Tue, Jun 30, 2015 at 12:17 PM, Shawn Heisey  wrote:
> On 6/25/2015 2:20 AM, Mikhail Khludnev wrote:
>>
>> On Tue, Jun 23, 2015 at 9:23 AM, Rudolf Grigeľ  wrote:
>>>
>>> How can I prevent opening a new searcher after
>>> every delete statement?
>>
>> comment the <updateLog/> tag in solrconfig.xml (it always helps)
>
>
> The presence or absence of the updateLog should not affect whether new
> searchers are opened.  If this change actually works, I'm pretty sure that's
> a bug.
>
> IMHO the updateLog should always be enabled on Solr 4.x and up, and
> autoCommit with openSearcher set to false should be configured so the
> transaction logs do not get huge.
>
> Thanks,
> Shawn
>


Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Erick Erickson
Something's not right here. Your query does not specify any field,
you have q="JOHN GRAP", which should parse as
q=default_search_field:"JOHN GRAP".

BUT, you've commented the default field out of the select request handler.
I don't _think_ that there's a default in the code, but I've been surprised
before.

Could you add

&debug=true&echoParams=all

to the query and paste the results?

Best,
Erick

On Tue, Jun 30, 2015 at 12:32 PM, Mark Ehle  wrote:
> Do you mean this?:
>
> 
> 
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <!-- <str name="df">text</str> -->
>   </lst>
> </requestHandler>
> 
> 
>
> On Tue, Jun 30, 2015 at 12:11 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
>> No worries, it is not a big deal that you shared the schema.xml; I said that
>> only because it made the mail a little hard to read. Anyway, in my opinion
>> the query is correct, so the problem should reside elsewhere.
>>
>> Can you share the solrconfig.xml piece for your select request handler?
>> Probably it is not the problem, but it can give us more info.
>> I see that text is stored, so highlighting should work.
>>
>> From official documentation :
>>
>> "The standard highlighter (AKA the default highlighter) doesn't require any
>> special indexing parameters on the fields to highlight.  However you can
>> optionally turn on termVectors, termPositions, and termOffsets for any
>> field to be highlighted. This will avoid having to run documents through
>> the analysis chain at query-time and will make highlighting significantly
>> faster and use less memory, particularly for large text fields, and even
>> more so when hl.usePhraseHighlighter is enabled."
>>
>> So you should be ok.
>>
>> Keep us posted
>>
>>
>> 2015-06-30 16:00 GMT+01:00 Mark Ehle :
>>
>> > Alessandro -
>> >
>> > Someone asked to see the schema, I posted it. Should I have just attached
>> > it? Does this mailing list support that?
>> >
>> > I am by no means a SOLR expert. I am a PHP coder who wrote a
>> > (very-much-loved by our library staff and patrons) newspaper indexing
>> tool
>> > that I am trying to update. I only know enough about SOLR to install it,
>> > and index and query. All I did to the 5.2 schema was add the
>> > newspaper-specific fields that was in the old schema.
>> >
>> > I cannot answer most of your questions. I just know that this url:
>> >
>> >
>> http://127.0.0.1:8080/solr/newspapers/select?q=%22JOHN+GRAP%22&fl=year&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>> >
>> > used to produce snippets of highlighted text in 4.6. In 5.2 it does not.
>> >
>> >
>> > Thanks -
>> >
>> > Mark Ehle
>> > Computer Support Librarian
>> > Willard Library
>> > Battle Creek, MI
>> >
>> >
>> > On Tue, Jun 30, 2015 at 10:50 AM, Alessandro Benedetti <
>> > benedetti.ale...@gmail.com> wrote:
>> >
>> > > Instead of your immense schema, can you give us the details of the
>> > > highlighting you are trying to use?
>> > > And how are you trying to use it?
>> > > Which client? Direct API calls?
>> > >
>> > > let us know!
>> > >
>> > > Cheers
>> > >
>> > > 2015-06-30 15:10 GMT+01:00 Mark Ehle :
>> > >
>> > > > Thanks to all for the help - it's now storing text and I can search
>> and
>> > > get
>> > > > results just as before in 4.6, but I cannot get snippets to appear when
>> I
>> > > ask
>> > > > for highlighting.
>> > > >
>> > > >
>> > > > when I add documents, here is the URL my script generates:
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> http://localhost:8080/solr/newspapers/update/extract?literal.id=2015_01_01_battlecreekenquirer-004&literal.publication_date=2015-01-01T00:00:00Z&literal.year=2015&literal.yearstr=2015&literal.day=1&literal.month_num=1&literal.month=01_January&literal.publication_name=Battle%20Creek%20Enquirer&literal.publication_type=newspaper&literal.short_name=battlecreekenquirer&literal.image_number=4&literal.filename=2015_01_01_battlecreekenquirer-004.pdf&literal.copyright_year=1923&literal.copyright_restricted=y&fmap.content=publication_text&stream.contentType=application%2Ftxt&stream.file=%2Farchive_data%2Fnewspapers%2FBattle%20Creek%20Enquirer%2F2015%2F01_January%2F2015_01_01_battlecreekenquirer%2Ftxt%2F2015_01_01_battlecreekenquirer-004.txt
>> > > >
>> > > >
>> > > > And here is my schema:
>> > > >
>> > > > [The schema.xml pasted here was mangled by the list archive; what
>> > > > survives shows the stock string/boolean types (sortMissingLast,
>> > > > omitNorms) and the Trie numeric and date types with varying
>> > > > precisionStep values, and the message is truncated mid-schema.]

Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Mark Ehle
Here's what I get:

{
  "responseHeader":{
"status":0,
"QTime":27,
"params":{
  "echoParams":"all",
  "fl":"year",
  "df":"_text_",
  "indent":"true",
  "q":"\"JOHN GRAP\"",
  "hl.simple.pre":"",
  "debug":"true",
  "hl.simple.post":"",
  "hl.fl":"text",
  "wt":"json",
  "hl":"true",
  "rows":"10"}},
  "response":{"numFound":3,"start":0,"docs":[
  {
"year":[2015]},
  {
"year":[2015]},
  {
"year":[2015]}]
  },
  "highlighting":{
"2015_01_01_battlecreekenquirer-001":{},
"2015_01_04_battlecreekenquirer-003":{},
"2015_01_01_battlecreekenquirer-002":{}},
  "debug":{
"rawquerystring":"\"JOHN GRAP\"",
"querystring":"\"JOHN GRAP\"",
"parsedquery":"PhraseQuery(_text_:\"john grap\")",
"parsedquery_toString":"_text_:\"john grap\"",
"explain":{
  "2015_01_01_battlecreekenquirer-001":"\n0.23774907 =
weight(_text_:\"john grap\" in 0) [DefaultSimilarity], result of:\n
0.23774907 = fieldWeight in 0, product of:\n2.0 = tf(freq=4.0),
with freq of:\n  4.0 = phraseFreq=4.0\n6.086376 = idf(), sum
of:\n  1.8675005 = idf(docFreq=41, maxDocs=100)\n  4.218876 =
idf(docFreq=3, maxDocs=100)\n0.01953125 = fieldNorm(doc=0)\n",
  "2015_01_04_battlecreekenquirer-003":"\n0.13449118 =
weight(_text_:\"john grap\" in 98) [DefaultSimilarity], result of:\n
0.13449118 = fieldWeight in 98, product of:\n1.4142135 =
tf(freq=2.0), with freq of:\n  2.0 = phraseFreq=2.0\n6.086376
= idf(), sum of:\n  1.8675005 = idf(docFreq=41, maxDocs=100)\n
 4.218876 = idf(docFreq=3, maxDocs=100)\n0.015625 =
fieldNorm(doc=98)\n",
  "2015_01_01_battlecreekenquirer-002":"\n0.10086838 =
weight(_text_:\"john grap\" in 1) [DefaultSimilarity], result of:\n
0.10086838 = fieldWeight in 1, product of:\n1.4142135 =
tf(freq=2.0), with freq of:\n  2.0 = phraseFreq=2.0\n6.086376
= idf(), sum of:\n  1.8675005 = idf(docFreq=41, maxDocs=100)\n
 4.218876 = idf(docFreq=3, maxDocs=100)\n0.01171875 =
fieldNorm(doc=1)\n"},
"QParser":"LuceneQParser",
"timing":{
  "time":27.0,
  "prepare":{
"time":4.0,
"query":{
  "time":4.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":23.0,
"query":{
  "time":14.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":1.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"debug":{
  "time":8.0}



On Tue, Jun 30, 2015 at 1:13 PM, Erick Erickson 
wrote:

> Something's not right here. Your query does not specify any field,
> you have q="JOHN GRAP". Which should parse as
> q=default_search_field:"JOHN GRAP".
>
> BUT, you've commented the default field out of the select request handler.
> I don't _think_ that there's a default in the code, but I've been surprised
> before.
>
> Could you add
>
> &debug=true&echoParams=all
>
> to the query and paste the results?
>
> Best,
> Erick
>
> On Tue, Jun 30, 2015 at 12:32 PM, Mark Ehle  wrote:
> > Do you mean this?:
> >
> > 
> > 
> >   explicit
> >   10
> >   
> > 
> > 
> >
> > On Tue, Jun 30, 2015 at 12:11 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> >> No worries, it is not a big deal that you shared the schema.xml; I said that
> >> only because it made the mail a little hard to read. Anyway, in my opinion
> >> the query is correct, so the problem should reside elsewhere.
> >>
> >> Can you share the solrconfig.xml piece for your select request handler?
> >> Probably it is not the problem, but it can give us more info.
> >> I see that text is stored, so highlighting should work.
> >>
> >> From official documentation :
> >>
> >> "The standard highlighter (AKA the default highlighter) doesn't require
> any
> >> special indexing parameters on the fields to highlight.  However you can
> >> optionally turn on termVectors, termPositions, and termOffsets for any
> >> field to be highlighted. This will avoid having to run documents through
> >> the analysis chain at query-time and will make highlighting
> significantly
> >> faster and use less memory, particularly for large text fields, and even
> >> more so when hl.usePhraseHighlighter is enabled."
> >>
> >> So you should be ok.
> >>
> >> Keep us posted
> >>
> >>
> >> 2015-06-30 16:00 GMT+01:00 Mark Ehle :
> >>
> >> > Alessandro -
> >> >
> >> > Someone asked to see the schema, I posted it. Should I have just
> attached
> >> > it? Does this mailing list support that?
> >> >
> >> > I am by no means a SOLR expert. I am a PHP coder wh

RE: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Dinesh Naik
Hi Alessandro,

Let's say I have 20M documents with 50 fields in each.

I have applied text analysis like compression, ngram, and synonym expansion
on these fields.

Checking field-level analysis individually is easily done via admin/analysis.
But I would need to run that check 50 times, once for each of these 50 fields.

I wanted to know if Solr provides a way to see all these analyzed fields at
once (for example, by unique id).

Best Regards,
Dinesh Naik

-Original Message-
From: "Alessandro Benedetti" 
Sent: ‎30-‎06-‎2015 21:43
To: "solr-user@lucene.apache.org" 
Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?

But what do you mean by the complete document? Is it not available
anymore?
So you have lost your original document and want to try to reconstruct it
from the index?

2015-06-30 16:05 GMT+01:00 dinesh naik :

> Hi Alessandro,
> I am able to check the field wise analyzed results.
>
> I was interested in getting the complete document.
>
> As Erick mentioned -
> Reconstructing the doc from the postings lists is actually quite tedious.
> The Luke program (not request handler) has a function that does this; it's
> not fast though, more for troubleshooting than trying to do anything in a
> production environment.
>
> I'll try looking into the Luke program to see if I can get this done.
>
> Thanks and Best Regards,
> Dinesh Naik
>
> On Tue, Jun 30, 2015 at 7:42 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Do you have the original document available ? Or stored in the field of
> > interest ?
> > Should be quite an easy test to reproduce the Analysis simply using the
> > analysis tool Upaya and Erick suggested.
> > Just use your real document content and you will see how it is exactly
> > analysed.
> >
> > Cheers
> >
> > 2015-06-30 15:03 GMT+01:00 dinesh naik :
> >
> > > Hi Erick,
> > >
> > > I agree with you.
> > >
> > > But i was checking if we could  get  hold on the whole document (to see
> > all
> > > analyzed field values) .
> > >
> > > There might be chances that field value is common for multiple
> documents
> > .
> > > In such cases it will be difficult to backtrack which document has the
> > > issue . Because admin/analysis can be used to see for field level
> > analysis
> > > only.
> > >
> > >
> > >
> > > Best Regards,
> > > Dinesh Naik
> > >
> > > On Tue, Jun 30, 2015 at 7:08 PM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > > > Dinesh:
> > > >
> > > > This is what the admin/analysis page is for. It shows you exactly
> > > > what tokens are produced by what steps in the analysis chain.
> > > > That would be far better than trying to analyze the indexed
> > > > terms.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik <
> > dineshkumarn...@gmail.com>
> > > > wrote:
> > > > > Hi Erick,
> > > > > This is mainly for debugging purpose. If i have 20M records and few
> > > > fields
> > > > > in some of the documents are not indexed as expected or something
> > went
> > > > > wrong during indexing then how do we pin point the exact issue and
> > fix
> > > > the
> > > > > problem?
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Dinesh Naik
> > > > >
> > > > > On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson <
> > > erickerick...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > >> In short, not unless you want to get into low-level Lucene coding.
> > > > >> Inverted indexes are, well, inverted so their very structure makes
> > > > >> this difficult. It looks like this:
> > > > >>
> > > > >> But I'm not convinced yet that this isn't an XY problem. What is
> the
> > > > >> high-level problem you're trying to solve here? Maybe there's
> > another
> > > > >> way to go about it.
> > > > >>
> > > > >> Best,
> > > > >> Erick
> > > > >>
> > > > >> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik <
> > > dineshkumarn...@gmail.com
> > > > >
> > > > >> wrote:
> > > > >> > Thanks Eric and Upayavira for your inputs.
> > > > >> >
> > > > >> > Is there a way i can associate this to a unique id of document,
> > > either
> > > > >> > using schema browser or TermsComponent?
> > > > >> >
> > > > >> > Best Regards,
> > > > >> > Dinesh Naik
> > > > >> >
> > > > >> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira 
> > wrote:
> > > > >> >
> > > > >> >> Use the schema browser on the admin UI, and click the "load
> term
> > > > info"
> > > > >> >> button. It'll show you the terms in your index.
> > > > >> >>
> > > > >> >> You can also use the analysis tab which will show you how it
> > would
> > > > >> >> tokenise stuff for a specific field.
> > > > >> >>
> > > > >> >> Upayavira
> > > > >> >>
> > > > >> >> On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
> > > > >> >> > Hi Eric,
> > > > >> >> > By compressed value I meant value of a field after removing
> > > special
> > > > >> >> > characters . In my example its "-". Compressed form of
> > red-apple
> > > is
> > > > >> >> > redapple .
> > > > >> >> >
> > 

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Test_results_round_2.doc
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215016.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: optimize status

2015-06-30 Thread Upayavira


On Tue, Jun 30, 2015, at 04:42 PM, Shawn Heisey wrote:
> On 6/29/2015 2:48 PM, Reitzel, Charles wrote:
> > I take your point about shards and segments being different things.  I 
> > understand that the hash ranges per segment are not kept in ZK.   I guess I 
> > wish they were.
> >
> > In this regard, I liked that MongoDB uses a 2-level sharding scheme.  Each
> > shard manages a list of "chunks", each with its own hash range which is
> > kept in the cluster state.  If data needs to be balanced across nodes, it
> > works at the chunk level.  No record/doc-level I/O is necessary.  Much more
> > targeted, and only the data that needs to move is touched.  Solr does most
> > things better than Mongo, imo.  But this is one area where Mongo got it
> > right.
> 
> Segment detail would not only lead to a data explosion in the
> clusterstate, it would be crossing abstraction boundaries, and would
> potentially require updating the clusterstate just because a single
> document was inserted into the index.  That one tiny update could (and
> probably would) create a new segment on one shard.  Due to the way
> SolrCloud replicates data during normal operation, every replica for a
> given shard might have a different set of segments, which means segments
> would need to be tracked at the replica level, not the shard level.
> 
> Also, Solr cannot control which hash ranges end up in each segment. 
> Solr only knows about the index as a whole ... implementation details
> like segments are left entirely up to Lucene, and although I admit to
> not knowing Lucene internals very well, I don't think Lucene offers any
> way to control that either.  You mention that MongoDB dictates which
> hash ranges end up in each chunk.  That implies that MongoDB can control
> each chunk.  If we move the analogy to Solr, it breaks down because Solr
> cannot control segments.  Although Solr does have several configuration
> knobs that affect how segments are created, those configurations are
> simply passed through to Lucene, Solr itself does not use that
> information.

To put it more specifically - when a (hard) commit happens, all of the
documents in that commit are written into a new segment. Thus, it has no
bearing on what hash range is used. A segment can never be edited. When
there are too many, segments are merged into a new one, and the
originals deleted. So, there is no way for Solr/Lucene to insert a
document into anything other than a brand new segment.

Hence, the idea of using a second level of sharding at the segment level
does not fit with how a lucene index is structured.

Upayavira


Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi All,

I did many tests with very consistent test results. Each query was executed
after re-indexing, and only one request was sent to query the index. I
disabled filterCache and queryResultCache for this test based on Erick's
recommendation.

The test document was posted to this email list earlier. Briefly, the query
without grouping and faceting took about 60 ms, and grouping on top of the
same query adds about 15 ms. However, faceting adds an additional 70 ms,
bringing it to about 140 ms.

The index size is only 1M records. Ten times that record count (> 10M)
will likely bring the total response time to > 1 second for these two
queries. My goal is to make the query as performant as possible so that we
can achieve a < 1 second response time under load.

Is a 50 ms to 60 ms response time (single request scenario) a bit too long
for 1M records with Solr? Is the faceting taking too long (70 ms) to
process?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215019.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-30 Thread Upayavira
You can call the same API as the admin UI does. Pass it strings, and it
returns the tokens in json/xml/whatever.
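
For example (a sketch, assuming the stock /analysis/field handler; adjust
host, core, field name and sample value to your setup):

http://localhost:8983/solr/yourcore/analysis/field?analysis.fieldname=yourfield&analysis.fieldvalue=some+sample+text&wt=json

There is also /analysis/document, which accepts a whole document (in update
XML format) and returns the analysis results for every field in one call.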

Upayavira

On Tue, Jun 30, 2015, at 06:55 PM, Dinesh Naik wrote:
> Hi Alessandro,
> 
> Let's say I have 20M documents with 50 fields in each.
> 
> I have applied text analysis like compression, ngram, and synonym expansion
> on these fields.
> 
> Checking field-level analysis individually is easily done via
> admin/analysis. But I would need to run that check 50 times, once for each
> of these 50 fields.
> 
> I wanted to know if Solr provides a way to see all these analyzed fields
> at once (for example, by unique id).
> 
> Best Regards,
> Dinesh Naik
> 
> -Original Message-
> From: "Alessandro Benedetti" 
> Sent: ‎30-‎06-‎2015 21:43
> To: "solr-user@lucene.apache.org" 
> Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?
> 
> But what do you mean by the complete document? Is it not available
> anymore?
> So you have lost your original document and want to try to reconstruct it
> from the index?
> 
> 2015-06-30 16:05 GMT+01:00 dinesh naik :
> 
> > Hi Alessandro,
> > I am able to check the field wise analyzed results.
> >
> > I was interested in getting the complete document.
> >
> > As Erick mentioned -
> > Reconstructing the doc from the postings lists is actually quite tedious.
> > The Luke program (not request handler) has a function that does this; it's
> > not fast though, more for troubleshooting than trying to do anything in a
> > production environment.
> >
> > I'll try looking into the Luke program to see if I can get this done.
> >
> > Thanks and Best Regards,
> > Dinesh Naik
> >
> > On Tue, Jun 30, 2015 at 7:42 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Do you have the original document available ? Or stored in the field of
> > > interest ?
> > > Should be quite an easy test to reproduce the Analysis simply using the
> > > analysis tool Upaya and Erick suggested.
> > > Just use your real document content and you will see how it is exactly
> > > analysed.
> > >
> > > Cheers
> > >
> > > 2015-06-30 15:03 GMT+01:00 dinesh naik :
> > >
> > > > Hi Erick,
> > > >
> > > > I agree with you.
> > > >
> > > > But i was checking if we could  get  hold on the whole document (to see
> > > all
> > > > analyzed field values) .
> > > >
> > > > There might be chances that field value is common for multiple
> > documents
> > > .
> > > > In such cases it will be difficult to backtrack which document has the
> > > > issue . Because admin/analysis can be used to see for field level
> > > analysis
> > > > only.
> > > >
> > > >
> > > >
> > > > Best Regards,
> > > > Dinesh Naik
> > > >
> > > > On Tue, Jun 30, 2015 at 7:08 PM, Erick Erickson <
> > erickerick...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Dinesh:
> > > > >
> > > > > This is what the admin/analysis page is for. It shows you exactly
> > > > > what tokens are produced by what steps in the analysis chain.
> > > > > That would be far better than trying to analyze the indexed
> > > > > terms.
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik <
> > > dineshkumarn...@gmail.com>
> > > > > wrote:
> > > > > > Hi Erick,
> > > > > > This is mainly for debugging purpose. If i have 20M records and few
> > > > > fields
> > > > > > in some of the documents are not indexed as expected or something
> > > went
> > > > > > wrong during indexing then how do we pin point the exact issue and
> > > fix
> > > > > the
> > > > > > problem?
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > > Dinesh Naik
> > > > > >
> > > > > > On Tue, Jun 30, 2015 at 5:56 PM, Erick Erickson <
> > > > erickerick...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> In short, not unless you want to get into low-level Lucene coding.
> > > > > >> Inverted indexes are, well, inverted so their very structure makes
> > > > > >> this difficult. It looks like this:
> > > > > >>
> > > > > >> But I'm not convinced yet that this isn't an XY problem. What is
> > the
> > > > > >> high-level problem you're trying to solve here? Maybe there's
> > > another
> > > > > >> way to go about it.
> > > > > >>
> > > > > >> Best,
> > > > > >> Erick
> > > > > >>
> > > > > >> On Tue, Jun 30, 2015 at 3:32 AM, dinesh naik <
> > > > dineshkumarn...@gmail.com
> > > > > >
> > > > > >> wrote:
> > > > > >> > Thanks Eric and Upayavira for your inputs.
> > > > > >> >
> > > > > >> > Is there a way i can associate this to a unique id of document,
> > > > either
> > > > > >> > using schema browser or TermsComponent?
> > > > > >> >
> > > > > >> > Best Regards,
> > > > > >> > Dinesh Naik
> > > > > >> >
> > > > > >> > On Tue, Jun 30, 2015 at 2:55 AM, Upayavira 
> > > wrote:
> > > > > >> >
> > > > > >> >> Use the schema browser on the admin UI, and click the "load
> > term
> > > > > info"
> > > > > >> >> button. It'll show you the terms in your index.
> > > > > >> >>
> > > > > >> >> You can also use t

Using the DataImportHandler to get filepath from MySQL DataBase BackSlash Character Problem

2015-06-30 Thread Paden
Hello, 

I'm having a slight "Catch-22" scenario going on with my Solr indexing
process. I'm using the DataImportHandler to pull a filepath from a database.
The problem is that Windows filepaths have the backslash character inside
their paths. 

\\some\filepath

So when I insert this data into MySQL I have to use a "\\" in order to get it
to look like it should 

some\\filepath

And everything looks good when I put it into MySQL it looks like 

\\some\filepath

But when I index into Solr it looks like this

some\\filepath

Now I was thinking that I should just go back and put in single backslashes
again, since it's printing the exact input that I used to put the string in
the table in MySQL, but then it does this wonderful trick:

\\somefilepath

It will recognize the front two but not the middle backslash.

I know this may seem like more of a MySQL kind of issue, but is there any way
that I can get Solr to recognize this escape character so it would print the
desired output?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-the-DataImportHandler-to-get-filepath-from-MySQL-DataBase-BackSlash-Character-Problem-tp4215034.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Using the DataImportHandler to get filepath from MySQL DataBase BackSlash Character Problem

2015-06-30 Thread Keswani, Nitin - BLS CTR
Hi Paden,

I believe you could use a PatternReplaceFilterFactory
( http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html )
configured in your fieldtype to replace the doubled backslash with a single
one at index time.
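
A sketch of what that could look like (the type name is made up; note that
in the XML a regex of \\\\ matches two literal backslashes and a
replacement of \\ emits a single one):

<fieldType name="path_cleaned" class="solr.TextField">
  <analyzer>
    <!-- keep the whole path as one token, then collapse doubled backslashes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\\\\" replacement="\\" replace="all"/>
  </analyzer>
</fieldType>

One caveat: a token filter only changes the indexed terms; the stored value
returned in search results is still the original input.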

Thanks.

Regards,

Nitin Keswani

-Original Message-
From: Paden [mailto:rumsey...@gmail.com] 
Sent: Tuesday, June 30, 2015 3:59 PM
To: solr-user@lucene.apache.org
Subject: Using the DataImportHandler to get filepath from MySQL DataBase 
BackSlash Character Problem

Hello, 

I'm having a slight "Catch-22" scenario going on with my Solr indexing process. 
I'm using the DataImportHandler to pull a filepath from a database.
The problem is that Windows filepaths have the backslash character inside 
their paths. 

\\some\filepath

So when I insert this data into MySQL I have to use a "\\" in order to get it to 
look like it should 

some\\filepath

And everything looks good when I put it into MySQL it looks like 

\\some\filepath

But when I index into Solr it looks like this

some\\filepath

Now I was thinking that I should just go back and put in single backslashes 
again, since it's printing the exact input that I used to put the string in the 
table in MySQL, but then it does this wonderful trick:

\\somefilepath

It will recognize the front two but not the middle backslash. 

I know this may seem like more of a MySQL kind of issue, but is there any way 
that I can get Solr to recognize this escape character so it would print the 
desired output? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-the-DataImportHandler-to-get-filepath-from-MySQL-DataBase-BackSlash-Character-Problem-tp4215034.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Correcting text at index time

2015-06-30 Thread hossmaa
Hi all

Thanks for the replies. So there's no getting away from doing it on my own
then...

@Jack: I need to replace a whole list of shortened words... It would make a
crazy regex (which I incidentally wouldn't even know how to formulate).

Cheers
A.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4215056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrade to 5.2 from 4.6, no storing of text

2015-06-30 Thread Erick Erickson
It _looks_ like you're searching against _text_ and trying to
highlight on text. On a very brief grep of all the Java code I don't
see _text_ defined anywhere (of course I could be missing something
here).

So none of this makes sense.
> you have no "df" field defined, yet you're getting a default _text_.
> your highlighting mysteriously disappears.

This only really makes sense to me if the /select handler you showed
us isn't really the handler you are using in the Solr instance you're
querying. So I'd check what was in my documents, what the
configuration really was (see admin/select_core/files/solrconfig.xml).
The behavior you report does make some sense if you have indexed to
_text_ but are highlighting on text.

Quick checks (and I'd set my primary fl field (not highlight) to * too):
1> try changing your hl.fl field to _text_
2> try making a fielded search of text:"JOHN GRAP"
3> Look at admin UI/select core/schema browser and see what fields you
actually have in your index.

I'd guess that <1> will show you highlight fragments (assuming _text_
is the real field and stored="true")
<2> will return no hits
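
For <1>, that would be something like (host and core as in your earlier
mails):

http://127.0.0.1:8080/solr/newspapers/select?q=%22JOHN+GRAP%22&fl=*&wt=json&hl=true&hl.fl=_text_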

If those are true, track down the person who changed _text_ to text
and shoot them.

Best,
Erick


On Tue, Jun 30, 2015 at 1:24 PM, Mark Ehle  wrote:
> Here's what I get:
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":27,
> "params":{
>   "echoParams":"all",
>   "fl":"year",
>   "df":"_text_",
>   "indent":"true",
>   "q":"\"JOHN GRAP\"",
>   "hl.simple.pre":"",
>   "debug":"true",
>   "hl.simple.post":"",
>   "hl.fl":"text",
>   "wt":"json",
>   "hl":"true",
>   "rows":"10"}},
>   "response":{"numFound":3,"start":0,"docs":[
>   {
> "year":[2015]},
>   {
> "year":[2015]},
>   {
> "year":[2015]}]
>   },
>   "highlighting":{
> "2015_01_01_battlecreekenquirer-001":{},
> "2015_01_04_battlecreekenquirer-003":{},
> "2015_01_01_battlecreekenquirer-002":{}},
>   "debug":{
> "rawquerystring":"\"JOHN GRAP\"",
> "querystring":"\"JOHN GRAP\"",
> "parsedquery":"PhraseQuery(_text_:\"john grap\")",
> "parsedquery_toString":"_text_:\"john grap\"",
> "explain":{
>   "2015_01_01_battlecreekenquirer-001":"\n0.23774907 =
> weight(_text_:\"john grap\" in 0) [DefaultSimilarity], result of:\n
> 0.23774907 = fieldWeight in 0, product of:\n2.0 = tf(freq=4.0),
> with freq of:\n  4.0 = phraseFreq=4.0\n6.086376 = idf(), sum
> of:\n  1.8675005 = idf(docFreq=41, maxDocs=100)\n  4.218876 =
> idf(docFreq=3, maxDocs=100)\n0.01953125 = fieldNorm(doc=0)\n",
>   "2015_01_04_battlecreekenquirer-003":"\n0.13449118 =
> weight(_text_:\"john grap\" in 98) [DefaultSimilarity], result of:\n
> 0.13449118 = fieldWeight in 98, product of:\n1.4142135 =
> tf(freq=2.0), with freq of:\n  2.0 = phraseFreq=2.0\n6.086376
> = idf(), sum of:\n  1.8675005 = idf(docFreq=41, maxDocs=100)\n
>  4.218876 = idf(docFreq=3, maxDocs=100)\n0.015625 =
> fieldNorm(doc=98)\n",
>   "2015_01_01_battlecreekenquirer-002":"\n0.10086838 =
> weight(_text_:\"john grap\" in 1) [DefaultSimilarity], result of:\n
> 0.10086838 = fieldWeight in 1, product of:\n1.4142135 =
> tf(freq=2.0), with freq of:\n  2.0 = phraseFreq=2.0\n6.086376
> = idf(), sum of:\n  1.8675005 = idf(docFreq=41, maxDocs=100)\n
>  4.218876 = idf(docFreq=3, maxDocs=100)\n0.01171875 =
> fieldNorm(doc=1)\n"},
> "QParser":"LuceneQParser",
> "timing":{
>   "time":27.0,
>   "prepare":{
> "time":4.0,
> "query":{
>   "time":4.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":23.0,
> "query":{
>   "time":14.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":1.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "debug":{
>   "time":8.0}
>
>
>
> On Tue, Jun 30, 2015 at 1:13 PM, Erick Erickson 
> wrote:
>
>> Something's not right here. Your query does not specify any field,
>> you have q="JOHN GRAP". Which should parse as
>> q=default_search_field:"JOHN GRAP".
>>
>> BUT, you've commented the default field out of the select request handler.
>> I don't _think_ that there's a default in the code, but I've been surprised
>> before.
>>
>> Could you add
>>
>> &debug=true&echoParams=all
>>
>> to the query and paste the results?
>>
>> Best,
>> Erick
>>
>> On Tue, Jun 30, 2015 at 12:32 PM, Mark Ehle  wrote:
>> > Do you mean this?:
>> >
>> > 
>> > 
>> >   e

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
bq: The index size is only 1 M records. A 10 times of the record size (> 10M)
will likely bring the total response time to > 1 second

This is an extrapolation you simply cannot make. Plus you cannot really tell
anything from just a few queries about system performance. In fact you must
disregard the first few queries due to loading Lucene indexes into memory.

Plus you cannot extrapolate from just a few queries. Part of the time is
loading the low-level Lucene caches for querying. And I'm assuming the times
you're reporting are QTimes; if they're not, then there's the time spent
assembling the response packet (i.e. reading/decompressing the data to get
the stored data), which is almost entirely independent of the number of docs.

In short, I don't have faith that your test methodology is reliable (although
kudos for having methodology at all, lots of people don't!). And I'm 99.99%
sure that you can't rely on the calculation that 10X the number of docs is
10X the response time.

Best,
Erick

On Tue, Jun 30, 2015 at 2:51 PM, wwang525  wrote:
> Hi All,
>
> I did many tests with very consistent test results. Each query was executed
> after re-indexing, and only one request was sent to query the index. I
> disabled filterCache and queryResultCache for this test based on Erick's
> recommendation.
>
> The test document was posted to this email list earlier. Briefly, the query
> without grouping and faceting took about 60 ms, and grouping on top of the
> same query adds about 15 ms. However, faceting adds an additional 70 ms,
> bringing it to about 140 ms.
>
> The index size is only 1M records. Ten times that record count (> 10M)
> will likely bring the total response time to > 1 second for these two
> queries. My goal is to make the query as performant as possible so that we
> can achieve a < 1 second response time under load.
>
> Is a 50 ms to 60 ms response time (single request scenario) a bit too long
> for 1M records with Solr? Is the faceting taking too long (70 ms) to
> process?
>
> Thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215019.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Correcting text at index time

2015-06-30 Thread Jack Krupansky
You would have to have a separate instance of the update processor, each
with one of the words.

Or, you could code a JavaScript script with the stateless script update
processor that has the long list of words and replacements as two arrays or
an array of objects, and then iterates through the input value and the array.
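
The wiring for that approach is just configuration (a sketch; the chain and
script names are made up, and the chain still has to be referenced via
update.chain on your update handler or request):

<updateRequestProcessorChain name="fix-words">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">fix-words.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>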


-- Jack Krupansky

On Tue, Jun 30, 2015 at 5:23 PM, hossmaa  wrote:

> Hi all
>
> Thanks for the replies. So there's no getting away from doing it on my own
> then...
>
> @Jack: I need to replace a whole list of shortened words... It would make a
> crazy regex (which I incidentally wouldn't even know how to formulate).
>
> Cheers
> A.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4215056.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


AUTO: Nicholas M. Wertzberger is out of the office (returning 07/06/2015)

2015-06-30 Thread Nicholas M. Wertzberger


I am out of the office until 07/06/2015.

I'll be out of the office through July 4th.
Please contact Jason Brown for any pressing JAS Team related items.


Note: This is an automated response to your message  "Re: Correcting text
at index time" sent on 6/30/2015 8:55:16 PM.

This is the only notification you will receive while this person is away.


Re: Restricting fields returned by Suggester result.

2015-06-30 Thread ssharma7...@gmail.com
Alessandro Benedetti,
Thanks for the update.

Actually, what I meant by "Is it possible to restrict the result returned
by Suggester to 'selected' fields only?" was something like the "fl" option
available for querying (/select) in Solr, wherein there could be many fields
defined in "schema.xml", but we can pick & choose which ones we require.
This would in effect reduce the amount of data sent from the server.


Regards,
Sachin Vyas.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restricting-fields-returned-by-Suggester-reult-tp4214948p4215132.html
Sent from the Solr - User mailing list archive at Nabble.com.