Re: Normalizing/Returning solr scores between 0 to 1

2013-06-28 Thread Upayavira
And if Solr has to spit it out, perhaps you could do that with a simple
XSLT transform or Velocity template.
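
For illustration, a minimal sketch of such a transform, assuming the stock
XSLT response writer and a query that requests fl=*,score so that maxScore
and per-document scores appear in the XML response (the stylesheet name
normalize.xsl and the output shape are purely illustrative):

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- conf/xslt/normalize.xsl: divide each doc's score by the result's maxScore -->
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/response">
      <xsl:variable name="max" select="result[@name='response']/@maxScore"/>
      <normalized numFound="{result[@name='response']/@numFound}">
        <xsl:for-each select="result[@name='response']/doc">
          <doc id="{str[@name='id']}"
               normScore="{float[@name='score'] div $max}"/>
        </xsl:for-each>
      </normalized>
    </xsl:template>
  </xsl:stylesheet>

Requesting it would then look something like
/select?q=...&fl=*,score&wt=xslt&tr=normalize.xsl.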

Upayavira

On Fri, Jun 28, 2013, at 12:30 AM, Learner wrote:
> Might not be useful, but a workaround would be to divide all scores by max
> score to get scores between 0 and 1.
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Normalizing-Returning-solr-scores-between-0-to-1-tp4073797p4073829.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: URL search and indexing

2013-06-28 Thread Flavio Pompermaier
Thanks for the explanation, I was missing exactly that!
Now things work correctly when using the post script as well.
However, I don't think I need norms if I use ids of the same length (UUID), right?
I just need strings with omitTermFreqAndPositions="false", I think.
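
For reference, a sketch of what that might look like in schema.xml (the
field/type names here are illustrative, not taken from the actual schema):

  <!-- string type that keeps term frequencies/positions but skips length norms -->
  <fieldType name="string_tf" class="solr.StrField" sortMissingLast="true"
             omitNorms="true" omitTermFreqAndPositions="false"/>
  <field name="itemid" type="string_tf" indexed="true" stored="true" multiValued="true"/>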


On Thu, Jun 27, 2013 at 7:31 PM, Erick Erickson wrote:

> Right, string fields are a little tricky, they're easy to confuse with
> fields that actually _do_ something.
>
> By default, norms and term frequencies are turned off for types based on '
> class="solr.StrField" '. So any field length normalization (i.e. terms that
> appear in shorter fields count more) and term frequency calculations are
> _not_ included in the score calculation.
>
> Try blowing your index away and adding this to your fields to see the
> difference
>
> omitNorms="false" omitTermFreqAndPositions="false"
>
> You probably want to either turn these on explicitly for your string types
> or use a type based on 'class="solr.TextField" ' since these options
> default to "false" for text fields. If you use something like
> "keywordTokenizerFactory" you also won't get your URL split up into pieces.
> And in that case you can also normalize the values with something like
> lowerCaseFilter which you can't do with "string" types since they're
> completely unanalyzed.
>
> Best
> Erick
>
>
> On Wed, Jun 26, 2013 at 11:34 AM, Flavio Pompermaier
> wrote:
>
> > Obviously I messed up the email thread... however, I found a problem
> > indexing my documents via post.sh.
> > This is basically my schema.xml:
> >
> > 
> >  
> > > required="true" multiValued="false" />
> > > multiValued="true"/>
> >
> >  
> >  url
> >   
> >  > />
> >  > positionIncrementGap="0"/>
> >  
> > 
> >
> > and this is the document I tried to upload via post.sh:
> >
> > <add>
> > <doc>
> >   <field name="url">http://test.example.org/first.html</field>
> >   <field name="itemid">1000</field>
> >   <field name="itemid">1000</field>
> >   <field name="itemid">1000</field>
> >   <field name="itemid">5000</field>
> > </doc>
> > <doc>
> >   <field name="url">http://test.example.org/second.html</field>
> >   <field name="itemid">1000</field>
> >   <field name="itemid">5000</field>
> > </doc>
> > </add>
> >
> > When playing with the administration and debugging tools I discovered that
> > searching for q=itemid:5000 gave me the same score for those docs, while I
> > was expecting different term frequencies between the first and the second.
> > In fact, using Java to upload documents leads to correct results (3
> > occurrences of item 1000 in the first doc and 1 in the second), e.g.:
> > document1.addField("itemid", "1000");
> > document1.addField("itemid", "1000");
> > document1.addField("itemid", "1000");
> >
> > Am I right or am I missing something else?
> >
> >
> > On Wed, Jun 26, 2013 at 5:18 PM, Jack Krupansky  > >wrote:
> >
> > > If there is a bug... we should identify it. What's a sample post
> command
> > > that you issued?
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: Flavio Pompermaier
> > > Sent: Wednesday, June 26, 2013 10:53 AM
> > >
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: URL search and indexing
> > >
> > > I was doing exactly that and, thanks to the administration page and
> > > explanation/debugging, I checked whether results were those expected.
> > > Unfortunately, results were not correct when submitting updates through
> > > the post.sh script (which uses curl in the end).
> > > Probably, if it finds the same tag (same value for the same field name),
> > > it collapses them.
> > > Rewriting the same document in Java and submitting the updates made
> > > things work correctly.
> > >
> > > In my opinion this is a bug (of the entire process, though I don't know
> > > whether this is a problem of curl or of the script itself).
> > >
> > > Best,
> > > Flavio
> > >
> > > On Wed, Jun 26, 2013 at 4:18 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >  Flavio:
> > >>
> > >> You mention that you're new to Solr, so I thought I'd make sure
> > >> you know that the admin/analysis page is your friend! I flat
> > >> guarantee that as you try to index/search following the suggestions
> > >> you'll scratch your head at your results and you'll discover that
> > >> the analysis process isn't doing quite what you expect. The
> > >> admin/analysis page shows you the transformation of the input
> > >> at each stage, i.e. how the input is tokenized, what transformations
> > >> are applied to each token etc. It's invaluable!
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> P.S. Feel free to un-check the "verbose" box, it provides lots
> > >> of information but can be overwhelming, especially at first!
> > >>
> > >> On Wed, Jun 26, 2013 at 12:20 AM, Flavio Pompermaier
> > >>  wrote:
> > >> > Ok thank you all for the great help!
> > >> > Now I'm ready to start playing with my index!
> > >> >
> > >> > Best,
> > >> > Flavio
> > >> >
> > >> >
> > >> > On Tue, Jun 25, 2013 at 11:40 PM, Jack Krupansky <
> > >> j...@basetechnology.com>wrote:
> > >> >
> > >> >> Yeah, URL Classify does only do so much. That's why you need to
> > combine
> > >> >> multiple methods.
> > >> >>
> > >> >> As a fourth method, you co

Re: URL search and indexing

2013-06-28 Thread Upayavira
field length normalisation is based upon the number of terms in a field,
not the number of characters in a term. I guess with multivalued string
fields, that would mean a field with lots of values (but one match)
would score lower than one with only one matching value.

Upayavira


On Fri, Jun 28, 2013, at 09:24 AM, Flavio Pompermaier wrote:
> Thanks for the explanation, I was missing exaclty that!
> Now things works correctly also using the post script.
> However I don't think I need norms if I use id of same lenght (UUID),
> right?
> I just need strings with omitTermFreqAndPositions="false" I think.
> 
> 
> On Thu, Jun 27, 2013 at 7:31 PM, Erick Erickson
> wrote:
> 
> > Right, string fields are a little tricky, they're easy to confuse with
> > fields that actually _do_ something.
> >
> > By default, norms and term frequencies are turned off for types based on '
> > class="solr.StrField" '. So any field length normalization (i.e. terms that
> > appear in shorter fields count more) and term frequencies calculations are
> > _not_ include in the score calculation.
> >
> > Try blowing your index away and adding this to your fields to see the
> > difference
> >
> > omitNorms="false" omitTermFreqAndPositions="false"
> >
> > You probably want to either turn these on explicitly for your string types
> > or use a type based on 'class="solr.TextField" ' since these options
> > default to "false" for text fields. If you use something like
> > "keywordTokenizerFactory" you also won't get your URL split up into pieces.
> > And in that case you can also normalize the values with something like
> > lowerCaseFilter which you can't do with "string" types since they're
> > completely unanalyzed.
> >
> > Best
> > Erick
> >
> >
> > On Wed, Jun 26, 2013 at 11:34 AM, Flavio Pompermaier
> > wrote:
> >
> > > Obviously I messed up with email thread...however I found a problem
> > > indexing my document via post.sh.
> > > This is basically my schema.xml:
> > >
> > > 
> > >  
> > > > > required="true" multiValued="false" />
> > > > > multiValued="true"/>
> > >
> > >  
> > >  url
> > >   
> > >  > > />
> > >  > > positionIncrementGap="0"/>
> > >  
> > > 
> > >
> > > and this is the document I tried to upload via post.sh:
> > >
> > > 
> > > 
> > >   http://test.example.org/first.html
> > >   1000
> > >   1000
> > >   1000
> > >   5000
> > > 
> > > 
> > >   http://test.example.org/second.html
> > >   1000
> > >   5000
> > > 
> > > 
> > >
> > > When playing with administration and debugging tools I discovered that
> > > searching for q=itemid:5000 gave me the same score for those docs, while
> > I
> > > was expecting different term frequencies between the first and the
> > second.
> > > In fact, using java to upload documents lead to correct results (3
> > > occurrences of item 1000 in the first doc and 1 in the second), e.g.:
> > > document1.addField("itemid", "1000");
> > > document1.addField("itemid", "1000");
> > > document1.addField("itemid", "1000");
> > >
> > > Am I right or am I missing something else?
> > >
> > >
> > > On Wed, Jun 26, 2013 at 5:18 PM, Jack Krupansky  > > >wrote:
> > >
> > > > If there is a bug... we should identify it. What's a sample post
> > command
> > > > that you issued?
> > > >
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > -Original Message- From: Flavio Pompermaier
> > > > Sent: Wednesday, June 26, 2013 10:53 AM
> > > >
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: URL search and indexing
> > > >
> > > > I was doing exactly that and, thanks to the administration page and
> > > > explanation/debugging, I checked if results were those expected.
> > > > Unfortunately, results were not correct submitting updates trough
> > post.sh
> > > > script (that use curl in the end).
> > > > Probably, if it founds the same tag (same value for the same
> > field-name),
> > > > it will collapse them.
> > > > Rewriting the same document in Java and submitting the updates did the
> > > > things work correctly.
> > > >
> > > > In my opinion this is a bug (of the entire process, then I don't know
> > it
> > > > this is a problem of curl or of the script itself).
> > > >
> > > > Best,
> > > > Flavio
> > > >
> > > > On Wed, Jun 26, 2013 at 4:18 PM, Erick Erickson <
> > erickerick...@gmail.com
> > > >*
> > > > *wrote:
> > > >
> > > >  Flavio:
> > > >>
> > > >> You mention that you're new to Solr, so I thought I'd make sure
> > > >> you know that the admin/analysis page is your friend! I flat
> > > >> guarantee that as you try to index/search following the suggestions
> > > >> you'll scratch your head at your results and you'll discover that
> > > >> the analysis process isn't doing quite what you expect. The
> > > >> admin/analysis page shows you the transformation of the input
> > > >> at each stage, i.e. how the input is tokenized, what transformations
> > > >> are applied to each token etc. It's invaluable!
> > > >>
> > > >> Best
> > > >> Erick
> > > >>
> > > >> P.S. Fe

Re: How spell checker used if indexed document is containing misspelled words

2013-06-28 Thread venkatesham.gu...@igate.com
Thanks for the replies.

I have already tried the options mentioned here; apparently those provide
suggestions for a query word which is incorrectly spelled. I am looking for a
feature where my query term is correct and I want results from documents that
contain both correctly spelled and incorrectly spelled matches of that term,
if any.

For example, I am searching for documents which have the term "headache"; now
I want the search results to include all the documents which contain the
correctly spelled "headache" and also any documents which contain a wrong
spelling or typo of headache like "headcahe" and so on.

Please provide any thoughts and ideas to resolve this problem.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-spell-checker-used-if-indexed-document-is-containing-misspelled-words-tp4070463p4073879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How spell checker used if indexed document is containing misspelled words

2013-06-28 Thread Upayavira
You're wanting to make your search more fuzzy. You could try phonetic
search, but that's very fuzzy. Go to the analysis tab in the admin UI.
Locate the 'phonetic' field type in the drop down, and you can see what
will happen to terms when they are converted to phonetic equivalents.
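
For reference, the stock example schema's 'phonetic' type is roughly along
these lines (a sketch; check your own schema.xml for the exact definition).
A fuzzy query such as headache~1 would be another, less drastic, option:

  <fieldType name="phonetic" class="solr.TextField" indexed="true" stored="false">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
    </analyzer>
  </fieldType>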

Upayavira

On Fri, Jun 28, 2013, at 10:29 AM, venkatesham.gu...@igate.com wrote:
> Thanks for the replies.
> 
> I have already tried options mentioned here, apparently those provide
> suggestions for the query word which is incorrectly spelled. I am looking
> a
> feature that - my query term is correct and I want the results in those
> documents both correct spelled term matches and incorrect spelled term
> matches if any - 
> 
> For example I am searching for documents which have the term "headache",
> now
> I want search results all the documents which have been correct spelled
> headache and also any documents which have been a wrong spelling or typos
> of
> headache like "headcahe" and so on.
> 
> Please provide any thoughts and ideas to resolve this problem.
> 
> Thanks.
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-spell-checker-used-if-indexed-document-is-containing-misspelled-words-tp4070463p4073879.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Context search in solr

2013-06-28 Thread venkatesham.gu...@igate.com
My search query has multiple words, ranging from 3 to 8, and a context
attached to it. I am looking for search result documents which should
contain all the terms in the query, and the terms in the document
should also relate to, or share, a similar context.

For example: my search query is "Low blood pressure"
Documents which have part of text like below
1. I am having lower blood pressure.
2. my blood pressure was very low
3. I also take atenolol and norvasc for high blood pressure but never heard
of the protonix causing low magnesium

If I search for all words in the document, the result will have 3 documents,
but the 3rd one, even though it is a keyword match, has no connection with "Low
blood pressure".
If I do a phrase search for "Low blood pressure", the result will have
only 1 document; the other 2 will not match even though the 2nd document is a
probable match.

To make my search a little intelligent about context, what features does Solr
provide?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Context-search-in-solr-tp4073882.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Context search in solr

2013-06-28 Thread Upayavira
You might use proximity. "low blood pressure"~6 might match #1 and #2
but not #3.

It says: find phrases that require six or fewer position moves in order to
match my terms as a phrase.

Upayavira

On Fri, Jun 28, 2013, at 11:10 AM, venkatesham.gu...@igate.com wrote:
> My search query is having multiple words ranging from 3 to 8 and a
> context
> attached to it. I am looking for the search result documents which should
> have all the terms which are there in query and also terms in the
> document
> should relate or have the similar context.
> 
> For example: my search query is "Low blood pressure"
> Documents which have part of text like below
> 1. I am having lower blood pressure.
> 2. my blood pressure was very low
> 3. I also take atenolol and norvasc for high blood pressure but never
> heard
> of the protonix causing low magnesium
> 
> If I search for all words in the document, result will have 3 documents
> but
> 3rd one even though its a keyword match but it has no context with "Low
> blood pressure".
> If I search for phrase search "Low blood pressure", the result will have
> only 1 document, other 2 will not match even though 2nd document is a
> probable match
> 
> to make my search little intelligence on context, what are the features
> solr
> provides.
> 
> Thanks.
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Context-search-in-solr-tp4073882.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why solr 4.0 use FSIndexOutput to write file, otherwise MMap/NIO

2013-06-28 Thread Michael McCandless
Output is quite a bit simpler than input because all we do is write a
single stream of bytes with no seeking ("append only"), and it's done
with only one thread, so I don't think there'd be much to gain by
using the newer IO APIs for writing...

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jun 28, 2013 at 2:23 AM, Jeffery Wang
 wrote:
>
> I have checked FSDirectory; it will create an "MMapDirectory" or
> "NIOFSDirectory" for the Directory.
> These two directories only supply an IndexInput extension for reading files
> (MMapIndexInput extends ByteBufferIndexInput);
> why is there no MMap/NIO IndexOutput extension for writing files? Only
> FSIndexOutput is used for file writes (FSIndexOutput extends BufferedIndexOutput).
>
> Is FSIndexOutput much slower than MMap/NIO for writing files? How can I
> improve the IO write performance?
>
> Thanks,
> __
> Jeffery Wang
> Application Service - Backend
> Morningstar (Shenzhen) Ltd.
> Morningstar. Illuminating investing worldwide.
> +86 755 3311 0220 Office
> +86 130 7782 2813 Mobile
> jeffery.w...@morningstar.com
>


Re: Solr admin search with wildcard

2013-06-28 Thread Erick Erickson
This is a no-op, or rather I'm not sure what it does:



This is the key:

<copyField source="iframe" dest="text"/>
But be aware that if you copy anything
else into the "text" field you'll be searching
there too.

Now you can search the "text" field. Assuming
this is from the example, the text field uses the
"text_general" fieldType, which is defined to
use the StandardTokenizerFactory to break up
the incoming stream. Take a look at the javadocs
and/or
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory

The admin/analysis page will show you exactly what each
step in an analyzer chain does to the input, you _really_
want to get familiar with that

One final note: depending on your use-case,
you may not need any copyField at all, just
use the "text_general" type for your iframe
field. If you choose this, be sure to delete
your index and re-index from scratch...
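
For reference, a sketch of the schema.xml pieces being discussed (the
iframe/text field names follow this thread; the exact attributes are
illustrative):

  <field name="iframe" type="string" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="iframe" dest="text"/>

With that in place, a plain query like text:youtube should match without any
leading wildcard.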


Best
Erick




On Thu, Jun 27, 2013 at 9:41 AM, Amit Sela  wrote:

> Forgive my ignorance but I want to be sure, do I add <copyField source="iframe" dest="text"/> to solrindex-mapping.xml?
> so that my solrindex-mapping.xml looks like this:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> * *
> 
> url
>
> And what do you mean by standard tokenization ?
>
> Thanks!
>
>
> On Thu, Jun 27, 2013 at 3:43 PM, Jack Krupansky  >wrote:
>
> > Just copyField from the string field to a "text" field and use standard
> > tokenization, then you can search the text field for "youtube" or even
> > "something" that is a component of the URL path. No wildcard required.
> >
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Amit Sela
> > Sent: Thursday, June 27, 2013 8:37 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr admin search with wildcard
> >
> >
> > The stored and indexed string is actually a url like
> > "http://www.youtube.com/somethingsomething".
> > It looks like removing the quotes does the job: iframe:*youtube* or am I
> > wrong? For now, performance is not an issue, but accuracy is, and I would
> > like to know, for example, how many URLs have an iframe source leading to
> > YouTube. So a query like iframe:*youtube* with max rows 10 or
> > something will return in the response numFound field the total number of
> > pages that have an iframe tag with a source matching *youtube, no?
> >
> >
> > On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky  >*
> > *wrote:
> >
> >  No, you cannot use wildcards within a quoted term.
> >>
> >> Tell us a little more about what your strings look like. You might want
> to
> >> consider tokenizing or using ngrams to avoid the need for wildcards.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Amit Sela
> >> Sent: Thursday, June 27, 2013 3:33 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr admin search with wildcard
> >>
> >>
> >> I'm looking to search (in the solr admin search screen) a certain field
> >> for:
> >>
> >> *youtube*
> >>
> >> I know that leading wildcards takes a lot of resources but I'm not
> worried
> >> with that
> >>
> >> My only question is about the syntax, would this work:
> >>
> >> field:"*youtube*" ?
> >>
> >> Thanks,
> >>
> >> I'm using Solr 3.6.2
> >>
> >>
> >
>


Re: Filter queries taking a long time, even with cache disabled

2013-06-28 Thread Erick Erickson
I'm guessing you're well aware that the example you
gave is parsed as search_field:love default_field:obama.

Not that it's pertinent: there's nothing here that
looks like it should take any time at all, to say
nothing of 120 seconds.

So start with &debug=query and see what the
filter query is parsed as, especially the section
"parsed_filter_queries" in the debug output.

Best
Erick


On Thu, Jun 27, 2013 at 10:01 AM, Dotan Cohen  wrote:

> On Thu, Jun 27, 2013 at 12:14 PM, Upayavira  wrote:
> > can you give an example?
> >
>
> Thank you. This is an example query:
> select
> ?q=search_field:iraq
> &fq={!cache=false}search_field:love%20obama
> &defType=edismax
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>


Re: Field Query After Collapse.Field?

2013-06-28 Thread Erick Erickson
bq: Is there anyway to perform the field query after the results are
collapsed?

I'm not quite sure what you mean here. The intent of fq
clauses is that they apply to the entire query before
anything else, including field collapsing (and I'm
assuming you mean group.field, not collapse.field).

What's the higher-level use-case you're trying to satisfy?

Best
Erick


On Thu, Jun 27, 2013 at 1:34 PM, slevytam wrote:

> Hello,
>
> I've been struggling to find a way to query after collapse.field is performed
> and I'm hoping someone can help.
>
> I'm doing a multiple core(index) search which generates results that can
> have varying fields.
> ex.
> entry_id, entry_starred
> entry_id, entry_read
>
> I perform a collapse.field on entry_id which yields:
> ex. entry_id, entry_starred, entry_read
>
> But if I try to do a fq on one of the fields
> ex. fq=!entry_read:1
>
> The fq is performed before the collapse leading to incorrect results.
>
> Is there any way to perform the field query after the results are collapsed?
>
> Thanks,
>
> slevytam
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solrj indexing using embedded solr is slow

2013-06-28 Thread Erick Erickson
First, how much slower? 2x? 10x? 1.1x?

When using embedded, you're doing all the
work you were doing on two machines on a
single machine, so my first question would
be how is your CPU performing? Is it maxed?

Best
Erick


On Thu, Jun 27, 2013 at 1:59 PM, Learner  wrote:

> Shawn,
>
> Thanks a lot for your reply.
>
> I have pasted my entire code below, it would be great if you can let me
> know
> if I am doing anything wrong in terms of running the code in multithreaded
> environment.
>
> http://pastebin.com/WRLn3yWn
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636p4073711.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Stemming query in Solr

2013-06-28 Thread Erick Erickson
First, this is for the Java version, I hope it extends to C#.

But in your configuration, when you're indexing the stemmer
should be storing the reduced form in the index. Then, when
searching, the search should be against the reduced term.
To check this, try
1> Using the Admin/Analysis page to see what gets stored
 in your index and what your query is transformed to, to
 ensure that you're getting what you expect.

If you want to get in deeper to the details, try
1> use, say, the TermsComponent or Admin/Schema Browser
 or Luke to look in your index and see what's actually
there.
2> use &debug=query or Admin/Analysis to see what the query
actually looks like.

Both your use-cases should work fine just with reduction
_unless_ the particular word you look for doesn't happen to
trip the stemmer. By that I mean that since it's algorithmically
based, there may be some edge cases that seem like they
should be reduced that aren't. I don't know whether "fisherman"
would reduce to "fish" for instance.

So are you seeing things that really don't work as expected or
are you just working from the docs? Because I really don't
see why you wouldn't get what you want given your description.
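
For reference, a sketch of a stemming field type, with an optional index-time
synonym expansion as the wiki's lemmatization workaround (the type name and
the synonyms.txt entries are hypothetical):

  <fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- hypothetical synonyms.txt line: fisherman => fisherman, fish -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>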

Best
Erick


On Fri, Jun 28, 2013 at 2:33 AM, snkar  wrote:

> We have a search system based on Solr using the Solrnet library in C# which
> supports some advanced search features like Fuzzy, Synonym and Stemming.
> While all of these work, the expectation from the Stemming Search seems to
> be a combination of stemming by reduction as well as stemming by expansion
> to cover grammatical variations on a word. A use case will make it more
> clear:
>
>  - a search for fish would also find fishing
>  - a search for applied would also find applying, applies, and apply
>
> We had implemented Stemming using a CopyField with
> SnowballPorterFilterFactory. As a result, when searching for burning the
> results return matches for burning and burn, but when searching for burn
> the results do not return matches for burning or burnt or burns.
>
> Since all the stemmers supported by Lucene/Solr use stemming by reduction, we
> are not sure how to go about this. As per the Solr Wiki:
>
> > A related technology to stemming is lemmatization, which allows for
> > "stemming" by expansion, taking a root word and 'expanding' it to all of
> > its various forms. Lemmatization can be used either at insertion time or
> > at query time. Lucene/Solr does not have built-in support for
> > lemmatization but it can be simulated by using your own dictionaries and
> > the SynonymFilterFactory
>
> We are not sure of exactly how to go about this in Solr. Any ideas.
>
> We were also thinking in terms of using some C# based stemmer/lemmatizer
> library to get the root of the word and using some public database like
> WordNet to extract the different grammatical variations of the stem and
> then
> send across all these terms for querying in Solr. We have not yet done a
> lot
> of research to figure out a stable C# stemmer/lemmatizer and a WordNet C#
> API, but seems like this will get too convoluted and it should have a way
> to
> be executed from within Solr.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Stemming-query-in-Solr-tp4073862.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Context search in solr

2013-06-28 Thread Erick Erickson
One variant on Upayavira's comment would be to use
the proximity as a boost query. That way all three would
match, but the first two would get higher scores.

Either way should work though.
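
A sketch of what that might look like with edismax (the field name "text" and
the boost value are assumptions, not from the thread):

  q=low blood pressure
  &defType=edismax
  &qf=text
  &mm=100%
  &bq=text:"low blood pressure"~6^4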

Best
Erick


On Fri, Jun 28, 2013 at 6:29 AM, Upayavira  wrote:

> you might use proximity. "low blood pressure"~6 might match #1 and #2
> but not #3.
>
> It says find phrases that require six or less position moves in order to
> match my terms as a phrase.
>
> Upayavira
>
> On Fri, Jun 28, 2013, at 11:10 AM, venkatesham.gu...@igate.com wrote:
> > My search query is having multiple words ranging from 3 to 8 and a
> > context
> > attached to it. I am looking for the search result documents which should
> > have all the terms which are there in query and also terms in the
> > document
> > should relate or have the similar context.
> >
> > For example: my search query is "Low blood pressure"
> > Documents which have part of text like below
> > 1. I am having lower blood pressure.
> > 2. my blood pressure was very low
> > 3. I also take atenolol and norvasc for high blood pressure but never
> > heard
> > of the protonix causing low magnesium
> >
> > If I search for all words in the document, result will have 3 documents
> > but
> > 3rd one even though its a keyword match but it has no context with "Low
> > blood pressure".
> > If I search for phrase search "Low blood pressure", the result will have
> > only 1 document, other 2 will not match even though 2nd document is a
> > probable match
> >
> > to make my search little intelligence on context, what are the features
> > solr
> > provides.
> >
> > Thanks.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Context-search-in-solr-tp4073882.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Context search in solr

2013-06-28 Thread Jack Krupansky
The other trick that the relevancy experts know about: you get better top
results using an "OR" query of the base terms combined with an "OR" of the
proximity phrases of the terms, if you are willing to accept that there may
be less-than-desirable results further down the list. Sure, people don't
like seeing the mis-matched results in the list and a larger number of
results, but it's all a tradeoff to assure that the most relevant results
rank higher while exact matching is a little looser.
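
A sketch of that style of query with the standard parser (field name assumed);
the phrase clause only boosts ranking when it matches, while the bare terms
keep recall high:

  q=text:(low blood pressure) OR text:"low blood pressure"~6^4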


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Friday, June 28, 2013 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Context search in solr

One variant on Upayavira's comment would be to use
the proximity as a boost query. That way all three would
match, but the first two would get higher scores.

Either way should work though.

Best
Erick


On Fri, Jun 28, 2013 at 6:29 AM, Upayavira  wrote:


you might use proximity. "low blood pressure"~6 might match #1 and #2
but not #3.

It says find phrases that require six or less position moves in order to
match my terms as a phrase.

Upayavira

On Fri, Jun 28, 2013, at 11:10 AM, venkatesham.gu...@igate.com wrote:
> My search query is having multiple words ranging from 3 to 8 and a
> context
> attached to it. I am looking for the search result documents which 
> should

> have all the terms which are there in query and also terms in the
> document
> should relate or have the similar context.
>
> For example: my search query is "Low blood pressure"
> Documents which have part of text like below
> 1. I am having lower blood pressure.
> 2. my blood pressure was very low
> 3. I also take atenolol and norvasc for high blood pressure but never
> heard
> of the protonix causing low magnesium
>
> If I search for all words in the document, result will have 3 documents
> but
> 3rd one even though its a keyword match but it has no context with "Low
> blood pressure".
> If I search for phrase search "Low blood pressure", the result will have
> only 1 document, other 2 will not match even though 2nd document is a
> probable match
>
> to make my search little intelligence on context, what are the features
> solr
> provides.
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Context-search-in-solr-tp4073882.html
> Sent from the Solr - User mailing list archive at Nabble.com.





Re: Field Query After Collapse.Field?

2013-06-28 Thread slevytam
Hi Erick,

I actually did mean collapse.field, as per:
http://blog.trifork.com/2009/10/20/result-grouping-field-collapsing-with-solr/

At a high level, I am trying to avoid the use of a join between a list of
entries and a list of actions that users have performed on an entry (since
joins are not supported by distributed search).

So I have a list of entries
ie. entry_id, entry_content, etc

And a list of actions users have performed on the entry
ie. entry_id, entry_read, entry_starred

I'm trying to combine these for pagination purposes.  By doing a search for
entry_id across the two cores (indexes) and then doing a collapse.field, I
am able to get this nice list of results.  However, I cannot figure out a
way to then filter that list since q and fq happen before the collapse.

Thanks,

Shalom



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691p4073928.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field Query After Collapse.Field?

2013-06-28 Thread Erick Erickson
Well, now I'm really puzzled. The link you referenced was from when
grouping/field collapsing was under development. I did a quick look
through the entire 4x code base for "collapse" and there's no place
I saw that looks like it accepts that parameter. Of course I may have
just missed it.

What version of Solr are you using? Have you done anything special
to it? Can you cut/paste your response, or at least the relevant bits that
show the effects of specifying collapse.field?

Best
Erick


On Fri, Jun 28, 2013 at 12:19 PM, slevytam wrote:

> Hi Erick,
>
> I actually did mean collapse.field, as per:
>
> http://blog.trifork.com/2009/10/20/result-grouping-field-collapsing-with-solr/
>
> On high level I am trying to avoid the use of a join between a list of
> entries and a list of actions that users have performed on a entry (since
> it's not supported by distributed search).
>
> So I have a list of entries
> ie. entry_id, entry_content, etc
>
> And a list of actions users have performed on the entry
> ie. entry_id, entry_read, entry_starred
>
> I'm trying to combine these for pagination purposes.  By doing a search for
> entry_id across the two cores (indexes) and then doing a collapse.field, I
> am able to get this nice list of results.  However, I cannot figure out a
> way to then filter that list since q and fq happen before the collapse.
>
> Thanks,
>
> Shalom
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691p4073928.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Replicating files containing external file fields

2013-06-28 Thread Arun Rangarajan
Erick,
Thx for your reply. The external file field files are already listed under
confFiles in solrconfig.xml. They are not getting replicated.
(Solr version 4.2.1.)
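
For reference, a sketch of the kind of directive in question; the stopwords
entry and the relative path into the data dir are hypothetical, and whether
such relative paths are honoured is exactly the open question here:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt,../data/external_myfield.txt</str>
    </lst>
  </requestHandler>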


On Thu, Jun 27, 2013 at 10:50 AM, Erick Erickson wrote:

> Haven't tried this, but I _think_ you can use the
> "confFiles" trick with relative paths, see:
> http://wiki.apache.org/solr/SolrReplication
>
> Or just put your EFF files in the data dir?
>
> Best
> Erick
>
>
> On Wed, Jun 26, 2013 at 9:01 PM, Arun Rangarajan
> wrote:
>
> > From https://wiki.apache.org/solr/SolrReplication I understand that
> index
> > dir and any files under the conf dir can be replicated to slaves. I want
> to
> > know if there is any way the files under the data dir containing external
> > file fields can be replicated. These are not replicated by default.
> > Currently we are running the ext file field reload script on both the
> > master and the slave and then running reloadCache on each server once
> they
> > are loaded.
> >
>


Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
Hi,

I'm using lucene and solr right now in a production environment with an
index of about a million docs. I'm working on a recommender that basically
would list the n most similar items to the user based on the current item
he is viewing.

I've been thinking of using solr/lucene since I already have all docs
available and I want a quick version that can be deployed while we work on
a more robust recommender. How about overriding the default similarity so
that it scores documents based on the euclidean distance of normalized item
attributes and then using a morelikethis component to pass in the
attributes of the item for which I want to generate recommendations? I know
it has its issues like recomputing scores/normalization/weight application
at query time which could make this idea unfeasible/impractical. I'm at a
very preliminary stage right now with this and would love some suggestions
from experienced users.

thank you,

Luis Guerrero


Re: Replicating files containing external file fields

2013-06-28 Thread Jack Krupansky
Show us your confFiles directive. Maybe there is some subtle error in the
file name.


-- Jack Krupansky

-Original Message- 
From: Arun Rangarajan

Sent: Friday, June 28, 2013 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Replicating files containing external file fields

Erick,
Thx for your reply. The external file field fields are already under
 specified in solrconfig.xml. They are not getting replicated.
(Solr version 4.2.1.)


On Thu, Jun 27, 2013 at 10:50 AM, Erick Erickson 
wrote:



Haven't tried this, but I _think_ you can use the
"confFiles" trick with relative paths, see:
http://wiki.apache.org/solr/SolrReplication

Or just put your EFF files in the data dir?

Best
Erick


On Wed, Jun 26, 2013 at 9:01 PM, Arun Rangarajan
wrote:

> From https://wiki.apache.org/solr/SolrReplication I understand that
index
> dir and any files under the conf dir can be replicated to slaves. I want
to
> know if there is any way the files under the data dir containing 
> external

> file fields can be replicated. These are not replicated by default.
> Currently we are running the ext file field reload script on both the
> master and the slave and then running reloadCache on each server once
they
> are loaded.
>





Error- missing sfield for spatial request

2013-06-28 Thread Learner
I am trying to combine a geospatial query (latlong) with the query below inside
a search component, but I am getting the error below.

*Error:*


missing sfield for spatial request
400



  
 (
   _query_:"{!wp_optional df='addr_location_clean_i' qs=1 v=$fps_where}"^6.2 OR
   _query_:"{!wp_optional df='addr_location_i' qs=1 v=$fps_where}"^6.2 OR
   _query_:"{!geofilt}"&sfield=latlong&pt=$fps_latlong&d=50&sort=geodist() asc
 )


I initially thought that it might be an issue with latlong field, but this
query works fine.

*:*&fq=_query_:%22{!geofilt}%22&sfield=latlong&pt=47.601234,-122.330466&d=50&sort=geodist()%20asc


Can someone let me know how to form the geospatial (LatLonType) query in
search component?
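
One thing worth checking (an untested sketch, not a confirmed fix): {!geofilt}
reads sfield/pt/d from local params or from real request parameters, so inside
a nested _query_ clause they would usually be written as local params rather
than appended with "&", e.g.:

  _query_:"{!geofilt sfield=latlong pt=$fps_latlong d=50}"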



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-missing-sfield-for-spatial-request-tp4073940.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Content based recommender using lucene/solr

2013-06-28 Thread Saikat Kanjilal
Why not just use mahout to do this, there is an item similarity algorithm in 
mahout that does exactly this :)

https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html

You can use mahout in distributed and non-distributed mode as well.

> From: lcguerreroc...@gmail.com
> Date: Fri, 28 Jun 2013 12:16:57 -0500
> Subject: Content based recommender using lucene/solr
> To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> 
> Hi,
> 
> I'm using lucene and solr right now in a production environment with an
> index of about a million docs. I'm working on a recommender that basically
> would list the n most similar items to the user based on the current item
> he is viewing.
> 
> I've been thinking of using solr/lucene since I already have all docs
> available and I want a quick version that can be deployed while we work on
> a more robust recommender. How about overriding the default similarity so
> that it scores documents based on the euclidean distance of normalized item
> attributes and then using a morelikethis component to pass in the
> attributes of the item for which I want to generate recommendations? I know
> it has its issues like recomputing scores/normalization/weight application
> at query time which could make this idea unfeasible/impractical. I'm at a
> very preliminary stage right now with this and would love some suggestions
> from experienced users.
> 
> thank you,
> 
> Luis Guerrero
  

Re: Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
Hey saikat, thanks for your suggestion. I've looked into mahout and other
alternatives for computing k nearest neighbors. I would have to run a job
to compute the k nearest neighbors and track them in the index for
retrieval. I wanted to see if this was something I could do with lucene
using lucene's scoring function and solr's morelikethis component. The job
you specifically mention is for Item based recommendation which would
require me to track the different items users have viewed. I'm looking for
a content based approach where I would use a distance measure to establish
how near items are (how similar) and have some kind of training phase to
adjust weights.


On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal wrote:

> Why not just use mahout to do this, there is an item similarity algorithm
> in mahout that does exactly this :)
>
>
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>
> You can use mahout in distributed and non-distributed mode as well.
>
> > From: lcguerreroc...@gmail.com
> > Date: Fri, 28 Jun 2013 12:16:57 -0500
> > Subject: Content based recommender using lucene/solr
> > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> >
> > Hi,
> >
> > I'm using lucene and solr right now in a production environment with an
> > index of about a million docs. I'm working on a recommender that
> basically
> > would list the n most similar items to the user based on the current item
> > he is viewing.
> >
> > I've been thinking of using solr/lucene since I already have all docs
> > available and I want a quick version that can be deployed while we work
> on
> > a more robust recommender. How about overriding the default similarity so
> > that it scores documents based on the euclidean distance of normalized
> item
> > attributes and then using a morelikethis component to pass in the
> > attributes of the item for which I want to generate recommendations? I
> know
> > it has its issues like recomputing scores/normalization/weight
> application
> > at query time which could make this idea unfeasible/impractical. I'm at a
> > very preliminary stage right now with this and would love some
> suggestions
> > from experienced users.
> >
> > thank you,
> >
> > Luis Guerrero
>
>



-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047


broken links returned from solr search

2013-06-28 Thread MA LIG
Hello,

I ran the solr example as described in
http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some doc
files to solr as described in
http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I used
to load the files were of the form

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@test.doc"

I can successfully see search results in
http://localhost:8983/solr/collection1/browse
.

However, when I click on a link, I get a 404 not found error. How can I
make these links work properly?

Thanks in advance

-gw


Re: Content based recommender using lucene/solr

2013-06-28 Thread Otis Gospodnetic
Hi,

Have a look at http://www.youtube.com/watch?v=13yQbaW2V4Y .  I'd say
it's easier than Mahout, especially if you already have and know your
way around Solr.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 2:02 PM, Luis Carlos Guerrero Covo
 wrote:
> Hey saikat, thanks for your suggestion. I've looked into mahout and other
> alternatives for computing k nearest neighbors. I would have to run a job
> and computer the k nearest neighbors and track them in the index for
> retrieval. I wanted to see if this was something I could do with lucene
> using lucene's scoring function and solr's morelikethis component. The job
> you specifically mention is for Item based recommendation which would
> require me to track the different items users have viewed. I'm looking for
> a content based approach where I would use a distance measure to establish
> how near items are (how similar) and have some kind of training phase to
> adjust weights.
>
>
> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal wrote:
>
>> Why not just use mahout to do this, there is an item similarity algorithm
>> in mahout that does exactly this :)
>>
>>
>> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>>
>> You can use mahout in distributed and non-distributed mode as well.
>>
>> > From: lcguerreroc...@gmail.com
>> > Date: Fri, 28 Jun 2013 12:16:57 -0500
>> > Subject: Content based recommender using lucene/solr
>> > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
>> >
>> > Hi,
>> >
>> > I'm using lucene and solr right now in a production environment with an
>> > index of about a million docs. I'm working on a recommender that
>> basically
>> > would list the n most similar items to the user based on the current item
>> > he is viewing.
>> >
>> > I've been thinking of using solr/lucene since I already have all docs
>> > available and I want a quick version that can be deployed while we work
>> on
>> > a more robust recommender. How about overriding the default similarity so
>> > that it scores documents based on the euclidean distance of normalized
>> item
>> > attributes and then using a morelikethis component to pass in the
>> > attributes of the item for which I want to generate recommendations? I
>> know
>> > it has its issues like recomputing scores/normalization/weight
>> application
>> > at query time which could make this idea unfeasible/impractical. I'm at a
>> > very preliminary stage right now with this and would love some
>> suggestions
>> > from experienced users.
>> >
>> > thank you,
>> >
>> > Luis Guerrero
>>
>>
>
>
>
> --
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047


RE: Content based recommender using lucene/solr

2013-06-28 Thread Saikat Kanjilal
You could build a custom recommender in mahout to accomplish this, also just 
out of curiosity why the content based approach as opposed to building a 
recommender based on co-occurence.  One other thing, what is your data size, 
are you looking at scale where you need something like hadoop?

> From: lcguerreroc...@gmail.com
> Date: Fri, 28 Jun 2013 13:02:00 -0500
> Subject: Re: Content based recommender using lucene/solr
> To: solr-user@lucene.apache.org
> CC: java-u...@lucene.apache.org
> 
> Hey saikat, thanks for your suggestion. I've looked into mahout and other
> alternatives for computing k nearest neighbors. I would have to run a job
> and computer the k nearest neighbors and track them in the index for
> retrieval. I wanted to see if this was something I could do with lucene
> using lucene's scoring function and solr's morelikethis component. The job
> you specifically mention is for Item based recommendation which would
> require me to track the different items users have viewed. I'm looking for
> a content based approach where I would use a distance measure to establish
> how near items are (how similar) and have some kind of training phase to
> adjust weights.
> 
> 
> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal wrote:
> 
> > Why not just use mahout to do this, there is an item similarity algorithm
> > in mahout that does exactly this :)
> >
> >
> > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
> >
> > You can use mahout in distributed and non-distributed mode as well.
> >
> > > From: lcguerreroc...@gmail.com
> > > Date: Fri, 28 Jun 2013 12:16:57 -0500
> > > Subject: Content based recommender using lucene/solr
> > > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> > >
> > > Hi,
> > >
> > > I'm using lucene and solr right now in a production environment with an
> > > index of about a million docs. I'm working on a recommender that
> > basically
> > > would list the n most similar items to the user based on the current item
> > > he is viewing.
> > >
> > > I've been thinking of using solr/lucene since I already have all docs
> > > available and I want a quick version that can be deployed while we work
> > on
> > > a more robust recommender. How about overriding the default similarity so
> > > that it scores documents based on the euclidean distance of normalized
> > item
> > > attributes and then using a morelikethis component to pass in the
> > > attributes of the item for which I want to generate recommendations? I
> > know
> > > it has its issues like recomputing scores/normalization/weight
> > application
> > > at query time which could make this idea unfeasible/impractical. I'm at a
> > > very preliminary stage right now with this and would love some
> > suggestions
> > > from experienced users.
> > >
> > > thank you,
> > >
> > > Luis Guerrero
> >
> >
> 
> 
> 
> -- 
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047
  

Re: Content based recommender using lucene/solr

2013-06-28 Thread Walter Underwood
More Like This already is kNN. It extracts features from the document (makes a 
query), and runs that query against the collection.

If you want the items most similar to the current item, use MLT.
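
For illustration, a sketch of an MLT handler request (the handler path must be
registered in solrconfig.xml; the id and field names here are made up):

  /solr/collection1/mlt?q=id:12345
    &mlt.fl=title,description,attributes
    &mlt.mintf=1&mlt.mindf=1
    &fl=id,score&rows=10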

wunder

On Jun 28, 2013, at 11:02 AM, Luis Carlos Guerrero Covo wrote:

> Hey saikat, thanks for your suggestion. I've looked into mahout and other
> alternatives for computing k nearest neighbors. I would have to run a job
> and computer the k nearest neighbors and track them in the index for
> retrieval. I wanted to see if this was something I could do with lucene
> using lucene's scoring function and solr's morelikethis component. The job
> you specifically mention is for Item based recommendation which would
> require me to track the different items users have viewed. I'm looking for
> a content based approach where I would use a distance measure to establish
> how near items are (how similar) and have some kind of training phase to
> adjust weights.
> 
> 
> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal wrote:
> 
>> Why not just use mahout to do this, there is an item similarity algorithm
>> in mahout that does exactly this :)
>> 
>> 
>> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>> 
>> You can use mahout in distributed and non-distributed mode as well.
>> 
>>> From: lcguerreroc...@gmail.com
>>> Date: Fri, 28 Jun 2013 12:16:57 -0500
>>> Subject: Content based recommender using lucene/solr
>>> To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
>>> 
>>> Hi,
>>> 
>>> I'm using lucene and solr right now in a production environment with an
>>> index of about a million docs. I'm working on a recommender that
>> basically
>>> would list the n most similar items to the user based on the current item
>>> he is viewing.
>>> 
>>> I've been thinking of using solr/lucene since I already have all docs
>>> available and I want a quick version that can be deployed while we work
>> on
>>> a more robust recommender. How about overriding the default similarity so
>>> that it scores documents based on the euclidean distance of normalized
>> item
>>> attributes and then using a morelikethis component to pass in the
>>> attributes of the item for which I want to generate recommendations? I
>> know
>>> it has its issues like recomputing scores/normalization/weight
>> application
>>> at query time which could make this idea unfeasible/impractical. I'm at a
>>> very preliminary stage right now with this and would love some
>> suggestions
>>> from experienced users.
>>> 
>>> thank you,
>>> 
>>> Luis Guerrero
>> 
>> 
> 
> 
> 
> -- 
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047

--
Walter Underwood
wun...@wunderwood.org





Re: Field Query After Collapse.Field?

2013-06-28 Thread Bryan Bende
Can you just use two queries to achieve the desired results ?

Query1 to get all actions where !entry_read:1 for some range of rows (your
page size)
Query2 to get all the entries with an entry_id in the results of Query1

The second query would be very direct and only query for a set of entries
equal to your page size.
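
A sketch of the two-step approach (core names, field names and ids here are
hypothetical):

  1) /solr/actions/select?q=*:*&fq=-entry_read:1&fl=entry_id&rows=20&start=0
  2) /solr/entries/select?q=entry_id:(101 OR 205 OR 317)&rows=20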


On Fri, Jun 28, 2013 at 12:51 PM, Erick Erickson wrote:

> Well, now I'm really puzzled. The link you referenced was from when
> grouping/field collapsing was under development. I did a quick look
> through the entire 4x code base fo "collapse" and there's no place
> I saw that looks like it accepts that parameter. Of course I may have
> just missed it.
>
> What version of Solr are you using? Have you done anything special
> to it? Can you cut/paste your response, or at least the relevant bits that
> show the effects of specifying collapse.field?
>
> Best
> Erick
>
>
> On Fri, Jun 28, 2013 at 12:19 PM, slevytam  >wrote:
>
> > Hi Erick,
> >
> > I actually did mean collapse.field, as per:
> >
> >
> http://blog.trifork.com/2009/10/20/result-grouping-field-collapsing-with-solr/
> >
> > On high level I am trying to avoid the use of a join between a list of
> > entries and a list of actions that users have performed on a entry (since
> > it's not supported by distributed search).
> >
> > So I have a list of entries
> > ie. entry_id, entry_content, etc
> >
> > And a list of actions users have performed on the entry
> > ie. entry_id, entry_read, entry_starred
> >
> > I'm trying to combine these for pagination purposes.  By doing a search
> for
> > entry_id across the two cores (indexes) and then doing a collapse.field,
> I
> > am able to get this nice list of results.  However, I cannot figure out a
> > way to then filter that list since q and fq happen before the collapse.
> >
> > Thanks,
> >
> > Shalom
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691p4073928.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Content based recommender using lucene/solr

2013-06-28 Thread Otis Gospodnetic
Hi,

It doesn't have to be one or the other.  In the past I've built a news
recommender engine based on CF (Mahout) and combined it with Content
Similarity-based engine (wasn't Solr/Lucene, but something custom that
worked with ngrams, but it may have as well been Lucene/Solr/ES).  It
worked well.  If you haven't worked with Mahout before I'd suggest the
approach in that video and going from there to Mahout only if it's
limiting.

See Ted's stuff on this topic, too:
http://www.slideshare.net/tdunning/search-as-recommendation +
http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
(note: Mahout, Solr, Pig)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal  wrote:
> You could build a custom recommender in mahout to accomplish this, also just 
> out of curiosity why the content based approach as opposed to building a 
> recommender based on co-occurence.  One other thing, what is your data size, 
> are you looking at scale where you need something like hadoop?
>
>> From: lcguerreroc...@gmail.com
>> Date: Fri, 28 Jun 2013 13:02:00 -0500
>> Subject: Re: Content based recommender using lucene/solr
>> To: solr-user@lucene.apache.org
>> CC: java-u...@lucene.apache.org
>>
>> Hey saikat, thanks for your suggestion. I've looked into mahout and other
>> alternatives for computing k nearest neighbors. I would have to run a job
>> and computer the k nearest neighbors and track them in the index for
>> retrieval. I wanted to see if this was something I could do with lucene
>> using lucene's scoring function and solr's morelikethis component. The job
>> you specifically mention is for Item based recommendation which would
>> require me to track the different items users have viewed. I'm looking for
>> a content based approach where I would use a distance measure to establish
>> how near items are (how similar) and have some kind of training phase to
>> adjust weights.
>>
>>
>> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal wrote:
>>
>> > Why not just use mahout to do this, there is an item similarity algorithm
>> > in mahout that does exactly this :)
>> >
>> >
>> > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>> >
>> > You can use mahout in distributed and non-distributed mode as well.
>> >
>> > > From: lcguerreroc...@gmail.com
>> > > Date: Fri, 28 Jun 2013 12:16:57 -0500
>> > > Subject: Content based recommender using lucene/solr
>> > > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
>> > >
>> > > Hi,
>> > >
>> > > I'm using lucene and solr right now in a production environment with an
>> > > index of about a million docs. I'm working on a recommender that
>> > basically
>> > > would list the n most similar items to the user based on the current item
>> > > he is viewing.
>> > >
>> > > I've been thinking of using solr/lucene since I already have all docs
>> > > available and I want a quick version that can be deployed while we work
>> > on
>> > > a more robust recommender. How about overriding the default similarity so
>> > > that it scores documents based on the euclidean distance of normalized
>> > item
>> > > attributes and then using a morelikethis component to pass in the
>> > > attributes of the item for which I want to generate recommendations? I
>> > know
>> > > it has its issues like recomputing scores/normalization/weight
>> > application
>> > > at query time which could make this idea unfeasible/impractical. I'm at a
>> > > very preliminary stage right now with this and would love some
>> > suggestions
>> > > from experienced users.
>> > >
>> > > thank you,
>> > >
>> > > Luis Guerrero
>> >
>> >
>>
>>
>>
>> --
>> Luis Carlos Guerrero Covo
>> M.S. Computer Engineering
>> (57) 3183542047
>


Re: Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
I only have about a million docs right now, so scaling is not a big issue.
I'm looking to provide a quick implementation and then worry about scale
when I get around to implementing a more robust recommender. I'm looking at
a content-based approach because we are not tracking users or the items viewed
by users. I was thinking of using MoreLikeThis like Walter mentioned, but
wanted some feedback on the nuances required for a proper implementation,
like having a similarity based on Euclidean distance, normalizing numerical
field values, and computing collection-wide stats like mean and variance.
Thank you for the link, Otis, I will watch it right away.
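
For concreteness, a minimal SolrJ sketch of the kind of MoreLikeThis request being
discussed; the /mlt handler path, core name, and field names are assumptions, not
something from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// assumes a MoreLikeThis handler registered at /mlt in solrconfig.xml
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/items");
SolrQuery query = new SolrQuery();
query.setRequestHandler("/mlt");
query.setQuery("id:12345");                 // the item currently being viewed
query.set("mlt.fl", "title,description");   // fields used to judge similarity
query.set("mlt.mintf", 1);
query.set("mlt.mindf", 1);
query.setRows(10);                          // number of recommendations to return
QueryResponse rsp = server.query(query);

Swapping the default similarity for a Euclidean-distance-style one, as proposed
above, would be a separate change on the schema/Similarity side; the request
itself would look the same.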


On Fri, Jun 28, 2013 at 1:12 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> It doesn't have to be one or the other.  In the past I've built a news
> recommender engine based on CF (Mahout) and combined it with Content
> Similarity-based engine (wasn't Solr/Lucene, but something custom that
> worked with ngrams, but it may have as well been Lucene/Solr/ES).  It
> worked well.  If you haven't worked with Mahout before I'd suggest the
> approach in that video and going from there to Mahout only if it's
> limiting.
>
> See Ted's stuff on this topic, too:
> http://www.slideshare.net/tdunning/search-as-recommendation +
> http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
> (note: Mahout, Solr, Pig)
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal 
> wrote:
> > You could build a custom recommender in mahout to accomplish this, also
> just out of curiosity why the content based approach as opposed to building
> a recommender based on co-occurence.  One other thing, what is your data
> size, are you looking at scale where you need something like hadoop?
> >
> >> From: lcguerreroc...@gmail.com
> >> Date: Fri, 28 Jun 2013 13:02:00 -0500
> >> Subject: Re: Content based recommender using lucene/solr
> >> To: solr-user@lucene.apache.org
> >> CC: java-u...@lucene.apache.org
> >>
> >> Hey saikat, thanks for your suggestion. I've looked into mahout and
> other
> >> alternatives for computing k nearest neighbors. I would have to run a
> job
> >> and computer the k nearest neighbors and track them in the index for
> >> retrieval. I wanted to see if this was something I could do with lucene
> >> using lucene's scoring function and solr's morelikethis component. The
> job
> >> you specifically mention is for Item based recommendation which would
> >> require me to track the different items users have viewed. I'm looking
> for
> >> a content based approach where I would use a distance measure to
> establish
> >> how near items are (how similar) and have some kind of training phase to
> >> adjust weights.
> >>
> >>
> >> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal  >wrote:
> >>
> >> > Why not just use mahout to do this, there is an item similarity
> algorithm
> >> > in mahout that does exactly this :)
> >> >
> >> >
> >> >
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
> >> >
> >> > You can use mahout in distributed and non-distributed mode as well.
> >> >
> >> > > From: lcguerreroc...@gmail.com
> >> > > Date: Fri, 28 Jun 2013 12:16:57 -0500
> >> > > Subject: Content based recommender using lucene/solr
> >> > > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> >> > >
> >> > > Hi,
> >> > >
> >> > > I'm using lucene and solr right now in a production environment
> with an
> >> > > index of about a million docs. I'm working on a recommender that
> >> > basically
> >> > > would list the n most similar items to the user based on the
> current item
> >> > > he is viewing.
> >> > >
> >> > > I've been thinking of using solr/lucene since I already have all
> docs
> >> > > available and I want a quick version that can be deployed while we
> work
> >> > on
> >> > > a more robust recommender. How about overriding the default
> similarity so
> >> > > that it scores documents based on the euclidean distance of
> normalized
> >> > item
> >> > > attributes and then using a morelikethis component to pass in the
> >> > > attributes of the item for which I want to generate
> recommendations? I
> >> > know
> >> > > it has its issues like recomputing scores/normalization/weight
> >> > application
> >> > > at query time which could make this idea unfeasible/impractical.
> I'm at a
> >> > > very preliminary stage right now with this and would love some
> >> > suggestions
> >> > > from experienced users.
> >> > >
> >> > > thank you,
> >> > >
> >> > > Luis Guerrero
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Luis Carlos Guerrero Covo
> >> M.S. Computer Engineering
> >> (57) 3183542047
> >
>



-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047


Re: full-import failed after 5 hours with Exception: ORA-01555: snapshot too old: rollback segment number with name "" too small ORA-22924: snapshot too old

2013-06-28 Thread Otis Gospodnetic
Hi,

I'd go talk to the DBA.  How long does this query take if you run it
directly against Oracle?  How long if you run it locally vs. from a
remote server (like Solr is in relation to your Oracle server(s))?
What happens if you increase batchSize?
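
For reference, batchSize is set on the DIH dataSource element; a sketch with an
illustrative value (driver, URL, and credentials are placeholders):

<dataSource type="JdbcDataSource"
            driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
            user="solr" password="..."
            batchSize="10000"/>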

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Thu, Jun 27, 2013 at 6:41 PM, srinalluri  wrote:
> Hello,
>
> I am using Solr 4.3.2 and Oracle DB. The sub-entity uses
> CachedSqlEntityProcessor. The dataSource has batchSize="500". The
> full-import failed with an 'ORA-01555: snapshot too old: rollback segment
> number with name "" too small ORA-22924: snapshot too old' exception after
> 5 hours.
>
> We already increased the undo space 4 times at the database end. Number of
> records in the jan_story table is 800,000 only. Tomcat is with 4GB JVM
> memory.
>
> Following is the entity (there are other sub-entities that I didn't mention
> here, since the import failed in the article_details entity; article_details is
> the first sub-entity):
>
> <entity name="par8-article-testingprod"
>         preImportDeleteQuery="content_type:article AND repository:par8qatestingprod"
>         query="select ID as VCMID from jan_story">
>   <entity name="article_details"
>           transformer="TemplateTransformer,ClobTransformer,RegexTransformer"
>           query="select bb.recordid, aa.ID as DID, aa.STORY_TITLE,
>                  aa.STORY_HEADLINE, aa.SOURCE, aa.DECK, regexp_replace(aa.body,
>                  '\\[(pullquote|summary)\]\|\[video [0-9]+?\]|\[youtube
>                  .+?\]', '') as BODY, aa.PUBLISHED_DATE, aa.MODIFIED_DATE, aa.DATELINE,
>                  aa.REPORTER_NAME, aa.TICKER_CODES, aa.ADVERTORIAL_CONTENT from jan_story
>                  aa, mapp bb where aa.id=bb.keystring1"
>           cacheKey="DID"
>           cacheLookup="par8-article-testingprod.VCMID"
>           processor="CachedSqlEntityProcessor">
>     <!-- field mappings were stripped when this message was archived -->
>   </entity>
> </entity>
>
>
> The full-import without CachedSqlEntityProcessor is taking 7 days. That is
> why I am doing all this.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/full-import-failed-after-5-hours-with-Exception-ORA-01555-snapshot-too-old-rollback-segment-number-wd-tp4073822.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field Query After Collapse.Field?

2013-06-28 Thread slevytam
Hi Erick,

I have no idea how I managed to get that working.  I was messing around a
lot.  I may have added org.apache.solr.handler.component.CollapseComponent
to an older version :-  Unfortunately, I've formatted the server since to
try some other options.

I did find the official wiki page for the collapse.field feature.
http://wiki.apache.org/solr/FieldCollapsingUncommitted

I guess it was marked as duplicate for the 3.3 release; however, I cannot
find any way to collapse fields with the current options. 
(http://wiki.apache.org/solr/FieldCollapsing)

Can you see any way of doing this?

Thanks,

slevytam



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691p4073972.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field Query After Collapse.Field?

2013-06-28 Thread slevytam
Unfortunately not.  That would require an object for every single entry for
every single user.  

Generating millions of basically empty objects just for this query is likely
impossible.  

:(



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691p4073976.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: shardkey

2013-06-28 Thread Joshi, Shital
Thanks Mark.

We use commit=true as part of the request to add documents. Something like 
this: 

echo "$data"| curl --proxy "" --silent 
"http://HOST:9983/solr/collection1/update/csv?commit=true&separator=|&fieldnames=$fieldnames&_shard_=shard1"
  --data-binary @-  -H 'Content-type:text/plain; charset=utf-8'

You're suggesting that after this update, we should always execute, curl 
--proxy "" --silent "http://HOST:8983/solr/core3/update?commit=true";  Is that 
correct? 
It doesn't matter whether HOST is leader or replica. 



-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Thursday, June 27, 2013 5:35 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

You might be seeing https://issues.apache.org/jira/browse/SOLR-4923 ?

The commit true part of the request that add documents? If so, it might be 
SOLR-4923 and you should try the commit in a request after adding the docs.

- Mark

On Jun 27, 2013, at 4:42 PM, "Joshi, Shital"  wrote:

> Hi,
> 
> We finally decided on using custom sharding (implicit document routing) for 
> our project. We will have ~3 mil documents per shardkey.  We're maintaining 
> shardkey -> shardid mapping in a database table. While adding documents we 
> always specify _shard_ parameter in update URL but while querying,  we don't 
> specify shards parameter. We want to search across shards. 
> 
> While experimenting we found that right after hard committing (commit=true in 
> update URL), at times the query didn't return documents across shards (40% of 
> the time) But many times (60% of the time) it returned documents across 
> shards. When queried after few hours, the query always returned documents 
> across  shards. Is that expected behavior? Is there a parameter to enforce 
> querying across all shards? This is a very important point for us to move 
> further with SolrCloud. 
> 
> We're experimenting with adding a new shard and start directing all new 
> documents to this new shard. Hopefully that should work.
> 
> Many Thanks! 
> 
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Friday, June 21, 2013 8:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: shardkey
> 
> On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital  wrote:
>> But now Solr stores composite id in the document id
> 
> Correct, it's the document id itself that contains everything needed
> for the compositeId router to determine the hash.
> 
>> It would only use it to calculate hash key but while storing
> 
> compositeId routing is when it makes sense to make the routing part of
> the unique id so that an id is all the information needed to find the
> document in the cluster.  For example customer_id!document_name.  From
> your example of 20130611!test_14 it looks like you're doing time based
> sharding, and one would normally not use the compositeId router for
> that.
> 
> -Yonik
> http://lucidworks.com



cores sharing an instance

2013-06-28 Thread Peyman Faratin
Hi 

I have a multicore setup (in 4.3.0). Is it possible for one core to share an 
instance of its class with other cores at run time? i.e.

At run time core 1 makes an instance of object O_i

core 1 --> object O_i
core 2
---
core n

then can core K access O_i? I know they can share properties but is it possible 
to share objects?

thank you
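
One pattern that can work, offered here purely as a sketch: put a small holder class
in a jar under a shared lib directory (for example <solr home>/lib) so that every core
loads it through the same parent classloader; its statics are then the same objects for
all cores. Class, method, and key names below are made up for illustration.

import java.util.concurrent.ConcurrentHashMap;

// Loaded once by the shared classloader, so REGISTRY is a single map
// no matter which core's code calls into it.
public final class SharedObjects {
    private static final ConcurrentHashMap<String, Object> REGISTRY =
            new ConcurrentHashMap<String, Object>();

    private SharedObjects() {}

    public static void put(String key, Object value) {
        REGISTRY.put(key, value);
    }

    @SuppressWarnings("unchecked")
    public static <T> T get(String key) {
        return (T) REGISTRY.get(key);
    }
}

Core 1 would call SharedObjects.put("O_i", instance) and core K would read it back with
SharedObjects.get("O_i"). If the jar sits in a per-core lib directory instead, each core
gets its own classloader and its own copy of the statics, and this no longer works.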



dataconfig to index ZIP Files

2013-06-28 Thread ericrs22
So I thought I had it correctly set up, but I'm receiving the following
response to my Data Config

Last Update: 18:17:52

 (Duration: 07s)

Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)

Started: 13 minutes ago

Here's my Data config.

<dataConfig>
  <document>
    <entity processor="FileListEntityProcessor" baseDir="E:\ArchiveRoot"
            fileName=".*zip" recursive="true" rootEntity="false"
            dataSource="binaryFile" onError="skip">
      <!-- inner entity/field definitions were stripped when this message was archived -->
    </entity>
  </document>
</dataConfig>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataconfig to index ZIP Files

2013-06-28 Thread Steve Rowe
Hi,

Maybe fileName="*.zip" instead of ".*zip" ?

Steve

On Jun 28, 2013, at 2:20 PM, ericrs22  wrote:

> So I thought I had it correctly setup but I'm receiveing the following
> response to my Data Config
> 
> Last Update: 18:17:52
> 
> (Duration: 07s)
> 
> Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)
> 
> Started: 13 minutes ago
> 
> Here's my Data config.
> 
> 
>
>
>  processor="FileListEntityProcessor" baseDir="E:\ArchiveRoot"
> fileName=".*zip" recursive="true" rootEntity="false" dataSource="binaryFile"
> onError="skip">
> 
>
>  
> 
>
>
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: shardkey

2013-06-28 Thread Mark Miller
Yeah, that is what I would try until 4.4 comes out - and it should not matter 
replica or leader.
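
A sketch of that sequence, with the host, port, and parameters taken from the earlier
message and only the commit moved into its own request:

# send the CSV batch without commit=true ...
echo "$data" | curl --proxy "" --silent \
  "http://HOST:9983/solr/collection1/update/csv?separator=|&fieldnames=$fieldnames&_shard_=shard1" \
  --data-binary @- -H 'Content-type:text/plain; charset=utf-8'

# ... then commit in a separate request (leader or replica, either should do)
curl --proxy "" --silent "http://HOST:9983/solr/collection1/update?commit=true"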

- Mark

On Jun 28, 2013, at 3:13 PM, "Joshi, Shital"  wrote:

> Thanks Mark.
> 
> We use commit=true as part of the request to add documents. Something like 
> this: 
> 
> echo "$data"| curl --proxy "" --silent 
> "http://HOST:9983/solr/collection1/update/csv?commit=true&separator=|&fieldnames=$fieldnames&_shard_=shard1"
>   --data-binary @-  -H 'Content-type:text/plain; charset=utf-8'
> 
> You're suggesting that after this update, we should always execute, curl 
> --proxy "" --silent "http://HOST:8983/solr/core3/update?commit=true";  Is that 
> correct? 
> It doesn't matter whether HOST is leader or replica. 
> 
> 
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com] 
> Sent: Thursday, June 27, 2013 5:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: shardkey
> 
> You might be seeing https://issues.apache.org/jira/browse/SOLR-4923 ?
> 
> The commit true part of the request that add documents? If so, it might be 
> SOLR-4923 and you should try the commit in a request after adding the docs.
> 
> - Mark
> 
> On Jun 27, 2013, at 4:42 PM, "Joshi, Shital"  wrote:
> 
>> Hi,
>> 
>> We finally decided on using custom sharding (implicit document routing) for 
>> our project. We will have ~3 mil documents per shardkey.  We're maintaining 
>> shardkey -> shardid mapping in a database table. While adding documents we 
>> always specify _shard_ parameter in update URL but while querying,  we don't 
>> specify shards parameter. We want to search across shards. 
>> 
>> While experimenting we found that right after hard committing (commit=true 
>> in update URL), at times the query didn't return documents across shards 
>> (40% of the time) But many times (60% of the time) it returned documents 
>> across shards. When queried after few hours, the query always returned 
>> documents across  shards. Is that expected behavior? Is there a parameter to 
>> enforce querying across all shards? This is very important point for us to 
>> move further with SolrCloud. 
>> 
>> We're experimenting with adding a new shard and start directing all new 
>> documents to this new shard. Hopefully that should work.
>> 
>> Many Thanks! 
>> 
>> -Original Message-
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Friday, June 21, 2013 8:50 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: shardkey
>> 
>> On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital  wrote:
>>> But now Solr stores composite id in the document id
>> 
>> Correct, it's the document id itself that contains everything needed
>> for the compositeId router to determine the hash.
>> 
>>> It would only use it to calculate hash key but while storing
>> 
>> compositeId routing is when it makes sense to make the routing part of
>> the unique id so that an id is all the information needed to find the
>> document in the cluster.  For example customer_id!document_name.  From
>> your example of 20130611!test_14 it looks like you're doing time based
>> sharding, and one would normally not use the compositeId router for
>> that.
>> 
>> -Yonik
>> http://lucidworks.com
> 



RE: shardkey

2013-06-28 Thread Joshi, Shital
Thanks! 

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Friday, June 28, 2013 5:06 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

Yeah, that is what I would try until 4.4 comes out - and it should not matter 
replica or leader.

- Mark

On Jun 28, 2013, at 3:13 PM, "Joshi, Shital"  wrote:

> Thanks Mark.
> 
> We use commit=true as part of the request to add documents. Something like 
> this: 
> 
> echo "$data"| curl --proxy "" --silent 
> "http://HOST:9983/solr/collection1/update/csv?commit=true&separator=|&fieldnames=$fieldnames&_shard_=shard1"
>   --data-binary @-  -H 'Content-type:text/plain; charset=utf-8'
> 
> You're suggesting that after this update, we should always execute, curl 
> --proxy "" --silent "http://HOST:8983/solr/core3/update?commit=true";  Is that 
> correct? 
> It doesn't matter whether HOST is leader or replica. 
> 
> 
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com] 
> Sent: Thursday, June 27, 2013 5:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: shardkey
> 
> You might be seeing https://issues.apache.org/jira/browse/SOLR-4923 ?
> 
> The commit true part of the request that add documents? If so, it might be 
> SOLR-4923 and you should try the commit in a request after adding the docs.
> 
> - Mark
> 
> On Jun 27, 2013, at 4:42 PM, "Joshi, Shital"  wrote:
> 
>> Hi,
>> 
>> We finally decided on using custom sharding (implicit document routing) for 
>> our project. We will have ~3 mil documents per shardkey.  We're maintaining 
>> shardkey -> shardid mapping in a database table. While adding documents we 
>> always specify _shard_ parameter in update URL but while querying,  we don't 
>> specify shards parameter. We want to search across shards. 
>> 
>> While experimenting we found that right after hard committing (commit=true 
>> in update URL), at times the query didn't return documents across shards 
>> (40% of the time) But many times (60% of the time) it returned documents 
>> across shards. When queried after few hours, the query always returned 
>> documents across  shards. Is that expected behavior? Is there a parameter to 
>> enforce querying across all shards? This is very important point for us to 
>> move further with SolrCloud. 
>> 
>> We're experimenting with adding a new shard and start directing all new 
>> documents to this new shard. Hopefully that should work.
>> 
>> Many Thanks! 
>> 
>> -Original Message-
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Friday, June 21, 2013 8:50 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: shardkey
>> 
>> On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital  wrote:
>>> But now Solr stores composite id in the document id
>> 
>> Correct, it's the document id itself that contains everything needed
>> for the compositeId router to determine the hash.
>> 
>>> It would only use it to calculate hash key but while storing
>> 
>> compositeId routing is when it makes sense to make the routing part of
>> the unique id so that an id is all the information needed to find the
>> document in the cluster.  For example customer_id!document_name.  From
>> your example of 20130611!test_14 it looks like you're doing time based
>> sharding, and one would normally not use the compositeId router for
>> that.
>> 
>> -Yonik
>> http://lucidworks.com
> 



Re: dataconfig to index ZIP Files

2013-06-28 Thread ericrs22
unfortunately not. I had tried that before with the logs saying:

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
java.util.regex.PatternSyntaxException: Dangling meta character '*' near
index 0 


With .*zip I get this:

WARN  SimplePropertiesWriter  Unable to read: dataimport.properties



--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965p4074009.html
Sent from the Solr - User mailing list archive at Nabble.com.


change solr core schema and config via http

2013-06-28 Thread Wu, James C.
Hi,

I am trying to figure out how to change the schema/config of an existing core,
or of a core to be created, via HTTP calls to Solr.  After spending hours
searching online, I still could not find any documents showing me how to do it.

The only way I know is that you have to log on to the Solr host and then
create/modify the schema/config files on the host. It seems surprising to me
that we can create a new core via the HTTP interface but are not able to change
the schema/config in the same way.

Can anyone give me some hints? Thanks.

Regards,

James


Re: change solr core schema and config via http

2013-06-28 Thread Rafał Kuć
Hello!

In 4.3.1 you can only read schema.xml, or portions of it, using the Schema
API (https://issues.apache.org/jira/browse/SOLR-4658). It is a first step
toward allowing schema.xml modifications over the HTTP API, which will be
functionality of the next release of Solr -
https://issues.apache.org/jira/browse/SOLR-3251
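
For example, the read-only calls available in 4.3.x look like this (core name is an
assumption):

curl "http://localhost:8983/solr/collection1/schema/fields?wt=json"
curl "http://localhost:8983/solr/collection1/schema/fieldtypes?wt=json"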

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

> Hi,

> I am trying to figure out how to change the schema/config of an
> existing core or a core to be created via http calls to solr.  After
> spending hours in searching online, I still could not find any
> documents showing me how to do it.

> The only way I know is that you have to log on to the solr host and
> then create/modify the schema/config files on the host. It seem
> surprising to me that we can create new core via the http interface
> but are not able to change the schema/config in the same way.

> Can anyone give me some hints? Thanks.

> Regards,

> James



An issue with atomic updates?

2013-06-28 Thread Sam Antique
Hi all,

I think I have found an issue (or misleading behavior, per se) with
atomic updates.

If I do atomic updates on a field, and the operation is nonsense
(anything other than add, set, or inc), it still returns success. Say I send:

/update/json?commit=true -d '[{"id":"...", "field1":{"add":"value"}}]'

it adds fine and return success. But if I continue:

/update/json?commit=true -d '[{"id":"...",
"field1":{"none-sense":"value"}}]'

It still returns status:0, which is a bit misleading.

Is this a known issue?

Thanks,
Sam


Re: An issue with atomic updates?

2013-06-28 Thread Jack Krupansky
Well, it is known to me and documented in my book. BTW, that field value is 
simply ignored.


There are tons of places in Solr where undefined values or outright garbage 
are simply ignored, silently.
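
For reference, a sketch of the operations that are honored, in the same request style
as above; the id and field names are illustrative, and atomic updates assume the fields
are stored and the update log is enabled:

curl "http://localhost:8983/solr/collection1/update/json?commit=true" \
  -H 'Content-type:application/json' \
  -d '[{"id":"doc1",
        "price":{"set":100},
        "tags":{"add":"new-tag"},
        "views":{"inc":1}}]'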


Go ahead and file a Jira though.

-- Jack Krupansky

-Original Message- 
From: Sam Antique

Sent: Friday, June 28, 2013 6:12 PM
To: solr-user@lucene.apache.org
Subject: An issue with atomic updates?

Hi all,

I think I have found an issue (or misleading behavior, per say) about
atomic updates.

If I do atomic updates on a field, and if the operation is none-sense
(anything other than add, set, inc), it still returns success. Say I send:

/update/json?commit=true -d '[{"id":"...", "field1":{"add":"value"}}]'

it adds fine and return success. But if I continue:

/update/json?commit=true -d '[{"id":"...",
"field1":{"none-sense":"value"}}]'

It still returns status:0, which is a bit misleading.

Is this a known issue?

Thanks,
Sam 



RE: change solr core schema and config via http

2013-06-28 Thread Wu, James C.
Hi,

It only allows adding new fields to the existing schema.

My problem is that I am trying to provide my own schema file when I create a 
new core and I do not have ssh access to the solr host.  Is this not even 
possible?

Regards,

james

-Original Message-
From: Rafał Kuć [mailto:r@solr.pl] 
Sent: Friday, June 28, 2013 3:11 PM
To: solr-user@lucene.apache.org
Subject: Re: change solr core schema and config via http

Hello!

In 4.3.1 you can only read schema.xml or portions of it using Schema API 
(https://issues.apache.org/jira/browse/SOLR-4658). It is a start to allow 
schema.xml modifications using HTTP API, which will be a functionality of next 
release of Solr -
https://issues.apache.org/jira/browse/SOLR-3251

--
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

> Hi,

> I am trying to figure out how to change the schema/config of an 
> existing core or a core to be created via http calls to solr.  After 
> spending hours in searching online, I still could not find any 
> documents showing me how to do it.

> The only way I know is that you have to log on to the solr host and 
> then create/modify the schema/config files on the host. It seem 
> surprising to me that we can create new core via the http interface 
> but are not able to change the schema/config in the same way.

> Can anyone give me some hints? Thanks.

> Regards,

> James



Re: change solr core schema and config via http

2013-06-28 Thread Jack Krupansky
How could you not have ssh access to the Solr host machine? I mean, how are 
you managing that server, without ssh access?


And if you are not managing the server, what business do you have trying to 
change the Solr configuration?!?!?


Something fishy here!

-- Jack Krupansky

-Original Message- 
From: Wu, James C.

Sent: Friday, June 28, 2013 6:30 PM
To: solr-user@lucene.apache.org
Subject: RE: change solr core schema and config via http

Hi,

It only allow adding new fields to the existing schema.

My problem is that I am trying to provide my own schema file when I create a 
new core and I do not have ssh access to the solr host.  Is this not even 
possible?


Regards,

james

-Original Message-
From: Rafał Kuć [mailto:r@solr.pl]
Sent: Friday, June 28, 2013 3:11 PM
To: solr-user@lucene.apache.org
Subject: Re: change solr core schema and config via http

Hello!

In 4.3.1 you can only read schema.xml or portions of it using Schema API 
(https://issues.apache.org/jira/browse/SOLR-4658). It is a start to allow 
schema.xml modifications using HTTP API, which will be a functionality of 
next release of Solr -

https://issues.apache.org/jira/browse/SOLR-3251

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch


Hi,



I am trying to figure out how to change the schema/config of an
existing core or a core to be created via http calls to solr.  After
spending hours in searching online, I still could not find any
documents showing me how to do it.



The only way I know is that you have to log on to the solr host and
then create/modify the schema/config files on the host. It seem
surprising to me that we can create new core via the http interface
but are not able to change the schema/config in the same way.



Can anyone give me some hints? Thanks.



Regards,



James




RE: change solr core schema and config via http

2013-06-28 Thread Wu, James C.
Hi,

Well, we are trying to use Solr to run a multi-tenant index/search service.  We 
assign each client a different core with their own config and schema.  It 
would be good for us if we could just let the customer create cores 
with their own schema and config.  The customer would definitely not have ssh 
access to the Solr host. 
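
For reference, the core-creation call only points at configuration that already exists
on disk under instanceDir; a sketch with illustrative names and paths:

curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tenant42&instanceDir=/var/solr/tenants/tenant42&config=solrconfig.xml&schema=schema.xml&dataDir=data"

So the per-tenant schema.xml and solrconfig.xml still have to be placed under that
instanceDir by some other channel (a deployment script, shared filesystem, etc.).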

Regards,

james

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, June 28, 2013 3:57 PM
To: solr-user@lucene.apache.org
Subject: Re: change solr core schema and config via http

How could you not have ssh access to the Solr host machine? I mean, how are you 
managing that server, without ssh access?

And if you are not managing the server, what business do you have trying to 
change the Solr configuration?!?!?

Something fishy here!

-- Jack Krupansky

-Original Message-
From: Wu, James C.
Sent: Friday, June 28, 2013 6:30 PM
To: solr-user@lucene.apache.org
Subject: RE: change solr core schema and config via http

Hi,

It only allow adding new fields to the existing schema.

My problem is that I am trying to provide my own schema file when I create a 
new core and I do not have ssh access to the solr host.  Is this not even 
possible?

Regards,

james

-Original Message-
From: Rafał Kuć [mailto:r@solr.pl]
Sent: Friday, June 28, 2013 3:11 PM
To: solr-user@lucene.apache.org
Subject: Re: change solr core schema and config via http

Hello!

In 4.3.1 you can only read schema.xml or portions of it using Schema API 
(https://issues.apache.org/jira/browse/SOLR-4658). It is a start to allow 
schema.xml modifications using HTTP API, which will be a functionality of next 
release of Solr -
https://issues.apache.org/jira/browse/SOLR-3251

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch

> Hi,

> I am trying to figure out how to change the schema/config of an
> existing core or a core to be created via http calls to solr.  After
> spending hours in searching online, I still could not find any
> documents showing me how to do it.

> The only way I know is that you have to log on to the solr host and
> then create/modify the schema/config files on the host. It seem
> surprising to me that we can create new core via the http interface
> but are not able to change the schema/config in the same way.

> Can anyone give me some hints? Thanks.

> Regards,

> James



documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Hey guys,

This has to be a stupid question/I must be doing something wrong, but after
frequent load testing with documentCache enabled under Solr 4.3.1 with
autoWarmCount=150, I'm noticing that my documentCache metrics are always
zero for the non-cumulative values.

At first I thought my commit rate was fast enough that I just never see the
non-cumulative result, but after hundreds of samples I still always get zero
values.

Here is the current output of my documentCache from Solr's admin for 1 core:

"

   - 
documentCache
  - class:org.apache.solr.search.LRUCache
  - version:1.0
  - description:LRU Cache(maxSize=512, initialSize=512,
  autowarmCount=150, regenerator=null)
  - src:$URL: https:/
  /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
  
solr/core/src/java/org/apache/solr/search/LRUCache.java$
  - stats:
 - lookups:0
 - hits:0
 - hitratio:0.00
 - inserts:0
 - evictions:0
 - size:0
 - warmupTime:0
 - cumulative_lookups:65198986
 - cumulative_hits:63075669
 - cumulative_hitratio:0.96
 - cumulative_inserts:2123317
 - cumulative_evictions:1010262
  "

The cumulative values seem to rise, suggesting the doc cache is working, but at
the same time it seems I never see non-cumulative metrics, most importantly
warmupTime.

Am I doing something wrong, is this normal/by-design, or is there an issue
here?

Thanks for helping with my silly question! Have a good weekend,

Tim


Re: change solr core schema and config via http

2013-06-28 Thread Jack Krupansky

Ah, yes, good old multi-tenant - I should have known.

Yeah, the Solr API is evolving, albeit too slowly for the needs of some.

-- Jack Krupansky

-Original Message- 
From: Wu, James C.

Sent: Friday, June 28, 2013 7:06 PM
To: solr-user@lucene.apache.org
Subject: RE: change solr core schema and config via http

Hi,

Well, we try to use Solr to run a multi-tenant index/search service.  We 
assigns each client a different core with their own config and schema.  It 
would be good for us if we can just let the customer to be able to create 
cores with their own schema and config.  The customer would definitely not 
have ssh access to the solr host.


Regards,

james

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, June 28, 2013 3:57 PM
To: solr-user@lucene.apache.org
Subject: Re: change solr core schema and config via http

How could you not have ssh access to the Solr host machine? I mean, how are 
you managing that server, without ssh access?


And if you are not managing the server, what business do you have trying to 
change the Solr configuration?!?!?


Something fishy here!

-- Jack Krupansky

-Original Message-
From: Wu, James C.
Sent: Friday, June 28, 2013 6:30 PM
To: solr-user@lucene.apache.org
Subject: RE: change solr core schema and config via http

Hi,

It only allow adding new fields to the existing schema.

My problem is that I am trying to provide my own schema file when I create a 
new core and I do not have ssh access to the solr host.  Is this not even 
possible?


Regards,

james

-Original Message-
From: Rafał Kuć [mailto:r@solr.pl]
Sent: Friday, June 28, 2013 3:11 PM
To: solr-user@lucene.apache.org
Subject: Re: change solr core schema and config via http

Hello!

In 4.3.1 you can only read schema.xml or portions of it using Schema API 
(https://issues.apache.org/jira/browse/SOLR-4658). It is a start to allow 
schema.xml modifications using HTTP API, which will be a functionality of 
next release of Solr -

https://issues.apache.org/jira/browse/SOLR-3251

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch


Hi,



I am trying to figure out how to change the schema/config of an
existing core or a core to be created via http calls to solr.  After
spending hours in searching online, I still could not find any
documents showing me how to do it.



The only way I know is that you have to log on to the solr host and
then create/modify the schema/config files on the host. It seem
surprising to me that we can create new core via the http interface
but are not able to change the schema/config in the same way.



Can anyone give me some hints? Thanks.



Regards,



James




Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
To answer part of my own question, Shawn H's great reply on this thread
explains why I see no autoWarming on doc cache:

http://www.marshut.com/iznwr/soft-commit-and-document-cache.html

It is still unclear to me why I see no other metrics, however.

Thanks Shawn,

Tim


On 28 June 2013 16:14, Tim Vaillancourt  wrote:

> Hey guys,
>
> This has to be a stupid question/I must be doing something wrong, but
> after frequent load testing with documentCache enabled under Solr 4.3.1
> with autoWarmCount=150, I'm noticing that my documentCache metrics are
> always zero for non-cumlative.
>
> At first I thought my commit rate is fast enough I just never see the
> non-cumlative result, but after 100s of samples I still always get zero
> values.
>
> Here is the current output of my documentCache from Solr's admin for 1
> core:
>
> "
>
>- 
> documentCache
>   - class:org.apache.solr.search.LRUCache
>   - version:1.0
>   - description:LRU Cache(maxSize=512, initialSize=512,
>   autowarmCount=150, regenerator=null)
>   - src:$URL: https:/
>   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
>   
> solr/core/src/java/org/apache/solr/search/LRUCache.java$
>   - stats:
>  - lookups:0
>  - hits:0
>  - hitratio:0.00
>  - inserts: 0
>  - evictions:0
>  - size:0
>  - warmupTime:0
>  - cumulative_lookups: 65198986
>  - cumulative_hits:63075669
>  - cumulative_hitratio:0.96
>  - cumulative_inserts: 2123317
>  - cumulative_evictions:1010262
>   "
>
> The cumulative values seem to rise, suggesting doc cache is working, but
> at the same time it seems I never see non-cumlative metrics, most
> importantly warmupTime.
>
> Am I doing something wrong, is this normal/by-design, or is there an issue
> here?
>
> Thanks for helping with my silly question! Have a good weekend,
>
> Tim
>
>
>
>


Re: documentCache not used in 4.3.1?

2013-06-28 Thread Otis Gospodnetic
Hi Tim,

Not sure about the zeros in 4.3.1, but in SPM we see all these numbers
are non-0, though I haven't had the chance to confirm with Solr 4.3.1.

Note that you can't really autowarm document cache...
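
For reference, the relevant solrconfig.xml entry looks like the sketch below; the sizes
are illustrative, and autowarmCount is accepted but has no effect for this cache since
it has no regenerator:

<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>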

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt  wrote:
> Hey guys,
>
> This has to be a stupid question/I must be doing something wrong, but after
> frequent load testing with documentCache enabled under Solr 4.3.1 with
> autoWarmCount=150, I'm noticing that my documentCache metrics are always
> zero for non-cumlative.
>
> At first I thought my commit rate is fast enough I just never see the
> non-cumlative result, but after 100s of samples I still always get zero
> values.
>
> Here is the current output of my documentCache from Solr's admin for 1 core:
>
> "
>
>- 
> documentCache
>   - class:org.apache.solr.search.LRUCache
>   - version:1.0
>   - description:LRU Cache(maxSize=512, initialSize=512,
>   autowarmCount=150, regenerator=null)
>   - src:$URL: https:/
>   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
>   
> solr/core/src/java/org/apache/solr/search/LRUCache.java$
>   - stats:
>  - lookups:0
>  - hits:0
>  - hitratio:0.00
>  - inserts:0
>  - evictions:0
>  - size:0
>  - warmupTime:0
>  - cumulative_lookups:65198986
>  - cumulative_hits:63075669
>  - cumulative_hitratio:0.96
>  - cumulative_inserts:2123317
>  - cumulative_evictions:1010262
>   "
>
> The cumulative values seem to rise, suggesting doc cache is working, but at
> the same time it seems I never see non-cumlative metrics, most importantly
> warmupTime.
>
> Am I doing something wrong, is this normal/by-design, or is there an issue
> here?
>
> Thanks for helping with my silly question! Have a good weekend,
>
> Tim


Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Thanks Otis,

Yeah, I realized after sending my e-mail that the doc cache does not warm;
however, I'm still lost on why there are no other metrics.

Thanks!

Tim


On 28 June 2013 16:22, Otis Gospodnetic  wrote:

> Hi Tim,
>
> Not sure about the zeros in 4.3.1, but in SPM we see all these numbers
> are non-0, though I haven't had the chance to confirm with Solr 4.3.1.
>
> Note that you can't really autowarm document cache...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt 
> wrote:
> > Hey guys,
> >
> > This has to be a stupid question/I must be doing something wrong, but
> after
> > frequent load testing with documentCache enabled under Solr 4.3.1 with
> > autoWarmCount=150, I'm noticing that my documentCache metrics are always
> > zero for non-cumlative.
> >
> > At first I thought my commit rate is fast enough I just never see the
> > non-cumlative result, but after 100s of samples I still always get zero
> > values.
> >
> > Here is the current output of my documentCache from Solr's admin for 1
> core:
> >
> > "
> >
> >- documentCache<
> http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache
> >
> >   - class:org.apache.solr.search.LRUCache
> >   - version:1.0
> >   - description:LRU Cache(maxSize=512, initialSize=512,
> >   autowarmCount=150, regenerator=null)
> >   - src:$URL: https:/
> >   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
> >   solr/core/src/java/org/apache/solr/search/LRUCache.java<
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java
> >$
> >   - stats:
> >  - lookups:0
> >  - hits:0
> >  - hitratio:0.00
> >  - inserts:0
> >  - evictions:0
> >  - size:0
> >  - warmupTime:0
> >  - cumulative_lookups:65198986
> >  - cumulative_hits:63075669
> >  - cumulative_hitratio:0.96
> >  - cumulative_inserts:2123317
> >  - cumulative_evictions:1010262
> >   "
> >
> > The cumulative values seem to rise, suggesting doc cache is working, but
> at
> > the same time it seems I never see non-cumlative metrics, most
> importantly
> > warmupTime.
> >
> > Am I doing something wrong, is this normal/by-design, or is there an
> issue
> > here?
> >
> > Thanks for helping with my silly question! Have a good weekend,
> >
> > Tim
>


Re: Replicating files containing external file fields

2013-06-28 Thread Arun Rangarajan
Jack,

Here is the ReplicationHandler definition from solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>

The confFiles are under the dir:
/var/solr/application-cores/List/conf
and the external file fields are like:
/var/solr-data/List/external_*

Should I add
/var/solr-data/List/external_*
to confFiles like this?

solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml,
/var/solr-data/List/external_*


Also, can you tell me when (or whether) I need to do reloadCache on the
slave after the ext file fields are replicated?

Thx.


On Fri, Jun 28, 2013 at 10:13 AM, Jack Krupansky wrote:

> Show us your confFiles directive. Maybe there is some subtle error in
> the file name.
>
> -- Jack Krupansky
>
> -Original Message- From: Arun Rangarajan
> Sent: Friday, June 28, 2013 1:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replicating files containing external file fields
>
>
> Erick,
> Thx for your reply. The external file field fields are already under
>  specified in solrconfig.xml. They are not getting replicated.
> (Solr version 4.2.1.)
>
>
> On Thu, Jun 27, 2013 at 10:50 AM, Erick Erickson 
> wrote:
>
>  Haven't tried this, but I _think_ you can use the
>> "confFiles" trick with relative paths, see:
>> http://wiki.apache.org/solr/SolrReplication
>>
>> Or just put your EFF files in the data dir?
>>
>> Best
>> Erick
>>
>>
>> On Wed, Jun 26, 2013 at 9:01 PM, Arun Rangarajan
>> wrote:
>>
>> > From https://wiki.apache.org/solr/SolrReplication I understand that the index
>> > dir and any files under the conf dir can be replicated to slaves. I want to
>> > know if there is any way the files under the data dir containing external
>> > file fields can be replicated. These are not replicated by default.
>> > Currently we are running the ext file field reload script on both the
>> > master and the slave and then running reloadCache on each server once they
>> > are loaded.
>> >
>>
>>
>


Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-28 Thread Bill Au
I am running Solr 4.3.0, using DIH to import data from MySQL.  I am running
into a very strange problem where data from a datetime column is being
imported with the right date but the time as 00:00:00.  I tried using SQL
DATE_FORMAT() and also the DIH DateFormatTransformer, but nothing works.  In the
raw debug response of DIH, it looks like the time portion of the datetime
data is already 00:00:00 in the Solr JDBC query result.

So I looked at the source code of DIH JdbcDataSource class.  It is using
java.sql.ResultSet and its getDate() method to handle date column.  The
getDate() method returns java.sql.Date.  The java api doc for java.sql.Date

http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html

states that:

"To conform with the definition of SQL DATE, the millisecond values wrapped
by a java.sql.Date instance must be 'normalized' by setting the hours,
minutes, seconds, and milliseconds to zero in the particular time zone with
which the instance is associated."

This seems to be describing exactly my problem.  Has anyone else noticed
this problem?  Has anyone used DIH to index SQL datetime successfully?  If
so, can you send me the relevant portion of the DIH config?

Bill


Re: Replicating files containing external file fields

2013-06-28 Thread Jack Krupansky
Yes, you need to list that EFF file in the "confFiles" list - only those 
listed files will be replicated.


solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml,
/var/solr-data/List/external_*

Oops... sorry, no wildcards... you must list the individual files.

Technically, the path is supposed to be relative to the Solr collection 
"conf" directory, so you may have to put lots of "../" in the 
path to get to the files, like:


../../../../solr-data/List/external_1

For each file.

(This is what Erick was referring to.)
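
So the master section would end up looking something like this sketch; the number of
external_* files and the exact "../" depth depend on your actual layout:

<lst name="master">
  <str name="enable">${enable.master:false}</str>
  <str name="replicateAfter">commit</str>
  <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml,../../../../solr-data/List/external_1,../../../../solr-data/List/external_2</str>
</lst>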

Sorry, I don't have the answer to the reload question at the tip of my 
tongue.


-- Jack Krupansky

-Original Message- 
From: Arun Rangarajan

Sent: Friday, June 28, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Replicating files containing external file fields

Jack,

Here is the ReplicationHandler definition from solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>

The confFiles are under the dir:
/var/solr/application-cores/List/conf
and the external file fields are like:
/var/solr-data/List/external_*

Should I add
/var/solr-data/List/external_*
to confFiles like this?

solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml,
/var/solr-data/List/external_*


Also, can you tell me when (or whether) I need to do reloadCache on the
slave after the ext file fields are replicated?

Thx.


On Fri, Jun 28, 2013 at 10:13 AM, Jack Krupansky 
wrote:



Show us your confFiles directive. Maybe there is some subtle error in
the file name.

-- Jack Krupansky

-Original Message- From: Arun Rangarajan
Sent: Friday, June 28, 2013 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Replicating files containing external file fields


Erick,
Thx for your reply. The external file field fields are already under
 specified in solrconfig.xml. They are not getting replicated.
(Solr version 4.2.1.)


On Thu, Jun 27, 2013 at 10:50 AM, Erick Erickson 
wrote:

 Haven't tried this, but I _think_ you can use the

"confFiles" trick with relative paths, see:
http://wiki.apache.org/solr/SolrReplication

Or just put your EFF files in the data dir?

Best
Erick


On Wed, Jun 26, 2013 at 9:01 PM, Arun Rangarajan
wrote:

> From https://wiki.apache.org/solr/SolrReplication I understand that the index
> dir and any files under the conf dir can be replicated to slaves. I want to
> know if there is any way the files under the data dir containing external
> file fields can be replicated. These are not replicated by default.
> Currently we are running the ext file field reload script on both the
> master and the slave and then running reloadCache on each server once they
> are loaded.
>








Re: Joins with SolrCloud

2013-06-28 Thread Chris Toomey
Thanks, confirmed by trying w/ 4.3.1 that the join works with the outer
collection distributed/sharded so long as the inner collection is not
distributed/sharded.

Chris


On Tue, Jun 25, 2013 at 4:55 PM, Upayavira  wrote:

> I have never heard mention that joins support distributed search, so you
> cannot do a join against a sharded core.
>
> However, if from your example, innerCollection was replicated across all
> nodes, I would think that should work, because all that comes back from
> each server when a distributed search happens is the best 'n' matches,
> so exactly how those 'n' matches were located doesn't matter
> particularly.
>
> Simpler answer: try it!
>
> Upayavira
>
> On Tue, Jun 25, 2013, at 11:25 PM, Chris Toomey wrote:
> > What are the restrictions/limitations w.r.t. joins when using SolrCloud?
> >
> > Say I have a 3-node cluster and both my "outer" and "inner" collections
> > are
> > sharded 3 ways across the cluster.  Could I do a query such as
> >
> "select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection"?
> >
> > Or if the above isn't supported, would it be if the "inner" collection
> > was
> > not sharded and was replicated across all 3 nodes, so that it existed in
> > its entirety on each node?
> >
> > thx,
> > Chris
>


FileDataSource vs JdbcDataSource (speed) Solr 3.5

2013-06-28 Thread Mike L.
 
I've been working on improving index time with a JdbcDataSource DIH-based 
config and found it not to be as performant as I'd hoped, for various 
reasons, not specifically due to Solr. With that said, I decided to switch 
gears a bit and test out a FileDataSource setup... I assumed that by eliminating 
network latency I should see drastic improvements in import time... but 
I'm a bit surprised that this process seems to run much slower, at least the 
way I've initially coded it (below).
 
Below is a barebones file import that I wrote which consumes a tab-delimited 
file. Nothing fancy here. The regex just separates out the fields... Is there a 
faster approach to doing this? If so, what is it?
 
Also, what is the "recommended" approach in terms of indexing/importing data? I 
know that may come across as a vague question, as there are various options 
available, but which one would be considered the "standard" approach within a 
production enterprise environment?
 
 
(below has been cleansed)
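
(The pasted config did not survive intact; the following is a minimal sketch of the
kind of setup described above: FileDataSource plus LineEntityProcessor, with
RegexTransformer splitting the tab-delimited columns. The file path, column names,
and regex are assumptions, not the original configuration.)

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="lines"
            processor="LineEntityProcessor"
            url="/data/export.tsv"
            rootEntity="true"
            transformer="RegexTransformer">
      <!-- LineEntityProcessor emits each line of the file as "rawLine";
           the regex splits it into three illustrative columns -->
      <field column="rawLine"
             regex="^(.*?)\t(.*?)\t(.*)$"
             groupNames="id,title,description"/>
    </entity>
  </document>
</dataConfig>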
 

 
   
 
 
 
   

 
Thanks in advance,
Mike

Improving performance to return 2000+ documents

2013-06-28 Thread Utkarsh Sengar
Hello,

I have a use case where I need to retrieve the top 2000 documents matching a
query.
What are the parameters (in query, solrconfig, schema) I should look at to
improve this?

I have 45M documents in 3node solrcloud 4.3.1 with 3 shards, with 30GB RAM,
8vCPU and 7GB JVM heap size.

I have documentCache:
<documentCache ... initialSize="100" autowarmCount="0"/>

allText is a copyField.

This is the result I get:
ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
"

Benchmarking x.amazonaws.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:
Server Hostname:x.amazonaws.com
Server Port:8983

Document Path:
/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
Document Length:1538537 bytes

Concurrency Level:  10
Time taken for tests:   35.999 seconds
Complete requests:  500
Failed requests:21
   (Connect: 0, Receive: 0, Length: 21, Exceptions: 0)
Write errors:   0
Non-2xx responses:  2
Total transferred:  766221660 bytes
HTML transferred:   766191806 bytes
Requests per second:13.89 [#/sec] (mean)
Time per request:   719.981 [ms] (mean)
Time per request:   71.998 [ms] (mean, across all concurrent requests)
Transfer rate:  20785.65 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:00   0.6  0   8
Processing: 9  717 2339.6199   12611
Waiting:9  635 2233.6164   12580
Total:  9  718 2339.6199   12611

Percentage of the requests served within a certain time (ms)
  50%199
  66%236
  75%263
  80%281
  90%548
  95%838
  98%  12475
  99%  12545
 100%  12611 (longest request)

-- 
Thanks,
-Utkarsh


Re: Joins with SolrCloud

2013-06-28 Thread Yonik Seeley
On Tue, Jun 25, 2013 at 7:55 PM, Upayavira  wrote:
> However, if from your example, innerCollection was replicated across all
> nodes, I would think that should work, because all that comes back from
> each server when a distributed search happens is the best 'n' matches,
> so exactly how those 'n' matches were located doesn't matter
> particularly.

Yes, joins would only join documents residing on the same shard.
Distributed search with joins should work fine provided that you have
co-located documents you want to join.

-Yonik
http://lucidworks.com


Re: Improving performance to return 2000+ documents

2013-06-28 Thread Utkarsh Sengar
Also, I don't see a consistent response time from Solr; I ran ab again and
I get this:

ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
"


Benchmarking x.amazonaws.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:
Server Hostname:   x.amazonaws.com
Server Port:8983

Document Path:
/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
Document Length:1538537 bytes

Concurrency Level:  10
Time taken for tests:   10.858 seconds
Complete requests:  500
Failed requests:8
   (Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
Write errors:   0
Total transferred:  769297992 bytes
HTML transferred:   769268492 bytes
Requests per second:46.05 [#/sec] (mean)
Time per request:   217.167 [ms] (mean)
Time per request:   21.717 [ms] (mean, across all concurrent requests)
Transfer rate:  69187.90 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:00   0.3  0   2
Processing:   110  215  72.0190 497
Waiting:   91  180  70.5152 473
Total:112  216  72.0191 497

Percentage of the requests served within a certain time (ms)
  50%191
  66%225
  75%252
  80%272
  90%319
  95%364
  98%420
  99%453
 100%497 (longest request)


Sometimes it takes a lot of time, sometimes it's pretty quick.

Thanks,
-Utkarsh


On Fri, Jun 28, 2013 at 5:39 PM, Utkarsh Sengar wrote:

> Hello,
>
> I have a usecase where I need to retrive top 2000 documents matching a
> query.
> What are the parameters (in query, solrconfig, schema) I shoud look at to
> improve this?
>
> I have 45M documents in 3node solrcloud 4.3.1 with 3 shards, with 30GB
> RAM, 8vCPU and 7GB JVM heap size.
>
> I have documentCache:
>initialSize="100"   autowarmCount="0"/>
>
> allText is a copyField.
>
> This is the result I get:
> ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> "
>
> Benchmarking x.amazonaws.com (be patient)
> Completed 100 requests
> Completed 200 requests
> Completed 300 requests
> Completed 400 requests
> Completed 500 requests
> Finished 500 requests
>
>
> Server Software:
> Server Hostname:x.amazonaws.com
> Server Port:8983
>
> Document Path:
> /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> Document Length:1538537 bytes
>
> Concurrency Level:  10
> Time taken for tests:   35.999 seconds
> Complete requests:  500
> Failed requests:21
>(Connect: 0, Receive: 0, Length: 21, Exceptions: 0)
> Write errors:   0
> Non-2xx responses:  2
> Total transferred:  766221660 bytes
> HTML transferred:   766191806 bytes
> Requests per second:13.89 [#/sec] (mean)
> Time per request:   719.981 [ms] (mean)
> Time per request:   71.998 [ms] (mean, across all concurrent requests)
> Transfer rate:  20785.65 [Kbytes/sec] received
>
> Connection Times (ms)
>   min  mean[+/-sd] median   max
> Connect:00   0.6  0   8
> Processing: 9  717 2339.6199   12611
> Waiting:9  635 2233.6164   12580
> Total:  9  718 2339.6199   12611
>
> Percentage of the requests served within a certain time (ms)
>   50%199
>   66%236
>   75%263
>   80%281
>   90%548
>   95%838
>   98%  12475
>   99%  12545
>  100%  12611 (longest request)
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh


Re: cores sharing an instance

2013-06-28 Thread Shalin Shekhar Mangar
There is very little shared between multiple cores (instanceDir paths,
logging config maybe?). Why are you trying to do this?

On Sat, Jun 29, 2013 at 1:14 AM, Peyman Faratin  wrote:
> Hi
>
> I have a multicore setup (in 4.3.0). Is it possible for one core to share an 
> instance of its class with other cores at run time? i.e.
>
> At run time core 1 makes an instance of object O_i
>
> core 1 --> object O_i
> core 2
> ---
> core n
>
> then can core K access O_i? I know they can share properties but is it 
> possible to share objects?
>
> thank you
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: dataconfig to index ZIP Files

2013-06-28 Thread Shalin Shekhar Mangar
What is dataSource="binaryFile"? I don't see any such data source
defined in your configuration.

On Fri, Jun 28, 2013 at 11:50 PM, ericrs22  wrote:
> So I thought I had it correctly setup but I'm receiveing the following
> response to my Data Config
>
> Last Update: 18:17:52
>
>  (Duration: 07s)
>
> Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)
>
> Started: 13 minutes ago
>
> Here's my Data config.
>
> 
> 
> 
>processor="FileListEntityProcessor" baseDir="E:\ArchiveRoot"
> fileName=".*zip" recursive="true" rootEntity="false" dataSource="binaryFile"
> onError="skip">
>
> 
>
>
> 
> 
> 
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-28 Thread Shalin Shekhar Mangar
The default in JdbcDataSource is to use ResultSet.getObject, which
returns the underlying database's type. The type-specific methods in
ResultSet are not invoked unless you are using convertType="true".

Is MySQL actually returning java.sql.Timestamp objects?
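
For reference, a sketch of a dataSource definition that leaves convertType at its
default (off), so the raw java.sql.Timestamp values from MySQL reach Solr; the driver,
URL, and credentials are placeholders:

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/mydb"
            user="solr" password="..."
            convertType="false"/>

With convertType="true", DIH falls back to the type-specific getters (including
ResultSet.getDate()), which may be where the truncation described above comes from.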

On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
> I am running Solr 4.3.0, using DIH to import data from MySQL.  I am running
> into a very strange problem where data from a datetime column being
> imported with the right date but the time is 00:00:00.  I tried using SQL
> DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.  The
> raw debug response of DIH, it looks like the time porting of the datetime
> data is already 00:00:00 in Solr jdbc query result.
>
> So I looked at the source code of DIH JdbcDataSource class.  It is using
> java.sql.ResultSet and its getDate() method to handle date column.  The
> getDate() method returns java.sql.Date.  The java api doc for java.sql.Date
>
> http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
>
> states that:
>
> "To conform with the definition of SQL DATE, the millisecond values wrapped
> by a java.sql.Date instance must be 'normalized' by setting the hours,
> minutes, seconds, and milliseconds to zero in the particular time zone with
> which the instance is associated."
>
> This seems to be describing exactly my problem.  Has anyone else notice
> this problem?  Has anyone use DIH to index SQL datetime successfully?  If
> so can you send me the relevant portion of the DIH config?
>
> Bill



-- 
Regards,
Shalin Shekhar Mangar.