Re: Conditional Add/Overwrite a document

2015-10-31 Thread Gili Nachum
Amazing. Thanks Brendan.

On Thu, Oct 29, 2015 at 12:31 AM, Brendan Humphreys <
bren...@canva.com.invalid> wrote:

> Hi Gili,
>
> It sounds like Solr's DocBasedVersionConstraintsProcessor is what you are
> looking for:
>
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints
>
> Cheers,
> -Brendan
>
> On 29 October 2015 at 06:33, Gili Nachum  wrote:
>
> > Hi, Is there a conditional Add operation in Solr?
> >
> > My documents have "my_int" field and when re-adding a document with the
> > same ID, I would like to overwrite the existing doc only if the new doc's
> > my_int value is higher than that of the existing doc.
> >
> > As a naive solution, I could first read the existing doc to check whether it
> > exists and what its my_int value is, but I'd rather avoid the round trip,
> > since I need to support a high index throughput.
> >
> > I'm willing to write some Solr plugin code if that's a must. If I do that
> > which class should I extend/add? Would setting the my_int values as
> > docValues help to make them more easily available to the indexer?
> >
>
>
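
The conditional overwrite discussed in this thread can be modelled in a few lines. This is only a toy in-memory sketch of the "higher version wins" rule, not Solr code: the class VersionedIndex and its add method are hypothetical names, and "my_int" is the field from the question. In Solr itself the comparison would happen server-side in the update processor Brendan points to, so the client does not need the read-before-write round trip.

```python
class VersionedIndex:
    """Toy in-memory index that mimics a 'higher version wins' constraint."""

    def __init__(self, version_field="my_int"):
        self.version_field = version_field
        self.docs = {}  # maps document id -> stored document

    def add(self, doc):
        """Store doc unless a doc with the same id already has a version
        greater than or equal to the new one. Returns True if stored."""
        existing = self.docs.get(doc["id"])
        if existing is not None and doc[self.version_field] <= existing[self.version_field]:
            return False  # stale update: keep the existing document
        self.docs[doc["id"]] = doc
        return True


index = VersionedIndex()
index.add({"id": "a", "my_int": 5})
index.add({"id": "a", "my_int": 3})  # ignored: 3 is not higher than 5
index.add({"id": "a", "my_int": 8})  # overwrites the existing doc
```

Whether a stale update should be silently dropped (as here) or reported as an error is a design choice; the sketch simply returns False.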


How to retrieve single child document with block join

2015-10-31 Thread Yangrui Guo
Hi

I want to know if I can get a child document only if it contains the
query term. Currently I can retrieve all child documents at once with
query expansion. Does Solr support retrieving individual children?

Thanks,

Yangrui


Re: How to retrieve single child document with block join

2015-10-31 Thread Mikhail Khludnev
Hello Yangrui,

The question is not clear so far, but it sounds like it can be achieved via
fl=[child ... childFilter=field:],
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
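
A minimal sketch of how such a request could be assembled, assuming parents are marked with type:parent and the children have a field called title (both are assumptions for illustration, as is the helper name child_query_params). It only builds the request parameters; it does not talk to a Solr server.

```python
from urllib.parse import urlencode, parse_qs


def child_query_params(term, parent_filter="type:parent", child_field="title"):
    """Build request parameters that return each matching parent together
    with only those children that contain `term` (hypothetical helper)."""
    child_match = f"{child_field}:{term}"
    return {
        # Block join: select parents that have at least one matching child.
        "q": f'{{!parent which="{parent_filter}"}}{child_match}',
        # [child] transformer: attach children, restricted by childFilter.
        "fl": f"*,[child parentFilter={parent_filter} childFilter={child_match} limit=10]",
    }


params = child_query_params("titanic")
query_string = urlencode(params)  # ready to append to .../select?
```

The same child clause is used twice on purpose: once to pick the parents, and once so that only the matching children are attached to each parent.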


On Sat, Oct 31, 2015 at 8:52 AM, Yangrui Guo  wrote:

> Hi
>
> I want to know if I can get the child document only if it contains the
> query term. Currently I could retrieve all child document at once with
> query expansion. Does solr support individual child retrieval?
>
> Thanks,
>
> Yangrui
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Question on index time de-duplication

2015-10-31 Thread Zheng Lin Edwin Yeo
Hi Shamik,

I'm using most of the configuration out of the box, but I'm also looking at
tagging an identifier or something similar so that it always shows the latest
document.

At first I thought it would automatically show the one that was indexed
later, but that does not seem to be the case. With the default configuration
it just shows a random one.

I will also update here if I find any solutions or tips.

Regards,
Edwin


On 31 October 2015 at 00:38, shamik  wrote:

> Thanks for your reply. Have you customized SignatureUpdateProcessorFactory
> or
> are you using the configuration out of the box ? I know it works for simple
> dedup, but my requirement is tad different as I need to tag an identifier
> to
> the latest document. My goal is to understand if that's possible using
> SignatureUpdateProcessorFactory.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Question-on-index-time-de-duplication-tp4237306p4237409.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


contributor request

2015-10-31 Thread Alex
Hi,

Please kindly add me to the Solr wiki contributors list. The app we're
developing (Jitbit Help) uses Apache Solr to power our knowledge-base
search engine, and customers love it. (We were using the MS Fulltext indexing
service before, but it's a huge PITA.)

Thanks


Re: contributor request

2015-10-31 Thread Alex
Oh, shoot, I forgot to include my wiki username. It's "AlexYumas". Sorry about
that, stupid me.

On Sat, Oct 31, 2015 at 10:48 PM, Alex  wrote:

> Hi,
>
> Please kindly add me to the Solr wiki contributors list. The app we're
> developing (Jitbit Help) is using Apache Solr to power our knowledge-base
> search engine, customers love it. (we were using MS Fulltext indexing
> service before, but it's a huge PITA).
>
> Thanks
>


Solr getting irrelevant results when use block join

2015-10-31 Thread Yangrui Guo
Hi, I'm using Solr to search the IMDB database. I set the parent entity to
include the name of each actor/actress and the child entities for his or her
movies. Because a user might enter either a movie or a person, I did not
specify which entity Solr should return. When I just search q=Kate AND Winslet
without block join, Solr returns the correct result. However, when I search
{!parent which="type:parent"}+(Kate AND Winslet), Solr seems to return
all documents containing just the term "Kate". I tried quoting the
terms, but then the order needs to be exactly "Kate Winslet". Is there any way
to boost the score of documents that include all the terms in
the same field?

Yangrui


Re: Solr getting irrelevant results when use block join

2015-10-31 Thread Walter Underwood
This will probably work better without child documents and joins.

I would denormalize into actor documents and movie documents. At least, that’s 
what I did at Netflix.
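
As a rough illustration of that denormalization, here is a sketch that flattens nested actor records into separate flat actor and movie documents. The input shape, the function name denormalize and the field names (doc_type, movie_titles, cast) are all assumptions made for the example, not anything prescribed by Solr.

```python
def denormalize(actors):
    """Flatten nested actor -> movies records into flat actor and movie docs."""
    actor_docs = []
    movie_docs = {}  # movie id -> movie document, so each movie appears once
    for actor in actors:
        actor_docs.append({
            "id": f"actor-{actor['id']}",
            "doc_type": "actor",
            "name": actor["name"],
            # Copy the titles onto the actor so a title search finds the actor.
            "movie_titles": [m["title"] for m in actor["movies"]],
        })
        for movie in actor["movies"]:
            doc = movie_docs.setdefault(movie["id"], {
                "id": f"movie-{movie['id']}",
                "doc_type": "movie",
                "title": movie["title"],
                "cast": [],
            })
            # Copy the actor name onto the movie so a name search finds the movie.
            doc["cast"].append(actor["name"])
    return actor_docs + list(movie_docs.values())


docs = denormalize([
    {"id": 1, "name": "Kate Winslet", "movies": [{"id": 10, "title": "Titanic"}]},
    {"id": 2, "name": "Leonardo DiCaprio", "movies": [{"id": 10, "title": "Titanic"}]},
])
```

With flat documents like these, an ordinary query can match a person or a movie without any join, at the cost of duplicating names and titles across documents.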

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 31, 2015, at 1:17 PM, Yangrui Guo  wrote:
> 
> Hi I'm using solr to search imdb database. I set the parent entity to
> include the name for each actor/actress and child entity for his movies.
> Because user might either enter a movie or a person I did not specify which
> entity solr should return. When I just search q=Kate AND Winslet without
> block join solr returned me the correct result. However, when I search
> {!parent which="type:parent"}+(Kate AND Winslet) solr seemed to have
> returned all document containing just term "Kate". I tried quoting the
> terms but the order needs to be exactly "Kate Winslet". Is there any method
> I can boost higher the score of the document which includes the terms in
> the same field?
> 
> Yangrui



Re: Problem with the Content Field during Solr Indexing

2015-10-31 Thread Zheng Lin Edwin Yeo
Hi Shruti,

From what I understand, the /update/extract handler is for indexing
rich-text documents, and does not support ".png" files.

It only supports the following file formats: pdf, doc, docx, ppt, pptx,
xls, xlsx, odt, odp, ods, ott, otp, ots, rtf, htm, html, txt, log.
If you use the default post.jar, I believe the other formats will get
filtered out.

When I tried to index a ".png" file in my custom handler, it just indexed
"\n" in the content.

Regards,
Edwin



On 31 October 2015 at 09:35, Shruti Mundra  wrote:

> Hi Edwin,
>
> The file extension of the image file is ".png" and we are following this
> url for indexing:
> "
>
> http://blog.thedigitalgroup.com/vijaym/wp-content/uploads/sites/11/2015/07/SolrImageExtract.png
> "
>
> Thanks and Regards,
> Shruti Mundra
>
> On Thu, Oct 29, 2015 at 8:33 PM, Zheng Lin Edwin Yeo wrote:
>
> > The "\n" actually means new line as decoded by Solr from the indexed
> > document.
> >
> > What is your file extension of your image file, and which method are you
> > using to do the indexing?
> >
> > Regards,
> > Edwin
> >
> >
> > On 30 October 2015 at 04:38, Shruti Mundra  wrote:
> >
> > > Hi,
> > >
> > > When I'm trying index an image file directly to Solr, the attribute
> > > content, consists of trails of "\n"s and not the data.
> > > We are successful in getting the metadata for that image.
> > >
> > > Can anyone help us out on how we could get the content along with the
> > > Metadata.
> > >
> > > Thanks!
> > >
> > > - Shruti Mundra
> > >
> >
>


Re: contributor request

2015-10-31 Thread Erick Erickson
Looks like Steve added you today, you should be all set.

On Sat, Oct 31, 2015 at 12:50 PM, Alex  wrote:
> Oh, shoot, forgot to include my wiki username. Its "AlexYumas" sorry about
> that stupid me
>
> On Sat, Oct 31, 2015 at 10:48 PM, Alex  wrote:
>
>> Hi,
>>
>> Please kindly add me to the Solr wiki contributors list. The app we're
>> developing (Jitbit Help) is using Apache Solr to power our knowledge-base
>> search engine, customers love it. (we were using MS Fulltext indexing
>> service before, but it's a huge PITA).
>>
>> Thanks
>>


Kate Winslet vs Winslet Kate

2015-10-31 Thread Yangrui Guo
Hi, today I found an interesting aspect of Solr. I imported IMDB data into
Solr. IMDB puts the last name before the first name in its person name field,
e.g. "Winslet, Kate". When I search "Winslet Kate" with quotation marks I
get the exact result. However, if I search "Kate Winslet" or Kate AND
Winslet, Solr seems to return all results containing either Kate or Winslet,
which is similar to "Winslet Kate"~99. From a user's perspective I
certainly want Solr to treat Kate Winslet the same as Winslet Kate. Is
there any way to make Solr score terms in the same field higher?

Yangrui


Re: Kate Winslet vs Winslet Kate

2015-10-31 Thread Erick Erickson
There are a couple of anomalies here.

1> kate AND winslet
What does the query look like if you add &debug=true to the request
and look at the "parsedquery" section of the response?  My guess is you
typed "q=name:kate AND winslet" which parses as "q=name:kate AND
default_search_field:winslet" and are getting matches you don't
expect. You need something like "q=name:(kate AND winslet)" or
"q=name:kate AND name:winslet". Note that if you're using edismax it's
more complicated, but that should still honor the intent.

2> I have no idea why searching for "Kate Winslet" in quotes returns
anything, I wouldn't expect it to unless you mean you type in "q=kate
winslet" which is searching against your default field, not the name
field.
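
To make the field-binding rule in point 1 concrete, here is a toy model of it. It is not Solr's query parser: bind_fields is a hypothetical helper, "text" is an assumed default field name, and parentheses, phrases and local params are deliberately left out.

```python
def bind_fields(query, default_field="text"):
    """Toy model: show which field each bare term would be searched in.

    Handles only whitespace-separated terms and the AND/OR/NOT keywords.
    """
    bound = []
    for token in query.split():
        if token in ("AND", "OR", "NOT"):
            bound.append(token)  # boolean operator, passed through
        elif ":" in token:
            bound.append(token)  # already fielded, e.g. name:kate
        else:
            bound.append(f"{default_field}:{token}")  # bare term: default field
    return " ".join(bound)


print(bind_fields("name:kate AND winslet"))       # name:kate AND text:winslet
print(bind_fields("name:kate AND name:winslet"))  # name:kate AND name:winslet
```

The first call shows the surprise: the field prefix applies only to the term it is attached to, so "winslet" ends up in the default field.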

Best,
Erick

On Sat, Oct 31, 2015 at 8:52 PM, Yangrui Guo  wrote:
> Hi today I found an interesting aspect of solr. I imported IMDB data into
> solr. The IMDB puts last name before first name for its person's name field
> eg. "Winslet, Kate". When I search "Winslet Kate" with quotation marks I
> could get the exact result. However if I search "Kate Winslet" or Kate AND
> Winslet solr seem to return me all result containing either Kate or Winslet
> which is similar to "Winslet Kate"~99. From user perspective I
> certainly want solr to treat Kate Winslet the same as Winslet Kate. Is
> there anyway to make solr score higher for terms in the same field?
>
> Yangrui


Re: Kate Winslet vs Winslet Kate

2015-10-31 Thread Yangrui Guo
Thanks for the reply. Putting name: before the terms did the trick. I
just wanted to generalize the search query because users might be
interested in querying Kate Winslet herself or her movies. If a user enters
the query string "Kate Winslet movie", the query q=name:(Kate AND Winslet AND
movie) will return nothing.

Yangrui Guo

On Saturday, October 31, 2015, Erick Erickson 
wrote:

> There are a couple of anomalies here.
>
> 1> kate AND winslet
> What does the query look like if you add &debug=true to the statement
> and look at the "parsed_query" section of the return?  My guess is you
> typed "q=name:kate AND winslet" which parses as "q=name:kate AND
> default_search_field:winslet" and are getting matches you don't
> expect. You need something like "q=name:(kate AND winslet)" or
> "q=name:kate AND name:winslet". Note that if you're using eDIsmax it's
> more complicated, but that should still honor the intent.
>
> 2> I have no idea why searching for "Kate Winslet" in quotes returns
> anything, I wouldn't expect it to unless you mean you type in "q=kate
> winslet" which is searching against your default field, not the name
> field.
>
> Best,
> Erick
>
> On Sat, Oct 31, 2015 at 8:52 PM, Yangrui Guo wrote:
> > Hi today I found an interesting aspect of solr. I imported IMDB data into
> > solr. The IMDB puts last name before first name for its person's name field
> > eg. "Winslet, Kate". When I search "Winslet Kate" with quotation marks I
> > could get the exact result. However if I search "Kate Winslet" or Kate AND
> > Winslet solr seem to return me all result containing either Kate or Winslet
> > which is similar to "Winslet Kate"~99. From user perspective I
> > certainly want solr to treat Kate Winslet the same as Winslet Kate. Is
> > there anyway to make solr score higher for terms in the same field?
> >
> > Yangrui
>


Re: Kate Winslet vs Winslet Kate

2015-10-31 Thread Daniel Valdivia
Perhaps

q=name:("Kate AND Winslet")

q=name:("Kate Winslet")

Sent from my iPhone

> On Oct 31, 2015, at 10:21 PM, Yangrui Guo  wrote:
> 
> Thanks for the reply. Putting the name: before the terms did the work. I
> just wanted to generalize the search query because users might be
> interested in querying Kate Winslet herself or her movies. If user enter
> query string "Kate Winslet movie", the query q=name:(Kate AND Winslet AND
> movie) will return nothing.
> 
> Yangrui Guo
> 
> On Saturday, October 31, 2015, Erick Erickson 
> wrote:
> 
>> There are a couple of anomalies here.
>> 
>> 1> kate AND winslet
>> What does the query look like if you add &debug=true to the statement
>> and look at the "parsed_query" section of the return?  My guess is you
>> typed "q=name:kate AND winslet" which parses as "q=name:kate AND
>> default_search_field:winslet" and are getting matches you don't
>> expect. You need something like "q=name:(kate AND winslet)" or
>> "q=name:kate AND name:winslet". Note that if you're using eDIsmax it's
>> more complicated, but that should still honor the intent.
>> 
>> 2> I have no idea why searching for "Kate Winslet" in quotes returns
>> anything, I wouldn't expect it to unless you mean you type in "q=kate
>> winslet" which is searching against your default field, not the name
>> field.
>> 
>> Best,
>> Erick
>> 
>> On Sat, Oct 31, 2015 at 8:52 PM, Yangrui Guo wrote:
>>> Hi today I found an interesting aspect of solr. I imported IMDB data into
>>> solr. The IMDB puts last name before first name for its person's name field
>>> eg. "Winslet, Kate". When I search "Winslet Kate" with quotation marks I
>>> could get the exact result. However if I search "Kate Winslet" or Kate AND
>>> Winslet solr seem to return me all result containing either Kate or Winslet
>>> which is similar to "Winslet Kate"~99. From user perspective I
>>> certainly want solr to treat Kate Winslet the same as Winslet Kate. Is
>>> there anyway to make solr score higher for terms in the same field?
>>>
>>> Yangrui
>>


Re: Kate Winslet vs Winslet Kate

2015-10-31 Thread Erick Erickson
Yeah, that's actually a tough one. You have no control over what the user types,
you have to try to guess what they meant.

To do that right, you really have to have some meta-data besides what the user
typed in, i.e. recognize "kate" and "winslet" are proper names and "movies" is
something else and break up the query appropriately behind the scenes.

edismax might help here. You could copyField for everything into a
bag_of_words field then boost the name field quite high relative to the
bag_of_words field. That way, and _assuming_ that the bag_of_words
field had all three words, then the user at least gets something.

You can also do some tricks with edismax and the "pf" parameters. That
option automatically takes the input and makes a phrase out of it against
the field, so you get better scores for, say, the name field if it contains
the phrase "kate winslet". That doesn't help with the "kate winslet movies"
case, though.
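
The two edismax suggestions above could be expressed as request parameters roughly as follows. This is a sketch under assumptions: the boost values 10 and 20 are arbitrary placeholders, edismax_params is a hypothetical helper, and bag_of_words is assumed to already exist in the schema as the copyField target described above.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_input):
    """Build edismax request parameters for free-form user input."""
    return {
        "defType": "edismax",
        "q": user_input,
        # qf: search both fields, weighting matches in name much higher.
        "qf": "name^10 bag_of_words",
        # pf: extra boost when the input occurs as a phrase in name.
        "pf": "name^20",
    }


params = edismax_params("kate winslet movies")
query_string = urlencode(params)  # ready to append to .../select?
```

The relative weights matter more than the absolute numbers; they would need tuning against real queries.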

On Sat, Oct 31, 2015 at 11:11 PM, Daniel Valdivia
 wrote:
> Perhaps
>
> q=name:("Kate AND Winslet")
>
> q=name:("Kate Winslet")
>
> Sent from my iPhone
>
>> On Oct 31, 2015, at 10:21 PM, Yangrui Guo  wrote:
>>
>> Thanks for the reply. Putting the name: before the terms did the work. I
>> just wanted to generalize the search query because users might be
>> interested in querying Kate Winslet herself or her movies. If user enter
>> query string "Kate Winslet movie", the query q=name:(Kate AND Winslet AND
>> movie) will return nothing.
>>
>> Yangrui Guo
>>
>> On Saturday, October 31, 2015, Erick Erickson 
>> wrote:
>>
>>> There are a couple of anomalies here.
>>>
>>> 1> kate AND winslet
>>> What does the query look like if you add &debug=true to the statement
>>> and look at the "parsed_query" section of the return?  My guess is you
>>> typed "q=name:kate AND winslet" which parses as "q=name:kate AND
>>> default_search_field:winslet" and are getting matches you don't
>>> expect. You need something like "q=name:(kate AND winslet)" or
>>> "q=name:kate AND name:winslet". Note that if you're using eDIsmax it's
>>> more complicated, but that should still honor the intent.
>>>
>>> 2> I have no idea why searching for "Kate Winslet" in quotes returns
>>> anything, I wouldn't expect it to unless you mean you type in "q=kate
>>> winslet" which is searching against your default field, not the name
>>> field.
>>>
>>> Best,
>>> Erick
>>>
>>> On Sat, Oct 31, 2015 at 8:52 PM, Yangrui Guo wrote:
>>>> Hi today I found an interesting aspect of solr. I imported IMDB data into
>>>> solr. The IMDB puts last name before first name for its person's name field
>>>> eg. "Winslet, Kate". When I search "Winslet Kate" with quotation marks I
>>>> could get the exact result. However if I search "Kate Winslet" or Kate AND
>>>> Winslet solr seem to return me all result containing either Kate or Winslet
>>>> which is similar to "Winslet Kate"~99. From user perspective I
>>>> certainly want solr to treat Kate Winslet the same as Winslet Kate. Is
>>>> there anyway to make solr score higher for terms in the same field?
>>>>
>>>> Yangrui
>>>