Solr 5.3 Faceting on Children with Block Join Parser

2015-08-31 Thread Tom Devel
Apologies for cross posting a question from SO here.

I am very interested in the new faceting on child documents feature of Solr
5.3 and would like to know if somebody has figured out how to do it as
asked in the question on
http://stackoverflow.com/questions/32212949/solr-5-3-faceting-on-children-with-block-join-parser

Thanks for any hints,
Tom

The question is:

Solr 5.3 supports faceting on nested documents [1], with a great tutorial
from Yonik [2].

In the tutorial example, the query to get the documents for faceting is
directly performed on the child documents:

$ curl http://localhost:8983/solr/demo/query -d '
q=author_s:yonik&fl=id,comment_t&
json.facet={
  genres : {
    type: terms,
    field: cat_s,
    domain: { blockParent : "type_s:book" }
  }
}'

What I do not know is how to facet on child documents returned from a Block
Join Parent Query Parser [3] and provided through ExpandComponent [4].

What I have working so far is the same as in the example from the
ExpandComponent [4]: Query the child fields to return the parent documents
(see 1.), then expand the result to get the relevant child documents (see
2.)


1. q={!parent which="type_s:parent" v='text_t:solr'}

2. &expand=true&expand.field=ISBN_s&expand.q=*:*

What I need:

With steps 1.) and 2.) already working, how can we facet on some field
(it does not matter which) of the child documents returned from (2.)?

  [1]: http://yonik.com/solr-5-3/
  [2]: http://yonik.com/solr-nested-objects/
  [3]: https://cwiki.apache.org/confluence/display/solr/Other+Parsers
  [4]: http://heliosearch.org/expand-block-join/
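
For reference, steps 1. and 2. can be sent as a single request. The following is a minimal Python sketch that only assembles and URL-encodes the parameters quoted above. The helper name build_expand_params is made up for illustration, and the endpoint is assumed to be the demo collection from the tutorial's curl command. It does not add any faceting.

```python
from urllib.parse import urlencode


def build_expand_params(parent_filter, child_query, expand_field):
    """Combine the block join parent query (step 1) with the expand parameters (step 2)."""
    block_join = "{!parent which=\"" + parent_filter + "\" v='" + child_query + "'}"
    return {
        "q": block_join,
        "expand": "true",
        "expand.field": expand_field,
        "expand.q": "*:*",
    }


# Values taken from the question; the base URL is an assumption.
params = build_expand_params("type_s:parent", "text_t:solr", "ISBN_s")
query_string = urlencode(params)
url = "http://localhost:8983/solr/demo/query?" + query_string
print(url)
```

URL-encoding the parameters this way avoids quoting problems with the braces and quotes of the local-params syntax when the query is sent from a script.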


Search over a multiValued field

2015-03-03 Thread Tom Devel
Hi,

I am running Solr 5.0.0 and have a question about proximity search and
multiValued fields.

I am indexing XML files of the following form, with foundField being a field
defined as multiValued and of type text_en in my schema.xml.



8
"Oranges from South California - ordered"
"Green Apples - available"
"Black Report Books - ordered"


There are several such documents, and for instance, I would like to query
all documents having in the foundField "Oranges" and "ordered". The
following proximity query takes care of it:

q=foundField:("oranges AND ordered"~2)

However, a field could have more words, and I cannot know the proximity of
the desired query words in advance. Setting the proximity value too high
results in false positives: the following query also returns the document
(although "available" was in the entry about Apples):

foundField:("oranges AND available"~200)

I do not think that tweaking a proximity value is the correct approach.

How can I search so that contents in a multiValued field are matched per
value as described above, without running into this problem?

Many thanks for any help


Re: Search over a multiValued field

2015-03-03 Thread Tom Devel
Jack,

This is exactly what I was looking for, thanks. I found the
positionIncrementGap attribute in the schema.xml for the text_en field type.

I was putting in "AND" because I read in the Solr documentation that "The
OR operator is the default conjunction operator."

Does it mean that words between " symbols, such as "Orange ordered" are
treated as a single term, with (implicitly) AND conjunction between them?

Where could I find more info about this?

I am currently reading
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser

Thanks again

On Tue, Mar 3, 2015 at 3:58 PM, Jack Krupansky 
wrote:

> Just set the positionIncrementGap for the multivalued field to a much
> higher value, like 1000 or 5000. That's the purpose of this attribute, to
> assure that reasonable proximity matches don't match across multiple
> values.
>
> Also, leave "AND" out of the query phrases - you're just trying to match
> the product name and availability.
>
>
> -- Jack Krupansky


Re: Search over a multiValued field

2015-03-03 Thread Tom Devel
Erick,

Thanks a lot for the explanation, makes sense now.

Tom

On Tue, Mar 3, 2015 at 5:54 PM, Erick Erickson 
wrote:

> bq: Does it mean that words between " symbols, such as "Orange ordered" are
> treated as a single term, with (implicitly) AND conjunction between them?
>
> Not at all. When you quote things, you're getting a "phrase query",
> perhaps one with slop. So something like "a b" means that 'a' must
> appear right next to 'b'. This is something like an AND in the sense
> that both terms must appear, but it is far more restrictive since it
> takes into account the position of the terms in the field.
>
> "a b"~10 means that both words must appear within 10 transpositions in
> the same field. You can think of "transposition" as how many
> intervening terms there are, so something like "a b"~2 would match docs
> with "a x b", but not "a x y z b".
>
> And this is where positionIncrementGap comes in. By putting 1000 in for
> it, you guarantee "a b"~999 won't match 'a' in one value of the
> multiValued field and 'b' in another.
>
> Whereas a AND b would match across successive MV entries no matter what
> the gap.
>
> HTH,
> Erick
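
The effect Erick describes can be tried out with a toy model. The sketch below is plain Python, not Solr or Lucene code. It assumes a simplified analyzer (lowercased word tokens, no stop words or stemming), only checks the terms in the order given, and counts slop as the number of intervening positions, as in the explanation above. The function names and the gap values 100 and 1000 are chosen for illustration.

```python
import re


def index_positions(values, position_increment_gap):
    """Assign a position to every token of a multiValued field.

    Tokens within one value get consecutive positions; between two values
    the position counter jumps by position_increment_gap.
    """
    positions = {}
    next_pos = 0
    for value in values:
        for token in re.findall(r"\w+", value.lower()):
            positions.setdefault(token, []).append(next_pos)
            next_pos += 1
        next_pos += position_increment_gap
    return positions


def sloppy_match(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions in between."""
    return any(
        0 <= pos_b - pos_a - 1 <= slop
        for pos_a in positions.get(first, [])
        for pos_b in positions.get(second, [])
    )


VALUES = [
    "Oranges from South California - ordered",
    "Green Apples - available",
    "Black Report Books - ordered",
]

small_gap = index_positions(VALUES, 100)
large_gap = index_positions(VALUES, 1000)

print(sloppy_match(small_gap, "oranges", "available", 200))  # True: matches across values
print(sloppy_match(large_gap, "oranges", "available", 200))  # False: the gap exceeds the slop
print(sloppy_match(large_gap, "oranges", "ordered", 200))    # True: both terms in one value
```

As long as the gap is larger than any slop used in queries, a phrase can only match inside a single value.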


Order of defining fields and dynamic fields in schema.xml

2015-03-06 Thread Tom Devel
Hi,

I am running Solr 5 using basic_configs and have a question about the
order of defining fields and dynamic fields in the schema.xml file.

For example, there is a field "hierarchy.of.fields.Project" that I am
capturing as "text_en_splitting" (see below), but I would like the rest of
the fields in this hierarchy to be "text_en".

Since the dynamicField with * is technically spanning over the Project
field, should its definition go above, or below the Project field?





Or in this case: I have a hierarchy where currently only one field,
"another.hierarchy.of.fields.Description", should be captured; the rest
should just be ignored for now. Is there any significance to which
definition comes first?




Thanks for any hints,
Tom


Re: Order of defining fields and dynamic fields in schema.xml

2015-03-06 Thread Tom Devel
That's good to know.

On http://wiki.apache.org/solr/SchemaXml it also states about dynamicFields
that "you can create field rules that Solr will use to understand what
datatype should be used whenever it is given a field name that is not
explicitly defined, but matches a prefix or suffix used in a dynamicField. "

Thanks

On Fri, Mar 6, 2015 at 10:43 AM, Alexandre Rafalovitch 
wrote:

> I don't believe the order in file matters for anything apart from
> initParams section. The longer - more specific one - matches first.
>
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 6 March 2015 at 11:21, Tom Devel  wrote:
> > Hi,
> >
> > I am running solr 5 using basic_configs and have a questions about the
> > order of defining fields and dynamic fields in the schema.xml file?
> >
> > For example, there is a field "hierarchy.of.fields.Project" I am
> capturing
> > as below as "text_en_splitting", but the rest of the fields in this
> > hierarchy, I would like as "text_en"
> >
> > Since the dynamicField with * is technically spanning over the Project
> > field, should its definition go above, or below the Project field?
> >
> >  > indexed="true"  stored="true"  multiValued="true" required="false" />
> >  > indexed="true"  stored="true"  multiValued="true" required="false" />
> >
> >
> > Or this case, I have a hierarchy where currently only one field should be
> > captured "another.hierarchy.of.fields.Description", the rest for now
> should
> > be just ignored. Is here any significance of which definition comes
> first?
> >
> >  > indexed="false"  stored="false"  multiValued="true" required="false" />
> >  > type="text_en"indexed="true"  stored="true"  multiValued="true"
> > required="false" />
> >
> > Thanks for any hints,
> > Tom
>
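
The rule from this thread (an explicitly defined field wins, and otherwise the longer, more specific dynamicField pattern is tried first, independent of the order in the file) can be sketched in a few lines. This is a plain Python model, not Solr code. The dynamic patterns "hierarchy.of.fields.*" and "another.hierarchy.of.fields.*" and the type name "ignored" are assumptions, since the field definitions did not survive in the archived message.

```python
def resolve_field_type(field_name, explicit_fields, dynamic_fields):
    """Return the type for field_name, or None if nothing matches.

    Explicit fields take precedence; dynamic patterns (with one leading or
    trailing '*') are tried from longest to shortest.
    """
    if field_name in explicit_fields:
        return explicit_fields[field_name]
    for pattern in sorted(dynamic_fields, key=len, reverse=True):
        if pattern.startswith("*"):
            matched = field_name.endswith(pattern[1:])
        elif pattern.endswith("*"):
            matched = field_name.startswith(pattern[:-1])
        else:
            matched = False
        if matched:
            return dynamic_fields[pattern]
    return None


# Hypothetical schema mirroring the two cases in the question.
explicit = {
    "hierarchy.of.fields.Project": "text_en_splitting",
    "another.hierarchy.of.fields.Description": "text_en",
}
dynamic = {
    "hierarchy.of.fields.*": "text_en",
    "another.hierarchy.of.fields.*": "ignored",
}

for name in (
    "hierarchy.of.fields.Project",
    "hierarchy.of.fields.Owner",
    "another.hierarchy.of.fields.Description",
    "another.hierarchy.of.fields.Other",
):
    print(name, "->", resolve_field_type(name, explicit, dynamic))
```

Because the lookup sorts the patterns itself, swapping the entries in either dictionary does not change the result, which is the point made in the reply above.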


Block Join Query update documents, how to do it correctly?

2015-05-13 Thread Tom Devel
I am using the Block Join Query Parser with success, following the example
on:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

As this example shows, each parent document can have a number of documents
embedded, and each document, be it a parent or a child, has its own unique
identifier.

Now I would like to update some of the parent documents, and I have read
that there are horror stories about duplicate documents, scrambled data,
etc. The two prominent JIRA entries for this are:

https://issues.apache.org/jira/browse/SOLR-6700
https://issues.apache.org/jira/browse/SOLR-6096

My question is, how do you usually update such documents, for example to
update a value for the parent or a value for one of its children?

I tried to repost the whole modified document (the parent and ALL of its
children as one file), and it seems to work on a small toy example, but of
course I cannot be sure for a larger instance with thousands of documents,
and I would like to know if this is the correct way to go or not.

To make it clear, if originally I used bin/solr post on the following
file:



1
Solr has block join support
  parentDocument

 2
SolrCloud supports it too!




Now I could do bin/solr post on a file:



1
Updated field: Solr has block join support
  parentDocument

 2
Updated field: SolrCloud supports it too!




Will this avoid the inconsistent, scrambled, or duplicate data on Solr
instances discussed in the JIRAs? How do you usually do this?

Thanks for any help or hints.

Tom
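
To make the "repost the whole block" approach concrete, here is a minimal Python sketch that builds one update file containing a parent and all of its children. The XML markup of the files above was lost in the archive, so the field names title, content_type, and comments are assumptions, and build_block is a made-up helper name. The sketch only shows how such a file could be generated; it does not settle whether reposting avoids the problems described in the JIRA issues.

```python
import xml.etree.ElementTree as ET


def build_block(parent_fields, children):
    """Return an <add> document with one parent <doc> and nested child <doc>s."""
    add = ET.Element("add")
    parent = ET.SubElement(add, "doc")
    for name, value in parent_fields.items():
        ET.SubElement(parent, "field", {"name": name}).text = value
    for child_fields in children:
        child = ET.SubElement(parent, "doc")
        for name, value in child_fields.items():
            ET.SubElement(child, "field", {"name": name}).text = value
    return ET.tostring(add, encoding="unicode")


# Field names are assumed; the values come from the updated file above.
xml_block = build_block(
    {
        "id": "1",
        "title": "Updated field: Solr has block join support",
        "content_type": "parentDocument",
    },
    [{"id": "2", "comments": "Updated field: SolrCloud supports it too!"}],
)
print(xml_block)
```

Generating the parent and every child together keeps the block in one file, which is the form that was posted with bin/solr post above.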


Re: How to update from Solr Cloud 5.4.1 to 5.5.1

2016-08-29 Thread Tom Devel
Shawn,

Do you (or anybody else here) know of the upgrade steps from 6.1 to 6.2 in
this case? The release notes of 6.2 do not mention anything about
upgrading, but 6.2 has some good bugfixes.

If 6.2 made changes to the index format, is a drop-in replacement from 6.1
to 6.2 still possible?

Thanks,
Tom

On Sat, Aug 27, 2016 at 12:23 PM, Shawn Heisey  wrote:

> On 8/26/2016 10:22 AM, D'agostino Victor wrote:
> > Do you know in which version index format changes and if I should
> > update to a higher version ?
>
> In version 6.0, and again in the just-released 6.2, one aspect of the
> index format has been updated.  Version 6.1 didn't have any format
> changes from 6.0.  You won't see the new version reflected in any of the
> filenames in the index directory.
>
> Whether or not to upgrade depends on what features you need, and whether
> you need fixes included in the new version.  Not all of the fixed bugs
> in 6.x are applicable to 5.x -- some are fixes for problems introduced
> during 6.x development.
>
> > And about ZooKeeper ; the 3.4.8 is fine or should I update it too ?
>
> That's the newest stable version of zookeeper.  There are alpha releases
> of version 3.5.
>
> Solr includes zookeeper 3.4.6.  A 3.4.8 server will work, but no
> guarantees can be made about the 3.5 alpha versions.
>
> Thanks,
> Shawn
>
>


Solr and UIMA, capturing fields

2015-02-13 Thread Tom Devel
Hi,

I successfully combined Solr and UIMA with the help of
https://wiki.apache.org/solr/SolrUIMA and other pages (and am happy to
provide some help about how to reach this step).

Right now I can run an analysis engine and have some "primitive"
features/fields, which I specify in the schema.xml, automatically
recognized by Solr. But if the features themselves are objects, I do not
know how to capture them in Solr.

I provided the relevant solrconfig.xml in [1] and the schema.xml addition
in [2] for the following small example. They use the AE provided directly
by the UIMA example.

With the input "This is a sentence with an email at u...@host.com", Solr
correctly adds the field:

"UIMAname": [
  "36"
]

since this is the index where the email token starts. I could also
successfully capture the feature end to indicate where the found email
token ends.

However, example.EmailAddress has the features "begin, end, sofa". sofa is
not a primitive feature, but an "object" which itself has the features
"sofaNum, sofaID, sofaString, ..."

How can I access fields in Solr from an annotation like
example.EmailAddress that are not simple strings but are themselves objects?

I made an image of the CAS Visual Debugger with this AE and the sentence to
show which fields I mean. I hope this makes it clearer:
http://tinypic.com/view.php?pic=34rud1s&s=8#.VN5bF7s2cWN

Does anyone know how to access such fields with Solr and UIMA?

Thanks a lot for any help,
Tom


[1]
  

  


/home/toliwa/javalibs/uimaj-2.6.0-bin/apache-uima/examples/descriptors/analysis_engine/UIMA_Analysis_Example.xml

false

id

  false
  
text
  


  
example.EmailAddress

  begin
  UIMAname

  

  



  

[2]