Re: Multilingual Solr

2016-06-06 Thread Alessandro Benedetti
Hi Johannes,
nothing out of the box unfortunately, but it could be a nice idea and
contribution.
If having a multi-core setup is not an option (out of curiosity, can I
ask why?)
you could proceed in this way:

1) you define in the schema N field variations per field you are interested
in.
N is the number of languages you can support.
Given for example the text field, you define:
text field not indexed, only stored
text_en indexed
text_fr indexed
text_it indexed ...
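
A minimal schema.xml sketch of that layout could be (field and type names
here are just examples, plug in whatever analyzers you need):

   <!-- generic field: stored for display, not searched directly -->
   <field name="text"    type="string"  indexed="false" stored="true"/>
   <!-- per-language fields: indexed with language-specific analysis -->
   <field name="text_en" type="text_en" indexed="true" stored="false"/>
   <field name="text_fr" type="text_fr" indexed="true" stored="false"/>
   <field name="text_it" type="text_it" indexed="true" stored="false"/>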

2) At indexing time you can develop a custom updateRequestProcessor that
will identify the language (Solr internal libraries offer support for
that) and route the content to the correct text field.
If you also want to index translations, you need to rely on some third
party library to do that.
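
For the detection part you may not even need custom code: Solr ships
language identifier update processors (LangDetect and Tika based). A sketch
for solrconfig.xml, assuming the fields above (the langid.* parameters shown
are only the common ones, and the solr-langid contrib jar must be on the
classpath):

   <updateRequestProcessorChain name="langid">
     <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
       <str name="langid.fl">text</str>             <!-- field(s) to detect the language on -->
       <str name="langid.langField">language</str>  <!-- stores the detected language code -->
       <bool name="langid.map">true</bool>          <!-- route text to text_en, text_fr, ... -->
       <str name="langid.fallback">en</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory"/>
     <processor class="solr.RunUpdateProcessorFactory"/>
   </updateRequestProcessorChain>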

3) At query time you can address all the fields you want in parallel, with
the edismax query parser for example.
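
For example, with the per-language fields above:

   q=some+query&defType=edismax&qf=text_en+text_fr+text_it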

4) For rendering the results, it's not entirely clear to me what you want.
Do you want to:

a) translate the document content into the language you want? You could
develop a custom DocTransformer that takes the language in input and
translates, but I don't see that much benefit in that.

b) return only the documents that originally were of that language? This
case is easy: you add an fq at query time to filter only the documents of
the language you want (at indexing time you identify the language).

c) return the original content of the document? This is quite easy. You can
store the generic "text" field, and always return that.
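
A sketch combining b) and c), assuming the detected language was stored in a
"language" field at indexing time:

   q=some+query&defType=edismax&qf=text_en+text_fr+text_it
     &fq=language:en&fl=id,text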

Let us know for further discussion,

Cheers

On Sun, Jun 5, 2016 at 9:57 PM, Riedl, Johannes <
johannes.ri...@uni-tuebingen.de> wrote:

> Hi all,
>
> we are currently in search of a solution for switching between different
> languages in the query results and keeping the possibility to perform a
> search in several languages in parallel.  The overall aim would be a
> constant field name and an additional Solr parameter "lang=XX_YY" that
> allows to return the results in the chosen language while searches are
> applied to all languages. Setting up several cores to obtain a generic
> field name is not an option. Does anyone know of a clean way to achieve
> this, particularly routing content indexed to a generic field (e.g. title)
> to a "background field" (e.g. title_en, title_fr) etc on the fly and
> retrieving it from there depending on the language chosen.
>
> Background: So far, we have investigated the multi-language field approach
> offered by Trey Grainger in the code examples for "Solr in Action" (
> https://github.com/treygrainger/solr-in-action.git, chapter 14), an
> extension to the ordinary textField that allows to use a generic field name
> and the language is encoded at the beginning of the field content and
> appropriate index and query analyzers associated to dummy fields in
> schema.xml. If there is a way to store data in these dummy fields and
> additionally the lang parameter is added we might be done.
>
> Thanks a lot, best regards
>
> Johannes
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Indexing logs in Solr

2016-06-06 Thread Anil
Hi Erick and Alessandro,

do you have any input on the solution given in the following link?
http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html

Thanks,
Anil

On 5 June 2016 at 11:56, Anil  wrote:

> Thanks Ilan. I will look into this.
> In our case, logs are attached to some application information and it's
> linked to other product information.
>
> Based on the log information, the user will be navigated to other features of
> the product. So we cannot directly decouple log search from our application.
>
> Thanks,
> Anil
>
> On 5 June 2016 at 11:42, Ilan Schwarts  wrote:
>
>> How about using "logstash" for this? I know it's ES and not Solr, but it is
>> a free tool that is out there and there's no need to re-invent the wheel.
>> On Jun 5, 2016 9:09 AM, "Anil"  wrote:
>>
>> > Hi ,
>> >
>> > I would like to index logs to enable search on them in our
>> > application.
>> >
>> > The problem would be the index and stored size, as log file sizes would
>> > go up to terabytes.
>> >
>> > Is there any way to use the highlight feature without storing?
>> >
>> > I found the following link where Alessandro Benedetti mentioned a custom
>> > highlighter on a url field.
>> >
>> >
>> >
>> http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html
>> >
>> > Any ideas would be helpful. Thanks.
>> >
>> > Cheers,
>> > Anil
>> >
>>
>
>


Solr highlights

2016-06-06 Thread Anil
Hi,

As per my understanding, there is one highlighter applied to all fields of a
Solr document.

Is there a way to apply different highlighters to different fields?
Thanks.

Cheers,
Anil


Multiple dictionary and affix for HunspellStemFilterFactory

2016-06-06 Thread Zheng Lin Edwin Yeo
Hi,

I would like to check: is it possible to reference multiple dictionaries
for HunspellStemFilterFactory?

I am trying to add more records to the default en_GB.dic file, but the file
size has exceeded 1024KB, and ZooKeeper doesn't allow files larger than
1024KB to be loaded.

When I tried adding dictionary="en_GB.dic,en_GB2.dic", it works. But when
I tried adding affix="en_GB.aff,en_GB2.aff", it doesn't work. Do I need to
add the second affix? Or will it work correctly if I just add the
second dictionary?
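
For reference, this is the kind of filter configuration I am trying (just a
sketch; from what I can tell the factory accepts a comma-separated list for
dictionary but only a single affix file, please correct me if I'm wrong):

   <filter class="solr.HunspellStemFilterFactory"
           dictionary="en_GB.dic,en_GB2.dic"
           affix="en_GB.aff"
           ignoreCase="true"/>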

I'm using Solr 6.0.1 and ZooKeeper 3.4.8, which are currently the latest
versions available.

Regards,
Edwin


Re: language configuration in update extract request handler

2016-06-06 Thread Reth RM
This question should be posted on the Tika mailing list. It is not related
to indexing or search, but to parsing the content of images.

On Sun, Jun 5, 2016 at 10:20 PM, SIDDHAST® Roshan 
wrote:

> Hi All,
>
> We are using the application for indexing and searching text using
> Solr. We referred to the guide posted at
>
> http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/
>
> Problem: we want to index Hindi images. We want to know how to set the
> configuration parameters of tesseract via tika or external params.
>
> --
> Roshan Agarwal
> Siddhast®
> 907 chandra vihar colony
> Jhansi-284002
> M:+917376314900
>


solr5.4.1 : data import handler for index rich data

2016-06-06 Thread kostali hassan
I am looking to add a new field that extracts its value from the field text:



for example a "links" field to extract all links from the text field of
each file.
I defined in tika.config.xml a regex for the link expression, but when the
indexing process finishes I get just one value, even though in schema.xml I
defined the links field as multiValued="true". And I notice the
update/extract handler gets all the links automatically (multi-valued).
What do I have to do to get all links present in each file with the data
import handler?


Re: Multilingual Solr

2016-06-06 Thread Alexandre Rafalovitch
There is a language auto-detect UpdateRequestProcessor to route
indexed content to differently suffixed fields. You have Google's
algorithm: 
http://www.solr-start.com/info/update-request-processors/#LangDetectLanguageIdentifierUpdateProcessorFactory
or a Tika one: 
http://www.solr-start.com/info/update-request-processors/#TikaLanguageIdentifierUpdateProcessorFactory

To map during retrieval, you could use aliases, like I did in my book
example some years ago:
https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml#L20
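
The core of the trick is field aliasing in the fl parameter, e.g. (assuming
the English variant is the one wanted):

   fl=id,title:title_en

so the client always sees a field called "title", whichever
language-suffixed field it actually came from.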

Does this cover your needs?

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 6 June 2016 at 06:57, Riedl, Johannes
 wrote:
> Hi all,
>
> we are currently in search of a solution for switching between different 
> languages in the query results and keeping the possibility to perform a 
> search in several languages in parallel.  The overall aim would be a constant 
> field name and a an additional Solr parameter "lang=XX_YY" that allows to 
> return the results in the chosen language while searches are applied to all 
> languages. Setting up several cores to obtain a generic field name is not an 
> option. Does anyone know of a clean way to achieve this, particularly routing 
> content indexed to a generic field (e.g. title) to a "background field" (e.g. 
> title_en, title_fr) etc on the fly and retrieving it from there depending on 
> the language chosen.
>
> Background: So far, we have investigated the multi-language field approach 
> offered by Trey Grainger in the code examples for "Solr in Action" 
> (https://github.com/treygrainger/solr-in-action.git, chapter 14), an 
> extension to the ordinary textField that allows to use a generic field name 
> and the language is encoded at the beginning of the field content and 
> appropriate index and query analyzers associated to dummy fields in 
> schema.xml. If there is a way to store data in these dummy fields and 
> additionally the lang parameter is added we might be done.
>
> Thanks a lot, best regards
>
> Johannes


clustering in solr(carrot2)

2016-06-06 Thread Mugeesh Husain
Hello everyone, 

For clustering I tried some tests using the official documentation:
https://cwiki.apache.org/confluence/display/solr/Result+Clustering.

I am getting the result below:

<arr name="clusters">
  <lst>
    <arr name="labels">
      <str>DDR</str>
    </arr>
    <double name="score">3.9599865057283354</double>
    <arr name="docs">
      <str>TWINX2048-3200PRO</str>
      <str>VS1GB400C3</str>
      <str>VDBDB1A16</str>
    </arr>
  </lst>
  <lst>
    <arr name="labels">
      <str>iPod</str>
    </arr>
    <double name="score">11.959228467119022</double>
    <arr name="docs">
      <str>F8V7067-APL-KIT</str>
      <str>IW-02</str>
      <str>MA147LL/A</str>
    </arr>
  </lst>
</arr>

Could anyone tell me what the label tag (DDR) is and where it is coming from?

I am stuck on the label tag and need to understand what the label means.

Please explain it or give me a suitable link.

Thanks
mugeesh




--
View this message in context: 
http://lucene.472066.n3.nabble.com/clustering-in-solr-carrot2-tp4280817.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: find stores with sales of > $x in last 2 months ?

2016-06-06 Thread Allison, Timothy B.
Thank you, Alex.

> Sorry, your question a bit confusing.
Yes. Sorry.

> Also, is this last month as in 'January' (rolling monthly) or as in 'last 30 
> days'
(rolling daily).

Ideally, the latter, if this is possible to calculate dynamically in response
to a query. My backoff method (if the 'rolling daily' method isn't possible)
would be to index monthly stats and then just use the range query as you
suggested.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Sunday, June 5, 2016 12:52 AM
To: solr-user 
Subject: Re: find stores with sales of > $x in last 2 months ?

Are you asking for just a numerical comparison during search, or about a way
to aggregate numbers from multiple records? Also, is this last month as in
'January' (rolling monthly) or as in 'last 30 days' (rolling daily)? Sorry,
your question is a bit confusing.

Numerical comparison is just a range (numField:[x TO *]), as per

https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-RangeSearches

https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser
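
For instance, a sketch of both flavours (the field names are made up for
illustration): a range filter over a pre-aggregated stats field, and a
date-math range over individually dated sales records for a rolling 30-day
window:

   fq=sales_last_2_months:[50000 TO *]
   fq=sale_date:[NOW/DAY-30DAYS TO NOW]

Note that the second form only filters documents by date; summing sales per
store on top of it would still need an aggregation (e.g. the StatsComponent
or the JSON Facet API).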

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 3 June 2016 at 23:23, Allison, Timothy B.  wrote:
> All,
>   This is a toy example, but is there a way to search for, say, stores with 
> sales of > $x in the last 2 months with Solr?
>   $x and the time frame are selected by the user at query time.
>
> If the queries could be constrained (this is still tbd), I could see updating 
> "stats" fields within each store document on a daily basis 
> (sales_last_1_month, sales_last_2_months, sales_last_3_months...etc).  The 
> dataset is fairly small and daily updates of this nature would not be 
> prohibitive.
>
> Or, is this trying to use a screwdriver where a hammer is required?
>
> Thank you.
>
> Best,
>
>  Tim


Re: Help needed on Solr Streaming Expressions

2016-06-06 Thread Joel Bernstein
Hi,

To eliminate any issues that might be happening due to curl, try running
the command from your browser.

http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost="
127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest",
sort="document_id asc",qt="/export")



I think most browsers will url encode the expression automatically, but you
can also url encode using an online tool. Also, you can remove the zkHost
param and it should default to the zkHost your Solr is connected to.


If you still get an error take a look at the logs and post the full stack
trace to this thread, which will help determine where the problem is.



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu  wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was
> experimenting with ‘Streaming Expression’ feature by following steps from
> this link:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
> but cannot get it to work, attached is my solrconfig.xml and schema.xml,
> note I do have ‘export’ handler defined in my ‘solrconfig.xml’ and enabled
> all fields as ‘docvalues’ in ‘schema.xml’; I am using solr cloud and
> external zookeeper (also installed on my PC), here is the command to start
> this 2-node Solr cloud instance and to create the collection ‘document3’:
>
>
>
> -- start 2-node solr cloud instances:
>
> solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
>
> solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4
>
>
>
> -- create the collection:
>
> solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2
>
>
>
>   after creating the collection I loaded a few documents using
> ‘csv’ format and I was able to query it using ‘curl’ command from my PC:
>
>
>
> -- this works on my PC:
>
> curl
> http://localhost:8988/solr/document3/select?q=*:*&sort=document_id+desc,sender_msg_dest+desc&fl=document_id,sender_msg_dest,recip_msg_dest
>
>
>
>   but when trying Streaming ‘search’ using curl, it does not
> work, I tried with 3 different options: with zkHost, using ‘export’, or
> using ‘select’, all getting the same error:
>
>
> curl: (6) Could not resolve host: sort=document_id asc,qt=
>
> {"result-set":{"docs":[
>
> {"EXCEPTION":null,"EOF":true}]}}
>
> -- different curl commands tried, all getting the same error above:
>
> curl --data-urlencode 
> 'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
> sender_msg_dest", sort="document_id asc",qt="/export")' "
> http://localhost:8988/solr/document2/stream";
>
>
>
> curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id,
> sender_msg_dest", sort="document_id asc",qt="/export")' "
> http://localhost:8988/solr/document2/stream";
>
>
>
> curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id,
> sender_msg_dest", sort="document_id asc",qt="/select",rows=10)' "
> http://localhost:8988/solr/document2/stream";
>
>
>
>   what am I doing wrong? Thanks for any help!
>
>
>
> Regards,
>
> Hui Liu
>


Re: Not (!) operator

2016-06-06 Thread Anil
This is a good idea.
Thanks Alex.
On May 28, 2016 12:59 AM, "Alexandre Rafalovitch" 
wrote:

> If you are worried about performance, bake the present/absent as a
> signal in a separate field during the document processing as a special
> UpdateRequestProcessor sequence.
>
> Regards,
> Alex.
>
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 27 May 2016 at 17:13, Anil  wrote:
> > Hi Shawn,
> >
> > Thanks for the reply. I am also worried about the performance.
> > I will check if there is another way to design the documents in case of a
> > parent and child relationship.
> >
> > Regards,
> > Anil
> >
> > On 27 May 2016 at 12:39, Shawn Heisey  wrote:
> >
> >> On 5/26/2016 11:13 PM, Anil wrote:
> >> > We have a status text field in our solr document and it is optional.
> >> > The search query status:!Closed returns documents with no status as
> >> > well. How to get only documents that have a status and it is !Closed? One
> >> > way is status:* AND status:!Closed. Any other way? Thanks
> >>
> >> If you use status:* then you are doing a wildcard query.  If the status
> >> field has a large number of unique values, this will be VERY slow.
> >> Avoid wildcard queries unless they are the only way to accomplish what
> >> you need.
> >>
> >> If the status field has more than a few possible values, the most
> >> compact way to do this query efficiently would be:
> >>
> >> status:[* TO *] -status:Closed
> >>
> >> This could be written as:
> >>
> >> status:[* TO *] AND NOT status:Closed
> >>
> >> See this article about why this may not be the best way to write
> queries:
> >>
> >> https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/
> >>
> >> The [] syntax is a range query.  By starting and ending the range with
> >> the * character, it means "all documents where status has a value".
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Multilingual Solr

2016-06-06 Thread Johannes Riedl

Hi Alessandro, hi Alexandre,

Thanks a lot for your reply and your considerations and hints. We use a 
web front end that comes bundled with Solr. It currently uses a single 
core approach. We would like to stick to the original setup as closely 
as possible to avoid administrative overhead and to not prevent the 
possible use of several cores in a different context in the future. This 
is the reason why we would like to hide the language fields completely 
from the front end apart from specifying an additional language 
parameter. Language detection on indexing is currently not an issue for 
us, as we get the input in a standardized format and thus can determine 
the language beforehand.


https://github.com/treygrainger/solr-in-action/blob/master/example-docs/ch14/cores/multi-language-field/conf/schema.xml 
shows an example of how the multiText field type makes use of language 
specific field types to specify the analyzers that are being used. The 
core issue for us (pun intended ;-)) is to find out whether it is 
possible to extend this approach to only return the selected 
language(s), i.e. to transparently add something like nested documents.


Best regards

Johannes


On 06.06.2016 10:10, Alessandro Benedetti wrote:

Hi Johannes,
nothing out of the box unfortunately, but it could be a nice idea and
contribution.
If having a multi-core setup is not an option (out of curiosity, can I
ask why?)
you could proceed in this way:

1) you define in the schema N field variations per field you are interested
in.
N is the number of languages you can support.
Given for example the text field, you define:
text field not indexed, only stored
text_en indexed
text_fr indexed
text_it indexed ...

2) At indexing time you can develop a custom updateRequestProcessor that
will identify the language (Solr internal libraries offer support for
that) and route the content to the correct text field.
If you also want to index translations, you need to rely on some third
party library to do that.

3) At query time you can address all the fields you want in parallel, with
the edismax query parser for example.

4) For rendering the results, it's not entirely clear to me what you want.
Do you want to:

a) translate the document content into the language you want? You could
develop a custom DocTransformer that takes the language in input and
translates, but I don't see that much benefit in that.

b) return only the documents that originally were of that language? This
case is easy: you add an fq at query time to filter only the documents of
the language you want (at indexing time you identify the language).

c) return the original content of the document? This is quite easy. You can
store the generic "text" field, and always return that.

Let us know for further discussion,

Cheers

On Sun, Jun 5, 2016 at 9:57 PM, Riedl, Johannes <
johannes.ri...@uni-tuebingen.de> wrote:


Hi all,

we are currently in search of a solution for switching between different
languages in the query results and keeping the possibility to perform a
search in several languages in parallel.  The overall aim would be a
constant field name and an additional Solr parameter "lang=XX_YY" that
allows to return the results in the chosen language while searches are
applied to all languages. Setting up several cores to obtain a generic
field name is not an option. Does anyone know of a clean way to achieve
this, particularly routing content indexed to a generic field (e.g. title)
to a "background field" (e.g. title_en, title_fr) etc on the fly and
retrieving it from there depending on the language chosen.

Background: So far, we have investigated the multi-language field approach
offered by Trey Grainger in the code examples for "Solr in Action" (
https://github.com/treygrainger/solr-in-action.git, chapter 14), an
extension to the ordinary textField that allows to use a generic field name
and the language is encoded at the beginning of the field content and
appropriate index and query analyzers associated to dummy fields in
schema.xml. If there is a way to store data in these dummy fields and
additionally the lang parameter is added we might be done.

Thanks a lot, best regards

Johannes








Re: Solr /export and dates (Solr 5.5.1)

2016-06-06 Thread Erick Erickson
Sorry, it dropped off my radar somehow. Just opened SOLR-9187. I have
a patch that I'm testing now, we'll see how that goes.

On Wed, Jun 1, 2016 at 7:54 PM, Ronald Wood  wrote:
>
> Thanks! I'm glad to find out I'm not going crazy.
>
> I'll keep a lookout for that enhancement.
>
> Ronald S. Wood
>
> Immediate customer support:
> Call 1-866-762-7741 (x2) or email supp...@smarsh.com
>
> On Jun 1, 2016, at 21:45, Joel Bernstein <joels...@gmail.com> wrote:
>
> The documentation is wrong for sure. We need a new example query.
>
> I was just discussing the date issue with Erick Erickson the other day. I
> believe he is working on adding dates to the export handler but I didn't
> see a jira ticket for this yet. We'll also need to add dates to the /export
> handler for date support in the Parallel SQL interface.
>
> Erick, if you're reading this, let us know if this is in the works.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jun 1, 2016 at 8:15 PM, Ronald Wood <rw...@smarsh.com> wrote:
>
> I have spent a bit of time with the export handler in 5.5.1 (since we are
> unable to upgrade directly from 4 to 6). The speed looks impressive at
> first glance compared to paging with cursors.
>
> However, I am deeply confused that it does not seem to be possible to
> either sort on or get date values when doing an export.
>
> I say deeply confused, because the example in the Reference Guide is this:
>
>
> http://localhost:8983/solr/core_name/export?q=my-query&sort=severity+desc,timestamp+desc&fl=severity,timestamp,msg
>
> Now, I suppose you could argue that timestamp's schema type isn't shown,
> so maybe it's an epochal integer value.
>
> Certainly when I try to get our date field (defined as TrieDateField in
> our schema) I get this error:
>
> java.io.IOException: Export fields must either be one of the following
> types: int,float,long,double,string
>  at
> org.apache.solr.response.SortingResponseWriter.getFieldWriters(SortingResponseWriter.java:277)
>  at
> org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:120)
>  at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:52)
>  ...
>
> And I see date is not a type there. However, int, float, long and double
> are also Trie types, so I'm not sure why a TrieDateField could not also be
> sorted or exported.
>
> I wonder if someone could elucidate this. I would have thought getting
> dates out of an export or stream would be highly desirable. I am definitely
> open to the high likelihood I am doing something wrong.
>
> I apologize if this topic has been covered before, as I was unable to find
> a way to search the mailing list on the Apache mail archives site. I wonder
> if there's some search engine out there that could do that kind of thing?
>
> Ronald S. Wood | Senior Software Developer
> 857-991-7681 (mobile)
>
> Smarsh
> 100 Franklin St. Suite 903 | Boston, MA 02210
> 1-866-SMARSH-1 | 971-998-9967 (fax)
> www.smarsh.com
>
> Immediate customer support:
> Call 1-866-762-7741 (x2) or visit www.smarsh.com/support
>


NRT updates

2016-06-06 Thread Chris Vizisa
Hi,

Does the number of fields in a document affect NRT updates?
I have around 1.6 million products. Each product can be available in about
3000 stores.
In addition to around 50 fields related to a product, I am storing
product_store info in each product document, like:
 1. Quantity of that product in each store (store_n1_count,
store_n2_count, ..., store_n3000_count)
 2. Status of that product in each store (store_n1_status,
store_n2_status, ..., store_n3000_status)

I would need to do NRT updates on the count and status of each product, and
there are around 1.6 million products.

Q1. Is it okay to do NRT updates on this product collection (for each
product's store_count and store_status) with around 900 updates per second
across the different products? (Please note that each product's status as
well as count gets updated, and there are 1.6M products.)
Q2. Is it okay to use atomic updates for the NRT updates of the multiple
store_counts and store_statuses of each product, across around 1.6 million
products in total? Or is there any other optimal way to handle this amount
of dynamic data change? For atomic updates I understand all fields need to
be stored. (See the sketch after these questions.)
Q3. So basically, can I have all this info in the product collection itself,
or should I store the store_status info separately, with productId joining
them, for the NRT scenario to work best? In that case each product_store
info is a separate document, with only 3 or 4 fields but many millions of
documents (worst case 1.6M products multiplied by 3000 stores).
Q4. When we embed all store related info in the product doc itself, a
single product doc can be a candidate for simultaneous updates, as its
count or status can change in different stores at the same time. If we go
for a separate collection depicting product_status info, only one doc is
updated at a time, mostly.
Which is more efficient and optimized?
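
For reference, this is the kind of atomic update I have in mind, one product
at a time (the id and values are made up):

   [{"id": "product123",
     "store_n1_count":  {"set": 42},
     "store_n1_status": {"set": "available"}}]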


Could someone please suggest what is optimal. Any pointers welcome.

Thanks!
Chris.


Re: solr 5.4.1

2016-06-06 Thread Erick Erickson
It's unclear what you're asking. You want your own
schema file? Or your own configuration for parsing
your documents?

Have you read through the reference guide
section here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

and if so, what parts are you having trouble with?

Best,
Erick

On Thu, Jun 2, 2016 at 8:53 AM, Adnane Falh  wrote:
> Hi, I would like to create a new field structure (tika-config.xml) for my
> indexing files using tika (ExtractingRequestHandler), and I just want a
> working example to follow so that I can create my file. Thank you.


Re: Zookeeper hanging after a commit

2016-06-06 Thread Erick Erickson
Zookeeper hanging? If it was truly unresponsive I would
think your entire SolrCloud would be down. I guess you
could test this by, say, creating a new collection and
seeing if it goes live, if Zookeeper is truly unresponsive
that would fail.

Are you sure it's not just the merging that's going
on as part of MRIT?

Best,
Erick

On Thu, Jun 2, 2016 at 11:37 AM, Jordan Drake  wrote:
> Hi all,
>
> We are in the process of streamlining our indexing process and trying to
> increase some performance. We came across an issue where zookeeper seems to
> hang for 10+ minutes (we've seen it as high as 40 min) after committing.
> See the portion of the logs below.
>
> Our indexing is being done using the MapReduceIndexerTool with the go-live
> option to merge into our live Solr.
> The creation of the segments in mapreduce is fairly quick, and the merge is
> usually fast. It's just that we occasionally see this issue in one of our
> environments.
>
> I'm not sure whether this is a Zookeeper or Solr issue or if this is just
> expected behavior. Any ideas on where to look for debugging?
>
>
>
> 16/06/02 09:03:06 INFO hadoop.MapReduceIndexerTool: Indexing 1 files
> using 1 real mappers into 1 reducers
> 16/06/02 09:04:08 INFO hadoop.MapReduceIndexerTool: Done. Indexing 1
> files using 1 real mappers into 1 reducers took 2.06103613E10 secs
> 16/06/02 09:04:08 INFO hadoop.GoLive: Live merging of output shards
> into Solr cluster...
> 16/06/02 09:04:08 INFO hadoop.GoLive: Live merge
> hdfs://192.168.5.228:8020/indexed/tmp/e2e/2223/results/part-0 into
> http://192.168.5.227:8983/solr
> 16/06/02 09:04:22 INFO hadoop.GoLive: Committing live merge...
> 16/06/02 09:04:22 INFO zookeeper.ZooKeeper: Initiating client
> connection, connectString=192.168.5.227:9983 sessionTimeout=1
> watcher=org.apache.solr.common.cloud.ConnectionManager@1deca477
> 16/06/02 09:04:22 INFO cloud.ConnectionManager: Waiting for client to
> connect to ZooKeeper
> 16/06/02 09:04:22 INFO zookeeper.ClientCnxn: Opening socket connection
> to server 192.168.5.227/192.168.5.227:9983. Will not attempt to
> authenticate using SASL (unknown error)
> 16/06/02 09:04:22 INFO zookeeper.ClientCnxn: Socket connection
> established to 192.168.5.227/192.168.5.227:9983, initiating session
> 16/06/02 09:04:22 INFO zookeeper.ClientCnxn: Session establishment
> complete on server 192.168.5.227/192.168.5.227:9983, sessionid =
> 0x154e9ea749c028f, negotiated timeout = 1
> 16/06/02 09:04:22 INFO cloud.ConnectionManager: Watcher
> org.apache.solr.common.cloud.ConnectionManager@1deca477
> name:ZooKeeperConnection Watcher:192.168.5.227:9983 got event
> WatchedEvent state:SyncConnected type:None path:null path:null
> type:None
> 16/06/02 09:04:22 INFO cloud.ConnectionManager: Client is connected to
> ZooKeeper
> *16/06/02 09:04:22 INFO cloud.ZkStateReader: Updating cluster
> state from ZooKeeper...
> 16/06/02 09:18:17 INFO zookeeper.ZooKeeper: Session: 0x154e9ea749c028f closed*
> 16/06/02 09:18:17 INFO zookeeper.ClientCnxn: EventThread shut down
> 16/06/02 09:18:17 INFO hadoop.GoLive: Done committing live merge
> 16/06/02 09:18:17 INFO hadoop.GoLive: Live merging of index shards
> into Solr cluster took 2.83196359E11 secs
> 16/06/02 09:18:17 INFO hadoop.GoLive: Live merging completed successfully
> 16/06/02 09:18:17 INFO hadoop.MapReduceIndexerTool: Succeeded with
> job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper,
> jobId: job_1464681461364_0604
> 16/06/02 09:18:17 INFO hadoop.MapReduceIndexerTool: Success. Done.
> Program took 3.04902275E11 secs. Goodbye.
>
>
>
> Thanks,
> Jordan Drake


Re: SolrCloud 5.2.1 nodes are out of sync - how to handle

2016-06-06 Thread Erick Erickson
Sure, the routing doesn't matter to the ADDREPLICA
command, you give it a shard ID.
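
For reference, sketches of the two Collections API calls (collection, shard,
node and replica names are placeholders):

   /admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host2:8983_solr
   /admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node3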

I'm more worried about how the nodes got out of
sync in the first place. Are _both_ Solr nodes on a
particular machine out of sync? And what is the evidence
that they are?

You can issue something like
'.../solr/coll_shard1_replica1?q=*:*&distrib=false'
 against each _core_ and see if the counts are
the same just to check.

But in the normal course of events, this should be
all automatic. So what do you think caused the replicas
to get out of synch in the first place? And what's the symptom?

Best,
Erick

On Thu, Jun 2, 2016 at 10:46 PM, Ilan Schwarts  wrote:
> In my question I confused you: there are 2 shards and 2 nodes on each
> shard, one leader and one not. When the collection was created, the number
> of shards was 2 and the replication factor was 2.
> Now the status is that shard 1 has 2 out-of-sync nodes, so they need to be
> merged/synced. Do you still suggest the same? Add a replica to the damaged
> shard and then delete it? If the collection was created with composite
> routing, is it possible?
> On Jun 3, 2016 4:18 AM, "Erick Erickson"  wrote:
>
>> A pedantic nit... leader/replica is not much like
>> "old master/slave".
>>
>> That out of the way, here's what I'd do.
>> 1> use the ADDREPLICA to add a new replica for the shard
>> _on the same node as the bad one_.
>> 2> Once that had recoverd (green in the admin UI) and you
>>  were confident of
>>its integrity (you can verify by running queries against this
>>   new replica and the leader with &distrib=false), use
>>DELETEREPLICA on the "bad" core.
>>
>> Best,
>> Erick
>>
>> On Wed, Jun 1, 2016 at 5:54 AM, Ilan Schwarts  wrote:
>> > Hi,
>> > We have in lab SolrCloud 5.2.1
>> > 2 Shards, each shard has 2 cores/nodes, replication factor is 1. meaning
>> > that one node is leader (like old master-slave).
>> > (upon collection creation numShards=1 rp=1)
>> >
>> > Now there is a problem in the lab, shard 1 has 2 cores, but the number of
>> > docs is different, and when adding a document to one of the cores, it
>> will
>> > not replicate the data to the other one.
>> > If i check cluster state.json it appears fine, it writes there are 2
>> active
>> > cores and only 1 is set as leader.
>> >
>> > What is the recovery method for a scenario like this ? I dont have logs
>> > anymore and cannot reproduce.
>> > Is it possible to merge the 2 cores into 1, and then split that core to 2
>> > cores ?
>> > Or maybe to enforce sync if possible ?
>> >
>> > The other shard, Shard 2 is functioning well, the replication works fine,
>> > when adding a document to 1 core, it will replicate it to the other.
>> >
>> > --
>> >
>> >
>> > -
>> > Ilan Schwarts
>>


Re: Index time Dates format when time is not needed

2016-06-06 Thread Erick Erickson
That padding is just fine, you're effectively indexing
everything exactly at midnight.

Best,
Erick

On Sun, Jun 5, 2016 at 12:48 PM, Steven White  wrote:
> Hi everyone,
>
> I'm using the "solr.DateRangeField" data type to index my dates data and based
> on [1] the format of the dates data is "YYYY-MM-DDThh:mm:ssZ".
>
> In my case, I have no need to search on time, just dates.  I started by
> indexing my dates data as "2016-06-01" but Solr threw an exception.  I then
> changed my code to index the dates data as: "2016-06-01T00:00:00Z" and now
> it works.
>
> I have tested this new format and all is well so far, however I'm not sure
> if the way I have done the padding is valid.  So, my question to the Solr
> community is this: Is the format that I'm using correct (padding with "00")
> or is there some other format I should have used that is better and more
> optimal for my use case?
>
> Thanks in advance.
>
> Steve
>
> [1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates


Re: Getting a list of matching terms and offsets

2016-06-06 Thread Justin Lee
Thank you very much!  That JIRA entry led me to
https://issues.apache.org/jira/browse/SOLR-4722, which still works against
Solr 6 with a couple of modifications and should serve as the basis for
what I want to do.  You saved me a bunch of work, so thanks very much.
 (Also, it is always nice to know that people with more experience than me
took the same approach.)

On Sun, Jun 5, 2016 at 1:09 PM Ahmet Arslan 
wrote:

> Hi Lee,
>
> May be you can find useful starting point on
> https://issues.apache.org/jira/browse/SOLR-1397
>
> Please consider to contribute when you gather something working.
>
> Ahmet
>
>
>
>
> On Sunday, June 5, 2016 10:37 PM, Justin Lee 
> wrote:
> Thanks, yea, I looked at debug query too.  Unfortunately the output of
> debug query doesn't quite do it.  For example, if you use a wildcard query,
> it will simply explain the score associated with that wildcard query, not
> the actual matching token.  In other words, if you search for "hour*" and
> the actual matching text is "hours", debug query doesn't tell you that.
> Instead, it just reports the score associated with "hour*".
>
> The closest example I've ever found is this:
>
>
> https://lucidworks.com/blog/2013/05/09/update-accessing-words-around-a-positional-match-in-lucene-4/
>
> But this kind of approach won't let me use the full power of the Solr
> ecosystem.  I'd basically be back to dealing with Lucene directly, which I
> think is a step backwards.  I think the right approach is to write my own
> SearchComponent, using the highlighter as a starting point.  But I wanted
> to make sure there wasn't a simpler way.
>
>
> On Sun, Jun 5, 2016 at 11:30 AM Ahmet Arslan 
> wrote:
>
> > Well debug query has the list of token that caused match.
> > If i am not mistaken i read an example about span query and spans thing.
> > It was listing the positions of the matches.
> > Cannot find the example at the moment..
> >
> > Ahmet
> >
> >
> >
> > On Sunday, June 5, 2016 9:10 PM, Justin Lee 
> > wrote:
> > Thanks for the responses Alex and Ahmet.
> >
> > The TermVector component was the first thing I looked at, but what it
> gives
> > you is offset information for every token in the document.  I'm trying to
> > get a list of tokens that actually match the search query, and unless I'm
> > missing something, the TermVector component doesn't give you that
> > information.
> >
> > The TermSpans class does contain the right information, but again the
> hard
> > part is: how do I reliably get a list of TokenSpans for the tokens that
> > actually match the search query?  That's why I ended up in the
> highlighter
> > source code, because the highlighter has to do just this in order to
> create
> > snippets with accurate highlighting.
> >
> > Justin
> >
> >
> > On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan 
> > wrote:
> >
> > > Hi,
> > >
> > > May be org.apache.lucene.search.spans.TermSpans ?
> > >
> > >
> > >
> > > On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch <
> > arafa...@gmail.com>
> > > wrote:
> > > It sounds like TermVector component's output:
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
> > >
> > > Perhaps with additional flags enabled (e.g. tv.offsets and/or
> > > tv.positions).
> > >
> > > Regards,
> > >Alex.
> > > 
> > > Newsletter and resources for Solr beginners and intermediates:
> > > http://www.solr-start.com/
> > >
> > >
> > >
> > > On 5 June 2016 at 07:39, Justin Lee  wrote:
> > > > Is anyone aware of a way of getting a list of each matching token and
> > > their
> > > > offsets after executing a search?  The reason I want to do this is
> > > because
> > > > I have the physical coordinates of each token in the original
> document
> > > > stored out of band, and I want to be able to highlight in the
> original
> > > > document.  I would really like to have Solr return the list of
> matching
> > > > tokens because then things like stemming and phrase matching will
> work
> > as
> > > > expected. I'm thinking of something like the highlighter component,
> > > except
> > > > instead of returning html, it would return just the matching tokens
> and
> > > > their offsets.
> > > >
> > > > I have googled high and low and can't seem to find an exact answer to
> > > this
> > > > question, so I have spent the last few days examining the internals
> of
> > > the
> > > > various highlighting classes in Solr and Lucene.  I think the bulk of
> > the
> > > > action is in WeightedSpanTermExtractor and its interaction with
> > > > getBestTextFragments in the Highlighter class.  But before I spend
> > > anymore
> > > > time on this I thought I'd ask (1) whether anyone knows of an easier
> > way
> > > of
> > > > doing this, and (2) whether I'm at least barking up the right tree.
> > > >
> > > > Thanks much,
> > > > Justin
> > >
> >
>


Re: Can a DocTransformer access the whole results tree?

2016-06-06 Thread Upayavira
:-)

On Sat, 4 Jun 2016, at 06:50 PM, Mikhail Khludnev wrote:
> I'm sorry for thinking sooo slow.
> 
> On Sat, Jun 4, 2016 at 7:19 PM, Upayavira  wrote:
> 
> > Ahhh, seen it now in your SubQueryAugmenterFactory, via the threadLocal.
> > Somewhat scary code, but I think I can work with it!
> >
> > Thanks!
> >
> > Upayavira
> >
> > On Sat, 4 Jun 2016, at 10:30 AM, Mikhail Khludnev wrote:
> > > Had you check
> > >
> > https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/request/SolrRequestInfo.html#getRsp--
> > > ?
> > > 27 мая 2016 г. 16:55 пользователь "Upayavira"  написал:
> > >
> > > > In a JSON response, we get this:
> > > >
> > > > {
> > > >   "responseHeader": {...},
> > > >   "response": { "docs": [...] },
> > > >   "highlighting": {...}
> > > >   ...
> > > > }
> > > >
> > > > I'm assuming that the getProcessedDocuments call would give me the
> > docs:
> > > > {} element, whereas I'm after the whole response so I can retrieve the
> > > > "highlighting" element.
> > > >
> > > > Make sense?
> > > >
> > > > On Fri, 27 May 2016, at 02:45 PM, Mikhail Khludnev wrote:
> > > > > Upayavira,
> > > > >
> > > > > It's not clear what do you mean in "results themselves", perhaps you
> > mean
> > > > > SolrDocuments ?
> > > > >
> > > > > public abstract class ResultContext {
> > > > >  ..
> > > > >   public Iterator getProcessedDocuments() {
> > > > > return new DocsStreamer(this);
> > > > >   }
> > > > >
> > > > > On Fri, May 27, 2016 at 4:15 PM, Upayavira  wrote:
> > > > >
> > > > > > Yes, I've seen that. I can see the getDocList() method will
> > presumably
> > > > > > give me the results themselves, but I need the full response so I
> > can
> > > > > > get the highlighting details, but I can't see them anywhere.
> > > > > >
> > > > > > On Thu, 26 May 2016, at 09:39 PM, Mikhail Khludnev wrote:
> > > > > > > public abstract class ResultContext {
> > > > > > >
> > > > > > >  /// here are all results
> > > > > > >   public abstract DocList getDocList();
> > > > > > >
> > > > > > >   public abstract ReturnFields getReturnFields();
> > > > > > >
> > > > > > >   public abstract SolrIndexSearcher getSearcher();
> > > > > > >
> > > > > > >   public abstract Query getQuery();
> > > > > > >
> > > > > > >   public abstract SolrQueryRequest getRequest();
> > > > > > >
> > > > > > > On Thu, May 26, 2016 at 11:25 PM, Upayavira 
> > wrote:
> > > > > > >
> > > > > > > > Hi Mikhail,
> > > > > > > >
> > > > > > > > Is there really? If I look at ResultContext, I see it is an
> > > > abstract
> > > > > > > > class, completed by BasicResultContext. I don't see any context
> > > > method
> > > > > > > > there. I can see a getContext() on SolrQueryRequest which just
> > > > returns
> > > > > > a
> > > > > > > > hashmap. Will I find the response in there? Is that what you
> > are
> > > > > > > > suggesting?
> > > > > > > >
> > > > > > > > Upayavira
> > > > > > > >
> > > > > > > > On Thu, 26 May 2016, at 06:28 PM, Mikhail Khludnev wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > There is a protected ResultContext field named context.
> > > > > > > > >
> > > > > > > > > On Thu, May 26, 2016 at 5:31 PM, Upayavira 
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Looking at the code for a sample DocTransformer, it seems
> > that
> > > > a
> > > > > > > > > > DocTransformer only has access to the document itself, not
> > to
> > > > the
> > > > > > whole
> > > > > > > > > > results. Because of this, it isn't possible to use a
> > > > > > DocTransformer to
> > > > > > > > > > merge, for example, the highlighting results into the main
> > > > > > document.
> > > > > > > > > >
> > > > > > > > > > Am I missing something?
> > > > > > > > > >
> > > > > > > > > > Upayavira
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Sincerely yours
> > > > > > > > > Mikhail Khludnev
> > > > > > > > > Principal Engineer,
> > > > > > > > > Grid Dynamics
> > > > > > > > >
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sincerely yours
> > > > > > > Mikhail Khludnev
> > > > > > > Principal Engineer,
> > > > > > > Grid Dynamics
> > > > > > >
> > > > > > > 
> > > > > > > 
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours
> > > > > Mikhail Khludnev
> > > > > Principal Engineer,
> > > > > Grid Dynamics
> > > > >
> > > > > 
> > > > > 
> > > >
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 


Solr 6 fail to index images

2016-06-06 Thread Jeferson dos Anjos
I'm trying to index images on SOLR, but I get the following error:

ERROR: [doc=5b36cb2b78072e41] Error adding field
'media_black_point'='(0.012054443, 0.012496948, 0.010314941)' msg=For input
string: "(0.012054443"

It looks like it's a problem of field types, but these fields are extracted
automatically. Am I forgetting some additional configuration?

I appreciate all the help!

Thanks
Jeferson M. dos Anjos
CEO do Packdocs
ps.: Flexible spaces to contents with Packdocs (www.packdocs.com)


RE: Help needed on Solr Streaming Expressions

2016-06-06 Thread Hui Liu
Joel,

Thank you very much for your help. I tried the http command below with my
existing 2-shard collection 'document3' (sorry, I had a typo below: it should
be document3 instead of document2), and this time I got a much better error:

{"result-set":{"docs":[
{"EXCEPTION":"Unable to construct instance of 
org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}}

I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here 
in file 'solr_error.txt'.

However I continued and tried create another identical collection 'document5' 
with 2 shards and 2 replica using the same schema, this time the http URL 
worked!!! Maybe my previous collection 'document3' has some corruption? 

-- command to create collection 'document5':
solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2

-- command for stream expression:
http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

-- result from browser:
{"result-set":{"docs":[
{"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"},
{"EOF":true,"RESPONSE_TIME":10}]}}

Do you think I can try the same in http using other 'Stream Decorators' such as
'complement' and 'innerJoin'? For example, something like the sketch below.
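
For instance, a rough innerJoin along the lines of the reference guide
(document6 here is a second, hypothetical collection; as I understand it
both streams must be sorted on the join key):

innerJoin(
  search(document5, q="*:*", fl="document_id, sender_msg_dest", sort="document_id asc", qt="/export"),
  search(document6, q="*:*", fl="document_id, recip_msg_dest", sort="document_id asc", qt="/export"),
  on="document_id")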

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, June 06, 2016 9:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Help needed on Solr Streaming Expressions

Hi,

To eliminate any issues that might be happening due to curl, try running the 
command from your browser.

http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost="
127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id 
asc",qt="/export")



I think most browsers will url encode the expression automatically, but you can
also url encode using an online tool. Also, you can remove the zkHost param and
it should default to the zkHost your Solr is connected to.


If you still get an error take a look at the logs and post the full stack trace 
to this thread, which will help determine where the problem is.



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu  wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ feature by following steps 
> from this link:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> , but cannot get it to work, attached is my solrconfig.xml and 
> schema.xml, note I do have ‘export’ handler defined in my 
> ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in 
> ‘schema.xml’; I am using solr cloud and external zookeeper (also 
> installed on my PC), here is the command to start this 2-node Solr 
> cloud instance and to create the collection ‘document3’:
>
>
>
> -- start 2-node solr cloud instances:
>
> solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
>
> solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4
>
>
>
> -- create the collection:
>
> solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2
>
>
>
>   after creating the collection I loaded a few documents 
> using ‘csv’ format and I was able to query it using ‘curl’ command from my PC:
>
>
>
> -- this works on my PC:
>
> curl
> http://localhost:8988/solr/document3/select?q=*:*&sort=document_id+des
> c,sender_msg_dest+desc&fl=document_id,sender_msg_dest,recip_msg_dest
>
>
>
>   but when trying Streaming ‘search’ using curl, it does 
> not work, I tried with 3 different options: with zkHost, using 
> ‘export’, or using ‘select’, all getting the same error:
>
>
> curl: (6) Could not resolve host: sort=document_id asc,qt=
>
> {"result-set":{"docs":[
>
> {"EXCEPTION":null,"EOF":true}]}}
>
> -- different curl commands tried, all getting the same error above:
>
> curl --data-urlencode 
> 'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id
> , sender_msg_dest", sort="document_id asc",qt="/export")' "
> http://localhost:8988/solr/document2/stream";
>
>
>
> curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id,
> sender_msg_dest", sort="document_id asc",qt="/export")' "
> http://localhost:8988/solr/document2/stream";
>
>
>
> curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id,
> sender_msg_dest", sort="document_id asc",qt="/select",rows=10)' "
> http://localhost:8988/solr/document2/stream";
>
>
>
>   what am I doing wrong? Thanks for any help!
>
>
>
> Regards,
>
> Hui Liu
>
solr-8988-console.log
=
9503054 ERROR (qtp1514322932-111) [c:document3

RE: Help needed on Solr Streaming Expressions

2016-06-06 Thread Hui Liu
The only difference between document3 and document5 is that document3 had no data
in 'shard2'; after loading some data into shard2, the http command also worked:

http://localhost:8988/solr/document3/stream?expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

My guess is that the 'null pointer' error from the stack trace is caused by
no data in 'shard2'.

Regards,
Hui

-Original Message-
From: Hui Liu 
Sent: Monday, June 06, 2016 1:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Help needed on Solr Streaming Expressions

Joel,

Thank you very much for your help. I tried the http command below with my
existing 2-shard collection 'document3' (sorry, I had a typo below: it should
be document3 instead of document2), and this time I got a much better error:

{"result-set":{"docs":[
{"EXCEPTION":"Unable to construct instance of 
org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}}

I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here 
in file 'solr_error.txt'.

However, I continued and tried creating another identical collection 'document5'
with 2 shards and 2 replicas using the same schema, and this time the http URL
worked!!! Maybe my previous collection 'document3' has some corruption?

-- command to create collection 'document5':
solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2

-- command for stream expression:
http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

-- result from browser:
{"result-set":{"docs":[
{"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"},
{"EOF":true,"RESPONSE_TIME":10}]}}

Do you think I can try the same in http using other 'Stream Decorators' such as 
'complement' and 'innerJoin'?

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com]
Sent: Monday, June 06, 2016 9:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Help needed on Solr Streaming Expressions

Hi,

To eliminate any issues that might be happening due to curl, try running the 
command from your browser.

http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost="
127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id 
asc",qt="/export")



I think most browsers will url encode the expression automatically, but you can
also url encode using an online tool. Also, you can remove the zkHost param and
it should default to the zkHost your Solr is connected to.


If you still get an error take a look at the logs and post the full stack trace 
to this thread, which will help determine where the problem is.



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu  wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ feature by following steps 
> from this link:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> , but cannot get it to work, attached is my solrconfig.xml and 
> schema.xml, note I do have ‘export’ handler defined in my 
> ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in 
> ‘schema.xml’; I am using solr cloud and external zookeeper (also 
> installed on my PC), here is the command to start this 2-node Solr 
> cloud instance and to create the collection ‘document3’:
>
>
>
> -- start 2-node solr cloud instances:
>
> solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
>
> solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4
>
>
>
> -- create the collection:
>
> solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2
>
>
>
>   after creating the collection I loaded a few documents 
> using ‘csv’ format and I was able to query it using ‘curl’ command from my PC:
>
>
>
> -- this works on my PC:
>
> curl
> http://localhost:8988/solr/document3/select?q=*:*&sort=document_id+des
> c,sender_msg_dest+desc&fl=document_id,sender_msg_dest,recip_msg_dest
>
>
>
>   but when trying Streaming ‘search’ using curl, it does 
> not work, I tried with 3 different options: with zkHost, using 
> ‘export’, or using ‘select’, all getting the same error:
>
>
> curl: (6) Could not resolve host: sort=document_id asc,qt=
>
> {"result-set":{"docs":[
>
> {"EXCEPTION":null,"EOF":true}]}}
>
> -- different curl commands tried, all getting the same error above:
>
> curl --data-urlencode
> 'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id
> , sender_msg_dest", sort="document_id asc",qt="/export")' "
> http://localhost:8988/

Re: Solr 6 fail to index images

2016-06-06 Thread Shawn Heisey
On 6/6/2016 10:56 AM, Jeferson dos Anjos wrote:
> I'm trying to index images on SOLR, but I get the following error:
> ERROR: [doc=5b36cb2b78072e41] Error adding field
> 'media_black_point'='(0.012054443, 0.012496948, 0.010314941)' msg=For
> input string: "(0.012054443" It looks like it's a problem of field
> types, but these fields are extracted automatically. I'm forgetting
> some additional configuration?

Looks like you're probably running into this, which was marked "Won't Fix":

https://issues.apache.org/jira/browse/SOLR-8017

Thanks,
Shawn



Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread Joe Lawson
Mary Jo,

It appears to be working correctly, but you have a very complex query going
on so it can be confusing. Assuming you are using the queryParser as
provided in the examples, your query would look like "+sbc" when it enters the
queryParser and would look like "+((sbc)^2.0 (sb)^0.5 (small block)^0.5)"
when it comes out; then it would enter the normal pipeline and everything
would be processed as individual tokens.

It appears that you have synonyms being processed at query time on the
prodnumbertext field. For example, when (sbc)^2.0 enters the normal
query stage it then has all the qf, pf, ps and tie modifiers added, so the
first one turns into something like

"(body:sbc^0.5 | productinfo:sbc^1.0 | keywords:sbc^2.0 | prodname:sbc^10.0
| prodnumbertext:sbc^20.0)^2.0"

Then, with the query-time synonym expansion on prodnumbertext combined with a
phrase and the default mm being 100% (
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter)
you end up with the query being

(((prodnumbertext:sbc prodnumbertext:sb prodnumbertext:small)
prodnumbertext:block)~2)^20.0

The ~2 comes from mm=100% and having the phrase "small block" as a synonym.
This messes up your results, as anything in prodnumbertext will have
to match "sbc block", "sb block" or "small block", which of course is only
going to match "small block". Check out the section "Multi-word synonyms
won't work as phrase queries" in
https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ for
more info.

Advice: make sure in the schema that none of the fields you are running
queries against do any complex query operations; especially make sure they
aren't doing additional synonym resolution against the same file.

I think you are getting hit by the MM bug.  Try tuning it way down to
something like 0.01% and see how the matches go.
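
In the params style quoted below, that experiment would look roughly like
this (the value is a starting point to tune, not a recommendation):

params["mm"] = "1";  // loosen minimum-should-match so one matching clause is enough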



On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey  wrote:

> Okay so big thanks for the help with getting the hon_lucene_synonyms plugin
> working. That is a big load off to finally have a solution in place for all
> our multi-term synonyms. We did find that the information in Step 8 about
> the plugin showing "SynonymExpandingExtendedDismaxQParser" for QParser does
> not seem to be correct, we only ever get "ExtendedDismaxQParser" but the
> synonym expansion is definitely working.
>
> In implementing it though, the one thing I'm still having an issue with is
> trying to figure out how I can get results on the original term to appear
> first in our results and matches on the synonyms lower in the results. The
> plugin includes settings for an originalboost and synonymboost, but that
> doesn't seem to be working along with all the other edismax boosts I'm
> doing. We search across a number of fields, each with their own boost and
> then do phrase searches with boosts as well. My params look like this:
>
> params["defType"] = 'synonym_edismax';
> params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
> prodnumbertext^20.0';
> params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["ps"] = 1;
> params["tie"] = 0.1;
> params["synonyms"] = true;
> params["synonyms.originalBoost"] = 2.0;
> params["synonyms.synonymBoost"] = 0.5;
>
> And here's an example of what the plugin gives me for a search on "sbc"
> which includes synonyms for "sb" and "small block". I don't really know
> enough about this to figure out what exactly it's doing but since all of
> the results I am getting first are ones with "small block" in the name, and
> the ones with "sbc" in the prodname field which should be first are buried
> about 1000 documents in, I know the originalboost and synonymboost aren't
> working with all this other stuff. Ideas how to fix this? With the normal
> synonym filter we just set up copies of the fields that could have synonyms
> to use with that filter applied and had a lower boost on those. Not sure
> how to make it work with this custom query parser though.
>
> +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
> ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
> keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> body:block^0.5 | productinfo:block | keywords:block^2.0 |
> prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
> | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^5

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread Joe Lawson
>
> Advice: make sure in the schema that none of the fields you are running
> queries against do any complex query operations; especially make sure they
> aren't doing additional synonym resolution against the same file.
>

BTW, I'd do this first before messing with MM.


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread MaryJo Sminkey
Oh thanks, yeah I did miss that one field, which had a parent type with the
normal synonym filter. However, that's our product SKU field, so it really
doesn't even come into play. I verified that none of the other fields have
a synonym filter set and even removed the prodnumbertext just to make
sure it wasn't doing anything. I was still getting the same results: the
matches with "SBC" in the name were buried under the "small block" matches.
After thinking over the issue, I realized what the solution was. I just
needed to set synonyms.originalBoost high enough that it would be higher
than the boosts provided by the phrase boosting, which is clearly what was
letting "small block" jump ahead of "sbc". So I bumped that up to 100,
leaving synonyms.synonymBoost at 1, and now I'm getting the results I'm
looking for.
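
In the params style from earlier in the thread, that final combination is
roughly (values as described above):

params["synonyms.originalBoost"] = 100;  // strongly prefer the user's original term
params["synonyms.synonymBoost"] = 1;     // synonyms still match, at much lower weight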

Thanks for the help!

Mary Jo


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread Joe Lawson
Yeah, I thought the scale of the boosts was off as well, but got caught up
verifying that the plugin was working. My colleague suggested that because
"small block" is a phrase, it gets a higher score in matching: you basically
get a phrase match each time, which causes it to float to the top. You
should check out his post about Solr's latest scoring engine. It explains
the notion of TF*IDF, which drives almost all the theory in information
retrieval (aka search).

http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

We were also thinking, as you found in your experiments, that the 0.5 and
2.0 boosts were no match for the product name and keyword field boosts, so
that would influence your search as well.

using spell check on phrases

2016-06-06 Thread kaveh minooie

Hi everyone

I am using Solr 6 with the DirectSolrSpellChecker and the edismax parser.
The problem I am having is that when the query is a phrase, every single
word in the phrase needs to be misspelled for the spell checker to get
activated and give suggestions. If only one of the words is misspelled,
it just reports that the spelling is correct:

<bool name="correctlySpelled">true</bool>

I was wondering if anyone has encountered this situation before and
knows how to solve it?
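
(For anyone investigating the same behavior: the parameters that usually
control this case are spellcheck.alternativeTermCount, which allows
suggestions even for terms that do exist in the index, and collation. A
sketch, with an illustrative collection name, query, and untuned values:

curl "http://localhost:8983/solr/mycollection/select?q=gettysburg+adress&spellcheck=true&spellcheck.alternativeTermCount=5&spellcheck.maxResultsForSuggest=5&spellcheck.collate=true&spellcheck.maxCollationTries=10"

With these set, the response can include suggestions and a collation even
when some terms in the phrase are spelled correctly.)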


thanks,

--
Kaveh Minooie


Re: NRT updates

2016-06-06 Thread Chris Vizisa
Hi,
Any pointers, suggestions, experiences ... please..

Thanks!
Chris.

On Mon, Jun 6, 2016 at 10:27 AM, Chris Vizisa 
wrote:

> Hi,
>
> Does number of fields in a document affect NRT updates?
> I have around 1.6 million products. Each product can be available in about
> 3000 stores.
> In addition to around 50 fields related to a product I am storing
> product_store info in each product document like:
>  1. Quantity of that product in each store (store_n1_count,
> store_n2_count,..., store_3000_count)
>  2. Status of that product in each store (store_n1_status,
> store_n2_status, ..., store_3000_status)
>
> I would need to do NRT update on count and status of each product, and
> like that there are around 1.6 million products.
>
> Q1. Is it okay to do NRT updates on this product collection (for each
> product's store_count and store_status) with around 900 updates per second
> across the different products? (Please note that each product's status
> as well as count gets updated, and there are 1.6M such products.)
> Q2. Is it okay to use atomic updates for the NRT updates of the multiple
> store_counts and store_statuses per product, across around 1.6 million
> products in total? Or is there a more optimal way to handle this amount
> of dynamic data change? For atomic updates I understand all fields need
> to be stored.
> Q3. So basically, can I have all this info in the product collection
> itself, or should I store the store_status info separately, with productId
> joining them, for the NRT scenario to work best? In that case each
> product_store info is a separate document, with only 3 or 4 fields but
> many millions of documents (worst case 1.6M products multiplied by 3000
> stores).
> Q4. When we embed all store-related info in the product doc itself, a
> single product doc can be a candidate for simultaneous updates, as its
> count or status can change in different stores at the same time. If we go
> for a separate collection depicting product_status info, mostly only one
> doc is updated at a time. Which is more efficient and optimized?
>
>
> Could some one please suggest what is optimal. Any pointers welcome.
>
> Thanks!
> Chris.
>
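
For Q2, an atomic update that touches only the per-store fields of one
product would look roughly like this (a sketch; the collection name, the
uniqueKey value, and the commitWithin interval are illustrative):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/products/update?commitWithin=1000' \
  --data-binary '[{"id":"product_42",
                   "store_n1_count":{"set":17},
                   "store_n1_status":{"set":"in_stock"}}]'

Note that an atomic update still rewrites the whole document internally, so
all the other fields must be stored, as mentioned in the question.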


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread MaryJo Sminkey
On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

>
> We were thinking, as you experimented with, that the 0.5 and 2.0 boosts
> were no match for the product name and keyword field boosts so that would
> influence your search as well.



Yeah I definitely will have to play with the values a bit as we want the
product name matches to always appear highest, whether original or
synonyms, but I'll have to figure out how to get that result without one
word terms that have multi word synonyms getting overly boosted for a
phrase match while still sufficiently boosting the normal phrase match
stuff too. With the normal synonym filter I was able to just copy fields
that could have synonyms to a new field (which would be the only one with
the synonym filter), and use a different, lower boost on those fields, but
that won't work with this plugin which applies across everything in the
query. Makes it a bit more complicated to get everything just right.

MJ


Sent with MailTrack



Concern of large amount daily update

2016-06-06 Thread scott.chu

We are planning to replace an old-school Lucene index that has 50M docs with
SolrCloud, but the daily update, according to the responsible colleague, could
be around 100 thousand docs. Its data source is a bunch of MySQL tables. When
implementing the updating workflow, what should I do to keep the update time
reasonable? Currently what I have in mind is:

1. Use atomic updates to avoid unnecessary full-doc updates.
2. Run multiple copies of my updating process, each updating a different range of docs.

Are there other things I can do to help my situation? Are there any suggestions
or experiences for preparing appropriate h/w, e.g. CPU or RAM?

scott.chu,scott@udngroup.com
2016/6/7 (Tue)
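
One more lever besides the two ideas above: batch the updates and let the
server-side commit policy control visibility instead of committing from the
client. A sketch of solrconfig.xml settings, assuming searches can tolerate
roughly a minute of staleness (the intervals are illustrative):

<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every 60s, flush to disk -->
  <openSearcher>false</openSearcher> <!-- skip the new-searcher cost here -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- soft commit makes docs searchable -->
</autoSoftCommit>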


Using Solr to index zip files

2016-06-06 Thread anupama . gangadhar
Hi,

I have a use case where I need to search zip files quickly in HDFS. I intend
to use Solr but am not finding any relevant information about whether it can
be done for zip files.
These are nested zip files, i.e. zips within a zip file. Any help/information
is much appreciated.

Thank you,
Regards,
Anupama
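
One avenue worth testing: Tika, which backs Solr's ExtractingRequestHandler,
treats zip as a container format and can recurse into embedded entries, so
the text inside nested zips may be extractable without custom code. A sketch,
assuming the extract handler is enabled and the file has been pulled out of
HDFS to the local machine first (collection and file names are illustrative):

curl "http://localhost:8983/solr/archives/update/extract?literal.id=bundle1&commit=true" \
  -F "myfile=@nested-bundle.zip"

How deep the nesting is followed, and how entry contents are mapped to
fields, is worth verifying against your actual files before committing to
this approach.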





Solr 6.1.x Release Date ??

2016-06-06 Thread Ramesh Shankar
Hi,

Any idea of Solr 6.1.X Release Date ??

I am interested in the [subquery] transformer and would like to know the
release date, since it's available only in 6.1.x.

Thanks & Regards
Ramesh


Solr 5.4 Transaction

2016-06-06 Thread Pithon Philippe
Hi,
I have a question about Solr transactions compared to relational databases.

A Solr commit is not isolated per client session, right?
In my test (source below) the commit in one session also commits records
added by other sessions.
Is there documentation on this?
Are improvements planned for this, in version 6 or version 7?
Thank you for any ideas!


Source example  :

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TransactionTest {

    static final String BASE_URL = "http://localhost:8983/solr/test";

    public static void main(String[] args) {
        try {
            new TransactionTest();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public TransactionTest() throws Exception {
        HttpSolrClient solrClient = new HttpSolrClient(BASE_URL);

        // DTOMail is a simple bean with @Field-annotated properties (not shown).
        DTOMail mail = new DTOMail();
        mail.setType("mail");
        mail.setBody("test body");

        System.out.println("add bean");
        solrClient.addBean(mail);

        // Wait for user input; meanwhile a second instance of this test can
        // add its own documents.
        pause();

        // commit() is global to the index, not scoped to this client session:
        // it also makes visible the documents added by the other instances.
        System.out.println("commit");
        solrClient.commit();

        solrClient.close();
    }

    private void pause() {
        try {
            System.in.read();
        } catch (Exception e) {
        }
    }

}

}