Re: very slow add/commit time

2009-11-03 Thread Bruno
How many MB of cache have you set in your solrconfig.xml?

On Tue, Nov 3, 2009 at 12:24 PM, Marc Des Garets wrote:

> Hi,
>
>
>
> I am experiencing a problem with an index of about 80 million documents
> (41 GB). I am trying to update documents in this index using SolrJ.
>
>
>
> When I do:
>
> solrServer.add(docs);  // docs is a List<SolrInputDocument> containing
> 1000 documents (takes 36 sec)
>
> solrServer.commit(false,false); // either ends with an OutOfMemoryError
> or takes forever
>
>
>
> I have -Xms4g -Xmx4g
>
>
>
> Any idea what could be the problem?
>
>
>
> Thanks for your help.
>
>
> --
> This transmission is strictly confidential, possibly legally privileged,
> and intended solely for the
> addressee.  Any views or opinions expressed within it are those of the
> author and do not necessarily
> represent those of 192.com, i-CD Publishing (UK) Ltd or any of its
> subsidiary companies.  If you
> are not the intended recipient then you must not disclose, copy or take any
> action in reliance of this
> transmission. If you have received this transmission in error, please
> notify the sender as soon as
> possible.  No employee or agent is authorised to conclude any binding
> agreement on behalf of
> i-CD Publishing (UK) Ltd with another party by email without express
> written confirmation by an
> authorised employee of the Company. http://www.192.com (Tel: 08000 192
> 192).  i-CD Publishing (UK) Ltd
> is incorporated in England and Wales, company number 3148549, VAT No. GB
> 673128728.




-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: very slow add/commit time

2009-11-03 Thread Bruno
Try raising your ramBufferSizeMB (it helped a lot when my team had
performance issues).

Also try checking this link (it helps a lot):
http://wiki.apache.org/solr/SolrPerformanceFactors
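For reference, a minimal sketch of where the setting being discussed lives in solrconfig.xml; the element placement matches Solr 1.x-era configs, and the value is illustrative only (it must fit within the JVM heap):

```xml
<!-- solrconfig.xml, inside the index settings section (<indexDefaults> /
     <mainIndex> in Solr 1.x): how many MB Lucene may buffer in RAM before
     flushing a segment to disk. 512 is illustrative, not a recommendation. -->
<ramBufferSizeMB>512</ramBufferSizeMB>
```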

Regards

On Tue, Nov 3, 2009 at 12:38 PM, Marc Des Garets wrote:

> If you mean ramBufferSizeMB, I have it set to 512. The maxBufferedDocs
> setting is commented out. If you mean queryResultMaxDocsCached, it is set
> to 200, but is it used when indexing?
>
> -Original Message-
> From: Bruno [mailto:brun...@gmail.com]
> Sent: 03 November 2009 14:27
> To: solr-user@lucene.apache.org
> Subject: Re: very slow add/commit time
>
> How many MB of cache have you set in your solrconfig.xml?
>




-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Grouping

2009-12-04 Thread Bruno
Is there a way to make a group by or distinct query?

-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
I've tried with the default values and it didn't work either.


On Thu, Jun 18, 2009 at 2:31 PM, Mark Miller  wrote:

> Why do you have:
> query.set("hl.maxAnalyzedChars", -1);
>
> Have you tried using the default? Unless -1 is an undoc'd feature, this
> means you wouldn't get anything back! This should normally be a fairly hefty
> value and defaults to 51200, according to the wiki.
>
> And why:
> query.set("hl.fragsize", 1);
>
> That means a fragment could only be 1 char - again, I'd try the default
> (take out the param), and adjust from there.
> (wiki says the default is 100).
>
> Let us know how it goes.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
> Bruno wrote:
>
>>  Hi guys.
>>  I'm new to using highlighting, so probably I'm making some stupid mistake;
>> however, I'm not finding anything wrong.
>>  I use highlighting from a query within an EmbeddedSolrServer, and in the
>> query I've set the parameters necessary for enabling highlighting. Attached
>> are my schema and solrconfig.xml, and below is the Java code.
>> Content from the SolrDocumentList is not highlighted.
>>
>> EmbeddedSolrServer server = SolrServerManager.getServerEv();
>> String queryString = filter;
>> SolrQuery query = new SolrQuery();
>>
>> query.setQuery(queryString);
>> query.setHighlight(true);
>> query.addHighlightField(LOG_FIELD);
>> query.setHighlightSimplePost("");
>> query.setHighlightSimplePre("");
>> query.set("hl.usePhraseHighlighter", true);
>> query.set("hl.highlightMultiTerm", true);
>> query.set("hl.snippets", 100);
>> query.set("hl.fragsize", 1);
>> query.set("hl.mergeContiguous", false);
>> query.set("hl.requireFieldMatch", false);
>> query.set("hl.maxAnalyzedChars", -1);
>>
>> query.addSortField(DATE_FIELD, SolrQuery.ORDER.asc);
>> query.setFacetLimit(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
>> query.setRows(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
>> query.setIncludeScore(true);
>> QueryResponse rsp = server.query(query);
>> SolrDocumentList docs = rsp.getResults();
>>
>> --
>> Bruno Morelli Vargas
>> Mail: brun...@gmail.com
>> Msn: brun...@hotmail.com
>> Icq: 165055101
>> Skype: morellibmv
>>
>>
>
>
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
A couple of things I forgot to mention:

Solr version: 1.3
Environment: WebSphere



-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
Here is the query, searching for the term "ipod" in the "log" field:
q=log%3Aipod+AND+requestid%3A1029+AND+logfilename%3Apayxdev-1245272062125-USS.log.zip&hl=true&hl.fl=log&hl.fl=message&hl.simple.post=%3Ci%3E&hl.simple.pre=%3C%2Fi%3E&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.snippets=100&hl.fragsize=100&hl.mergeContiguous=false&hl.requireFieldMatch=false&hl.maxAnalyzedChars=-1&sort=timestamp+asc&facet.limit=6000&rows=6000&fl=score
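Decoded for readability, a small sketch using only the JDK (not part of the original thread). Note that in the encoded query above, hl.simple.pre decodes to </i> and hl.simple.post to <i>, which look swapped:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeQuery {
    // Decodes an application/x-www-form-urlencoded fragment:
    // '+' becomes a space, %3A becomes ':', %2F becomes '/', etc.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("log%3Aipod+AND+requestid%3A1029"));
        // prints: log:ipod AND requestid:1029
        System.out.println(decode("hl.simple.pre=%3C%2Fi%3E"));
        // prints: hl.simple.pre=</i>
    }
}
```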

On Thu, Jun 18, 2009 at 2:51 PM, Mark Miller  wrote:

> Nothing off the top of my head ...
>
> I can play around with some of the solrj unit tests a bit later and perhaps
> see if I can dig anything up.
>
> Note:
> if you expect wildcard/prefix/etc queries to highlight, they will not with
> Solr 1.3.
>
> query.set("hl.highlightMultiTerm", true);
>
> The above only applies to Solr 1.4.
> So if your query is just a wildcard ...
>
> What is your query, by the way?
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
I've checked the NamedList you told me about, but it contains only one
highlighted doc, when I have more docs that should be highlighted.

On Thu, Jun 18, 2009 at 3:03 PM, Erik Hatcher wrote:

> Note that highlighting is NOT part of the document list returned.  It's in
> an additional NamedList section of the response (with name="highlighting")
>
>Erik
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
Just figured out what happened... The schema needs to have a uniqueKey set;
otherwise highlighting will have at most one entry, as the map's key is the
doc's uniqueKey. On debugging I found that QueryResponse tries to put all the
highlighted results into a map with a null key, and in the end, putting tons
of entries all under a null key results in a one-entry map.

Thanks for the help guys.
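Bruno's diagnosis can be reproduced with plain JDK collections. The sketch below only simulates the map behaviour he describes (the class and method names are made up, not SolrJ API): every put under a null key overwrites the previous one, so the highlighting map collapses to a single entry.

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyCollapse {
    // Simulates building a highlighting map keyed by each doc's uniqueKey.
    // With no <uniqueKey> in the schema, every key is null, and HashMap
    // treats null as one key, so each put() overwrites the last entry.
    static int entryCount(String[] uniqueKeys) {
        Map<String, String> highlighting = new HashMap<>();
        for (String key : uniqueKeys) {
            highlighting.put(key, "snippets for " + key);
        }
        return highlighting.size();
    }

    public static void main(String[] args) {
        // Three docs without a uniqueKey collapse into one map entry.
        System.out.println(entryCount(new String[]{null, null, null})); // prints 1
        // Three docs with distinct ids keep three entries.
        System.out.println(entryCount(new String[]{"d1", "d2", "d3"})); // prints 3
    }
}
```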



-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: Slowness during submit the index

2009-06-20 Thread Bruno
d your email correctly, but I think you
>>>> are saying
>>>> you are indexing your DB content into a Solr index.  If this is
>>>> correct, here
>>>> are things to look at:
>>>> * is the java version the same on both machines (QA vs. PROD)
>>>> * are the same java parameters being used on both machines
>>>> * is the connection to the DB the same on both machines
>>>> * are both the PROD and QA DB servers the same and are both DB
>>>> instances the
>>>> same
>>>> ...
>>>>
>>>>
>>>> Otis
>>>> --
>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>>
>>>>
>>>>
>>>> - Original Message 
>>>>> From: Francis Yakin
>>>>> To: "solr-user@lucene.apache.org"
>>>>> Sent: Friday, June 19, 2009 5:27:59 PM
>>>>> Subject: Slowness during submit the index
>>>>>
>>>>>
>>>>> We are experiencing slowness during reloading/resubmitting index
>>>>> from
>>> Database
>>>>> to the master.
>>>>>
>>>>> We have two environments:
>>>>>
>>>>> QA and Prod.
>>>>>
>>>>> The slowness is happened only in Production but not in QA.
>>>>>
>>>>> It takes only one hour to reload 2.5 million documents in QA, compared
>>>>> to 5-6 hours to load the same size of index in Prod.
>>>>>
>>>>> I checked both the config files in QA and Prod, they are all
>>>>> identical,
>>>> except:
>>>>>
>>>>>
>>>>> In QA:
>>>>> false
>>>>> In Prod:
>>>>> true
>>>>>
>>>>> I believe that we use the HTTP protocol to reload/submit the index
>>>>> from the database to the Solr master.
>>>>> I did test copying big files thru network from database to the
>>>>> solr box, I
>>>> don't
>>>>> see any issue.
>>>>>
>>>>> We are running solr 1.2
>>>>>
>>>>> Any inputs will be much appreciated.
>>
>
>

-- 
Sent from my phone

Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Reading a parameter from a String.

2010-04-14 Thread Bruno
I need to change a parameter from within a query string.

*:* AND requestid:100 AND timestamp:[2010-04-13T20:30:00.000Z TO
2010-04-13T21:00:00.000Z] AND
source:"LogCollector-risidev3was2.201002020100._opt_ISI_logs.FNM.stdout_ISIREG_10.02.01_02.00.00.txt.tar.gz-stdout_ISIREG_10.02.01_02.00.00.txt.FNM.risidev3was2_opt_ISI_logs.201002020100"

In this case I have to change the timestamp parameters.

Is there a way?
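One client-side approach (a sketch, not a Solr feature; the helper below is hypothetical) is to rewrite the range clause with a regular expression before sending the query:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimestampRewrite {
    // Replaces the bounds of the first timestamp:[A TO B] range clause
    // in an existing query string with new bounds.
    static String replaceTimestampRange(String query, String from, String to) {
        Pattern range = Pattern.compile("timestamp:\\[[^\\]]*\\]");
        return range.matcher(query).replaceFirst(
                Matcher.quoteReplacement("timestamp:[" + from + " TO " + to + "]"));
    }

    public static void main(String[] args) {
        String q = "*:* AND requestid:100 AND "
                + "timestamp:[2010-04-13T20:30:00.000Z TO 2010-04-13T21:00:00.000Z]";
        System.out.println(replaceTimestampRange(q,
                "2010-04-14T00:00:00.000Z", "2010-04-14T01:00:00.000Z"));
    }
}
```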

-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


document support for file system crawling

2006-08-29 Thread Bruno

Hi there,

Browsing through the message threads, I tried to find a thread addressing file
system crawls. I want to implement an enterprise search over a networked
filesystem, crawling all sorts of documents, such as HTML, DOC, PPT and PDF.
Nutch provides plugins enabling it to read proprietary formats.
Is there support for the same functionality in Solr?

Bruno
-- 
View this message in context: 
http://www.nabble.com/document-support-for-file-system-crawling-tf2188066.html#a6053318
Sent from the Solr - User forum at Nabble.com.



Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi All,

Solr 5.4, Ubuntu

I thought it would be simple to query across two collections with the same
schema, but apparently not.
I have one Solr instance running, with 300,000 records in each collection.

I tried this request, but it does not return results from both collections:

http://my_adress:my_port/solr/C1/select?collection=C1,C2&q=fid:34520196&wt=json

this request returns only C1 results and if I do:

http://my_adress:my_port/solr/C2/select?collection=C1,C2&q=fid:34520196&wt=json

it returns only C2 results.

I have 5 identical fields in both collections:
id, fid, st, cc, timestamp
where id is the unique key field.

Can someone explain why it doesn't work?

Thanks a lot !
Bruno

---
This email has been checked for viruses by Avast antivirus software.
http://www.avast.com



Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

yes id value is unique in C1 and unique in C2.
id in C1 is never present in C2
id in C2 is never present in C1

On 06/01/2016 11:12, Binoy Dalal wrote:

Are the id values for docs in both collections exactly the same?
To get proper results, the ids should be unique across both cores.

--

Regards,
Binoy Dalal







Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi Susheel, Emir,

Yes, I checked, and I have one result in c1 and one in c2 with the same query
fid:34520196:


http://xxx.xxx.xxx.xxx:/solr/c1/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"fid,cc*,st",
      "indent":"true",
      "q":"fid:34520196",
      "collection":"c1,c2",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"EP1680447",
        "st":"LAPSED",
        "fid":"34520196"}]
  }
}


http://xxx.xxx.xxx.xxx:/solr/c2/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2

{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "fl":"id,fid,cc*,st",
  "indent":"true",
  "q":"fid:34520196",
  "collection":"c1,c2",
  "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"WO2005040212",
"st":"PENDING",
"cc_CA":"LAPSED",
"cc_EP":"LAPSED",
"cc_JP":"PENDING",
"cc_US":"LAPSED",
"fid":"34520196"}]
  }}


I have the same xxx.xxx.xxx.xxx: (server:port).
unique key field C1, C2 : id

id data in C1 is different from id data in C2.

Must I configure something in Solr?

thanks,
Bruno

On 06/01/2016 14:56, Emir Arnautovic wrote:

Hi Bruno,
Can you check the counts? Is it possible that the first page contains only
results from the collection that you sent the request to, so you assumed it
returns only results from a single collection?


Thanks,
Emir

On 06.01.2016 14:33, Susheel Kumar wrote:

Hi Bruno,

I just tested this scenario in my local Solr 5.3.1 and it returned results
from two identical collections. I doubt it is broken in 5.4; just double-check
that you are not missing anything else.

Thanks,
Susheel

http://localhost:8983/solr/c1/select?q=id_type%3Ahello&wt=json&indent=true&collection=c1,c2 



{"responseHeader": {"status": 0, "QTime": 98, "params": {"q": "id_type:hello",
  "indent": "true", "collection": "c1,c2", "wt": "json"}},
 "response": {"numFound": 2, "start": 0, "maxScore": 1, "docs": [
   {"id": "1", "id_type": "hello", "_version_": 1522623395043213300},
   {"id": "3", "id_type": "hello", "_version_": 1522623422397415400}]}}











Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

I have a dev server; I will do some tests on it...

On 06/01/2016 17:31, Susheel Kumar wrote:

I'd suggest you set up some test data locally and try this out.
That will confirm your understanding.

Thanks,
Susheel

On Wed, Jan 6, 2016 at 10:39 AM, Bruno Mannina  wrote:


Hi Susheel, Emir,

yes I check, and I have one result in c1 and in c2 with the same query
fid:34520196

http://xxx.xxx.xxx.xxx:
/solr/c1/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2

{ "responseHeader":{ "status":0, "QTime":1, "params":{ "fl":"fid,cc*,st",
"indent":"true", "q":"fid:34520196", "collection":"c1,c2", "wt":"json"}},
"response":{"numFound":1,"start":0,"docs":[ {

 "id":"EP1680447",
 "st":"LAPSED",
 "fid":"34520196"}]
   }
}


http://xxx.xxx.xxx.xxx:
/solr/c2/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2

{
   "responseHeader":{
 "status":0,
 "QTime":0,
 "params":{
   "fl":"id,fid,cc*,st",
   "indent":"true",
   "q":"fid:34520196",
   "collection":"c1,c2",
   "wt":"json"}},
   "response":{"numFound":1,"start":0,"docs":[
   {
 "id":"WO2005040212",
 "st":"PENDING",
 "cc_CA":"LAPSED",
 "cc_EP":"LAPSED",
 "cc_JP":"PENDING",
 "cc_US":"LAPSED",
 "fid":"34520196"}]
   }}


I have the same xxx.xxx.xxx.xxx: (server:port).
unique key field C1, C2 : id

id data in C1 is different of id data in C2

Must I config/set something in solr ?

thanks,
Bruno


Le 06/01/2016 14:56, Emir Arnautovic a écrit :


Hi Bruno,
Can you check counts? Is it possible that first page is only with results
from collection that you sent request to so you assumed it returns only
results from single collection?

Thanks,
Emir

On 06.01.2016 14:33, Susheel Kumar wrote:


Hi Bruno,

I just tested this scenario in my local solr 5.3.1 and it returned
results
from two identical collections. I doubt if it is broken in 5.4 just
double
check if you are not missing anything else.

Thanks,
Susheel


http://localhost:8983/solr/c1/select?q=id_type%3Ahello&wt=json&indent=true&collection=c1,c2

responseHeader": {"status": 0,"QTime": 98,"params": {"q":
"id_type:hello","
indent": "true","collection": "c1,c2","wt": "json"}},
response": {"numFound": 2,"start": 0,"maxScore": 1,"docs": [{"id": "1","
id_type": "hello","_version_": 1522623395043213300},{"id":
"3","id_type":"
hello","_version_": 1522623422397415400}]}

On Wed, Jan 6, 2016 at 6:13 AM, Bruno Mannina  wrote:

yes id value is unique in C1 and unique in C2.

id in C1 is never present in C2
id in C2 is never present in C1


Le 06/01/2016 11:12, Binoy Dalal a écrit :

Are Id values for docs in both the collections exactly same?

To get proper results, the ids should be unique across both the cores.

On Wed, 6 Jan 2016, 15:11 Bruno Mannina  wrote:

Hi All,


Solr 5.4, Ubuntu

I thought it was simple to request across two collections with the
same
schema but not.
I have one solr instance launch. 300 000 records in each collection.

I try to use this request without having both results:

http://my_adress:my_port
/solr/C1/select?collection=C1,C2&q=fid:34520196&wt=json

this request returns only C1 results and if I do:

http://my_adress:my_port
/solr/C2/select?collection=C1,C2&q=fid:34520196&wt=json

it returns only C2 results.

I have 5 identical fields on both collection
id, fid, st, cc, timestamp
where id is the unique key field.

Could someone explain why it doesn't work?

Thanks a lot !
Bruno

---
This email has been checked for viruses by Avast antivirus software.
http://www.avast.com

--

Regards,

Binoy Dalal





Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina
Same result on my dev' server; it seems the collection param has no effect
on the query...

Q: I don't see the "collection" param for the select handler in the Solr 5.4
doc; is it still present in version 5.4?


On 06/01/2016 17:38, Bruno Mannina wrote:

I have a dev' server, I will do some test on it...

On 06/01/2016 17:31, Susheel Kumar wrote:

I'll suggest if you can setup some some test data locally and try this
out.  This will confirm your understanding.

Thanks,
Susheel

On Wed, Jan 6, 2016 at 10:39 AM, Bruno Mannina  wrote:


Hi Susheel, Emir,

yes I checked, and I have one result in c1 and one in c2 with the same query
fid:34520196

http://xxx.xxx.xxx.xxx:/solr/c1/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"fid,cc*,st",
      "indent":"true",
      "q":"fid:34520196",
      "collection":"c1,c2",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"EP1680447",
        "st":"LAPSED",
        "fid":"34520196"}]
  }
}


http://xxx.xxx.xxx.xxx:
/solr/c2/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2 



{
   "responseHeader":{
 "status":0,
 "QTime":0,
 "params":{
   "fl":"id,fid,cc*,st",
   "indent":"true",
   "q":"fid:34520196",
   "collection":"c1,c2",
   "wt":"json"}},
   "response":{"numFound":1,"start":0,"docs":[
   {
 "id":"WO2005040212",
 "st":"PENDING",
 "cc_CA":"LAPSED",
 "cc_EP":"LAPSED",
 "cc_JP":"PENDING",
 "cc_US":"LAPSED",
 "fid":"34520196"}]
   }}





Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi Esther,

yes, I saw it, but if I use:

q={!join from=fid to=fid}fid:34520196 (with or without &collection=c1,c2)

I get only the result from the collection used in the select (c1)

On 06/01/2016 17:52, esther.quan...@lucidworks.com wrote:

Hi Bruno,

You might consider using the JoinQueryParser. Details here : 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser

Best,
Esther



Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

:( it doesn't work for me

http://my_adress:my_port/solr/c1/select?q={!join from=fid to=fid fromIndex=c2}fid:34520196&wt=json

the result is always the same; it answers only for c1
34520196 has results in both collections



On 06/01/2016 18:16, Binoy Dalal wrote:

Bruno,
Use join like so:
{!join from=f1 to=f2 fromIndex=c2}
On c1


Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Yeah! It works with your method!

Thanks a lot, Esther!


On 06/01/2016 19:15, Esther-Melaine Quansah wrote:

Ok, so join won’t work. Distributed search is your answer. This worked for me:

http://localhost:8983/solr/temp/select?shards=localhost:8983/solr/job,localhost:8983/solr/temp&q=*:*

so for you it’d look something like:

http://localhost:8983/solr/c1/select?shards=localhost:8983/solr/c1,localhost:8983/solr/c2&q=fid:34520196

and obviously, you’ll just choose the ports that correspond to your
configuration.

Esther
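As a side note for readers, the shards request Esther describes can be assembled programmatically. A minimal Python sketch, using the thread's example host, port, and core names; `build_sharded_query` is an illustrative helper, not part of any Solr client API:

```python
# Build a Solr select URL that fans one query out over several cores via
# the "shards" parameter. Host/port/core names are the example values
# from this thread.
from urllib.parse import urlencode

def build_sharded_query(host, port, cores, query):
    """Return a select URL on the first core that searches all cores."""
    shards = ",".join("%s:%d/solr/%s" % (host, port, c) for c in cores)
    params = urlencode({"shards": shards, "q": query, "wt": "json"})
    # The request can go to any one of the cores; "shards" controls fan-out.
    return "http://%s:%d/solr/%s/select?%s" % (host, port, cores[0], params)

url = build_sharded_query("localhost", 8983, ["c1", "c2"], "fid:34520196")
print(url)
```

Note that the special characters in the shards list are percent-encoded by `urlencode`; Solr accepts either form.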


Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi Shawn,

thanks for this info. I use Solr alone on my own server.

On 06/01/2016 20:13, Shawn Heisey wrote:

On 1/6/2016 2:41 AM, Bruno Mannina wrote:

I try to use this request without having both results:

http://my_adress:my_port/solr/C1/select?collection=C1,C2&q=fid:34520196&wt=json


this request returns only C1 results and if I do:

http://my_adress:my_port/solr/C2/select?collection=C1,C2&q=fid:34520196&wt=json


it returns only C2 results.

Are you running in SolrCloud mode (with zookeeper)?  If you're not, then
the collection parameter doesn't do anything, and old-style distributed
search (with the shards parameter) will be your only option.

Thanks,
Shawn









Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi,

is it possible that the problem was the one Shawn described, and that you
are running SolrCloud mode (with ZooKeeper)?

The solution given by Esther works fine, so it's OK for me :)

**

Are you running in SolrCloud mode (with zookeeper)?  If you're not, then
the collection parameter doesn't do anything, and old-style distributed
search (with the shards parameter) will be your only option.

Thanks,
Shawn

***

On 06/01/2016 19:17, Susheel Kumar wrote:

Hi Bruno,  I just tested on 5.4 for your sake and it works fine.  You are
goofing up somewhere.  Please create a new simple schema, different from
your use case, with 2-3 fields and 2-3 documents, and test this out
independently of your current problem.  That's the suggestion I can make,
and I did the same to confirm this.

On Wed, Jan 6, 2016 at 11:48 AM, Bruno Mannina  wrote:



Wildcard "?" ?

2015-10-21 Thread Bruno Mannina

Dear Solr-user,

I'm surprised to see that in my Solr 5.0 the wildcard ? always matches
exactly 1 character.

my request is:

title:magnet? AND tire?

Solr finds only titles with a character after magnet and tire, but does not
find titles with only magnet AND tire.

Do you know where I can tell Solr that the ? wildcard should mean [0, 1]
characters and not exactly [1] character?
Is it possible?


Thanks a lot !

my field in my schema is defined like that:


   Field: title
   Field-Type: org.apache.solr.schema.TextField
   PI Gap: 100

   Flags:        Indexed   Tokenized   Stored   Multivalued
   Properties:      y          y          y          y
   Schema:          y          y          y          y
   Index:           y          y          y

   Analyzers:
     * org.apache.solr.analysis.TokenizerChain
     * org.apache.solr.analysis.TokenizerChain





Re: Wildcard "?" ?

2015-10-21 Thread Bruno Mannina

title:/magnet.?/ doesn't work for me because Solr answers:

|title = "Magnetic folding system"|

but thanks for giving me the idea to use regexps!

On 21/10/2015 18:46, Upayavira wrote:

No, you cannot tell Solr to handle wildcards differently. However, you
can use regular expressions for searching:

title:/magnet.?/ should do it.

Upayavira
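The difference between the two ?'s can be seen with plain regular expressions. A small Python sketch (the terms are invented examples): in a regex, ? makes the preceding atom optional (zero or one), while a Lucene/Solr wildcard ? consumes exactly one character, equivalent to a mandatory . in a regex:

```python
# Contrast regex "?" (zero-or-one) with a Lucene-style wildcard "?"
# (exactly one character). fullmatch mirrors how Lucene regex queries
# match the whole indexed term.
import re

terms = ["magnet", "magnets", "magnetic"]

# Regex magnet.? : "magnet" plus an OPTIONAL single character.
regex_hits = [t for t in terms if re.fullmatch(r"magnet.?", t)]

# Wildcard magnet? : "magnet" plus exactly one character; as a regex
# that is magnet. (the dot is mandatory).
wildcard_hits = [t for t in terms if re.fullmatch(r"magnet.", t)]

print(regex_hits)     # ['magnet', 'magnets']
print(wildcard_hits)  # ['magnets']
```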



Re: Wildcard "?" ?

2015-10-22 Thread Bruno Mannina

Upayavira,

Thanks a lot for this information

Regards,
Bruno

On 21/10/2015 19:24, Upayavira wrote:

regexp will match the whole term. So, if you have stemming on, magnetic
may well stem to magnet, and that is the term against which the regexp
is executed.

If you want to do the regexp against the whole field, then you need to
do it against a string version of that field.

The process of using a regexp (and a wildcard for that matter) is:
  * search through the list of terms in your field for terms that match
  your regexp (uses an FST for speed)
  * search for documents that contain those resulting terms

Upayavira
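The two-step process described above can be sketched with a toy inverted index. The postings map below is invented for illustration; real Lucene walks the term dictionary with an FST rather than a Python dict:

```python
# Toy sketch of a regexp query: (1) enumerate indexed terms the regexp
# fully matches, (2) union the posting lists of the surviving terms.
import re

# Hypothetical inverted index: term -> set of document ids.
postings = {
    "magnet":   {1, 3},
    "magnets":  {2},
    "magnetic": {4},   # would only remain as a term if stemming is off
    "bicycle":  {1, 4},
}

def regexp_query(pattern):
    rx = re.compile(pattern)
    # Step 1: term enumeration.
    matching_terms = [t for t in postings if rx.fullmatch(t)]
    # Step 2: collect documents containing any matching term.
    docs = set()
    for t in matching_terms:
        docs |= postings[t]
    return docs

print(sorted(regexp_query(r"magnet.?")))  # docs containing magnet/magnets
```

This also shows why regexps run against terms, not stored field values: if stemming turned "magnetic" into the term "magnet", the regexp would be tested against "magnet".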




Solr 3.6, Highlight and multi words?

2015-03-29 Thread Bruno Mannina

Dear Solr User,

I'm trying to work with highlighting; it works well, but only if I have a
single keyword in my query?!
If my request is plastic AND bicycle, then only plastic is highlighted.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5

Could you please help me understand? I read the docs and googled without
success... so I'm posting here...

my result is:



 

  (EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal body (10) made
  from<em>plastic</em> material, particularly for touring bike.
  #CMT#ADVANTAGE : #/CMT# The bicycle pedal has a pedal body made
  from<em>plastic</em>

  between<em>plastic</em> tapes 3 and 3 having two heat fusion layers, and
  the two<em>plastic</em> tapes 3 and 3 are stuck

  elements. A connecting element is formed as a hinge, a flexible foil or a
  flexible<em>plastic</em> part. #CMT#USE

  A bicycle handlebar grip includes an inner fiber layer and an
  outer<em>plastic</em> layer. Thus, the fiber handlebar grip, while
  the<em>plastic</em> layer is soft and has an adjustable thickness to
  provide a comfortable sensation to a user. In addition,
  the<em>plastic</em> layer includes a holding portion coated on the outer
  surface layer to enhance the combination strength between the fiber layer
  and the<em>plastic</em> layer and to enhance

  






---
This email contains no viruses or malware because avast! Antivirus protection is active.
http://www.avast.com


Re: Solr 3.6, Highlight and multi words?

2015-03-29 Thread Bruno Mannina

Additional information, in my schema.xml, my field is defined like this:

 

Maybe it's missing something, like termVectors?





Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina
Sorry to bump this thread, but does nobody use multi-term highlighting, or
have a problem with it?


regards,






Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

Dear Charles,

Thanks for your answer, please find below my answers.

OK, it works if I use "aben" as the field in my query, as you say in answer 1.
It doesn't work if I use "ab", maybe because the "ab" field is a copyField 
for abfr, aben, abit, abpt.


Concerning point 2, yes, you are right: it's not "and" but "AND".

I have this result:



  <em>Bicycle</em>  frame comprises holder, particularly for 
water bottle, where holder is connected


  #CMT# #/CMT# The<em>bicycle</em>  frame (7) comprises a holder 
(1), particularly for a water bottle
  . The holder is connected with the<em>bicycle</em>  frame by a 
screw (5), where a mounting element has a compensation
section which is made of an elastic material, particularly 
a<em>plastic</em>  material. The compensation section

  


So my last question is: why don't I get the colored markers?
How can I tell Solr to use the colors?

Thanks a lot,
Bruno


On 01/04/2015 17:15, Reitzel, Charles wrote:

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?

^^
2. Try removing the word "and" from the query.  There may be some interaction 
with a stop word filter.  If you want a phrase query, wrap it in quotes.

3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

On 29/03/2015 21:15, Bruno Mannina wrote:

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&row
s=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



  
 
   (EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made from<em>plastic</em> material
   , particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
from<em>plastic</em>
 
   
   
 
between<em>plastic</em>  tapes 3 and 3 having
two heat fusion layers, and the two<em>plastic</em>  tapes
3 and 3 are stuck
 
   
   
 
 elements. A connecting element is formed as a hinge, a
flexible foil or a flexible<em>plastic</em>  part.
#CMT#USE
 
   
   
 
   A bicycle handlebar grip includes an inner fiber layer and
an outer<em>plastic</em> layer. Thus, the fiber
 handlebar grip, while the<em>plastic</em>
layer is soft and has an adjustable thickness to provide a
comfortable
 sensation to a user. In addition,
the<em>plastic</em>  layer includes a holding portion
coated on the outer surface
 layer to enhance the combination strength between the
fiber layer and the<em>plastic</em>  layer and to
enhance
 
   


*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*




Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

OK for qf (I can't test right now).

But concerning hl.simple.pre/hl.simple.post: with those I can define only one color, no?

In the sample solrconfig.xml there are several colors:


  

  
  

  

How can I tell Solr to use these colors instead of hl.simple.pre/post?



On 01/04/2015 20:58, Reitzel, Charles wrote:

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
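As a minimal sketch of what Charles describes (the host, port, core name 
"docdb", and the yellow marker are assumptions, not from this thread), 
hl.simple.pre/hl.simple.post replace the default <em>...</em> markers on a 
per-request basis:

```shell
# Sketch only: localhost:8983 and core "docdb" are assumptions.
BASE='http://localhost:8983/solr/docdb/select'
Q='aben:(plastic AND bicycle)'
# curl -G URL-encodes each --data-urlencode argument; we only echo the
# command here so the sketch runs without a live Solr server.
echo curl -G "$BASE" \
  --data-urlencode "q=$Q" \
  --data-urlencode 'hl=true' \
  --data-urlencode 'hl.fl=tien,aben' \
  --data-urlencode 'hl.simple.pre=<b style="background:yellow">' \
  --data-urlencode 'hl.simple.post=</b>'
```

Dropping the leading echo sends the real request.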


-Original Message-----
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use "aben" as field in my query as you say in Answer 1.
it doesn't work if I use "ab" may be because "ab" field is a copyField for 
abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:


  
<em>Bicycle</em>  frame comprises holder, particularly for 
water bottle, where holder is connected
  
  
#CMT# #/CMT# The<em>bicycle</em>  frame (7) comprises a 
holder (1), particularly for a water bottle
. The holder is connected with the<em>bicycle</em>  frame by 
a screw (5), where a mounting element has a compensation
  section which is made of an elastic material, particularly 
a<em>plastic</em>  material. The compensation section
  



So my last question is why I haven't  instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


On 01/04/2015 17:15, Reitzel, Charles wrote:

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
 
^^
2. Try removing the word "and" from the query.  There may be some interaction 
with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

On 29/03/2015 21:15, Bruno Mannina wrote:

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&ro
w
s=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



   
  
(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made from<em>plastic</em> material
, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
from<em>plastic</em>
  


  
 between<em>plastic</em>  tapes 3 and 3
having two heat fusion layers, and the
two<em>plastic</em>  tapes
3 and 3 are stuck
  


  
  elements. A connecting element is formed as a hinge, a
flexible foil or a flexible<em>plastic</em>  part.
#CMT#USE
  


  
A bicycle handlebar grip includes an inner fiber layer
and an outer<em>plastic</em> layer. Thus, the fiber
  handlebar grip, while the<em>plastic</em>
layer is soft and has an adjustable thickness to provide a
comfortable
  sensation to a user. In addition,
the<em>plastic</em>  layer includes a holding portion
coated on the outer surface
  layer to enhance the combination strength between the
fiber layer and the<em>plastic</em>  layer and to
enhance
  








Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

Of course, no problem Charles, you have already helped me!

On 01/04/2015 21:54, Reitzel, Charles wrote:

Sorry, I've never tried highlighting in multiple colors...

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

ok for qf (i can't test now)

but concerning hl.simple.pre hl.simple.post I can define only one color no ?

in the sample solrconfig.xml there are several color,



  


  


How can I tell to solr to use these color instead of hl.simple.pre/post ?



On 01/04/2015 20:58, Reitzel, Charles wrote:

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


-Original Message-----
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use "aben" as field in my query as you say in Answer 1.
it doesn't work if I use "ab" may be because "ab" field is a copyField
for abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:


   
 <em>Bicycle</em>  frame comprises holder, particularly for 
water bottle, where holder is connected
   
   
 #CMT# #/CMT# The<em>bicycle</em>  frame (7) comprises a 
holder (1), particularly for a water bottle
 . The holder is connected with the<em>bicycle</em>  frame 
by a screw (5), where a mounting element has a compensation
   section which is made of an elastic material, particularly 
a<em>plastic</em>  material. The compensation section
   
 


So my last question is why I haven't  instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


On 01/04/2015 17:15, Reitzel, Charles wrote:

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
  
^^
2. Try removing the word "and" from the query.  There may be some interaction 
with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

On 29/03/2015 21:15, Bruno Mannina wrote:

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&r
o
w
s=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:




   
 (EP2423092A1) #CMT# #/CMT# The bicycle pedal has a
pedal body (10) made from<em>plastic</em> material
 , particularly for touring bike. #CMT#ADVANTAGE :
#/CMT# The bicycle pedal has a pedal body made
from<em>plastic</em>
   
 
 
   
  between<em>plastic</em>  tapes 3 and 3
having two heat fusion layers, and the
two<em>plastic</em>  tapes
3 and 3 are stuck
   
 
 
   
   elements. A connecting element is formed as a hinge,
a flexible foil or a flexible<em>plastic</em>  part.
#CMT#USE
   
 
 
   
 A bicycle handlebar grip includes an inner fiber layer
and an outer<em>plastic</em> layer. Thus, the fiber
   handlebar grip, while the<em>plastic</em>
layer is soft and has an adjustable thickness to provide a
comfortable
   sensation to a user. In addition,
the<em>plastic</em>  layer includes a holding portion
coated on the outer surface
   layer to enhance the combination strength between the
fiber layer and the<em>plastic</em>  layer

Solr 5.0, defaultSearchField, defaultOperator ?

2015-04-17 Thread Bruno Mannina

Dear Solr users,

As of today I am using Solr 5.0 (I previously used Solr 3.6), so I am trying
to adapt my old schema for Solr 5.0.

I have two questions:
- how can I set the defaultSearchField ?
I don't want to use the df parameter in each query, because that would mean
a lot of modifications to my web project.

- how can I set the defaultOperator (and|or) ?

It seems that these "options" are now deprecated in SOLR 5.0 schema.

Thanks a lot for your comment,

Regards,
Bruno




Re: Solr 5.0, defaultSearchField, defaultOperator ?

2015-04-18 Thread Bruno Mannina

Thx Chris & Ahmet !

On 17/04/2015 23:56, Chris Hostetter wrote:

: df and q.op are the ones you are looking for.
: You can define them in defaults section.

specifically...

https://cwiki.apache.org/confluence/display/solr/InitParams+in+SolrConfig
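Following the InitParams page linked above, the defaults could be sketched 
roughly like this (the handler paths and the default field name "text" are 
assumptions, not from this thread):

```shell
# Print a hedged solrconfig.xml fragment: df and q.op replace the old
# schema-level defaultSearchField / defaultOperator settings.
CONF=$(cat <<'EOF'
<initParams path="/select,/query">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="q.op">AND</str>
  </lst>
</initParams>
EOF
)
printf '%s\n' "$CONF"
```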


:
: Ahmet
:
:
:
: On Friday, April 17, 2015 9:18 PM, Bruno Mannina  wrote:
: Dear Solr users,
:
: Since today I used SOLR 5.0 (I used solr 3.6) so i try to adapt my old
: schema for solr 5.0.
:
: I have two questions:
: - how can I set the defaultSearchField ?
: I don't want to use in the query the df tag  because I have a lot of
: modification to do for that on my web project.
:
: - how can I set the defaultOperator (and|or) ?
:
: It seems that these "options" are now deprecated in SOLR 5.0 schema.
:
: Thanks a lot for your comment,
:
: Regards,
: Bruno
:
:

-Hoss
http://www.lucidworks.com/






Correspondance table ?

2015-04-20 Thread Bruno Mannina

Dear Solr Users,

Solr 5.0.0

I currently have around 90,000,000 docs in my Solr, and I have a field
with one char which represents a category, e.g.:
value = a, definition: Nature and Health
etc...
I have a few categories, around 15.

These category definitions can change over the years.

Can I use a file where I will have
a\tNature and Health
b\tComputer science
etc...

and instead of having the code letter in my Solr JSON result, I will
have the definition ?
Only in the result.
The query will still be done with the code letter.

I'm sure it's possible !

Additional question: is it possible to do that also with a big 
correspondence file? Around 5000 definitions?

Thanks for your help,
Bruno
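One way the result-side substitution could be sketched client-side (the 
codes.tsv file name and the lookup helper are illustrative assumptions, not 
from this thread):

```shell
# Build a tab-separated mapping file like the one described above.
printf 'a\tNature and Health\nb\tComputer science\n' > codes.tsv

# Replace a code letter coming back from Solr with its definition.
lookup() { awk -F'\t' -v c="$1" '$1 == c { print $2 }' codes.tsv; }

lookup a   # prints: Nature and Health
```

An awk table lookup like this stays fast even with thousands of entries, 
since the file is small enough to scan per call or preload into a map.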




Re: Correspondance table ?

2015-04-20 Thread Bruno Mannina

Hi Alex,

well, OK, but what if I have a big table, with more than 10,000 entries?
Is it safe to do that client-side?

note:
I have one little table,
but I also have 2 big tables for 2 other fields

On 20/04/2015 10:57, Alexandre Rafalovitch wrote:

The best place to do so is in the client software, since you are not
using it for search in any way. So, wherever you get your Solr's
response JSON/XML/etc, map it there.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 20 April 2015 at 18:23, Bruno Mannina  wrote:

Dear Solr Users,

Solr 5.0.0

I have actually around 90 000 000 docs in my solr, and I have a field with
one char which represents a category. i.e:
value = a, definition : nature and health
etc...
I have fews categories, around 15.

These definition categories can changed during years.

Can I use a file where I will have
a\tNature and Health
b\tComputer science
etc...

and instead of having the code letter in my json result solr, I will have
the definition ?
Only in the result.
The query will be done with the code letter.

I'm sure it's possible !

Additional question: is it possible to do that also with a big
correspondance file? around 5000 definitions?

Thanks for your help,
Bruno










Re: Correspondance table ?

2015-04-20 Thread Bruno Mannina

Hi Jack,

OK, it's not for many millions of users, just 100 max per day.
It will be used on traditional "PC" and also on mobile clients.

So I need to run tests to verify the possibilities.

Thx

On 20/04/2015 14:20, Jack Krupansky wrote:

It depends on the specific nature of your clients. Are they in-house users,
like only dozens or hundreds, or is this a large web app with many millions
of users, and with mobile clients as well as traditional "PC" clients?

If it feels too much to do in the client, then a middleware API service
layer could be the way to go. In any case, don't try to load too much work
onto the Solr server itself.

-- Jack Krupansky

On Mon, Apr 20, 2015 at 7:32 AM, Bruno Mannina  wrote:


Hi Alex,

well ok but if I have a big table ? more than 10 000 entries ?
is it safe to do that client side ?

note:
I have one little table
but I have also 2 big tables for 2 other fields


On 20/04/2015 10:57, Alexandre Rafalovitch wrote:


The best place to do so is in the client software, since you are not
using it for search in any way. So, wherever you get your Solr's
response JSON/XML/etc, map it there.

Regards,
 Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 20 April 2015 at 18:23, Bruno Mannina  wrote:


Dear Solr Users,

Solr 5.0.0

I have actually around 90 000 000 docs in my solr, and I have a field
with
one char which represents a category. i.e:
value = a, definition : nature and health
etc...
I have fews categories, around 15.

These definition categories can changed during years.

Can I use a file where I will have
a\tNature and Health
b\tComputer science
etc...

and instead of having the code letter in my json result solr, I will have
the definition ?
Only in the result.
The query will be done with the code letter.

I'm sure it's possible !

Additional question: is it possible to do that also with a big
correspondance file? around 5000 definitions?

Thanks for your help,
Bruno












Solr5.0.0, do a commit alone ?

2015-04-21 Thread Bruno Mannina

Dear Solr Users,

With Solr 3.6, when I wanted to force a commit without giving any data, I did:
java -jar post.jar

Now with Solr 5.0.0, I use
bin/post .

but it does not accept doing a commit if I don't give a data directory, i.e.:
bin/post -c mydb -commit yes

I want to do that because I have a file with delete actions.
Each line in this file contains one ref to delete:
bin/post -c mydb -commit no -d "..."
So I would like to do the commit only after running my file, with a single
command line.

bin/post -c mydb -commit yes (without data) is not accepted by post

Thanks,
Sincerely,
Bruno
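For what it's worth, a commit with no documents does not need bin/post at 
all: it can be sent straight to the update handler. A minimal sketch (host, 
port, and the core name "mydb" are taken from the mail above; the echo keeps 
it runnable without a live server):

```shell
# Standalone commit: the update handler accepts ?commit=true with no body.
COMMIT_URL='http://localhost:8983/solr/mydb/update?commit=true'
echo curl "$COMMIT_URL"
# An explicit XML body also works:
#   curl http://localhost:8983/solr/mydb/update -H 'Content-Type: text/xml' -d '<commit/>'
```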







Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Dear Solr Community,

I have a recent computer with 8 GB RAM, I installed Ubuntu 14.04 and SOLR 
5.0, Java 7

This is a brand new installation.

Everything works fine, but I would like to increase SOLR_JAVA_MEM (to 40% of 
total RAM available).

So I edit the bin/solr.in.sh

# Increase Java Min/Max Heap as needed to support your indexing / query 
needs

SOLR_JAVA_MEM="-Xms3g –Xmx3g -XX:MaxPermSize=512m -XX:PermSize=512m"

but with this param, the Solr server can't be started. I use:
bin/solr start

Do you have an idea of the problem ?

Thanks a lot for your comment,
Bruno




Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

Dear Solr Users,

I have a brand new computer where I installed Ubuntu 14.04, 8 GB RAM,
SOLR 5.0, Java 7.
I indexed 92,000,000 docs (small text files, ~2 KB each).
I have around 30 fields.

All works fine, but each Tuesday I need to delete some docs inside, so I
create a batch file with lines like this:
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"<delete><query>f1:58644</query></delete>"
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"<delete><query>f1:162882</query></delete>"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb  -commit yes -d
"<delete><query>f1:2868668</query></delete>"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundred lines, my Solr
shuts down. Two hundred lines always shuts down Solr 5.0.
I have no error in my console; Solr just can't be reached on port 8983.

Does a variable exist that I must increase to avoid this error?

On my old Solr 3.6, I don't use the same line to delete documents; I use:
java -jar -Ddata=args -Dcommit=no  post.jar
"<delete><id>113422</id></delete>"

You can see that I use <id> directly, not <query>, and my schema between
Solr 3.6 and Solr 5.0 is almost the same.
I just have some more fields.
Why does this method not work now?

Thanks a lot,
Bruno
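A gentler alternative can be sketched (the deletes.xml file name is an 
assumption; the ids come from the commands above): batch every id into a 
single delete document and post it once, with one commit, instead of 
spawning one bin/post JVM per id:

```shell
# Build one <delete> document holding many ids; Solr accepts multiple
# <id> (or <query>) children in a single delete request.
printf '<delete>\n' > deletes.xml
for id in 58644 162882 2868668; do
  printf '  <id>%s</id>\n' "$id" >> deletes.xml
done
printf '</delete>\n' >> deletes.xml
cat deletes.xml
# then post it once, e.g.: bin/post -c docdb -commit yes deletes.xml
```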





Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983 /home/solr/solr-5.0.0/server/logs"
#   Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983 /home/solr/solr-5.0.0/server/logs"...

Running OOM killer script for process 28233 for Solr on port 8983
Killed process 28233

I will try in a few minutes to increase

formdataUploadLimitInKB

and I will tell you the result.

On 04/05/2015 14:58, Shawn Heisey wrote:

On 5/4/2015 3:19 AM, Bruno Mannina wrote:

All work fine but each Tuesday I need to delete some docs inside, so I
create a batch file
with inside line like this:
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:58644"
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:162882"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb  -commit yes -d
"f1:2868668"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundreds line, my solr
shutdown.
Two hundreds line shutdown always solr 5.0.
I have no error in my console, just Solr can't be reach on the port 8983.

Is exists a variable that I must increase to disable this error ?

As far as I know, the only limit that can affect that is the maximum
post size.  Current versions of Solr default to a 2MB max post size,
using the formdataUploadLimitInKB attribute on the requestParsers
element in solrconfig.xml, which defaults to 2048.
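The fragment Shawn refers to could be sketched like this (the 8192 value is 
illustrative, not a recommendation):

```shell
# Print a hedged solrconfig.xml fragment: the limit is an attribute of
# requestParsers inside requestDispatcher.
SNIPPET=$(cat <<'EOF'
<requestDispatcher>
  <requestParsers formdataUploadLimitInKB="8192"
                  multipartUploadLimitInKB="2048"/>
</requestDispatcher>
EOF
)
printf '%s\n' "$SNIPPET"
```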

Even if that limit is exceeded by a request, it should not crash Solr,
it should simply log an error and ignore the request.  It would be a bug
if Solr does crash.

What happens if you increase that limit?  Are you seeing any error
messages in the Solr logfile when you send that delete request?

Thanks,
Shawn









Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

I increased

formdataUploadLimitInKB

to 2048000 and the problem is the same, same error.

Any idea?



On 04/05/2015 16:38, Bruno Mannina wrote:

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs"
#   Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs"...

Running OOM killer script for process 28233 for Solr on port 8983
Killed process 28233

I try in few minutes to increase the

formdataUploadLimitInKB

and I will tell you the result.

On 04/05/2015 14:58, Shawn Heisey wrote:

On 5/4/2015 3:19 AM, Bruno Mannina wrote:

All work fine but each Tuesday I need to delete some docs inside, so I
create a batch file
with inside line like this:
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:58644"
/home/solr/solr-5.0.0/bin/post -c docdb  -commit no -d
"f1:162882"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb  -commit yes -d
"f1:2868668"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundreds line, my solr
shutdown.
Two hundreds line shutdown always solr 5.0.
I have no error in my console, just Solr can't be reach on the port 
8983.


Is exists a variable that I must increase to disable this error ?

As far as I know, the only limit that can affect that is the maximum
post size.  Current versions of Solr default to a 2MB max post size,
using the formdataUploadLimitInKB attribute on the requestParsers
element in solrconfig.xml, which defaults to 2048.

Even if that limit is exceeded by a request, it should not crash Solr,
it should simply log an error and ignore the request.  It would be a bug
if Solr does crash.

What happens if you increase that limit?  Are you seeing any error
messages in the Solr logfile when you send that delete request?

Thanks,
Shawn















Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Yes! It works!!!

Scott, perfect.

For my config 3g does not work, but 2g does!

Thanks

On 04/05/2015 16:50, Scott Dawson wrote:

Bruno,
You have the wrong kind of dash (a long dash) in front of the Xmx flag.
Could that be causing a problem?

Regards,
Scott

On Mon, May 4, 2015 at 5:06 AM, Bruno Mannina  wrote:


Dear Solr Community,

I have a recent computer with 8Go RAM, I installed Ubuntu 14.04 and SOLR
5.0, Java 7
This is a brand new installation.

all work fine but I would like to increase the JAVA_MEM_SOLR (40% of total
RAM available).
So I edit the bin/solr.in.sh

# Increase Java Min/Max Heap as needed to support your indexing / query
needs
SOLR_JAVA_MEM="-Xms3g –Xmx3g -XX:MaxPermSize=512m -XX:PermSize=512m"

but with this param, the solr server can't be start, I use:
bin/solr start

Do you have an idea of the problem ?

Thanks a lot for your comment,
Bruno









Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
Yes, it was that! I increased SOLR_JAVA_MEM to 2g (with 8 GB RAM I can't do 
more; 3g fails to run Solr on my brand new computer).


thanks !

On 04/05/2015 17:03, Shawn Heisey wrote:

On 5/4/2015 8:38 AM, Bruno Mannina wrote:

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh
8983/home/solr/solr-5.0.0/server/logs"
#   Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh
8983/home/solr/solr-5.0.0/server/logs"...
Running OOM killer script for process 28233 for Solr on port 8983

Out Of Memory errors are a completely different problem.  Solr behavior
is completely unpredictable after an OutOfMemoryError exception, so the
5.0 install includes a script to run on OOME that kills Solr.  It's the
only safe way to handle that problem.

Your Solr install is not being given enough Java heap memory for what it
is being asked to do.  You need to increase the heap size for Solr.  If
you look at the admin UI for Solr in a web browser, you can see what the
max heap is set to ... on a default 5.0 install running Solr with
"bin/solr" the max heap will be 512m ... which is VERY small.  Try using
bin/solr with the -m option, set to something like 2g (for 2 gigabytes
of heap).
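The restart Shawn suggests, as a one-line sketch (the install path is taken 
from earlier in this thread and may differ on your machine):

```shell
# Start Solr with an explicit 2 GB heap via the -m flag.
SOLR=/home/solr/solr-5.0.0/bin/solr
CMD="$SOLR start -m 2g"
echo "$CMD"
```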

Thanks,
Shawn









Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Shawn, thanks a lot for this comment.

So, I have this information; no information about 32 or 64 bits...

solr@linux:~$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Server VM (build 24.79-b02, mixed mode)
solr@linux:~$

solr@linux:~$ uname -a
Linux linux 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:11:46 UTC 
2015 i686 i686 i686 GNU/Linux

solr@linux:~$

Do I need to install a new version of Java? I installed my Ubuntu just one 
week ago :)

updates are up to date.

On 04/05/2015 17:23, Shawn Heisey wrote:

On 5/4/2015 9:09 AM, Bruno Mannina wrote:

Yes ! it works !!!

Scott perfect 

For my config 3g do not work, but 2g yes !

If you can't start Solr with a 3g heap, chances are that you are running
a 32-bit version of Java.  A 32-bit Java cannot go above a 2GB heap.  A
64-bit JVM requires a 64-bit operating system, which requires a 64-bit
CPU.  Since 2006, Intel has only been providing 64-bit chips to the
consumer market, and getting a 32-bit chip in a new computer has gotten
extremely difficult.  The server market has had only 64-bit chips from
Intel since 2005.  I am not sure what those dates look like for AMD
chips, but it is probably similar.

Running "java -version" should give you enough information to determine
whether your Java is 32-bit or 64-bit.  This is the output from that
command on a Linux machine that is running a 64-bit JVM from Oracle:

root@idxa4:~# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

If you are running Solr on Linux, then the output of "uname -a" should
tell you whether your operating system is 32 or 64 bit.

Thanks,
Shawn









Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

OK, I have noted all this information, thanks!

I will upgrade if needed. 2 GB seems to be OK.

On 04/05/2015 18:46, Shawn Heisey wrote:

On 5/4/2015 10:28 AM, Bruno Mannina wrote:

solr@linux:~$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Server VM (build 24.79-b02, mixed mode)
solr@linux:~$

solr@linux:~$ uname -a
Linux linux 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:11:46 UTC
2015 i686 i686 i686 GNU/Linux
solr@linux:~$

Both Linux and Java are 32-bit.  For linux, I know this because your
arch is "i686", which means it is coded for a newer generation 32-bit
CPU.  You can't be running a 64-bit Java, and the Java version confirms
that because it doesn't contain "64-bit".

Run this command:

cat /proc/cpuinfo

If the "flags" on the CPU contain the string "lm" (long mode), then your
CPU is capable of running a 64-bit (sometimes known as amd64 or x86_64)
version of Linux, and a 64-bit Java.  You will need to re-install both
Linux and Java to get this capability.
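Shawn's "lm" check can be scripted; a small sketch (the sample flags line is made up for illustration):

```shell
# is_64bit_capable: decide from a /proc/cpuinfo "flags" line whether the CPU
# advertises long mode ('lm'), i.e. can run a 64-bit OS and a 64-bit JVM.
is_64bit_capable() {
  printf '%s\n' "$1" | grep -qw lm && echo yes || echo no
}

# Sample flags line (illustrative, not from a real machine):
sample='flags : fpu vme de pse tsc msr pae mce lm constant_tsc'
is_64bit_capable "$sample"

# Against the current machine (Linux only):
# is_64bit_capable "$(grep -m1 '^flags' /proc/cpuinfo)"
```

If it prints "no", a 64-bit reinstall of both Linux and Java is needed, as Shawn describes.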

Here's "uname -a" from a 64-bit version of Ubuntu:

Linux lb1 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:08:34 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux

Since you are running 5.0, I would recommend Oracle Java 8.

http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html

Thanks,
Shawn









Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Dear Solr users,

I have a problem with SOLR5.0 (and not on SOLR3.6)

What kind of field can I use for my uniqueKey field named "code" if I
want it case insensitive ?

On SOLR3.6, I defined a string_ci field like this:

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and it works fine.
- If I add a document with the same code then the doc is updated.
- If I search a document with lower or upper case, the doc is found


But in SOLR5.0, if I use this definition then :
- I can search in lower/upper case, it's OK
- BUT if I add a doc with the same code then the doc is added not updated !?

I read that the problem could be that the type of the field is tokenized
instead of being a plain string.

If I change from string_ci to string, then
- I lost the possibility to search in lower/upper case
- but it works fine to update the doc.

So, could you help me to find the right field type to:

- search in case insensitive
- if I add a document with the same code, the old doc will be updated

Thanks a lot !





Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Hello Chris,

yes, I confirm that on my Solr 3.6 it has worked fine for several years: each
doc added with the same code is updated, not added.


To be more clear, I receive docs with a field named "pn"; it is the
uniqueKey, and it is always in uppercase


so I must define in my schema.xml

<field name="id" type="string_ci" indexed="true" required="true" stored="true"/>
<field name="pn" type="string_ci" indexed="true" stored="false"/>
...
<uniqueKey>id</uniqueKey>
...
<copyField source="id" dest="pn"/>

but the application that uses Solr already exists, so it requests with the
pn field, not id; I cannot change that.
And in each doc I receive there is no id field, just a pn field, and I
cannot change that either.


So there is a problem, no ? I must import an id field and query a pn
field, but I only have a pn field in the import...



Le 05/05/2015 01:00, Chris Hostetter a écrit :

: On SOLR3.6, I defined a string_ci field like this:
:
: 
: 
:   
:   
: 
: 
:
: 


I'm really surprised that field would have worked for you (reliably) as a
uniqueKey field even in Solr 3.6.

the best practice for something like what you describe has always (going
back to Solr 1.x) been to use a copyField to create a case insensitive
copy of your uniqueKey for searching.

if, for some reason, you really want case insensitive *updates* (so a doc
with id "foo" overwrites a doc with id "FOO") then the only reliable way to
make something like that work is to do the lowercasing in an
UpdateProcessor to ensure it happens *before* the docs are distributed to
the correct shard, and so the correct existing doc is overwritten (even if
you aren't using solr cloud)
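A minimal schema.xml sketch of the copyField approach described here, assuming a string_ci type (keyword tokenizer plus lowercase filter) as discussed earlier in the thread; field names are illustrative:

```xml
<!-- Plain string uniqueKey: updates overwrite reliably, even when sharded -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- Lowercased copy used only for case-insensitive searching -->
<field name="id_ci" type="string_ci" indexed="true" stored="false"/>
<copyField source="id" dest="id_ci"/>
<uniqueKey>id</uniqueKey>
```

Queries then go against id_ci, while adds and deletes key off the exact-case id.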



-Hoss
http://www.lucidworks.com/








Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Bruno Mannina

Yes, thanks, it's clear for me now too.

Daniel, my pn is always in uppercase and I index it always in uppercase.
The problem (solved now after all your answers, thanks) was on the query
side: if users

query with lowercase then Solr returns no result, and that was not good.

But now the problem is solved: in my source file I changed the name of the
pn field to id,

and in my schema I use a copy field named pn and it works perfectly.

Thanks a lot !!!

Le 06/05/2015 09:44, Daniel Collins a écrit :

Ah, I remember seeing this when we first started using Solr (which was 4.0
because we needed Solr Cloud), I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field a normal
string (like Bruno we had tried to lowercase it which failed).
We didn't really know Solr in those days, and hadn't really thought about
it since then, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the "same", but have values "HELLO" and "hello",
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all then?  You said in your example,
that your client application always supplies "pn" and it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in, is that only for searching?  If so couldn't you add
a search field (called id), and update your app to search using that (or
make that your default search field, I guess it depends if your calling app
explicitly uses the pn field name in its searches).


On 6 May 2015 at 01:55, Erick Erickson  wrote:


Well, "working fine" may be a bit of an overstatement. That has never
been officially supported, so it "just happened" to work in 3.6.

As Chris points out, if you're using SolrCloud then this will _not_
work as routing happens early in the process, i.e. before the analysis
chain gets the token so various copies of the doc will exist on
different shards.

Best,
Erick

On Mon, May 4, 2015 at 4:19 PM, Bruno Mannina  wrote:












How to index 20 000 files with a command line ?

2015-05-29 Thread Bruno Mannina

Dear Solr Users,

Habitually I use this command line to index my files:
>bin/post -c hbl /data/hbl-201522/*.xml

but today I have a big update: there are 20 000 xml files (each file about
1 KB).



Re: How to index 20 000 files with a command line ?

2015-05-29 Thread Bruno Mannina

oh yes like this:

 find /data/hbl-201522/ -name "*.xml" -exec bin/post -c hbl {} \;

?
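For reference, bin/post also accepts many files per invocation, so a sketch that batches the 20 000 files (1000 per launch) instead of starting one process per file — directory path as in the thread, bin/post location relative to the Solr install assumed:

```shell
# Real command (not run here):
#   find /data/hbl-201522/ -name '*.xml' -print0 | xargs -0 -n 1000 bin/post -c hbl
# -print0 / -0 keep filenames with spaces safe; -n 1000 caps each bin/post call.
# Small demonstration of how xargs chunks its input:
printf '%s\n' 1.xml 2.xml 3.xml 4.xml 5.xml | xargs -n 2 echo batch:
```

With -n 1000 the 20 000 files become 20 bin/post launches instead of 20 000.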

Le 29/05/2015 14:15, Sergey Shvets a écrit :

Hello Bruno,

You can use find command with exec attribute.

regards
  Sergey











Help for a field in my schema ?

2015-05-29 Thread Bruno Mannina

Dear Solr-Users,

(SOLR 5.0 Ubuntu)

I have xml files with tags like this:
<claimXXYYY>

where XX is a language code like FR EN DE PT etc... (I don't know the
number of language codes I can have)
and YYY is a number [1..999]

i.e.:
<claimen1>
<claimen2>
<claimen3>
<claimfr1>
<claimfr2>
<claimfr3>

I would like to define fields named:
*claimen* equal to claimenYYY (EN language, all numbers, indexed=true,
stored=true) (search needed and must be displayed)
*claim* equal to all claimXXYYY (all languages, all numbers,
indexed=true, stored=false) (search not needed but must be displayed)

Is it possible to have these 2 fields ?

Could you help me to declare them in my schema.xml ?

Thanks a lot for your help !

Bruno
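One possible schema.xml sketch for this — field, type, and wildcard names are illustrative guesses, not a confirmed solution; a dynamicField accepts the incoming per-language tags and wildcard copyFields build the aggregates:

```xml
<!-- Accept any incoming claimXXYYY field without declaring each one -->
<dynamicField name="claim*" type="text_general" indexed="true" stored="true"/>

<!-- One searchable aggregate of all English claims -->
<field name="en_claims" type="text_en" indexed="true" stored="true" multiValued="true"/>
<copyField source="claimen*" dest="en_claims"/>

<!-- One displayable aggregate of every claim, all languages -->
<field name="all_claims" type="text_general" indexed="false" stored="true" multiValued="true"/>
<copyField source="claim*" dest="all_claims"/>
```

copyField sources match the fields of the incoming document, so the aggregate fields themselves are not re-copied.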





Possible or not ?

2015-06-05 Thread Bruno Mannina

Dear Solr Users,

I would like to post 1 000 000 records (1 record = 1 file) in one shot,
and do the commit at the end.

Is it possible to do that ?

I've several directories with each 20 000 files inside.
I would like to do:
bin/post -c mydb /DATA

under DATA I have
/DATA/1/*.xml (20 000 files)
/DATA/2/*.xml (20 000 files)
/DATA/3/*.xml (20 000 files)

/DATA/50/*.xml (20 000 files)

Actually, I post 5 directories in one time (it takes around 1h30 for 100
000 records/files)

But it's Friday and I would like to run it during the W.E. alone.

Thanks for your comment,

Bruno




Re: Possible or not ?

2015-06-05 Thread Bruno Mannina

Hi Alessandro,

I'm actually on my dev computer, so I would like to post 1 000 000 xml
files (with a structure defined in my schema.xml).


I have already imported 1 000 000 xml files by using
bin/post -c mydb /DATA0/1 /DATA0/2 /DATA0/3 /DATA0/4 /DATA0/5
where /DATA0/X contains 20 000 xml files (I do it 20 times by just
changing X from 1 to 50)


I would like to do now:
bin/post -c mydb /DATA1

I would like to know if my Solr 5 will run fine and not give a memory
error because there are too many files

in one post, without doing a commit.

The commit will be done at the end of the 1 000 000.

Is it ok ?


Le 05/06/2015 16:59, Alessandro Benedetti a écrit :

Hi Bruno,
I can not see what is your challenge.
Of course you can index your data in the flavour you want and do a commit
whenever you want…
Are those xml Solr xml ?
If not you would need to use the DIH, the extract update handler or any
custom Indexer application.
Maybe I missed your point…
Give me more details please !

Cheers

2015-06-05 15:41 GMT+01:00 Bruno Mannina :












Re: Possible or not ?

2015-06-05 Thread Bruno Mannina

Ok thanks for these information !

Le 05/06/2015 17:37, Erick Erickson a écrit :

Picking up on Alessandro's point. While you can post all these docs
and commit at the end, unless you do a hard commit
(openSearcher=true or false doesn't matter), if your server should
abnormally terminate for _any_ reason, all these docs will be
replayed on startup from the transaction log.

I'll also echo Alessandro's point that I don't see the advantage of this.
Personally I'd set my hard commit interval with openSearcher=false
to something like 60000 (60 seconds; it's in milliseconds) and forget
about it. You're not imposing much extra load on the system, you're
durably saving your progress, and you're avoiding really, really, really
long restarts if your server should stop for some reason.

If you don't want the docs to be _visible_ for searches, be sure your
autocommit has openSearcher set to false and disable soft commits
(set the interval to -1 or remove it from your solrconfig).

Best,
Erick
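A solrconfig.xml sketch of the settings Erick describes, with the intervals suggested in the thread:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 60 s for durability; no new searcher is opened -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commits disabled during the bulk load: nothing becomes
       visible until an explicit commit at the end -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```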

On Fri, Jun 5, 2015 at 8:21 AM, Alessandro Benedetti
 wrote:

I can not see any problem in that, but talking about commits I would like
to make a difference between "Hard" and "Soft" .

Hard commit -> durability
Soft commit -> visibility

I suggest you this interesting reading :
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
It's an old interesting Erick post.

It explains you better what are the differences between different commit
types.

I would put you in this scenario :

Heavy (bulk) indexing

The assumption here is that you’re interested in getting lots of data to
the index as quickly as possible for search sometime in the future. I’m
thinking original loads of a data source etc.

- Set your soft commit interval quite long. As in 10 minutes or even
longer (-1 for no soft commits at all). *Soft commit is about
visibility, *and my assumption here is that bulk indexing isn’t about
near real time searching so don’t do the extra work of opening any kind of
searcher.
- Set your hard commit intervals to 15 seconds, openSearcher=false.
Again the assumption is that you’re going to be just blasting data at Solr.
The worst case here is that you restart your system and have to replay 15
seconds or so of data from your tlog. If your system is bouncing up and
down more often than that, fix the reason for that first.
- Only after you’ve tried the simple things should you consider
refinements, they’re usually only required in unusual circumstances. But
they include:
   - Turning off the tlog completely for the bulk-load operation
   - Indexing offline with some kind of map-reduce process
   - Only having a leader per shard, no replicas for the load, then
   turning on replicas later and letting them do old-style replication to
   catch up. Note that this is automatic, if the node discovers it is “too
   far” out of sync with the leader, it initiates an old-style replication.
   After it has caught up, it’ll get documents as they’re indexed to the
   leader and keep its own tlog.
   - etc.



Actually you could do the commit only at the end, but I can not see any
advantage in that.
I suggest you to play with auto hard/soft commit config and get a better
idea of the situation !

Cheers


Re: Possible or not ?

2015-06-05 Thread Bruno Mannina

Thanks for the link,

So, I launched this post; I will see on Monday if it is ok :)

Le 05/06/2015 17:21, Alessandro Benedetti a écrit :





How to index text field with html entities ?

2016-07-29 Thread Bruno Mannina

Dear Solr User,

Solr 5.0.1

I have several xml files that contain html entities in some fields.

I have an author field (english text) with this kind of text:

Brown & Gammon

If I set my field like this:

<au>Brown & Gammon</au>

Solr generates the error "Undeclared general entity"

If I add CDATA like this:

<au><![CDATA[Brown & Gammon]]></au>

it seems that I can't search with the &:

au:"brown & gammon"

Could you help me to find the right syntax ?

Thanks a lot,

Bruno







Re: How to index text field with html entities ?

2016-07-29 Thread Bruno Mannina

Hi Chris,

Thanks for your answer, and I add a little thing,

after checking my log it seems that it concerns only some html entities.
No problem with &amp;, but I have problems with:

&uuml;
&ldquo;
etc...

I will check your answer to find a solution,

Thanks !

Le 29/07/2016 à 23:58, Chris Hostetter a écrit :

: I have several xml files that contains html entities in some fields.

...

: If I set my field like this:
:
: Brown & Gammon
:
: Solr generates error "Undeclared general entity"

...because that's not valid XML...

: if I add CDATA like this:
:
: 
:
: it seems that I can't search with the &

...because that is valid xml, and tells solr you want the literal string
"Brown & Gammon" to be indexed -- given a typical analyzer you are
probably getting either "&" or "amp" as a term in your index.

: Could you help me to find the right syntax ?

the client code you are using for indexing can either "parse" these HTML
snippets using an HTML parser, and then send solr the *real* string you
want to index, or you can configure solr with something like
HTMLStripFieldUpdateProcessorFactory (if you want both the indexed form
and the stored form to be plain text) or HTMLStripCharFilterFactory (if
you want to preserve the html markup in the stored value, but strip it as
part of the analysis chain for indexing).


http://lucene.apache.org/solr/6_1_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
http://lucene.apache.org/core/6_1_0/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilterFactory.html


-Hoss
http://www.lucidworks.com/
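A fieldType sketch for the second option Hoss mentions (the type name and the rest of the analysis chain are illustrative); the stored value keeps the markup, while entities are decoded and tags stripped for indexing:

```xml
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Decodes HTML entities such as &uuml; and strips tags before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```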







Re: How to index text field with html entities ?

2016-07-30 Thread Bruno Mannina

Thanks Shawn for these precisions

Le 30/07/2016 à 00:43, Shawn Heisey a écrit :

On 7/29/2016 4:05 PM, Bruno Mannina wrote:

after checking my log it seems that it concerns only some html entities.
No problem with & but I have problem with:

ü
“
etc...

Those are valid *HTML* entities, but they are not valid *XML* entities.
The list of entities that are valid in XML is quite short -- there are
only five of them.

https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

When Solr processes XML, it is only going to convert entities that are
valid for XML -- the five already mentioned.  It will fail on the other
247 entities that are only valid for HTML.

If you are seeing the problem with & (which is one of the five valid
XML entities) then we'll need the Solr version and the full error
message/stacktrace from the solr logfile.

Thanks,
Shawn








Strange error when I try to copy....

2016-09-09 Thread Bruno Mannina

Dear Solr Users,

I have been using Solr for several years, and for two weeks I have had a
problem when I try to copy my Solr index.

My Solr index is around 180 GB (~100 000 000 docs, 1 doc ~ 3 KB).

My method to save my index every Sunday:

- I stop Solr 5.4 on Ubuntu 14.04 LTS - 16 GB RAM - i3-2120 CPU @ 3.30GHz

- I do a simple directory copy of /data to my backup HDD (from a 2 TB SATA
drive to a 2 TB SATA drive directly connected to the motherboard).

All files are copied fine but one: the biggest (~65 GB) fails.

I have the message: "Error splicing file: Input/output error"

I also tried on Windows (I have a dual boot); there I get a "redundancy error".

I checked my HDD, no error; I checked the file "_k46.fdt", no error: I can
delete docs, add docs, and my database can be reached and works fine.

Does someone have an idea how to back up my database, or why I have this error ?

Many thanks for your help,

Sincerely,

Bruno







Re: Strange error when I try to copy....

2016-09-09 Thread Bruno Mannina

Le 09/09/2016 à 17:57, Shawn Heisey a écrit :

On 9/8/2016 9:41 AM, Bruno Mannina wrote:

- I stop SOLR 5.4 on Ubuntu 14.04LTS - 16Go - i3-2120 CPU @ 3.30Ghz

- I do a simple directory copy /data to my HDD backup (from 2To SATA
to 2To SATA directly connected to the Mothercard).

All files are copied fine but one not ! the biggest (~65Go) failed.

I have the message : "Error splicing file: Input/output error"

This isn't a Solr issue, which is easy to determine by the fact that
you've stopped Solr and it's not even running.  It's a problem with the
filesystem, probably the destination filesystem.

The most common reason that I have found for this error is a destination
filesystem that is incapable of holding a large file -- which can happen
when the disk is formatted fat32 instead of ntfs or a Linux filesystem.
You can have a 2TB filesystem with fat32, but no files larger than 4GB
-- so your 65GB file won't fit.

I think you're going to need to reformat that external drive with
another filesystem.  If you choose NTFS, you'll be able to use the disk
on either Linux or Windows.

Thanks,
Shawn



Hi Shawn,

First, thanks for your answer; indeed, it is a bit clearer now.
Tonight I will check the file system of my hdd.

And sorry for this question out of solr subject.

Cdlt,
Bruno
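A quick way to run the filesystem check Shawn describes (the "." is a placeholder; point it at the backup mount, e.g. /media/backup, on the real machine):

```shell
# Print the filesystem type of the mount that holds a directory.
df -T .
# A "vfat" type on the backup drive would mean FAT32, whose 4 GB per-file
# limit matches the symptom: every file copies except the 65 GB one.
```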






Solr 5.4.0: Colored Highlight and multi-value field ?

2017-10-03 Thread Bruno Mannina
Dear all,

Is it possible to have a colored highlight in a multi-value field ?

I succeed in doing it on a text field, but not in a multi-value field; there
SOLR takes hl.simple.pre / hl.simple.post as the tag.

Thanks a lot for your help,

Cordialement, Best Regards

Bruno Mannina
www.matheo-software.com
www.patent-pulse.com
Tél. +33 0 970 738 743
Mob. +33 0 634 421 817






RE: Solr 5.4.0: Colored Highlight and multi-value field ?

2017-10-06 Thread Bruno Mannina
Hi Erik,

Sorry for the late reply, I wasn't in my office this week...

So, I give more information:

* IC is a multi-value field defined like this:


* The request I use (i.e):
http://my_host/solr/collection/select?
q=ic:(A63C10* OR G06F22/086)
&start=0
&rows=10
&wt=json
&indent=true
&sort=pd+desc
&fl=*
// HighLight
&hl=true
&hl.fl=ti,ab,ic,inc,cpc,apc
&hl.simple.pre=
&hl.simple.post=
&hl.fragmentsBuilder=colored
&hl.useFastVectorHighlighter=true
&hl.highlightMultiTerm=true
&hl.usePhraseHighlighter=true
&hl.fragsize=999
&hl.preserveMulti=true

* Result:
I have only one color (in my case the yellow) for all different values found

* BUT *

If I use a non-multi-value field like ti (title) with a query with some keywords,

* Result (i.e. ti:(foo OR merge)):
I have different colors for each different term found


Question:
- Is it because the IC field is not defined with all the term*="true" options ?
- How can I have different colors and not use the pre and post tags ?


Many thanks for your help !
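For reference, the FastVectorHighlighter that the colored fragments builder relies on only works on fields indexed with full term vectors, so the multi-valued field would need the term* attributes — a sketch, with the field type being an assumption:

```xml
<field name="ic" type="string" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Changing these attributes requires reindexing the collection before highlighting behaves differently.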

-Message d'origine-
De : Erick Erickson [mailto:erickerick...@gmail.com]
Envoyé : mercredi 4 octobre 2017 15:48
À : solr-user
Objet : Re: Solr 5.4.0: Colored Highlight and multi-value field ?

How does it not work for you? Details matter, an example set of values and the 
response from Solr are good bits of info for us to have.

On Tue, Oct 3, 2017 at 3:59 PM, Bruno Mannina 
wrote:

> Dear all,
>
>
>
> Is it possible to have a colored highlight in a multi-value field ?
>
>
>
> I’m succeed to do it on a textfield but not in a multi-value field,
> then SOLR takes hl.simple.pre / hl.simple.post as tag.
>
>
>
> Thanks a lot for your help,
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
> www.matheo-software.com
>
> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>
>
>
>





Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina


Hello all,



I'm facing a problem, and I would like to know if it's possible to solve it
with one request in Solr.

I have SOLR 5.



I have docs with several fields but here two are useful for us.

Field 1 : id (unique key)

Field 2 : fid (family Id)



i.e:



id:XXX

fid: 1254



id: YYY

fid: 1254



id: ZZZ

fid:3698



id: QQQ

fid: 3698

.



I query only by id in my project, and I would like my result to also include
all docs that have the same fid.

i.e. if I request :

..q=id:ZZZ&.



I get the docs ZZZ of course but also QQQ because QQQ_fid = ZZZ_fid



MoreLikeThis, grouping, etc. don't answer my question (but maybe I just don't know
how to use them for that)



Thanks for your help,



Bruno









RE: Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
Yes, it's perfect!!! It works.

Thanks David & Alexandre !

-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Wednesday, February 22, 2017 23:00
To: solr-user@lucene.apache.org
Subject: Re: Get docs with same value in one other field ?

sorry embedded link:

q={!join+from=fid+to=fid}id:ZZZ
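For illustration, a minimal sketch of building that join-query URL with Python's standard library (the host, port, and core name are assumptions, not from the thread):

```python
from urllib.parse import urlencode

def family_query_url(base_url, doc_id):
    # {!join from=fid to=fid} maps the matched doc's family id (fid)
    # back onto fid, returning every document in the same family.
    params = {"q": "{!join from=fid to=fid}id:%s" % doc_id, "wt": "json"}
    return base_url + "/select?" + urlencode(params)

url = family_query_url("http://localhost:8983/solr/collection1", "ZZZ")
```

urlencode escapes the braces and colon, so the raw query string stays valid in a URL.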

On Wed, Feb 22, 2017 at 4:58 PM, David Hastings < hastings.recurs...@gmail.com> 
wrote:

> for a reference to some examples:
>
> https://wiki.apache.org/solr/Join
>
> sor youd want something like:
>
> q={!join+from=fid+to=fid}i
> <http://localhost:8983/solr/select?q=%7B!join+from=manu_id_s+to=id%7Di
> pod>
> d:ZZZ
>
> i dont have much experience with this function however
>
>
>
> On Wed, Feb 22, 2017 at 4:40 PM, Alexandre Rafalovitch
>  > wrote:
>
>> Sounds like two clauses, with the second clause being a JOIN search
>> where you match by ID and then join on FID.
>>
>> Would that work?
>>
>> Regards,
>>Alex.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>
>>
>> On 22 February 2017 at 16:27, Bruno Mannina  wrote:
>> >
>> >
>> > Hello all,
>> >
>> >
>> >
>> > I'm facing a problem that I would like to know if it's possible to
>> > do it with one request in SOLR.
>> >
>> > I have SOLR 5.
>> >
>> >
>> >
>> > I have docs with several fields but here two are useful for us.
>> >
>> > Field 1 : id (unique key)
>> >
>> > Field 2 : fid (family Id)
>> >
>> >
>> >
>> > i.e:
>> >
>> >
>> >
>> > id:XXX
>> >
>> > fid: 1254
>> >
>> >
>> >
>> > id: YYY
>> >
>> > fid: 1254
>> >
>> >
>> >
>> > id: ZZZ
>> >
>> > fid:3698
>> >
>> >
>> >
>> > id: QQQ
>> >
>> > fid: 3698
>> >
>> > .
>> >
>> >
>> >
>> > I request only by id in my project, and I would like in my result
>> > have
>> also
>> > all docs that have the same fid .
>> >
>> > i.e. if I request :
>> >
>> > ..q=id:ZZZ&.
>> >
>> >
>> >
>> > I get the docs ZZZ of course but also QQQ because QQQ_fid = ZZZ_fid
>> >
>> >
>> >
>> > MoreLikeThis, Group, etc. don't answer to my question (but may I
>> > don't
>> know
>> > how to use it to do that)
>> >
>> >
>> >
>> > Thanks for your help,
>> >
>> >
>> >
>> > Bruno
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>





RE: Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
Just one more thing: I need to request up to 1000 ids.
Currently I test with 2 or 3 and it already takes time (my db is around 100 000 000
docs, 128 GB RAM).

Do you think it could cause an OOM error if I try with up to 1000 ids?

-----Original Message-----
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, February 22, 2017 23:47
To: solr-user@lucene.apache.org
Subject: RE: Get docs with same value in one other field ?

Yes, it's perfect!!! It works.

Thanks David & Alexandre !

-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Wednesday, February 22, 2017 23:00
To: solr-user@lucene.apache.org
Subject: Re: Get docs with same value in one other field ?

sorry embedded link:

q={!join+from=fid+to=fid}id:ZZZ

On Wed, Feb 22, 2017 at 4:58 PM, David Hastings < hastings.recurs...@gmail.com> 
wrote:

> for a reference to some examples:
>
> https://wiki.apache.org/solr/Join
>
> sor youd want something like:
>
> q={!join+from=fid+to=fid}i
> <http://localhost:8983/solr/select?q=%7B!join+from=manu_id_s+to=id%7Di
> pod>
> d:ZZZ
>
> i dont have much experience with this function however
>
>
>
> On Wed, Feb 22, 2017 at 4:40 PM, Alexandre Rafalovitch 
>  > wrote:
>
>> Sounds like two clauses, with the second clause being a JOIN search 
>> where you match by ID and then join on FID.
>>
>> Would that work?
>>
>> Regards,
>>Alex.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and 
>> experienced
>>
>>
>> On 22 February 2017 at 16:27, Bruno Mannina  wrote:
>> >
>> >
>> > Hello all,
>> >
>> >
>> >
>> > I'm facing a problem that I would like to know if it's possible to 
>> > do it with one request in SOLR.
>> >
>> > I have SOLR 5.
>> >
>> >
>> >
>> > I have docs with several fields but here two are useful for us.
>> >
>> > Field 1 : id (unique key)
>> >
>> > Field 2 : fid (family Id)
>> >
>> >
>> >
>> > i.e:
>> >
>> >
>> >
>> > id:XXX
>> >
>> > fid: 1254
>> >
>> >
>> >
>> > id: YYY
>> >
>> > fid: 1254
>> >
>> >
>> >
>> > id: ZZZ
>> >
>> > fid:3698
>> >
>> >
>> >
>> > id: QQQ
>> >
>> > fid: 3698
>> >
>> > .
>> >
>> >
>> >
>> > I request only by id in my project, and I would like in my result 
>> > have
>> also
>> > all docs that have the same fid .
>> >
>> > i.e. if I request :
>> >
>> > ..q=id:ZZZ&.
>> >
>> >
>> >
>> > I get the docs ZZZ of course but also QQQ because QQQ_fid = ZZZ_fid
>> >
>> >
>> >
>> > MoreLikeThis, Group, etc. don't answer to my question (but may I 
>> > don't
>> know
>> > how to use it to do that)
>> >
>> >
>> >
>> > Thanks for your help,
>> >
>> >
>> >
>> > Bruno
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>





RE: Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
OK Alex, I will look for a better solution. I'm afraid of getting an OOM with a 
huge number of ids.

And yes, I already use a POST query; the URL above was just to show my problem.
Anyway, thanks for pointing that out as well.

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Thursday, February 23, 2017 00:08
To: solr-user
Subject: Re: Get docs with same value in one other field ?

A thousand IDs could be painful to send and perhaps to run against.

At minimum, look into splitting your query into multiple variables (so you 
could reuse the list in both the direct and the join query). Also look at the terms 
query parser, which specializes in lists of IDs. You may also need to send 
your ID list as a POST, not a GET, request to avoid blowing the URL length.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced
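A minimal sketch of those suggestions, assuming a Solr version that ships the {!terms} query parser (it arrived via SOLR-6318, cited earlier in this archive; the chunk size and field name are illustrative):

```python
def terms_query_chunks(ids, field="id", chunk=500):
    # {!terms} takes a comma-separated value list, which is far
    # cheaper to parse than a 1000-clause boolean OR query.
    for i in range(0, len(ids), chunk):
        yield "{!terms f=%s}%s" % (field, ",".join(ids[i:i + chunk]))

chunks = list(terms_query_chunks([str(n) for n in range(1000)]))
```

Each chunk would then go in the body of a POST request, keeping the URL short.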


On 22 February 2017 at 17:55, Bruno Mannina  wrote:
> Just a little more thing, I need to request up to 1000 id's Actually I
> test with 2 or 3 and it takes times (my db is around 100 000 000 docs, 128Go 
> RAM).
>
> Do you think, it could be OOM error ? if I test with up to 1000 id ?
>
> -----Original Message-----
> From: Bruno Mannina [mailto:bmann...@free.fr] Sent: Wednesday, February
> 22, 2017 23:47 To: solr-user@lucene.apache.org Subject: RE: Get
> docs with same value in one other field ?
>
> Ye it's perfect !!! it works.
>
> Thanks David & Alexandre !
>
> -----Original Message-----
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: Wednesday, February 22, 2017 23:00 To:
> solr-user@lucene.apache.org Subject: Re: Get docs with same value in
> one other field ?
>
> sorry embedded link:
>
q={!join+from=fid+to=fid}id:ZZZ
>
> On Wed, Feb 22, 2017 at 4:58 PM, David Hastings < 
> hastings.recurs...@gmail.com> wrote:
>
>> for a reference to some examples:
>>
>> https://wiki.apache.org/solr/Join
>>
>> sor youd want something like:
>>
>> q={!join+from=fid+to=fid}i
>> <http://localhost:8983/solr/select?q=%7B!join+from=manu_id_s+to=id%7D
>> i
>> pod>
>> d:ZZZ
>>
>> i dont have much experience with this function however
>>
>>
>>
>> On Wed, Feb 22, 2017 at 4:40 PM, Alexandre Rafalovitch
>> > > wrote:
>>
>>> Sounds like two clauses, with the second clause being a JOIN search
>>> where you match by ID and then join on FID.
>>>
>>> Would that work?
>>>
>>> Regards,
>>>Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>>> experienced
>>>
>>>
>>> On 22 February 2017 at 16:27, Bruno Mannina  wrote:
>>> >
>>> >
>>> > Hello all,
>>> >
>>> >
>>> >
>>> > I'm facing a problem that I would like to know if it's possible to
>>> > do it with one request in SOLR.
>>> >
>>> > I have SOLR 5.
>>> >
>>> >
>>> >
>>> > I have docs with several fields but here two are useful for us.
>>> >
>>> > Field 1 : id (unique key)
>>> >
>>> > Field 2 : fid (family Id)
>>> >
>>> >
>>> >
>>> > i.e:
>>> >
>>> >
>>> >
>>> > id:XXX
>>> >
>>> > fid: 1254
>>> >
>>> >
>>> >
>>> > id: YYY
>>> >
>>> > fid: 1254
>>> >
>>> >
>>> >
>>> > id: ZZZ
>>> >
>>> > fid:3698
>>> >
>>> >
>>> >
>>> > id: QQQ
>>> >
>>> > fid: 3698
>>> >
>>> > .
>>> >
>>> >
>>> >
>>> > I request only by id in my project, and I would like in my result
>>> > have
>>> also
>>> > all docs that have the same fid .
>>> >
>>> > i.e. if I request :
>>> >
>>> > ..q=id:ZZZ&.
>>> >
>>> >
>>> >
>>> > I get the docs ZZZ of course but also QQQ because QQQ_fid =
>>> > ZZZ_fid
>>> >
>>> >
>>> >
>>> > MoreLikeThis, Group, etc. don't answer to my question (but may I
>>> > don't
>>> know
>>> > how to use it to do that)
>>> >
>>> >
>>> >
>>> > Thanks for your help,
>>> >
>>> >
>>> >
>>> > Bruno
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>
>
>





Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
Hello all,



I’m facing a problem, and I would like to know if it’s possible to solve it
with one request in Solr.

I have SOLR 5.



I have docs with several fields but here two are useful for us.

Field 1 : id (unique key)

Field 2 : fid (family Id)



i.e:



id:XXX

fid: 1254



id: YYY

fid: 1254



id: ZZZ

fid:3698



id: QQQ

fid: 3698

…



I query only by id in my project, and I would like my result to also include
all docs that have the same fid.

i.e. if I request :

..q=id:ZZZ&…



I get the docs ZZZ of course but also QQQ because QQQ_fid = ZZZ_fid



MoreLikeThis, grouping, etc… don’t answer my question (but maybe I just don’t know
how to use them for that)



Thanks for your help,



Bruno





Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 430 650 788
Fax. +33 0 430 650 728



Stay in touch!








Solr5, Clustering & exact phrase problem

2017-03-13 Thread Bruno Mannina
Dear Solr-User,



I’m trying to use solr clustering (Lingo algorithm) on my database (notices
with id, title, abstract fields)



Everything works fine when my query is simple (with or without Boolean operators),
but if I try an exact phrase like:

..&q=ti:“snowboard binding”&…



Then Solr generates only one cluster, named “Other”, and puts all the notices
inside it.



As I have only been testing it for a short while, my solrconfig still contains the
sample configuration that the example ships with.

Of course, I changed field names.



Do you know if I made a mistake or am missing something, or maybe exact phrases are
not supported by clustering ?



Just one other question: I want to generate clusters using the abstract and title
fields. Is this what I put in my solrconfig correct:

carrot.title = title

carrot.snippet = abstract
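For reference, those two parameters normally sit in the clustering request handler's defaults. A sketch following the stock Solr clustering example, with only the field names taken from this message (handler path, engine name, and the rest are assumptions):

```xml
<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">lingo</str>
    <!-- map the document fields onto Carrot2's title/content inputs -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">abstract</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```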



Thanks a lot for your help,



Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 430 650 788
Fax. +33 0 430 650 728








Shards, delete duplicates ?

2017-04-14 Thread Bruno Mannina
Dear Solr users,



I have two collections C1 and C2

For C1 and C2 the unique key is ID.



IDs in C1 are normalized patent numbers, i.e. US + 12 digits + A1.

IDs in C2 are patent numbers as I receive them: US + 13 digits + A1 (a
leading 0 is added).



My collection C2 has a field named ID12, which is not defined as a unique
field.

This ID12 is a copy of C1's ID field (US + 12 digits + A1).

Data in ID12 are unique in the whole C2 collection.



Data in C1_ID and C2_ID12 are the same.



I try to query both collections using shards in the URL.

It works fine, but I get duplicate documents. That’s normal, I know.



Does a method, a parameter, or anything else exist that allows me to tell
Solr to compare ID in C1 with ID12 in C2 and remove the duplicates ?



Many thanks for your help,





Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 430 650 788
Fax. +33 0 430 650 728








How can I request a big list of values ?

2014-08-09 Thread Bruno Mannina

Hi All,

I'm currently using Solr 3.6 and I have around 91 000 000 docs inside.

All work fine, it's great :)

But now, I would like to request a list of values in the same field
(more than 2000 values)

I know I can use ?q=x:(AAA BBB CCC ...) (my default operator is OR)

but I have a list of 2000 values! I don't think this method is a good
idea.

Can someone help me find a good solution?
Can I use a JSON structure via a POST method?

Thanks a lot,
Bruno


---
This email contains no viruses or malware because avast! Antivirus protection
is active.
http://www.avast.com


Re: How can I request a big list of values ?

2014-08-10 Thread Bruno Mannina

Hi Jack,

OK, but for 2000 values it means I must make 40 requests if I choose 
50 values per request :'(
And in my case a user can choose about 8 topics, so it can generate 8 
times 40 requests... hmm...


Is it not possible to send a text, JSON, or XML file?
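Jack's suggestion in this thread, several smaller parallel requests merged in the application, could be sketched like this (the batch size of 50 and the field name x come from the thread; everything else is an assumption):

```python
def split_batches(values, size=50):
    # 2000 values at 50 per request -> 40 smaller requests
    return [values[i:i + size] for i in range(0, len(values), size)]

def or_clause(field, batch):
    # Each batch becomes one moderate boolean OR query on the field
    return "%s:(%s)" % (field, " OR ".join(batch))

batches = split_batches(["V%04d" % n for n in range(2000)])
queries = [or_clause("x", b) for b in batches]
```

The application would then run these queries in parallel and merge the per-batch results.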

On 10/08/2014 17:38, Jack Krupansky wrote:
Generally, "large requests" are an anti-pattern in modern distributed 
systems. Better to have a number of smaller requests executing in 
parallel and then merge the results in the application layer.


-- Jack Krupansky

-----Original Message----- From: Bruno Mannina
Sent: Saturday, August 9, 2014 7:18 PM
To: solr-user@lucene.apache.org
Subject: How can I request a big list of values ?

Hi All,

I'm using actually SOLR 3.6 and I have around 91 000 000 docs inside.

All work fine, it's great :)

But now, I would like to request a list of values in the same field
(more than 2000 values)

I know I can use ?q=x:(AAA BBB CCC ...) (my default operator is OR)

but I have a list of 2000 values ! I think it's not the good idea to use
this method.

Can someone help me to find the good solution ?
Can I use a json structure by using a POST method ?

Thanks a lot,
Bruno










Re: How can I request a big list of values ?

2014-08-10 Thread Bruno Mannina

Hi Anshum,

I can do it with the 3.6 release, no?

My main problem is that I have around 2000 values, so I can't use one 
request with all of them; it's too wide. :'(


I will take a look at generating (as Jack proposed) several requests, 
but even in that case it doesn't seem safe...


On 10/08/2014 19:45, Anshum Gupta wrote:

Hi Bruno,

If you had been on a more recent release,
https://issues.apache.org/jira/browse/SOLR-6318 would perhaps have come in
handy.
You might want to look at patching your version with it, though (as a
workaround).

On Sat, Aug 9, 2014 at 4:18 PM, Bruno Mannina  wrote:

Hi All,

I'm using actually SOLR 3.6 and I have around 91 000 000 docs inside.

All work fine, it's great :)

But now, I would like to request a list of values in the same field (more
than 2000 values)

I know I can use ?q=x:(AAA BBB CCC ...) (my default operator is OR)

but I have a list of 2000 values ! I think it's not the good idea to use
this method.

Can someone help me to find the good solution ?
Can I use a json structure by using a POST method ?

Thanks a lot,
Bruno











statistic on a field?

2013-10-20 Thread Bruno Mannina

Dear,

I have a field named "Authors"; is it possible to get
the frequency of terms (e.g. the first 2000) for this field?

Thanks,

Bruno


Re: statistic on a field?

2013-10-20 Thread Bruno Mannina

On 20/10/2013 17:52, Bruno Mannina wrote:

Dear,

I have a field named "Authors"; is it possible to get
the frequency of terms (e.g. the first 2000) for this field?

Thanks,

Bruno


Using the Schema Browser, I get information on my Authors field, but there 
is a problem:

I get statistics on parts of the terms of this field...

i.e.

term    freq
co      256875
ltd     235899
corp    195554
etc...

Has the field been split to compute the stats?!

FieldType: TEXT_GENERAL
Properties: Indexed, Tokenized, Stored, Multivalued
Schema: Indexed, Tokenized, Stored, Multivalued
Index: indexed, Tokenized, Stored

Position Increment Gap: 100

Distinct: 1803034

I think it's because this field is tokenized, no?

Regards,
Bruno




Is Solr can create temporary sub-index ?

2013-10-23 Thread Bruno Mannina

Dear Solr User,

We have to do a new web project, which is: connect our Solr database to 
a web platform.


This web platform will be used by several users at the same time.
They make requests to our Solr and can apply filters to the result.

i.e.:
Our Solr contains 87M docs.
A user makes a request; the result is around a few hundred to several thousand docs.
On the web platform, the user will see the first 20 results (or more by using 
the Next Page button).
But he will also need to filter the whole result by additional terms 
(terms that our platform will propose to him).


Can Solr create a temporary index (managed by Solr itself during a web 
session)?


My goal is not to download the whole result to a local computer in order to 
filter it, nor to re-send
the same request several times with the new criteria added.

Many thanks for your comment,

Regards,
Bruno


Re: Is Solr can create temporary sub-index ?

2013-10-23 Thread Bruno Mannina

Hello Tim,

Yes, Solr's facets could be a solution, but I need to re-send the q= each 
time.

I'm just wondering whether another solution exists.

Facet seems to be the good solution.

Bruno



On 23/10/2013 17:03, Timothy Potter wrote:

Hi Bruno,

Have you looked into Solr's facet support? If I'm reading your post
correctly, this sounds like the classic case for facets. Each time the user
selects a facet, you add a filter query (fq clause) to the original query.
http://wiki.apache.org/solr/SolrFacetingOverview

Tim


On Wed, Oct 23, 2013 at 8:16 AM, Bruno Mannina  wrote:


Dear Solr User,

We have to do a new web project which is : Connect our SOLR database to a
web plateform.

This Web Plateform will be used by several users at the same time.
They do requests on our SOLR and they can apply filter on the result.

i.e.:
Our SOLR contains 87M docs
An user do requests, result is around few hundreds to several thousands.
On the Web Plateform, user will see first 20 results (or more by using
Next Page button)
But he will need also to filter the whole result by additional terms.
(Terms that our plateform will propose him)

Is SOLR can create temporary index (manage by SOLR himself during a web
session) ?

My goal is to not download the whole result on local computer to provide
filter, or to re-send
the same request several times added to the new criterias.

Many thanks for your comment,

Regards,
Bruno
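Tim's facet-then-filter flow amounts to re-sending q with extra fq clauses. A minimal sketch as query-string construction (the facet field ap and the value are illustrative assumptions):

```python
from urllib.parse import urlencode

def facet_filter_params(q, facet_field, selected):
    # The original q is re-sent unchanged; each facet value the user
    # clicks becomes an additional fq clause narrowing the result.
    params = [("q", q), ("facet", "true"), ("facet.field", facet_field)]
    params += [("fq", "%s:%s" % (facet_field, v)) for v in selected]
    return urlencode(params)

qs = facet_filter_params("ti:snowboard", "ap", ["burton"])
```

Because fq clauses are cached independently by Solr, repeated drill-downs on the same facet values stay cheap.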





Re: Is Solr can create temporary sub-index ?

2013-10-23 Thread Bruno Mannina

I have a little question concerning statistics on a request:

I have a field defined like that:
<field name="ic" type="text_classification" indexed="true" stored="true"
multiValued="true"/>

<fieldType name="text_classification" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

Data sample for this field:

 A23L1/22066
 A23L1/227
 A23L1/231
 A23L1/2375


My question is:
  Is it possible to get the frequency of terms over the whole result of the 
initial user's request?


Thanks a lot,
Bruno

On 23/10/2013 18:12, Timothy Potter wrote:

Yes, absolutely you resend the q= each time, optionally with any facets
selected by the user using fq=


On Wed, Oct 23, 2013 at 10:00 AM, Bruno Mannina  wrote:


Hello Tim,

Yes solr's facet could be a solution, but I need to re-send the q= each
time.
I'm asking me just if an another solution exists.

Facet seems to be the good solution.

Bruno



On 23/10/2013 17:03, Timothy Potter wrote:

  Hi Bruno,

Have you looked into Solr's facet support? If I'm reading your post
correctly, this sounds like the classic case for facets. Each time the
user
selects a facet, you add a filter query (fq clause) to the original query.
http://wiki.apache.org/solr/SolrFacetingOverview

Tim


On Wed, Oct 23, 2013 at 8:16 AM, Bruno Mannina  wrote:

  Dear Solr User,

We have to do a new web project which is : Connect our SOLR database to a
web plateform.

This Web Plateform will be used by several users at the same time.
They do requests on our SOLR and they can apply filter on the result.

i.e.:
Our SOLR contains 87M docs
An user do requests, result is around few hundreds to several thousands.
On the Web Plateform, user will see first 20 results (or more by using
Next Page button)
But he will need also to filter the whole result by additional terms.
(Terms that our plateform will propose him)

Is SOLR can create temporary index (manage by SOLR himself during a web
session) ?

My goal is to not download the whole result on local computer to provide
filter, or to re-send
the same request several times added to the new criterias.

Many thanks for your comment,

Regards,
Bruno






Re: Is Solr can create temporary sub-index ?

2013-10-23 Thread Bruno Mannina
Hmm, I think my fieldType "text_classification" is not appropriate for 
this kind of data...


I don't need stopwords, synonyms, etc...

The IC field is a field that contains codes, and codes often contain the 
char "/",

and if I use the Terms option, I get:


...
4563254
3763554
2263254
...
..

On 23/10/2013 18:51, Bruno Mannina wrote:
<fieldType name="text_classification" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>



Re: Is Solr can create temporary sub-index ?

2013-10-23 Thread Bruno Mannina

I need your help to define the right fieldType, please.

This field must be indexed and stored, and each value must be considered as 
one term.

The char / must not be considered a separator.

Could String be a good fieldType?

thanks

On 23/10/2013 18:51, Bruno Mannina wrote:


 A23L1/22066
 A23L1/227
 A23L1/231
 A23L1/2375
 




What is the right fieldType for this kind of field?

2013-10-23 Thread Bruno Mannina

Dear,

Data looks like:

A23L1/22066
 A23L1/227
 A23L1/231
 A23L1/2375

I tried:
- String
but I can't search with truncation (i.e. A23*)

- Text_General
but as my codes contain /, the data are split...

What kind of field should I choose to allow truncation and treat a code with 
/ as one term?


thanks a lot for your help,
Bruno


Re: What is the right fieldType for this kind of field?

2013-10-23 Thread Bruno Mannina

Hi Jack,

Yes, String works fine; I forgot to restart my Solr server after changing 
my schema.xml... argh, I'm so stupid, sorry!


On 23/10/2013 20:09, Jack Krupansky wrote:
Trailing wildcard should work fine for strings, but "a23*" will not 
match "A23*" due to case. You could use the keyword tokenizer plus the 
lower case filter.


-- Jack Krupansky

-----Original Message----- From: Bruno Mannina
Sent: Wednesday, October 23, 2013 1:54 PM
To: solr-user@lucene.apache.org
Subject: What is the right fieldType for this kind of field?

Dear,

Data look likes:

A23L1/22066
 A23L1/227
 A23L1/231
 A23L1/2375

I tried:
- String
but I can't search with truncation (i.e. A23*)

- Text_General
but as my codes contain /, the data are split...

What kind of field should I choose to allow truncation and treat a code with
/ as one term?

thanks a lot for your help,
Bruno
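Jack's suggestion (keyword tokenizer plus lower-case filter) corresponds to a fieldType along these lines; a sketch, where the name code_exact is an assumption and the factories are the standard Solr ones:

```xml
<fieldType name="code_exact" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole value, "/" included, as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- LowerCaseFilter lets a23* match a field value of A23L1/227 -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```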






Re: What is the right fieldType for this kind of field?

2013-10-23 Thread Bruno Mannina

On 23/10/2013 20:09, Jack Krupansky wrote:
You could use the keyword tokenizer plus the lower case filter. 

Jack,

Could you help me to write the right fieldType please?
(index and query)

Another thing: I don't know whether I should use the Keyword tokenizer, because 
codes contain the "/" char,

and a tokenizer seems to split the code, no?

Many thanks,

Bruno


Re: What is the right fieldType for this kind of field?

2013-10-23 Thread Bruno Mannina

On 23/10/2013 22:44, Bruno Mannina wrote:

On 23/10/2013 20:09, Jack Krupansky wrote:
You could use the keyword tokenizer plus the lower case filter. 

Jack,

Could you help me to write the right fieldType please?
(index and query)

Another thing, I don't know if I must use the Keyword tokenizer 
because codes contain "/" char,

and Tokenizer seems split code no ?

Many thanks,

Bruno



Maybe an answer (I haven't tested it yet):

http://pietervogelaar.nl/solr-3-5-search-case-insensitive-on-a-string-field-for-exact-match/


Re: What is the right fieldType for this kind of field?

2013-10-23 Thread Bruno Mannina

On 23/10/2013 22:49, Bruno Mannina wrote:

On 23/10/2013 22:44, Bruno Mannina wrote:

On 23/10/2013 20:09, Jack Krupansky wrote:
You could use the keyword tokenizer plus the lower case filter. 

Jack,

Could you help me to write the right fieldType please?
(index and query)

Another thing, I don't know if I must use the Keyword tokenizer 
because codes contain "/" char,

and Tokenizer seems split code no ?

Many thanks,

Bruno



may be an answer (i don't tested yet)

http://pietervogelaar.nl/solr-3-5-search-case-insensitive-on-a-string-field-for-exact-match/ 





OK, it works fine!


Terms function join with a Select function ?

2013-10-23 Thread Bruno Mannina

Dear Solr users,

I use the Terms function to see term frequency data in a field, but it's 
for the whole database.


I have 2 questions:
- Is it possible to increase the number of statistics? Currently I get only 
the 10 most frequent terms.


- Is it possible to limit these statistics to the result of a request?

PS: the second question is very important for me.

Many thanks


Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Dear All,

OK, I have an answer concerning the first question (the limit):
it's the terms.limit parameter.

But I can't find how to apply a Terms request to a query result

any idea ?

Bruno
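For the first question, a sketch of a TermsComponent request built with Python's standard library (the handler path /terms and the field name ap are assumptions based on this thread and the stock example config):

```python
from urllib.parse import urlencode

# TermsComponent parameters: enumerate raw index terms of one field.
params = urlencode({
    "terms": "true",
    "terms.fl": "ap",        # field whose terms to enumerate
    "terms.limit": "2000",   # raise the default of 10
    "terms.sort": "count",   # most frequent first
})
url = "http://localhost:8983/solr/terms?" + params
```

Note that TermsComponent walks the raw index, so it cannot be restricted to a query result; that is what faceting (suggested later in this thread) is for.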

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field but it's 
for the whole database.


I have 2 questions:
- Is it possible to increase the number of statistic ? actually I have 
the 10 first frequency term.


- Is it possible to limit this statistic to the result of a request ?

PS: the second question is very important for me.

Many thanks








Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Dear,

Hmm, I don't know how I can use it...

I tried:

my query:
ti:snowboard (3095 results)

I would like to have, at the end of my XML, the term statistics for the 
field AP (the applicant field of a patent notice),


but I don't get that...

Please help,
Bruno

/select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10

On 24/10/2013 14:04, Erik Hatcher wrote:

That would be called faceting :)

 http://wiki.apache.org/solr/SimpleFacetParameters




On Oct 24, 2013, at 5:23 AM, Bruno Mannina  wrote:


Dear All,

Ok I have an answer concerning the first question (limit)
It's the terms.limit parameters.

But I can't find how to apply a Terms request on a query result....

any idea ?

Bruno

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field but it's for the 
whole database.

I have 2 questions:
- Is it possible to increase the number of statistic ? actually I have the 10 
first frequency term.

- Is it possible to limit this statistic to the result of a request ?

PS: the second question is very important for me.

Many thanks














Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Hmm, facet performance is very bad (Solr 3.6.0).
My index is around 87 000 000 docs (4 dual-core processors, 24 GB RAM).

I thought facets would work only on the result, but it seems that's not the 
case.


My request:
http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5

Do you think my request is wrong ?

Maybe it's not possible to get statistics on a field (like the Terms 
function provides) restricted to a query...


Thx for your help,

Bruno
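
[For what it's worth: facet counts are computed over the documents matching q 
(plus any fq filters), not over the whole index. A request of this shape — 
host/port taken from this thread, facet.mincount an optional addition — would 
look like:

```shell
# Facet counts cover only documents matching q; memory use, however,
# scales with the number of unique values in the faceted field.
curl 'http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5&facet.mincount=1'
```

So the cost being hit here comes from the field's unique-value count, not from 
the size of the result set.]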





Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Just a little precision: Solr went down after running my URL :( so bad...




Normalized data during indexing ?

2013-10-25 Thread Bruno Mannina

Dear,

I would like to know if SOLR can do that:

I have a field named "Assignee" with values like:

Int Business Machines Corp
Int Business Mach Inc

I would like to have a "result field" in the schema.xml named
"Norm_Assignee" which contains
the mapping from a lexical file:

Int Business Machines Corp > IBM
Int Business Mach Inc > IBM

So, I will have:

<doc>
  <field name="Assignee">Int Business Machines Corp</field>
  <field name="Norm_Assignee">IBM</field>
</doc>

<doc>
  <field name="Assignee">Int Business Mach Inc</field>
  <field name="Norm_Assignee">IBM</field>
</doc>
and if the correspondence does not exist, then don't create the data.

I'm sure this is possible with Solr, but I couldn't find it on the Wiki,
Google, or Solr support.

Thanks for any idea,

Bruno





Re: Terms function join with a Select function ?

2013-10-25 Thread Bruno Mannina

Hi Erick,

I think it's a memory problem; I do my tests on a little computer at home 
(8 GB RAM, i3-2120 3.30 GHz, 64-bit)


and my database is very big: 87M docs, about 200 GB.

I thought Solr could compute statistics on only the query answer, so here on 
around 3000 docs (around 6000 terms);

that's not so big.

I haven't analyzed the logs yet; I will do it in a few hours when I get back home.

Thanks,
Bruno

On 25/10/2013 15:36, Erick Erickson wrote:

How many unique values are in the field? Solr has to create a counter
for each and every one of them, you may be blowing memory up. What
do the logs say?


Best,
Erick
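
[One knob that may be worth trying on a high-cardinality field in this Solr 
version — hedged, behaviour should be measured on your own data — is 
facet.method:

```shell
# facet.method=enum enumerates the field's terms and intersects bitsets
# via the filterCache, instead of the default fc method, which allocates
# a FieldCache entry plus a per-unique-value counter; facet.enum.cache.minDf
# keeps very rare terms out of the filterCache.
curl 'http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5&facet.method=enum&facet.enum.cache.minDf=10'
```
]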





Re: Normalized data during indexing ?

2013-10-25 Thread Bruno Mannina

Hi Michael,

thanks, it sounds like what I'm looking for.

I need to investigate.

Thanks a lot !

On 25/10/2013 14:46, michael.boom wrote:

Maybe this can help you:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Normalized-data-during-indexing-tp4097750p4097752.html
Sent from the Solr - User mailing list archive at Nabble.com.
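
[For the record, a minimal sketch of the synonym approach Michael links to — 
the field-type name and the assignee_synonyms.txt file name are illustrative. 
One caveat: a synonym filter normalizes the *indexed* terms only, so a search 
on Norm_Assignee:IBM would match, but the stored value returned in results 
stays as originally sent; producing a separate stored "IBM" value would need 
an update processor or client-side mapping.

```xml
<!-- schema.xml sketch: copy Assignee into a search-only field whose
     analyzer collapses known variants to one canonical token. -->
<fieldType name="assignee_norm" class="solr.TextField">
  <analyzer>
    <!-- keep the whole company name as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- parse the synonyms file with the same keyword tokenizer so that
         multi-word entries match the single keyword token -->
    <filter class="solr.SynonymFilterFactory" synonyms="assignee_synonyms.txt"
            ignoreCase="true" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="Norm_Assignee" type="assignee_norm" indexed="true" stored="false"/>
<copyField source="Assignee" dest="Norm_Assignee"/>

<!-- assignee_synonyms.txt (one mapping per line):
Int Business Machines Corp => IBM
Int Business Mach Inc => IBM
-->
```

Names without a mapping simply pass through unchanged.]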








How to request not directly my SOLR server ?

2013-11-26 Thread Bruno Mannina

Dear All,

I showed my Solr server to a friend and his first question was:

"You can query your Solr database directly from your internet
browser?! Isn't that a security problem?
Anyone who has your request link can use your database directly?"

So I ask the question here. I protect my admin panel, but is it possible
to protect against direct requests?

Using Google, lots of results concern admin panel security, but I can't
find information about this.

Thanks for your comment,

Bruno




Re: How to request not directly my SOLR server ?

2013-11-26 Thread Bruno Mannina

On 26/11/2013 18:52, Shawn Heisey wrote:

On 11/26/2013 8:37 AM, Bruno Mannina wrote:

I show my SOLR server to a friend and its first question was:

"You can request directly your solr database from your internet 
explorer?! is it not a security problem?

each person which has your request link can use your database directly?"

So I ask the question here. I protect my admin panel but is it 
possible to protect a direct request ?


Don't make your Solr server directly accessible from the Internet.  
Only make it accessible from the machines that serve your website and 
whoever needs to administer it.


Solr has no security features.  You can use the security features in 
whatever container is running Solr, but that is outside the scope of 
this mailing list.


Thanks,
Shawn




Thanks a lot for this information,

Bruno
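
[A hedged sketch of Shawn's advice, assuming Linux with iptables, Solr's 
container listening on port 8983, and a web server at 10.0.0.5 — all of 
these are illustrative values, not from this thread:

```shell
# Allow only the web-application host to reach Solr; drop everyone else.
iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.5 -j ACCEPT
iptables -A INPUT -p tcp --dport 8983 -j DROP
```

The same effect can be achieved by binding the servlet container to a 
private interface only.]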




Indexed a new big database while the old is running?

2014-02-18 Thread Bruno Mannina

Dear Solr Users,

We currently have a Solr DB with around 88 000 000 docs.
All works fine :)

We receive each year a new backfile with the same content (but improved).

Indexing these docs takes several days on Solr,
so is it possible to create a new collection (and restart Solr) and
index these new 88 000 000 docs without stopping the current collection?

We have around 1 million connections per month.

Do you think this new indexing may cause problems for the running Solr?
Note: the new database will not be used until the current collection is 
retired.


Thx for your comment,
Bruno



Re: Indexed a new big database while the old is running?

2014-02-19 Thread Bruno Mannina

Hi Shawn,

Thanks for your answer.

Currently we don't have performance problems because we only do select requests.
We have 4 CPUs / 8 cores and 24 GB RAM.

I know how to create an alias; my question was just about performance, 
and you are right:
impossible to answer this question without more information about my 
system, sorry.


I will do a real test and check whether performance drops; if it does, I will 
stop the new indexing.


If you have more information concerning indexing performance with my 
server config, don't hesitate to write me. :)

Have a nice day,

Regards,
Bruno


On 18/02/2014 16:30, Shawn Heisey wrote:

On 2/18/2014 5:28 AM, Bruno Mannina wrote:

We have actually a SOLR db with around 88 000 000 docs.
All work fine :)

We receive each year a new backfile with the same content (but improved).

Index these docs takes several days on SOLR,
So is it possible to create a new collection (restart SOLR) and
Index these new 88 000 000 docs without stopping the current collection ?

We have around 1 million connections by month.

Do you think that this new indexation may cause problem to SOLR using?
Note: new database will not be used until the current collection will be
stopped.

You can instantly switch between collections by using the alias feature.
  To do this, you would have collections named something like test201302
and test201402, then you would create an alias named 'test' that points
to one of these collections.  Your code can use 'test' as the collection
name.

Without a lot more information, it's impossible to say whether building
a new collection will cause performance problems for the existing
collection.

It does seem like a problem that rebuilding the index takes several
days.  You might already be having performance problems.  It's also
possible that there's an aspect to this that I am not seeing, and that
several days is perfectly normal for YOUR index.

Not enough RAM is the most common reason for performance issues on a
large index:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
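
[The instant switch Shawn describes can be sketched with the Collections API 
CREATEALIAS command — this assumes SolrCloud; the collection names come from 
his example, and host/port are illustrative:

```shell
# Point the 'test' alias at the freshly built collection; clients keep
# querying /solr/test and switch instantly, with no downtime.
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test&collections=test201402'
```
]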







Help with SolrCloud exceptions while recovering

2014-11-08 Thread Bruno Osiek
Hi,

I am a newbie SolrCloud enthusiast. My goal is to implement an
infrastructure to enable text analysis (clustering, classification,
information extraction, sentiment analysis, etc).

My development environment consists of one machine, quad-core processor,
16GB RAM and 1TB HD.

Have started implementing Apache Flume, with Twitter as source and SolrCloud
(within JBoss AS 7) as sink, using ZooKeeper (5 servers) to upload the
configuration and manage the cluster.

The pseudo-distributed cluster consists of one collection with three shards,
each with three replicas.

Everything runs smoothly for a while. After 50,000 tweets are committed
(actually CloudSolrServer commits every batch of 500 documents), SolrCloud
randomly starts logging exceptions: Lucene file not found, IndexWriter cannot
be opened, replication unsuccessful, and the like. Recovery starts, with no
success, until the replica goes down.

Have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with
same results.

I have looked everywhere for help before writing this email. My guess right
now is that the problem lies with the SolrCloud-ZooKeeper connection,
although I haven't seen any such exception.

Any reference or help will be welcomed.

Cheers,
B.


Re: Help with SolrCloud exceptions while recovering

2014-11-09 Thread Bruno Osiek
Hi Erick,

Thank you very much for your reply.
I disabled client commits while setting commits in solrconfig.xml as follows:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>

The picture changed for the better. No more index corruption or endless
replication trials; up till now, 16 hours since start-up and more than
142k tweets downloaded, shards and replicas are "active".

One problem remains though. While auto-committing, Solr logs the following
stack trace:

00:00:40,383 ERROR [org.apache.solr.update.CommitTracker]
(commitScheduler-25-thread-1) auto commit
error...:org.apache.solr.common.SolrException: *Error opening new searcher*
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1550)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1662)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:603)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
*Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
_1.nvm*
at
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
at
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
at java.util.TimSort.sort(TimSort.java:203)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at
org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
at
org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2017)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1986)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:407)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:287)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1461)
... 10 more
*Caused by: java.io.FileNotFoundException: _1.nvm*
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:260)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
at
org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:141)
at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:513)
at
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:242)
... 24 more

This file "_1.nvm" once existed. It was deleted during one auto commit, but
remains somewhere in a queue for deletion. I believe the consequence is
that in the SolrCloud Admin UI -> Core Admin -> Stats, the "Current" status is
off for all shards' replica number 3. If I understand correctly, this means
that changes to the index are not becoming visible.

Once again I tried to find possible reasons for that situation, but none of
the threads found seems to reflect my case.

My lock type is set to <lockType>${solr.lock.type:single}</lockType>. This
is due to a lock-wait timeout error with both "native" and "simple" when
trying to create the collection using the commands API. There is a thread
discussing this issue:

http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-td4098731.html

The only thing is that "single" should only be used if "there is no
possibility of another process trying to modify the index", and I
cannot guarantee that. Could that be the cause of the file-not-found
exception?

Thanks once again for your help.

Regards,
Bruno.



2014-11-08 18:36 GMT-02:00 Erick Erickson :

> First. for tweets committing every 500 docs is much too frequent.
> Especially from the client and super-especially if you have multiple
> clients running. I'd recommend you just configure solrconfig this way
> as a place to start and do NOT commit from any clients.
> 1> a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
> 2> a soft commit every minute
>
> This latter governs how long it'll be between when a doc is indexed and
> when
> can be searched.
>
> Here'

Re: Help with SolrCloud exceptions while recovering

2014-11-09 Thread Bruno Osiek
Erick,

Once again thank you very much for your attention.

Now my pseudo-distributed SolrCloud is configured with no inconsistency. An
additional problem was starting JBoss with "solr.data.dir" set to a path
not expected by Solr (actually it was not even underneath the solr.home
directory).

This thread (
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3ccao8xr5zv8o-s6zn7ypaxpzpourqjknbsm59mbe6h3dpfykg...@mail.gmail.com%3E)
explains the inconsistency.

I found no need to change the Solr data directory. After commenting out this
property in JBoss' standalone.xml and setting
<lockType>${solr.lock.type:native}</lockType>, everything started to work
properly.

Regards,
Bruno



2014-11-09 14:35 GMT-02:00 Erick Erickson :

> OK, we're _definitely_ in the speculative realm here, so don't think
> I know more than I do ;)...
>
> The next thing I'd try is to go back to "native" as the lock type on the
> theory that the lock type wasn't your problem, it was the too-frequent
> commits.
>
> bq: This file "_1.nvm" once existed. Was deleted during one auto commit ,
> but
> remains somewhere in a queue for deletion
>
> Assuming Unix, this is entirely expected. Searchers have all the files
> open. Commits
> do background merges, which may delete segments. So the current searcher
> may
> have the file open even though it's been "merged away". When the searcher
> closes, the file will actually truly disappear.
>
> It's more complicated on Windows but eventually that's what happens
>
> Anyway, keep us posted. If this continues to occur, please open a new
> thread,
> that might catch the eye of people who are deep into Lucene file locking...
>
> Best,
> Erick
>

Request two databases at the same time ?

2015-01-09 Thread Bruno Mannina

Dear All,

I use Apache-SOLR3.6, on Ubuntu (newbie user).

I have a big database named BigDB1 with 90M documents,
each document contains several fields (docid, title, author, date, etc...)

I received today, from another source, abstracts for some documents (this
source also contains the same docid field).
I don't want to modify my BigDB1 to update documents with abstracts,
because BigDB1 is updated twice a week.

Do you think it's possible to create a new database named AbsDB1 and
query both databases at the same time?
 If I do, for example:
title:airplane AND abstract:plastic

I would like to obtain documents from BigDB1 and AbsDB1.

Many thanks for your help, information and anything else that can help me.

Regards,
Bruno




Re: Request two databases at the same time ?

2015-01-09 Thread Bruno Mannina

Dear Erick,

thank you for your answer.

My answers are below.

On 09/01/2015 20:43, Erick Erickson wrote:

bq: I don't want to modify my BigDB1 to update documents with abstract
because BigDB1 is always updated twice by week.

Why not? Solr/Lucene handle updating docs, if a doc in the index has
the same , the old doc is deleted and the new one takes its
place. So why not just put the new abstracts into BigDB1? If you
re-index the docs later (your twice/week comment), then they'll be
overwritten. This will be much simpler than trying to maintain two.
I understand this process; I use it for other collections, and twice a 
week for BigDB1.
But, e.g., Doc1 is updated with an abstract on Monday. On Tuesday I must update 
it with new data, and then the abstract will be lost.
I can't check/fetch the abstract before re-inserting it in the new doc, because 
I receive several thousand docs every week (new and amended);

I think it would take a long time to do that.


But if you cannot update BigDB1 just fire off two queries and combine
them. Or specify the shards parameter on the URL pointing to both
collections. Do note, though, that the relevance calculations may not
be absolutely comparable, so mixing the results may show some
surprises...

Shards... I will take a look at this; I don't know this param.
Concerning relevance, I don't really use it, so it won't be a problem I 
think.
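
[For reference, Erick's shards suggestion can be sketched like this — Solr 3.x 
distributed search, with core names from this thread and host/port illustrative:

```shell
# Distributed request fanned out to both cores via the shards parameter.
# Caveat: each document matches within a single shard, so a clause on a
# field that exists only in the other core will not match -- shards merges
# result lists, it does not join documents across cores.
curl 'http://localhost:8983/solr/BigDB1/select?q=title:airplane&shards=localhost:8983/solr/BigDB1,localhost:8983/solr/AbsDB1'
```

Both cores should share the same uniqueKey field for the merged results to 
deduplicate correctly.]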



Sincerely,


Best,
Erick



