AW: Spellchecking and suggesting part numbers
Thanks James, this did help a lot. Is it possible to make DirectSolrSpellChecker try to return suggestions with the maximum number of matching leading characters? Alexander -Original Message- From: Dyer, James [mailto:james.d...@ingramcontent.com] Sent: Wednesday, September 24, 2014 4:42 PM To: solr-user@lucene.apache.org Subject: RE: Spellchecking and suggesting part numbers Alexander, You could use a higher value for spellcheck.count, maybe 20 or so, then in your application pick out the suggestions that make changes on the right side. Another option is to use DirectSolrSpellChecker (usually a better choice anyhow) and set the "minPrefix" field. This will require up to n characters on the left side to match before it will make suggestions. Taking a quick look at the code, it seems to me it won't try to correct anything in this prefix region either. So perhaps you can set this to 2-4 (default=1). See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29 . James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com] Sent: Wednesday, September 24, 2014 9:06 AM To: solr-user@lucene.apache.org Subject: Spellchecking and suggesting part numbers Hello Solr Users, we are trying to get suggestions for part numbers using the spellchecker. Problem scenario: ABCD1234 // This is the search term ABCE1234 // This is what we get from the spellchecker ABCD1244 // This is what we would like to get from the spellchecker Characters towards the left of our part numbers are more relevant. The setup is: solr.IndexBasedSpellChecker ./spellchecker did_you_mean_part did_you_mean_part on spellcheck_part Can we tweak the setup so that we get more relevant part numbers? Thanks, Alexander
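For reference, a minimal sketch of what James's minPrefix suggestion could look like in solrconfig.xml. The spellchecker name (did_you_mean_part) and field (spellcheck_part) are guessed from Alexander's partially stripped setup above, so treat them as assumptions:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">did_you_mean_part</str>
    <str name="field">spellcheck_part</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <!-- suggestions must share the first 3 characters of the query term exactly,
         and no corrections are attempted inside this prefix -->
    <int name="minPrefix">3</int>
  </lst>
</searchComponent>

With minPrefix set to 3, a query term like ABCD1234 would only draw suggestions that agree with it on the leading "ABC".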
RE: Ignoring Duplicates in Multivalue Field
Hi Ahmet, When I added the RunUpdateProcessorFactory, Solr didn't remove any duplicates. Any other idea? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Monday, November 03, 2014 1:35 AM To: solr-user@lucene.apache.org Subject: Re: Ignoring Duplicates in Multivalue Field Hi Tomer, What happens when you add <processor class="solr.RunUpdateProcessorFactory" /> to your chain? Ahmet On Sunday, November 2, 2014 1:22 PM, Tomer Levi wrote: Hi, I’m trying to make my “update” request handler ignore multivalue duplications in updates. To make my use case clear, let’s assume my index already contains a document like: { id:”100”, “myMultValueField”: [“1”,”2”,”3”] } Later I would like to send an update like: { id:”100”, “myMultValueField”: {“add”:”2”} } How can I make the update request handler understand that “2” already exists and ignore it? I tried to add the update chain below but it didn’t work for me. myMultValueField And add it to my requestHandler: uniq-fields Tomer Levi Software Engineer Big Data Group Product & Technology Unit (T) +972 (9) 775-2693 tomer.l...@nice.com www.nice.com
Re: SOLRJ - query with ChildDocTransformerFactory crash because of the javabin parser
Hello, And to answer my own question... :D There was a little, let's say, mistake in my query. Instead of fl=[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] it should be fl=id,[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] Awkward, because if you put the query and the filter in the URL or in the Solr query tool, both methods work very well. But when you want to make the query from Java, with SolrJ, the library appends that parser section, and it crashes if it cannot find parent fields in the results. Instead of id, you can put whatever field name you want from the parent. Regards, Andrei -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-query-with-ChildDocTransformerFactory-crash-because-of-the-javabin-parser-tp4167183p4167218.html Sent from the Solr - User mailing list archive at Nabble.com.
dynamically change default update chain
Hello solr fellows, I'm working on a project that involves using two update chains. One default chain is used most of the time and another, custom one is used sporadically. The default update chain is called automatically without any action needed (well, that's why it is the default). The custom pipeline can be switched on using the update.chain HTTP parameter, like so: [code] UpdateRequest updateRequest = new UpdateRequest(); updateRequest.setCommitWithin(1); updateRequest.setParam("update.chain", "customupdatechain"); updateRequest.add(solrDoc); updateRequest.process(solrServer); [/code] Now I have a new requirement: be able to install the custom chain as the default update chain, such that any client that is sending data in will get it processed via the custom chain and not the default chain. And this should happen seamlessly to the client, i.e. no parameter change needed. Is this possible with the current state of the Solr core / collection API or some other method? -- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
CFP: FOSDEM 2015 - Open Source Search Dev Room
***Please forward this CFP to anyone who may be interested in participating.*** Hi, Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools. We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search. Talks should be 30-60 minutes in length, including time for Q&A. You can submit your talks to us here: https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)! Should you have any questions, you can contact the Dev Room organizers: opensourcesearch-devr...@lists.fosdem.org Cheers, LH on behalf of the Open Source Search Dev Room Program Committee* * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, Uwe Schindler - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
Re: dynamically change default update chain
Just to get the obvious sledgehammer solution out of the way - upload a new, edited solrconfig.xml with the default changed, and reload the core. -Mike On 11/3/14 6:28 AM, Dmitry Kan wrote: Hello solr fellows, I'm working on a project that involves using two update chains. One default chain is used most of the time and another one custom is used sporadically. The default update chain is called automatically without action needed (well, that's why it is default). The custom pipeline can be switched on using update.chain http parameter, like so: [code] UpdateRequest updateRequest = new UpdateRequest(); updateRequest.setCommitWithin(1); updateRequest.setParam("update.chain", "customupdatechain"); updateRequest.add(solrDoc); updateRequest.process(solrServer); [/code] Now I have a new requirement: be able to install the custom chain as the default update chain such that any client that is sending data in will get it processed via the custom chain and not the default chain. And this should happen seamlessly to the client, i.e. no parameter change needed. Is this possible with the current state of the Solr core / collection api or some other method?
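In practical terms, Mike's sledgehammer looks roughly like this in solrconfig.xml: mark the custom chain as the default so clients no longer have to pass update.chain. The chain name is taken from Dmitry's snippet; the first processor class is only a stand-in for whatever the custom chain actually runs:

<updateRequestProcessorChain name="customupdatechain" default="true">
  <!-- placeholder for the project's custom processor -->
  <processor class="com.example.CustomUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

After uploading the edited config and reloading the core, requests without an update.chain parameter go through this chain.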
RE: FOSDEM 2015 - Open Source Search Dev Room
Hi, forgot to mention: FOSDEM 2015 takes place in Brussels on January 31th and February 1st, 2015. See also: https://fosdem.org/2015/ I hope to see you there! Uwe > -Original Message- > From: Uwe Schindler [mailto:uschind...@apache.org] > Sent: Monday, November 03, 2014 1:29 PM > To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr- > u...@lucene.apache.org; gene...@lucene.apache.org > Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room > > ***Please forward this CFP to anyone who may be interested in > participating.*** > > Hi, > > Search has evolved to be much more than simply full-text search. We now > rely on “search engines” for a wide variety of functionality: > search as navigation, search as analytics and backend for data visualization > and sometimes, dare we say it, as a data store. The purpose of this dev room > is to explore the new world of open source search engines: their enhanced > functionality, new use cases, feature and architectural deep dives, and the > position of search in relation to the wider set of software tools. > > We welcome proposals from folks working with or on open source search > engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) > or technologies that heavily depend upon search (e.g. > NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in > presentations on search algorithms, machine learning, real-world > implementation/deployment stories and explorations of the future of > search. > > Talks should be 30-60 minutes in length, including time for Q&A. > > You can submit your talks to us here: > https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3 > 8G0OxSfp84A/viewform > > Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We > cannot guarantee we will have the opportunity to review submissions made > after the deadline, so please submit early (and often)! > > Should you have any questions, you can contact the Dev Room > organizers: opensourcesearch-devr...@lists.fosdem.org > > Cheers, > LH on behalf of the Open Source Search Dev Room Program Committee* > > * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten > Curdt, Uwe Schindler > > - > Uwe Schindler > uschind...@apache.org > Apache Lucene PMC Member / Committer > Bremen, Germany > http://lucene.apache.org/ > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org
order of updates
Hi, can anybody confirm this for me? If I add multiple documents with the same id but differing in other fields and then issue a commit (no commits before this), the last added document gets indexed, right? P.S. Using Solr 4 and default settings for optimistic locking. Matteo
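A small illustration of the scenario in the XML update format, with a hypothetical id and field: two documents with the same uniqueKey are added and a single commit is issued, and only the second document is visible afterwards.

<add>
  <doc>
    <field name="id">42</field>
    <field name="title">first version</field>
  </doc>
  <doc>
    <!-- same uniqueKey: this document overwrites the one above -->
    <field name="id">42</field>
    <field name="title">second version</field>
  </doc>
</add>
<!-- then POST <commit/> to /update -->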
Re: dynamically change default update chain
Thanks, Mike, we have discussed something similar with steffkes on IRC today, where I said: "some programmatic convenience would be great of course. But I could in principle imagine having two versions of solrconfig.xml and swapping them followed by a core reload. It just sounds a bit scary to me." But now, after pondering a bit more over it, I start to get inclined towards "fiddling with sending dummy documents with certain fields that will tell the update component to either call another update component or proceed normally" another = custom updater normally = default updater More ideas are of course welcome! Dmitry On Mon, Nov 3, 2014 at 2:41 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: > Just to get the obvious sledgehammer solution out of the way - upload a > new, edited solrconfig.xml with the default changed, and reload the core. > > -Mike > > > > On 11/3/14 6:28 AM, Dmitry Kan wrote: > >> Hello solr fellows, >> >> I'm working on a project that involves using two update chains. One >> default >> chain is used most of the time and another one custom is used >> sporadically. >> >> The default update chain is called automatically without action needed >> (well, that's why it is default). >> >> The custom pipeline can be switched on using update.chain http parameter, >> like so: >> >> [code] >> UpdateRequest updateRequest = new UpdateRequest(); >> updateRequest.setCommitWithin(1); >> updateRequest.setParam("update.chain", "customupdatechain"); >> updateRequest.add(solrDoc); >> updateRequest.process(solrServer); >> [/code] >> >> Now I have a new requirement: be able to install the custom chain as the >> default update chain such that any client that is sending data in will get >> it processed via the custom chain and not the default chain. And this >> should happen seamlessly to the client, i.e. no parameter change needed. >> >> Is this possible with the current state of the Solr core / collection api >> or some other method? >> >> > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: SolrCloud use of "min_rf" through SolrJ
In case anyone else runs into this, I've managed to make it work. I didn't notice in the ticket discussion that the specific feature is enabled when min_rf >=2, I was setting min_rf=1. It goes without saying that you should also have at least 2 replicas in your SolrCloud configuration. The actual code I've used to make it return "rf" is UpdateRequest req = new UpdateRequest(); req.setParam(UpdateRequest.MIN_REPFACT, "2"); req.add(doc); NamedList response = solrServer.request(req); -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-use-of-min-rf-through-SolrJ-tp4164966p4167250.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Ignoring Duplicates in Multivalue Field
The update processors are only processing the values in the "source" data, not the data that has already been indexed and stored. We probably need to file a Jira to add an "insert" field value option that merges in the new field value, skipping it if it already exists or appending it to the end of the existing list of field values for a multivalued field. You could try... a combination of both "remove" and "add", assuming that Solr applies them in the order specified, to remove any existing value and then add it to the end. See: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents -- Jack Krupansky -Original Message- From: Tomer Levi Sent: Monday, November 3, 2014 4:19 AM To: solr-user@lucene.apache.org ; Ahmet Arslan Subject: RE: Ignoring Duplicates in Multivalue Field Hi Ahmet, When I add the RunUpdateProcessorFactory Solr didn't remove any duplications. Any other idea? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Monday, November 03, 2014 1:35 AM To: solr-user@lucene.apache.org Subject: Re: Ignoring Duplicates in Multivalue Field Hi Tomer, What happens when you add class="solr.RunUpdateProcessorFactory" /> to your chain? Ahmet On Sunday, November 2, 2014 1:22 PM, Tomer Levi wrote: Hi, I’m trying to make my “update” request handler ignore multivalue duplications in updates. To make my use case clear, let’s assume my index already contains a document like: { id:”100”, “myMultValueField”: [“1”,”2”,”3”] } Later I would like to send an update like: { id:”100”,” myMultValueField” {“add”:”2”} } How can I make the update request handler understand that “2” already exist and ignore it? I tried to add update chain below but it didn’t work for me. myMultValueField And add it to my requestHandler: uniq-fields Tomer Levi Software Engineer Big Data Group Product & Technology Unit (T) +972 (9) 775-2693 tomer.l...@nice.com www.nice.com
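A sketch of Jack's remove-then-add idea in the XML atomic update format, using Tomer's id and field name. Jack's caveat carries over: whether Solr applies the two operations in the order written is exactly the open question, so consider this untested.

<add>
  <doc>
    <field name="id">100</field>
    <!-- drop the value if it is already present, then append it,
         so it should end up in the field exactly once -->
    <field name="myMultValueField" update="remove">2</field>
    <field name="myMultValueField" update="add">2</field>
  </doc>
</add>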
Re: dynamically change default update chain
An update: Another idea comes from Erick Hatcher; sharing it for the benefit of anyone who's interested in the topic: maybe you can make a custom request handler that toggles which is the default chain? On Mon, Nov 3, 2014 at 4:08 PM, Dmitry Kan wrote: > Thanks, Mike, > > we have discussed something similar with steffkes on IRC today, where I > said: "some programmatic convenience would be great of course. But I > could in principle imagine having two versions of solrconfig.xml and > swapping them followed by a core reload. It just sounds a bit scary to me. > " > > But now, after pondering a bit more over it, I start to get inclined > towards "fiddling with sending dummy documents with certain fields that > will tell the update component to either call another update component or > proceed normally" > > another = custom updater > normally = default updater > > More ideas are of course welcome! > > Dmitry > > On Mon, Nov 3, 2014 at 2:41 PM, Michael Sokolov < > msoko...@safaribooksonline.com> wrote: > >> Just to get the obvious sledgehammer solution out of the way - upload a >> new, edited solrconfig.xml with the default changed, and reload the core. >> >> -Mike >> >> >> >> On 11/3/14 6:28 AM, Dmitry Kan wrote: >> >>> Hello solr fellows, >>> >>> I'm working on a project that involves using two update chains. One >>> default >>> chain is used most of the time and another one custom is used >>> sporadically. >>> >>> The default update chain is called automatically without action needed >>> (well, that's why it is default). >>> >>> The custom pipeline can be switched on using update.chain http parameter, >>> like so: >>> >>> [code] >>> UpdateRequest updateRequest = new UpdateRequest(); >>> updateRequest.setCommitWithin(1); >>> updateRequest.setParam("update.chain", "customupdatechain"); >>> updateRequest.add(solrDoc); >>> updateRequest.process(solrServer); >>> [/code] >>> >>> Now I have a new requirement: be able to install the custom chain as the >>> default update chain such that any client that is sending data in will >>> get >>> it processed via the custom chain and not the default chain. And this >>> should happen seamlessly to the client, i.e. no parameter change needed. >>> >>> Is this possible with the current state of the Solr core / collection api >>> or some other method? >>> >>> >> > > > -- > Dmitry Kan > Luke Toolbox: http://github.com/DmitryKey/luke > Blog: http://dmitrykan.blogspot.com > Twitter: http://twitter.com/dmitrykan > SemanticAnalyzer: www.semanticanalyzer.info > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: order of updates
On Mon, Nov 3, 2014 at 8:53 AM, Matteo Grolla wrote: > HI, > can anybody give me a confirm? > If I add multiple document with the same id but differing on other fields and > then issue a commit (no commits before this) the last added document gets > indexed, right? Correct. > using solr 4 and default settings for optimistic locking. If you haven't seen it, I did an example of that a while back: http://heliosearch.org/solr/optimistic-concurrency/ -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
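For completeness, a minimal sketch of the optimistic-concurrency mechanism Yonik links to, in the XML update format. The id, field and version value are made up; the _version_ would normally come from a previous read of the document.

<add>
  <doc>
    <field name="id">42</field>
    <!-- the update succeeds only if the document's current _version_ still matches;
         otherwise Solr answers with a 409 version conflict -->
    <field name="_version_">1484166798013104128</field>
    <field name="title">updated title</field>
  </doc>
</add>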
Re: Ignoring Duplicates in Multivalue Field
From memory, if you use UniqFieldsUpdateProcessor after DistributedUpdateProcessor, then you will be filtering on the set ["1", "2", "3", "2"]. [chain example XML stripped by the archive; the field listed was myMultValueField; see the sketch below] On 4 November 2014 01:37, Jack Krupansky wrote: > The update processors are only processing the values in the "source" data, > not the data that has already been indexed and stored. > > We probably need to file a Jira to add an "insert" field value option that > merges in the new field value, skipping it if it already exists or > appending it to the end of the existing list of field values for a > multivalued field. > > You could try... a combination of both "remove" and "add", assuming that > Solr applies them in the order specified, to remove any existing value and > then add it to the end. > > See: > https://cwiki.apache.org/confluence/display/solr/ > Updating+Parts+of+Documents > > -- Jack Krupansky > > -Original Message- From: Tomer Levi > Sent: Monday, November 3, 2014 4:19 AM > To: solr-user@lucene.apache.org ; Ahmet Arslan > Subject: RE: Ignoring Duplicates in Multivalue Field > > > Hi Ahmet, > When I add the RunUpdateProcessorFactory Solr didn't remove any > duplications. > Any other idea? > > > -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] > Sent: Monday, November 03, 2014 1:35 AM > To: solr-user@lucene.apache.org > Subject: Re: Ignoring Duplicates in Multivalue Field > > Hi Tomer, > > What happens when you add <processor class="solr.RunUpdateProcessorFactory" /> to your chain? > > Ahmet > > > > On Sunday, November 2, 2014 1:22 PM, Tomer Levi > wrote: > > > > Hi, > I’m trying to make my “update” request handler ignore multivalue > duplications in updates. > To make my use case clear, let’s assume my index already contains a > document like: > { > id:”100”, > “myMultValueField”: [“1”,”2”,”3”] > } > > Later I would like to send an update like: > { > id:”100”,” > myMultValueField” {“add”:”2”} > } > > How can I make the update request handler understand that “2” already > exist and ignore it? > I tried to add update chain below but it didn’t work for me. > > > > myMultValueField > > > > > And add it to my requestHandler: > > > uniq-fields > > > > Tomer Levi > Software Engineer > Big Data Group > Product & Technology Unit > (T) +972 (9) 775-2693 > > tomer.l...@nice.com > www.nice.com >
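A hedged reconstruction of the chain being described, placed in solrconfig.xml. The chain name (uniq-fields) and field name (myMultValueField) come from Tomer's stripped config; the fieldName selector syntax assumes Solr 4.5 or later.

<updateRequestProcessorChain name="uniq-fields">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- placed after the distributed processor so that, per the note above, it sees the
       merged value set (existing values plus the atomically added one) -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">myMultValueField</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>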
Re: SOLRJ - query with ChildDocTransformerFactory crash because of the javabin parser
bq: Was a little, let's say mistake, in my query Been there, done that ;) Thanks for closing this out. Best, Erick On Mon, Nov 3, 2014 at 3:25 AM, andreic9203 wrote: > Hello, > > And to answer to my question...:D > > Was a little, let's say mistake, in my query. > Instead of > fl=[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] > should be > fl=id,[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] > > Awkward, because if you put the query and the filter in URL or in the solr > queries tool, both methods works very well. But when you want to make the > query from java, with Solrj, the library appends that parser section, and it > crashes if it cannot find parent fields in the results. > > Instead of the id, you could put what field name do you want from the > parent. > > Regards, > Andrei > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SOLRJ-query-with-ChildDocTransformerFactory-crash-because-of-the-javabin-parser-tp4167183p4167218.html > Sent from the Solr - User mailing list archive at Nabble.com.
Solr slow start up (tlog is small)
Hi, I am using Solr 4.9 with Tomcat and it works fine except that the deployment of solr.war takes too long. While deploying Solr, all webapps on Tomcat stop responding, which is unacceptable. Most articles I found say that it might result from a big transaction log because of uncommitted documents, but this is not my case. At first, the Solr data was 280G and the start-up time was 30 minutes. Then I set a field to stored="false" and re-indexed the whole data set. The data size became 185G and the start-up time dropped to 17 minutes, but it is still too long. Here are some numbers I measured: 1) Solr home: 280G tlog: 500K 30 min to start up While starting up, disk read is constantly about 50MB/s (according to dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data while starting up, which is 30% of the index data size. 2) Solr home: 185G tlog: 5M 17 minutes to start up While starting up, disk read is constantly about 5MB/s (according to dstat). So it seems that Solr reads 17m * 60s * 5MB/s = 5GB of data while starting up, which is about 3% of the index data size. P.S. I committed after every 1000 documents added and ran an optimize after all documents were added. Any ideas or suggestions would be appreciated. Thanks, Po-Yu
Solr slow startup
Dear All, Sorry for the possibly newbie question as I have only recently started experimenting with Solr and SolrCloud. I am trying to import an index originally created with Lucene 2.x into Solr 4.10. What I did was: 1. upgraded the index to version 3.x with IndexUpgrader 2. upgraded the index to version 4.x with IndexUpgrader 3. created a schema for Solr and used the default solrconfig (with some path changes) 4. successfully started Solr The sizes I am speaking about are in the tens of gigabytes and the startup times are 5~10 minutes. I have read here: https://wiki.apache.org/solr/SolrPerformanceProblems that it possibly has something to do with the updateHandler, and I enabled autoCommit as suggested, however with no improvement. Such a long startup time feels odd when Lucene itself seems to load the same indexes in no time. I would very much appreciate any help with this issue. Best, Michal Krajnansky
Re: Solr slow start up (tlog is small)
Can you tell from the logs what Solr is doing during that time? Do you have any warming queries configured? Also see this: https://issues.apache.org/jira/browse/SOLR-6679 (comment out suggester related stuff if you aren't using it) -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data On Mon, Nov 3, 2014 at 11:03 AM, Po-Yu Chuang wrote: > Hi, > > I am using Solr 4.9 with Tomcat and it works fine except that the > deployment of solr.war is too long. While deploying Solr, all webapps on > Tomcat stop responding which is unacceptable. Most articles I found say > that it might result from big transaction log because of uncommitted > documents, but this is not my case. > > At first, the Solr data is 280G and the start up time is 30 minutes. Then I > set a field to stored="false" and re-index whole data. The data size became > 185G and the start up time reduced to 17 minutes, but it is still too long. > > Here are some numbers I measured: > > 1) > Solr home: 280G > tlog: 500K > 30 min to start up > While starting up, disk read is constantly about 50MB/s (according to > dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data while > starting up, which is 30% of index data size. > > 2) > Solr home: 185G > tlog: 5M > 17 minutes to start up > While starting up, disk read is constantly about 5MB/s (according to > dstat). So it seems that Solr reads 17m * 60s *5MB/s = 5GB of data while > starting up, which is about 3% of index data size. > > p.s. I did commit each time 1000 documents being added and did optimization > after all documents are added. > > Any ideas or suggestions would be appreciated. > > Thanks, > Po-Yu
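The "suggester related stuff" Yonik refers to is, roughly, the following block from the stock 4.10 example solrconfig.xml (names may differ in your config). If you are not using the suggester, commenting out both the component and its handler avoids the dictionary rebuild described in SOLR-6679:

<!--
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
-->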
Re: Solr slow startup
One possible cause of a slow startup with the default configs: https://issues.apache.org/jira/browse/SOLR-6679 -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data On Mon, Nov 3, 2014 at 11:05 AM, Michal Krajňanský wrote: > Dear All, > > > Sorry for the possibly newbie question as I have only recently started > experimenting with Solr and Solrcloud. > > > I am trying to import an index originally created with Lucene 2.x so Solr > 4.10. What I did was: > > 1. upgrade index to version 3.x with IndexUpgrader > 2. upgrade index to version 4.x with IndexUpgrader > 3. created schema for Solr and used the default solrconfig (with some paths > changes) > 4. succesfully started Solr > > The sizes I am speaking about are in tens of gigabytes and the startup > times are 5~10 minutes. > > > I have read here: > https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CCMQFjAB&url=https%3A%2F%2Fwiki.apache.org%2Fsolr%2FSolrPerformanceProblems&ei=AKNXVL7ULbGR7Abp7IDYCA&usg=AFQjCNEtw2Zma8ST3JLGL3xw6nG2G_0YuA&sig2=HmM8R1VYuVtXv8lQHsHPJQ&bvm=bv.78597519,bs.1,d.dGY&cad=rja > that it has possibly something to do with the updateHandler and enabled the > autoCommit as suggested, however with no improvement. > > Such a long startup time feels odd when Lucene itself seems to load the > same indexes in no time. > > I would very much appreciate any help with this issue. > > > Best, > > > Michal Krajnansky
Re: Solr slow startup
Hey Yonik, That (getting rid of the suggester) solved the issue! You saved me a lot of time and nerves. Best, Michal 2014-11-03 17:19 GMT+01:00 Yonik Seeley : > One possible cause of a slow startup with the default configs: > https://issues.apache.org/jira/browse/SOLR-6679 > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data > > > On Mon, Nov 3, 2014 at 11:05 AM, Michal Krajňanský > wrote: > > Dear All, > > > > > > Sorry for the possibly newbie question as I have only recently started > > experimenting with Solr and Solrcloud. > > > > > > I am trying to import an index originally created with Lucene 2.x so Solr > > 4.10. What I did was: > > > > 1. upgrade index to version 3.x with IndexUpgrader > > 2. upgrade index to version 4.x with IndexUpgrader > > 3. created schema for Solr and used the default solrconfig (with some > paths > > changes) > > 4. succesfully started Solr > > > > The sizes I am speaking about are in tens of gigabytes and the startup > > times are 5~10 minutes. > > > > > > I have read here: > > > https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CCMQFjAB&url=https%3A%2F%2Fwiki.apache.org%2Fsolr%2FSolrPerformanceProblems&ei=AKNXVL7ULbGR7Abp7IDYCA&usg=AFQjCNEtw2Zma8ST3JLGL3xw6nG2G_0YuA&sig2=HmM8R1VYuVtXv8lQHsHPJQ&bvm=bv.78597519,bs.1,d.dGY&cad=rja > > that it has possibly something to do with the updateHandler and enabled > the > > autoCommit as suggested, however with no improvement. > > > > Such a long startup time feels odd when Lucene itself seems to load the > > same indexes in no time. > > > > I would very much appreciate any help with this issue. > > > > > > Best, > > > > > > Michal Krajnansky >
Re: order of updates
Thanks really a lot Yonik! Il giorno 03/nov/2014, alle ore 15:51, Yonik Seeley ha scritto: > On Mon, Nov 3, 2014 at 8:53 AM, Matteo Grolla wrote: >> HI, >>can anybody give me a confirm? >> If I add multiple document with the same id but differing on other fields >> and then issue a commit (no commits before this) the last added document >> gets indexed, right? > > Correct. > >> using solr 4 and default settings for optimistic locking. > > If you haven't seen it, I did an example of that a while back: > > http://heliosearch.org/solr/optimistic-concurrency/ > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data
Cannot use Phrase Queries in eDisMax and filtering
I am writing a search bar application with Solr, and I'd like it to have the following two features: phrase matching for user queries (results which match the user's phrase are boosted), and field faceting based on a 'tags' field. When I execute this query: q=steve jobs& fq=storeid:527bd613e4b0564cc755460a& sort=score desc& start=50& rows=2& fl=*,score& qt=/query& defType=edismax& pf=concept_name^15 note_text^5 file_text^2.5& pf3=1& pf2=1& ps=1& group=true& group.field=conceptid& group.limit=10& group.ngroups=true The phrase boosting feature operates correctly and boosts results which more closely match the phrase query "Steve Jobs". As an example, the concept with concept_name="Steve Jobs" has a score of ~3.96 in the results of this query. However, when I execute the query after the user has selected a facet field (the facet fields are brought up by a separate query) and execute the following query: q=steve jobs& fq=storeid:527bd613e4b0564cc755460a& fq=tag:Person& sort=score desc& start=0& rows=50& fl=*,score& qt=/query& defType=edismax& pf=concept_name^15 note_text^5 file_text^2.5& pf3=1& pf2=1& ps=1& group=true& group.field=conceptid& group.limit=10& group.ngroups=true the phrase boosting does not work, even though the facet filtering does. The concept with concept_name="Steve Jobs" has a score of ~0.2 in the results of this query. I'm not sure if this is a bug, but if it is not, can someone point me to the relevant documentation that will help me fix this issue? All queries were written using the SolrJ library. I also tried searching the string "Steve Jobs" and it returned the correct results (the concept with concept_name "Steve Jobs" was returned highest).
Re: Solr slow start up (tlog is small)
One other reason for a slow start-up can be large number of segments in the index. Which I'm guessing is not the case since you optimized? But anyway, what's the number of segments in both 280G and 185G indices? Dmitry On Mon, Nov 3, 2014 at 6:17 PM, Yonik Seeley wrote: > Can you tell from the logs what Solr is doing during that time? > Do you have any warming queries configured? > Also see this: https://issues.apache.org/jira/browse/SOLR-6679 > (comment out suggester related stuff if you aren't using it) > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data > > > On Mon, Nov 3, 2014 at 11:03 AM, Po-Yu Chuang > wrote: > > Hi, > > > > I am using Solr 4.9 with Tomcat and it works fine except that the > > deployment of solr.war is too long. While deploying Solr, all webapps on > > Tomcat stop responding which is unacceptable. Most articles I found say > > that it might result from big transaction log because of uncommitted > > documents, but this is not my case. > > > > At first, the Solr data is 280G and the start up time is 30 minutes. > Then I > > set a field to stored="false" and re-index whole data. The data size > became > > 185G and the start up time reduced to 17 minutes, but it is still too > long. > > > > Here are some numbers I measured: > > > > 1) > > Solr home: 280G > > tlog: 500K > > 30 min to start up > > While starting up, disk read is constantly about 50MB/s (according to > > dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data > while > > starting up, which is 30% of index data size. > > > > 2) > > Solr home: 185G > > tlog: 5M > > 17 minutes to start up > > While starting up, disk read is constantly about 5MB/s (according to > > dstat). So it seems that Solr reads 17m * 60s *5MB/s = 5GB of data while > > starting up, which is about 3% of index data size. > > > > p.s. I did commit each time 1000 documents being added and did > optimization > > after all documents are added. > > > > Any ideas or suggestions would be appreciated. > > > > Thanks, > > Po-Yu > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Data colocation hint to solr index
Hi, In my company, we serve car manuals for different car manufacturers with their various makes and models. Typically, the search is always done within the context of a car manufacturer, year, make and model. Is there a way in Solr to create indexes based on these criteria? Currently, the index contains all manufacturers, makes and models. This causes the index to go over a terabyte. Hence, if we could teach Solr to co-locate all the data for a particular manufacturer, make and model, that would be an ideal thing to do. I was wondering if this is possible? Regards, Jim
Re: Data colocation hint to solr index
If you're using SolrCloud then you can use composite IDs such as manufacturer!doc-id to co-locate documents belonging to a manufacturer together and at query time, you can add _route_=manufacturer! to the request to route it to the correct node. On Mon, Nov 3, 2014 at 11:00 PM, maninder batth wrote: > Hi, > In my company, we serve car manuals for different car manufacturers with > their various makes and models. Typically, the search is always done within > context of a car manufacturer, year, make and model. Is there a way in Solr > to create indexes based on this criteria? Currently, the index contains all > manufactueres, makes and models. This causes index to go over a terabyte. > Hence, if we could teach solr to co-locate all the data for a particular > manufactuere, make and model, that would be an ideal thing to do. > I was wondering if this is possible? > > Regards, > Jim > -- Regards, Shalin Shekhar Mangar.
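A sketch of what the composite-ID scheme looks like in practice, with made-up values ("acme" as the manufacturer shard key, "manual-12345" as the document id):

<add>
  <doc>
    <!-- the part before the '!' is hashed to pick the shard, so all "acme" docs land together -->
    <field name="id">acme!manual-12345</field>
    <field name="manufacturer">acme</field>
    <field name="model">roadster</field>
  </doc>
</add>

At query time, adding _route_=acme! (with the trailing '!') to the request, for example /select?q=brake+pads&_route_=acme!, restricts the query to the shard(s) holding that manufacturer's documents.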
Re: Solr slow start up (tlog is small)
Hi Yonik, After removing the suggest component, it takes only 7 seconds to start up now!!! Thank you so much. Po-Yu On Mon, Nov 3, 2014 at 11:17 AM, Yonik Seeley wrote: > Can you tell from the logs what Solr is doing during that time? > Do you have any warming queries configured? > Also see this: https://issues.apache.org/jira/browse/SOLR-6679 > (comment out suggester related stuff if you aren't using it) > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data > > > On Mon, Nov 3, 2014 at 11:03 AM, Po-Yu Chuang > wrote: > > Hi, > > > > I am using Solr 4.9 with Tomcat and it works fine except that the > > deployment of solr.war is too long. While deploying Solr, all webapps on > > Tomcat stop responding which is unacceptable. Most articles I found say > > that it might result from big transaction log because of uncommitted > > documents, but this is not my case. > > > > At first, the Solr data is 280G and the start up time is 30 minutes. > Then I > > set a field to stored="false" and re-index whole data. The data size > became > > 185G and the start up time reduced to 17 minutes, but it is still too > long. > > > > Here are some numbers I measured: > > > > 1) > > Solr home: 280G > > tlog: 500K > > 30 min to start up > > While starting up, disk read is constantly about 50MB/s (according to > > dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data > while > > starting up, which is 30% of index data size. > > > > 2) > > Solr home: 185G > > tlog: 5M > > 17 minutes to start up > > While starting up, disk read is constantly about 5MB/s (according to > > dstat). So it seems that Solr reads 17m * 60s *5MB/s = 5GB of data while > > starting up, which is about 3% of index data size. > > > > p.s. I did commit each time 1000 documents being added and did > optimization > > after all documents are added. > > > > Any ideas or suggestions would be appreciated. > > > > Thanks, > > Po-Yu >
Re: Solr error : sorry, no dataimport-handler defined!
Hi Alexandre, Thanks so much for your input and examples! Ok so here's what I've done so far with no luck as of yet unfortunately. Inside of solrconfig.xml I put the following: ** As you can see, I've replaced the relative paths with absolute ones. So as of now, my solr 4 server is no longer complaining about not being able to find directories and modules. So we're off to a good start! And now I can list the 'dist' directory and in my case find the jar files I'm looking for. [root@solr1:/opt/solr/collection1/conf] #ls /opt/solr/dist/ | grep dataimporthandler *solr-dataimporthandler-4.10.1.jar* *solr-dataimporthandler-extras-4.10.1.jar* So far so good. I next tried this db-data-config file in the same directory as solrconfig.xml [root@solr1:/opt/solr/collection1/conf] #cat db-data-config.xml.bak Restarted tomcat, and with this setup I wasn't getting any errors in the browser or logs and the web interface was still working. Always a good sign! So then I went down to Core Selector -> collection1 -> data import. And it was quite frustrating, cuz I was getting the same error as before! *sorry, no dataimport-handler defined!* So then I tried the exact db-data-config.xml file from your example. Knowing full well it wouldn't actually work, as I"m using a remote mysql database instead of a local hsqldb database.But at this point, my only goal was to get the data import to show up as an option. I'd tweak the db-data-config.xml file at a later point if this in fact worked! But alas, I was still getting the same result... *sorry, no dataimport-handler defined!* G.. so annoying after all that work. Anyway, I really do appreciate your kindness and help. :) I'm enclosing my solrconfig.xml and both versions of my db-data-config.xml in hopes that we can make some progress here! Thank Tim On Sun, Nov 2, 2014 at 9:50 PM, Alexandre Rafalovitch wrote: > That tutorial seems to be somewhat dodgy. You need at least one more > step of adding DIH library in solrconfig.xml: > > https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_2/solr/example/example-DIH/solr/db/conf/solrconfig.xml#L75 > (I recommend using absolute path though). > > Also, you should not need to spell the full class out. See lower down > in the same class: > > https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_2/solr/example/example-DIH/solr/db/conf/solrconfig.xml#L823 > > Finally, in the config file, I don't remember document element having > a name. Again, the working example can be found in the same directory: > > https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_2/solr/example/example-DIH/solr/db/conf/db-data-config.xml#L3 > > Solr ships with a bunch of examples. If you are using/download > standard distribution, you could start from those until you understand > how it all hangs together. > > Regards, >Alex. > > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 > > > On 2 November 2014 21:26, Tim Dunphy wrote: > > Hi Alex, > > > > > >> I thought the "" > >> and the ending span were broken email thing but they seem to be in the > >> solrconfig.xml file as well. I would start from removing those and > >> leaving just the actual definition. > > > > > > Thanks for your response! > > > > OK so I tried your suggestion of removing those span tags like so: > > > -- GPG me!! 
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B [attached solrconfig.xml omitted; its XML markup was stripped by the mail archive]
Re: Cannot use Phrase Queries in eDisMax and filtering
The results are different because you need to set the "start" parameter to 0 instead of 50 in the first query (after filtration), with the same rows value. -- View this message in context: http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167329.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr error : sorry, no dataimport-handler defined!
Two problems: 1) You have (span) elements in your solrconfig.xml. They just do not belong there. The original tutorial screwed up. Your element should be on the same level as the other elements in that example. 2) You also seem to have another random piece of data configuration in the solrconfig.xml. Also in the spans, so they are being ignored. But still very very wrong. Take those out all together. You should just have 3 things tying together: 1) jars loaded in the lib statement in solrconfig.xml 2) handler definition that points at your data-config file 3) data-config file itself. If you are still having troubles, I strongly recommend getting the shipped example to work and then adding your own stuff until you get that working. Then, try to create a standalone configuration. Sometimes, this is an easier approach for the first time user. Regards, Alex. P.s. I also cover that in my Solr book. A relevant example is here: https://github.com/arafalov/solr-indexing-book/tree/master/published/dihdb Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 3 November 2014 13:40, Tim Dunphy wrote: > Hi Alexandre, > > Thanks so much for your input and examples! Ok so here's what I've done so > far with no luck as of yet unfortunately. > > Inside of solrconfig.xml I put the following: > > > > > > > > > > > > > > As you can see, I've replaced the relative paths with absolute ones. So as > of now, my solr 4 server is no longer complaining about not being able to > find directories and modules. So we're off to a good start! And now I can > list the 'dist' directory and in my case find the jar files I'm looking for. > > > [root@solr1:/opt/solr/collection1/conf] #ls /opt/solr/dist/ | grep > dataimporthandler > solr-dataimporthandler-4.10.1.jar > solr-dataimporthandler-extras-4.10.1.jar > > So far so good. > > I next tried this db-data-config file in the same directory as > solrconfig.xml > > [root@solr1:/opt/solr/collection1/conf] #cat db-data-config.xml.bak > > > > > url="jdbc:mysql://web1.mydomain.com:3306/jokefire" user="admin" > password="secret" batchSize="1" /> > > > > > > > > > > /> > > > > > > > > > Restarted tomcat, and with this setup I wasn't getting any errors in the > browser or logs and the web interface was still working. Always a good sign! > > So then I went down to Core Selector -> collection1 -> data import. And it > was quite frustrating, cuz I was getting the same error as before! > > sorry, no dataimport-handler defined! > > So then I tried the exact db-data-config.xml file from your example. 
> > > url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa" /> > > deltaQuery="select id from item where last_modified > > '${dataimporter.last_index_time}'"> > > > query="select DESCRIPTION from FEATURE where > ITEM_ID='${item.ID}'" > deltaQuery="select ITEM_ID from FEATURE where > last_modified > '${dataimporter.last_index_time}'" > parentDeltaQuery="select ID from item where > ID=${feature.ITEM_ID}"> > > > > query="select CATEGORY_ID from item_category where > ITEM_ID='${item.ID}'" > deltaQuery="select ITEM_ID, CATEGORY_ID from > item_category where last_modified > '${dataimporter.last_index_time}'" > parentDeltaQuery="select ID from item where > ID=${item_category.ITEM_ID}"> > query="select DESCRIPTION from category where ID = > '${item_category.CATEGORY_ID}'" > deltaQuery="select ID from category where > last_modified > '${dataimporter.last_index_time}'" > parentDeltaQuery="select ITEM_ID, CATEGORY_ID from > item_category where CATEGORY_ID=${category.ID}"> > > > > > > > > Knowing full well it wouldn't actually work, as I"m using a remote mysql > database instead of a local hsqldb database.But at this point, my only goal > was to get the data import to show up as an option. I'd tweak the > db-data-config.xml file at a later point if this in fact worked! > > But alas, I was still getting the same result... > > sorry, no dataimport-handler defined! > > G.. so annoying after all that work. Anyway, I really do appreciate your > kindness and help. :) I'm enclosing my solrconfig.xml and both versions of > my db-data-config.xml in hopes that we can make some progress here! > > > Thank > > > Tim > > > > > > > > > > On Sun, Nov 2, 2014 at 9:50 PM, Alexandre Rafalovitch > wrote: >> >> That tutorial seems to be
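Pulling Alexandre's three pieces together, a minimal sketch of items 1 and 2 as they would sit directly under <config> in solrconfig.xml (item 3 is the db-data-config.xml file itself). The /opt/solr/dist path is the one from Tim's setup; the MySQL JDBC driver jar also has to be on the classpath, e.g. via its own lib directive:

<lib dir="/opt/solr/dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

With this in place (and no stray span elements around it), the Dataimport screen in the admin UI should stop reporting "sorry, no dataimport-handler defined!".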
Re: Cannot use Phrase Queries in eDisMax and filtering
That was a typo in the email; I did not actually send the query with a start param of 50. I sent it with a start param of 0, I just verified. Sorry for the mistake. On Mon, Nov 3, 2014 at 1:41 PM, Ramzi Alqrainy wrote: > The results are different, because you need to set "start" parameter 0 > instead of 50 in the first query (after filtration ) with same rows value > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167329.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Consul instead of ZooKeeper anyone?
Thanks Erick, after looking further into Solr's source code, I see that it's married to the ZK libraries and it won't be possible to extend the existing code without diverging from the trunk. At the same time, I don't see any reason for the lack of abstraction in the cloud-related code of Solr and SolrJ. As far as I can see, Consul provides all that SolrCloud needs, and so if the cloud code used some more abstraction, the ZK bindings could be substituted with another library. I am willing to implement this functionality and the abstraction, but at the same time, I don't want to maintain my own branch of Solr because of this integration. Do you think it would be possible to add an abstraction layer to the Solr source code in the near future? I think Consul has all the features that SolrCloud needs and what's especially attractive about Consul is that its memory footprint is 100X smaller than ZK's. Mainly though, we are considering Consul as a main service locator for a bunch of other moving parts within Zimbra, so being able to avoid deploying ZK just for SolrCloud would save a bunch of $$ for large customers. Thanks, Greg - Original Message - From: "Erick Erickson" To: solr-user@lucene.apache.org Sent: Friday, October 31, 2014 5:15:09 PM Subject: Re: Consul instead of ZooKeeper anyone? Not that I know of, but look before you leap. I took a quick look at Consul and it really doesn't look like any kind of drop-in replacement. Also, the Zookeeper usage in SolrCloud isn't really pluggable AFAIK, so there'll be lots of places in the Solr code that need to be reworked etc., especially in the realm of collections and sharding. The Collections API will be challenging to port over I think. Not to mention SolrJ and CloudSolrServer for clients who want to interact with SolrCloud through Java. Not saying it won't work, I just suspect that getting it done would be a big job, and thereafter keeping those changes in sync with the changing SolrCloud code base would chew up a lots of time. So if I were putting my Product Manager hat on I'd ask "is the benefit worth the effort?". All that said, go for it if you've a mind to! Best, Erick On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev wrote: > I am investigating a project to make SolrCloud run on Consul instead of > ZooKeeper. So far, my research revealed no such efforts, but I wanted to > check with this list to make sure I am not going to be reinventing the wheel. > Have anyone attempted using Consul instead of ZK to coordinate SolrCloud > nodes? > > Thanks, > Greg
Re: Cannot use Phrase Queries in eDisMax and filtering
I tried to reproduce your case on my machine with the queries below, but everything worked fine for me. I just want to ask you one question: what is the field type of the "tag" field? q=bmw& fl=score,*& wt=json& fq=city_id:59& qt=/query& defType=edismax& pf=title^15%20discription^5& pf3=1& pf2=1& ps=1& qroup=true& group.field=member_id& group.limit=10& sort=score desc& group.ngroups=true q=bmw& fl=score,*& wt=json& fq=city_id:59& qt=/query& defType=edismax& pf=title^15%20discription^5& pf3=1& pf2=1& ps=1& qroup=true& group.field=member_id& group.limit=10& group.ngroups=true& sort=score desc& fq=category_id:1777 -- View this message in context: http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167338.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cannot use Phrase Queries in eDisMax and filtering
It is of type string. On Mon, Nov 3, 2014 at 2:29 PM, Ramzi Alqrainy wrote: > I tried to produce your case in my machine with below queries, but > everything > worked fine with me. I just want to ask you a question what is the field > type of "tag" field ? > > q=bmw& > fl=score,*& > wt=json& > fq=city_id:59& > qt=/query& > defType=edismax& > pf=title^15%20discription^5& > pf3=1& > pf2=1& > ps=1& > qroup=true& > group.field=member_id& > group.limit=10& > sort=score desc& > group.ngroups=true > > > > > q=bmw& > fl=score,*& > wt=json& > fq=city_id:59& > qt=/query& > defType=edismax& > pf=title^15%20discription^5& > pf3=1& > pf2=1& > ps=1& > qroup=true& > group.field=member_id& > group.limit=10& > group.ngroups=true& > sort=score desc& > fq=category_id:1777 > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167338.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Data colocation hint to solr index
Thank you for recommendation on composite IDs. We currently use solr 3.x. After reading on composite ids, it sounds like a feature of solr 4.x. Is something similar available in solr 3.x also? Also, we do not use solrCloud. On Mon, Nov 3, 2014 at 12:41 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > If you're using SolrCloud then you can use composite IDs such as > !doc-id to co-locate documents belonging to a manufacturer together > and at query time, you can add _route_=! to the request to route it > to the correct node. > > On Mon, Nov 3, 2014 at 11:00 PM, maninder batth > wrote: > > > Hi, > > In my company, we serve car manuals for different car manufacturers with > > their various makes and models. Typically, the search is always done > within > > context of a car manufacturer, year, make and model. Is there a way in > Solr > > to create indexes based on this criteria? Currently, the index contains > all > > manufactueres, makes and models. This causes index to go over a terabyte. > > Hence, if we could teach solr to co-locate all the data for a particular > > manufactuere, make and model, that would be an ideal thing to do. > > I was wondering if this is possible? > > > > Regards, > > Jim > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
Which Solr releases contain SOLR-4470 (Security for inter-solr-node requests)
I am currently working on SolrCloud and its related security configurations for securing Solr web applications using the HTTP Basic Authentication mechanism. Among the Solr nodes inside the SolrCloud clustered environment, there seem to be some inter-solr-node communication issues caused by the security configuration, namely HTTP authentication errors. Based on my research, the patch SOLR-4470 (Security for inter-solr-node requests) would be ideal for resolving these issues (please refer to: https://wiki.apache.org/solr/SolrSecurity#Security_for_inter-solr-node_requests). However, it seems to me that these security patches are additions outside the current Solr source codebase, which don't seem to be available in the recent Solr releases. If someone could point out which Solr releases, or which jars from an online repository, contain this patch, it would be appreciated very much. Jerry This e-mail is confidential. If you are not the intended recipient, you must not disclose or use the information contained in it. If you have received this e-mail in error, please tell us immediately by return e-mail and delete the document. No recipient may use the information in this e-mail in violation of any civil or criminal statute. Sentry disclaims all liability for any unauthorized uses of this e-mail or its contents. Sentry accepts no liability or responsibility for any damage caused by any virus transmitted with this e-mail.
Re: Which Solr releases contain SOLR-4470 (Security for inter-solr-node requests)
You find the answer to such questions by looking at the state of the JIRA issue https://issues.apache.org/jira/browse/SOLR-4470 Staus: Open Fix version: Trunk Which means that this feature is not included in any released Solr version (yet). -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 3. nov. 2014 kl. 22.39 skrev Yuan Jerry : > > I am currently working on SolrCloud and its related security configurations > for securing Solr web applications using HTTP Basic Authentication mechanism. > Among the Solr nodes inside the SolrCloud clustered env, there seem to be > existing some inter-solr-node communication issues due to the security > configurations, which are the HTTP Authentication errors. Based on my > research, the patch SOLR-4470 (Security for inter-solr-node requests) would > be ideal for resolving these issues (please refer to the address: > https://wiki.apache.org/solr/SolrSecurity#Security_for_inter-solr-node_requests). > However, it seems to me that these security patches are out-of-box additions > to the current Solr source codebase, which don't seem to be available in the > recent Solr releases. > > If someone could point out which Solr releases or the jars from some online > repositories that contain this patch, it would be appreciated very much. > > Jerry > > > This e-mail is confidential. If you are not the intended recipient, you must > not disclose or use the information contained in it. If you have received > this e-mail in error, please tell us immediately by return e-mail and delete > the document. No recipient may use the information in this e-mail in > violation of any civil or criminal statute. Sentry disclaims all liability > for any unauthorized uses of this e-mail or its contents. Sentry accepts no > liability or responsibility for any damage caused by any virus transmitted > with this e-mail.
Re: Which Solr releases contain SOLR-4470 (Security for inter-solr-node requests)
: I am currently working on SolrCloud and its related security : configurations for securing Solr web applications using HTTP Basic : Authentication mechanism. Among the Solr nodes inside the SolrCloud : clustered env, there seem to be existing some inter-solr-node : communication issues due to the security configurations, which are the : HTTP Authentication errors. Based on my research, the patch SOLR-4470

In my opinion, your best bet to "secure" Solr is to avoid any and all involvement of Basic Auth and instead use SSL with client certificates...

https://cwiki.apache.org/confluence/display/solr/Enabling+SSL

1) Already supported in Solr today - no patches needed.

2) Eliminates the complexity of needing a proxy in front of Solr to handle the client auth, so that the Solr nodes can talk to each other without auth -- and/or having Solr nodes "forward" the client auth around. Instead, each Solr node authenticates the client using the client's cert, and each node authenticates itself for the inter-node requests using its own cert.

3) Much more secure than Basic-Auth headers, which could be sniffed by a man-in-the-middle. (You could use SSL + Basic Auth - but if you are going to enable SSL anyway, why bother with Basic Auth? Just configure the client certs.)

-Hoss
http://www.lucidworks.com/
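As a rough sketch of what that setup involves (the hostname, passwords and paths below are placeholders, not values from this thread; the cwiki page above remains the authoritative guide): each node gets a keystore generated with the JDK keytool, and the JVM is pointed at it through the standard JSSE system properties so that outgoing inter-node requests authenticate with the node's certificate instead of Basic-Auth headers.

    # generate a key pair / keystore for a node (placeholder values)
    keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
        -validity 9999 -keystore solr-ssl.keystore.jks \
        -storepass secret -keypass secret \
        -dname "CN=solr1.example.com, OU=IT, O=Example, L=City, ST=State, C=US"

    # start Solr (and any SolrJ client) with the keystore/truststore visible to
    # its internal HTTP client; the Jetty listener itself (including
    # needClientAuth) is configured separately in etc/jetty.xml
    java -Djavax.net.ssl.keyStore=/path/to/solr-ssl.keystore.jks \
         -Djavax.net.ssl.keyStorePassword=secret \
         -Djavax.net.ssl.trustStore=/path/to/solr-ssl.keystore.jks \
         -Djavax.net.ssl.trustStorePassword=secret \
         -jar start.jar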
custom sorting of search result
Hello, we need to order Solr search results according to specific rules. I will explain with an example. Let's say Solr returns 1000 results for the query "sport". These results must be divided into three buckets according to rules that come from a database. Then one doc must be taken from each bucket in turn and appended to the results until all buckets are empty. One approach was to modify/override the Solr code where it gets the results, sorts them and returns #rows elements. However, from the code in the scoreAll function in Weight.java we see that docs carry only the internal document id and nothing else. We need the unique Solr document id in order to match documents against the custom scoring rules. We also see that Lucene code hands those doc ids to the scoreAll function, and for now we do not want to modify Lucene code; we would prefer to solve this as a Solr plugin. Any ideas are welcome. Thanks. Alex.
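Until a plugin approach is worked out, the bucket rule described above can also be applied client-side after the query returns. A minimal sketch in Java (how documents get assigned to the three buckets is left out, since that part comes from the database rules):

    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    public class BucketInterleaver {
        /**
         * Round-robin over the buckets: take one document from each bucket in
         * turn until every bucket is empty, preserving the order inside each
         * bucket. Works for any number of buckets, not just three.
         */
        public static <T> List<T> interleave(List<Deque<T>> buckets) {
            List<T> ordered = new ArrayList<>();
            boolean tookSomething = true;
            while (tookSomething) {
                tookSomething = false;
                for (Deque<T> bucket : buckets) {
                    T doc = bucket.pollFirst();   // null when the bucket is empty
                    if (doc != null) {
                        ordered.add(doc);
                        tookSomething = true;
                    }
                }
            }
            return ordered;
        }
    }

The trade-off of doing this on the client is that paging and facet counts still reflect Solr's own ordering, so it fits best when the full result window is fetched and re-ordered in one go.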
Re: Consul instead of ZooKeeper anyone?
bq: Do you think it would be possible to add an abstraction layer to Solr source code in near future? I strongly doubt it. As you've already noted, this is a large amount of work. Without some super-compelling advantage I just don't see the interest. bq: to avoid deploying ZK just for SolrCloud would save a bunch of $$ for large customers How so? It's free. Making this change would, IMO, require a compelling story to generate much enthusiasm. So far I haven't seen that story, and Jürgen and Walter raise valid points that haven't been addressed. I suspect you're significantly underestimating the effort to get this stable in the SolrCloud world as well. I don't really want to be such a wet blanket, but you're asking about a very significant amount of work from a bunch of people, all of whom have lots of things on their plate. So without a _very_ good reason, I think it's unlikely to generate much interest. Best, Erick On Mon, Nov 3, 2014 at 11:17 AM, Greg Solovyev wrote: > Thanks Erick, > after looking further into Solr's source code, I see that it's married to ZK > libraries and it won't be possible to extend existing code without diverting > from the trunk. At the same time, I don't see any reason for lack of > abstraction in cloud-related code of Solr and SolrJ. As far as I can see > Consul provides all that SolrCloud needs and so if cloud code was using some > more abstraction, ZK bindings could be substituted with another library. I am > willing to implement a this functionality and the abstraction, but at the > same time, I don't want to maintain my own branch of Solr because of this > integration. Do you think it would be possible to add an abstraction layer to > Solr source code in near future? > > I think Consul has all the features that SolrCloud needs and what's > especially attractive about Consul is that it's memory footprint is 100X > smaller than ZK. Mainly though, we are considering Consul as a main service > locator for a bunch of other moving parts within Zimbra, so being able to > avoid deploying ZK just for SolrCloud would save a bunch of $$ for large > customers. > > Thanks, > Greg > > - Original Message - > From: "Erick Erickson" > To: solr-user@lucene.apache.org > Sent: Friday, October 31, 2014 5:15:09 PM > Subject: Re: Consul instead of ZooKeeper anyone? > > Not that I know of, but look before you leap. I took a quick look at > Consul and it really doesn't look like any kind of drop-in replacement. > Also, the Zookeeper usage in SolrCloud isn't really pluggable > AFAIK, so there'll be lots of places in the Solr code that need to be > reworked etc., especially in the realm of collections and sharding. > > The Collections API will be challenging to port over I think. > > Not to mention SolrJ and CloudSolrServer for clients who want to interact > with SolrCloud through Java. > > Not saying it won't work, I just suspect that getting it done would be > a big job, and thereafter keeping those changes in sync with the > changing SolrCloud code base would chew up a lots of time. So if > I were putting my Product Manager hat on I'd ask "is the benefit > worth the effort?". > > All that said, go for it if you've a mind to! > > Best, > Erick > > On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev wrote: >> I am investigating a project to make SolrCloud run on Consul instead of >> ZooKeeper. So far, my research revealed no such efforts, but I wanted to >> check with this list to make sure I am not going to be reinventing the >> wheel. 
Have anyone attempted using Consul instead of ZK to coordinate >> SolrCloud nodes? >> >> Thanks, >> Greg
Re: Data colocation hint to solr index
You have a TB-scale index and you're not using SolrCloud? Are you using master/slave or otherwise splitting up your index? Because if you're not, then please ship me some of your hardware because it must be awesome. Which is a tongue-in-cheek way of saying there must be lots of details you aren't telling us that would help us help you. Best, Erick On Mon, Nov 3, 2014 at 11:45 AM, maninder batth wrote: > Thank you for recommendation on composite IDs. We currently use solr 3.x. > After reading on composite ids, it sounds like a feature of solr 4.x. Is > something similar available in solr 3.x also? Also, we do not use solrCloud. > > On Mon, Nov 3, 2014 at 12:41 PM, Shalin Shekhar Mangar < > shalinman...@gmail.com> wrote: > >> If you're using SolrCloud then you can use composite IDs such as >> !doc-id to co-locate documents belonging to a manufacturer together >> and at query time, you can add _route_=! to the request to route it >> to the correct node. >> >> On Mon, Nov 3, 2014 at 11:00 PM, maninder batth >> wrote: >> >> > Hi, >> > In my company, we serve car manuals for different car manufacturers with >> > their various makes and models. Typically, the search is always done >> within >> > context of a car manufacturer, year, make and model. Is there a way in >> Solr >> > to create indexes based on this criteria? Currently, the index contains >> all >> > manufactueres, makes and models. This causes index to go over a terabyte. >> > Hence, if we could teach solr to co-locate all the data for a particular >> > manufactuere, make and model, that would be an ideal thing to do. >> > I was wondering if this is possible? >> > >> > Regards, >> > Jim >> > >> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >>
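For readers who do move to 4.x/SolrCloud, the compositeId routing from the quoted reply looks roughly like this in practice (the collection name and routing key are invented for illustration): prefix every document id with the routing key at index time, then pass the same prefix in _route_ at query time so only the shard holding that key is searched.

    id: FORD-F150-2012!manual-page-4711    (all ids sharing the prefix hash to the same shard)

    http://localhost:8983/solr/manuals/select?q=brake+pads&_route_=FORD-F150-2012!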
RE: Missing Records
So I jumped back on this. I have not been using the optimize option on this new set of tests. If I run the full index on the leader I seem to get all of the items in the database minus 3 that have a missing field. Indexing completed. Added/Updated: 903,990 documents. Deleted 0 documents. (Duration: 25m 11s) Requests: 1 (0/s), Fetched: 903,993 (598/s), Skipped: 0, Processed: 903,990 Last Modified:2 minutes ago Num Docs:903990 Max Doc:903990 Heap Memory Usage:2625744 Deleted Docs:0 Version:3249 Segment Count:7 Optimized: Current: If I run it on the other node I get: Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 27m 08s) Requests: 1 (0/s), Fetched: 903,993 (555/s), Skipped: 0, Processed: 903,993 (555/s) Last Modified:about a minute ago Num Docs:897791 Max Doc:897791 Heap Memory Usage:2621072 Deleted Docs:0 Version:3285 Segment Count:7 Optimized: Current: Any ideas? If there is any more info that is needed let me know. AJ -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, October 31, 2014 1:44 PM To: solr-user@lucene.apache.org Subject: Re: Missing Records Sorry to say this, but I don't think the numDocs/maxDoc numbers are telling you anything. because it looks like you've optimized which purges any data associated with deleted docs, including the internal IDs which are the numDocs/maxDocs figures. So if there were deletions, we can't see any evidence of same. Siih. On Fri, Oct 31, 2014 at 9:56 AM, AJ Lemke wrote: > I have run some more tests so the numbers have changed a bit. > > Index Results done on Node 1: > Indexing completed. Added/Updated: 903,993 documents. Deleted 0 > documents. (Duration: 31m 47s) > Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed: > 903,993 > > Node 1: > Last Modified: 44 minutes ago > Num Docs: 824216 > Max Doc: 824216 > Heap Memory Usage: -1 > Deleted Docs: 0 > Version: 1051 > Segment Count: 1 > Optimized: checked > Current: checked > > Node 2: > Last Modified: 44 minutes ago > Num Docs: 824216 > Max Doc: 824216 > Heap Memory Usage: -1 > Deleted Docs: 0 > Version: 1051 > Segment Count: 1 > Optimized: checked > Current: checked > > Search results are the same as the doc numbers above. > > Logs only have one instance of an error: > > ERROR - 2014-10-31 10:47:12.867; > org.apache.solr.update.StreamingSolrServers$1; error > org.apache.solr.common.SolrException: Bad Request > > > > request: > http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2 > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > Some info that may be of help > This is on my local vm using jetty with the embedded zookeeper. 
> Commands to start cloud: > > java -DzkRun -jar start.jar > java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar > > sh zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir > ~/development/configs/inventory/ -confname config_ inventory sh > zkcli.sh -zkhost localhost:9983 -cmd linkconfig -collection inventory > -confname config_ inventory > > curl > "http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4"; > curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name= > inventory " > > AJ > > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Friday, October 31, 2014 9:49 AM > To: solr-user@lucene.apache.org > Subject: Re: Missing Records > > OK, that is puzzling. > > bq: If there were duplicates only one of the duplicates should be removed and > I still should be able to search for the ID and find one correct? > > Correct. > > Your bad request error is puzzling, you may be on to something there. > What it looks like is that somehow some of the documents you're > sending to Solr aren't getting indexed, either being dropped through > the network or perhaps have invalid fields, field formats (i.e. a date > in the wrong format, > whatever) or some such. When you complete the run, what are the maxDoc and > numDocs numbers on one of the nodes? > > What else do you see in the logs? They're pretty big after that many adds, > but maybe you can grep for ERROR and see something interesting like stack > traces. Or even "org.apache.solr". This latter will give you some false hits, > but at least it's better than paging through a huge log file > > Personally, in this kind of situation I sometimes use SolrJ to do my indexing > rather than DIH, I find it easier to debu
Re: Solr error : sorry, no dataimport-handler defined!
Hi Alexandre, OK, some good progress was made based on this advice. Thanks! I think we're in the home stretch with the data import. Not there yet, but hopefully close.

> Two problems:
> 1) You have (span) elements in your solrconfig.xml. They just do not belong there. The original tutorial screwed up. Your element should be on the same level as the other elements in that example.
> 2) You also seem to have another random piece of data configuration in the solrconfig.xml. Also in the spans, so they are being ignored. But still very very wrong. Take those out all together.
> You should just have 3 things tying together:
> 1) jars loaded in the lib statement in solrconfig.xml
> 2) handler definition that points at your data-config file
> 3) data-config file itself.

OK, so here I'm loading the libs in solrconfig.xml. Verified the files are there:

[root@solr1:/opt/solr/collection1/conf] #ls -l /opt/solr/lib/ | grep mysql
-rw-r--r--. 1 root root 959987 Nov 3 19:17 mysql-connector-java-5.1.33-bin.jar
[root@solr1:/opt/solr/collection1/conf] #ls -l /opt/solr/dist/ | grep dataimport
-rw-r--r--. 1 tomcat tomcat 219261 Sep 24 06:07 solr-dataimporthandler-4.10.1.jar
-rw-r--r--. 1 tomcat tomcat 37443 Sep 24 06:07 solr-dataimporthandler-extras-4.10.1.jar

Added the handler entry to solrconfig.xml (without the spans), pointing it at db-data-config.xml. Then added the db-data-config.xml file in the same directory as solrconfig.xml.

Verified I could connect to the DB with the info supplied in the data config file:

[root@solr1:/opt/solr/collection1/conf] #mysql -uadmin -p -h web1.mydomain.com jokefire
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 8628551
Server version: 5.5.39 MySQL Community Server (GPL) by Remi
Copyright (c) 2000, 2014, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [jokefire]>

Bounced Tomcat, and there it was!! I now had a web interface for the data import feature! Thank you for helping to get me this far!

However, at this stage the import, even though it says it has been started, just kind of sits there, and no records are actually imported. I took a look at the logs and found these entries:

11/3/2014, 7:21:03 PM  WARN  SimplePropertiesWriter  Unable to read: dataimport.properties
11/3/2014, 7:21:04 PM  WARN  SimplePropertiesWriter  Unable to read: dataimport.properties

It looks as if something is still failing. But I googled that error and found that the answer was to make the conf directory writable. I'll experiment with tightening up permissions, but at that point I just wanted to see if that would solve this, so I made it world writable with chmod 777. And lo and behold, an import happened!!

Last Update: 19:40:26
Indexing completed. Added/Updated: 4 documents. Deleted 0 documents. (Duration: 01s)
Requests: 1 (1/s), Fetched: 4 (4/s), Skipped: 0, Processed: 4 (4/s)
Started: 2 minutes ago

Very cool. Finally I can start to use Solr with some real data. Not much in this database yet, but that's OK, I'll add some data and have a look. Hopefully this will be the database for a live app someday, making this little exercise in Solr indexing useful! Thanks again!
Tim > If you are still having troubles, I strongly recommend getting the > shipped example to work and then adding your own stuff until you get > that working. Then, try to create a standalone configuration. > Sometimes, this is an easier approach for the first time user. > Regards, >Alex. > P.s. I also cover that in my Solr book. A relevant example is here: > https://github.com/arafalov/solr-indexing-book/tree/master/published/dihdb > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Mon, Nov 3, 2014 at 1:50 PM, Alexandre Rafalovitch wrote: > Two problems: > 1) You have (span) elements in your solrconfig.xml. They just > do not belong there. The original tutorial screwed up. Your element > should be on the same level as the other elements in that example. > 2) You also seem to have another random piece of data configuration in > the solrconfig.xml. Also in the spans, so they are being ignored. But > still very very wrong. Take those out all together. > > You should just have 3 things tying together: > 1) jars loaded in the
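Since the XML snippets did not survive in the message above, a minimal version of the three pieces Alexandre lists might look like the sketch below. The jar paths and database URL come from the thread; the handler name, table and column names are assumptions.

    <!-- solrconfig.xml: load the DIH and JDBC driver jars -->
    <lib dir="/opt/solr/dist/" regex="solr-dataimporthandler-.*\.jar" />
    <lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar" />

    <!-- solrconfig.xml: the handler definition, pointing at the data config -->
    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">db-data-config.xml</str>
      </lst>
    </requestHandler>

    <!-- db-data-config.xml: a minimal JDBC entity (table/column names invented) -->
    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://web1.mydomain.com/jokefire"
                  user="admin" password="..."/>
      <document>
        <entity name="joke" query="SELECT id, title, joke_text FROM jokes">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <field column="joke_text" name="text"/>
        </entity>
      </document>
    </dataConfig>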
Faceting return value of a function query?
Hi, I'm new to Solr, and I'm having a problem with faceting. I would really appreciate it if you could help :) I have a set of documents in JSON format, which I post to my Solr core using the post.jar tool. Each document contains two fields, namely "startDate" and "endDate", both of which are of type "date". Conceptually, I would like to have a third field "timeSpan" that is automatically generated from the return value of the function query "ms(endDate, startDate)", and do a range facet on it, i.e. compute the distribution of "timeSpan" over either all of the documents or a filtered subset of them. I have tried to find ways of both directly faceting on the function's return values and automatically generating the "timeSpan" field during indexing, but without luck yet. Suggestions are greatly appreciated! Best, Yubing
Re: Faceting return value of a function query?
Wouldn't it be easiest to compute the span at index time? Then it's very straight-forward. Best, Erick On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 wrote: > Hi, > > I'm new to Solr, and I'm having a problem with faceting. I would really > appreciate it if you could help :) > > I have a set of documents in JSON format, which I could post to my Solr > core using the post.jar tool. Each document contains two fields, namely > "startDate" and "endDate", both of which are of type "date". > > Conceptually, I would like to have a third field "timeSpan" that is > automatically generated from the return value of function query > "ms(endDate, startDate)", and do range facet on it, i.e. compute the > distribution of "timeSpan", among either all of or a filtered subset of the > documents. > > I have tried to find ways of both directly faceting the function return > values and automatically generate the "timeSpan" field during indexing, but > without luck yet. > > Suggestions are greatly appreciated! > > Best, > Yubing
Re: Faceting return value of a function query?
Hi Erik, Thanks for the reply! Do you mean parse and modify the documents before sending them to Solr? Cheers, Yubing On Mon, Nov 3, 2014 at 8:48 PM, Erick Erickson wrote: > Wouldn't it be easiest to compute the span at index time? Then it's > very straight-forward. > > Best, > Erick > > On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 > wrote: > > Hi, > > > > I'm new to Solr, and I'm having a problem with faceting. I would really > > appreciate it if you could help :) > > > > I have a set of documents in JSON format, which I could post to my Solr > > core using the post.jar tool. Each document contains two fields, namely > > "startDate" and "endDate", both of which are of type "date". > > > > Conceptually, I would like to have a third field "timeSpan" that is > > automatically generated from the return value of function query > > "ms(endDate, startDate)", and do range facet on it, i.e. compute the > > distribution of "timeSpan", among either all of or a filtered subset of > the > > documents. > > > > I have tried to find ways of both directly faceting the function return > > values and automatically generate the "timeSpan" field during indexing, > but > > without luck yet. > > > > Suggestions are greatly appreciated! > > > > Best, > > Yubing >
Re: Missing log entries with log4j log rotation
On 11/1/2014 11:45 AM, Shawn Heisey wrote: > There appear to be large blocks of time missing in my solr logfiles > created with slf4j->log4j and rotated using the log4j config: > > End of solr.log.1: INFO - 2014-10-31 12:52:25.073; > Start of solr.log: INFO - 2014-11-01 02:27:27.404; > > End of solr.log.2: INFO - 2014-10-29 06:30:32.661; > Start of solr.log.1: INFO - 2014-10-30 07:01:34.241; The more I thought about this problem, the more convinced I became that the issue had to be in log4j, since log4j is responsible for writing and rotating the logs. I posted the question on the log4j mailing list, and the response basically said "If this is a bug in log4j 1.x, it's not going to get fixed. Upgrade to 2.x." We do something similar ourselves when a new major version gets minted, so I can't really complain about that. I was able to get information on what jar changes would be required for such an upgrade, but from what I can tell, log4j2 does not support a property-based configuration file, it must be XML. There are no conversion tools for version 2... the only conversion tool I found would convert log4j.properties into an XML config for version 1.x, which looks very different from a version 2 XML config. There do not appear to be any examples of a RollingFileAppender based config for log4j2. It won't be a relevant upgrade test if I can't configure the new version in the same way as the old version, and because I can't find any examples to work from, I'm going to have to experiment with the config. If we choose to upgrade the project to log4j 2.1, upgrading the logging might prove tricky for some end users. If we do the upgrade right, they would have the option of continuing to use their existing logging setup, which might be losing logs like mine. Thanks, Shawn
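For anyone attempting the same experiment, a log4j2 XML configuration mirroring Solr's stock log4j.properties (the same log pattern with a size-based RollingFileAppender) might look roughly like the sketch below; the file names, rollover size and backup count are guesses at the equivalent settings, not a tested drop-in.

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration>
      <Appenders>
        <RollingFile name="file"
                     fileName="logs/solr.log"
                     filePattern="logs/solr.log.%i">
          <PatternLayout pattern="%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n"/>
          <Policies>
            <SizeBasedTriggeringPolicy size="4 MB"/>
          </Policies>
          <DefaultRolloverStrategy max="9"/>
        </RollingFile>
      </Appenders>
      <Loggers>
        <Root level="info">
          <AppenderRef ref="file"/>
        </Root>
      </Loggers>
    </Configuration>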
Re: Faceting return value of a function query?
Yep. It's almost always easier and faster if you can pre-compute as much as possible during indexing time. It'll take longer to index of course, but the ratio of writing to the index to searching is usually hugely in favor of doing the work during indexing. Best, Erick On Mon, Nov 3, 2014 at 8:52 PM, Yubing (Tom) Dong 董玉冰 wrote: > Hi Erik, > > Thanks for the reply! Do you mean parse and modify the documents before > sending them to Solr? > > Cheers, > Yubing > > On Mon, Nov 3, 2014 at 8:48 PM, Erick Erickson > wrote: > >> Wouldn't it be easiest to compute the span at index time? Then it's >> very straight-forward. >> >> Best, >> Erick >> >> On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 >> wrote: >> > Hi, >> > >> > I'm new to Solr, and I'm having a problem with faceting. I would really >> > appreciate it if you could help :) >> > >> > I have a set of documents in JSON format, which I could post to my Solr >> > core using the post.jar tool. Each document contains two fields, namely >> > "startDate" and "endDate", both of which are of type "date". >> > >> > Conceptually, I would like to have a third field "timeSpan" that is >> > automatically generated from the return value of function query >> > "ms(endDate, startDate)", and do range facet on it, i.e. compute the >> > distribution of "timeSpan", among either all of or a filtered subset of >> the >> > documents. >> > >> > I have tried to find ways of both directly faceting the function return >> > values and automatically generate the "timeSpan" field during indexing, >> but >> > without luck yet. >> > >> > Suggestions are greatly appreciated! >> > >> > Best, >> > Yubing >>
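A minimal sketch of that index-time pre-computation, assuming the JSON documents are massaged before being handed to post.jar and that the schema gains a long-typed "timeSpan" field (the field and class names here are assumptions):

    import java.time.Duration;
    import java.time.Instant;

    public class TimeSpanCalc {
        /** Solr date fields are ISO-8601, e.g. "2014-11-03T08:15:00Z". */
        public static long timeSpanMs(String startDate, String endDate) {
            return Duration.between(Instant.parse(startDate),
                                    Instant.parse(endDate)).toMillis();
        }
    }

With the field in place, the distribution falls out of a plain range facet, e.g. facet=true&facet.range=timeSpan&facet.range.start=0&facet.range.end=31536000000&facet.range.gap=86400000 for one-day buckets over a year, and it composes with any fq you apply.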
Admin UI Schema Browser screen and ReverseStringFilterFactory
Hello, I just noticed this one and curious what's causing this (desirable I guess) behavior. I have a chain with ReverseStringFilterFactory in it (both index and query). So, the tokens are reversed from the input. But when I look at the Schema Browser screen and it loads the tokens, it seems to show an uninverted form somehow. Because when I click on the token value it searches for that value that it shows me and finds correct records. But the analysis screen shows the tokens being reversed (also correctly). So, the only explanation I can think of is that the Schema Browser (luke) is somehow uninverting the tokens for the presentation. But where is that defined and what other edge-cases are in there? tl;dr: everything works, but WHY? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
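For concreteness, a chain like the one described might be declared along these lines (a sketch, not the actual schema from the message):

    <fieldType name="text_reversed" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReverseStringFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReverseStringFilterFactory"/>
      </analyzer>
    </fieldType>

With such a chain the indexed terms really are stored reversed, which is what makes the Schema Browser display worth puzzling over.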
Re: Data colocation hint to solr index
On 11/3/2014 12:45 PM, maninder batth wrote: > Thank you for recommendation on composite IDs. We currently use solr 3.x. > After reading on composite ids, it sounds like a feature of solr 4.x. Is > something similar available in solr 3.x also? Also, we do not use solrCloud. The compositeId router is part of SolrCloud, which you will only find in Solr 4.0 and newer. On 3.x, you must normally handle all shard routing outside of Solr. It might be possible to configure the dataimport handler so that its JDBC query selects only documents that belong on that shard, if you happen to be using the dataimport handler already. Thanks, Shawn
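On 3.x with manually managed shards, Shawn's dataimport idea boils down to giving each shard's data-config a WHERE clause that selects only the manufacturers that shard should hold; a sketch with invented table and column names:

    <!-- db-data-config.xml on the shard that holds, for example, FORD and GM -->
    <entity name="manual"
            query="SELECT id, manufacturer, year, make, model, body
                   FROM manuals
                   WHERE manufacturer IN ('FORD', 'GM')">
      ...
    </entity>

The application then fans each query out only to the shard(s) that can contain the requested manufacturer.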
Re: Faceting return value of a function query?
I see. Thank you! :-) Sent from my Android phone On Nov 3, 2014 9:35 PM, "Erick Erickson" wrote: > Yep. It's almost always easier and faster if you can pre-compute as > much as possible during indexing time. It'll take longer to index of > course, but the ratio of writing to the index to searching is usually > hugely in favor of doing the work during indexing. > > Best, > Erick > > On Mon, Nov 3, 2014 at 8:52 PM, Yubing (Tom) Dong 董玉冰 > wrote: > > Hi Erik, > > > > Thanks for the reply! Do you mean parse and modify the documents before > > sending them to Solr? > > > > Cheers, > > Yubing > > > > On Mon, Nov 3, 2014 at 8:48 PM, Erick Erickson > > wrote: > > > >> Wouldn't it be easiest to compute the span at index time? Then it's > >> very straight-forward. > >> > >> Best, > >> Erick > >> > >> On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 > >> wrote: > >> > Hi, > >> > > >> > I'm new to Solr, and I'm having a problem with faceting. I would > really > >> > appreciate it if you could help :) > >> > > >> > I have a set of documents in JSON format, which I could post to my > Solr > >> > core using the post.jar tool. Each document contains two fields, > namely > >> > "startDate" and "endDate", both of which are of type "date". > >> > > >> > Conceptually, I would like to have a third field "timeSpan" that is > >> > automatically generated from the return value of function query > >> > "ms(endDate, startDate)", and do range facet on it, i.e. compute the > >> > distribution of "timeSpan", among either all of or a filtered subset > of > >> the > >> > documents. > >> > > >> > I have tried to find ways of both directly faceting the function > return > >> > values and automatically generate the "timeSpan" field during > indexing, > >> but > >> > without luck yet. > >> > > >> > Suggestions are greatly appreciated! > >> > > >> > Best, > >> > Yubing > >> >