RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
I have done some further analysis on this and I am now even more confused. When 
I use the Field Analysis tool with the text 'chicken stock' it highlights that 
text as a match.
The dismax query looks ok to me:
+(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01) 
DisjunctionMaxQuery((ingredient_synonyms:stock^0.6)~0.01)) 
DisjunctionMaxQuery((ingredient_synonyms:chicken stock^0.6)~0.01)

Then I have done an explainOther and it shows a failure to meet condition. 
However there does seem to be some kind of match registered:
0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause (ingredient_synonyms:chicken^0.6 
ingredient_synonyms:stock^0.6)
  0.0650662 = (MATCH) weight(ingredient_synonyms:chicken stock^0.6 in 0), 
product of:
0.21204369 = queryWeight(ingredient_synonyms:chicken stock^0.6), product of:
  0.6 = boost
  0.30685282 = idf(docFreq=1, maxDocs=1)
  1.1517122 = queryNorm
0.30685282 = (MATCH) fieldWeight(ingredient_synonyms:chicken stock in 0), 
product of:
  1.0 = tf(termFreq(ingredient_synonyms:chicken stock)=1)
  0.30685282 = idf(docFreq=1, maxDocs=1)
  1.0 = fieldNorm(field=ingredient_synonyms, doc=0)

Any ideas?

My dismax handler is set up like this (tags partly stripped by the archive; reconstructed from the quoted copy, handler name elided):

  <requestHandler name="…" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <str name="tie">0.01</str>
      <str name="qf">ingredient_synonyms^0.6</str>
      <str name="pf">ingredient_synonyms^0.6</str>
    </lst>
  </requestHandler>


Zac

From: Zac Smith
Sent: Thursday, February 09, 2012 12:52 PM
To: solr-user@lucene.apache.org
Subject: Keyword Tokenizer Phrase Issue

Hi,

I have a simple field type that uses the KeywordTokenizerFactory. I would like 
to use this so that values in this field are only matched against the full text 
of the field.
e.g. if I indexed the text 'chicken stock', searches on this field would only 
match when searching for 'chicken stock'. Searching for just 'chicken' or just 
'stock' should not match.

This mostly works, except if there is more than one word in the text I only get 
a match when searching with quotes. e.g.
"chicken stock" (matches)
chicken stock (doesn't match)

Is there any way I can set this up so that I don't have to provide quotes? I am 
using dismax, and if I put quotes in, it will mess up the search for the rest of 
my fields. I had an idea that I could issue a separate search using the regular 
query parser, but couldn't work out how to do it.
I thought I could do something like this: qt=dismax&q=fish OR 
_query_:ingredient:"chicken stock"

I am using solr 3.5.0. My field type is (stripped by the archive; the attributes below survive in the quoted copy, the rest is elided):

<fieldType name="…" class="solr.TextField" positionIncrementGap="100"
    autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    […]
  </analyzer>
</fieldType>
Thanks
Zac


Re: indexing with DIH (and with problems)

2012-02-10 Thread Gora Mohanty
On 10 February 2012 04:15, alessio crisantemi
 wrote:
> hi all,
> I would like to index into Solr my PDF files, which are in my directory c:\myfile\
>
> So I added the file data-config.xml to my solr/conf directory, like the
> following:
[...]

> but this is the result:
[...]

Your Solr URL for dataimport looks a little odd: You seem to be
doing a delta-import. Normally, one would start with a full import:
http://solr-host:port/solr/dataimport?command=full-import
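Other useful DIH commands follow the same URL pattern (solr-host and port are placeholders, as above):

```
http://solr-host:port/solr/dataimport?command=status        # progress / outcome of the last import
http://solr-host:port/solr/dataimport?command=delta-import  # incremental import, once full-import works
```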

Have you looked in the Solr logs for the cause of the exception?
Please share that with us.

Regards,
Gora


SOLR

2012-02-10 Thread mizayah
Is there any way to keep the score from being affected by duplicated terms in the query?

I have a record with the field

title: "The GIRL with the dragon tattoo"

If the query is "girl", it gets a lower score than the query "girl girl girl". Both find
the word at the same position, so why does the score grow?

I need this to know how close a found record is to the user's query.
Best for me would be to know what percentage of the query matches that "title" field.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-tp3731831p3731831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Basic Performance Test with duplicated data

2012-02-10 Thread Husain, Yavar
Will testing Solr based on duplicated data in the database result in the same 
performance statistics as testing Solr with completely unique data? 
By test I mean routine performance tests like time to index, time to search, 
etc. Will Solr perform any kind of optimization that will result in different 
statistics for duplicated data versus unique data?


**This
 message may contain confidential or proprietary information intended only for 
the use of the addressee(s) named above or may contain information that is 
legally privileged. If you are not the intended addressee, or the person 
responsible for delivering it to the intended addressee, you are hereby 
notified that reading, disseminating, distributing or copying this message is 
strictly prohibited. If you have received this message by mistake, please 
immediately notify us by replying to the message and delete the original 
message and any copies immediately thereafter.

Thank you.~
**
FAFLD



Re: Solr Basic Performance Test with duplicated data

2012-02-10 Thread Rafał Kuć
Hello!

In terms of query performance, Solr will use caches (of course, if
they are turned on). So if you run similar queries (the same
filters, sorts and so on), the performance may be different
from the performance with unique queries.

The http://wiki.apache.org/solr/SolrCaching page has more information
about Solr caches and what they are used for.

As for the data, I think you may want to index data that is similar
(or even the same) to what you will have in production.
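For illustration, the caches in question live in solrconfig.xml. A minimal sketch — the size and autowarm numbers below are arbitrary placeholders, not recommendations:

```xml
<!-- solrconfig.xml (inside the <query> section): caches that make
     repeated filters/sorts/queries cheaper on the second run -->
<filterCache      class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="0"/>
<documentCache    class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="0"/>
```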

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

> [quoted message trimmed]






Re: indexing with DIH (and with problems)

2012-02-10 Thread alessio crisantemi
I have problems with the full-import query:
no results.

I will search the log files and write again afterwards.
Thanks,
alessio
2012/2/9 alessio crisantemi 

> hi all,
> I would like to index into Solr my PDF files, which are in my directory
> c:\myfile\
>
> So I added the file data-config.xml to my solr/conf directory, like the
> following:
>
>
> [data-config.xml, with most tags stripped by the archive; the surviving
> fragments show a FileListEntityProcessor entity with baseDir="c:\myfile\"
> fileName="*.pdf" recursive="true", wrapping an entity that reads
> url="${f.fileAbsolutePath}" with format="text"]
>
> Before that, I added this part to solrconfig.xml (tags partly stripped by
> the archive; handler name inferred from the /dataimport URL):
>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">c:\solr\conf\data-config.xml</str>
>   </lst>
> </requestHandler>
>
>
> but this is the result:
>
> [DIH status response, element names stripped by the archive:]
> delta-import
> idle
> 0:0:2.512
> 0 / 0 / 0 / 0
> 2012-02-09 23:37:07
> Indexing failed. Rolled back all changes.
> 2012-02-09 23:37:07
> This response format is experimental. It is likely to change in the future.
>
> suggestions?
> thanks
> alessio
>


RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Ahmet Arslan
Hi Zac,

The Field Analysis tool (analysis.jsp) does not perform actual query parsing.

One thing to be aware of when using the keyword tokenizer at query time: the query 
string (chicken stock) is pre-tokenized on whitespace before it reaches the 
keyword tokenizer.

If you use quotes ("chicken stock"), the query parser does not pre-tokenize, though.
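As for the nested-query attempt in the original mail: the _query_ hook needs its sub-query in quotes, plus a local-params parser that hands the raw value to the field's own analyzer. A sketch, not tested here (field name taken from the original mail):

```
q=fish OR _query_:"{!field f=ingredient_synonyms}chicken stock"
```

The {!field} parser passes the whole value through the field's analyzer, so the phrase reaches the keyword tokenizer intact instead of being split on whitespace first.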

--- On Fri, 2/10/12, Zac Smith  wrote:

> [quoted message trimmed]


Re: indexing with DIH (and with problems)

2012-02-10 Thread Chantal Ackermann


On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote:
> hi all,
> I would index on solr my pdf files wich includeds on my directory c:\myfile\
> 
> so, I add on my solr/conf directory the file data-config.xml like the
> following:
> [...]
> 0

DIH hasn't even retrieved any data from your data source. Check that the
call you have configured really returns any documents.


Chantal




> [rest of quoted message trimmed]



Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Hello,

we use Solr 3.5 and Tika to index a lot of PDFs. The content of those PDFs
is searchable via a full-text search.
Also the terms are used to make search suggestions.

Unfortunately pdfbox seems to insert a space character when there are
soft-hyphens in the content of the PDF.
Thus the extracted text is sometimes very fragmented. For example the word
Medizin is extracted as Me di zin.
As a consequence the suggestions are often unusable and the search does not
work as expected.

Has anyone a suggestion for how to extract the content of PDFs containing
soft-hyphens without fragmenting it?

Best
Dirk


Re: Sorting solrdocumentlist object after querying

2012-02-10 Thread Kashif Khan
hey Tommaso,

That result grouping happens during the query, but I want to sort the
SolrDocumentList after it has been queried and I have injected a few SolrDocuments
into it. Thus I want this SolrDocumentList to be sorted based on the fields I
specify; I cannot query Solr for result grouping because those injected
documents are not available in that Solr, and result grouping is also not
working with multiple shards.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-solrdocumentlist-object-after-querying-tp3726303p3732120.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do i do group by in solr with multiple shards?

2012-02-10 Thread Kashif Khan
Hi Erick,

I have tried grouping with and without shards using Solr 3.3. I know Solr
3.3 does not support grouping with multiple shards. We have been waiting for
3.5.0; now that it is available, we will try with that.

The reason I am looking for grouping is posted in this link. Please advise
me on how I can achieve that in the custom request handler. We have very little
time to do R&D on this, so I had to post here.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-i-do-group-by-in-solr-with-multiple-shards-in-solr-3-3-tp3728555p3732148.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr / Tika Integration

2012-02-10 Thread Shairon Toledo
Hi,
Maybe the PDF creator tool is not generating "fluid" text; a PDF has
sections defined by objects, e.g. for "Medizin":

20 0 obj
(Medizin)
endobj

However this can happen

20 0 obj
(Me)
endobj

21 0 obj
(di)
endobj

22 0 obj
(zin)
endobj

See, there are 3 text objects; the extraction tool can interpret them
as 3 words.
Check your PDF file to make sure that it's well-formed.



On Fri, Feb 10, 2012 at 8:21 AM, Dirk Högemann <
dirk.hoegem...@googlemail.com> wrote:

> [quoted message trimmed]



-- 
[ ]'s
Shairon Toledo
http://www.google.com/profiles/shairon.toledo


Re: Solr / Tika Integration

2012-02-10 Thread Jan Høydahl
I think you need to control the parameter "enableAutoSpace" in PDFBox. There's 
a JIRA for it, but it depends on some Tika 1.1 stuff, as far as I can understand:

https://issues.apache.org/jira/browse/SOLR-2930

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. feb. 2012, at 11:21, Dirk Högemann wrote:

> [quoted message trimmed]



Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Thanks so far. I will have a closer look at the PDF.

I tried the enableAutoSpace setting with PDFBox 1.6 - it did not work:

PDFParser parser = new PDFParser();
parser.setEnableAutoSpace(false);
ContentHandler handler = new BodyContentHandler();

Output:
Va ri an te Creutz feldt-
Ja kob-Krank heit
Stel lung nah men des Ar beits krei ses Blut

Our suggest component and parts of our search are getting hard to use because
of this. Any other ideas?

Best
Dirk


2012/2/10 Jan Høydahl 

> [quoted message trimmed]


Re: Solr / Tika Integration

2012-02-10 Thread Robert Muir
On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann
 wrote:
>
> Our suggest component and parts of our search is getting hard to use by
> this. Any other ideas?
>

Looks like https://issues.apache.org/jira/browse/PDFBOX-371

The title of the issue is a bit confusing (I don't think it should go
to a hyphen either!), but I think it's the reason it's being mapped to a
space.

-- 
lucidimagination.com


Hi

2012-02-10 Thread sumal
I am Sumal, working as a Software Engineer. Currently I am developing web-based
e-commerce applications using Java, and I am using the KonaKart shopping cart
(Community Edition). I am kindly requesting some information about how to
integrate Solr into my KonaKart application.

If that is not possible, can you send me a sample application which uses the
Solr search engine - for example a small JSP page that performs a search
using Solr?

Thank you!

Best Regards,

 -Sumal Wattegedara-



Re: Hi

2012-02-10 Thread Dalius Sidlauskas
Hi, I don't think this is the right place for this question. You should 
follow the samples of Solr client API integration in Java and develop your 
own integration in KonaKart.


Regards!
Dalius Sidlauskas


On 10/02/12 08:25, sumal wrote:

[quoted message trimmed]




Tokenize result of a NGramFilterFactory in Solr (query analyzer)

2012-02-10 Thread Mathias Hodler
Hi,

I'm using the NGramFilterFactory for indexing and querying.

So if I search for "overflow" it creates a query like this:

mySearchField:"ov ve ... erflow overflo verflow overflow"

But if I misspell "overflow", e.g. "owerflow", there are no matches
because of the quotes around the query:

mySearchField:"ow we ... erflow owerflo werflow owerflow"

Is it possible to tokenize the result of the NGramFilterFactory so that
it creates a query like this:

mySearchField:"ow"
mySearchField:"we"
mySearchField:"erflow"
mySearchField:"owerflo"
mySearchField:"werflow"
mySearchField:"owerflow"

In this case Solr would find results, because the token "erflow" exists.


Re: Tokenize result of a NGramFilterFactory in Solr (query analyzer)

2012-02-10 Thread Ahmet Arslan
> [quoted message trimmed]

[filter element stripped by the archive; from context, likely
solr.PositionFilterFactory] should work for you.
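For reference, the query-side fix usually suggested for this in Solr 3.x (stated as an assumption here, since the archive stripped the exact element from the reply) is solr.PositionFilterFactory. A sketch, with placeholder gram sizes:

```xml
<!-- schema.xml sketch; minGramSize/maxGramSize are placeholder values -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="8"/>
  <!-- PositionFilterFactory gives every gram a position increment of 0, so the
       query parser treats the grams as alternatives at one position instead of
       building a phrase query out of them -->
  <filter class="solr.PositionFilterFactory"/>
</analyzer>
```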


Re: Tokenize result of a NGramFilterFactory in Solr (query analyzer)

2012-02-10 Thread Mathias Hodler
Hi Ahmet,

awesome! Now it works.

2012/2/10 Ahmet Arslan :
>> [quoted message trimmed]


Re: Geospatial search with multivalued field

2012-02-10 Thread Marian Steinbach
2012/2/9 Mikhail Khludnev :
> Some time ago I tested backported patch from
> https://issues.apache.org/jira/browse/SOLR-2155
> it works.

OK, I would do that. But...

Against which version can/should I apply the patch? (I am not
restricted by other requirements so far.)

Then I tried both with the trunk and with 3.4.0-src, but I can't even
find the files the patch wants to modify. Some are moved, but others
don't exist.

Some of the files mentioned in the patch can be mapped to new locations:
solr/src/java/org/apache/solr/search/QParserPlugin.java
=> solr/core/src/java/org/apache/solr/search/QParserPlugin.java

solr/src/test/org/apache/solr/search/SpatialFilterTest.java
=> solr/core/src/test/org/apache/solr/search/SpatialFilterTest.java

lucene/contrib/spatial/src/java/org/apache/lucene/spatial/geometry/shape/MultiGeom.java
=> not found at all.

This was all for "SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch".

For the file "Solr2155-for-1.0.2-3.x-port.patch", similar things
happen. E.g. "HaversineMultiConstFunction.java" isn't anywhere in the
3.4.0 src tarball.

What would I have to do in order to get any Solr version patched with 2155?

Thanks!

Marian


SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
I know that the latest Solr Cloud doesn't use standard replication but
I have a question about how it appears to be working.  I currently
have the following cluster state

{"collection1":{
"slice1":{
  "JamiesMac.local:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"state":"active",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"state":"active",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "jamiesmac:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"state":"down",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"jamiesmac:8501_solr",
"base_url":"http://jamiesmac:8501/solr"},
  "jamiesmac:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"leader":"true",
"state":"active",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"jamiesmac:8502_solr",
"base_url":"http://jamiesmac:8502/solr"}},
"slice2":{
  "JamiesMac.local:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"active",
"core":"slice2_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "jamiesmac:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"down",
"core":"slice2_shard2",
"collection":"collection1",
"node_name":"jamiesmac:8501_solr",
"base_url":"http://jamiesmac:8501/solr"},
  "jamiesmac:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"leader":"true",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"jamiesmac:8502_solr",
"base_url":"http://jamiesmac:8502/solr"

I then added some docs to the following shards using SolrJ
http://localhost:8502/solr/slice2_shard1
http://localhost:8502/solr/slice1_shard2

I then bring back up the other cores and I don't see replication
happening.  Looking at the stats for each core I see that on the 8501
instance (the instance that was off) the number of docs is 0, so I've
clearly set something up incorrectly.  Any help on this would be
greatly appreciated.


Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Can you explain a little more how you are doing this? How are you bringing the 
cores down and then back up? Shutting down a full Solr instance, or unloading the 
core?

On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:

> [quoted message trimmed]

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Sorry, I shut down the full solr instance.

On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
> Can you explain a little more how you doing this? How are you bringing the 
> cores down and then back up? Shutting down a full solr instance, unloading 
> the core?
>
> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>
>> I know that the latest Solr Cloud doesn't use standard replication but
>> I have a question about how it appears to be working.  I currently
>> have the following cluster state
>>
>> {"collection1":{
>>    "slice1":{
>>      "JamiesMac.local:8501_solr_slice1_shard1":{
>>        "shard_id":"slice1",
>>        "state":"active",
>>        "core":"slice1_shard1",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8501_solr",
>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>      "JamiesMac.local:8502_solr_slice1_shard2":{
>>        "shard_id":"slice1",
>>        "state":"active",
>>        "core":"slice1_shard2",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8502_solr",
>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>      "jamiesmac:8501_solr_slice1_shard1":{
>>        "shard_id":"slice1",
>>        "state":"down",
>>        "core":"slice1_shard1",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8501_solr",
>>        "base_url":"http://jamiesmac:8501/solr"},
>>      "jamiesmac:8502_solr_slice1_shard2":{
>>        "shard_id":"slice1",
>>        "leader":"true",
>>        "state":"active",
>>        "core":"slice1_shard2",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8502_solr",
>>        "base_url":"http://jamiesmac:8502/solr"}},
>>    "slice2":{
>>      "JamiesMac.local:8501_solr_slice2_shard2":{
>>        "shard_id":"slice2",
>>        "state":"active",
>>        "core":"slice2_shard2",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8501_solr",
>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>      "JamiesMac.local:8502_solr_slice2_shard1":{
>>        "shard_id":"slice2",
>>        "state":"active",
>>        "core":"slice2_shard1",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8502_solr",
>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>      "jamiesmac:8501_solr_slice2_shard2":{
>>        "shard_id":"slice2",
>>        "state":"down",
>>        "core":"slice2_shard2",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8501_solr",
>>        "base_url":"http://jamiesmac:8501/solr"},
>>      "jamiesmac:8502_solr_slice2_shard1":{
>>        "shard_id":"slice2",
>>        "leader":"true",
>>        "state":"active",
>>        "core":"slice2_shard1",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8502_solr",
>>        "base_url":"http://jamiesmac:8502/solr"}}}}
>>
>> I then added some docs to the following shards using SolrJ
>> http://localhost:8502/solr/slice2_shard1
>> http://localhost:8502/solr/slice1_shard2
>>
>> I then bring back up the other cores and I don't see replication
>> happening.  Looking at the stats for each core I see that on the 8501
>> instance (the instance that was off) the number of docs is 0, so I've
>> clearly set something up incorrectly.  Any help on this would be
>> greatly appreciated.
>
> - Mark Miller
> lucidimagination.com


Re: correct usage of StreamingUpdateSolrServer?

2012-02-10 Thread Erick Erickson
Can you post the code? SUSS should essentially be a drop-in
replacement for CHSS.

It's not advisable to commit after every add; it's usually better
to use commitWithin, and perhaps commit once at the very end of
the run.

Best
Erick

On Thu, Feb 9, 2012 at 4:00 PM, T Vinod Gupta  wrote:
> Hi,
> I wrote a hello world program to add documents to solr server. When I
> use CommonsHttpSolrServer, the program exits but when I
> use StreamingUpdateSolrServer, the program never exits. And I couldn't find
> a way to close it? Are there any best practices here? Do I have to do
> anything differently at the time of documents adds/updates when
> using StreamingUpdateSolrServer? I am following the add/commit cycle. Is
> that ok?
>
> thanks


Re: Range facet - Count in facet menu != Count in search results

2012-02-10 Thread Yuhao
Jay,

Was the curly closing bracket "}" intentional?  I'm using 3.4, which also 
supports "fq=price:[10 TO 20]".  The problem is the results are not working 
properly.





 From: Jan Høydahl 
To: solr-user@lucene.apache.org; Yuhao  
Sent: Thursday, February 9, 2012 7:45 PM
Subject: Re: Range facet - Count in facet menu != Count in search results
 
Hi,

If you use trunk (4.0) version, you can say fq=price:[10 TO 20} and have the 
upper bound be exclusive.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. feb. 2012, at 00:58, Yuhao wrote:

> I've changed the "facet.range.include" option to every possible value (lower, 
> upper, edge, outer, all)**.  It only changes the count shown in the "Ranges" 
> facet menu on the left.  It has no effect on the count and results shown in 
> search results, which ALWAYS is inclusive of both the lower AND upper bounds 
> (which is equivalent to "include = all").  Is this by design?  I would like 
> to make the search results include the lower bound, but not the upper bound.  
> Can I do that?
> 
> My range field is multi-valued, but I don't think that should be the problem.
> 
> ** Actually, it doesn't like "outer" for some reason, which leaves the facet 
> completely empty.

Re: Re: solr search speed is so slow.

2012-02-10 Thread Erick Erickson
Please re-read Hoss' response. There is no need to warm all queries; that
would be very slow for autowarming, and you quickly reach a point of
diminishing returns.

Best
Erick
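For reference, a minimal solrconfig.xml sketch of the kind of targeted warming Hoss and Erick are describing. The query and handler name are placeholders, not taken from Rong's actual config:

```xml
<!-- Warm only a handful of genuinely common queries on each new searcher,
     instead of trying to pre-run every possible query. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">drugs</str>       <!-- a representative frequent query -->
      <str name="qt">search</str>     <!-- placeholder handler name -->
    </lst>
  </arr>
</listener>
```

The first cold query after a commit pays for loading index structures into the OS and Solr caches; warming a few popular queries moves that cost off the user-facing request path.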

2012/2/9 Rong Kang :
> Thanks for your reply.
>
> I didn't use any other params except q (for example
> http://localhost:8080/solr/search?q=drugs): no facet, no sort.
> I don't think configuring newSearcher or firstSearcher will help, because I want
> every query to be fast. Do you have another solution?
> I think 460ms is too slow even when a word is searched for the first time.
>
>
> My computer's setup:
> cpu: AMD 5000, 2.2GHz, 1 CPU with 2 cores.
> main memory: 2GB, 800MHz
> disk drive: 7200 rpm
>
> This is my  full search configuration:
>
>
>   class="org.apache.solr.handler.component.SearchHandler">
>       
>          xslt
>          dismaxdoc.xsl
>          -1
>          all
>          off
>           filename
>          10
>          dismax
>           filename^5.0 text^1.5
>          *:*
>          on
>          filename text
>         true
>       
> 
> 100
> 100
>          filename
>          100
>          3
>
>       
>  
>
>
> and my schema.xml
>
>
>  
>        termVectors="true" termPositions="true" termOffsets="true"/>
>        required="true" termVectors="true" termPositions="true" termOffsets="true"/>
>       
>  
>  text
>  id
>  
>
>
> and
>
>
>  positionIncrementGap="100">
>      
>        
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" 
> splitOnCaseChange="1"/>
>        
>        
>      
>      
>        
>         ignoreCase="true" expand="true"/>
>         generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" 
> splitOnCaseChange="1"/>
>        
>        
>      
>    
> 
>      
>        
>         words="stopwords.txt"/>
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" 
> splitOnCaseChange="1"/>
>        
>        
>      
>      
>        
>         ignoreCase="true" expand="true"/>
>         words="stopwords.txt"/>
>         generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" 
> splitOnCaseChange="1"/>
>        
>
>        
>      
>    
>
>
> At 2012-02-10 11:49:39,"Chris Hostetter"  wrote:
>>
>>: When I first search one word in solr . its response time is 460ms. When
>>: I search the same word the second time. its response time is under 70ms.
>>: I can't tolerate 460ms . Does anyone know how to improve performance?
>>
>>tell us more about the query itself -- what params did you use?  did you
>>sort? did you facet?
>>
>>(the only info you've given us so far is what defaults you configured in
>>your handler, but not what params you used at query time)
>>
>>
>>: and my search configuration
>>:      dismax
>>:            filename^5.0 text^1.5
>>:
>>:
>>:           *:*
>>:           on
>>:           filename text
>>:  true
>>: 
>>: 
>>: 100
>>:           filename
>>:           3
>>
>>-Hoss


RE: Index Start Question

2012-02-10 Thread Hoffman, Chase
Erick,

Thanks for the suggestion. I think we're going to go that route.

Best,

--Chase

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, February 09, 2012 12:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Start Question

Hmmm. You say:

"The DBA opens a command line prompt and initiates an index build/rebuild"

How? By issuing a curl command? Running a program? It seems to me that the
easiest thing to do here would be to create a small program that kicks
off the indexing process and have *that* program send the e-mails when
it starts and perhaps a completion e-mail after it's done.

Seems a lot surer than trying to infer the action from the Solr logs...

Best
Erick

On Thu, Feb 9, 2012 at 10:43 AM, Hoffman, Chase  wrote:
> Erick,
>
> My understanding of the process is this:
>
> 1. The DBA opens a command line prompt and initiates an index build/rebuild
> 2. SOLR performs said index build/rebuild
> 3. Index finishes
>
> I don't think we're appending documents to the SOLR index - it's indexing 
> MSSQL tables.  The servers these are running on aren't beefy enough to run 
> multiple SOLR index builds at the same time.  So the hope is to find some key 
> in the logs that shows the start of the index rebuild so that I can put in 
> some automation to blast out an email saying "Server X is currently running 
> an index, do not kick off an index run on Server X".
>
> Thanks so much for your help.
>
> Best,
>
> --Chase
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, February 09, 2012 9:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index Start Question
>
> OK, what do you mean by "index is kicked off"? You mean starting Solr or 
> actually adding a document to a running Solr?
>
> If the latter, you're probably looking for something like this:
> Feb 9, 2012 10:34:26 AM
> org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {add=[eoe32]} 0 6
>
> The important bits are solr.update.processor and the add=blahblah bit where 
> the stuff after the = will be a list of s for the document(s) 
> added.
>
> However, this will be somewhat fragile, the format of the logged messages is 
> not guaranteed in future versions.
>
> Although this is happening, I think, after the doc has been added to the 
> index, so it may be too late for your problem.
>
> Best
> Erick
>
> On Wed, Feb 8, 2012 at 3:13 PM, Hoffman, Chase  wrote:
>> Please forgive me if this is a dumb question.  I've never dealt with SOLR 
>> before, and I'm being asked to determine from the logs when a SOLR index is 
>> kicked off (it is a Windows server).  The TOMCAT service runs continually, 
>> so no love there.  In parsing the logs, I think 
>> "org.apache.solr.core.SolrResourceLoader " is the indicator, since 
>> "org.apache.solr.core.SolrCore execute" seems to occur even when I know an 
>> index has not been started.
>>
>> Any advice you could give me would be wonderful.
>>
>> Best,
>>
>> --Chase
>>
>> Chase Hoffman
>> Infrastructure Systems Administrator, Performance Technologies The
>> Advisory Board Company
>> 512-681-2190 direct | 512-609-1150 fax
>> hoffm...@advisory.com |
>> www.advisory.com
>>
>



Re: Range facet - Count in facet menu != Count in search results

2012-02-10 Thread Erick Erickson
I'll answer for Jan: "Yes". Prior to 4.0, you cannot mix
inclusive and exclusive operators in a range query; see
https://issues.apache.org/jira/browse/SOLR-355. If you
can't go to 4.0, you can cheat and make, say, your top
value a tiny bit less than the boundary: for an int-based
field, instead of [1 TO 20] use [1 TO 19]; for a float field,
[1 TO 19.99] or some such.

Faceting is not really related to what's in the results
list in terms of counts, etc.; it's just a way of counting
buckets. Changing the faceting parameters will not change
the displayed results, which is what it appears you're
expecting.

Best
Erick
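A tiny standalone sketch of Erick's workaround (plain Java, no Solr needed; values are made up): on 3.x, where `[10 TO 20]` is inclusive on both ends, filtering an int field with `[10 TO 19]` matches the same documents as the 4.0-style exclusive upper bound `[10 TO 20}`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeBounds {
    // Inclusive range match on an int field, like a 3.x fq: price:[lo TO hi]
    static List<Integer> inclusiveRange(List<Integer> prices, int lo, int hi) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int p : prices) {
            if (p >= lo && p <= hi) hits.add(p);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(5, 10, 15, 19, 20, 25);

        // price:[10 TO 20] is inclusive of both ends on 3.x:
        System.out.println(inclusiveRange(prices, 10, 20)); // [10, 15, 19, 20]

        // Shrinking the top bound emulates 4.0's exclusive price:[10 TO 20}:
        System.out.println(inclusiveRange(prices, 10, 19)); // [10, 15, 19]
    }
}
```

Note the cheat only works cleanly when the field's granularity is known (ints, or floats with a fixed number of decimal places).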

On Fri, Feb 10, 2012 at 10:45 AM, Yuhao  wrote:
> Jay,
>
> Was the curly closing bracket "}" intentional?  I'm using 3.4, which also 
> supports "fq=price:[10 TO 20]".  The problem is the results are not working 
> properly.
>
>
>
>
> 
>  From: Jan Høydahl 
> To: solr-user@lucene.apache.org; Yuhao 
> Sent: Thursday, February 9, 2012 7:45 PM
> Subject: Re: Range facet - Count in facet menu != Count in search results
>
> Hi,
>
> If you use trunk (4.0) version, you can say fq=price:[10 TO 20} and have the 
> upper bound be exclusive.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 10. feb. 2012, at 00:58, Yuhao wrote:
>
>> I've changed the "facet.range.include" option to every possible value 
>> (lower, upper, edge, outer, all)**.  It only changes the count shown in the 
>> "Ranges" facet menu on the left.  It has no effect on the count and results 
>> shown in search results, which ALWAYS is inclusive of both the lower AND 
>> upper bounds (which is equivalent to "include = all").  Is this by design?  
>> I would like to make the search results include the lower bound, but not the 
>> upper bound.  Can I do that?
>>
>> My range field is multi-valued, but I don't think that should be the problem.
>>
>> ** Actually, it doesn't like "outer" for some reason, which leaves the facet 
>> completely empty.


How to define field type

2012-02-10 Thread Torlaf15
Hello,

I hope someone can help me.

I have several documents with the fields content, author, ... indexed.
Now I would like to make a faceted search.

My exact problem is the following:
as a result (SolrResponse) for a query I get: facet_fields={author=
{firstname=1, surname=1}}.

Instead, I would like to get whole author names as facet values,
like this: facet_fields={author={firstname surname=1}}

I tried changing the type of the author field to string, but then it is no
longer possible to search the content of this field.

I hope someone can give me some advice.

Many thanks for your time
Toralf

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-define-field-type-tp3732986p3732986.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
The interesting thing is that the only tool I found that handles my PDF
correctly was pdftotext.


2012/2/10 Robert Muir 

> On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann
>  wrote:
> >
> > Our suggest component and parts of our search is getting hard to use by
> > this. Any other ideas?
> >
>
> Looks like https://issues.apache.org/jira/browse/PDFBOX-371
>
> The title of the issue is a bit confusing (I don't think it should go
> to a hyphen either!), but I think it's the reason it's being mapped to a
> space.
>
> --
> lucidimagination.com
>


(Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
Was there a fix recently to address sorting issues for Dates in solr
cloud?  On my cluster I have a date field which when I sort across the
cluster I get incorrect order executing the following query I get

solr/select?distrib=true&q=paul&sort=datetime_dt%20desc&fl=datetime_dt


  
2009-10-31T16:48:10Z
  
  
2009-10-30T20:52:23Z
  
  
2009-10-27T03:28:35Z
  
  
2009-10-30T00:47:11Z
  
...

if distrib is set to false, i.e
solr/select?distrib=false&q=paul&sort=datetime_dt%20desc&fl=datetime_dt


  
2009-10-26T04:39:51Z
  
  
2009-10-24T23:24:30Z
  
  
2009-10-24T10:53:58Z
  
  
2009-10-23T19:14:01Z
  
  
2009-10-19T03:15:24Z
  

Again, I have not noticed this on trunk, but I'm working with a much
smaller data set so it's tough to say for sure right now


Re: Range facet - Count in facet menu != Count in search results

2012-02-10 Thread Darren Govoni
Double check your default operator for a faceted search vs. regular
search. I caught this difference in my work that explained this
difference.

On Fri, 2012-02-10 at 07:45 -0800, Yuhao wrote:
> Jay,
> 
> Was the curly closing bracket "}" intentional?  I'm using 3.4, which also 
> supports "fq=price:[10 TO 20]".  The problem is the results are not working 
> properly.
> 
> 
> 
> 
> 
>  From: Jan Høydahl 
> To: solr-user@lucene.apache.org; Yuhao  
> Sent: Thursday, February 9, 2012 7:45 PM
> Subject: Re: Range facet - Count in facet menu != Count in search results
>  
> Hi,
> 
> If you use trunk (4.0) version, you can say fq=price:[10 TO 20} and have the 
> upper bound be exclusive.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
> 
> On 10. feb. 2012, at 00:58, Yuhao wrote:
> 
> > I've changed the "facet.range.include" option to every possible value 
> > (lower, upper, edge, outer, all)**.  It only changes the count shown in the 
> > "Ranges" facet menu on the left.  It has no effect on the count and results 
> > shown in search results, which ALWAYS is inclusive of both the lower AND 
> > upper bounds (which is equivalent to "include = all").  Is this by design?  
> > I would like to make the search results include the lower bound, but not 
> > the upper bound.  Can I do that?
> > 
> > My range field is multi-valued, but I don't think that should be the 
> > problem.
> > 
> > ** Actually, it doesn't like "outer" for some reason, which leaves the 
> > facet completely empty.




Re: How to define field type

2012-02-10 Thread Erick Erickson
Typically this is handled by defining
a second field of type string and using
copyField to copy from author to this
new field, say, author_facet.

Then do your facets on author_facet
but do searches on author.


Best
Erick
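A hedged schema.xml sketch of Erick's suggestion (field and type names are placeholders, not from Toralf's actual schema):

```xml
<!-- Analyzed field for searching; untokenized string copy for faceting -->
<field name="author"       type="text_general" indexed="true" stored="true"/>
<field name="author_facet" type="string"       indexed="true" stored="false"/>
<copyField source="author" dest="author_facet"/>
```

Then facet with facet.field=author_facet while still running queries against author, so each facet value is the whole, unanalyzed author name.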

On Fri, Feb 10, 2012 at 11:19 AM, Torlaf15  wrote:
> Hello,
>
> I hope someone can help me.
>
> I have several documents with the fields content, author, ... indexed.
> Now I would like to make a faceted search.
>
> The exact problem is with me following:
> As a result (SolrResponse) for query I get: facet_fields= {author =
> {first name=1, surname = 1}}.
>
> As a result I would like to get the whole authors' names .
> Like this:.facet_fields={author={firstname surname=1}}
>
> I tried to change the type of field author to string. But then it is no
> longer possible to search the content of this field.
>
> I hope someone give me some advice.
>
> Many thanks for your time
> Toralf
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-define-field-type-tp3732986p3732986.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Yonik Seeley
On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson  wrote:
> Was there a fix recently to address sorting issues for Dates in solr
> cloud?  On my cluster I have a date field which when I sort across the
> cluster I get incorrect order executing the following query I get

Yikes!  There haven't been any fixes recently that I know of.
What version of Solr is this?

-Yonik
lucidimagination.com


Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
This is a snapshot of the solrcloud branch from somewhere between a
year and six months ago (can't really remember off hand) with some
custom components; I'm thinking that the custom components may be
messing something up. I'm removing them now to test without them and
make sure the issue is on my end. Will report shortly.

On Fri, Feb 10, 2012 at 12:16 PM, Yonik Seeley
 wrote:
> On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson  wrote:
>> Was there a fix recently to address sorting issues for Dates in solr
>> cloud?  On my cluster I have a date field which when I sort across the
>> cluster I get incorrect order executing the following query I get
>
> Yikes!  There haven't been any fixes recently that I know of.
> What version of Solr is this?
>
> -Yonik
> lucidimagination.com


Re: Geospatial search with multivalued field

2012-02-10 Thread Mikhail Khludnev
Marian,

Sorry, I completely forgot to mention.
Please check David's instructions:
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13117350&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13117350

The patch you tried to use is just my amendment to David's zip. According
to his comment, it's already in Solr2155-1.0.3-project.zip.

Regards

On Fri, Feb 10, 2012 at 5:32 PM, Marian Steinbach  wrote:

> 2012/2/9 Mikhail Khludnev :
> > Some time ago I tested backported patch from
> > https://issues.apache.org/jira/browse/SOLR-2155
> > it works.
>
> OK, I would do that. But...
>
> Against which version can/should I apply the patch? (I am not
> restricted by other requirements so far.)
>
> Then I tried both with the trunk and with 3.4.0-src, but I can't even
> find the files the patch wants to modify. Some are moved, but others
> don't exist.
>
> Some of the files mentioned in the patch can be
> solr/src/java/org/apache/solr/search/QParserPlugin.java
> => solr/core/src/java/org/apache/solr/search/QParserPlugin.java
>
> solr/src/test/org/apache/solr/search/SpatialFilterTest.java
> => solr/core/src/test/org/apache/solr/search/SpatialFilterTest.java
>
>
> lucene/contrib/spatial/src/java/org/apache/lucene/spatial/geometry/shape/MultiGeom.java
> => not found at all.
>
> This was all for
> "SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch".
>
> For the file "Solr2155-for-1.0.2-3.x-port.patch", similar things
> happen. E.g. "HaversineMultiConstFunction.java" isn't anywhere in the
> 3.4.0 src tarball.
>
> What would I have to do in order to get any Solr version patched with 2155?
>
> Thanks!
>
> Marian
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: How to define field type

2012-02-10 Thread Torlaf15
Hi,

that sounds very good.

Thank you
Toralf

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-define-field-type-tp3732986p3733350.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Empty results with OR filter query

2012-02-10 Thread Steven Ou
For anyone having this issue in the future:

I managed to narrow it down to Solr-RA 3.5. Installing Solr 3.5 solved the
issue. I don't really know how the internals of Solr-RA work, but it
appears that it was using AND operators even when I explicitly used OR
operators in the query. The other solution was to set defaultOperator to
OR, but I wasn't sure how this would affect my other queries. Would
explicit AND operators now become OR operators?

Anyway, thanks to Erik for helping me troubleshoot this!
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880


On Thu, Feb 9, 2012 at 5:14 PM, Erik Hatcher  wrote:

>
> On Feb 9, 2012, at 20:11 , Steven Ou wrote:
>
> > Sorry, what do you mean "explicit category rather than boolean
> expression"?
>
> q=category_ids_im:634 for example.  Just to get an idea of what matches
> each category.
>
> > Type was not changed midstream - hasn't really been changed ever, really.
> > And I happen to have *just* reindexed, too.
> >
> > Don't seem to have a default operator set. Not sure how to do it,
> either...?
>
> Look at Solr's example schema.xml.  It'll have it spelled out there.
>
>Erik
>
>
> > --
> > Steven Ou | 歐偉凡
> >
> > *ravn.com* | Chief Technology Officer
> > steve...@gmail.com | +1 909-569-9880
> >
> >
> > On Thu, Feb 9, 2012 at 5:01 PM, Erik Hatcher 
> wrote:
> >
> >> Extremely odd.
> >>
> >> Hmmm... other things to try:
> >>
> >> * query on an explicit category, rather than in a boolean expression
> >> * try a different field type than sint (say just int, or string)
> >> * shouldn't matter (since you're using "OR" explicitly) but double check
> >> the default operator in schema.xml
> >> * reindex (was the field type ever changed mid-stream?)
> >>
> >> Definitely something fishy here.  Nothing obvious pops out yet.
> >>
> >>   Erik
> >>
> >>
> >> On Feb 9, 2012, at 19:53 , Steven Ou wrote:
> >>
> >>> Actually, I take that back. Using q instead of fq still produces the
> same
> >>> problem. Somehow it's *less* inconsistent so at first glance it looked
> >> like
> >>> it fixed it. However, it does *not* fix it :(
> >>> --
> >>> Steven Ou | 歐偉凡
> >>>
> >>> *ravn.com* | Chief Technology Officer
> >>> steve...@gmail.com | +1 909-569-9880
> >>>
> >>>
> >>> On Thu, Feb 9, 2012 at 4:48 PM, Steven Ou  wrote:
> >>>
>  Well, keeping all other filter queries the same, changing fq=
>  category_ids_im:(637+OR+639) to fq=category_ids_im:(637+OR+639+OR+634)
>  causes results to not show up.
> 
>  In fact, I took out *all* other filter queries. And while I wasn't
> able
>  to reproduce it exactly, nonetheless when I added the third category
> id
> >> the
>  number of results *went down*. Which is consistently inconsistent, per
>  se. Adding an OR cannot, logically, reduce the number of results!
>  --
>  Steven Ou | 歐偉凡
> 
>  *ravn.com* | Chief Technology Officer
>  steve...@gmail.com | +1 909-569-9880
> 
> 
> 
>  On Thu, Feb 9, 2012 at 4:39 PM, Erik Hatcher  >>> wrote:
> 
> > Yes, certainly should work fine as a filter query... I was merely
> >> trying
> > to eliminate variables from the equation.  You've got several filters
> >> and a
> > q=*:* going on below, so it's obviously harder to pinpoint what could
> >> be
> > going wrong.  I suggest continuing to eliminate variables here, as
> more
> > than likely some other filter is causing the documents you think
> should
> > appear to be filtered out.
> >
> >  Erik
> >
> >
> >
> > On Feb 9, 2012, at 19:24 , Steven Ou wrote:
> >
> >> By turning fq=category_ids_im:(637+OR+639+OR+634) to
> >> q=category_ids_im:(637+OR+639+OR+634)
> >> it appears to produce the correct results. But... that doesn't seem
> to
> > make
> >> sense to me? Shouldn't it work just fine as a filter query?
> >> --
> >> Steven Ou | 歐偉凡
> >>
> >> *ravn.com* | Chief Technology Officer
> >> steve...@gmail.com | +1 909-569-9880
> >>
> >>
> >> On Thu, Feb 9, 2012 at 4:20 PM, Steven Ou 
> wrote:
> >>
> >>> I don't really know how to analyze the debug output... Here it is
> for
> > the
> >>> full query I'm running, which includes other filter queries.
> >>>
> >>> 
> >>> *:*
> >>> *:*
> >>> MatchAllDocsQuery(*:*)
> >>> *:*
> >>> 
> >>> LuceneQParser
> >>> 
> >>> type:Event
> >>> displayable_b:true
> >>> category_ids_im:(637 OR 639 OR 634)
> >>> end_datetime_dt:[2012\-02\-10T00\:17\:52Z TO *]
> >>> {!geofilt}
> >>> 
> >>> 
> >>> type:Event
> >>> displayable_b:true
> >>> 
> >>> category_ids_im:637 category_ids_im:639 category_ids_im:634
> >>> 
> >>> end_datetime_dt:[1328833072000 TO *]
> >>> 
> >>>
> >>>
> >
> >>
> SpatialDistanceQuery(geofilt(latlonSource=coordinates_lls(double(coordinates_lls_0_coordinate),double(coordinates_

SolrJ and INFO level logging

2012-02-10 Thread Shawn Heisey
In SolrJ, when using CommonsHttpSolrServer, SolrJ doesn't log anything 
at or below the INFO level.  When I have the logging turned on at that 
level, I only see log messages that I have placed within my own code.  
If I log at DEBUG, then I do see SolrJ log messages.


When I switched to StreamingUpdateSolrServer for indexing, suddenly I 
began to see a few INFO messages from SolrJ.  The code is under 
development and not currently in a runnable state.  I can't get the 
actual log messages, so I apologize.


Which is the incorrect behavior at the INFO level - CHSS logging 
nothing, or SUSS generating logs?


Thanks,
Shawn



Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
It looks like everything works fine without my custom component, which
is good for Solr, bad for me.  The custom component does some
additional authorization processing to remove docs that the user does
not have access to.  To do this we're iterating through
responseBuilder.getResults().docList and removing any documents that
the user should not be able to see.  Removing bad items works fine,
but sorting isn't quite right after doing this.  At this point is the
docList completely sorted, or is there an optimization inside solr
which only sorts the top X documents?  I'm grabbing at straws here
because for the life of me I can't figure out what is causing this.


I'm doing all of the filtering inside of the process method in my
custom SearchComponent.

On Fri, Feb 10, 2012 at 12:41 PM, Jamie Johnson  wrote:
> This is an snapshot of the solrcloud branch from somewhere between a
> year and 6 months ago (can't really remember off hand) with some
> custom components, I'm thinking that the custom components may be
> messing something up.  I'm removing them now to test this without
> those to make sure that the issue is on my end, will report shortly.
>
> On Fri, Feb 10, 2012 at 12:16 PM, Yonik Seeley
>  wrote:
>> On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson  wrote:
>>> Was there a fix recently to address sorting issues for Dates in solr
>>> cloud?  On my cluster I have a date field which when I sort across the
>>> cluster I get incorrect order executing the following query I get
>>
>> Yikes!  There haven't been any fixes recently that I know of.
>> What version of Solr is this?
>>
>> -Yonik
>> lucidimagination.com


Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-10 Thread geeky2
hello,

>>
Or does your field in schema.xml have anything like
autoGeneratePhraseQueries="true" in it?
<<

there is no reference to this in our production schema.

this is extremely confusing.

i am not completely clear on the issue.

reviewing our previous messages - it looks like the data is being tokenized
correctly according to the analysis page and output from Luke.

it also looks like the definition of the field and field type is correct in
the schema.xml

it also looks like there is no errant data (quotes) being introduced in to
the query string submitted to solr:

example:

*http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select?indent=on&version=2.2&q=itemNo%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=*

so - does the real issue reside in HOW the query is being constructed /
parsed?

and if so - what drives this query to become a MultiPhraseQuery with
embedded quotes?

itemNo:BP21UAA
itemNo:BP21UAA
MultiPhraseQuery(itemNo:"bp 21 (uaa
bp21uaa)")itemNo:"bp 21 (uaa
bp21uaa)"

please note - i also mocked up a simple test on my personal linux box - just
using the solr 3.5 distro (we are using 3.3.0 on our production box under
centOS)

i was able to get a simple test to work and yes - my query does look
different

output from my simple mock up on my personal box:

*http://localhost:8983/solr/select?indent=on&version=2.2&q=manu%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=*

manu:BP21UAAmanu:BP21UAAmanu:bp manu:21
manu:uaa manu:bp21uaamanu:bp manu:21
manu:uaa manu:bp21uaa

schema.xml





any suggestions would be greatly appreciated.

mark
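A hedged illustration of the setting Erick asked about earlier in the thread (the type name and analyzer chain below are placeholders, not mark's production schema): when WordDelimiterFilter splits a query term like BP21UAA into several tokens, a TextField with autoGeneratePhraseQueries="true" (the default for older schema versions, if memory serves) turns them into exactly the kind of MultiPhraseQuery with embedded quotes shown above, while "false" yields the loose term query seen in the 3.5 mock-up.

```xml
<!-- Hypothetical fieldType; attribute of interest is autoGeneratePhraseQueries -->
<fieldType name="text_parts" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Comparing the schema "version" attribute and this flag between the 3.3.0 production schema and the 3.5 test schema would be a good way to confirm whether this explains the difference.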




--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3733486.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
So looking at QueryComponent, it appears to sort the entire doc list
at the end of process(); my component is defined after it, so the
docList that I get should be sorted, right?  To me this should mean
that I can remove items from this list and shift everything left as
needed and it should work fine, but this isn't what appears to be
happening.  For queries that are not distributed I don't see this
issue, only for distributed queries.


On Fri, Feb 10, 2012 at 2:23 PM, Jamie Johnson  wrote:
> It looks like everything works fine without my custom component, which
> is good for Solr, bad for me.  The custom component does some
> additional authorization processing to remove docs that the user does
> not have access to.  To do this we're iterating through
> responseBuilder.getResults().docList and removing any documents that
> the user should not be able to see.  Removing bad items works fine,
> but sorting isn't quite right after doing this.  At this point is the
> docList completely sorted, or is there an optimization inside solr
> which only sorts the top X documents?  I'm grabbing at straws here
> because for the life of me I can't figure out what is causing this.
>
>
> I'm doing all of the filtering inside of the process method in my
> custom SearchComponent.
>
> On Fri, Feb 10, 2012 at 12:41 PM, Jamie Johnson  wrote:
>> This is an snapshot of the solrcloud branch from somewhere between a
>> year and 6 months ago (can't really remember off hand) with some
>> custom components, I'm thinking that the custom components may be
>> messing something up.  I'm removing them now to test this without
>> those to make sure that the issue is on my end, will report shortly.
>>
>> On Fri, Feb 10, 2012 at 12:16 PM, Yonik Seeley
>>  wrote:
>>> On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson  wrote:
 Was there a fix recently to address sorting issues for Dates in solr
 cloud?  On my cluster I have a date field which when I sort across the
 cluster I get incorrect order executing the following query I get
>>>
>>> Yikes!  There haven't been any fixes recently that I know of.
>>> What version of Solr is this?
>>>
>>> -Yonik
>>> lucidimagination.com


Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Yonik Seeley
On Fri, Feb 10, 2012 at 2:48 PM, Jamie Johnson  wrote:
> So looking at query component it appears to sort the entire doc list
> at the end of process, my component is defined after this query so the
> doclist that I get should be sorted, right?  To me this should mean
> that I can remove items from this list and shift everything left as
> needed and it should work fine, but this isn't what appears to be
> happening.  For queries that are not distributed I don't see this
> issue, only for distributed queries.

The document lists from the shards are merged by looking at the sort values.
Those are looked up by position in a different part of the response
(generated by fsv=true).
If you just mess with the doclists, those sort values will no longer
"line up" (doc #5 won't correspond to fsv slot #5).

Short solution: if you remove a doc, remove that slot from all of the
sort values

Better solution: We have pseudo-fields now... we should add sort
values directly to the documents so this type of parallel structure is
no longer needed.

-Yonik
lucidimagination.com
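Yonik's "short solution" is bookkeeping: whenever a doc is dropped from the doclist, drop the same slot from every per-field sort-values list so positions stay aligned. A self-contained sketch of that lockstep removal, with plain collections standing in for Solr's structures (all names here are illustrative, not SolrJ API). Removal runs from the highest slot down so earlier removals don't shift the indices still to be removed:

```java
import java.util.*;

public class ParallelSlotRemoval {
    // Stand-ins for a shard response: the doc ids, plus one parallel list of
    // sort values per sort field (what fsv=true adds to the response).
    static void removeSlots(List<String> docIds,
                            Map<String, List<Object>> sortValues,
                            int[] removedSlots) {
        int[] slots = removedSlots.clone();
        Arrays.sort(slots);
        // Walk from the highest index down so removing one slot does not
        // shift the positions of the slots still left to remove.
        for (int i = slots.length - 1; i >= 0; i--) {
            int slot = slots[i];
            docIds.remove(slot);
            for (List<Object> vals : sortValues.values()) {
                vals.remove(slot);   // keep fsv slot N aligned with doc N
            }
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>(Arrays.asList("d0", "d1", "d2", "d3"));
        Map<String, List<Object>> fsv = new HashMap<>();
        fsv.put("date", new ArrayList<>(Arrays.asList(10, 20, 30, 40)));
        removeSlots(docs, fsv, new int[]{1, 3});  // filter out d1 and d3
        System.out.println(docs);            // [d0, d2]
        System.out.println(fsv.get("date")); // [10, 30]
    }
}
```

Removing from the tail avoids the index-shifting bug that ascending, unadjusted removal invites.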


Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
I'd like to look at the pseudo fields you're talking about (don't
really understand it right now), but need to get something working in
the short term.  How do I go about removing these from the sort
values?

On Fri, Feb 10, 2012 at 3:06 PM, Yonik Seeley
 wrote:
> On Fri, Feb 10, 2012 at 2:48 PM, Jamie Johnson  wrote:
>> So looking at query component it appears to sort the entire doc list
>> at the end of process, my component is defined after this query so the
>> doclist that I get should be sorted, right?  To me this should mean
>> that I can remove items from this list and shift everything left as
>> needed and it should work fine, but this isn't what appears to be
>> happening.  For queries that are not distributed I don't see this
>> issue, only for distributed queries.
>
> The document lists from the shards are merged by looking at the sort values.
> Those are looked up by position in a different part of the response
> (generated by fsv=true).
> If you just mess with the doclists, those sort values will no longer
> "line up" (doc #5 won't correspond to fsv slot #5).
>
> Short solution: if you remove a doc, remove that slot from all of the
> sort values
>
> Better solution: We have pseudo-fields now... we should add sort
> values directly to the documents so this type of parallel structure is
> no longer needed.
>
> -Yonik
> lucidimagination.com


Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
doing some copying I came up with the following

boolean fsv = req.getParams().getBool(ResponseBuilder.FIELD_SORT_VALUES, false);
if (fsv) {
    NamedList sortVals = (NamedList) rsp.getValues().get("sort_values");
    Sort sort = searcher.weightSort(rb.getSortSpec().getSort());
    SortField[] sortFields = (sort == null)
        ? new SortField[]{ SortField.FIELD_SCORE } : sort.getSort();
    for (SortField sortField : sortFields) {
        String fieldname = sortField.getField();
        ArrayList list = (ArrayList) sortVals.get(fieldname);
        for (int index = 0; index < removedDocs.length; index++) {
            list.remove(removedDocs[index]);
        }
    }
}

this seems to have worked, need to do more testing but I don't
understand why it worked, what exactly is this doing?

On Fri, Feb 10, 2012 at 3:12 PM, Jamie Johnson  wrote:
> I'd like to look at the pseudo fields you're talking about (don't
> really understand it right now), but need to get something working in
> the short term.  How do I go about removing these from the sort
> values?
>
> On Fri, Feb 10, 2012 at 3:06 PM, Yonik Seeley
>  wrote:
>> On Fri, Feb 10, 2012 at 2:48 PM, Jamie Johnson  wrote:
>>> So looking at query component it appears to sort the entire doc list
>>> at the end of process, my component is defined after this query so the
>>> doclist that I get should be sorted, right?  To me this should mean
>>> that I can remove items from this list and shift everything left as
>>> needed and it should work fine, but this isn't what appears to be
>>> happening.  For queries that are not distributed I don't see this
>>> issue, only for distributed queries.
>>
>> The document lists from the shards are merged by looking at the sort values.
>> Those are looked up by position in a different part of the response
>> (generated by fsv=true).
>> If you just mess with the doclists, those sort values will no longer
>> "line up" (doc #5 won't correspond to fsv slot #5).
>>
>> Short solution: if you remove a doc, remove that slot from all of the
>> sort values
>>
>> Better solution: We have pseudo-fields now... we should add sort
>> values directly to the documents so this type of parallel structure is
>> no longer needed.
>>
>> -Yonik
>> lucidimagination.com


Re: Geospatial search with multivalued field

2012-02-10 Thread Marian Steinbach
2012/2/10 Mikhail Khludnev :
> Marian,
>
> Sorry, I completely forgot to mention.
> Pls check David's instruction
> https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13117350&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13117350
>
> The patch you tried to use is just my amendment to David's zip. According
> to his comment, it's already in Solr2155-1.0.3-project.zip.

Mikhail, thank you! That explains a lot.

With the ZIP project, I still can't figure out where to put it and
what to apply it to. Do I have to put it into the 3.4.0-src folder
hierarchy? If yes, where exactly?

Marian


Re: correct usage of StreamingUpdateSolrServer?

2012-02-10 Thread T Vinod Gupta
here is how i was playing with it..

StreamingUpdateSolrServer solrServer = new
    StreamingUpdateSolrServer("http://localhost:8983/solr/", 10, 1);

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("pk_id", "id1");
doc1.addField("doc_type", "content");
doc1.addField("id", "1");
doc1.addField("content_text", "hello world");

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
solrServer.add(docs);
solrServer.commit();

thanks

On Fri, Feb 10, 2012 at 7:41 AM, Erick Erickson wrote:

> Can you post the code? SUSS should essentially be a drop-in
> replacement for CHSS.
>
> It's not advisable to commit after every add, it's usually better
> to use commitWithin, and perhaps commit at the very end of
> the run.
>
> Best
> Erick
>
> On Thu, Feb 9, 2012 at 4:00 PM, T Vinod Gupta 
> wrote:
> > Hi,
> > I wrote a hello world program to add documents to solr server. When I
> > use CommonsHttpSolrServer, the program exits but when I
> > use StreamingUpdateSolrServer, the program never exits. And I couldn't
> find
> > a way to close it? Are there any best practices here? Do I have to do
> > anything differently at the time of documents adds/updates when
> > using StreamingUpdateSolrServer? I am following the add/commit cycle. Is
> > that ok?
> >
> > thanks
>
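CommonsHttpSolrServer does its I/O on the calling thread, so nothing outlives main(). StreamingUpdateSolrServer keeps background "runner" threads draining its internal queue, and a live non-daemon thread is exactly what keeps a JVM from exiting — which would match the symptom described above. A plain-Java sketch of that mechanism (no Solr dependency; the "runner" here is a stand-in, not the real SolrJ internals):

```java
import java.util.concurrent.CountDownLatch;

public class RunnerThreadDemo {
    // True if this thread would keep the JVM from exiting after main() returns.
    static boolean keepsJvmAlive(Thread t) {
        return t.isAlive() && !t.isDaemon();
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch started = new CountDownLatch(1);
        // Stand-in for the background runner a streaming client keeps
        // around to drain its internal document queue.
        Thread runner = new Thread(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // pretend to wait for more documents
            } catch (InterruptedException ignored) { }
        }, "queue-runner");

        runner.setDaemon(true); // with false here, the process would hang
        runner.start();
        started.await();

        System.out.println("runner keeps JVM alive: " + keepsJvmAlive(runner));
        // The JVM exits now because the only live thread left is a daemon.
    }
}
```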


Re: indexing with DIH (and with problems)

2012-02-10 Thread alessio crisantemi
with rootEntity="false" it's the same..
help!
2012/2/10 Chantal Ackermann 

>
>
> On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote:
> > hi all,
> > I would like to index in Solr my pdf files, which are included in my directory
> c:\myfile\
> >
> > so, I add on my solr/conf directory the file data-config.xml like the
> > following:
> >
> >
> > <dataConfig>
> > <dataSource type="BinFileDataSource" />
> > <document>
> > <entity name="f" rootEntity="false"
> Why do you set rootEntity="false" on the root entity?
> This looks odd to me - but I can be wrong, of course.
>
> If DIH shows this:
> """
> *0*
> """
>
> DIH hasn't even retrieved any data from your data source. Check that the
> call you have configured really returns any documents.
>
>
> Chantal
>
>
>
>
> > processor="FileListEntityProcessor"
> > baseDir="c:\myfile\" fileName="*.pdf"
> > recursive="true">
> > <entity processor="TikaEntityProcessor"
> > url="${f.fileAbsolutePath}" format="text">
> > </entity>
> > </entity>
> > </document>
> > </dataConfig>
> >
> > before, I add this part into solr-config.xml:
> >
> >
> >  > class="org.apache.solr.handler.dataimport.DataImportHandler">
> > 
> >   c:\solr\conf\data-config.xml
> > 
> >   
> >
> >
> > but this is the result:
> >
> > <str name="command">delta-import</str>
> > <str name="status">idle</str>
> > <http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=delta-import>
> > <lst name="statusMessages">
> >   <str name="Time taken ">0:0:2.512</str>
> >   <str name="Total Requests made to DataSource">0</str>
> >   <str name="Total Rows Fetched">0</str>
> >   <str name="Total Documents Processed">0</str>
> >   <str name="Total Documents Skipped">0</str>
> >   <str name="Full Dump Started">2012-02-09 23:37:07</str>
> >   <str name="">Indexing failed. Rolled back all changes.</str>
> >   <str name="Rolledback">2012-02-09 23:37:07</str>
> > </lst>
> > <str name="WARNING">This response format is experimental. It is
> > likely to change in the future.</str>
> >
> > suggestions?
> > thanks
> > alessio
>
>


Re: correct usage of StreamingUpdateSolrServer?

2012-02-10 Thread Erick Erickson
Well, that's certainly "hello world".

But I'm kinda stumped, I have programs that look an awful lot like
this that terminate just fine.

Anything in your Solr logs? And are you just executing this once?
And what version of Solr are you using?

Best
Erick

On Fri, Feb 10, 2012 at 3:49 PM, T Vinod Gupta  wrote:
> here is how i was playing with it..
>
>        StreamingUpdateSolrServer solrServer = new
> StreamingUpdateSolrServer("http://localhost:8983/solr/", 10, 1);
>
>        SolrInputDocument doc1 = new SolrInputDocument();
>        doc1.addField("pk_id", "id1");
>        doc1.addField("doc_type", "content");
>        doc1.addField("id", "1");
>        doc1.addField("content_text", "hello world");
>
>        Collection<SolrInputDocument> docs = new
> ArrayList<SolrInputDocument>();
>        docs.add(doc1);
>        solrServer.add(docs);
>        solrServer.commit();
>
> thanks
>
> On Fri, Feb 10, 2012 at 7:41 AM, Erick Erickson 
> wrote:
>
>> Can you post the code? SUSS should essentially be a drop-in
>> replacement for CHSS.
>>
>> It's not advisable to commit after every add, it's usually better
>> to use commitWithin, and perhaps commit at the very end of
>> the run.
>>
>> Best
>> Erick
>>
>> On Thu, Feb 9, 2012 at 4:00 PM, T Vinod Gupta 
>> wrote:
>> > Hi,
>> > I wrote a hello world program to add documents to solr server. When I
>> > use CommonsHttpSolrServer, the program exits but when I
>> > use StreamingUpdateSolrServer, the program never exits. And I couldn't
>> find
>> > a way to close it? Are there any best practices here? Do I have to do
>> > anything differently at the time of documents adds/updates when
>> > using StreamingUpdateSolrServer? I am following the add/commit cycle. Is
>> > that ok?
>> >
>> > thanks
>>
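Erick's commitWithin suggestion can be expressed per request in the XML update message rather than as a separate commit call. A sketch of that form, using the document from the thread (the 10-second window is an arbitrary example); recent SolrJ versions expose an equivalent commitWithin setting on the update request:

```xml
<add commitWithin="10000">
  <doc>
    <field name="pk_id">id1</field>
    <field name="doc_type">content</field>
    <field name="id">1</field>
    <field name="content_text">hello world</field>
  </doc>
</add>
```

This lets Solr batch commits on its own schedule instead of forcing one per add.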


Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Sorry for pinging this again, is more information needed on this?  I
can provide more details but am not sure what to provide.

On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
> Sorry, I shut down the full solr instance.
>
> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
>> Can you explain a little more how you are doing this? How are you bringing the 
>> cores down and then back up? Shutting down a full solr instance, unloading 
>> the core?
>>
>> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>>
>>> I know that the latest Solr Cloud doesn't use standard replication but
>>> I have a question about how it appears to be working.  I currently
>>> have the following cluster state
>>>
>>> {"collection1":{
>>>    "slice1":{
>>>      "JamiesMac.local:8501_solr_slice1_shard1":{
>>>        "shard_id":"slice1",
>>>        "state":"active",
>>>        "core":"slice1_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8501_solr",
>>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>>      "JamiesMac.local:8502_solr_slice1_shard2":{
>>>        "shard_id":"slice1",
>>>        "state":"active",
>>>        "core":"slice1_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8502_solr",
>>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>>      "jamiesmac:8501_solr_slice1_shard1":{
>>>        "shard_id":"slice1",
>>>        "state":"down",
>>>        "core":"slice1_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8501_solr",
>>>        "base_url":"http://jamiesmac:8501/solr"},
>>>      "jamiesmac:8502_solr_slice1_shard2":{
>>>        "shard_id":"slice1",
>>>        "leader":"true",
>>>        "state":"active",
>>>        "core":"slice1_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8502_solr",
>>>        "base_url":"http://jamiesmac:8502/solr"}},
>>>    "slice2":{
>>>      "JamiesMac.local:8501_solr_slice2_shard2":{
>>>        "shard_id":"slice2",
>>>        "state":"active",
>>>        "core":"slice2_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8501_solr",
>>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>>      "JamiesMac.local:8502_solr_slice2_shard1":{
>>>        "shard_id":"slice2",
>>>        "state":"active",
>>>        "core":"slice2_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8502_solr",
>>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>>      "jamiesmac:8501_solr_slice2_shard2":{
>>>        "shard_id":"slice2",
>>>        "state":"down",
>>>        "core":"slice2_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8501_solr",
>>>        "base_url":"http://jamiesmac:8501/solr"},
>>>      "jamiesmac:8502_solr_slice2_shard1":{
>>>        "shard_id":"slice2",
>>>        "leader":"true",
>>>        "state":"active",
>>>        "core":"slice2_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8502_solr",
>>>        "base_url":"http://jamiesmac:8502/solr"
>>>
>>> I then added some docs to the following shards using SolrJ
>>> http://localhost:8502/solr/slice2_shard1
>>> http://localhost:8502/solr/slice1_shard2
>>>
>>> I then bring back up the other cores and I don't see replication
>>> happening.  Looking at the stats for each core I see that on the 8501
>>> instance (the instance that was off) the number of docs is 0, so I've
>>> clearly set something up incorrectly.  Any help on this would be
>>> greatly appreciated.
>>
>> - Mark Miller
>> lucidimagination.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
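The cluster state quoted above is just a nested map: collection → slice → replica → properties, with a "state" value per replica. A small plain-Java stand-in (not the real SolrCloud classes) for scanning it, e.g. to find the replicas still marked "down" — which, per the discussion that follows, should move through "recovering" to "active" once they catch up:

```java
import java.util.*;

public class ClusterStateScan {
    // slice name -> (replica/core node name -> state)
    static List<String> replicasInState(Map<String, Map<String, String>> slices,
                                        String wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> slice : slices.entrySet()) {
            for (Map.Entry<String, String> replica : slice.getValue().entrySet()) {
                if (wanted.equals(replica.getValue())) {
                    hits.add(slice.getKey() + "/" + replica.getKey());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> slices = new LinkedHashMap<>();
        Map<String, String> slice1 = new LinkedHashMap<>();
        slice1.put("JamiesMac.local:8501_solr_slice1_shard1", "active");
        slice1.put("jamiesmac:8501_solr_slice1_shard1", "down");
        slices.put("slice1", slice1);
        System.out.println(replicasInState(slices, "down"));
        // prints [slice1/jamiesmac:8501_solr_slice1_shard1]
    }
}
```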


new feature: advanced filter caching and post filtering

2012-02-10 Thread Yonik Seeley
Well, not super-new (it's in 3.4), but the spatial post-filtering is
brand new in 4.0 as of today, and I don't think cache=false and
post-filtering were really highlighted well before anyway.

http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/

-Yonik
lucidimagination.com
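In request terms, the feature is per-filter local params that control caching and evaluation order. A sketch of the syntax (field names and values below are made up for illustration):

```
# run uncached, ordered among the other filters by its cost
fq={!frange l=10 u=100 cache=false cost=50}log(popularity)

# cache=false plus cost >= 100 turns the filter into a post filter,
# checked only against documents that match everything else
fq={!geofilt sfield=location pt=45.15,-93.85 d=5 cache=false cost=200}
```

Expensive filters (function ranges, spatial) benefit most, since post filtering evaluates them against far fewer documents.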


Re: URI Encoding with Solr and Weblogic

2012-02-10 Thread rzoao
Hello, Elisabeth

I am having the same issue with WebLogic 11 and Solr 3.5. I've tried your
solution and it didn't work out, but I'm not sure if I'm doing it right.

I've tried altering
%SERVER_HOME%\servers\AdminServer\tmp\_WL_user\solr\t6nzak\war\WEB-INF\weblogic.xml
and restarted the server, but I still get the same error. Is that all there is to it?

Did you remove the -Dfile.encoding=UTF-8 flag or did you keep it as well?

Is it working perfectly now? It didn't make any difference here.

Thank you,

rzoao

--
View this message in context: 
http://lucene.472066.n3.nabble.com/URI-Encoding-with-Solr-and-Weblogic-tp3724153p3733907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
I'm trying, but so far I don't see anything. I'll have to try and mimic your 
setup closer it seems.

I tried starting up 6 solr instances on different ports as 2 shards, each with 
a replication factor of 3.

Then I indexed 20k documents to the cluster and verified doc counts.

Then I shutdown all the replicas so that only one instance served each shard.

Then I indexed 20k documents to the cluster.

Then I started the downed nodes and verified that they were in a recovery 
state.

After enough time went by I checked and verified document counts on each 
instance - they were as expected.

I guess next I can try a similar experiment using multiple cores, but if you 
notice anything that stands out that is largely different in what you are 
doing, let me know.

The cores that are behind, does it say they are down, recovering, or active in 
zookeeper?

On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:

> Sorry for pinging this again, is more information needed on this?  I
> can provide more details but am not sure what to provide.
> 
> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
>> Sorry, I shut down the full solr instance.
>> 
>> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
>>> Can you explain a little more how you doing this? How are you bringing the 
>>> cores down and then back up? Shutting down a full solr instance, unloading 
>>> the core?
>>> 
>>> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>>> 
 I know that the latest Solr Cloud doesn't use standard replication but
 I have a question about how it appears to be working.  I currently
 have the following cluster state
 
 {"collection1":{
"slice1":{
  "JamiesMac.local:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"state":"active",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"state":"active",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "jamiesmac:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"state":"down",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"jamiesmac:8501_solr",
"base_url":"http://jamiesmac:8501/solr"},
  "jamiesmac:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"leader":"true",
"state":"active",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"jamiesmac:8502_solr",
"base_url":"http://jamiesmac:8502/solr"}},
"slice2":{
  "JamiesMac.local:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"active",
"core":"slice2_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "jamiesmac:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"down",
"core":"slice2_shard2",
"collection":"collection1",
"node_name":"jamiesmac:8501_solr",
"base_url":"http://jamiesmac:8501/solr"},
  "jamiesmac:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"leader":"true",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"jamiesmac:8502_solr",
"base_url":"http://jamiesmac:8502/solr"
 
 I then added some docs to the following shards using SolrJ
 http://localhost:8502/solr/slice2_shard1
 http://localhost:8502/solr/slice1_shard2
 
 I then bring back up the other cores and I don't see replication
 happening.  Looking at the stats for each core I see that on the 8501
 instance (the instance that was off) the number of docs is 0, so I've
 clearly set something up incorrectly.  Any help on this would be
 greatly appreciated.
>>> 
>>> - Mark Miller
>>> lucidimagination.com
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Also, it will help if you can mention the exact version of SolrCloud you are 
talking about in each issue - I know you have one from the old branch, and I 
assume a version off trunk that you are playing with - so a heads-up on which, 
and if trunk, what rev or day, will help in case I try to dupe issues that 
have already been addressed.

- Mark

On Feb 10, 2012, at 6:09 PM, Mark Miller wrote:

> I'm trying, but so far I don't see anything. I'll have to try and mimic your 
> setup closer it seems.
> 
> I tried starting up 6 solr instances on different ports as 2 shards, each 
> with a replication factor of 3.
> 
> Then I indexed 20k documents to the cluster and verified doc counts.
> 
> Then I shutdown all the replicas so that only one instance served each shard.
> 
> Then I indexed 20k documents to the cluster.
> 
> Then I started the downed nodes and verified that they where in a recovery 
> state.
> 
> After enough time went by I checked and verified document counts on each 
> instance - they where as expected.
> 
> I guess next I can try a similar experiment using multiple cores, but if you 
> notice anything that stands out that is largely different in what you are 
> doing, let me know.
> 
> The cores that are behind, does it say they are down, recovering, or active 
> in zookeeper?
> 
> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:
> 
>> Sorry for pinging this again, is more information needed on this?  I
>> can provide more details but am not sure what to provide.
>> 
>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
>>> Sorry, I shut down the full solr instance.
>>> 
>>> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
 Can you explain a little more how you doing this? How are you bringing the 
 cores down and then back up? Shutting down a full solr instance, unloading 
 the core?
 
 On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
 
> I know that the latest Solr Cloud doesn't use standard replication but
> I have a question about how it appears to be working.  I currently
> have the following cluster state
> 
> {"collection1":{
>   "slice1":{
> "JamiesMac.local:8501_solr_slice1_shard1":{
>   "shard_id":"slice1",
>   "state":"active",
>   "core":"slice1_shard1",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8501_solr",
>   "base_url":"http://JamiesMac.local:8501/solr"},
> "JamiesMac.local:8502_solr_slice1_shard2":{
>   "shard_id":"slice1",
>   "state":"active",
>   "core":"slice1_shard2",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8502_solr",
>   "base_url":"http://JamiesMac.local:8502/solr"},
> "jamiesmac:8501_solr_slice1_shard1":{
>   "shard_id":"slice1",
>   "state":"down",
>   "core":"slice1_shard1",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8501_solr",
>   "base_url":"http://jamiesmac:8501/solr"},
> "jamiesmac:8502_solr_slice1_shard2":{
>   "shard_id":"slice1",
>   "leader":"true",
>   "state":"active",
>   "core":"slice1_shard2",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8502_solr",
>   "base_url":"http://jamiesmac:8502/solr"}},
>   "slice2":{
> "JamiesMac.local:8501_solr_slice2_shard2":{
>   "shard_id":"slice2",
>   "state":"active",
>   "core":"slice2_shard2",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8501_solr",
>   "base_url":"http://JamiesMac.local:8501/solr"},
> "JamiesMac.local:8502_solr_slice2_shard1":{
>   "shard_id":"slice2",
>   "state":"active",
>   "core":"slice2_shard1",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8502_solr",
>   "base_url":"http://JamiesMac.local:8502/solr"},
> "jamiesmac:8501_solr_slice2_shard2":{
>   "shard_id":"slice2",
>   "state":"down",
>   "core":"slice2_shard2",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8501_solr",
>   "base_url":"http://jamiesmac:8501/solr"},
> "jamiesmac:8502_solr_slice2_shard1":{
>   "shard_id":"slice2",
>   "leader":"true",
>   "state":"active",
>   "core":"slice2_shard1",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8502_solr",
>   "base_url":"http://jamiesmac:8502/solr"
> 
> I then added some docs to the following shards using SolrJ
> http://localhost:8502/solr/slice2_shard1
> http://localhost:8502/solr/slice1_shard2
> 
> I then bring back up the other cores and I don't see replication
> happening.  Looking at the stats for each core I see that on the 8501
> instance (the instance that was off) the number of docs is 0, so I

Re: indexing with DIH (and with problems)

2012-02-10 Thread alessio crisantemi
Here is a stack:
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load
EntityProcessor implementation for entity:9946435225838 Processing Document # 1
        at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:576)
        ...
Caused by: org.apache.solr.common.SolrException: Error loading class 'TikaEntityProcessor'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
        ...
Caused by: java.lang.ClassNotFoundException: TikaEntityProcessor
        at java.net.URLClassLoader$1.run(Unknown Source)
        ...

why?
Tu
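The ClassNotFoundException means the TikaEntityProcessor class isn't on Solr's classpath: it ships in the dataimporthandler-extras jar and also needs the Tika jars. A sketch of the solrconfig.xml lib directives that would load them (the dir paths are assumptions that depend on the install layout):

```xml
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-extras-.*\.jar" />
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
```

Alternatively, the same jars can be copied into the core's lib directory.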

2012/2/10 Gora Mohanty 

> On 10 February 2012 04:15, alessio crisantemi
>  wrote:
> > hi all,
> > I would like to index in Solr my pdf files, which are included in my directory
> c:\myfile\
> >
> > so, I add on my solr/conf directory the file data-config.xml like the
> > following:
> [...]
>
> > but this is the result:
> [...]
>
> Your Solr URL for dataimport looks a little odd: You seem to be
> doing a delta-import. Normally, one would start with a full import:
> http://solr-host:port/solr/dataimport?command=full-import
>
> Have you looked in the Solr logs for the cause of the exception?
> Please share that with us.
>
> Regards,
> Gora
>


RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
Thanks, that explains why the individual terms 'chicken' and 'stock' are still 
in the query (and are required).
So I have tried a few things to get around this, but to no avail:

Changed the query analyzer to use the WhitespaceTokenizerFactory with 
autoGeneratePhraseQueries=true. This creates the correct phrase query, but the 
dismax query still requires the individual terms to match ('chicken' and 
'stock'):
+(DisjunctionMaxQuery((ingredient_synonyms:chicken)~0.01) 
DisjunctionMaxQuery((ingredient_synonyms:stock)~0.01)) 
DisjunctionMaxQuery((ingredient_synonyms:"chicken stock"~100)~0.01)

So the next thing I have tried is to remove the individual terms during the 
query analysis. I did this using the ShingleFilterFactory, so my query analyzer 
now looks like this:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" outputUnigrams="false"/>
</analyzer>
This leaves the single term 'chicken stock' in the query analysis and the 
dismax query is:
+() DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)

Which looks OK except for the +(). It looks like it is requiring an empty 
clause.

This seems like a pretty simple requirement - to only have exact matches on 
multi-word text. Am I missing something here?

Thanks
Zac


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Friday, February 10, 2012 1:50 AM
To: solr-user@lucene.apache.org
Subject: RE: Keyword Tokenizer Phrase Issue

Hi Zac,

Field Analysis tool (analysis.jsp) does not perform actual query parsing.

One thing to be aware of when using KeywordTokenizer at query time: the query 
string (chicken stock) is pre-tokenized on whitespace before it ever reaches 
the keyword tokenizer.

If you use quotes ("chicken stock"), the query parser does not pre-tokenize it, though.
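One way around the pre-tokenization described above is to hand the whole string to a parser that feeds it to the field's analyzer unsplit, such as the field query parser, optionally nested via _query_ (the field name below follows the thread):

```
q={!field f=ingredient_synonyms}chicken stock

q=fish OR _query_:"{!field f=ingredient_synonyms}chicken stock"
```

Note the nested _query_ form works with the default lucene parser; dismax does not interpret _query_ inside its q parameter.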

--- On Fri, 2/10/12, Zac Smith  wrote:

> From: Zac Smith 
> Subject: RE: Keyword Tokenizer Phrase Issue
> To: "solr-user@lucene.apache.org" 
> Date: Friday, February 10, 2012, 10:35 AM I have done some further 
> analysis on this and I am now even more confused. When I use the Field 
> Analysis tool with the text 'chicken stock' it highlights that text as 
> a match.
> The dismax query looks ok to me:
> +(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01)
> DisjunctionMaxQuery((ingredient_synonyms:stock^0.6)~0.01))
> DisjunctionMaxQuery((ingredient_synonyms:chicken
> stock^0.6)~0.01)
> 
> Then I have done an explainOther and it shows a failure to meet 
> condition. However there does seem to be some kind of match 
> registered:
> 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited 
> clause(s)
>   0.0 = no match on required clause
> (ingredient_synonyms:chicken^0.6
> ingredient_synonyms:stock^0.6)
>   0.0650662 = (MATCH)
> weight(ingredient_synonyms:chicken stock^0.6 in 0), product
> of:
>     0.21204369 =
> queryWeight(ingredient_synonyms:chicken stock^0.6), product
> of:
>       0.6 = boost
>       0.30685282 = idf(docFreq=1, maxDocs=1)
>       1.1517122 = queryNorm
>     0.30685282 = (MATCH)
> fieldWeight(ingredient_synonyms:chicken stock in 0), product
> of:
>       1.0 =
> tf(termFreq(ingredient_synonyms:chicken stock)=1)
>       0.30685282 = idf(docFreq=1, maxDocs=1)
>       1.0 =
> fieldNorm(field=ingredient_synonyms, doc=0)
> 
> Any ideas?
> 
> My dismax handler is set up like this:
>   <requestHandler name="dismax" class="solr.SearchHandler" >
>     <lst name="defaults">
>       <str name="defType">dismax</str>
>       <str name="echoParams">explicit</str>
>       <float name="tie">0.01</float>
>       <str name="qf">ingredient_synonyms^0.6</str>
>       <str name="pf">ingredient_synonyms^0.6</str>
>     </lst>
>   </requestHandler>
> 
> Zac
> 
> From: Zac Smith
> Sent: Thursday, February 09, 2012 12:52 PM
> To: solr-user@lucene.apache.org
> Subject: Keyword Tokenizer Phrase Issue
> 
> Hi,
> 
> I have a simple field type that uses the KeywordTokenizerFactory. I 
> would like to use this so that values in this field are only matched 
> with the full text of the field.
> e.g. If I indexed the text 'chicken stock', searches on this field 
> would only match when searching for 'chicken stock'.
> If searching for just 'chicken' or just 'stock', there should 
> be no match.
> 
> This mostly works, except if there is more than one word in the text I 
> only get a match when searching with quotes.
> e.g.
> "chicken stock" (matches)
> chicken stock (doesn't match)
> 
> Is there any way I can set this up so that I don't have to provide 
> quotes? I am using dismax and if I put quotes in it will mess up the 
> search for the rest of my fields. I had an idea that I could issue a 
> separate search using the regular query parser, but couldn't work out 
> how to do this:
> I thought I could do something like this:
> qt=dismax&q=fish OR _query_:ingredient:"chicken stock"
> 
> I am using solr 3.5.0. My field type is:
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100"
> autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     ...
>   </analyzer>
> </fieldType>
> 
> 
> Thanks
> Zac
> 




Setting up logging for a Solr project that isn't in tomcat/webapps/solr

2012-02-10 Thread Mike O'Leary
I set up a Solr project to run with Tomcat for indexing contents of a database 
by following a web tutorial that described how to put the project directory 
anywhere you want and then put a file called <appname>.xml in the 
tomcat/conf/Catalina/localhost directory that contains contents like this:



  


I got this working, and now I would like to create a logging.properties file 
for Solr only, as described in the Apache Solr Reference Guide distributed by 
Lucid. It says:

To change logging settings for Solr only, edit 
tomcat/webapps/solr/WEB-INF/classes/logging.properties. You will need to create 
the classes directory and the logging.properties file. You can set levels from 
FINEST to SEVERE for a class or an entire package. Here are a couple of 
examples:
org.apache.commons.digester.Digester.level = FINEST
org.apache.solr.level = WARNING

I think this explanation assumes that the Solr project is in 
tomcat/webapps/solr. I tried putting a logging.properties file in various 
locations where I hoped Tomcat would pick it up, but none of them worked. If I 
have a solr_db.xml file in tomcat/conf/Catalina/localhost that points to a Solr 
project in C:/projects/solr_apps/solr_db (that was created by copying the 
contents of the apache-solr-3.5.0/example/solr directory to 
C:/projects/solr_apps/solr_db and going from there), where is the right place 
to put a "Solr only" logging.properties file?
Thanks,
Mike
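For what it's worth, Tomcat's JULI looks for the per-application file inside the webapp's own WEB-INF/classes — i.e. under whatever directory (or war) the context's docBase attribute names, not under tomcat/webapps. A sketch of the file itself, assuming an expanded webapp (handler choice and levels are examples):

```
# WEB-INF/classes/logging.properties -- per-webapp JULI config
handlers = org.apache.juli.FileHandler
org.apache.juli.FileHandler.level = FINE
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = solr_db.

org.apache.commons.digester.Digester.level = FINEST
org.apache.solr.level = WARNING
```

If the Solr war is deployed unexpanded, the file has to be added to the war itself for JULI to pick it up.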


Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
nothing seems that different.  In regards to the states of each I'll
try to verify tonight.

This was using a version I pulled from SVN trunk yesterday morning

On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller  wrote:
> Also, it will help if you can mention the exact version of solrcloud you are 
> talking about in each issue - I know you have one from the old branch, and I 
> assume a version off trunk you are playing with - so a heads up on which and 
> if trunk, what rev or day will help in the case that I'm trying to dupe 
> issues that have been addressed.
>
> - Mark
>
> On Feb 10, 2012, at 6:09 PM, Mark Miller wrote:
>
>> I'm trying, but so far I don't see anything. I'll have to try and mimic your 
>> setup closer it seems.
>>
>> I tried starting up 6 solr instances on different ports as 2 shards, each 
>> with a replication factor of 3.
>>
>> Then I indexed 20k documents to the cluster and verified doc counts.
>>
>> Then I shut down all the replicas so that only one instance served each shard.
>>
>> Then I indexed 20k documents to the cluster.
>>
>> Then I started the downed nodes and verified that they were in a recovery 
>> state.
>>
>> After enough time went by I checked and verified document counts on each 
>> instance - they were as expected.
>>
>> I guess next I can try a similar experiment using multiple cores, but if you 
>> notice anything that stands out that is largely different in what you are 
>> doing, let me know.
>>
>> The cores that are behind, does it say they are down, recovering, or active 
>> in zookeeper?
>>
>> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:
>>
>>> Sorry for pinging this again, is more information needed on this?  I
>>> can provide more details but am not sure what to provide.
>>>
>>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
 Sorry, I shut down the full solr instance.

 On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
> Can you explain a little more how you are doing this? How are you bringing 
> the cores down and then back up? Shutting down a full solr instance, 
> unloading the core?
>
> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>
>> I know that the latest Solr Cloud doesn't use standard replication but
>> I have a question about how it appears to be working.  I currently
>> have the following cluster state
>>
>> {"collection1":{
>>   "slice1":{
>>     "JamiesMac.local:8501_solr_slice1_shard1":{
>>       "shard_id":"slice1",
>>       "state":"active",
>>       "core":"slice1_shard1",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8501_solr",
>>       "base_url":"http://JamiesMac.local:8501/solr"},
>>     "JamiesMac.local:8502_solr_slice1_shard2":{
>>       "shard_id":"slice1",
>>       "state":"active",
>>       "core":"slice1_shard2",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8502_solr",
>>       "base_url":"http://JamiesMac.local:8502/solr"},
>>     "jamiesmac:8501_solr_slice1_shard1":{
>>       "shard_id":"slice1",
>>       "state":"down",
>>       "core":"slice1_shard1",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8501_solr",
>>       "base_url":"http://jamiesmac:8501/solr"},
>>     "jamiesmac:8502_solr_slice1_shard2":{
>>       "shard_id":"slice1",
>>       "leader":"true",
>>       "state":"active",
>>       "core":"slice1_shard2",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8502_solr",
>>       "base_url":"http://jamiesmac:8502/solr"}},
>>   "slice2":{
>>     "JamiesMac.local:8501_solr_slice2_shard2":{
>>       "shard_id":"slice2",
>>       "state":"active",
>>       "core":"slice2_shard2",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8501_solr",
>>       "base_url":"http://JamiesMac.local:8501/solr"},
>>     "JamiesMac.local:8502_solr_slice2_shard1":{
>>       "shard_id":"slice2",
>>       "state":"active",
>>       "core":"slice2_shard1",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8502_solr",
>>       "base_url":"http://JamiesMac.local:8502/solr"},
>>     "jamiesmac:8501_solr_slice2_shard2":{
>>       "shard_id":"slice2",
>>       "state":"down",
>>       "core":"slice2_shard2",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8501_solr",
>>       "base_url":"http://jamiesmac:8501/solr"},
>>     "jamiesmac:8502_solr_slice2_shard1":{
>>       "shard_id":"slice2",
>>       "leader":"true",
>>       "state":"active",
>>       "core":"slice2_shard1",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8502_solr",
>>       "base_url":"http://jamiesmac:8502/solr"
>>
>> I then added some docs to the following shards using Solr

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Thanks.

If the given ZK snapshot was the end state, then two nodes are marked as
down. Generally that happens because replication failed - if you have not
already, I'd check the logs for those two nodes.

- Mark

On Fri, Feb 10, 2012 at 7:35 PM, Jamie Johnson  wrote:

> nothing seems that different.  In regards to the states of each I'll
> try to verify tonight.
>
> This was using a version I pulled from SVN trunk yesterday morning
>
> On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller 
> wrote:
> > Also, it will help if you can mention the exact version of solrcloud you
> are talking about in each issue - I know you have one from the old branch,
> and I assume a version off trunk you are playing with - so a heads up on
> which and if trunk, what rev or day will help in the case that I'm trying
> to dupe issues that have been addressed.
> >
> > - Mark
> >
> > On Feb 10, 2012, at 6:09 PM, Mark Miller wrote:
> >
> >> I'm trying, but so far I don't see anything. I'll have to try and mimic
> your setup closer it seems.
> >>
> >> I tried starting up 6 solr instances on different ports as 2 shards,
> each with a replication factor of 3.
> >>
> >> Then I indexed 20k documents to the cluster and verified doc counts.
> >>
> >> Then I shut down all the replicas so that only one instance served each
> shard.
> >>
> >> Then I indexed 20k documents to the cluster.
> >>
> >> Then I started the downed nodes and verified that they were in a
> recovery state.
> >>
> >> After enough time went by I checked and verified document counts on
> each instance - they were as expected.
> >>
> >> I guess next I can try a similar experiment using multiple cores, but
> if you notice anything that stands out that is largely different in what
> you are doing, let me know.
> >>
> >> The cores that are behind, does it say they are down, recovering, or
> active in zookeeper?
> >>
> >> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:
> >>
> >>> Sorry for pinging this again, is more information needed on this?  I
> >>> can provide more details but am not sure what to provide.
> >>>
> >>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson 
> wrote:
>  Sorry, I shut down the full solr instance.
> 
>  On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller 
> wrote:
> > Can you explain a little more how you are doing this? How are you
> bringing the cores down and then back up? Shutting down a full solr
> instance, unloading the core?
> >
> > On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
> >
> >> I know that the latest Solr Cloud doesn't use standard replication
> but
> >> I have a question about how it appears to be working.  I currently
> >> have the following cluster state
> >>
> >> {"collection1":{
> >>   "slice1":{
> >> "JamiesMac.local:8501_solr_slice1_shard1":{
> >>   "shard_id":"slice1",
> >>   "state":"active",
> >>   "core":"slice1_shard1",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8501_solr",
> >>   "base_url":"http://JamiesMac.local:8501/solr"},
> >> "JamiesMac.local:8502_solr_slice1_shard2":{
> >>   "shard_id":"slice1",
> >>   "state":"active",
> >>   "core":"slice1_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8502_solr",
> >>   "base_url":"http://JamiesMac.local:8502/solr"},
> >> "jamiesmac:8501_solr_slice1_shard1":{
> >>   "shard_id":"slice1",
> >>   "state":"down",
> >>   "core":"slice1_shard1",
> >>   "collection":"collection1",
> >>   "node_name":"jamiesmac:8501_solr",
> >>   "base_url":"http://jamiesmac:8501/solr"},
> >> "jamiesmac:8502_solr_slice1_shard2":{
> >>   "shard_id":"slice1",
> >>   "leader":"true",
> >>   "state":"active",
> >>   "core":"slice1_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"jamiesmac:8502_solr",
> >>   "base_url":"http://jamiesmac:8502/solr"}},
> >>   "slice2":{
> >> "JamiesMac.local:8501_solr_slice2_shard2":{
> >>   "shard_id":"slice2",
> >>   "state":"active",
> >>   "core":"slice2_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8501_solr",
> >>   "base_url":"http://JamiesMac.local:8501/solr"},
> >> "JamiesMac.local:8502_solr_slice2_shard1":{
> >>   "shard_id":"slice2",
> >>   "state":"active",
> >>   "core":"slice2_shard1",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8502_solr",
> >>   "base_url":"http://JamiesMac.local:8502/solr"},
> >> "jamiesmac:8501_solr_slice2_shard2":{
> >>   "shard_id":"slice2",
> >>   "state":"down",
> >>   "core":"slice2_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"jamiesmac:8501_solr",
> >> 

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller

On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:

> jamiesmac

Another note:

Have no idea if this is involved, but when I do tests with my linux box and mac 
I run into the following:

My linux box auto-detects its hostname as halfmetal, and my macbook detects 
mbpro.local. If I accept those defaults, my mac cannot reach my linux box. It 
can only reach the linux box through halfmetal.local, and so I have to override 
the host on the linux box to advertise as halfmetal.local, and then they can talk.

In the bad case, if my leaders were on the linux box, they would be able to 
forward to the mac no problem, but then if shards on the mac needed to recover, 
they would fail to reach the linux box through the halfmetal address.

- Mark Miller
lucidimagination.com



Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
hmm... perhaps I'm seeing the issue you're speaking of.  I have
everything running right now and my state is as follows:

{"collection1":{
"slice1":{
  "JamiesMac.local:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"leader":"true",
"state":"active",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"state":"down",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"}},
"slice2":{
  "JamiesMac.local:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"leader":"true",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "JamiesMac.local:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"down",
"core":"slice2_shard2",
"collection":"dataspace",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"


how'd you resolve this issue?


On Fri, Feb 10, 2012 at 8:49 PM, Mark Miller  wrote:
>
> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>
>> jamiesmac
>
> Another note:
>
> Have no idea if this is involved, but when I do tests with my linux box and 
> mac I run into the following:
>
> My linux box auto-detects its hostname as halfmetal, and my macbook detects
> mbpro.local. If I accept those defaults, my mac cannot reach my linux box. It
> can only reach the linux box through halfmetal.local, and so I have to
> override the host on the linux box to advertise as halfmetal.local, and then
> they can talk.
>
> In the bad case, if my leaders were on the linux box, they would be able to 
> forward to the mac no problem, but then if shards on the mac needed to 
> recover, they would fail to reach the linux box through the halfmetal address.
>
> - Mark Miller
> lucidimagination.com
>
>


Joining multicore to return top results

2012-02-10 Thread Selvam
Hi,

This should be a trivial question, but I am still failing to get the details. I
have 2 cores plus the default collection:

*collection1:*
article_id
title
content

*core0:*
cluster_id
cluster_name
cluster_count

*core1:*
article_id
article_cluster_id
score

Given an article_id, I want to return the top 10 other articles (ranked by the
score field in core1) falling in the same cluster. I would like to know how to
implement this, as I am fairly new to Solr.

Version used : Solr 3.5
-- 
Regards,
S.Selvam
http://knackforge.com
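Solr 3.5 has no cross-core join, so one common approach is to query each core separately and join the results client-side. The sketch below shows only the join step, operating on documents as plain dicts standing in for results already fetched from core1; the field names mirror the schema described above, and the sample data is hypothetical:

```python
# Client-side "join" for the schema above: given an article, find the
# top-N other articles sharing its cluster, ranked by core1's score.
def top_related_articles(article_id, core1_docs, n=10):
    """Return up to n other article_ids in the same cluster,
    highest score first."""
    # Step 1: find which cluster(s) the given article belongs to.
    clusters = {d["article_cluster_id"] for d in core1_docs
                if d["article_id"] == article_id}
    # Step 2: collect other articles in those clusters, sorted by score.
    related = [d for d in core1_docs
               if d["article_cluster_id"] in clusters
               and d["article_id"] != article_id]
    related.sort(key=lambda d: d["score"], reverse=True)
    return [d["article_id"] for d in related[:n]]

# Hypothetical core1 documents (article_id, article_cluster_id, score).
core1_docs = [
    {"article_id": "a1", "article_cluster_id": "c1", "score": 0.9},
    {"article_id": "a2", "article_cluster_id": "c1", "score": 0.7},
    {"article_id": "a3", "article_cluster_id": "c1", "score": 0.8},
    {"article_id": "a4", "article_cluster_id": "c2", "score": 0.95},
]

print(top_related_articles("a1", core1_docs))
```

The returned ids would then be used in a second query against collection1 to fetch titles and content.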


Recovering from database connection resets in DataimportHandler

2012-02-10 Thread Mike O'Leary
I am trying to use Solr's DataImportHandler to index a large number of database 
records in a SQL Server database that is owned and managed by a group we are 
collaborating with. The indexing jobs I have run so far, except for the initial 
very small test runs, have failed due to database connection resets. I have 
gotten indexing jobs to go further by using CachedSqlEntityProcessor and 
specifying responseBuffering=adaptive in the connection url, but I think in 
order to index that data I'm going to have to work out how to catch database 
connection reset exceptions and resubmit the queries that failed. Can anyone
suggest a good way to approach this? Or have any of you encountered this
problem and worked out a solution to it already?
Thanks,
Mike
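The "catch the reset and resubmit" idea above amounts to generic retry-with-backoff around each query. A minimal language-agnostic sketch follows; the names `with_retries` and `flaky_query` are illustrative, and `flaky_query` merely simulates a call that fails twice with a connection reset before succeeding:

```python
import time

def with_retries(fn, attempts=3, delay=0.0, retry_on=(ConnectionError,)):
    """Call fn(), retrying up to `attempts` times on the listed exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise          # out of retries: surface the error
            time.sleep(delay)  # back off before resubmitting the query

# Simulated query: raises a connection reset on the first two calls.
calls = {"n": 0}

def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return ["row1", "row2"]

print(with_retries(flaky_query))  # succeeds on the third attempt
```

In practice the retried unit would be one entity query restarted from its last committed batch, since DataImportHandler itself does not expose such a hook in Solr 3.x.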