shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello folks,
I have a problem with shard indexing.

With a single core I use this update command:
http://localhost:8983/solr/update

Now I have 2 shards; let's call them core0 / core1:
http://localhost:8983/solr/core0/update


Can I configure anything so that I can index the same way as with a single core,
without the core name in the URL?

thanks and regards
vadim


Re: index enum

2011-11-02 Thread Gora Mohanty
On Tue, Nov 1, 2011 at 11:07 PM, Radha Krishna Reddy wrote:
[...]
> 1. I have an enum column in my SQL table. I want to index that column. Which
> fieldType should I specify in schema.xml for the enum?

I presume that you are using the DataImportHandler for indexing.

You will need to convert the enum value into an int (or, whatever is
appropriate). This can be done in the SQL statement that fetches
the data, or in a script processor in data-config.xml.
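
For example, a minimal data-config.xml sketch that maps a hypothetical enum
column "status" to an int in the SELECT itself (table and column names are
made up):

    <entity name="item"
            query="SELECT id,
                          CASE status WHEN 'ACTIVE' THEN 1
                                      WHEN 'INACTIVE' THEN 0
                                      ELSE -1 END AS status_code
                   FROM items">
      <field column="id" name="id"/>
      <field column="status_code" name="status_code"/>
    </entity>

with status_code declared as an int field in schema.xml.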

> 2. Normally we can index one column in a table using the column header as
> entity name and the column data as value of the entity. Can I index 2 columns
> where one column will be the entity name and the other will be the value of
> the entity in the data-config.xml?

Yes, you can.
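
A sketch of one way to do that with a ScriptTransformer in data-config.xml
(column names "name" and "value" are made up; each row becomes a Solr field
named after its "name" column; the dataSource element is omitted for brevity):

    <dataConfig>
      <script><![CDATA[
        function nameValue(row) {
          row.put(row.get('name'), row.get('value'));
          return row;
        }
      ]]></script>
      <document>
        <entity name="pairs" transformer="script:nameValue"
                query="SELECT name, value FROM pairs"/>
      </document>
    </dataConfig>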

Regards,
Gora


Re: Solr real-time update taking time

2011-11-02 Thread Jan Høydahl
Hi,

You probably want to use CommitWithin: http://wiki.apache.org/solr/CommitWithin 
to limit the number of commits to a minimum.
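
For example, with the XML update format you can ask Solr to commit within a time
window instead of issuing an explicit commit per update (the 10-second window
here is just an illustrative value):

    <add commitWithin="10000">
      <doc>
        <field name="id">doc-1</field>
      </doc>
    </add>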

Some other questions:
* Are you using spellcheck with buildOnCommit? That totally kills commit 
performance...
* What's your index size, total RAM and allocated RAM to JVM?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 2. nov. 2011, at 03:58, vijay.sampath wrote:

> Hi All, 
> 
>  I recently started working on SOLR 3.3 and would need your expertise to
> provide a solution. I'm working on a POC, in which I've imported 3.5 million
> document records using DIH. We have a source system which publishes change
> data capture in a XML format. The requirement is to integrate SOLR with the
> real time CDC updates. I've written a utility program which receives the
> XML message, transforms and updates SOLR using SOLRJ. The source system
> publishes at least 3-4 messages per second, and the requirement is to have
> the changes reflected within 1-2 seconds. Right now it takes almost 15-25
> seconds to get the changes committed in SOLR. I know, commit at every record
> or every second would hamper the search and indexing.
> 
> I thought of having a Master for writes and a Slave for reads, but again not
> sure how fast the replication would be? Since the requirement is to have the
> change data capture in 1-2 seconds. 
> 
> Any thoughts or suggestions are appreciated. Thanks again. 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-tp3472709p3472709.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Deleting documents not shown in response?

2011-11-02 Thread Jan Høydahl
Hi,

The response only tells you the status code and time.
If you delete by query, you can simply run a normal query before the delete 
query to get the IDs.
It would not be easy to add a patch for this either, as the 
deleteByQuery call happens deep within IndexWriter, without returning any info.
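
A sketch of that pattern, assuming a hypothetical "type" field (the stream.body
value would need URL encoding in a real request):

    http://localhost:8983/solr/select?q=type:obsolete&fl=id&rows=1000
    http://localhost:8983/solr/update?stream.body=<delete><query>type:obsolete</query></delete>&commit=true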

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 2. nov. 2011, at 07:50, kiran.bodigam wrote:

> I am trying to delete the document from the index by using id:
> http://myserver/solr/update?stream.body=<delete><query>id:2009-11-04\13\:51\:07.348184</query></delete>
> &commit=true
> It's working fine, but the Solr response does not show which
> document was deleted:
> <int name="status">0</int><int name="QTime">343</int>
>
> If I need the deleted document's information in the response, how would I do that (at
> least I require the id in the response)?
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Deleting-documents-not-shown-in-response-tp3473013p3473013.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: shard indexing

2011-11-02 Thread Jan Høydahl
Hi,

The only difference is the core name in the URL, which should be easy enough to 
handle from your indexing client code. I don't really understand the reason 
behind your request. How would you control which core to index your document to 
if you did not specify it in the URL?

You could name ONE of your cores as ".", meaning it would be the "default" core 
living at /solr/update, perhaps that is what you're looking for?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:

> Hello folks,
> i have an problem with shard indexing.
> 
> with an single core i use this update command:
> http://localhost:8983/solr/update .
> 
> now i have 2 shards, we can call them core0 / core1
> http://localhost:8983/solr/core0/update .
> 
> 
> can i adjust anything to indexing in the same way like with a single core
> without core-name?
> 
> thanks and regards
> vadim



Re: shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello Jan,

thanks for your quick response.

It's quite difficult to explain:
We want to create new shards on the fly every month and switch the default
shard to the newest one.
We always want to index to the newest shard with the same update URL,
like http://localhost:8983/solr/update (content stream).

Can this idea be implemented?

Thanks in advance.
Regards

Vadim





2011/11/2 Jan Høydahl 

> Hi,
>
> The only difference is the core name in the URL, which should be easy
> enough to handle from your indexing client code. I don't really understand
> the reason behind your request. How would you control which core to index
> your document to if you did not specify it in the URL?
>
> You could name ONE of your cores as ".", meaning it would be the "default"
> core living at /solr/update, perhaps that is what you're looking for?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:
>
> > Hello folks,
> > i have an problem with shard indexing.
> >
> > with an single core i use this update command:
> > http://localhost:8983/solr/update .
> >
> > now i have 2 shards, we can call them core0 / core1
> > http://localhost:8983/solr/core0/update .
> >
> >
> > can i adjust anything to indexing in the same way like with a single core
> > without core-name?
> >
> > thanks and regards
> > vadim
>
>


Re: shard indexing

2011-11-02 Thread Jan Høydahl
Personally I think it is better to be explicit about where you index, so that 
when you create a new shard "december", you also switch the URL for your 
indexing code.

I suppose one trick you could use is to have a core called "current", which now 
would be for november, and once you get to december, you create a "november" 
core, and do a SWAP between "current"<->"november". Then your new core would 
now be "current" and you don't need to change URLs on the index client side.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 2. nov. 2011, at 11:16, Vadim Kisselmann wrote:

> Hello Jan,
> 
> thanks for your quick response.
> 
> It's quite difficult to explain:
> We want to create new shards on the fly every month and switch the default
> shard to the newest one.
> We always want to index to the newest shard with the same update query
> like  http://localhost:8983/solr/update.(content stream)
> 
> Is our idea possible to implement?
> 
> Thanks in advance.
> Regards
> 
> Vadim
> 
> 
> 
> 
> 
> 2011/11/2 Jan Høydahl 
> 
>> Hi,
>> 
>> The only difference is the core name in the URL, which should be easy
>> enough to handle from your indexing client code. I don't really understand
>> the reason behind your request. How would you control which core to index
>> your document to if you did not specify it in the URL?
>> 
>> You could name ONE of your cores as ".", meaning it would be the "default"
>> core living at /solr/update, perhaps that is what you're looking for?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:
>> 
>>> Hello folks,
>>> i have an problem with shard indexing.
>>> 
>>> with an single core i use this update command:
>>> http://localhost:8983/solr/update .
>>> 
>>> now i have 2 shards, we can call them core0 / core1
>>> http://localhost:8983/solr/core0/update .
>>> 
>>> 
>>> can i adjust anything to indexing in the same way like with a single core
>>> without core-name?
>>> 
>>> thanks and regards
>>> vadim
>> 
>> 



Re: shard indexing

2011-11-02 Thread Yury Kats
There's a "defaultCore" parameter in solr.xml that let's you specify what core 
should be used when none is specified in the URL. You can change that every 
time you create a new core.
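
A minimal solr.xml sketch of that setting (paths and core names are illustrative):

    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="core_november">
        <core name="core_october" instanceDir="core_october"/>
        <core name="core_november" instanceDir="core_november"/>
      </cores>
    </solr>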



>
>From: Vadim Kisselmann 
>To: solr-user@lucene.apache.org
>Sent: Wednesday, November 2, 2011 6:16 AM
>Subject: Re: shard indexing
>
>Hello Jan,
>
>thanks for your quick response.
>
>It's quite difficult to explain:
>We want to create new shards on the fly every month and switch the default
>shard to the newest one.
>We always want to index to the newest shard with the same update query
>like  http://localhost:8983/solr/update.(content stream)
>
>Is our idea possible to implement?
>
>Thanks in advance.
>Regards
>
>Vadim
>
>
>
>
>
>2011/11/2 Jan Høydahl 
>
>> Hi,
>>
>> The only difference is the core name in the URL, which should be easy
>> enough to handle from your indexing client code. I don't really understand
>> the reason behind your request. How would you control which core to index
>> your document to if you did not specify it in the URL?
>>
>> You could name ONE of your cores as ".", meaning it would be the "default"
>> core living at /solr/update, perhaps that is what you're looking for?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:
>>
>> > Hello folks,
>> > i have an problem with shard indexing.
>> >
>> > with an single core i use this update command:
>> > http://localhost:8983/solr/update .
>> >
>> > now i have 2 shards, we can call them core0 / core1
>> > http://localhost:8983/solr/core0/update .
>> >
>> >
>> > can i adjust anything to indexing in the same way like with a single core
>> > without core-name?
>> >
>> > thanks and regards
>> > vadim
>>
>>
>
>
>

exact matches possible?

2011-11-02 Thread Roland Tollenaar

Hi,

I am trying to do a search that will only match exact words on a field.

I have read somewhere that this is not what Solr is meant for, but I am 
still hoping that it's possible.


This is an example of what I have tried (to exclude spaces) but the 
workaround does not seem to work.


Word:apple NOT " "

What I am really looking for is the "=" operator in SQL (e.g. 
Word='apple'), but I cannot find its equivalent in Lucene.


Thanks for the help.

Regards,

Roland




Re: exact matches possible?

2011-11-02 Thread Erik Hatcher
It's certainly quite possible with Lucene/Solr.  But you have to index the 
field to accommodate it.  If you literally want an exact match query, use the 
"string" field type and then issue a term query.  q=field:value will work in 
simple cases (where the value has no spaces or colons, or other query parser 
syntax), but q={!term f=field}value is the fail-safe way to do that.
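
A sketch of both pieces, assuming a field named "word":

    In schema.xml:
        <field name="word" type="string" indexed="true" stored="true"/>

    Exact-match query:
        q={!term f=word}apple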

Erik

On Nov 2, 2011, at 07:08 , Roland Tollenaar wrote:

> Hi,
> 
> I am trying to do a search that will only match exact words on a field.
> 
> I have read somewhere that this is not what SOLR is meant for but I am still 
> hoping that its possible.
> 
> This is an example of what I have tried (to exclude spaces) but the 
> workaround does not seem to work.
> 
> Word:apple NOT " "
> 
> What I am really looking for is the "=" operator in SQL (eg Word='apple') but 
> I cannot find its equivalent for lucene.
> 
> Thanks for the help.
> 
> Regards,
> 
> Roland
> 
> 



Re: shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello Yury,

thanks for your response.
This is exactly my plan. But "defaultCoreName" is buggy. When I use it
(defaultCoreName="core_november"), the default core gets deleted.
I think this was the issue:
https://issues.apache.org/jira/browse/SOLR-2127

Do you use this feature and did it work?

Thanks and Regards
Vadim




2011/11/2 Yury Kats 

> There's a "defaultCore" parameter in solr.xml that let's you specify what
> core should be used when none is specified in the URL. You can change that
> every time you create a new core.
>
>
>
> >
> >From: Vadim Kisselmann 
> >To: solr-user@lucene.apache.org
> >Sent: Wednesday, November 2, 2011 6:16 AM
> >Subject: Re: shard indexing
> >
> >Hello Jan,
> >
> >thanks for your quick response.
> >
> >It's quite difficult to explain:
> >We want to create new shards on the fly every month and switch the default
> >shard to the newest one.
> >We always want to index to the newest shard with the same update query
> >like  http://localhost:8983/solr/update.(content stream)
> >
> >Is our idea possible to implement?
> >
> >Thanks in advance.
> >Regards
> >
> >Vadim
> >
> >
> >
> >
> >
> >2011/11/2 Jan Høydahl 
> >
> >> Hi,
> >>
> >> The only difference is the core name in the URL, which should be easy
> >> enough to handle from your indexing client code. I don't really
> understand
> >> the reason behind your request. How would you control which core to
> index
> >> your document to if you did not specify it in the URL?
> >>
> >> You could name ONE of your cores as ".", meaning it would be the
> "default"
> >> core living at /solr/update, perhaps that is what you're looking for?
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:
> >>
> >> > Hello folks,
> >> > i have an problem with shard indexing.
> >> >
> >> > with an single core i use this update command:
> >> > http://localhost:8983/solr/update .
> >> >
> >> > now i have 2 shards, we can call them core0 / core1
> >> > http://localhost:8983/solr/core0/update .
> >> >
> >> >
> >> > can i adjust anything to indexing in the same way like with a single
> core
> >> > without core-name?
> >> >
> >> > thanks and regards
> >> > vadim
> >>
> >>
> >
> >
> >
>


SolrCloud with large synonym files

2011-11-02 Thread Phil Hoy
Hi,

I am running SolrCloud, and one file in the -Dbootstrap_confdir directory is a large 
synonym file (~50 MB) used by a SynonymFilterFactory configured in the 
schema.xml. When I start Solr I get a ZooKeeper exception, presumably because 
the file is too large. 

Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)

Is there a way to either increase the limit in ZooKeeper, or perhaps configure 
the SynonymFilterFactory differently to get the file from somewhere external to 
-Dbootstrap_confdir?

Phil  


Re: SolrCloud with large synonym files

2011-11-02 Thread Yung-chung Lin
Hi,

I haven't used Solr with ZooKeeper before, but Solr 3.4 implements the
synonym module with a different data structure. If your version of Solr
is older than 3.4, then maybe you can try upgrading it first.

See also this thread on stackoverflow.
http://stackoverflow.com/questions/6747664/solr-and-big-synonym-file

Yung-chung Lin

2011/11/2 Phil Hoy 

> Hi,
>
> I am running solrcloud and a file in the Dbootstrap_confdir is a large
> large synonym file (~50mb ) used by a SynonymFilterFactory configured in
> the schema.xml. When i start solr I get a zookeeper exception presumably
> because the file size is too large.
>
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>
> Is there a way to either increase the limit in zookeeper or perhaps
> configure the SynonymFilterFactory differently to get the file from
> somewhere external to Dbootstrap_confdir?
>
> Phil
>


Re: shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello Jan,

Personally I think the same (switch the URL in my indexing code), but my
requirement is to use the same query.
Thanks for suggesting this trick. Great idea, which could work in
my case; I'll test it.

Regards
Vadim



2011/11/2 Jan Høydahl 

> Personally I think it is better to be explicit about where you index, so
> that when you create a new shard "december", you also switch the URL for
> your indexing code.
>
> I suppose one trick you could use is to have a core called "current",
> which now would be for november, and once you get to december, you create a
> "november" core, and do a SWAP between "current"<->"november". Then your
> new core would now be "current" and you don't need to change URLs on the
> index client side.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 2. nov. 2011, at 11:16, Vadim Kisselmann wrote:
>
> > Hello Jan,
> >
> > thanks for your quick response.
> >
> > It's quite difficult to explain:
> > We want to create new shards on the fly every month and switch the
> default
> > shard to the newest one.
> > We always want to index to the newest shard with the same update query
> > like  http://localhost:8983/solr/update.(content stream)
> >
> > Is our idea possible to implement?
> >
> > Thanks in advance.
> > Regards
> >
> > Vadim
> >
> >
> >
> >
> >
> > 2011/11/2 Jan Høydahl 
> >
> >> Hi,
> >>
> >> The only difference is the core name in the URL, which should be easy
> >> enough to handle from your indexing client code. I don't really
> understand
> >> the reason behind your request. How would you control which core to
> index
> >> your document to if you did not specify it in the URL?
> >>
> >> You could name ONE of your cores as ".", meaning it would be the
> "default"
> >> core living at /solr/update, perhaps that is what you're looking for?
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:
> >>
> >>> Hello folks,
> >>> i have an problem with shard indexing.
> >>>
> >>> with an single core i use this update command:
> >>> http://localhost:8983/solr/update .
> >>>
> >>> now i have 2 shards, we can call them core0 / core1
> >>> http://localhost:8983/solr/core0/update .
> >>>
> >>>
> >>> can i adjust anything to indexing in the same way like with a single
> core
> >>> without core-name?
> >>>
> >>> thanks and regards
> >>> vadim
> >>
> >>
>
>


how to apply sort and search both on multivalued field in solr

2011-11-02 Thread vrpar...@gmail.com
Hello all,

I did some googling, and as per the wiki we cannot sort on a multivalued
field.

The workaround is to add two more fields for the particular
multivalued field: min and max.
 e.g. a multivalued field has 4 values
 "abc",
 "cde",
 "efg",
 "pqr"
then min="abc" and max="pqr", and we can sort on those (see the sketch below).
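
A schema.xml sketch of that workaround (field names are illustrative; the min
and max values have to be computed and sent by the indexing client, since
copyField cannot aggregate values):

    <field name="tags"     type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="tags_min" type="string" indexed="true" stored="false"/>
    <field name="tags_max" type="string" indexed="true" stored="false"/>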

This is fine if we only need to sort on the multivalued field.

But I want to do both searching and sorting on the same multivalued field, and
then the results are not right.

How can I solve this problem?

Thanks
vishal parekh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-apply-sort-and-search-both-on-multivalued-field-in-solr-tp3473652p3473652.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud with large synonym files

2011-11-02 Thread Phil Hoy
It is Solr 4.0, and I believe it uses the new FSTSynonymFilterFactory, but it defers to 
ZkSolrResourceLoader to load the synonym file when in cloud mode.
Phil

-Original Message-
From: ☼ 林永忠 ☼ (Yung-chung Lin) [mailto:henearkrx...@gmail.com] 
Sent: 02 November 2011 12:24
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with large synonym files

Hi,

I didn't use Solr with Zookeeper before. But Solr 3.4 implements the
synonym module with a different data structure. If the version of your Solr
is not 3.4, then maybe you can try upgrading it first.

See also this thread on stackoverflow.
http://stackoverflow.com/questions/6747664/solr-and-big-synonym-file

Yung-chung Lin

2011/11/2 Phil Hoy 

> Hi,
>
> I am running solrcloud and a file in the Dbootstrap_confdir is a large
> large synonym file (~50mb ) used by a SynonymFilterFactory configured in
> the schema.xml. When i start solr I get a zookeeper exception presumably
> because the file size is too large.
>
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>
> Is there a way to either increase the limit in zookeeper or perhaps
> configure the SynonymFilterFactory differently to get the file from
> somewhere external to Dbootstrap_confdir?
>
> Phil
>


limiting searches to particular sources

2011-11-02 Thread Fred Zimmerman
I want to be able to limit some searches to particular sources, e.g. "wiki
only", "crawled only", etc.  So I think I need to create a source field in
the schema.xml.  However, the native data for these sources does not
contain source info (e.g. "crawled").  So I want to use (I think)
 to add a string to each data set as I import it, e.g.
"website-X-crawl".  So my question is, how do I insert a string value into
a blank field?


Re: SolrCloud with large synonym files

2011-11-02 Thread Mark Miller

On Nov 2, 2011, at 7:47 AM, Phil Hoy wrote:

> Hi,
> 
> I am running solrcloud and a file in the Dbootstrap_confdir is a large large 
> synonym file (~50mb ) used by a SynonymFilterFactory configured in the 
> schema.xml. When i start solr I get a zookeeper exception presumably because 
> the file size is too large. 
> 
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> 
> Is there a way to either increase the limit in zookeeper or perhaps configure 
> the SynonymFilterFactory differently to get the file from somewhere external 
> to Dbootstrap_confdir?
> 
> Phil  


As a workaround you can try:

(Java system property: jute.maxbuffer)

This option can only be set as a Java system property. There is no
zookeeper prefix on it. It specifies the maximum size of the data
that can be stored in a znode. The default is 0xfffff, or just under
1M. If this option is changed, the system property must be set on
all servers and clients, otherwise problems will arise. This is
really a sanity check. ZooKeeper is designed to store data on the
order of kilobytes in size.
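
For example, a sketch of starting the Solr example with a larger limit (the
value here is illustrative and must exceed your largest file; the same
property also has to be set on the ZooKeeper servers):

    java -Djute.maxbuffer=62914560 -jar start.jar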

Eventually there are other ways to solve this that we may offer:

- Optional compression of files
- Storing a file across multiple zk nodes transparently when the size is too large

- Mark Miller
lucidimagination.com


Re: SolrCloud with large synonym files

2011-11-02 Thread Robert Muir
On Wed, Nov 2, 2011 at 8:53 AM, Phil Hoy  wrote:
> It is solr 4.0 and uses the new FSTSynonymFilterFactory i believe but defers 
> to ZkSolrResourceLoader to load the synonym file when in cloud mode.
> Phil
>

FYI: The synonyms implementation supports multiple formats (currently
"solr" and "wordnet") I think.

It's possible that it could have another format, "binary" or
"serialized", which is essentially the serialized byte[] of the FST.
This would be smaller on disk and faster to load, since it wouldn't
really have to 'build' itself (it would be pre-built).

In the initial implementation I didn't add this, as I wasn't sure if
this would have any real value, especially since the build time is a
lot faster now, but it is something that is possible.

-- 
lucidimagination.com


RE: SolrCloud with large synonym files

2011-11-02 Thread Phil Hoy
I tried adding the property but it did not seem to improve things. I did 
however get it working by noticing that the ZkSolrResourceLoader has a 
fallback to load resources from the shared lib directory.

Thanks for getting back to me.
Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 02 November 2011 15:06
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with large synonym files


On Nov 2, 2011, at 7:47 AM, Phil Hoy wrote:

> Hi,
> 
> I am running solrcloud and a file in the Dbootstrap_confdir is a large large 
> synonym file (~50mb ) used by a SynonymFilterFactory configured in the 
> schema.xml. When i start solr I get a zookeeper exception presumably because 
> the file size is too large. 

> 
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /configs/recordsets_conf/firstnames.csv
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> 
> Is there a way to either increase the limit in zookeeper or perhaps configure 
> the SynonymFilterFactory differently to get the file from somewhere external 
> to Dbootstrap_confdir?
> 
> Phil  


As a workaround you can try:

(Java system property: jute.maxbuffer)

This option can only be set as a Java system property. There is no
zookeeper prefix on it. It specifies the maximum size of the data
that can be stored in a znode. The default is 0xfffff, or just under
1M. If this option is changed, the system property must be set on
all servers and clients, otherwise problems will arise. This is
really a sanity check. ZooKeeper is designed to store data on the
order of kilobytes in size.

Eventually there are other ways to solve this that we may offer:

- Optional compression of files
- Storing a file across multiple zk nodes transparently when the size is too large

- Mark Miller
lucidimagination.com



Re: Solr real-time update taking time

2011-11-02 Thread Vijay Sampath
I'll try to use CommitWithin. Just to confirm, if I have the value as 2
seconds, will it affect my search performance?  

To answer you questions, 
1. spellCheck is not used with buildOnCommit
2. Index size is 16.1 GB and RAM allocated to JVM 1GB. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-tp3472709p3474334.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: form-data post to ExtractingRequestHandler with utf-8 characters not handled

2011-11-02 Thread kgoess
I finally managed to answer my own question. UTF-8 data in the body is ok,
but you need to specify charset=utf-8 in the Content-Type header in each
part, to tell the receiver (Solr) that it's not the default ISO-8859-1

   Content-Disposition: form-data; name=literal.bptitle
   Content-Type: text/plain; charset=utf-8

   accented séance ghosts
   --W76L1XO3T9bSMjapwVc9MgXQDNwQ4DBKgevNArdl

References:
The default charset is ISO-8859-1:
http://tools.ietf.org/html/rfc2616#section-3.7.1
How to set the charset for multipartform-data:
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.2

And if anybody's curious, here's how you specify that in Perl and send a pdf
to the /update/extract solr-cell handler:

use LWP::UserAgent;
use HTTP::Request::Common;
use Encode qw(encode);

my $ua = LWP::UserAgent->new;

my %form_fields = (
    title  => 'accented séance ghosts',
    author => 'smith',
);

my @content;

while (my ($field, $value) = each %form_fields) {
    if ($value =~ /^[[:ascii:]]+$/) {
        # Plain ASCII values can go in as simple form fields.
        push @content, "literal.$field" => $value;
    } else {
        # Non-ASCII values get their own part with an explicit charset,
        # so Solr does not fall back to ISO-8859-1.
        push @content, "literal.$field" => [
            undef,
            "literal.$field",
            "Content-Type"        => 'text/plain; charset=utf-8',
            "Content-Disposition" => "form-data; name=literal.$field",
            "Content"             => encode('utf-8-strict', $value),
        ];
    }
}

# $path points at the PDF to send to the extract handler.
push @content, ( myfile => [ $path, undef, 'Content-Type' =>
    'application/pdf', 'Content-Transfer-Encoding' => 'binary' ] );

local $HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;

my $response = $ua->post(
    $extract_uri,
    Content_Type => 'form-data',
    Content      => \@content,
);




--
View this message in context: 
http://lucene.472066.n3.nabble.com/form-data-post-to-ExtractingRequestHandler-with-utf-8-characters-not-handled-tp3461731p3474450.html
Sent from the Solr - User mailing list archive at Nabble.com.


Setting up Solr for first time

2011-11-02 Thread MBD
Looking for help getting a basic (the example) configuration up and stabilized 
so we can start experimenting with it. The requirement is that it index PDFs.

After basic install Solr (3.4) is indexing raw text/html files.

But when feeding in a PDF I'm getting a permissions error, and I'm not sure where, 
exactly, the problem is or what I need to do to fix it. 

$ curl "http://localhost:8983/solr/update/extract?literal.id=doc2&commit=true" \
  -F "myfile=@features.pdf"
Error 500 Can't connect to window server - not enough permissions.
java.lang.InternalError: Can't connect to window server - not enough 
permissions.
at java.lang.ClassLoader$NativeLibrary.load(Native Method)

The full error can be seen here: Solr 3.4

thanks for ANY help!
-Mike

Mailing List

2011-11-02 Thread Carol Kuzel
Hello,

Can you please add me to the Solr users mailing list.

Thank you very much.

Carol Swaine-Kuzel
cku...@ebscohost.com



RE: Mailing List

2011-11-02 Thread Steven A Rowe
Hi Carol,

Solr mailing list subscription is self-service.  Go here and click on the "Subscribe 
to List" link under the "Users" section.

Steve

> -Original Message-
> From: Carol Kuzel [mailto:cku...@ebscohost.com]
> Sent: Wednesday, November 02, 2011 3:59 PM
> To: solr-user@lucene.apache.org
> Subject: Mailing List
> 
> Hello,
> 
> 
> 
> Can you please add me to the Solr users mailing list.
> 
> 
> 
> Thank you very much.
> 
> 
> 
> Carol Swaine-Kuzel
> 
> cku...@ebscohost.com
> 
> 



Highlighting "text" field when query is for "string" field

2011-11-02 Thread solrdude
I have a situation where I need to highlight matching phrases in a "text" field
whereas the query is against a "string" field. It's not highlighting now, maybe
because in the text field they are all terms and hence not a match for the phrase.
How do I do it? With hl.alternateField, it finds those things in the alternate
field, but does not apply the default <em> markup around the matching phrase.
How do I get it to mark it?

Eg:

keyword: smooth skin   // field type: string
excerpt: Smooth skin   // field type: text


query:
http://localhost:8080/mycore/select?facet=true&group.ngroups=true&facet.mincount=1&group.limit=3&facet.limit=10&hl=true&rows=10&version=2&start=0&q=keyword:%22smooth+skin%22+and+publishStatus:Live&group.field=productName&group=true&facet.field=brand&hl.fl=excerpt&hl.alternateField=excerpt

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-text-field-when-query-is-for-string-field-tp3475334p3475334.html
Sent from the Solr - User mailing list archive at Nabble.com.


Extended Dismax and Proximity Queries

2011-11-02 Thread Jamie Johnson
Is it possible to do Proximity queries using edismax?  I saw I could
do the following

q="batman movie"&qs=100

but I wanted to be able to handle queries like "batman movie"~100

I know I can do

text:"batman movie"~100

but I'm trying to do this without specifying a field.  Is this possible?


Re: how to : multicore setup with same config files

2011-11-02 Thread Val Minyaylo

Have you tried to query multiple cores at the same time?

On 10/31/2011 8:30 AM, Vadim Kisselmann wrote:

It works.
It was one misplaced backslash in my config ;)
Sharing the config/schema files is not a problem.
Regards, Vadim


2011/10/31 Vadim Kisselmann


Hi folks,

I have a small blockage in the configuration of a multicore setup.
I use the latest Solr version (4.0) from trunk and the example (with
jetty).
A single core is running without problems.

Assume that I have this structure:

/solr-trunk/solr/example/multicore/
  solr.xml
  core0/
  core1/

/solr-data/
  /conf/
    schema.xml
    solrconfig.xml
  /data/
    core0/
      index
    core1/
      index

I want to share the config files (same instanceDir but different dataDir).

How can I configure this so that it works (solrconfig.xml, solr.xml)?

Do I need the directories for core0/core1 in solr-trunk/...?

I found issues in Jira with old patches which unfortunately don't work.

Thanks and Regards

Vadim








Re: Solr real-time update taking time

2011-11-02 Thread Vijay Sampath
Hi Jan, 

  Thanks very much for the suggestion. I used CommitWithin(5000) and the
response came down to less than a second.  But I see inconsistent
behaviour in the response times. Sometimes it's taking more than 20-25
seconds. Maybe I'll open up a separate thread. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-tp3472709p3476091.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLRJ commitWithin inconsistent

2011-11-02 Thread Vijay Sampath
Hi, 

 I'm using CommitWithin for immediate commit.  The response times are
inconsistent. Sometimes it's less than a second. Sometimes more than 25
seconds. I'm not sending concurrent requests. Any idea?

 http://wiki.apache.org/solr/CommitWithin

  Snippet: 

  UpdateRequest req = new UpdateRequest();
  req.add( solrDoc );             // solrDoc is the SolrInputDocument to index
  req.setCommitWithin( 5000 );    // ask Solr to commit within 5 seconds
  req.process( server );          // server is the SolrServer instance



Thanks,
Vijay 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLRJ-commitWithin-inconsistent-tp3476104p3476104.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: index enum

2011-11-02 Thread Radha Krishna Reddy
Hi Gora,

Thanks for the reply.

The first question is clear.

Second, how can I index 2 columns where one column will be the entity name
and the other will be the value of the entity, in the data-config.xml? Is it
possible to provide a sample?

Thanks and Regards,
Radhakrishna Reddy.

On Wed, Nov 2, 2011 at 2:43 PM, Gora Mohanty  wrote:

> On Tue, Nov 1, 2011 at 11:07 PM, Radha Krishna Reddy
>  wrote:
> [...]
> > 1. I have an enum column in my sql table.i want to index that
> column.which
> > fieldtype should i specify in the schema.xml for enum?
>
> I presume that you are using the DataImportHandler for indexing.
>
> You will need to convert the enum value into an int (or, whatever is
> appropriate). This can be done in the SQL statement that fetches
> the data, or in a script processor in data-config.xml.
>
> > 2. Normally we can index one column in a table using the  column header
> as
> > entity name and the column data as value of the entity.Can i index 2
> column
> > where one column will be the entity name and the other will be the value
> of
> > the entity in the data-config.xml?
>
> Yes, you can.
>
> Regards,
> Gora
>