German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-01 Thread thomas
Hi,
we have several entries in our database that our customer would like to find when
using a search string that does not match exactly. The problem is related
to spelling correction and synonyms. But instead of single entries in
synonyms.txt, we would like an automatic solution for this group of problems:

When searching for the name "schmid", we also want to find documents that contain
the name "schmidt". There are analogous names like "hildebrand" and
"hildebrandt", and more. That is why we'd like to find an automatic
solution for this group of words.

We already use the following filters in our index chain:



Unfortunately, the German stemmer does not handle such problems. Nor is this
a problem related to compound words.

Does anyone know of a solution? Maybe it's possible to set up a filter rule
in the query chain that automatically extends words ending with the letter "d"
with the letter "t"? Or, in the other direction, a rule in the index chain that
removes "t" letters after "d" letters.

Thanks a lot
Thomas

--
View this message in context: 
http://lucene.472066.n3.nabble.com/German-language-specific-problem-automatic-Spelling-correction-automatic-Synonyms-tp3216278p3216278.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-01 Thread thomas
Thanks Alexei,
Thanks Paul,

I played with solr.PhoneticFilterFactory. Analysing my query in the Solr
admin backend showed me how, and that, it works. My major problem is
that this filter needs to be applied to the index chain as well as to the
query chain to generate matches for our search. We have a huge index at this
point and I'm not really happy about reindexing all content.

Is there perhaps a more subtle solution that works by manipulating
the query chain only?

Otherwise I need to back up the whole index and try to reindex overnight while
CMS users are sleeping.

I will have a look into the ColognePhonetic encoder. I'm just afraid I'll have
to reindex the whole content there as well.
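
For reference, a minimal sketch of what such a field type could look like in
schema.xml (the type name here is made up; inject="true" keeps the original
tokens next to the phonetic codes):

  <fieldType name="text_phonetic_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- Cologne phonetic ("Koelner Phonetik") targets German names,
           so "schmid" and "schmidt" should map to the same code -->
      <filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" inject="true"/>
    </analyzer>
  </fieldType>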

Thomas

--
View this message in context: 
http://lucene.472066.n3.nabble.com/German-language-specific-problem-automatic-Spelling-correction-automatic-Synonyms-tp3216278p3216414.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-04 Thread thomas
Concerning the downtime, we found a solution that works well for us. We
already implemented an update mechanism so that when authors change
some content in the CMS, the index entry for that piece of content gets
updated (deleted, then indexed again) as well.

All we had to do was:
1. Change the schema.xml to support the PhoneticFilter in certain field types
2. Write a script that finds all individual content items
3. Start the update mechanism for each content item, one after
another.

So the index slowly migrates from the old to the new phonetic state without
any noticeable downtime for users of the search function. It's just that
they get somewhat mixed results for the duration of the transition. Sure, it takes
some time, but CMS users can keep working with content the whole time. If
they create or update content during the transition, it gets indexed or
reindexed following the new schema.xml anyway.

If we need to roll back, we just replace the schema.xml with the old version
and start the update process again.

So far this is working, thanks for your support!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/German-language-specific-problem-automatic-Spelling-correction-automatic-Synonyms-tp3216278p3225223.html
Sent from the Solr - User mailing list archive at Nabble.com.


Near Real Time Indexing and Searching with solr 3.6

2012-07-03 Thread thomas

Hi,

As part of my bachelor thesis I'm trying to achieve NRT with Solr 3.6.
I've come up with a basic concept and would be thrilled if I could get
some feedback.


The main idea is to use two different indexes: one persistent on disk
and one in RAM. The plan is to route every added and modified document
to the RAM index (http://imgur.com/kLfUN). After a certain period of
time, this index would be cleared and the documents added to the
persistent index.


Some major problems I still have with this idea are:
- deletions of documents in the persistent index
- having the same unique IDs in both the RAM index and the persistent index,
  as a result of an updated document
- merging search results to filter out old versions of updated documents

Would such an idea be viable to pursue?

Thanks for your time



Result grouping options

2007-09-26 Thread Thomas

Hello,

For the project I'm working on now it is important to group the results
of a query by a "product" field. Documents
belong to only one product, and there will never be more than 10
different products altogether.


When searching through the archives I identified 3 options:

1) Client-side XSLT
2) Faceting and querying all possible product facets
3) Field collapsing on the product field (SOLR-236)

Option 1 is not feasible.
Option 2 would be possible, but 10 queries for every single initial
query is not really a good idea either.
Option 3 seems like the best option as far as I understand it, but it only
exists as a patch.


Is it possible to use faceting to not only get the facet count but also
the top-n documents for every facet directly? If not, how hard would it be
to implement this as an extension?

If it's not possible at all, would field collapsing really be a solution
here, and can it somehow be used with Solr 1.2?

Thanks a lot!

Thomas


Best practice to get all highlighting terms for a document?

2007-10-19 Thread Thomas

Hi,

One of the requirements of the application I'm currently working on is
highlighting of matching terms not only in the search result page but also
when the user clicks on a result and the whole page is displayed. In this
particular app it is not possible to just query for the selected document
and set hl.fragsize=0. For display, I have to retrieve the document from a
different source.

Is there a "best practice" to retrieve all the highlighted terms? I thought
about setting hl.fragsize=1 and using an XSLT response writer to filter out
the highlighted keywords.
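
A sketch of the kind of request that would be (field and stylesheet names are
placeholders):

  http://localhost:8983/solr/select?q=some+keywords&fl=id&hl=true&hl.fl=content&hl.fragsize=1&wt=xslt&tr=highlight-terms.xsl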

Is there an easier/cleaner solution?

Thanks,

Thomas


Getting org.apache.lucene.document.Field instead of String in SolrDocument#get

2016-01-27 Thread Thomas Mortagne
Hi guys,

I have some code using SolrInstance#queryAndStreamResponse, and since I
moved to Solr 5.3.1 (from 4.10.4) my StreamingResponseCallback is
called with a SolrDocument filled with Field instead of the String it
used to receive when calling #get('myfield').

Is this expected? Should I change all my code dealing with
SolrDocument to be careful about that? From what I could see, those
Fields are put into the SolrDocument by DocsStreamer, which seems to be new
in 5.3, but I have not dug much deeper.

It looks a bit weird to me given the javadoc of #getFieldValue, which
is implemented exactly like #get. Also, it's not consistent with the
SolrInstance#query behavior, which returns SolrDocuments containing
values and not Fields.

Sorry if I missed it in the release notes.

Thanks for your time !
-- 
Thomas


Dollar signs in field names

2015-07-27 Thread Thomas Seidl
Hi all,

I've used dollar signs in field names for several years now, as an easy
way to escape "bad" characters (like colons) coming in from the original
source of the data, and I've never had any problems. Since I don't know
of any Solr request parameters that use a dollar sign as a special
character, I also wouldn't know where one might occur.

But while I remember that the "supported" format for field names was
previously completely undocumented (and it was basically "almost
anything is supported, but some things might not work with some
characters"), I now read that for about a year there has been a strict
definition/recommendation in the Solr wiki [1] which doesn't allow for
dollar signs.

[1] https://cwiki.apache.org/confluence/display/solr/Defining+Fields

So, my question is: Is this just for an easier definition, or is there a
real danger of problems when using dollar signs in field names? Or,
differently: How "bad" of an idea is it?
Also, where was this definition discussed, and why was this decision
reached? Is there really an argument against dollar signs? I have to say
it is really very handy to have a character available for field names
that is usually not allowed in programming languages' identifiers (as a
cheap escape character).

Thanks in advance,
Thomas


Re: Dollar signs in field names

2015-07-28 Thread Thomas Seidl
Thanks for your answer!

As mentioned, I'm aware of the problems with other characters like
colons and dashes. I've just never run into any issues with dollar
signs. And previously, before there was an official definition, I heard
from several people that "valid Java identifiers" was a good rule of
thumb – which would include dollar signs.

I'd just hoped that when a definition was made (and it's of course
very good and important that there now is one), it would more or less
mirror that rule of thumb and also allow dollar signs.

Now it's a pretty tough call whether to use them or not.

Cheers,
Thomas

On 2015-07-27 21:31, Erick Erickson wrote:
> The problem has been that field naming conventions weren't
> _ever_ defined strictly. It's not that anyone is taking away
> the ability to use other characters,  rather it's codifying what's always
> been true; Solr isn't guaranteed to play nice with naming
> conventions other than those specified on the page you
> referenced, alphanumerics and underscores and _not_ starting
> with numerics.
> 
> The danger is that parsing the incoming URL can run into
> "issues". Take for instance a colon. How would the parsing
> process distinguish that from a field:value separator? Or a
> hyphen: when is that a NOT and when is it part of a field
> name? Periods are also interesting. You can specify some
> params (e.g. facet params) with periods (f.field.prop=). No
> guarantee has ever been made that a field _name_ with a
> period won't confuse things. It happens to work, but that's
> not by design, just like dollar signs.
> 
> So you can use dollar signs, but there won't be any attempts
> to support it if some component somewhere doesn't "do the
> right thing" with it. And no guarantee that there aren't current
> corner cases where that causes problems. And if it does cause
> problems, support won't be added.
> 
> Best,
> Erick
> 
> On Mon, Jul 27, 2015 at 10:42 AM, Thomas Seidl  wrote:
>> Hi all,
>>
>> I've used dollar signs in field names for several years now, as an easy
>> way to escape "bad" characters (like colons) coming in from the original
>> source of the data, and I've never had any problems. Since I don't know
>> of any Solr request parameters that use a dollar sign as a special
>> character, I also wouldn't know where one might occur.
>>
>> But while I remember that the "supported" format for field names was
>> previously completely undocumented (and it was basically "almost
>> anything is supported, but some things might not work with some
>> characters"), I now read that for about a year there has been a strict
>> definition/recommendation in the Solr wiki [1] which doesn't allow for
>> dollar signs.
>>
>> [1] https://cwiki.apache.org/confluence/display/solr/Defining+Fields
>>
>> So, my question is: Is this just for an easier definition, or is there a
>> real danger of problems when using dollar signs in field names? Or,
>> differently: How "bad" of an idea is it?
>> Also, where was this definition discussed, why was this decision
>> reached? Is there really an argument against dollar signs? I have to say
>> it is really very handy to have a character available for field names
>> that is usually not allowed in programming language's identifiers (as a
>> cheap escape character).
>>
>> Thanks in advance,
>> Thomas
> 


Confirm Solr index corruption

2015-02-17 Thread Thomas Mathew
Hi All,

I use Solr 4.4.0 in a master-slave configuration. Last week, the master
server ran out of disk space (logs got too big too quickly due to a bug in our
system). Because of this, we weren't able to add new docs to an index. The
first thing I did was to delete a few old log files to free up disk space
(later I moved the other logs to free up disk). The index is working fine
even after this fiasco.

The next day, a colleague of mine pointed out that we may be missing a few
documents in the index. I suspect the above scenario may have broken the
index. I ran CheckIndex against this index. It didn't report any
corruption, though.

Right now, the index has about 25k docs. I haven't optimized this index in
a while, and there are about 4000 deleted docs. How can I confirm whether we
lost anything? If we've lost docs, is there a way to recover them?

Thanks in advance!!

Regards
Thomas


Integration Tests with SOLR 5

2015-02-24 Thread Thomas Scheffler

Hi,

I noticed that not only does Solr no longer ship a WAR file, it also
advises against providing a custom WAR file for deployment, as future
versions may depend on custom Jetty features.


Until 4.10 we were able to provide a WAR file with all the plug-ins we
need, for easier installs. The same WAR file was used together with a
web application WAR to run integration tests and check whether all
application details still work. We used the cargo-maven2-plugin and
different servlet containers for testing. I think this is quite a common
thing to do with continuous integration.


Now I wonder if anyone has a similar setup, with integration tests
running against Solr 5.


- No artifacts can be used, so no local repository cache is present
- How do you deploy your schema.xml, stopwords, Solr plug-ins etc. for
  testing in an isolated environment?
- What does the Maven boilerplate look like?

Any ideas would be appreciated.

Kind regards,

Thomas


Reading an index while it is being updated?

2015-05-13 Thread Guy Thomas
Up to now we've been using Lucene without Solr.

The Lucene index is being updated and when the update is finished we notify a 
Hessian proxy service running on the web server that wants to read the index. 
When this proxy service is notified, the server knows it can read the updated 
index.

Do we have to use a similar set-up when using Solr, that is:

1. Create/update the index

2. Notify the Solr client




  Guy Thomas
  Analist-Programmeur

  Provincie Vlaams-Brabant
  Dienst Projecten en Ontwikkelingen
  Provincieplein 1 - 3010 Leuven
  Tel: 016-26 79 45
  www.vlaamsbrabant.be<http://www.vlaamsbrabant.be/>







Using edismax in a filter query

2015-07-10 Thread Thomas Seidl
Hi all,

I was wondering if there's any way to use the Extended DisMax query
parser in an "fq" filter query?
The problem is that I have a "facet.query" with which I want to check
whether a certain set of keywords would have any results. But since the
normal query goes across multiple fields, I end up with something like this:

  facet.query=(field1:search OR field2:search OR field3:search OR
field4:search) AND (field1:keys OR field2:keys OR field3:keys OR
field4:keys)

(Just with a lot more fields.) On the one hand this is rather ugly to
see in the logs, but mostly I'm concerned that this would be harder for
Solr to parse than using its own edismax parser to do the job.

So, is there a way to do that? Or are there any other alternatives to
achieve this (except sending a second query, of course)?
Since the fields used can change from request to request, it's not
possible to dump all their contents into a single field for that purpose.

Thanks in advance,
Thomas


Performance of q.alt vs. fq

2015-07-10 Thread Thomas Seidl
Hi all,

I am working a lot with Drupal and Apache Solr. There, we implemented a
performance improvement that, for filter-only queries (i.e., no
"q" parameter, just "fq"s), moves the filters to the "q.alt"
parameter (idea based on this blog post [1]).

[1]
https://web.archive.org/web/20120817044656/http://www.derivante.com/2009/04/27/100x-increase-in-solr-performance-and-throughput

Before, we had "q.alt=*:*" to return all results and then filter via
"fq". So, e.g., this query:
  q.alt=*:*&fq=field1:foo&fq=field2:bar
becomes this:
  q.alt=(field1:foo) AND (field2:bar)

However, now I've read some complaints that the former is actually
faster, and some other people also pointed out (in separate discussions)
that "fq" is much faster than "q".

So, can anyone shed some light on this: what are the internal mechanics
that make the one or the other faster? Are there other suggestions for
how to make a "filters-only" search as fast as possible?

Also, has it recently changed so that the "q.alt" parameter now
influences relevance (in Solr 5.x)? I could have sworn that wasn't the
case previously.

Thanks in advance,
Thomas


Re: Using edismax in a filter query

2015-07-10 Thread Thomas Seidl
Hi Ahmet,

Brilliant, thanks a lot!
I thought it might be possible with local parameters, but couldn't find
any information anywhere on how (especially setting the multi-valued
"qf" parameter).

Thanks again,
Thomas

On 2015-07-10 14:09, Ahmet Arslan wrote:
> Hi Tomasi
> 
> Yes it is possible, please see local params : 
> https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
> 
> fq={!edismax qf='field1 field2 field'}search key
> Ahmet
> 
> 
> On Friday, July 10, 2015 2:20 PM, Thomas Seidl  wrote:
> 
> 
> 
> Hi all,
> 
> I was wondering if there's any way to use the Extended DisMax query
> parser in an "fq" filter query?
> The problem is that I have a "facet.query" with which I want to check
> whether a certain set of keywords would have any results. But since the
> normal query goes across multiple fields, I end up with something like this:
> 
>   facet.query=(field1:search OR field2:search OR field3:search OR
> field4:search) AND (field1:keys OR field2:keys OR field3:keys OR
> field4:keys)
> 
> (Just with a lot more fields.) On the one hand this is rather ugly to
> see in the logs, but mostly I'm concerned that this would be harder to
> parse for Solr than using its own edismax parser to do the job.
> 
> So, is there a way to do that? Or are there any other alternatives to
> achieve this (except sending a second query, of course)?
> Since the fields used can change from request to request, it's not
> possible to dump all their contents into a single field for that purpose.
> 
> Thanks in advance,
> Thomas
> 


filter groups

2016-07-04 Thread Thomas Scheffler

Hi,

I have metadata and files indexed in Solr. All have different ids, of
course, but they share the same value for "returnId" if they belong to the
same metadata document that describes a bunch of files (1:n).


When I search, I usually use grouping instead of join queries, to
keep the information about where the hit occurred.


Now it's getting tricky. I want to filter out groups depending on
a field that is only available on metadata documents: visibility.


I want to search in Solr like: "Find all documents containing 'foo',
grouped by returnId, where the metadata visibility is 'public'."


So it should find any 'foo' files, but only display the result if the
corresponding metadata document's field visibility='public'.
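
The grouped part of such a request looks roughly like this (the open question is
how to also apply the visibility filter):

  q=foo&group=true&group.field=returnId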


Faceting also uses just the information inside groups. Can I give SOLR 
some information for 'fq' and 'facet.*' to work with my setup?


I am still using SOLR 4.10.5

kind regards

Thomas


CDCR: Help With Tlog Growth Issues

2016-11-10 Thread Thomas Tickle
I am having an issue with cdcr that I could use some assistance in resolving.

I followed the instructions found here: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462

CDCR is set up with a single source and a single target.  Both the source and
target cluster are identically set up as 3 machines, each running an external
ZooKeeper and a Solr instance.  I've enabled the data replication and
successfully seen the documents replicated from the source to the target with
no errors in the log files.

However, when examining the /cdcr?action=QUEUES command, I noticed that the
tlogTotalSize and tlogTotalCount were alarmingly high.  Checking the data
directory for each shard, I was able to confirm that there were several thousand
tlog files of 3-4 MB each.  It added up to almost 35 GB of tlogs.
Obviously, this amount of tlogs causes a serious issue when trying to restart a
Solr server after activities such as patching.

Is it normal for old tlogs to never get removed in a CDCR setup?


Thomas Tickle





Data import handler and no status in web-ui

2017-06-06 Thread Thomas Porschberg
Hi,

I use DIH in SolrCloud mode (implicit router) in Solr 6.5.1.
When I start the import it works fine and I see the progress in the logfile.
However, when I click the "Refresh Status" button in the web UI while the
import is running,
I only see "No information available (idle)".
So I have to look in the logfile to observe when the import has finished.

In the old Solr, non-cloud and non-partitioned, there was an hourglass while the
import was running.

Any idea?

Best regards
Thomas


Clustering on copy fields

2017-07-25 Thread Thomas Krebs
I have defined a copied field on which I would like to use clustering. I 
understood that the destination field will store the full content despite the 
filter chain I defined.

Now, I have a keep word filter defined on the copied field.
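
A simplified schema.xml sketch of this kind of setup (field, type, and file names
here are only placeholders):

  <field name="field1" type="text_general" indexed="true" stored="true"/>
  <field name="field2" type="text_keepwords" indexed="true" stored="true"/>
  <copyField source="field1" dest="field2"/>

  <fieldType name="text_keepwords" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- only tokens listed in keepwords.txt survive this analysis chain -->
      <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>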

If I run clustering on the copied field, will it use the result of the filter
chain, i.e. the tokens passed through the keep word filter, or will it run on
the full content?

Re: Clustering on copy fields

2017-07-25 Thread Thomas Krebs
This is understood.

My question is: I have a keep words filter on field2, and field2 is used for
clustering.
Will the clustering algorithm use "some data" or the result of applying the
keep words filter to "some data"?

Cheers,
Thomas


> On 26.07.2017 at 01:36, Erick Erickson wrote:
> 
> copyFields are completely independent. The _raw_ data is passed to both. IOW,
> 
> 
> sending
> <field name="field1">some data</field>
> 
> is equivalent to this with no copyField:
> <field name="field1">some data</field>
> <field name="field2">some data</field>
> Best,
> Erick
> 
> 
> On Tue, Jul 25, 2017 at 11:28 AM, Thomas Krebs  wrote:
>> I have defined a copied field on which I would like to use clustering. I 
>> understood that the destination field will store the full content despite 
>> the filter chain I defined.
>> 
>> Now, I have a keep word filter defined on the copied field.
>> 
>> If I run clustering on the copied field will it use the result of the filter 
>> chain, i.e. the tokens passed through the keep word filter or will it run on 
>> the full content?



setup solrcloud from scratch vie web-ui

2017-05-12 Thread Thomas Porschberg
Hi,

I want to set up a SolrCloud. I want to test sharding with one node, no
replication.
I have some experience with non-cloud Solr and I have also run the cloud
examples.
I also have to use the DIH for importing. I think I can live with the internal
ZooKeeper.

I did my first steps with solr-6.5.1.

My first question is: Is it possible to set up a new SolrCloud with the web UI
only?

When I start solr with: 'bin/solr start -c'

I get a menu on the left side where I can create new collections and cores.
I think when I have only one node with no replication, a collection maps to one
core, right?

Should I create the core or the collection first?
What should I fill in as instanceDir?

For example: when I create a 'books/data' directory at the command line
under '$HOME/solr-6.5.1/server/solr'
and then fill in 'books' as instanceDir and 'data' as data directory,
I get a
'SolrCore Initialization Failures' error:

books:
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
 Could not find configName for collection books found:null


Is something like a step-by-step manual available?
The next step would be to set up DIH again.

This is another problem I see: with my non-cloud core I have a conf directory
containing dataimport.xml, schema.xml and solrconfig.xml.
I think these 3 files are enough to import my data from my relational database.
Under example/cloud I could not find any of them. How do I set up DIH for
SolrCloud?

Best regards
Thomas


Re: setup solrcloud from scratch vie web-ui

2017-05-12 Thread Thomas Porschberg
> > This is another problem I see: With my non-cloud core I have a 
> > conf-directory where I have dataimport.xml, schema.xml and solrconfig.xml. 
> > I think these 3 files are enough to import my data from my relational 
> > database.
> > Under example/cloud I could not find one of them. How to setup DIH for the 
> > solrcould?
> 
> The entire configuration (what would normally be in the conf directory)
> is in zookeeper when you're in cloud mode, not in the core directories. 
> You must upload a directory containing the same files that would
> normally be in a conf directory as a named configset to zookeeper before
> you try to create your collection.  This is something that the "bin/solr
> create" command does for you in cloud mode, typically using one of the
> configsets included on the disk as a source.
> 
> https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
> 
Ok, thank you. I did the following steps.

1. Started an external zookeeper
2. Copied a conf-directory to zookeeper: 
bin/solr zk upconfig -n books -d $HOME/solr-6.5.1/server/solr/tommy/conf -z 
localhost:2181
// This is a conf-directory from a standalone solr when dataimport was working!
--> Connecting to ZooKeeper at localhost:2181 ...
Uploading <> for config books to ZooKeeper at localhost:2181
// I think there were no errors, but how can I check it in ZooKeeper? I found no
// files (solrconfig.xml, ...)
// in the ZooKeeper directories (installation dir and data dir)
3. Started solr:
bin/solr start -c
4. Created a books collection with 2 shards
bin/solr create -c books -shards 2

Result: I see my books collection with the 2 shards in the web UI. No errors so
far.
However, the Dataimport-entry says:
"Sorry, no dataimport-handler defined!"

What could be the reason?

Thomas


Re: setup solrcloud from scratch vie web-ui

2017-05-12 Thread Thomas Porschberg
Hi,

I think I made one mistake: in step 3 I started Solr without the ZooKeeper
option.
I did:
 bin/solr start -c
but I think it should:
bin/solr start -c  -z localhost:2181

The problem is that now, when I repeat step 4 (creating a collection), I get the
following error:

//I uploaded my cat-config again to zookeeper with
// bin/solr zk upconfig -n cat -d $HOME/solr-6.5.1/server/solr/tommy/conf -z // 
localhost:2181


bin/solr create -c cat -shards 2

Connecting to ZooKeeper at localhost:2181 ...
INFO  - 2017-05-12 16:38:06.593; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:2181 ready
Re-using existing configuration directory cat

Creating new collection 'cat' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=cat&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=cat


ERROR: Failed to create collection 'cat' due to: 
{127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
'cat_shard1_replica1': Unable to create core [cat_shard1_replica1] Caused by: 
Lock held by this virtual machine: 
/home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index/write.lock}

This "data/bestand" is configured in solrconfig.xml (from tommy standalone) with
data/bestand

I tried to create the directory 
/home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index/ manually, but
nothing changed.

What is the reason for this CREATE-error?

Thomas




> ANNAMANENI RAVEENDRA wrote on 12 May 2017 at 15:54:
> 
> 
> Hi ,
> 
> If there is a request handler configured in solrconfig.xml and you update the
> conf in ZooKeeper, it should show up.
> 
> If you already did that, try reloading the configuration.
> 
> Thanks
> Ravi
> 
> 
> On Fri, 12 May 2017 at 9:46 AM, Thomas Porschberg 
> wrote:
> 
> > > > This is another problem I see: With my non-cloud core I have a
> > conf-directory where I have dataimport.xml, schema.xml and solrconfig.xml.
> > > > I think these 3 files are enough to import my data from my relational
> > database.
> > > > Under example/cloud I could not find one of them. How to setup DIH for
> > the solrcould?
> > >
> > > The entire configuration (what would normally be in the conf directory)
> > > is in zookeeper when you're in cloud mode, not in the core directories.
> > > You must upload a directory containing the same files that would
> > > normally be in a conf directory as a named configset to zookeeper before
> > > you try to create your collection.  This is something that the "bin/solr
> > > create" command does for you in cloud mode, typically using one of the
> > > configsets included on the disk as a source.
> > >
> > >
> > https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
> > >
> > Ok, thank you. I did the following steps.
> >
> > 1. Started an external zookeeper
> > 2. Copied a conf-directory to zookeeper:
> > bin/solr zk upconfig -n books -d $HOME/solr-6.5.1/server/solr/tommy/conf
> > -z localhost:2181
> > // This is a conf-directory from a standalone solr when dataimport was
> > working!
> > --> Connecting to ZooKeeper at localhost:2181 ...
> > Uploading <> for config books to ZooKeeper at localhost:2181
> > // I think no errors, but how can I check it in zookeeper? I found no
> > files solrconfig.xml ...
> > in the zookeeper directories (installation dir and data dir)
> > 3. Started solr:
> > bin/solr start -c
> > 4. Created a books collection with 2 shards
> > bin/solr create -c books -shards 2
> >
> > Result: I see in the web-ui my books collection with the 2 shards. No
> > errors so far.
> > However, the Dataimport-entry says:
> > "Sorry, no dataimport-handler defined!"
> >
> > What could be the reason?
> >
> > Thomas
> >


SolrCloud ... Unable to create core ... Caused by: Lock held by this virtual machine:...

2017-05-14 Thread Thomas Porschberg
Hi,

I have problems setting up SolrCloud on one node with 2 shards. What I did:

1. Started a external zookeeper
2. Ensured that no solr process is running with 'bin/solr status'
3. Posted a working conf directory from a non-cloud solr to zookeeper
   with
   'bin/solr zk upconfig -n karpfen -d 
/home/pberg/solr_new/solr-6.5.1/server/solr/tommy/conf -z localhost:2181'
   --> no errors
4. Started solr in cloud mode with
  'bin/solr -c -z localhost:2181'
5. Tried to create a new collection with 2 shards with
   'bin/solr create -c karpfen -shards 2'

The output is:

Connecting to ZooKeeper at localhost:2181 ...
INFO  - 2017-05-12 18:52:22.807; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:2181 ready
Re-using existing configuration directory karpfen

Creating new collection 'karpfen' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=karpfen&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=karpfen


ERROR: Failed to create collection 'karpfen' due to: 
{127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
'karpfen_shard2_replica1': Unable to create core [karpfen_shard2_replica1] 
Caused by: Lock held by this virtual machine: 
/home/pberg/solr_new2/solr-6.5.1/server/data/ohrdruf_bestand/index/write.lock}

   
The conf directory I copied contains the following files:
currency.xml elevate.xml  protwords.txt   stopwords.txt
dataimport-cobt2.properties  lang schema.xml  synonyms.txt
dataimport.xml   params.json  solrconfig.xml

"lang" is a directory.

Are my steps wrong? Did I miss something important? 

Any help is really welcome.

Thomas


Re: SolrCloud ... Unable to create core ... Caused by: Lock held by this virtual machine:...

2017-05-15 Thread Thomas Porschberg
Hi,

I get no error message and the shard is created when I use 
numShards=1
in the url.

http://localhost:8983/solr/admin/collections?action=CREATE&name=karpfen&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=karpfen
 --> success

http://localhost:8983/solr/admin/collections?action=CREATE&name=karpfen&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=karpfen
--> error

Thomas


> Susheel Kumar wrote on 15 May 2017 at 14:36:
> 
> 
> what happens if you create just one shard.  Just use this command directly
> on browser or thru curl.  Empty the contents from
>  /home/pberg/solr_new2/solr-6.5.1/server/data before running
> 
> http://localhost:8983/solr/admin/collections?action=
> CREATE&name=karpfen&numShards=1&replicationFactor=1&
> maxShardsPerNode=1&collection.configName=karpfen
> <http://localhost:8983/solr/admin/collections?action=CREATE&name=karpfen&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=karpfen>
> 
> On Mon, May 15, 2017 at 2:14 AM, Thomas Porschberg 
> wrote:
> 
> > Hi,
> >
> > I have problems to setup solrcloud on one node with 2 shards. What I did:
> >
> > 1. Started a external zookeeper
> > 2. Ensured that no solr process is running with 'bin/solr status'
> > 3. Posted a working conf directory from a non-cloud solr to zookeeper
> >with
> >'bin/solr zk upconfig -n karpfen -d 
> > /home/pberg/solr_new/solr-6.5.1/server/solr/tommy/conf
> > -z localhost:2181'
> >--> no errors
> > 4. Started solr in cloud mode with
> >   'bin/solr -c -z localhost:2181'
> > 5. Tried to create a new collection with 2 shards with
> >'bin/solr create -c karpfen -shards 2'
> >
> > The output is:
> >
> > Connecting to ZooKeeper at localhost:2181 ...
> > INFO  - 2017-05-12 18:52:22.807; 
> > org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider;
> > Cluster at localhost:2181 ready
> > Re-using existing configuration directory karpfen
> >
> > Creating new collection 'karpfen' using command:
> > http://localhost:8983/solr/admin/collections?action=
> > CREATE&name=karpfen&numShards=2&replicationFactor=1&
> > maxShardsPerNode=2&collection.configName=karpfen
> >
> >
> > ERROR: Failed to create collection 'karpfen' due to: {127.0.1.1:8983
> > _solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
> > from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore
> > 'karpfen_shard2_replica1': Unable to create core [karpfen_shard2_replica1]
> > Caused by: Lock held by this virtual machine: /home/pberg/solr_new2/solr-6.
> > 5.1/server/data/ohrdruf_bestand/index/write.lock}
> >
> >
> > The conf directory I copied contains the following files:
> > currency.xml elevate.xml  protwords.txt   stopwords.txt
> > dataimport-cobt2.properties  lang schema.xml  synonyms.txt
> > dataimport.xml   params.json  solrconfig.xml
> >
> > "lang" is a directory.
> >
> > Are my steps wrong? Did I miss something important?
> >
> > Any help is really welcome.
> >
> > Thomas
> >


Re: setup solrcloud from scratch vie web-ui

2017-05-16 Thread Thomas Porschberg
Hi,

I did not manipulate the data dir. What I did was:

1. Downloaded solr-6.5.1.zip
2. Ensured no Solr process is running
3. Unzipped solr-6.5.1.zip to ~/solr_new2/solr-6.5.1
4. Started an external ZooKeeper
5. Copied a conf directory from a working non-cloud Solr (6.5.1) to
   ~/solr_new2/solr-6.5.1 so that I have ~/solr_new2/solr-6.5.1/conf
   (see http://randspringer.de/solrcloud_test/my.zip for content)
6. Posted the conf to ZooKeeper with:
   bin/solr zk upconfig -n heise -d ./conf -z localhost:2181
7. Started Solr in cloud mode with
   bin/solr -c -z localhost:2181
8. Tried to create a collection with
   bin/solr create -c heise -shards 2
   --> failed with:
  
Connecting to ZooKeeper at localhost:2181 ...
INFO  - 2017-05-17 07:06:38.249; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:2181 ready
Re-using existing configuration directory heise

Creating new collection 'heise' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=heise&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=heise


ERROR: Failed to create collection 'heise' due to: 
{127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
'heise_shard2_replica1': Unable to create core [heise_shard2_replica1] Caused 
by: Lock held by this virtual machine: 
/home/pberg/solr_new2/solr-6.5.1/server/data/index/write.lock}

9. Tried with 1 shard, worked -->
pberg@porschberg:~/solr_new2/solr-6.5.1$ bin/solr create -c heise -shards 1

Connecting to ZooKeeper at localhost:2181 ...
INFO  - 2017-05-17 07:21:01.632; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:2181 ready
Re-using existing configuration directory heise

Creating new collection 'heise' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=heise&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=heise

{
  "responseHeader":{
"status":0,
"QTime":2577},
  "success":{"127.0.1.1:8983_solr":{
  "responseHeader":{
"status":0,
"QTime":1441},
  "core":"heise_shard1_replica1"}}}


What did I do wrong? I want to use multiple shards on ONE node.

Best regards 
Thomas



> Shawn Heisey wrote on 16 May 2017 at 16:30:
> 
> 
> On 5/12/2017 8:49 AM, Thomas Porschberg wrote:
> > ERROR: Failed to create collection 'cat' due to: 
> > {127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
> >  from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
> > 'cat_shard1_replica1': Unable to create core [cat_shard1_replica1] Caused 
> > by: Lock held by this virtual machine: 
> > /home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index/write.lock}
> 
> The same Solr instance is already holding the lock on the index at
> /home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index.  This means
> that Solr already has a core using that index directory.
> 
> If the write.lock were present but wasn't being held by the same
> instance, then the message would have said it was held by another program.
> 
> This sounds like you are manually manipulating settings like dataDir. 
> When you start the server from an extracted download (not as a service)
> and haven't messed with any configurations, the index directory for a
> single-shard single-replica "cat" collection should be something like
> the following, and should not be overridden unless you understand
> *EXACTLY* how SolrCloud functions and have a REALLY good reason for
> changing it:
> 
> /home/pberg/solr_new2/solr-6.5.1/server/solr/cat_shard1_replica1/data/index
> 
> On the "Sorry, no dataimport-handler defined!" problem, this is
> happening because the solrconfig.xml file being used by the collection
> does not have any configuration for the dataimport handler.  It's not
> enough to add a DIH config file, solrconfig.xml must have a dataimport
> handler defined that references the DIH config file.
> 
> Thanks,
> Shawn
>
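
As Shawn notes above, the "Sorry, no dataimport-handler defined!" message means
solrconfig.xml has no dataimport handler referencing the DIH config file. A
minimal sketch of such a handler definition (assuming the DIH config file is
named dataimport.xml and the dataimporthandler contrib jars are loaded via lib
directives):

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">dataimport.xml</str>
    </lst>
  </requestHandler>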


Re: setup solrcloud from scratch vie web-ui

2017-05-17 Thread Thomas Porschberg
> Tom Evans wrote on 17 May 2017 at 11:48:
> 
> 
> On Wed, May 17, 2017 at 6:28 AM, Thomas Porschberg
>  wrote:
> > Hi,
> >
> > I did not manipulating the data dir. What I did was:
> >
> > 1. Downloaded solr-6.5.1.zip
> > 2. ensured no solr process is running
> > 3. unzipped solr-6.5.1.zip to ~/solr_new2/solr-6.5.1
> > 4. started an external zookeeper
> > 5. copied a conf directory from a working non-cloud solr (6.5.1) to
> >~/solr_new2/solr-6.5.1 so that I have ~/solr_new2/solr-6.5.1/conf
> >   (see http://randspringer.de/solrcloud_test/my.zip for content)
> 
> ..in which you've manipulated the dataDir! :)
> 
> The problem (I think) is that you have set a fixed data dir, and when
> Solr attempts to create a second core (for whatever reason, in your
> case it looks like you are adding a shard), Solr puts it exactly where
> you have told it to, in the same directory as the previous one. It
> finds the lock and blows up, because each core needs to be in a
> separate directory, but you've instructed Solr to put them in the same
> one.
> 
> Start with a the solrconfig from basic_configs configset that ships
> with Solr and add the special things that your installation needs. I
> am not massively surprised that your non cloud config does not work in
> cloud mode, when we moved to SolrCloud, we rewrote from scratch
> solrconfig.xml and schema.xml, starting from basic_configs and adding
> anything particular that we needed from our old config, checking every
> difference that we have from stock config and noting/discerning why,
> and ensuring that our field types are using the same names for the
> same types as basic_config wherever possible.
> 
> I only say all that because to fix this issue is a single thing, but
> you should spend the time comparing configs because this will not be
> the only issue. Anyway, to fix this problem, in your solrconfig.xml
> you have:
> 
>   <dataDir>data</dataDir>
> 
> It should be
> 
>   <dataDir>${solr.data.dir:}</dataDir>
> 
> Which is still in your config, you've just got it commented out :)

Thank you. I am now a step further. 
I could import data into the new collection with the DIH. However I observed 
the following exception 
in solr.log:

request: 
http://127.0.1.1:8983/solr/hugo_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F127.0.1.1%3A8983%2Fsolr%2Fhugo_shard2_replica1%2F&wt=javabin&version=2
Remote error message: This IndexSchema is not mutable.
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)

I also noticed that only one shard gets filled.
The wiki describes how to populate data with the REST API. However, I use the
data importer.
I plan to split my data by day of the year. My idea was to create 365
shards with the compositeId router. In my SQL I have a date field, and it is no
problem to overwrite data after one year.
However, I'm looking for a good example of how to achieve this. Maybe in this
case I need 365 dataimport.xml files, one under each shard... with some
modulo expression for the specific day.
Currently the dataimport.xml is in the conf directory.
So I'm looking for a good example of how to use the DIH with SolrCloud.
Should it work to create an implicit router instead of the compositeId router
(with 365 shards) and simply specify router.field= ?

Thomas


Re: setup solrcloud from scratch vie web-ui

2017-05-17 Thread Thomas Porschberg

> Shawn Heisey wrote on 17 May 2017 at 15:10:
> 
> 
> On 5/17/2017 6:18 AM, Thomas Porschberg wrote:
> > Thank you. I am now a step further.
> > I could import data into the new collection with the DIH. However I 
> > observed the following exception 
> > in solr.log:
> >
> > request: 
> > http://127.0.1.1:8983/solr/hugo_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F127.0.1.1%3A8983%2Fsolr%2Fhugo_shard2_replica1%2F&wt=javabin&version=2
> > Remote error message: This IndexSchema is not mutable.
> 
> This probably means that the configuration has an update processor that
> adds unknown fields, but is using the classic schema instead of the
> managed schema.  If you want unknown fields to automatically be guessed
> and added, then you need the managed schema.  If not, then remove the
> custom update processor chain.  If this doesn't sound like what's wrong,
> then we will need the entire error message including the full Java
> stacktrace.  That may be in the other instance's solr.log file.

Ok, commenting out the "update processor chain" was a solution. I use classic 
schema.

> 
> > I imagine to split my data per day of the year. My idea was to create 365 
> > shards of type compositeKey.
> 
> You cannot control shard routing explicitly with the compositeId
> router.  That router uses a hash of the uniqueKey field to decide which
> shard gets the document.  As its name implies, the hash can be composite
> -- parts of the hash can be decided by multiple parts of the value in
> the field, but it's still hashed.
> 
> You must use the implicit router (which means all routing is manual) if
> you want to explicitly name the shard that receives the data.

I was now able to create 365 shards with the 'implicit' router.
In the collection API call I also specified
router.field=part_crit,
which is the day of the year (1..365).
I added this field in my SQL statement and in schema.xml.
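
A sketch of what such a CREATE call can look like with the implicit router
(shard names abbreviated; parameter values are examples):

  http://localhost:8983/solr/admin/collections?action=CREATE&name=hansi&router.name=implicit&shards=1,2,...,365&router.field=part_crit&maxShardsPerNode=365&collection.configName=hansi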

Next step I thought would be to trigger the dataimport.

However I get:

2017-05-18 05:41:37.417 ERROR (Thread-14) [c:hansi s:308 r:core_node76 
x:hansi_308_replica1] o.a.s.h.d.DataImporter Full Import 
failed:java.lang.RuntimeException: org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: hansi slice: 
230
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
at java.lang.Thread.run(Thread.java:745)

when I start the import.

What could be the reason?

Thank you
Thomas


Velocity template examples and hardcoded contextPath

2014-04-02 Thread Thomas Pii
The current velocity template examples in the 4.6.1 distribution have a hard
coded context path for the solr web application:
#macro(url_root)/solr#end
in VM_global_library.vm hardcodes it to /solr

I would like to change this to determine the context path at run time, so
the templates do not require modifications if deployed to a different
context path.

Does anyone have any experience with this?

I have found LinkTool in Velocity which has the method getContextPath(), but
I am unsure if it can be used and how to use it if so.
I was thinking something like:
#macro(url_root)$link.contextPath#end

So far my attempts have failed and I am unsure how to access it and the
right syntax for it.

Whatever I put in the url_root macro just ends up as a literal string in the
generated HTML:
  
  
  
  

I appreciate any pointer you can give me.
Regards
Thomas



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Velocity-template-examples-and-hardcoded-contextPath-tp4128545.html
Sent from the Solr - User mailing list archive at Nabble.com.


trigger delete on nested documents

2014-05-18 Thread Thomas Scheffler

Hi,

I plan to use nested documents to group some of my fields


<doc>
  <!-- field names here are only examples -->
  <field name="id">art0001</field>
  <field name="title">My first article</field>
  <doc>
    <field name="id">art0001-foo</field>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
  <doc>
    <field name="id">art0001-bar</field>
    <field name="name">Power, Max</field>
    <field name="role">reviewer</field>
  </doc>
</doc>

This way I can ask for any documents that are reviewed by Max Power.
However, to simplify updates and deletes, I want to ensure that nested
documents are deleted automatically when the parent
document is updated or deleted.

Has anyone had to deal with this problem and found a solution?

regards,

Thomas


Re: trigger delete on nested documents

2014-05-18 Thread Thomas Scheffler

On 19.05.2014 08:38, Walter Underwood wrote:

Solr does not support nested documents.  -- wunder


It does since 4.5:

http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/common/SolrInputDocument.html#addChildDocuments(java.util.Collection)

But this feature is rather poorly documented and has some caveats:

http://blog.griddynamics.com/2013/09/solr-block-join-support.html

regards,

Thomas


On May 18, 2014, at 11:36 PM, Thomas Scheffler
 wrote:

Hi,

I plan to use nested documents to group some of my fields

 art0001 My first
article  art0001-foo Smith, John author
  art0001-bar Power, Max reviewer
 

This way can ask for any documents that are reviewed by Max Power.
However to simplify update and deletes I want to ensure that nested
documents are deleted automatically on update and delete of the
parent document. Does anyone had to deal with this problem and
found a solution?


Re: trigger delete on nested documents

2014-05-20 Thread Thomas Scheffler

On 19.05.2014 19:25, Mikhail Khludnev wrote:

Thomas,

The vanilla way to override a block is to send it with the same unique key (I
guess it's "id" in your case; by the way, don't you have a unique key defined in
the schema?), but it must have at least one child. It seems like an analysis
issue to me: https://issues.apache.org/jira/browse/SOLR-5211

While a block is indexed, the special field _root_, equal to the parent's unique
key, is added across the whole block (caveat: it's not stored by default). At
least you can issue

_root_:PK_VAL

to wipe the whole block.
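
In update XML such a block delete could be sketched like this (using the example
id from the first message):

  <delete>
    <query>id:art0001 OR _root_:art0001</query>
  </delete>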


Thank you for your insights. They sure help a lot in
understanding. The '_root_' field was new to me on this rather poorly
documented feature of Solr. It already helps if I perform single updates
and deletes on the index. BUT:


If I delete by a query, this results in a mess:

1.) request all IDs returned by that query
2.) fire a giant delete query with "id:(id1 OR .. OR idn) _root_:(id1 OR
.. OR idn)"


Before every update of single documents I have to fire a delete request.

This turns into a mess when updating in batch mode:
1.) remove a chunk of 100 documents and their nested documents (see above)
2.) index a chunk of 100 documents

All the information for this is available on the Solr side. Can I configure some
hook that is executed on the Solr server so that I do not have to change all
applications? This would at least save these extra network transfers.


After the big effort of migrating from plain Lucene to Solr, I really require
proper nested document support. Elasticsearch seems to support it
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html)
but I am afraid of another migration. Elasticsearch even hides the
nested documents in queries, which seems nice, too.


Does anyone have information on how nested document support will evolve in
future releases of Solr?


kind regards,

Thomas




On 19.05.2014 10:37, "Thomas Scheffler" <thomas.scheff...@uni-jena.de> wrote:


Hi,

I plan to use nested documents to group some of my fields


art0001
My first article
   
 art0001-foo
 Smith, John
 author
   
   
 art0001-bar
 Power, Max
 reviewer
   


This way can ask for any documents that are reviewed by Max Power. However
to simplify update and deletes I want to ensure that nested documents are
deleted automatically on update and delete of the parent document.
Does anyone had to deal with this problem and found a solution?


Re: trigger delete on nested documents

2014-05-20 Thread Thomas Scheffler

On 20.05.2014 14:11, Jack Krupansky wrote:

To be clear, you cannot update a single document of a nested document
in place - you must reindex the whole block, parent and all children.
This is because this feature relies on the underlying Lucene block
join feature that requires that the documents be contiguous, and
updating a single child document would make it discontiguous with the
rest of the block of documents.

Just update the block by resending the entire block of documents.

For a previous discussion of this limitation:
http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html


This is totally clear to me, and I want nested documents not to be
accessible without their root context.


There seems to be no way to delete the whole block by the id of the root
document. There is no way to update the root document that removes the
stale data from the index. Normal Solr behavior is to automatically
delete old documents with the same ID. I expect this behavior for the other
documents in the block too.


Anyway, to make things clear I filed a JIRA issue and tried to explain
it more carefully there:


https://issues.apache.org/jira/browse/SOLR-6096

regards

Thomas


grouping of multivalued fields

2014-05-21 Thread Thomas Scheffler

Hi,

I have a special case of grouping multivalued fields and I wonder if 
this is possible with SOLR.


I have a field "foo" that is generally multivalued. But for a restricted 
set of documents this field has one value or is not present. So normally 
grouping should work.


Sadly, Solr fails fast here, and I wonder if there is some way to specify
"group by the first|any|last|min|max (all the same in this case) value of foo".


regards,

Thomas


Re: grouping of multivalued fields

2014-05-21 Thread Thomas Scheffler

On 21.05.2014 15:07, Joel Bernstein wrote:

You may want to investigate the group.func option. This would allow you to
plug in your own logic to return the group by key. I don't think there is
an existing function that does exactly what you need so you may have to
write a custom function.


I thought of max(foo) for this, but sadly it does not work on multivalued
fields either. I'll wait for other suggestions and start looking at custom
functions (I didn't know that option existed) in parallel.


Thanks,

Thomas


Re: Problem faceting

2014-06-12 Thread Thomas Egense
First of all, make sure you use docValues for facet fields with many unique
values.
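
For example, a string facet field with docValues enabled might be declared like
this in schema.xml (the field name is just an example):

  <field name="category" type="string" indexed="true" stored="false" docValues="true"/>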

If that still does not help, you can try the following.
My colleague Toke Eskildsen has made a huge improvement for faceting IF the
number of results in the facets is less than 8% of the total number of
documents.
In this case we get a substantial improvement in both memory use and query
time:
See: https://plus.google.com/+TokeEskildsen/posts/7oGxWZRKJEs
We have tested it for index with 300M documents.

From,
Thomas Egense



On Wed, Jun 11, 2014 at 5:36 PM, marcos palacios 
wrote:

> Hello everyone.
>
>
>
> I'm having problems with the performance of queries with facets; the time
> spent resolving a query is very high.
>
>
>
> The index has 10Millions of documents, each one with 100 fields.
>
> The server has 8 cores and 56 Gb of ram, running with jetty with this
> memory configuration: -Xms24096m -Xmx44576m
>
>
>
> When I do a query with 20 facets, the time spent is 4-5 seconds. If
> the same request is made another time, the
>
>
>
> Debug query, first execution (times in ms):
>
> 6037.0 / 265.0 / 5772.0
>
>
>
> Debug query, second execution (times in ms):
>
> 6037.0 / 1.0 / 4872.0
>
>
>
>
>
> What can I do? Why are the facets not cached?
>
>
>
>
>
> Thank you, Marcos
>


Basic Authentication for Admin GUI

2014-06-18 Thread Thomas Fischer
Hello,

I'm trying to set up basic authentication for the admin functions in the new
Solr GUI.
For this I have to give the appropriate url-pattern, e.g.
/
will match every URL on my Solr server.
But the GUI now runs all administrative tasks under /#/ and there is no
particular /admin/ branch anymore.
Does anybody know how to deal with that situation?
Can I move the administration to a new admin directory?
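
For reference, a minimal sketch of the kind of web.xml constraint meant here
(role and realm names are placeholders):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr admin</web-resource-name>
      <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>solr-admin</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr realm</realm-name>
  </login-config>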

Best regards
Thomas Fischer





Re: How much free disk space will I need to optimize my index

2014-06-26 Thread Thomas Egense
That is correct, but twice the disk space is theoretically not enough.
Worst case is actually three times the storage; I guess this worst case can
happen if you also submit new documents to the index while optimizing.
I have experienced 2.5 times the disk space during an optimize of a large
index: a 1 TB index temporarily used 2.5 TB of disk space during the
optimize (near the end of the optimization).

From,
Thomas Egense


On Wed, Jun 25, 2014 at 8:21 PM, Markus Jelsma 
wrote:

>
>
>
>
> -Original message-
> > From:johnmu...@aol.com 
> > Sent: Wednesday 25th June 2014 20:13
> > To: solr-user@lucene.apache.org
> > Subject: How much free disk space will I need to optimize my index
> >
> > Hi,
> >
> >
> > I need to de-fragment my index.  My question is, how much free disk
> > space do I need before I can do so?  My understanding is, I need free disk
> > space equal to 1X my current un-optimized index size before I can optimize it.
> > Is this true?
>
> Yes, 20 GB of FREE space to force merge an existing 20 GB index.
>
> >
> >
> > That is, let's say my index is 20 GB (un-optimized); then I must have 20 GB
> of free disk space to make sure the optimization is successful.  The reason
> for this is because during optimization the index is re-written (is this
> the case?) and if it is already optimized, the re-write will create a new
> 20 GB index before it deletes the old one (is this true?), thus why there
> must be at least 20 GB free disk space.
> >
> >
> > Can someone help me with this or point me to a wiki on this topic?
> >
> >
> > Thanks!!!
> >
> >
> > - MJ
> >
>


Minor bug with CloudSolrServer and collection-alias.

2013-10-23 Thread Thomas Egense
I found this bug in both 4.4 and 4.5

Using cloudSolrServer.setDefaultCollection(collectionId) does not work as
intended for an alias spanning more than 1 collection.
The virtual collection-alias collectionID is recognized as an existing
collection, but it only queries one of the collections it is mapped to.

You can confirm this easily in AliasIntegrationTest.

The test class AliasIntegrationTest creates two cores with 2 and 3 different
documents, and then creates an alias pointing to both of them.

Line 153:
// search with new cloud client
CloudSolrServer cloudSolrServer = new
CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound());

No unit-test bug here; however, if you change it to set the collection id
on CloudSolrServer instead of on the query, it will produce
the bug:

// search with new cloud client
CloudSolrServer cloudSolrServer = new
CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setDefaultCollection("testalias");
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
//query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound());  <-- Assertion failure

Should I create a Jira issue for this?

From,
Thomas Egense


Re: Minor bug with CloudSolrServer and collection-alias.

2013-10-24 Thread Thomas Egense
Thanks to both of you for fixing the bug. Impressive response time for the
fix (7 hours).

Thomas Egense


On Wed, Oct 23, 2013 at 7:16 PM, Mark Miller  wrote:

> I filed https://issues.apache.org/jira/browse/SOLR-5380 and just
> committed a fix.
>
> - Mark
>
> On Oct 23, 2013, at 11:15 AM, Shawn Heisey  wrote:
>
> > On 10/23/2013 3:59 AM, Thomas Egense wrote:
> >> Using cloudSolrServer.setDefaultCollection(collectionId) does not work
> as
> >> intended for an alias spanning more than 1 collection.
> >> The virtual collection-alias collectionID is recoqnized as a existing
> >> collection, but it does only query one of the collections it is mapped
> to.
> >>
> >> You can confirm this easy in AliasIntegrationTest.
> >>
> >> The test-class AliasIntegrationTest creates to cores with 2 and 3
> different
> >> documents. And then creates an alias pointing to both of them.
> >>
> >> Line 153:
> >>// search with new cloud client
> >>CloudSolrServer cloudSolrServer = new
> >> CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
> >>cloudSolrServer.setParallelUpdates(random().nextBoolean());
> >>query = new SolrQuery("*:*");
> >>query.set("collection", "testalias");
> >>res = cloudSolrServer.query(query);
> >>cloudSolrServer.shutdown();
> >>assertEquals(5, res.getResults().getNumFound());
> >>
> >> No unit-test bug here, however if you change it from setting the
> >> collectionid on the query but on CloudSolrServer instead,it will produce
> >> the bug:
> >>
> >>// search with new cloud client
> >>CloudSolrServer cloudSolrServer = new
> >>CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
> >>cloudSolrServer.setDefaultCollection("testalias");
> >>cloudSolrServer.setParallelUpdates(random().nextBoolean());
> >>query = new SolrQuery("*:*");
> >>//query.set("collection", "testalias");
> >>res = cloudSolrServer.query(query);
> >>cloudSolrServer.shutdown();
> >>assertEquals(5, res.getResults().getNumFound());  <-- Assertion
> failure
> >>
> >> Should I create a Jira issue for this?
> >
> > Thomas,
> >
> > I have confirmed this with the following test patch, which adds to the
> > test rather than changing what's already there:
> >
> > http://apaste.info/9ke5
> >
> > I'm about to head off to the train station to start my commute, so I
> > will be unavailable for a little while.  If you haven't gotten the jira
> > filed by the time I get to another computer, I will create it.
> >
> > Thanks,
> > Shawn
> >
>
>


Re: How to set the shardid?

2013-10-29 Thread Thomas Egense
You can specify the shard in core.properties ie:
core.properties:
name=collection2
shard=shard2

Did this solve it ?

From,
Thomas Egense




On Mon, Feb 25, 2013 at 5:13 PM, Mark Miller  wrote:

>
> On Feb 25, 2013, at 10:00 AM, "Markus.Mirsberger" <
> markus.mirsber...@gmx.de> wrote:
>
> > How can I fix the shardId used at one server when I create a collection?
> (Im using the solrj collections api to create collections)
>
> You can't do it with the collections API currently. If you want to control
> the shard names explicitly, you have to use the CoreAdmin API to create
> each core - that lets you set the shard id.
>
> - Mark


weak documents

2013-11-27 Thread Thomas Scheffler

Hi,

I am relatively new to SOLR and I am looking for a neat way to implement 
weak documents with SOLR.


Whenever a document is updated or deleted, all its dependent documents 
should be removed from the index. In other words, they exist only as long as 
the document they refer to exists, in the specific version it had when they 
were indexed. On "update" they will be indexed after their master 
document.


I would like to have some kind of "dependsOn" field that carries the 
uniqueKey value of the master document.


Can this be done efficiently with SOLR?

I need this technique because on update and on delete I don't know how 
many dependent documents exist in the SOLR index. Especially for batch 
index processes, I need a more efficient way than querying before every 
update or delete.
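
A minimal sketch of how the "dependsOn" idea could look from SolrJ, without
any lookup of the dependent documents first (field name, ids and the
delete-by-query approach are illustrative, not a built-in feature):

String masterId = "article-42";                        // illustrative uniqueKey value

// remove all dependent documents of this master, however many there are
server.deleteByQuery("dependsOn:\"" + masterId + "\"");

// re-add the master document ...
SolrInputDocument master = new SolrInputDocument();
master.addField("id", masterId);
server.add(master);

// ... and its current dependents, each pointing back via dependsOn
SolrInputDocument fr = new SolrInputDocument();
fr.addField("id", masterId + "-fr");
fr.addField("dependsOn", masterId);
server.add(fr);

server.commit();                                       // 'server' is any SolrServer instance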


kind regards,

Thomas


Re: weak documents

2013-11-27 Thread Thomas Scheffler

Am 27.11.2013 09:58, schrieb Paul Libbrecht:

Thomas,

our experience with Curriki.org is that evaluating what I call the
"related documents" is a procedure that needs access to the complete
content and thus is run at the DB level and not the solr level.

For example, if a user changes a part of its name, we need to reindex
all of his resources. Sure we could try to run a solr query for this,
and maybe add index fields for it, but we felt it better to run this
on the index-trigger side, the thing in our (XWiki) wiki which
listens to changes and requests the reindexing of a few documents
(including deletions).

For the maintenance operation, the same issue has appeared. So, if
the indexer or listener or solr has been down for a few minutes or
hours, we'd need to reindex not only all changed documents but all
changed documents and their related documents.

If you are able to work through your solution that would be
solr-only,  to write down all depends-on at index time, it means you
would index-update all "inverse related" documents every time that
changes. For the relation above (documents of a user), it means the
user documents needs reindexing every time a new document is added. I
wonder if this makes a scale difference.


I think both use-cases differ a bit. At index time of my master document 
I have all information about the dependent documents ready. So instead of 
committing one document I commit, let's say, four.


In your case you have to query to get all documents of a user first.

Here is a more detailed use-case. I have metadata in 1 to n languages to 
describe a document (e.g. journal article).


I commit a master document in a specified default language to SOLR and 
one document for every language I have metadata for. If a user adds or 
removes metadata (e.g. abstract in French) there is one document more or 
one document less in SOLR. So their number changes, and I don't want stale 
data to be kept in the index.


A similar use case: I have article documents with authors. I create 
"author" documents for every article. If someone adds or removes an 
author I need to track that change. These "dumb" author documents are 
used for an alphabetical person index and hold a unique field that is 
used to group them, but these documents exist only as long as their 
master documents do.


My two use-cases are quite similar, so I would like this "weak 
documents" functionality somehow.


SOLR knows that if a document is added with id=foo it has to replace a 
document that matches id:"foo". If I could change this behavior to match on 
dependsOn:"foo" instead, I would be done. :-D


regards

Thomas


problems with boolean query

2013-11-28 Thread Thomas Kurz
Hi all!

I have a question regarding boolean queries. What I want to reach:

Return all documents where string field "access" has value 'allow*' or is not 
set.

My query:
fq = (access:Allow*) OR (-access:*)

But I only get results where the field has value 'allow*'.

I am using solr 4.3.1 with the edismax query parser.

I am looking forward to some hints on why the query might be failing.
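
One thing worth checking: in the lucene query parser a purely negative clause
inside a sub-query matches nothing on its own, so the "not set" part probably
has to be anchored to *:*. A sketch via SolrJ (only the field name is taken
from above, the rest is illustrative):

SolrQuery q = new SolrQuery("your main query");                 // illustrative
// "value starts with Allow" OR "field not set at all"
q.addFilterQuery("access:Allow* OR (*:* -access:[* TO *])");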

Best regards and many thanks!
Thomas


--
Thomas Kurz
Forschung & Entwicklung / KMT

Salzburg Research Forschungsgesellschaft mbH 
Jakob-Haringer-Straße 5/3 | 5020 Salzburg, Austria 
T: +43.662.2288-253 | F: -222
thomas.k...@salzburgresearch.at 
http://www.salzburgresearch.at






What type of Solr plugin do I need for filtering results?

2013-12-01 Thread Thomas Seidl

Hi,

I'm currently looking at writing my first Solr plugin, but I could not 
really find any "overview" information about how a Solr request works 
internally, what the control flow is and what kind of plugins are 
available to customize this at which point. The Solr wiki page on 
plugins [1], in my opinion, already assumes too much knowledge and is 
too terse in its descriptions.


[1] http://wiki.apache.org/solr/SolrPlugins

If anyone knows of any good resources to get me started, that would be 
awesome!


However, also pretty helpful would be just to know what kind of plugin I 
should create for my use case, as I could then at least try to find 
information specific to that. What I want to do is filter the search 
results (at the time fq filters are applied, so before sorting, 
faceting, range selection, etc. takes place) by some custom criterion 
(passed in the URL). The plan is to add the data needed for that custom 
filter as a separate set of documents to Solr and look them up from the 
Solr index when filtering the query. Basically the thing discussed in 
[2], at 29:07.


[2] http://www.youtube.com/watch?v=kJa-3PEc90g&feature=youtu.be&t=29m7s

So, the question is, what kind of plugin would I use (and how would it 
have to be configured)? I first thought it'd have to be a 
SearchComponent, but I think with that I'd only get the results after 
they are sorted and trimmed to the range, right?


Thanks a lot in advance,
Thomas Seidl


Re: What type of Solr plugin do I need for filtering results?

2013-12-04 Thread Thomas Seidl
Thanks a lot for both of your answers! The QParserPlugin is probably 
what I meant, but join queries also look interesting and like they could 
maybe solve my use case, too, without any custom code.
However, since this would (I think) make it impossible to have a score 
for the results, while I do want to do fulltext searches on the returned 
set (with score), it will probably not be enough.


Anyways, I'll look into both of your suggestions. Thanks a lot again!
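
A stripped-down sketch of the QParserPlugin-as-PostFilter combination
suggested above (all class names, the isAllowed() check and the wiring are
illustrative placeholders, not an actual implementation from this thread; the
searchhub link quoted further down walks through a complete example):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

// registered as a <queryParser> in solrconfig.xml, then used e.g. as fq={!myfilter}
public class MyFilterQParserPlugin extends QParserPlugin {

  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        MyPostFilterQuery q = new MyPostFilterQuery(qstr);
        q.setCache(false); // post filters must not be cached
        q.setCost(100);    // cost >= 100 makes Solr run this after all other filters
        return q;
      }
    };
  }
}

// the query object only exists to hand Solr a DelegatingCollector
class MyPostFilterQuery extends ExtendedQueryBase implements PostFilter {

  private final String criterion;

  MyPostFilterQuery(String criterion) {
    this.criterion = criterion;
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      @Override
      public void collect(int doc) throws IOException {
        // only documents passing the custom check are forwarded to the
        // delegate, i.e. to sorting, faceting, paging etc.
        // note: doc is a per-segment id; a real implementation also tracks
        // the docBase in setNextReader()
        if (isAllowed(doc, criterion)) {
          super.collect(doc);
        }
      }
    };
  }

  // placeholder for the lookup against the separately indexed filter data
  private boolean isAllowed(int doc, String criterion) {
    return true;
  }
}

A real implementation should also override equals() and hashCode() on the
query class.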

On 2013-12-02 05:39, Ahmet Arslan wrote:

It depends on your use case: what your custom criteria are, how they are stored, etc.


For example, I had two tables, let's say items and permissions. The 
permissions table was holding itemId,userId pairs, meaning userId can see this 
itemId. My initial effort was to index items and add a multivalued field named 
WhoCanSeeMe, and filter-query on that field using the current user.

After some time indexing became troublesome; indexing was slowing down. I 
switched to two cores, one for each table, and used a query-time join (JoinQParser) as 
an fq. I didn't have any plugin for the above.

By the way, here is an example of the post filter Joel advises: 
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/






On Monday, December 2, 2013 5:14 AM, Joel Bernstein  wrote:

What you're looking for is a QParserPlugin. Here is an example:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_6_0/solr/core/src/java/org/apache/solr/search/FunctionRangeQParserPlugin.java?revision=1544545&view=markup

You probably want to implement the QParserPlugin as a PostFilter.




On Sun, Dec 1, 2013 at 3:46 PM, Thomas Seidl  wrote:


Hi,

I'm currently looking at writing my first Solr plugin, but I could not
really find any "overview" information about how a Solr request works
internally, what the control flow is and what kind of plugins are available
to customize this at which point. The Solr wiki page on plugins [1], in my
opinion, already assumes too much knowledge and is too terse in its
descriptions.

[1] http://wiki.apache.org/solr/SolrPlugins

If anyone knows of any good ressources to get me started, that would be
awesome!

However, also pretty helpful would be just to know what kind of plugin I
should create for my use case, as I could then at least try to find
information specific to that. What I want to do is filter the search
results (at the time fq filters are applied, so before sorting, facetting,
range selection, etc. takes place) by some custom criterion (passed in the
URL). The plan is to add the data needed for that custom filter as a
separate set of documents to Solr and look them up from the Solr index when
filtering the query. Basically the thing discussed in [2], at 29:07.

[2] http://www.youtube.com/watch?v=kJa-3PEc90g&feature=youtu.be&t=29m7s

So, the question is, what kind of plugin would I use (and how would it
have to be configured)? I first thought it'd have to be a SearchComponent,
but I think with that I'd only get the results after they are sorted and
trimmed to the range, right?

Thanks a lot in advance,
Thomas Seidl







Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Hello,

I'm migrating to solr 4.6.1 and have problems with the ICUCollationField 
(apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).

I get consistently the error message 
Error loading class 'solr.ICUCollationField'.
even after
INFO: Adding 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' 
to classloader
and
INFO: Adding 
'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
 to classloader.

Am I missing something?

In solr's subversion I found
/SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
but no corresponding class in solr4.6.1's contrib folder.

Best
Thomas



Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Hello Robert,

I already added
contrib/analysis-extras/lib/
and
contrib/analysis-extras/lucene-libs/
via lib directives in solrconfig, this is why the classes mentioned are loaded.

Do you know which jar is supposed to contain the ICUCollationField?

Best regards
Thomas



Am 19.02.2014 um 13:54 schrieb Robert Muir:

> you need the solr analysis-extras jar in your classpath, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer  wrote:
> 
>> Hello,
>> 
>> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>> 
>> I get consistently the error message
>> Error loading class 'solr.ICUCollationField'.
>> even after
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>> classloader
>> and
>> INFO: Adding
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>> to classloader.
>> 
>> Am I missing something?
>> 
>> I solr's subversion I found
>> 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>> but no corresponding class in solr4.6.1's contrib folder.
>> 
>> Best
>> Thomas
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer
Thanks, that helps!

I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory I 
used before to the ICUCollationField.
Is there any description of how to achieve this?

First tries now yield

ICUCollationField does not support specifying an analyzer.

which makes it complicated since I used the ICUCollationKeyFilterFactory to 
standardize my text fields (in particular because of German Umlauts).
But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a 
LetterTokenizer, etc. doesn't do me much good, I'm afraid.
Or is this somehow wrapped into the ICUCollationField?

I didn't find ICUCollationField in the solr wiki, and there is not much information in 
the reference guide.
And the hint

"solr.ICUCollationField is included in the Solr analysis-extras contrib - see 
solr/contrib/analysis-extras/README.txt for instructions on which jars you need 
to add to your SOLR_HOME/lib in order to use it."

is misleading insofar as this README.txt doesn't mention the 
solr-analysis-extras-4.6.1.jar in dist.

Best
Thomas


Am 19.02.2014 um 14:27 schrieb Robert Muir:

> you need the solr analysis-extras jar itself, too.
> 
> 
> 
> On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer  wrote:
> 
>> Hello Robert,
>> 
>> I already added
>> contrib/analysis-extras/lib/
>> and
>> contrib/analysis-extras/lucene-libs/
>> via lib directives in solrconfig, this is why the classes mentioned are
>> loaded.
>> 
>> Do you know which jar is supposed to contain the ICUCollationField?
>> 
>> Best regards
>> Thomas
>> 
>> 
>> 
>> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>> 
>>> you need the solr analysis-extras jar in your classpath, too.
>>> 
>>> 
>>> 
>>> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
>>>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>>>> 
>>>> I get consistently the error message
>>>> Error loading class 'solr.ICUCollationField'.
>>>> even after
>>>> INFO: Adding
>>>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>>>> classloader
>>>> and
>>>> INFO: Adding
>>>> 
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>>>> to classloader.
>>>> 
>>>> Am I missing something?
>>>> 
>>>> I solr's subversion I found
>>>> 
>>>> 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>>>> but no corresponding class in solr4.6.1's contrib folder.
>>>> 
>>>> Best
>>>> Thomas
>>>> 
>>>> 
>> 
>> 



Re: Problems with ICUCollationField

2014-02-19 Thread Thomas Fischer

> Hmm, for standardization of text fields, collation might be a little
> awkward.

I arrived there after using custom rules for a while (see "RuleBasedCollator" 
on http://wiki.apache.org/solr/UnicodeCollation) and then being told
"For better performance, less memory usage, and support for more locales, you 
can add the analysis-extras contrib and use ICUCollationKeyFilterFactory 
instead." (on the same page under "ICU Collation").

> For your german umlauts, what do you mean by standardize? is this to
> achieve equivalency of e.g. oe to ö in your search terms?

That is the main point, but I might also need the additional normalization of 
combined characters like
o+  ̈ = ö and probably similar constructions for other languages (like 
Hungarian).

> In that case, a simpler approach would be to put
> GermanNormalizationFilterFactory in your chain:
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html

I'll see how far I get with this, but from the description
• 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
• 'ae' and 'oe' are replaced by 'a', and 'o', respectively.
this seems to be too far-reaching a reduction: while the identification "ä=ae" 
is not very serious and rarely misleading, "ä=a" might pack words together that 
shouldn't be; "Äsen" and "Asen" are quite different concepts.

In general, the deprecation of ICUCollationKeyFilterFactory doesn't seem to be 
really thought through.

Thanks anyway, best
Thomas

> 
> On Wed, Feb 19, 2014 at 9:16 AM, Thomas Fischer  wrote:
> 
>> Thanks, that helps!
>> 
>> I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory
>> I used before to the ICUCollationField.
>> Is there any description how to achieve this?
>> 
>> First tries now yield
>> 
>> ICUCollationField does not support specifying an analyzer.
>> 
>> which makes it complicated since I used the ICUCollationKeyFilterFactory
>> to standardize my text fields (in particular because of German Umlauts).
>> But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a
>> LetterTokenizer, etc. doesn't do me much good, I'm afraid.
>> Or is this somehow wrapped into the ICUCollationField?
>> 
>> I didn't find ICUCollationField  in the solr wiki and not much information
>> in the reference.
>> And the hint
>> 
>> "solr.ICUCollationField is included in the Solr analysis-extras contrib -
>> see solr/contrib/analysis-extras/README.txt for instructions on which jars
>> you need to add to your SOLR_HOME/lib in order to use it."
>> 
>> is misleading insofar as this README.txt doesn't mention the
>> solr-analysis-extras-4.6.1.jar in dist.
>> 
>> Best
>> Thomas
>> 
>> 
>> Am 19.02.2014 um 14:27 schrieb Robert Muir:
>> 
>>> you need the solr analysis-extras jar itself, too.
>>> 
>>> 
>>> 
>>> On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer 
>> wrote:
>>> 
>>>> Hello Robert,
>>>> 
>>>> I already added
>>>> contrib/analysis-extras/lib/
>>>> and
>>>> contrib/analysis-extras/lucene-libs/
>>>> via lib directives in solrconfig, this is why the classes mentioned are
>>>> loaded.
>>>> 
>>>> Do you know which jar is supposed to contain the ICUCollationField?
>>>> 
>>>> Best regards
>>>> Thomas
>>>> 
>>>> 
>>>> 
>>>> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>>>> 
>>>>> you need the solr analysis-extras jar in your classpath, too.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer 
>>>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I'm migrating to solr 4.6.1 and have problems with the
>> ICUCollationField
>>>>>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>>>>>> 
>>>>>> I get consistently the error message
>>>>>> Error loading class 'solr.ICUCollationField'.
>>>>>> even after
>>>>>> INFO: Adding
>>>>>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>>>>>> classloader
>>>>>> and
>>>>>> INFO: Adding
>>>>>> 
>>>> 
>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>>>>>> to classloader.
>>>>>> 
>>>>>> Am I missing something?
>>>>>> 
>>>>>> I solr's subversion I found
>>>>>> 
>>>>>> 
>>>> 
>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>>>>>> but no corresponding class in solr4.6.1's contrib folder.
>>>>>> 
>>>>>> Best
>>>>>> Thomas
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 



SOLRJ and SOLR compatibility

2014-02-26 Thread Thomas Scheffler

Hi,

I am one developer of a repository framework. We rely on the fact, that 
"SolrJ generally maintains backwards compatibility, so you can use a 
newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1]


This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1. 
(SOLRJ 4.6.1, SOLR 4.6.0)


We use SolrInputDocument from SOLRJ to index our documents (javabin). 
But as framework developers we are not in a position to force our users to 
update their SOLR server that often. Instead, with every new version we 
want to update just the SOLRJ library we ship, to enable the latest 
features if the user wishes.


When I send a query to a request handler I can attach a "version" 
parameter to tell SOLR which version of the response format I expect.


Is there such a configuration when indexing SolrInputDocuments? I did 
not find it so far.


Kind regards,

Thomas

[1] https://wiki.apache.org/solr/Solrj


Re: SOLRJ and SOLR compatibility

2014-02-26 Thread Thomas Scheffler

Am 27.02.2014 08:04, schrieb Shawn Heisey:

On 2/26/2014 11:22 PM, Thomas Scheffler wrote:

I am one developer of a repository framework. We rely on the fact, that
"SolrJ generally maintains backwards compatibility, so you can use a
newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1]

This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1.
(SOLRJ 4.6.1, SOLR 4.6.0)

We use SolrInputDocument from SOLRJ to index our documents (javabin).
But as framework developer we are not in a role to force our users to
update their SOLR server such often. Instead with every new version we
want to update just the SOLRJ library we ship with to enable latest
features, if the user wishes.

When I send a query to a request handler I can attach a "version"
parameter to tell SOLR which version of the response format I expect.

Is there such a configuration when indexing SolrInputDocuments? I did
not find it so far.


What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually, bug reports are reaching me that sound like

"Unknown type 19"

I am currently not able to reproduce it myself with server version 
4.5.0, 4.5.1 and 4.6.0 when using solrj 4.6.1


It sounds to be the same issue like described here:

http://lucene.472066.n3.nabble.com/After-upgrading-indexer-to-SolrJ-4-6-1-o-a-solr-servlet-SolrDispatchFilter-Unknown-type-19-td4116152.html

The solution there was to upgrade the Server to version 4.6.1.

This helped here, too. Out there it is a very unpopular decision. Some 
users have large SOLR installs and stick to a certain (4.x) version. They 
want upgrades from us but upgrading company-wide SOLR installations is 
out of their scope.


Is that a known SOLRJ issue that is fixed in version 4.7.0?

kind regards,

Thomas


range types in SOLR

2014-03-01 Thread Thomas Scheffler

Hi,

I am in need of range types in SOLR, similar to PostgreSQL:
https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf

My schema should allow approximate dates and queries on them. With 
a single such date per document one can split this information 
into two separate fields. But this is not an option if the date field is 
multivalued; one would have to split the document into two or more documents.


I wonder if that has to be so complicated. Does somebody know if SOLR 
already supports range types? If not, how difficult would it be to 
implement? Is anybody else in need of range types, too?


kind regards,

Thomas


Re: range types in SOLR

2014-03-01 Thread Thomas Scheffler

Am 01.03.14 18:24, schrieb Erick Erickson:

I'm not clear what you're really after here.

Solr certainly supports ranges, things like time:[* TO date_spec] or
date_field:[date_spec TO date_spec] etc.


There's also a really creative use of spatial (of all things) to, say
answer questions involving multiple dates per record. Imagine, for
instance, employees with different hours on different days. You can
use spatial to answer questions like "which employees are available
on Wednesday between 4PM and 8PM".

And if none of this is relevant, how about you give us some
use-cases? This could well be an XY problem.


Hi,

let's try this example to show the problem. You have some old text that 
was written in two periods of time:


1.) 2nd half of 13th century: -> 1250-1299
2.) Beginning of 18th century: -> 1700-1715

If you are searching for texts that were written between 1300 and 1699, then 
the document described above should not be a hit.


If you make start date and end date multivalued, this results in:

start: [1250, 1700]
end: [1299, 1715]

A search for documents written between 1300-1699 would be:

(+start:[1300 TO 1699] +end:[1300 TO 1699]) (+start:[* TO 1300] +end:[1300 
TO *]) (+start:[* TO 1699] +end:[1700 TO *])


You see that the document above would obviously be hit by "(+start:[* TO 
1300] +end:[1300 TO *])"


Hope you see the problem. This problem is the same whenever you face 
multiple ranges in one field of a document. For every additional range you 
need to create a separate SOLR document. If you have two such fields in 
one document, field "one" with n values and field "two" with m values, 
you are forced to create m*n documents. This fact and the rather unhandy 
query (see above) are my motivation to ask for range types like in PostgreSQL, 
where the problem is solved.


regards,

Thomas


Configuration problem

2014-03-03 Thread Thomas Fischer
Hello,

for some reason I have problems getting my local solr system to run (MacBook, 
tomcat 6.0.35).

The setting is
solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home; in solr home there is a file solr.xml of the 
new "discovery" type (no cores), and inside the core directories there are empty 
core.properties files and symbolic links to the universal conf directory.
 
solr webapps (I use very different webapps simultaneously):
/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file 
solr4.6.1.xml into the catalina/localhost folder with the contents





The Tomcat Manager shows solr4.6.1 as started, but following the given link 
gives an error with the message:
"SolrCore 'collection1' is not available due to init failure: Could not load 
config file /srv/solr4.6.1/collection1/solrconfig.xml"
which is plausible, since
1. there is no folder /srv/solr4.6.1/collection1 and
2.for the actual cores solrconfig.xml is inside of 
/srv/solr4.6.1/cores/geo/conf/

But why does Tomcat try to find a solrconfig.xml there?
The problem persists if I start tomcat with 
-Dsolr.solr.home=/srv/solr/solr4.6.1; it seems that the system just ignores the 
solr home setting.

Can somebody give me a hint what I'm doing wrong?

Best regards
Thomas

P.S.: Is there a way to stop Tomcat from throwing these errors into my face 
threefold: once as heading (!), once as message and once as description?




Re: Configuration problem

2014-03-03 Thread Thomas Fischer
Am 03.03.2014 um 22:43 schrieb Shawn Heisey:

> On 3/3/2014 9:02 AM, Thomas Fischer wrote:
>> The setting is
>> solr directories (I use different solr versions at the same time):
>> /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the 
>> new "discovery type" (no cores), and inside the core directories are empty 
>> files core.properties and symbolic links to the universal conf directory.
>>  solr webapps (I use very different webapps simultaneously):
>> /srv/www/webapps/solr/solr4.6.1 is the solr webapp
>> 
>> I tried to convey this information to the tomcat server by putting a file 
>> solr4.6.1.xml into the cataiina/localhost folder with the contents
>> 
>> > crossContext="true">
>>  > value="/srv/solr/solr4.6.1" override="true"/>
>> 
> 
> Your message is buried deep in another message thread about NoSQL, because 
> you replied to an existing message rather than starting a new message to 
> solr-user@lucene.apache.org.  On list-mirroring forums like Nabble, nobody 
> will even see your message (or this reply) unless they actually open that 
> other thread.  This is what it looks like on a threading mail reader 
> (Thunderbird):
> 
> https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

Yes, I'm sorry, I only afterwards realized that my question inherited the 
thread from the E-Mail I was reading and using as a template for the answer.

Meanwhile I figured out that I overlooked the third place to define solr home 
for Tomcat (after JAVA_OPTS and JNDI): web.xml in WEB-INF of the given webapp.
This overrides the other definitions and created the impression that I couldn't 
set  solr home.

But now I get the message
"Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"
for the core "geo".
In the solr wiki I read (http://wiki.apache.org/solr/ConfiguringSolr):
"In each core, Solr will look for a conf/solrconfig.xml file" and expected solr 
to look for
/srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml (which exists), but obviously 
it doesn't.
Why? My misunderstanding?

Best
Thomas





solrconfig.xml

2014-03-03 Thread Thomas Fischer
Hello,

I'm sorry to repeat myself but I didn't manage to get out of the thread I 
inadvertently slipped into.

My problem now is this:
I have a core "geo" (with an empty file core.properties inside) and 
solrconfig.xml at
/srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml
following the hint from the solr wiki  
(http://wiki.apache.org/solr/ConfiguringSolr):
"In each core, Solr will look for a conf/solrconfig.xml file"
But I get the error message:
"Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"
Why? My misunderstanding?

Best
Thomas


Re: range types in SOLR

2014-03-03 Thread Thomas Scheffler

Am 03.03.2014 19:12, schrieb Smiley, David W.:

The main reference for this approach is here:
http://wiki.apache.org/solr/SpatialForTimeDurations


Hoss’s illustrations he developed for the meetup presentation are great.
However, there are bugs in the instruction — specifically it’s important
to slightly buffer the query and choose an appropriate maxDistErr.  Also,
it’s more preferable to use the rectangle range query style of spatial
query (e.g. field:[“minX minY” TO “maxX maxY”] as opposed to using
“Intersects(minX minY maxX maxY)”.  There’s no technical difference but
the latter is deprecated and will eventually be removed from Solr 5 /
trunk.

All this said, recognize this is a bit of a hack (one that works well).
There is a good chance a more ideal implementation approach is going to be
developed this year.


Thank you,

having a working example is great but having a practically working 
example that hides this implementation detail would be even better.


I would like to store:

2014-03-04T07:05:12,345Z, 2014-03-04, 2014-03 and 2014 into one field 
and make queries on that field.


Currently I have to normalize all of them to the first format (inventing 
information), which is only a crude approximation. Normalizing them to a 
range would be the best in my opinion. Then a query like "date:2014" would 
hit all of them, and so would "date:[2014-01 TO 2014-03]".
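
For illustration, a rough SolrJ sketch of the spatial-durations workaround
referenced above, applied to the 1250-1299 / 1700-1715 example from earlier
in this thread. The field name "written_range" and its configuration are
assumptions, not something from the thread: it would be a
solr.SpatialRecursivePrefixTreeFieldType with geo="false", worldBounds wide
enough for the year values and an appropriate maxDistErr, as described on the
SpatialForTimeDurations wiki page:

// each period [start, end] is indexed as the point "start end"
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "old-text-1");
doc.addField("written_range", "1250 1299");
doc.addField("written_range", "1700 1715");
server.add(doc);                                   // 'server' is any SolrServer instance
server.commit();

// "written somewhere in 1300-1699" means start <= 1699 AND end >= 1300,
// i.e. the rectangle [minX=0, minY=1300, maxX=1699, maxY=9999]
// (per the advice quoted above, real queries should be slightly buffered)
SolrQuery q = new SolrQuery("written_range:[\"0 1300\" TO \"1699 9999\"]");
QueryResponse rsp = server.query(q);               // old-text-1 is correctly not a match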


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

Am 27.02.2014 09:15, schrieb Shawn Heisey:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually bug reports arrive me that sound like

"Unknown type 19"


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for "Unknown type 19".

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 faces the same issue with SOLR 
4.5.1. I received a client stack trace this morning and am still waiting 
for log output from the server:


--
ERROR unable to submit tasks
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unknown type 19
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
--

There is not much information in that stack trace, I know.
I'll send further information when I receive more. In the meantime I 
asked our customer not to resolve the issue by upgrading the SOLR server, 
so we can dig deeper.


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

Am 04.03.2014 07:21, schrieb Thomas Scheffler:

Am 27.02.2014 09:15, schrieb Shawn Heisey:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually bug reports arrive me that sound like

"Unknown type 19"


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for "Unknown type 19".

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 face the same issue with SOLR
4.5.1. I received a client stack trace this morning and still waiting
for a Log-Output from the Server:


Here we go for the server side (4.5.1):

Mrz 03, 2014 2:39:26 PM org.apache.solr.core.SolrCore execute
Information: [clausthal_test] webapp=/solr path=/select
params={fl=*,score&sort=mods.dateIssued+desc&q=%2BobjectType:"mods"+%2Bcategory:"clausthal_status\:published"&wt=javabin&version=2&rows=3}
hits=186 status=0 QTime=2
Mrz 03, 2014 2:39:38 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
Information: [clausthal_test] webapp=/solr path=/update
params={wt=javabin&version=2} {} 0 0
Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: null:java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRe

Re: [ANN] Lucidworks Fusion 1.0.0

2014-09-22 Thread Thomas Egense
Hi Grant.
Will there be a Fusion demonstration/presentation at Lucene/Solr Revolution
DC? (Not listed in the program yet).


Thomas Egense

On Mon, Sep 22, 2014 at 3:45 PM, Grant Ingersoll 
wrote:

> Hi All,
>
> We at Lucidworks are pleased to announce the release of Lucidworks Fusion
> 1.0.   Fusion is built to overlay on top of Solr (in fact, you can manage
> multiple Solr clusters -- think QA, staging and production -- all from our
> Admin).In other words, if you already have Solr, simply point Fusion at
> your instance and get all kinds of goodies like Banana (
> https://github.com/LucidWorks/Banana -- our port of Kibana to Solr + a
> number of extensions that Kibana doesn't have), collaborative filtering
> style recommendations (without the need for Hadoop or Mahout!), a modern
> signal capture framework, analytics, NLP integration, Boosting/Blocking and
> other relevance tools, flexible index and query time pipelines as well as a
> myriad of connectors ranging from Twitter to web crawling to Sharepoint.
> The best part of all this?  It all leverages the infrastructure that you
> know and love: Solr.  Want recommendations?  Deploy more Solr.  Want log
> analytics?  Deploy more Solr.  Want to track important system metrics?
> Deploy more Solr.
>
> Fusion represents our commitment as a company to continue to contribute a
> large quantity of enhancements to the core of Solr while complementing and
> extending those capabilities with value adds that integrate a number of 3rd
> party (e.g connectors) and home grown capabilities like an all new,
> responsive UI built in AngularJS.  Fusion is not a fork of Solr.  We do not
> hide Solr in any way.  In fact, our goal is that your existing applications
> will work out of the box with Fusion, allowing you to take advantage of new
> capabilities w/o overhauling your existing application.
>
> If you want to learn more, please feel free to join our technical webinar
> on October 2: http://lucidworks.com/blog/say-hello-to-lucidworks-fusion/.
> If you'd like to download: http://lucidworks.com/product/fusion/.
>
> Cheers,
> Grant Ingersoll
>
> 
> Grant Ingersoll | CTO
> gr...@lucidworks.com | @gsingers
> http://www.lucidworks.com
>
>


Memory issue in merge thread

2014-09-24 Thread Thomas Mortagne
Hi guys,

I recently upgraded from Solr 4.0 to 4.8.1. I started it with a clean
index (we did some changes to the Solr schema in the meantime) and
after some time of indexing a very big database my instance is
becoming totally unusable with 99% of the heap filled. Then when I
restart it, it gets stuck very quickly with the same memory issue, so it
seems linked to the size of the Lucene index more than to the time spent
indexing data.

YourKit is telling me that "Lucene Merge Thread #1"
(ConcurrentMergeScheduler$MergeThread) is keeping 4,095 instances of
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$2 in some
local variable(s) which retain about 422MB of RAM and keep adding more
of those to the heap.

Whatever this thread is doing (ConcurrentMergeScheduler javadoc is not
very detailed) it does not seem to be doing it in a very streamed
fashion (if not "simply" a memory leak in 4.8.1). Any idea if this is
expected ? Do I have a way to control the size of the heap this thread
is going to need ?

I have the heap dump if anyone want more details or want to look at
it. The Solr setup can be seen on
https://github.com/xwiki/xwiki-platform/tree/xwiki-platform-6.2/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/resources/solr/xwiki/conf

I know all that is only talking about Lucene classes but since on my
side what I use is Solr I thought it was better to ask on this mailing
list.

Thanks,
-- 
Thomas


Problems after upgrade 4.10.1 -> 4.10.2

2014-11-12 Thread Thomas Lamy

Hi there!

As we got bitten by https://issues.apache.org/jira/browse/SOLR-6530 on a 
regular basis, we started upgrading our 7 node cloud from 4.10.1 to 4.10.2.

The first node upgrade worked like a charm.
After upgrading the second node, two cores no longer come up and we get 
the following error:


ERROR - 2014-11-12 15:17:34.226; org.apache.solr.cloud.RecoveryStrategy; 
Recovery failed - trying again... (16) core=cams_shard1_replica4
ERROR - 2014-11-12 15:17:34.230; org.apache.solr.common.SolrException; 
Error while trying to recover. 
core=onlinelist_shard1_replica7rg.noggit.JSONParser$ParseException: JSON 
Parse Error: char=d,position=0 BEFORE='d' AFTER='own'

at org.noggit.JSONParser.err(JSONParser.java:223)
at org.noggit.JSONParser.next(JSONParser.java:622)
at org.noggit.JSONParser.nextEvent(JSONParser.java:663)
at org.noggit.ObjectBuilder.(ObjectBuilder.java:44)
at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
at 
org.apache.solr.common.cloud.ZkStateReader.fromJSON(ZkStateReader.java:129)
at 
org.apache.solr.cloud.ZkController.getLeaderInitiatedRecoveryStateObject(ZkController.java:1925)
at 
org.apache.solr.cloud.ZkController.getLeaderInitiatedRecoveryState(ZkController.java:1890)

at org.apache.solr.cloud.ZkController.publish(ZkController.java:1071)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1041)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1037)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:355)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)


Any hint on how to solve this? Google didn't reveal anything useful...


Kind regards
Thomas

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



Re: Problems after upgrade 4.10.1 -> 4.10.2

2014-11-12 Thread Thomas Lamy

Am 12.11.2014 um 15:29 schrieb Thomas Lamy:

Hi there!

As we got bitten by https://issues.apache.org/jira/browse/SOLR-6530 on 
a regular basis, we started upgrading our 7 mode cloud from 4.10.1 to 
4.10.2.

The first node upgrade worked like a charm.
After upgrading the second node, two cores no longer come up and we 
get the following error:


ERROR - 2014-11-12 15:17:34.226; 
org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying 
again... (16) core=cams_shard1_replica4
ERROR - 2014-11-12 15:17:34.230; org.apache.solr.common.SolrException; 
Error while trying to recover. 
core=onlinelist_shard1_replica7rg.noggit.JSONParser$ParseException: 
JSON Parse Error: char=d,position=0 BEFORE='d' AFTER='own'

at org.noggit.JSONParser.err(JSONParser.java:223)
at org.noggit.JSONParser.next(JSONParser.java:622)
at org.noggit.JSONParser.nextEvent(JSONParser.java:663)
at org.noggit.ObjectBuilder.(ObjectBuilder.java:44)
at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
at 
org.apache.solr.common.cloud.ZkStateReader.fromJSON(ZkStateReader.java:129)
at 
org.apache.solr.cloud.ZkController.getLeaderInitiatedRecoveryStateObject(ZkController.java:1925)
at 
org.apache.solr.cloud.ZkController.getLeaderInitiatedRecoveryState(ZkController.java:1890)

at org.apache.solr.cloud.ZkController.publish(ZkController.java:1071)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1041)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1037)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:355)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)


Any hint on how to solve this? Google didn't reveal anything useful...


Kind regards
Thomas


Just switched to INFO loglevel:

INFO  - 2014-11-12 15:30:31.563; org.apache.solr.cloud.RecoveryStrategy; 
Publishing state of core onlinelist_shard1_replica7 as recovering, 
leader is http://solr-bc1-blade2:8080/solr/onlinelist_shard1_replica2/ 
and I am http://solr-bc1-blade3:8080/solr/onlinelist_shard1_replica7/
INFO  - 2014-11-12 15:30:31.563; org.apache.solr.cloud.RecoveryStrategy; 
Publishing state of core cams_shard1_replica4 as recovering, leader is 
http://solr-bc1-blade2:8080/solr/cams_shard1_replica2/ and I am 
http://solr-bc1-blade3:8080/solr/cams_shard1_replica4/
INFO  - 2014-11-12 15:30:31.563; org.apache.solr.cloud.ZkController; 
publishing core=onlinelist_shard1_replica7 state=recovering 
collection=onlinelist
INFO  - 2014-11-12 15:30:31.563; org.apache.solr.cloud.ZkController; 
publishing core=cams_shard1_replica4 state=recovering collection=cams
ERROR - 2014-11-12 15:30:31.564; org.apache.solr.common.SolrException; 
Error while trying to recover. 
core=cams_shard1_replica4rg.noggit.JSONParser$ParseException: JSON Parse 
Error: char=d,position=0 BEFORE='d' AFTER='own'
ERROR - 2014-11-12 15:30:31.564; org.apache.solr.common.SolrException; 
Error while trying to recover. 
core=onlinelist_shard1_replica7rg.noggit.JSONParser$ParseException: JSON 
Parse Error: char=d,position=0 BEFORE='d' AFTER='own'
ERROR - 2014-11-12 15:30:31.564; org.apache.solr.cloud.RecoveryStrategy; 
Recovery failed - trying again... (5) core=cams_shard1_replica4
ERROR - 2014-11-12 15:30:31.564; org.apache.solr.cloud.RecoveryStrategy; 
Recovery failed - trying again... (5) core=onlinelist_shard1_replica7
INFO  - 2014-11-12 15:30:31.564; org.apache.solr.cloud.RecoveryStrategy; 
Wait 60.0 seconds before trying to recover again (6)
INFO  - 2014-11-12 15:30:31.564; org.apache.solr.cloud.RecoveryStrategy; 
Wait 60.0 seconds before trying to recover again (6)


The leader for both collections (solr-bc1-blade2) is still on 4.10.1.
As no special instructions were given in the release notes and it's a 
minor upgrade, we thought there should be no BC issues and planned to 
upgrade one node after the other.


Did that provide more insight?

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



Re: Problems after upgrade 4.10.1 -> 4.10.2

2014-11-13 Thread Thomas Lamy

Hi,

a big thank you to Jeon Woosung - we just upgraded our cloud to 4.10.2.
One correction: we had to use 
/collections/{collection}/leader_initiated_recovery/shard1/node5, where 
"node5" had to be replaced with the place the down node showed up in the 
solr cloud dashboard. Also no tomcat restart was neccessary - even 
contra productive, since state changes may overwrite the just-fixed enty.



Best regards
Thomas


Am 13.11.2014 um 05:47 schrieb Jeon Woosung:

you can migrate zookeeper data manually.

1. connect zookeeper.
 - zkCli.sh -server host:port
2. check old data
 - get /collections/"your collection
name"/leader_initiated_recovery/"your shard name"


[zk: localhost:3181(CONNECTED) 25] get
/collections/collection1/leader_initiated_recovery/shard1
*down*
cZxid = 0xe4
ctime = Thu Nov 13 13:38:53 KST 2014
mZxid = 0xe4
mtime = Thu Nov 13 13:38:53 KST 2014
pZxid = 0xe4
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0


I guess that there is only a single word, which is "down"

3. delete the data.
 - delete /collections/"your collection
name"/leader_initiated_recovery/"your shard name"

4. create new data.
 - create /collections/"your collection
name"/leader_initiated_recovery/"your shard name" {state:down}

5. restart the server.



On Thu, Nov 13, 2014 at 7:42 AM, Anshum Gupta 
wrote:


Considering the impact, I think we should put this out as an announcement
on the 'news' section of the website warning people about this.

On Wed, Nov 12, 2014 at 12:33 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:


I opened https://issues.apache.org/jira/browse/SOLR-6732

On Wed, Nov 12, 2014 at 12:29 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:


Hi Thomas,

You're right, there's a back-compat break here. I'll open an issue.

On Wed, Nov 12, 2014 at 9:37 AM, Thomas Lamy 

wrote:

Am 12.11.2014 um 15:29 schrieb Thomas Lamy:


Hi there!

As we got bitten by https://issues.apache.org/jira/browse/SOLR-6530

on

a regular basis, we started upgrading our 7 mode cloud from 4.10.1 to
4.10.2.
The first node upgrade worked like a charm.
After upgrading the second node, two cores no longer come up and we

get

the following error:

ERROR - 2014-11-12 15:17:34.226;

org.apache.solr.cloud.RecoveryStrategy;

Recovery failed - trying again... (16) core=cams_shard1_replica4
ERROR - 2014-11-12 15:17:34.230;

org.apache.solr.common.SolrException;

Error while trying to recover. core=onlinelist_shard1_
replica7rg.noggit.JSONParser$ParseException: JSON Parse Error:
char=d,position=0 BEFORE='d' AFTER='own'
 at org.noggit.JSONParser.err(JSONParser.java:223)
 at org.noggit.JSONParser.next(JSONParser.java:622)
 at org.noggit.JSONParser.nextEvent(JSONParser.java:663)
 at org.noggit.ObjectBuilder.(ObjectBuilder.java:44)
 at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
 at org.apache.solr.common.cloud.ZkStateReader.fromJSON(
ZkStateReader.java:129)
 at

org.apache.solr.cloud.ZkController.getLeaderInitiatedRecoveryStat

eObject(ZkController.java:1925)
 at

org.apache.solr.cloud.ZkController.getLeaderInitiatedRecoveryStat

e(ZkController.java:1890)
 at org.apache.solr.cloud.ZkController.publish(
ZkController.java:1071)
 at org.apache.solr.cloud.ZkController.publish(
ZkController.java:1041)
 at org.apache.solr.cloud.ZkController.publish(
ZkController.java:1037)
 at org.apache.solr.cloud.RecoveryStrategy.doRecovery(
RecoveryStrategy.java:355)
 at org.apache.solr.cloud.RecoveryStrategy.run(
RecoveryStrategy.java:235)

Any hint on how to solve this? Google didn't reveal anything

useful...


Kind regards
Thomas

  Just switched to INFO loglevel:

INFO  - 2014-11-12 15:30:31.563;

org.apache.solr.cloud.RecoveryStrategy;

Publishing state of core onlinelist_shard1_replica7 as recovering,

leader

is http://solr-bc1-blade2:8080/solr/onlinelist_shard1_replica2/ and I

am

http://solr-bc1-blade3:8080/solr/onlinelist_shard1_replica7/
INFO  - 2014-11-12 15:30:31.563;

org.apache.solr.cloud.RecoveryStrategy;

Publishing state of core cams_shard1_replica4 as recovering, leader is
http://solr-bc1-blade2:8080/solr/cams_shard1_replica2/ and I am
http://solr-bc1-blade3:8080/solr/cams_shard1_replica4/
INFO  - 2014-11-12 15:30:31.563; org.apache.solr.cloud.ZkController;
publishing core=onlinelist_shard1_replica7 state=recovering
collection=onlinelist
INFO  - 2014-11-12 15:30:31.563; org.apache.solr.cloud.ZkController;
publishing core=cams_shard1_replica4 state=recovering collection=cams
ERROR - 2014-11-12 15:30:31.564; org.apache.solr.common.SolrException;
Error while trying to recover. core=cams_s

Indexing problems with BBoxField

2014-11-23 Thread Thomas Seidl
Hi all,

I just downloaded Solr 4.10.2 and wanted to try out the new BBoxField
type, but couldn't get it to work. The error (with status 400) I get is:

ERROR: [doc=foo] Error adding field
'bboxs_field_location_area'='ENVELOPE(25.89, 41.13, 47.07, 35.31)'
msg=java.lang.IllegalStateException: instead call createFields() because
isPolyField() is true

Which, of course, is rather unhelpful for a user.
The relevant portions of my schema.xml look like this (largely copied
from [1]):





[1] https://cwiki.apache.org/confluence/display/solr/Spatial+Search

And the request I send is this:

<add>
  <doc>
    <field name="id">foo</field>
    <field name="bboxs_field_location_area">ENVELOPE(25.89, 41.13, 47.07, 35.31)</field>
  </doc>
</add>

Does anyone have any idea what could be going wrong here?

Thanks a lot in advance,
Thomas


leader split-brain at least once a day - need help

2015-01-07 Thread Thomas Lamy

Hi there,

we are running a 3 server cloud serving a dozen 
single-shard/replicate-everywhere collections. The 2 biggest collections 
are ~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5, 
Tomcat 7.0.56, Oracle Java 1.7.0_72-b14


10 of the 12 collections (the small ones) get filled by DIH full-import 
once a day starting at 1am. The second biggest collection is updated 
using DIH delta-import every 10 minutes, the biggest one gets bulk json 
updates with commits every 5 minutes.


On a regular basis, we have a leader information mismatch:
org.apache.solr.update.processor.DistributedUpdateProcessor; Request 
says it is coming from leader, but we are the leader

or the opposite
org.apache.solr.update.processor.DistributedUpdateProcessor; 
ClusterState says we are the leader, but locally we don't think so


One of these pop up once a day at around 8am, making either some cores 
going to "recovery failed" state, or all cores of at least one cloud 
node into state "gone".
This started out of the blue about 2 weeks ago, without changes to 
software, data, or client behaviour.


Most of the time, we get things going again by restarting solr on the 
current leader node, forcing a new election - can this be triggered 
while keeping solr (and the caches) up?
But sometimes this doesn't help, we had an incident last weekend where 
our admins didn't restart in time, creating millions of entries in 
/solr/overseer/queue, making zk close the connection, and leader 
re-elect fails. I had to flush zk, and re-upload collection config to 
get solr up again (just like in 
https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).


We have a much bigger cloud (7 servers, ~50GiB Data in 8 collections, 
1500 requests/s) up and running, which does not have these problems 
since upgrading to 4.10.2.



Any hints on where to look for a solution?

Kind regards
Thomas

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139
Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



Re: leader split-brain at least once a day - need help

2015-01-08 Thread Thomas Lamy

Hi Alan,
thanks for the pointer, I'll look at our gc logs

On 07.01.2015 at 15:46, Alan Woodward wrote:

I had a similar issue, which was caused by 
https://issues.apache.org/jira/browse/SOLR-6763.  Are you getting long GC 
pauses or similar before the leader mismatches occur?

Alan Woodward
www.flax.co.uk


On 7 Jan 2015, at 10:01, Thomas Lamy wrote:


Hi there,

we are running a 3 server cloud serving a dozen 
single-shard/replicate-everywhere collections. The 2 biggest collections are 
~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5, Tomcat 
7.0.56, Oracle Java 1.7.0_72-b14

10 of the 12 collections (the small ones) get filled by DIH full-import once a 
day starting at 1am. The second biggest collection is updated usind DIH 
delta-import every 10 minutes, the biggest one gets bulk json updates with 
commits once in 5 minutes.

On a regular basis, we have a leader information mismatch:
org.apache.solr.update.processor.DistributedUpdateProcessor; Request says it is 
coming from leader, but we are the leader
or the opposite
org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState says 
we are the leader, but locally we don't think so

One of these pop up once a day at around 8am, making either some cores going to "recovery 
failed" state, or all cores of at least one cloud node into state "gone".
This started out of the blue about 2 weeks ago, without changes to neither 
software, data, or client behaviour.

Most of the time, we get things going again by restarting solr on the current 
leader node, forcing a new election - can this be triggered while keeping solr 
(and the caches) up?
But sometimes this doesn't help, we had an incident last weekend where our 
admins didn't restart in time, creating millions of entries in 
/solr/oversser/queue, making zk close the connection, and leader re-elect 
fails. I had to flush zk, and re-upload collection config to get solr up again 
(just like in https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).

We have a much bigger cloud (7 servers, ~50GiB Data in 8 collections, 1500 
requests/s) up and running, which does not have these problems since upgrading 
to 4.10.2.


Any hints on where to look for a solution?

Kind regards
Thomas

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139
Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476






--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



Re: leader split-brain at least once a day - need help

2015-01-12 Thread Thomas Lamy

Hi,

I found no big/unusual GC pauses in the Log (at least manually; I found 
no free solution to analyze them that worked out of the box on a 
headless debian wheezy box). Eventually i tried with -Xmx8G (was 64G 
before) on one of the nodes, after checking allocation after 1 hour run 
time was at about 2-3GB. That didn't move the time frame where a restart 
was needed, so I don't think Solr's JVM GC is the problem.
We're trying to get all of our node's logs (zookeeper and solr) into 
Splunk now, just to get a better sorted view of what's going on in the 
cloud once a problem occurs. We're also enabling GC logging for 
zookeeper; maybe we were missing problems there while focussing on solr 
logs.
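
(For reference, enabling GC logging here amounts to adding JVM flags along
these lines to the ZooKeeper/Tomcat JAVA_OPTS; the log path is only an example:)

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:/var/log/zookeeper/gc.log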


Thomas


On 08.01.15 at 16:33, Yonik Seeley wrote:

It's worth noting that those messages alone don't necessarily signify
a problem with the system (and it wouldn't be called "split brain").
The async nature of updates (and thread scheduling) along with
stop-the-world GC pauses that can change leadership, cause these
little windows of inconsistencies that we detect and log.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


On Wed, Jan 7, 2015 at 5:01 AM, Thomas Lamy  wrote:

Hi there,

we are running a 3 server cloud serving a dozen
single-shard/replicate-everywhere collections. The 2 biggest collections are
~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5, Tomcat
7.0.56, Oracle Java 1.7.0_72-b14

10 of the 12 collections (the small ones) get filled by DIH full-import once
a day starting at 1am. The second biggest collection is updated usind DIH
delta-import every 10 minutes, the biggest one gets bulk json updates with
commits once in 5 minutes.

On a regular basis, we have a leader information mismatch:
org.apache.solr.update.processor.DistributedUpdateProcessor; Request says it
is coming from leader, but we are the leader
or the opposite
org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState
says we are the leader, but locally we don't think so

One of these pop up once a day at around 8am, making either some cores going
to "recovery failed" state, or all cores of at least one cloud node into
state "gone".
This started out of the blue about 2 weeks ago, without changes to neither
software, data, or client behaviour.

Most of the time, we get things going again by restarting solr on the
current leader node, forcing a new election - can this be triggered while
keeping solr (and the caches) up?
But sometimes this doesn't help, we had an incident last weekend where our
admins didn't restart in time, creating millions of entries in
/solr/oversser/queue, making zk close the connection, and leader re-elect
fails. I had to flush zk, and re-upload collection config to get solr up
again (just like in https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).

We have a much bigger cloud (7 servers, ~50GiB Data in 8 collections, 1500
requests/s) up and running, which does not have these problems since
upgrading to 4.10.2.


Any hints on where to look for a solution?

Kind regards
Thomas

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139
Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476




--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



Re: leader split-brain at least once a day - need help

2015-01-13 Thread Thomas Lamy

Hi Mark,

we're currently at 4.10.2, the update to 4.10.3 is scheduled for tomorrow.

T

On 12.01.15 at 17:30, Mark Miller wrote:

bq. ClusterState says we are the leader, but locally we don't think so

Generally this is due to some bug. One bug that can lead to it was recently
fixed in 4.10.3 I think. What version are you on?

- Mark

On Mon Jan 12 2015 at 7:35:47 AM Thomas Lamy  wrote:


Hi,

I found no big/unusual GC pauses in the Log (at least manually; I found
no free solution to analyze them that worked out of the box on a
headless debian wheezy box). Eventually i tried with -Xmx8G (was 64G
before) on one of the nodes, after checking allocation after 1 hour run
time was at about 2-3GB. That didn't move the time frame where a restart
was needed, so I don't think Solr's JVM GC is the problem.
We're trying to get all of our node's logs (zookeeper and solr) into
Splunk now, just to get a better sorted view of what's going on in the
cloud once a problem occurs. We're also enabling GC logging for
zookeeper; maybe we were missing problems there while focussing on solr
logs.

Thomas


On 08.01.15 at 16:33, Yonik Seeley wrote:

It's worth noting that those messages alone don't necessarily signify
a problem with the system (and it wouldn't be called "split brain").
The async nature of updates (and thread scheduling) along with
stop-the-world GC pauses that can change leadership, cause these
little windows of inconsistencies that we detect and log.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


On Wed, Jan 7, 2015 at 5:01 AM, Thomas Lamy 

wrote:

Hi there,

we are running a 3 server cloud serving a dozen
single-shard/replicate-everywhere collections. The 2 biggest

collections are

~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5,

Tomcat

7.0.56, Oracle Java 1.7.0_72-b14

10 of the 12 collections (the small ones) get filled by DIH full-import

once

a day starting at 1am. The second biggest collection is updated usind

DIH

delta-import every 10 minutes, the biggest one gets bulk json updates

with

commits once in 5 minutes.

On a regular basis, we have a leader information mismatch:
org.apache.solr.update.processor.DistributedUpdateProcessor; Request

says it

is coming from leader, but we are the leader
or the opposite
org.apache.solr.update.processor.DistributedUpdateProcessor;

ClusterState

says we are the leader, but locally we don't think so

One of these pop up once a day at around 8am, making either some cores

going

to "recovery failed" state, or all cores of at least one cloud node into
state "gone".
This started out of the blue about 2 weeks ago, without changes to

neither

software, data, or client behaviour.

Most of the time, we get things going again by restarting solr on the
current leader node, forcing a new election - can this be triggered

while

keeping solr (and the caches) up?
But sometimes this doesn't help, we had an incident last weekend where

our

admins didn't restart in time, creating millions of entries in
/solr/oversser/queue, making zk close the connection, and leader

re-elect

fails. I had to flush zk, and re-upload collection config to get solr up
again (just like in https://gist.github.com/

isoboroff/424fcdf63fa760c1d1a7).

We have a much bigger cloud (7 servers, ~50GiB Data in 8 collections,

1500

requests/s) up and running, which does not have these problems since
upgrading to 4.10.2.


Any hints on where to look for a solution?

Kind regards
Thomas

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139
Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476





--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



RE: Solr indexer and Hadoop

2013-06-25 Thread James Thomas
>> The problem I am facing is how to read those data from hard disks which are 
>> not HDFS

If you are planning to use a Map-Reduce job to do the indexing then the source 
data will definitely have to be on HDFS.
The Map function can transform the source data to Solr documents and send them 
to Solr  (e.g. via CloudSolrServer Java API) for indexing.
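
A minimal sketch of that map-side indexing with SolrJ (the ZooKeeper address,
collection name, and field layout are assumptions for illustration):

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexingMapper {
      private final CloudSolrServer solr;

      public IndexingMapper() throws Exception {
          // the ZooKeeper ensemble address and collection name are assumptions
          solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
          solr.setDefaultCollection("collection1");
      }

      // called once per input record; assumes tab-separated source lines
      public void map(String line) throws Exception {
          String[] cols = line.split("\t");
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", cols[0]);
          doc.addField("text_t", cols[1]);
          solr.add(doc); // batching adds and committing once at the end is advisable
      }
  }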

-- James

-Original Message-
From: engy.morsy [mailto:engy.mo...@bibalex.org] 
Sent: Tuesday, June 25, 2013 3:14 AM
To: solr-user@lucene.apache.org
Subject: Solr indexer and Hadoop

Hi All, 

I have TBs of data that need to be indexed. I am trying to use Hadoop to index 
them. I am still a newbie. 
I thought that the Map function will read data from hard disks and the reduce 
function will index them. The problem I am facing is how to read those data 
from hard disks which are not HDFS. 

I understand that the data to be indexed must be on HDFS, doesn't it? Or am I 
missing something here. 

I can't convert the nodes on which the data resides to HDFS. Can anyone please 
help.

I would also appreciate it if you can provide a good tutorial for Solr indexing 
using Hadoop. I googled a lot but did not find a sufficient one. 
 
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexer-and-Hadoop-tp4072951.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Joins with SolrCloud

2013-06-25 Thread James Thomas
My understanding is the same that "{!join...}" does not work in SolrCloud (aka 
distributed search)
based on:
1.  https://issues.apache.org/jira/browse/LUCENE-3759
2. http://wiki.apache.org/solr/DistributedSearch
--- see "Limitations" section which refers to the JIRA above


-- James

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Tuesday, June 25, 2013 7:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Joins with SolrCloud

I have never heard mention that joins support distributed search, so you cannot 
do a join against a sharded core.

However, if from your example, innerCollection was replicated across all nodes, 
I would think that should work, because all that comes back from each server 
when a distributed search happens is the best 'n' matches, so exactly how those 
'n' matches were located doesn't matter particularly.

Simpler answer: try it!

Upayavira

On Tue, Jun 25, 2013, at 11:25 PM, Chris Toomey wrote:
> What are the restrictions/limitations w.r.t. joins when using SolrCloud?
> 
> Say I have a 3-node cluster and both my "outer" and "inner" 
> collections are sharded 3 ways across the cluster.  Could I do a query 
> such as 
> "select?q={!join+from=inner_id+fromIndex=innerCollection+to=outer_id}xx:foo&collection=outerCollection"?
> 
> Or if the above isn't supported, would it be if the "inner" collection 
> was not sharded and was replicated across all 3 nodes, so that it 
> existed in its entirety on each node?
> 
> thx,
> Chris


RE: Facet sorting seems weird

2013-07-15 Thread James Thomas
Hi Henrik,

We did something related to this that I'll share.  I'm rather new to Solr so 
take this idea cautiously :-)
Our requirement was to show exact values but have case-insensitive sorting and 
facet filtering (prefix filtering).

We created an index field (type="string") for creating facets so that the 
values are indexed as-is.
The values we indexed were given the format <lowercased value>|<original value>.
So for example, given the value "bObles", we would index the string 
"bobles|bObles".
When displaying the facet we split the facet value from Solr in half and 
display the second half to the user.
Of course the caveat is that you could have 2 facets that differ only in case, 
but to me that's a data cleansing issue.

James

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com] 
Sent: Monday, July 15, 2013 10:57 AM
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hello, thank you for the quick reply!

But given that facet.sort=index just sorts by the faceted index (and I don't 
want the facet itself to be in lower-case), would that really work?

Regards,
Henrik Ossipoff


-Original Message-
From: David Quarterman [mailto:da...@corexe.com] 
Sent: 15. juli 2013 16:46
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

Try setting up a copyfield in your schema and set the copied field to use 
something like 'text_ws' which implements LowerCaseFilterFactory. Then sort on 
the copyfield.
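
A sketch of that kind of copyfield setup, using a keyword-tokenized, lower-cased
sort field (the names are illustrative, not the actual schema in question):

  <fieldType name="string_lc_sort" class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="brand_sort" type="string_lc_sort" indexed="true" stored="false"/>
  <copyField source="brand" dest="brand_sort"/>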

Regards,

DQ

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com] 
Sent: 15 July 2013 15:08
To: solr-user@lucene.apache.org
Subject: Facet sorting seems weird

Hello, first time writing to the list. I am a developer for a company where we 
recently switched all of our search core from Sphinx to Solr with very great 
results. In general we've been very happy with the switch, and everything seems 
to work just as we want it to.

Today however we've run into a bit of a issue regarding faceted sort.

For example we have a field called "brand" in our core, defined as the text_en 
datatype from the example Solr core. This field is copied into facet_brand with 
the datatype string (since we don't really need to do much with it except show 
it for faceted navigation).

Now, given these two entries into the field on different documents, "LEGO" and 
"bObles", and given facet.sort=index, it appears that LEGO is sorted as being 
before bObles. I assume this is because of casing differences.

My question then is, how do we define a decent datatype in our schema, where 
the casing is exact, but we are able to sort it without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff


SolrCloud. Scale-test by duplicating the same index to the shards and making it behave as if each index is different (uniqueId).

2013-10-01 Thread Thomas Egense
Hello everyone,
I have a small challenge performance testing a SolrCloud setup. I have 10
shards, and each shard is supposed to have index-size ~200GB. However I
only have a single index of 200GB because it will take too long to build
another index with different data,  and I hope to somehow use this index on
all 10 shards and make it behave as all documents are different on each
shard. So building more indexes from new data is not an option.

Making a query to a SolrCloud is a two-phase operation. First all shards
receive the query and return ID's and ranking. The merger will then remove
duplicate ID's and then the full documents will be retreived.

When I copy this index to all shards and make a request the following will
happen: Phase one: All shards will receive the query and return ids+ranking
(actually same set from all shards). This part is realistic enough.
Phase two: ID's will be merged and retrieving the documents is not
realistic as if they were spread out between shards (IO wise).

Is there any way I can 'fake' this somehow and have shards return a
prefixed_id for phase1 etc., which then also have to be undone when
retrieving the documents for phase2.  I have tried making the hack in
org.apache.solr.handler.component.QueryComponent and a few other classes,
but no success. (The resultset are always empty). I do not need to index
any new documents, which would also be a challenge due to the ID
hash-interval for the shards with this hack.

Anyone has a good idea how to make this hack work?

From,
Thomas Egense


Faceting json response - odd format

2013-05-13 Thread Cord Thomas
Hello,

Relatively new to SOLR, I am quite happy with the API. 

I am a bit challenged by the faceting response in JSON though. 

This is what i am getting which mirrors what is in the documentation:

"facet_counts":{"facet_queries":{},

"facet_fields":{"metadata_meta_last_author":["Nick",330,"standarduser",153,"Mohan",52,"wwd",49,"gerald",45,"Riggins",36,"fallon",31,"blister",28,"
 
",26,"morfitelli",24,"Administrator",22,"morrow",22,"richard",22,"egilhoi",18,"USer
 Group",16],
  

This is not trivial to parse - I've read the docs but can't seem to figure out 
how one might get a more structured response to this.

Assuming I am not missing anything,  I guess i have to write a custom parser to 
build a separate data structure that can be more easily presented in a UI.  

Thank you

Cord

Re: Faceting json response - odd format

2013-05-13 Thread Cord Thomas
thank you Hoss,

What I would prefer to see, as we do with all other parameters, is a normal 
key/value pairing. This might look like:

{"metadata_meta_last_author":[{"value": "Nick", "count": 330},{"value": 
"standard user","count": 153},{"value": "Mohan","count": 
52},{"value":"wwd","count": 49}…

Cord

On May 13, 2013, at 12:34 PM, Chris Hostetter  wrote:

> 
> : This is what i am getting which mirrors what is in the documentation:
> : 
> : "facet_counts":{"facet_queries":{},
> : 
> "facet_fields":{"metadata_meta_last_author":["Nick",330,"standarduser",153,"Mohan",52,"wwd",49,"gerald",45,"Riggins",36,"fallon",31,"blister",28,"
>  
> ",26,"morfitelli",24,"Administrator",22,"morrow",22,"richard",22,"egilhoi",18,"USer
>  Group",16],
> :   
> : 
> : This is not trivial to parse - I've read the docs but can't seem to 
> : figure out who one might get a more structured response to this.
> 
> You didn't provide any specifics about what you felt was problematic, but 
> i'm guessing what you want to do is pick the value you think is best for 
> the "json.nl" param...
> 
> http://wiki.apache.org/solr/SolJSON#JSON_specific_parameters
> 
> 
> -Hoss



Question about attributes

2013-05-17 Thread Thomas Portegys
First time on forum.
We are planning to use Solr to house some data mining information, and we are 
thinking of using attributes to add some semantic information to indexed 
content. As a test, I wrote a filter that adds an "animal" attribute to tokens 
like "dog", "cat", etc. After adding a document with my field (animal_text), 
the attribute shows up in the analysis tab for the collection, so it seems to 
have been processed. What I'm not sure of is how attributes are surfaced in the 
index for search, either via an URL or programmatically. Any advice is 
appreciated.



Question on implementation for schema design - parsing path information into stored field

2013-05-20 Thread Cord Thomas
Hello,

I am submitting rich documents to a SOLR index via Solr Cell.   This is all
working well.

The documents are organized in meaningful folders.  I would like to capture
the folder names in my index so that I can use the folder names to provide
facets.

I can pass the path data into the indexing process and would like to
convert 2 paths deep into indexed and stored data - or copy field data.

Say i have files in these folders:

Financial
Financial/Annual
Financial/Audit
Organizational
Organizational/Offices
Organizational/Staff

I would like to then provide facets using these names.

Can someone please guide me in the right direction on how I might
accomplish this?

Thank you

Cord


Re: Question on implementation for schema design - parsing path information into stored field

2013-05-20 Thread Cord Thomas
Thank you Brendan,

I had started to read about the tokenizers and couldn't quite piece
together how it would work.  I will read about this and post my
implementation if successful.
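
For reference, a sketch of what Brendan's schema additions below amount to,
based on the PathHierarchyTokenizerFactory example on the wiki page he links
(the fieldType name and the "/" delimiter are assumptions):

  <fieldType name="descendent_path" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory" />
    </analyzer>
  </fieldType>

  <field name="folders_facet" type="descendent_path" indexed="true" stored="true" multiValued="true" />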

Cord


On Mon, May 20, 2013 at 4:13 PM, Brendan Grainger <
brendan.grain...@gmail.com> wrote:

> Hi Cord,
>
> I think you'd do it like this:
>
> 1. Add this to schema.xml
>
> 
> 
>   
>/>
>   
>   
>   
>   
> 
>
>  stored="true" multiValued="true" />
>
> 2. When you index add the 'folders' to the folders_facet field (or whatever
> you want to call it).
> 3. Your query would look something like:
>
> http://localhost:8982/solr/
> /select?facet=on&facet.field=folders_facet&facet.mincount=1&
>
> There is a good explanation here:
>
> http://wiki.apache.org/solr/HierarchicalFaceting#PathHierarchyTokenizerFactory
>
>
> Hope that helps.
> Brendan
>
>
>
>
>
>
> On Mon, May 20, 2013 at 4:18 PM, Cord Thomas 
> wrote:
>
> > Hello,
> >
> > I am submitting rich documents to a SOLR index via Solr Cell.   This is
> all
> > working well.
> >
> > The documents are organized in meaningful folders.  I would like to
> capture
> > the folder names in my index so that I can use the folder names to
> provide
> > facets.
> >
> > I can pass the path data into the indexing process and would like to
> > convert 2 paths deep into indexed and stored data - or copy field data.
> >
> > Say i have files in these folders:
> >
> > Financial
> > Financial/Annual
> > Financial/Audit
> > Organizational
> > Organizational/Offices
> > Organizational/Staff
> >
> > I would like to then provide facets using these names.
> >
> > Can someone please guide me in the right direction on how I might
> > accomplish this?
> >
> > Thank you
> >
> > Cord
> >
>
>
>
> --
> Brendan Grainger
> www.kuripai.com
>


RE: Sole instance state is down in cloud mode

2013-06-05 Thread James Thomas
Are you using IE?  If so, you might want to try using Firefox.

-Original Message-
From: sathish_ix [mailto:skandhasw...@inautix.co.in] 
Sent: Wednesday, June 05, 2013 6:16 AM
To: solr-user@lucene.apache.org
Subject: Sole instance state is down in cloud mode

Hi,

When I start a core in SolrCloud I'm getting the below message in the log.

I have set up ZooKeeper separately and uploaded the config files.
When I start the Solr instance in cloud mode, the state is down.


INFO: Update state numShards=null message={
  "operation":"state",
  "numShards":null,
  "shard":"shard1",
  "roles":null,
  *"state":"down",*
  "core":"core1",
  "collection":"core1",
  "node_name":"x:9980_solr",
  "base_url":"http://x:9980/solr"}
Jun 5, 2013 6:10:48 AM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change: WatchedEvent state:SyncConnected 
type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
(live nodes size: 1)


When I hit the URL, I am getting the left pane of the Solr admin and the right 
side keeps on loading. Any help?

Thanks,
Sathish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sole-instance-state-is-down-in-cloud-mode-tp4068298.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: How to stop index distribution among shards in solr cloud

2013-06-07 Thread James Thomas
This may help:

http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud
--- See "Document Routing" section.


-Original Message-
From: sathish_ix [mailto:skandhasw...@inautix.co.in] 
Sent: Friday, June 07, 2013 5:27 AM
To: solr-user@lucene.apache.org
Subject: How to stop index distribution among shards in solr cloud

Hi,

I have two shards; logically, each shard corresponds to a region. Currently the 
index is distributed across the shards by SolrCloud. How can I load the index 
for a region to a specific shard in SolrCloud?

Any thoughts ?

Thanks,
Sathish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-stop-index-distribution-among-shards-in-solr-cloud-tp4068831.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: index merge question

2013-06-11 Thread James Thomas
FWIW, the Solr included with Cloudera Search, by default, "ignores all but the 
most recent document version" during merges.
The conflict resolution is configurable however.  See the documentation for 
details.
http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html
-- see the user guide pdf, " update-conflict-resolver" parameter

James

-Original Message-
From: anirudh...@gmail.com [mailto:anirudh...@gmail.com] On Behalf Of Anirudha 
Jadhav
Sent: Tuesday, June 11, 2013 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: index merge question

From my experience the Lucene mergeTool and the one invoked by coreAdmin are 
pure Lucene implementations and do not understand the concept of a 
uniqueKey (a Solr-land concept).

  http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note at the 
end
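
(For reference, the core-admin merge under discussion is invoked roughly like
this; the core names are illustrative:)

  curl "http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&srcCore=core1&srcCore=core2"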

we do frequent index merges for which we externally run map/reduce ( java code 
using lucene api's) jobs to merge & validate merged indices with sources.
-Ani

On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller  wrote:
> Yeah, you have to carefully manage things if you are map/reduce building 
> indexes *and* updating documents in other ways.
>
> If your 'source' data for MR index building is the 'truth', you also have the 
> option of not doing incremental index merging, and you could simply rebuild 
> the whole thing every time - of course, depending your cluster size, that 
> could be quite expensive.

>
> - Mark
>
> On Jun 10, 2013, at 8:36 PM, Jamie Johnson  wrote:
>
>> Thanks Mark.  My question is stemming from the new cloudera search stuff.
>> My concern its that if while rebuilding the index someone updates a 
>> doc that update could be lost from a solr perspective.  I guess what 
>> would need to happen to ensure the correct information was indexed 
>> would be to record the start time and reindex the information that changed 
>> since then?
>> On Jun 8, 2013 2:37 PM, "Mark Miller"  wrote:
>>
>>>
>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson  wrote:
>>>
 When merging through the core admin (
 http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy 
 for conflicts during the merge?  So for instance if I am merging 
 core 1 and core 2 into core 0 (first example), what happens if core 
 1 and core 2
>>> both
 have a document with the same key, say core 1 has a newer version 
 of core 2?  Does the merge fail, does the newer document remain?
>>>
>>> You end up with both documents, both with that ID - not generally a 
>>> situation you want to end up in. You need to ensure unique id's in 
>>> the input data or replace the index rather than merging into it.
>>>

 Also if using the srcCore method if a document with key 1 is 
 written
>>> while
 an index also with key 1 is being merged what happens?
>>>
>>> It depends on the order I think - if the doc is written after the 
>>> merge and it's an update, it will update the doc that was just 
>>> merged in. If the merge comes second, you have the doc twice and it's a 
>>> problem.
>>>
>>> - Mark
>



--
Anirudha P. Jadhav


RE: shardkey

2013-06-12 Thread James Thomas
This page has some good information on custom document routing: 
http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud



-Original Message-
From: Rishi Easwaran [mailto:rishi.easwa...@aol.com] 
Sent: Wednesday, June 12, 2013 1:40 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

From my understanding.
In SOLR cloud the CompositeIdDocRouter uses HashbasedDocRouter.
CompositeId router is default if your numShards>1 on collection creation.
CompositeId router generates a hash using the uniqueKey defined in your 
schema.xml to route your documents to a dedicated shard.

You can use select?q=xyz&shard.keys=uniquekey to focus your search to hit only 
the shard that has your shard.key  
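
For illustration (the ids here are made up; this is the 4.x syntax as I
understand it): with the compositeId router a uniqueKey like "tenant1!doc99" is
routed by the "tenant1" prefix, and a query can then be narrowed to the shard(s)
holding that prefix with:

  select?q=xyz&shard.keys=tenant1!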

 

 Thanks,

Rishi.

 

-Original Message-
From: Joshi, Shital 
To: 'solr-user@lucene.apache.org' 
Sent: Wed, Jun 12, 2013 10:01 am
Subject: shardkey


Hi,

We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have a couple of 
questions on the shard key. 

1. Looking at the admin GUI, how do I know which field is being used 
for shard key.
2. What is the default shard key used?
3. How do I override the default shard key?

Thanks. 

 


RE: ConcurrentUpdateSolrserver - Queue size not working

2013-06-18 Thread James Thomas
Looks like the javadoc  on this parameter could use a little tweaking.
From looking at the 4.3 source code (hoping I get this right :-), it appears 
the ConcurrentUpdateSolrServer will begin sending documents (on a single 
thread) as soon as the first document is added.
New threads (up to threadCount) are created only when a document is added and 
the queue is more than half full.
Kind of makes sense... why wait until the queue is full to send documents.  And 
if one thread can keep up with your ETL (adds), there's really no need to 
create new threads.

You might want to create your own buffer (e.g. ArrayList) of the 
SolrInputDocument objects and then use the "add" API that accepts the 
collection.
Calling "add" after creating 30,000 SolrInputDocument objects seems a bit much. 
 Something smaller (like 1,000) might work better.  You'll have to experiment 
to see what works best for your environment.
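
A minimal sketch of that buffering approach (the batch size of 1,000, the field
names, and the tab-separated input file are assumptions):

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchedIndexer {
      public static void main(String[] args) throws Exception {
          ConcurrentUpdateSolrServer server =
                  new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 30000, 4);
          BufferedReader reader = new BufferedReader(new FileReader(args[0]));
          List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>(1000);
          String line;
          while ((line = reader.readLine()) != null) {
              String[] cols = line.split("\t");     // assumes tab-separated input
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", cols[0]);          // field names are illustrative
              doc.addField("text_t", cols[1]);
              buffer.add(doc);
              if (buffer.size() >= 1000) {
                  server.add(buffer);               // one request per 1,000 documents
                  buffer.clear();
              }
          }
          if (!buffer.isEmpty()) {
              server.add(buffer);
          }
          server.commit();
          reader.close();
          server.shutdown();
      }
  }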

-- James

-Original Message-
From: Learner [mailto:bbar...@gmail.com] 
Sent: Tuesday, June 18, 2013 1:07 PM
To: solr-user@lucene.apache.org
Subject: ConcurrentUpdateSolrserver - Queue size not working

I am using ConcurrentUpdateSolrserver to create 4 threads (threadCount=4) with 
queueSize of 3.

Indexing works fine as expected.

My issue is that I see the documents getting added to the server even 
before it reaches the queue size.  Am I doing anything wrong? Or is queueSize 
not implemented yet?

Also I don't see very big performance improvements when I increase / decrease 
the number of threads. Can someone let me know the best way to improve indexing 
performance when using ConcurrentUpdateSolrServer?

FYI... I am running this program on 4 core machine.. 

Sample snippet:

ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer(solrServer, 3, 4);
try {
    String line;
    while ((line = bReader.readLine()) != null) {
        String[] inputDocument = line.split("\t");
        // ... do some processing, build a SolrInputDocument 'doc' ...
        server.add(doc);
    }
}




--
View this message in context: 
http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrserver-Queue-size-not-working-tp4071408.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud: no "timing" when no result in distributed mode

2013-06-21 Thread James Thomas
Seems to work fine for me on 4.3.0, maybe you can try a newer version.
4.3.1 is available.

-Original Message-
From: Elodie Sannier [mailto:elodie.sann...@kelkoo.fr] 
Sent: Friday, June 21, 2013 8:54 AM
To: solr-user@lucene.apache.org >> "solr-user@lucene.apache.org"
Subject: SolrCloud: no "timing" when no result in distributed mode

Hello,

I am using SolrCloud 4.2.1 with two shards. With the "debugQuery=true" 
parameter, when a query does not return documents, the "timing" debug 
information is not returned:
curl -sS "http://localhost:8983/solr/select?q=dummy&debugQuery=true"; | grep -o 
'.*'

If i use the "distrib=false" parameter, the "timing" debug information is 
returned:
curl -sS 
"http://localhost:8983/solr/select?q=dummy&debugQuery=true&distrib=false"; |  
grep -o '.*'
1.00.00.00.00.00.00.00.01.00.00.00.00.00.01.0

Is it a bug of the distributed mode?

  Elodie Sannier
--

Kelkoo

*Elodie Sannier *Software engineer

*E*elodie.sann...@kelkoo.fr 
*Y!Messenger* kelkooelodies
*T* +33 (0)4 56 09 07 55 *M*
*A* 4/6 Rue des Méridiens 38130 Echirolles




Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention 
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce 
message, merci de le détruire et d'en avertir l'expéditeur.


solr.home

2011-12-21 Thread Thomas Fischer
Hello,

I'm trying to move forward with my solr system from 1.4 to 3.5 and ran into 
some problems with solr home.
Is this a known problem?

My solr 1.4 gives me the following messages (amongst many many others…) in 
catalina.out:

INFO: No /solr/home in JNDI
INFO: using system property solr.solr.home: '/srv/solr'
INFO: looking for solr.xml: /'/srv/solr'/solr.xml

then finds the solr.xml and proceeds from there (this is multicore).

With solr 3.5 I get:

INFO: No /solr/home in JNDI
INFO: using system property solr.solr.home: '/srv/solr'
INFO: Solr home set to ''/srv/solr'/'
INFO: Solr home set to ''/srv/solr'/./'
SCHWERWIEGEND: java.lang.RuntimeException: Can't find resource '' in classpath 
or ''/srv/solr'/./conf/', cwd=/

After that solr is somehow started but not aware of the cores present.

This can be solved by putting a solr.xml file into 
$CATALINA_HOME/conf/Catalina/localhost/ with
<Environment name="solr/home" type="java.lang.String" value="/srv/solr" override="true" />
which results in
INFO: Using JNDI solr.home: /srv/solr
and everything seems to run smoothly afterwards, although solr.xml is never 
mentioned.

I would like to know when this changed and why, and why solr 3.5 is looking for 
solrconfig.xml instead of solr.xml in solr.home

(Am I the only one who finds it confusing to have the three names 
solr.solr.home (system property),  solr.home (JNDI), solr/home (Environment 
name) for the same object?)

Best
Thomas

Re: solr.home

2011-12-26 Thread Thomas Fischer
Hi Shawn,

thanks for looking into this.
I am using a start-up script for Tomcat, and in that script there was actually 
the line

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home='/srv/solr'"

which most likely created the problem.
With

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/srv/solr"

I get

INFO: No /solr/home in JNDI
INFO: using system property solr.solr.home: /srv/solr

and everything seems to work fine, so there obviously was a tightening of the 
syntax somewhere between solr 1.4 and solr 3.5.

Thanks again
Thomas


On 22.12.2011 at 17:06, Shawn Heisey wrote:

> On 12/21/2011 4:13 AM, Thomas Fischer wrote:
>> I'm trying to move forward with my solr system from 1.4 to 3.5 and ran into 
>> some problems with solr home.
>> Is this a known problem?
>> 
>> My solr 1.4 gives me the following messages (amongst many many others…) in 
>> catalina.out:
>> 
>> INFO: No /solr/home in JNDI
>> INFO: using system property solr.solr.home: '/srv/solr'
>> INFO: looking for solr.xml: /'/srv/solr'/solr.xml
>> 
>> then finds the solr.xml and proceeds from there (this is multicore).
>> 
>> With solr 3.5 I get:
>> 
>> INFO: No /solr/home in JNDI
>> INFO: using system property solr.solr.home: '/srv/solr'
>> INFO: Solr home set to ''/srv/solr'/'
>> INFO: Solr home set to ''/srv/solr'/./'
>> SCHWERWIEGEND: java.lang.RuntimeException: Can't find resource '' in 
>> classpath or ''/srv/solr'/./conf/', cwd=/
>> 
>> After that solr is somehow started but not aware of the cores present.
>> 
>> This can be solved by putting a solr.xml file into 
>> $CATALINA_HOME/conf/Catalina/localhost/ with
>> <Environment name="solr/home" type="java.lang.String" value="/srv/solr" override="true" />
>> which results in
>> INFO: Using JNDI solr.home: /srv/solr
>> and everything seems to run smoothely afterwards, although solr.xml is never 
>> mentioned.
>> 
>> I would like to know when this changed and why, and why solr 3.5 is looking 
>> for solrconfig.xml instead of solr.xml in solr.home
>> 
>> (Am I the only one who finds it confusing to have the three names 
>> solr.solr.home (system property),  solr.home (JNDI), solr/home (Environment 
>> name) for the same object?)
> 
> Here's what I have as a commandline option when starting Jetty:
> 
> -Dsolr.solr.home=/index/solr
> 
> This is what my log from Solr 3.5.0 says at the very beginning.
> 
> Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
> INFO: JNDI not configured for solr (NoInitialContextEx)
> Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
> INFO: using system property solr.solr.home: /index/solr
> Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader 
> INFO: Solr home set to '/index/solr/'
> 
> Note that in my log it shows the system property without any kind of quotes, 
> but in yours, it is surrounded - '/srv/solr'.  I am guessing that wherever 
> you are defining solr.solr.home, you have included those quotes, and that 
> removing them would probably fix the problem.
> 
> If this is indeed the problem, the newer version is probably interpreting 
> input values much more literally, the old version probably ran the final path 
> value through a parser that took care of removing the quotes for you, but 
> that parser also removed certain characters that some users actually needed.  
> Notice that the quotes are interspersed in the full solr.xml path in your 1.4 
> log.
> 
> Thanks,
> Shawn
> 



how to create a custom type in Solr

2010-08-06 Thread Thomas Joiner
I need to have a field that supports ranges...for instance, you specify a
range of 8000 to 9000 and if you search for 8500, it will hit.  However,
when googling, I really couldn't find any resources on how to create your
own field type in Solr.

But from what I was able to find, the AbstractSubTypeFieldType class seems
like a good starting point for the type that I want to make, however that
isn't in the current version of Solr that I am using (1.4.1).  So I guess my
question is: is Solr 3.0 ready for production?  If so, how do I get it? Do I
just need to checkout the code from svn and build it myself?  If so should I
just check out the latest, or is there a particular branch that I should go
with that is reliable?  If I switch to 3.0, will I need to reindex my data,
or has the data format not changed?

And if 3.0 isn't ready for production, what would you suggest I do?  Is the
AbstractSubTypeFieldType such that I can backport it and use it with 1.4.1,
or does it use specific features of 3.0 that I would have to backport as
well, in which case it would become a horribly convoluted mess where I would
be better off just going with 3.0.  And I guess this comes back to help on
finding resources about implementing custom types...it would just be more
complicated if I couldn't use the AbstractSubTypeFieldType.

(This is my first time posting to a mailing list, so if I have violated
horribly some etiquette of mailing lists, please tell me).

Regards,
Thomas


Re: how to create a custom type in Solr

2010-08-06 Thread Thomas Joiner
This will work for a single range.  However, I may need to support multiple
ranges, is there a way to do that?

On Fri, Aug 6, 2010 at 10:49 AM, Jan Høydahl / Cominvent <
jan@cominvent.com> wrote:

> Your use case can be solved by splitting the range into two int's:
>
> Document: {title: My document, from: 8000, to: 9000}
> Query: q=title:"My" AND (from:[* TO 8500] AND to:[8500 TO *])
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 6. aug. 2010, at 17.02, Thomas Joiner wrote:
>
> > I need to have a field that supports ranges...for instance, you specify a
> > range of 8000 to 9000 and if you search for 8500, it will hit.  However,
> > when googling, I really couldn't find any resources on how to create your
> > own field type in Solr.
> >
> > But from what I was able to find, the AbstractSubTypeFieldType class
> seems
> > like a good starting point for the type that I want to make, however that
> > isn't in the current version of Solr that I am using (1.4.1).  So I guess
> my
> > question is: is Solr 3.0 ready for production?  If so, how do I get it?
> Do I
> > just need to checkout the code from svn and build it myself?  If so
> should I
> > just check out the latest, or is there a particular branch that I should
> go
> > with that is reliable?  If I switch to 3.0, will I need to reindex my
> data,
> > or has the data format not changed?
> >
> > And if 3.0 isn't ready for production, what would you suggest I do?  Is
> the
> > AbstractSubTypeFieldType such that I can backport it and use it with
> 1.4.1,
> > or does it use specific features of 3.0 that I would have to backport as
> > well, in which case it would become a horribly convoluted mess where I
> would
> > be better off just going with 3.0.  And I guess this comes back to help
> on
> > finding resources about implementing custom types...it would just be more
> > complicated if I couldn't use the AbstractSubTypeFieldType.
> >
> > (This is my first time posting to a mailing list, so if I have violated
> > horribly some etiquette of mailing lists, please tell me).
> >
> > Regards,
> > Thomas
>
>


Re: how to create a custom type in Solr

2010-08-09 Thread Thomas Joiner
I'd love to see your code on this, however what I've really been wondering
is the following: When did AbstractSubTypeFieldType get added?  It isn't in
1.4.1 (as far as I can tell that's the latest one that is bundled on their
site).  So, do I just need to grab it from subversion, and build it?  And if
so, is there a particular revision that I should go with?  Or should I just
pull trunk and use that, and last of all, is trunk stable enough to be used
in production?

Regards,
Thomas

On Mon, Aug 9, 2010 at 8:38 AM, Mark Allan  wrote:

> On 9 Aug 2010, at 1:01 pm, Otis Gospodnetic wrote:
>
>  Mark,
>>
>> A good way to get your changes/improvements into Solr is by putting them
>> in
>> JIRA.  Please see http://wiki.apache.org/solr/HowToContribute
>>
>> Thanks!
>> Otis
>>
>
>
> Hi Otis,
>
> For the class which requires only minor modifications, I tested it to
> ensure it doesn't break existing compatibility/functionality, and then I
> created an issue in JIRA and uploaded a patch:
>https://issues.apache.org/jira/browse/SOLR-1986
>
> I then posted a message about it to the list and got the following
> responses.
>
> On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
>
>> On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll 
>> wrote:
>>
>>> Originally, I had intended that it was just for one Field Sub Type,
>>> thinking that if we ever wanted multiple sub types, that a new, separate
>>> class would be needed
>>>
>>
>> Right - this was my original thinking too.  AbstractSubTypeFieldType is
>> only a convenience class to create compound types... people can do it other
>> ways.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
> When I replied to ask if that meant the changes wouldn't be included, I got
> no response. As there's been no activity in JIRA, I didn't bother putting
> any of my other changes into JIRA as they all relied on that one.
> Mark
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>


Re: how to create a custom type in Solr

2010-08-16 Thread Thomas Joiner
Sorry to bother you, but since I haven't had a reply in a week, I figured
I'd try asking again...

What build of Solr are you using personally?  Are you just using a nightly
build, or is there a specific build that you are using?  Has it had any
major screw-ups for you?

And I still would love to see your code.

Regards,
Thomas

On Mon, Aug 9, 2010 at 8:50 AM, Thomas Joiner wrote:

> I'd love to see your code on this, however what I've really been wondering
> is the following: When did AbstractSubTypeFieldType get added?  It isn't in
> 1.4.1 (as far as I can tell that's the latest one that is bundled on their
> site).  So, do I just need to grab it from subversion, and build it?  And if
> so, is there a particular revision that I should go with?  Or should I just
> pull trunk and use that, and last of all, is trunk stable enough to be used
> in production?
>
> Regards,
> Thomas
>
>
> On Mon, Aug 9, 2010 at 8:38 AM, Mark Allan  wrote:
>
>> On 9 Aug 2010, at 1:01 pm, Otis Gospodnetic wrote:
>>
>>  Mark,
>>>
>>> A good way to get your changes/improvements into Solr is by putting them
>>> in
>>> JIRA.  Please see http://wiki.apache.org/solr/HowToContribute
>>>
>>> Thanks!
>>> Otis
>>>
>>
>>
>> Hi Otis,
>>
>> For the class which requires only minor modifications, I tested it to
>> ensure it doesn't break existing compatibility/functionality, and then I
>> created an issue in JIRA and uploaded a patch:
>>https://issues.apache.org/jira/browse/SOLR-1986
>>
>> I then posted a message about it to the list and got the following
>> responses.
>>
>> On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
>>
>>> On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll 
>>> wrote:
>>>
>>>> Originally, I had intended that it was just for one Field Sub Type,
>>>> thinking that if we ever wanted multiple sub types, that a new, separate
>>>> class would be needed
>>>>
>>>
>>> Right - this was my original thinking too.  AbstractSubTypeFieldType is
>>> only a convenience class to create compound types... people can do it other
>>> ways.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>
>>
>> When I replied to ask if that meant the changes wouldn't be included, I
>> got no response. As there's been no activity in JIRA, I didn't bother
>> putting any of my other changes into JIRA as they all relied on that one.
>> Mark
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>


Re: how to create a custom type in Solr

2010-08-16 Thread Thomas Joiner
Thanks you very much.

I know the feeling, I've definitely had times when I just got busy and
didn't reply, but I've had plenty to do that didn't require that to be done
first, so no worries.

Thanks,
Thomas

On Mon, Aug 16, 2010 at 9:14 AM, Mark Allan  wrote:

> Hi Thomas,
>
> Sorry for not replying before now - I've had your email flagged in my mail
> client to remind me to reply, but I've been so busy recently I never got
> round to it.
>
> I'll package up the necessary java files and send you the attachment
> directly instead of posting a zip file to the mailing list, which in most
> places would be against list etiquette.
>
> Mark
>
>
> On 16 Aug 2010, at 3:01 pm, Thomas Joiner wrote:
>
>  Sorry to bother you, but since I haven't had a reply in a week, I figured
>> I'd try asking again...
>>
>> What build of Solr are you using personally?  Are you just using a nightly
>> build, or is there a specific build that you are using?  Has it had any
>> major screw-ups for you?
>>
>> And I still would love to see your code.
>>
>> Regards,
>> Thomas
>>
>> On Mon, Aug 9, 2010 at 8:50 AM, Thomas Joiner > >wrote:
>>
>>  I'd love to see your code on this, however what I've really been
>>> wondering
>>> is the following: When did AbstractSubTypeFieldType get added?  It isn't
>>> in
>>> 1.4.1 (as far as I can tell that's the latest one that is bundled on
>>> their
>>> site).  So, do I just need to grab it from subversion, and build it?  And
>>> if
>>> so, is there a particular revision that I should go with?  Or should I
>>> just
>>> pull trunk and use that, and last of all, is trunk stable enough to be
>>> used
>>> in production?
>>>
>>> Regards,
>>> Thomas
>>>
>>>
>>> On Mon, Aug 9, 2010 at 8:38 AM, Mark Allan  wrote:
>>>
>>>  On 9 Aug 2010, at 1:01 pm, Otis Gospodnetic wrote:
>>>>
>>>> Mark,
>>>>
>>>>>
>>>>> A good way to get your changes/improvements into Solr is by putting
>>>>> them
>>>>> in
>>>>> JIRA.  Please see http://wiki.apache.org/solr/HowToContribute
>>>>>
>>>>> Thanks!
>>>>> Otis
>>>>>
>>>>>
>>>>
>>>> Hi Otis,
>>>>
>>>> For the class which requires only minor modifications, I tested it to
>>>> ensure it doesn't break existing compatibility/functionality, and then I
>>>> created an issue in JIRA and uploaded a patch:
>>>>  https://issues.apache.org/jira/browse/SOLR-1986
>>>>
>>>> I then posted a message about it to the list and got the following
>>>> responses.
>>>>
>>>> On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
>>>>
>>>>  On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll 
>>>>> wrote:
>>>>>
>>>>>  Originally, I had intended that it was just for one Field Sub Type,
>>>>>> thinking that if we ever wanted multiple sub types, that a new,
>>>>>> separate
>>>>>> class would be needed
>>>>>>
>>>>>>
>>>>> Right - this was my original thinking too.  AbstractSubTypeFieldType is
>>>>> only a convenience class to create compound types... people can do it
>>>>> other
>>>>> ways.
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>
>>>> When I replied to ask if that meant the changes wouldn't be included, I
>>>> got no response. As there's been no activity in JIRA, I didn't bother
>>>> putting any of my other changes into JIRA as they all relied on that
>>>> one.
>>>> Mark
>>>>
>>>>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>


proper query handling for multiValued queries that are also polyFields?

2010-08-20 Thread Thomas Joiner
I am wondering...is there currently any way for queries to properly handle
multiValued polyFields?

For instance, if you have a



And if you added two values to that field such as "1,2" and "3,4", that
would match both "1,4", and "3,2" as well as "1,2" and "3,4".

So I'm wondering if that is something that someone has figured out a
solution, or something that I should open a JIRA issue for?


Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Thomas Joiner
Is there any reason you aren't using http://wiki.apache.org/solr/Solrj to
interact with Solr?
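
For illustration, indexing through SolrJ looks roughly like this (Solr 1.4-era
API; the URL and field names are made up):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class SolrjExample {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-1");            // field names are illustrative
          doc.addField("title", "example title");
          server.add(doc);
          server.commit();                         // make the document searchable
      }
  }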

On Tue, Aug 24, 2010 at 11:12 AM, Liz Sommers  wrote:

> I am very new to the solr/lucene world.  I am using solr 1.4.0 and cannot
> move to 1.4.1.
>
> I have to index about 50 fields for each document, these fields are already
> in key/value pairs by the time I get to my index methods.  I was able to
> index them with lucene without any problem, but found that I could not then
> read the indexes with solr/admin.  So, I decided to use Solr for my
> indexing.
>
> The error I am currently getting is
> java.lang.RuntimeException: Can't find resource 'synonyms.txt' in classpath
> or 'solr/conf'/'
>
> This exception is being thrown by SolrResourceLoader.openResource(line
> 260).
> which is called by IndexSchema (line 102)
>
> My code that leads up to this follows:
>
> 
> String path = "c:/swdev/apache-solr-1.4.0/IDW"
> SolrConfig cfg new SolrConfig(path + "/solr/conf/solrconfig.xml");
> schema = new IndexSchema(cfg,path + "/solr/conf/schema.xml",null);
>
> 
>
> This also fails if I use
> schema = new IndexSchema(cfg,"schema.xml",null);
>
>
> Any help would be greatly appreciated.
>
> Thank you
>
> Liz Sommers
> lizswo...@gmail.com
>


Re: how to deal with virtual collection in solr?

2010-08-26 Thread Thomas Joiner
I don't know about the shards, etc.

However I recently encountered that exception while indexing pdfs as well.
 The way that I resolved it was to upgrade to a nightly build of Solr. (You
can find them at https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/).

The problem is that the version of Tika that 1.4.1 uses is a very old 
version of Tika, which uses an old version of PDFBox to do its parsing.  (You 
might be able to fix the problem just by replacing the Tika jars...however I
don't know if there have been any API changes so I can't really suggest
that.)

We didn't upgrade to trunk in order for that functionality, but it was nice
that it started working. (The PDFs we'll be indexing won't be of later
versions, but a test file was).

On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] <
xiao...@mail.nlm.nih.gov> wrote:

> Thanks so much for your help, Jan Høydahl!
>
> I made multiple cores (aa public, aa private, bb public and bb private). I
> knew how to query them individually. Please tell me if I can do a
> combinations through shards parameter now. If yes, I tried to append
> &shards=aapub,bbpub after query string. Unfortunately it didn't work.
>
> Actually all of content is the same. I don't have "collection" field in xml
> files. Please tell me how I can set a "collection" field in schema and
> simply search collection through filter.
>
> I used curl to index pdf files. I use Solr 1.4.1. I got the following error
> when I index pdf with version 1.5 and 1.6.
>
> HTTP ERROR: 500
>
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.pdf.pdfpar...@134ae32
>
> org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.pdf.pdfpar...@134ae32
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>     at org.mortbay.jetty.Server.handle(Server.java:285)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>     at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: org.apache.tika.exception.TikaException: Unexpected
> RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>     ... 22 more
> Caused by: java.lang.NullPointerException
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>     at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>     at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
>     at org.apac

Re: Problems indexing spatial field - undefined subField

2010-09-01 Thread Thomas Joiner
While you have already solved your problem, my guess as to why it didn't
work originally is that you probably didn't have a dynamicField matching the
_latLon suffix defined in your schema.

What subFieldType does is register a dynamicField for you;
subFieldSuffix requires that you have already defined that dynamicField yourself.
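
For reference, a minimal schema.xml sketch of the two alternatives (the type and
field names here are placeholders, and it assumes a numeric fieldType such as
tdouble is already defined in the schema):

<!-- Option 1: subFieldSuffix, which needs the matching dynamicField declared yourself -->
<fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/>
<dynamicField name="*_latLon" type="tdouble" indexed="true" stored="false"/>

<!-- Option 2: subFieldType, which registers the dynamicField for you -->
<fieldType name="latLon" class="solr.LatLonType" subFieldType="tdouble"/>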

On Tue, Aug 31, 2010 at 8:07 PM, Simon Wistow  wrote:

> On Wed, Sep 01, 2010 at 01:05:47AM +0100, me said:
> > I'm trying to index a latLon field.
> >
> >  subFieldSuffix="_latLon"/>
> > 
>
> Turns out changing it to
>
> 
>
> fixed it.
>
>
>


Handling Aggregate Records/Roll-up in Solr

2010-09-15 Thread Thomas Martin
Can someone point me to the mechanism in Solr that might allow me to
roll up or aggregate records for display? We have many items that are
similar and we only want to show a representative record to the user until
they select that record.

 

As an example: we carry a polo shirt and have 15 records that represent
the individual colors for that shirt. Does the query API provide any way
to roll up the records based on a property, or do we need to just flatten
the representation of the shirt in the data model?
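
If the flattening route is taken, a minimal SolrJ sketch of what one representative
document could look like (the URL and field names such as "color" are made-up
placeholders; "color" would need to be declared multiValued in schema.xml):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FlattenedShirtSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One representative record per shirt; the per-color records are
        // folded into a single multi-valued "color" field.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "polo-shirt-1");
        doc.addField("name", "Polo Shirt");
        for (String color : new String[] {"red", "blue", "green"}) {
            doc.addField("color", color);
        }
        server.add(doc);
        server.commit();
    }
}

A search can then return just the representative record (optionally narrowed with a
filter query such as fq=color:red), and the individual colors are only shown once
the user selects the shirt.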

 

 

 



Re: Null Pointer Exception while indexing

2010-09-16 Thread Thomas Joiner
My guess would be that Jetty has some configuration somewhere that is
telling it to use GCJ.  Is it possible to completely remove GCJ from the
system?  Another possibility would be to uninstall Jetty, and then reinstall
it, and hope that on the reinstall it would pick up on the OpenJDK.

Which Linux distro are you using? How to set the JVM probably depends on
that.

On Thu, Sep 16, 2010 at 10:22 AM, andrewdps  wrote:

>
> Lance,
>
> We are on Solr Specification Version: 1.4.1
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Null-Pointer-Exception-while-indexing-tp1481154p1488320.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SOLR interface with PHP using javabin?

2010-09-16 Thread Thomas Joiner
If you wish to interface with Solr from PHP, and decide to go with Yonik's
suggestion to use JSON, I would suggest using
http://code.google.com/p/solr-php-client/

It has served my needs for the most part.
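
As Yonik suggests below, asking Solr for JSON is just a matter of adding wt=json to
the request; a rough sketch of what the client sees (the URL, query, and document
values are placeholders, not real output):

http://localhost:8983/solr/select?q=shirt&wt=json

{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {"numFound": 2, "start": 0,
               "docs": [ {"id": "doc-1"}, {"id": "doc-2"} ]}
}

On the PHP side, json_decode() turns that response straight into arrays or objects.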

On Thu, Sep 16, 2010 at 1:33 PM, Yonik Seeley wrote:

> On Thu, Sep 16, 2010 at 2:30 PM, onlinespend...@gmail.com
>  wrote:
> >  I am planning on creating a website that has some SOLR search
> capabilities
> > for the users, and was also planning on using PHP for the server-side
> > scripting.
> >
> > My goal is to find the most efficient way to submit search queries from
> the
> > website, interface with SOLR, and display the results back on the
> website.
> >  If I use PHP, it seems that all the solutions use some form of character
> > based stream for the interface.  It would seem that using a binary
> > representation, such as javabin, would be more efficient.
> >
> > If using javabin, or some similar efficient binary stream to interface
> SOLR
> > with PHP is not possible, what do people recommend as the most efficient
> > solution that provides the best performance, even if that means not using
> > PHP and going with some other alternative?
>
> I'd recommend going with JSON - it will be quite a bit smaller than
> XML, and the parsers are generally quite efficient.
>
> -Yonik
> http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8
>


Re: Search the mailinglist?

2010-09-17 Thread Thomas Joiner
Also there is http://lucene.472066.n3.nabble.com/Solr-User-f472068.html if
you prefer a forum format.

On Fri, Sep 17, 2010 at 9:15 AM, Markus Jelsma wrote:

> http://www.lucidimagination.com/search/?q=
>
>
> On Friday 17 September 2010 16:10:23 alexander sulz wrote:
> >   I'm sorry to bother you all with this, but is there a way to search
> > through the mailing list archive? I've found
> > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far,
> > but there isn't any convenient way to search through the archive.
> >
> > Thanks for your help
> >
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
>

