Optimize an Index

2011-01-07 Thread Jörg Agatz
Hello, I have an index with 800,000 documents, and now I hope it will be
faster if I optimize the index; it sounds good ;-)

But I can't find an example of how to optimize one of the multicores, or all cores.


Maybe one of you has a little example for that.

King


Re: Input raw log file

2011-01-07 Thread Grijesh.singh

There is a CSV update handler in Solr; you can use it by reformatting your
log file as CSV.
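
For example, here is a minimal SolrJ sketch of streaming a log file (already converted to CSV) to that handler. The file name, column-to-field mapping, and Solr URL below are assumptions for illustration, not part of the original post:

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvLogImport {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Stream the converted log file to the CSV handler (registered at /update/csv
    // in the example solrconfig.xml)
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
    req.addFile(new File("access_log.csv"));                   // log file rewritten as CSV
    req.setParam("fieldnames", "id,timestamp,level,message");  // hypothetical column-to-field mapping
    req.setParam("commit", "true");
    solr.request(req);
  }
}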

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2210673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: forward slash not working in my solr search

2011-01-07 Thread dhanesh

On 1/6/2011 2:45 PM, Grijesh.singh wrote:

Use it as a phrase, like "Computer / IT", and it will work for you.

Here IT is a stopword. When you run the query as

category:Computer / IT

it is parsed into two clauses, category:Computer and IT (against the default
search field), but IT is a stopword for the default search field you have
selected, so the second clause is removed and only category:Computer remains.

So use your query as
category:"Computer / IT"

-
Grijesh

Hi Grijesh.singh
It worked... :) You helped me to fix this issue.
Thanks a lot

cheers
dhanesh s.r



Improving Solr performance

2011-01-07 Thread supersoft

I have deployed a 5-shard infrastructure where: shard1 has 3,124,422 docs,
shard2 has 920,414 docs, shard3 has 602,772 docs, shard4 has 2,083,492 docs,
and shard5 has 11,915,639 docs. Total index size: 100GB.

The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420, and I
run the server using Jetty (from the Solr example download) with: java -Xmx3024M
-Dsolr.solr.home=multicore -jar start.jar

The response time for a single query is around 2-3 seconds. Nevertheless, if I
execute several queries at the same time the performance drops immediately:
1 simultaneous query: 2516 ms
2 simultaneous queries: 4250, 4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms
...

Using JConsole to monitor the server Java process I checked that heap
memory and CPU usage don't reach their upper limits, so the server
shouldn't be overloaded. Can anyone suggest how I should tune the instance
so that it is not so heavily dependent on the number of simultaneous queries?

Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2210843.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Improving Solr performance

2011-01-07 Thread Grijesh.singh

Some questions:
1 - Are all shards on the same machine?
2 - What is your RAM size?
3 - What is the index size on each shard, in GB?


-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2210878.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Improving Solr performance

2011-01-07 Thread supersoft

1 - Yes, all the shards are on the same machine
2 - The machine RAM is 7.8GB and I assign 3.4GB to the Solr server
3 - The shard sizes (GB) are 17, 5, 3, 11, 64
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211135.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Improving Solr performance

2011-01-07 Thread Grijesh.singh

Shards are used when the index size becomes huge and performance goes down;
shards mean distributed indexes. But if you put all the shards on the same
machine as multiple cores, it will not help performance much.

Also, sharding works best when the indexes are distributed nearly equally in size.
There is also not enough RAM here to perform well: if your whole index can be
loaded into cache, it will give you better performance.

Your indexes are not equally distributed, so the shards have different
response times. When working with shards, please keep in mind that the main
searcher sends the query to all shards, waits for the response from every shard,
merges all the responses into a single result, and returns it.

So if any shard takes more time to respond, your total response time will be
affected.

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211226.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Improving Solr performance

2011-01-07 Thread Hong-Thai Nguyen
Hi,

Always an interesting question! Could anyone propose a generic (and approximate)
equation:

Search_time = F(Nb_of_servers, RAM_size_per_server, CPU_of_servers, 
Nb_of_shards, Nb_of_documents, Total_size_of_documents or 
Average_size_of_a_document, Nb_requests_in_minute, Nb_indexed_fields_in_index, 
...) ?

Regards,

---
Hong-Thai
-Original Message-
From: Grijesh.singh [mailto:pintu.grij...@gmail.com] 
Sent: Friday, 7 January 2011 12:29
To: solr-user@lucene.apache.org
Subject: Re: Improving Solr performance


Shards are used when the index size becomes huge and performance goes down;
shards mean distributed indexes. But if you put all the shards on the same
machine as multiple cores, it will not help performance much.

Also, sharding works best when the indexes are distributed nearly equally in size.
There is also not enough RAM here to perform well: if your whole index can be
loaded into cache, it will give you better performance.

Your indexes are not equally distributed, so the shards have different
response times. When working with shards, please keep in mind that the main
searcher sends the query to all shards, waits for the response from every shard,
merges all the responses into a single result, and returns it.

So if any shard takes more time to respond, your total response time will be
affected.

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211228.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Improving Solr performance

2011-01-07 Thread Grijesh.singh

Please open a new mail conversation for that.

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211300.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Improving Solr performance

2011-01-07 Thread supersoft

The reason for this distribution is the kind of documents. In spite of
having the same schema structure (and Solr config), each document belongs to
one of 5 different kinds.

Each kind corresponds to a specific shard, and because of this the client
tool we implemented avoids searching all the shards when the user selects just
one or a few kinds. The tool runs a multi-shard query against only the relevant
shards. I guess this is the right approach, but correct me if I am wrong.
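
For illustration, a minimal SolrJ sketch of that kind of targeted multi-shard query (host, port, and core names below are hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SelectedShardsQuery {
  public static void main(String[] args) throws Exception {
    // The request can be sent to any core; the shards parameter decides which cores are searched.
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr/shard1");

    SolrQuery q = new SolrQuery("some query");
    q.set("shards", "localhost:8983/solr/shard1,localhost:8983/solr/shard4");
    System.out.println(solr.query(q).getResults().getNumFound() + " hits");
  }
}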

The real problem with this architecture is the correlation between concurrent
users and response time:
1 query: n seconds
2 queries: 2*n seconds each query
3 queries: 3*n seconds each query
and so on...

This is a real headache because a single query has an acceptable
response time, but when many users access the server the performance
drops sharply.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211305.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH Transformer

2011-01-07 Thread Bernd Fehling
Hi list,

Currently a Transformer returns the row, but can I skip
or drop a row from within the Transformer?

If so, what should I return in that case, an empty row?

Regards,
Bernd
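
A minimal sketch of one common way to do this with a custom Java transformer: instead of returning an empty row, set the DIH special flags $skipRow (drop just this row) or $skipDoc (drop the whole document); returning null is also often reported to discard the row, but verify that against your DIH version. The column name checked below is hypothetical:

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class DropRowTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object status = row.get("status");          // hypothetical column
    if (status != null && "deleted".equals(status.toString())) {
      row.put("$skipRow", Boolean.TRUE);        // skip this row; use "$skipDoc" to skip the whole document
    }
    return row;
  }
}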



Re: Improving Solr performance

2011-01-07 Thread François Schiettecatte
It sounds like your system is I/O bound, and I suspect (would even bet) that all your 
index files are on the same disk drive. Also, you have only 8GB of RAM for 100GB 
of index, so while your Solr instance will cache some things and the balance 
will be used for caching file blocks, there really isn't enough memory there 
for effective caching.

I would suggest you check your machine's performance with something like atop ( 
http://www.atoptool.nl/ ) to see where your bottlenecks are (check the disk 
I/O). As I said I think you are I/O bound, and if all your shards are on the 
same drive there will be I/O contention when running simultaneous searches.

Your solutions are (in rough ascending order of cost):

- make your indices smaller (reduce disk I/O)

- buy more drives and spread your indices across the drives (reduce contention).

- buy more RAM (increase caching).

- buy more machines (more throughput).

Good luck!

François


On Jan 7, 2011, at 4:57 AM, supersoft wrote:

> 
> I have deployed a 5-shard infrastructure where: shard1 has 3,124,422 docs,
> shard2 has 920,414 docs, shard3 has 602,772 docs, shard4 has 2,083,492 docs,
> and shard5 has 11,915,639 docs. Total index size: 100GB.
> 
> The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420, and I
> run the server using Jetty (from the Solr example download) with: java -Xmx3024M
> -Dsolr.solr.home=multicore -jar start.jar
> 
> The response time for a single query is around 2-3 seconds. Nevertheless, if I
> execute several queries at the same time the performance drops immediately:
> 1 simultaneous query: 2516 ms
> 2 simultaneous queries: 4250, 4469 ms
> 3 simultaneous queries: 5781, 6219, 6219 ms
> 4 simultaneous queries: 6484, 7203, 7719, 7781 ms
> ...
> 
> Using JConsole to monitor the server Java process I checked that heap
> memory and CPU usage don't reach their upper limits, so the server
> shouldn't be overloaded. Can anyone suggest how I should tune the instance
> so that it is not so heavily dependent on the number of simultaneous queries?
> 
> Thanks in advance
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2210843.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Optimize an Index

2011-01-07 Thread Erick Erickson
Hmmm, certainly try optimizing, but often the problem
is in how you query. Providing some examples of
slow queries and the time they take would help.
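
For the original multicore question, a small SolrJ sketch: an optimize is issued per core, so loop over the cores you want optimized (the host and core names here are hypothetical):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeCores {
  public static void main(String[] args) throws Exception {
    String[] cores = {"core0", "core1"};   // hypothetical core names
    for (String core : cores) {
      SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr/" + core);
      solr.optimize();   // equivalent to posting <optimize/> to /solr/<core>/update
    }
  }
}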

Also, running with &debugQuery=on will show you a QTime
field in the response header, which is the number of milliseconds
the actual query took, as opposed to assembling the output.

Also, the first few queries may spend time filling caches,
especially if you're sorting.

Best
Erick

On Fri, Jan 7, 2011 at 4:02 AM, Jörg Agatz wrote:

> Hello, I have an index with 800,000 documents, and now I hope it will be
> faster if I optimize the index; it sounds good ;-)
>
> But I can't find an example of how to optimize one of the multicores, or all cores.
>
>
> Maybe one of you has a little example for that.
>
> King
>


Lucene Scorer Extension?

2011-01-07 Thread dante stroe
Hello,

 What I am trying to do is build a personalized search engine. The aim
is to have the resulting documents' scores depend on users' preferences.
I've already built some Solr plugins (request handlers mainly), however I am
not sure that what I am trying to do can be achieved by a plugin.
In short, for each query I would like to multiply the relevance score of each
document (at scoring time, of course) by the result of a function of some of the
document's field values and the user's preferences (these user preferences will
most likely be loaded into memory when the plugin initializes). Of course, I need
a new request handler to take the userID as a query parameter, but I am not sure
how to access each document at scoring time in order to update the score based on
the user's preferences. Any ideas? (I have looked over
this
and after
looking at the code as well, it doesn't look so trivial ... has anybody else
tried something similar?)

Cheers,
Dante
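
One possible starting point, sketched under the assumption that Lucene's CustomScoreQuery (org.apache.lucene.search.function) is available in your version: wrap the main query and multiply in a per-document factor. The Preferences interface and field name below are hypothetical placeholders for the user-preference lookup described above:

import org.apache.lucene.search.Query;
import org.apache.lucene.search.function.CustomScoreQuery;
import org.apache.lucene.search.function.FieldScoreQuery;

public class PreferenceScoreQuery extends CustomScoreQuery {

  /** Hypothetical per-user preference lookup, loaded when the plugin initializes. */
  public interface Preferences {
    float boostFor(float fieldValue);
  }

  private final Preferences prefs;

  public PreferenceScoreQuery(Query mainQuery, String prefField, Preferences prefs) {
    // The FieldScoreQuery feeds the per-document value of prefField in as valSrcScore.
    super(mainQuery, new FieldScoreQuery(prefField, FieldScoreQuery.Type.FLOAT));
    this.prefs = prefs;
  }

  @Override
  public float customScore(int doc, float subQueryScore, float valSrcScore) {
    // Multiply the normal relevance score by a user-specific factor.
    return subQueryScore * prefs.boostFor(valSrcScore);
  }
}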


Re: Improving Solr performance

2011-01-07 Thread Toke Eskildsen
On Fri, 2011-01-07 at 10:57 +0100, supersoft wrote:

[5 shards, 100GB, ~20M documents]

...

[Low performance for concurrent searches]

> Using JConsole to monitor the server Java process I checked that heap
> memory and CPU usage don't reach their upper limits, so the server
> shouldn't be overloaded.

If memory and CPU are okay, the culprit is I/O.

Solid state drives have more than proven their worth for random access
I/O, which is used a lot when searching with Solr/Lucene. SSDs are
plug-in replacements for hard drives and they virtually eliminate I/O
performance bottlenecks when searching. This also means shortened warm-up
requirements and less need for disk caching. Expanding RAM capacity
does not scale well and requires extensive warm-up. Adding more machines
is expensive and often requires architectural changes. With the current
prices for SSDs, I consider them the generic first suggestion for
improving search performance.

Extra spinning disks improve query throughput in general and speed
up single queries when the shards are searched in parallel. They do not
help much for a single sequential search of the shards, as the seek time
for a single I/O request is the same regardless of the number of drives.
If your current response time for a single user is satisfactory, adding
drives is a viable solution for you. I'll still recommend the SSD option
though, as it will also lower the response time for a single query.

Regards,
Toke Eskildsen



Re: Improving Solr performance

2011-01-07 Thread mike anderson
Making sure the index can fit in memory (you don't have to allocate that
much to Solr, just make sure it's available to the OS so it can cache it --
otherwise you are paging the hard drive, which is why you are probably I/O
bound) has been the key to our performance. We recently opted to use less
RAM and store the indices on SSDs; we're still evaluating this approach but
so far it seems to be comparable, so I agree with Toke! (We have 18 shards
and over 100GB of index.)

On Fri, Jan 7, 2011 at 10:07 AM, Toke Eskildsen wrote:

> On Fri, 2011-01-07 at 10:57 +0100, supersoft wrote:
>
> [5 shards, 100GB, ~20M documents]
>
> ...
>
> [Low performance for concurrent searches]
>
> > Using JConsole to monitor the server Java process I checked that heap
> > memory and CPU usage don't reach their upper limits, so the server
> > shouldn't be overloaded.
>
> If memory and CPU is okay, the culprit is I/O.
>
> Solid state drives have more than proven their worth for random access
> I/O, which is used a lot when searching with Solr/Lucene. SSDs are
> plug-in replacements for hard drives and they virtually eliminate I/O
> performance bottlenecks when searching. This also means shortened warm-up
> requirements and less need for disk caching. Expanding RAM capacity
> does not scale well and requires extensive warm-up. Adding more machines
> is expensive and often requires architectural changes. With the current
> prices for SSDs, I consider them the generic first suggestion for
> improving search performance.
>
> Extra spinning disks improve query throughput in general and speed
> up single queries when the shards are searched in parallel. They do not
> help much for a single sequential search of the shards, as the seek time
> for a single I/O request is the same regardless of the number of drives.
> If your current response time for a single user is satisfactory, adding
> drives is a viable solution for you. I'll still recommend the SSD option
> though, as it will also lower the response time for a single query.
>
> Regards,
> Toke Eskildsen
>
>


Re: solrconfig luceneMatchVersion 2.9.3

2011-01-07 Thread Johannes Goll
According to
http://www.mail-archive.com/solr-user@lucene.apache.org/msg40491.html

there is no more trunk support for 2.9 indexes.

So I tried the suggested solution to execute an optimize to convert a 2.9.3
index to a 3.x index.

However, when I tried to optimize a 2.9.3 index using the Solr 4.0 trunk
version with luceneMatchVersion set to LUCENE_30 in the solrconfig.xml,
I am getting

SimplePostTool: POSTing file optimize.xml
SimplePostTool: FATAL: Solr returned an error: Severe errors in solr
configuration.  Check your log files for more detailed information on what
may be wrong.  -
java.lang.RuntimeException:
org.apache.lucene.index.IndexFormatTooOldException: Format version is not
supported in file '_0.fdx': 1 (needs to be between 2 and 2). This version of
Lucene only supports indexes created with release 3.0 and later.

Is there any other mechanism for converting index files to 3.x?



2011/1/6 Johannes Goll 

> Hi,
>
> our index files have been created using Lucene 2.9.3 and solr 1.4.1.
>
> I am trying to use a patched version of the current trunk (solr 1.5.0 ? ).
> The patched version works fine with newly generated index data but
> not with our existing data:
>
> After adjusting the solrconfig.xml - I added the line
>
>   <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
>
> also tried
>
>   <luceneMatchVersion>LUCENE_30</luceneMatchVersion>
>
> I am getting the following exception
>
> "java.lang.RuntimeException: 
> org.apache.lucene.index.IndexFormatTooOldException:
> Format version is not supported in file '_q.fdx': 1 (needs to be between 2 
> and 2)"
>
> When I try to change it to
>
>   <luceneMatchVersion>LUCENE_29</luceneMatchVersion>
>
> or
>
>   <luceneMatchVersion>2.9</luceneMatchVersion>
>
> or
>
>   <luceneMatchVersion>2.9.3</luceneMatchVersion>
>
> I am getting
>
> "SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion
> '2.9', valid values are: [LUCENE_30, LUCENE_31, LUCENE_40, LUCENE_CURRENT]
> or a string in format 'V.V'"
>
> Do you know a way to make this work with Lucene version 2.9.3 ?
>
> Thanks,
> Johannes
>



-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878


RE: Custom match scoring

2011-01-07 Thread Nelson Branco
Hmm, if so, it may resolve the problem. I didn't know that. I'll take a
look.

Thanks.

--
Nelson Branco
SAPO Mapas/GIS


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, 5 January 2011 02:12
To: solr-user@lucene.apache.org
Subject: RE: Custom match scoring


: Yes, I already looked at dismax, which I'm using for other purposes. The big
: deal for this problem is having summed only the best match of each field. In
: dismax it sums all matches on each field.

can you describe what you want in pseudo code?

what you are describing sounds exactly like using the dismax parser with 
tie=0.  that way each "clause" of the input only gets the max score from 
each of the fields in the qf param. 

dismax doesn't sum all matches on each field, it sums the *max* match on 
each field, plus a tie breaker multiplier times the sum of all other 
matches on each field -- if tie=0 it's a true disjunction max query, if 
tie=1 it's a true disjunction sum query.

-Hoss
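
For reference, a minimal SolrJ sketch of the kind of dismax request described above; the field names and URL are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DismaxTieExample {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("token1");
    q.set("defType", "dismax");
    q.set("qf", "fieldA fieldB fieldC");
    q.set("tie", "0");   // 0 = pure disjunction max: each clause scores only its best-matching field
    System.out.println(solr.query(q).getResults().getNumFound() + " hits");
  }
}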




RE: Custom match scoring

2011-01-07 Thread Nelson Branco
Ok, I have looked at it and it almost solves my problem...

My rule list requires counting each token only once, not once per field...

Any idea on how this can be done?

Currently I'm using plain boolean logic to accomplish this, more or less like
"FieldA:token1 OR (FieldB:token1 AND -FieldA:token1) OR (FieldC:token1 AND
-FieldB:token1 AND -FieldA:token1)" 

--
Nelson Branco
SAPO Mapas/GIS


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, 5 January 2011 02:12
To: solr-user@lucene.apache.org
Subject: RE: Custom match scoring


: Yes, I already looked at dismax, which I'm using for other purposes. The big
: deal for this problem is having summed only the best match of each field. In
: dismax it sums all matches on each field.

can you describe what you want in pseudo code?

what you are describing sounds exactly like using the dismax parser with 
tie=0.  that way each "clause" of the input only gets the max score from 
each of the fields in the qf param. 

dismax doesn't sum all matches on each field, it sums the *max* match on 
each field, plus a tie breaker multiplier times the sum of all other 
matches on each field -- if tie=0 it's a true disjunction max query, if 
tie=1 it's a true disjunction sum query.

-Hoss




DIH - Closing ResultSet in JdbcDataSource

2011-01-07 Thread Shane Perry
Hi,

I am in the process of migrating our system from Postgres 8.4 to Solr
1.4.1.  Our system is fairly complex and as a result, I have had to define
19 base entities in the data-config.xml definition file.  Each of these
entities executes 5 queries.  When doing a full-import, as each entity
completes, the server hosting Postgres shows 5 "idle in transaction" for the
entity.

In digging through the code, I found that the JdbcDataSource wraps the
ResultSet object in a custom ResultSetIterator object, leaving the ResultSet
open.  Walking through the code I can't find a close() call anywhere on the
ResultSet.  I believe this results in the "idle in transaction" processes.

Am I off base here?  I'm not sure what the overall implications are of the
"idle in transaction" processes, but is there a way I can get around the
issue without importing each entity manually?  Any feedback would be greatly
appreciated.

Thanks in advance,

Shane


How do I troubleshoot Schema / Document mismatches?

2011-01-07 Thread danieltalsky

When I use the post.jar tool, I don't get any meaningful errors if there's
some kind of mismatch between the schema and the XML documents I'm loading.

All I get is:
FATAL: Solr returned an error: Internal Server Error

There's no information about what fields were missing, additional fields,
wrong data, mismatch between schema and data... anything!  Is there a better
way to post data that allows for more meaningful error messages?  How do I
find where to start?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-troubleshoot-Schema-Document-mismatches-tp2213495p2213495.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: schema.xml in other than conf folder

2011-01-07 Thread Chris Hostetter

:   Thanks for your response. Our production environment is a read only file
: system. It is not allowing to modify or create new files under conf folder
: at runtime. So copy config through replication is not working for us.

if your entire production server is read only, where do you keep your 
indexes? how do you update them?

more specifically: if your conf directory is read only, what directory do 
you *want* to keep your schema.xml in that you can write to?  why not put 
your conf directory there?

In general: schema.xml and solrconfig.xml can be loaded from the classpath 
as well as from the conf dir (at least: they used to be, but with a lot of 
the Solr Cloud stuff i'm not 100% certain that's still true)

-Hoss


Re: How do I troubleshoot Schema / Document mismatches?

2011-01-07 Thread Ahmet Arslan
> When I use the post.jar tool, I don't get any meaningful errors if there's
> some kind of mismatch between the schema and the XML documents I'm loading.
> 
> All I get is:
> FATAL: Solr returned an error: Internal Server Error
> 
> There's no information about what fields were missing, additional fields,
> wrong data, mismatch between schema and data... anything!  Is there a better
> way to post data that allows for more meaningful error messages?  How do I
> find where to start?

The detailed explanation goes to the logs of the servlet container.





Re: solrconfig luceneMatchVersion 2.9.3

2011-01-07 Thread Chris Hostetter

: there is no more trunk support for 2.9 indexes.
: 
: So I tried the suggested solution to execute an optimize to convert a 2.9.3
: index to a 3.x index.
: 
: However, when I tried to the optimize a 2.9.3 index using the Solr 4.0 trunk
: version with luceneMatchVersion set to LUCENE_30 in the solrconfig.xml,
: I am getting

the part about there being no trunk support for 2.9? .. that means there 
is no support at all -- you can't use trunk, with version compat set to 
LUCENE_30, to open a 2.9 index in any way.  the code doesn't exist.

you have to use the code on the 3x branch of lucene to open/optimize the 
2.9 index in order to get it into the LUCENE_30 format.  then you *may* 
be able to open that using the code on the trunk.

Honestly though: i'm not sure even that will actually work for you long 
term -- lots of things are still changing on the trunk, including index 
format, and the exact process for migrating from a 3x index to a 4x index 
is up in the air.

See in particular this thread...

http://lucene.472066.n3.nabble.com/Index-compatibility-1-4-Vs-3-1-Trunk-td1016232.html

: 
: SimplePostTool: POSTing file optimize.xml
: SimplePostTool: FATAL: Solr returned an error: Severe errors in solr
: configuration.  Check your log files for more detailed information on what
: may be wrong.  -
: java.lang.RuntimeException:
: org.apache.lucene.index.IndexFormatTooOldException: Format version is not
: supported in file '_0.fdx': 1 (needs to be between 2 and 2). This version of
: Lucene only supports indexes created with release 3.0 and later.
: 
: Is there any other mechanism for converting index files to 3.x?
: 
: 
: 
: 2011/1/6 Johannes Goll 
: 
: > Hi,
: >
: > our index files have been created using Lucene 2.9.3 and solr 1.4.1.
: >
: > I am trying to use a patched version of the current trunk (solr 1.5.0 ? ).
: > The patched version works fine with newly generated index data but
: > not with our existing data:
: >
: > After adjusting the solrconfig.xml  - I added the line
: >
: >   <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
: >
: > also tried
: >
: >   <luceneMatchVersion>LUCENE_30</luceneMatchVersion>
: >
: > I am getting the following exception
: >
: > "java.lang.RuntimeException: 
org.apache.lucene.index.IndexFormatTooOldException:
: > Format version is not supported in file '_q.fdx': 1 (needs to be between 2 
and 2)"
: >
: > When I try to change it to
: >
: >   <luceneMatchVersion>LUCENE_29</luceneMatchVersion>
: >
: > or
: >
: >   <luceneMatchVersion>2.9</luceneMatchVersion>
: >
: > or
: >
: >   <luceneMatchVersion>2.9.3</luceneMatchVersion>
: >
: > I am getting
: >
: > "SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion
: > '2.9', valid values are: [LUCENE_30, LUCENE_31, LUCENE_40, LUCENE_CURRENT]
: > or a string in format 'V.V'"
: >
: > Do you know a way to make this work with Lucene version 2.9.3 ?
: >
: > Thanks,
: > Johannes
: >
: 
: 
: 
: -- 
: Johannes Goll
: 211 Curry Ford Lane
: Gaithersburg, Maryland 20878
: 

-Hoss


Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-07 Thread Adam Estrada
This is my configuration which seems to work just fine.



  
  

From there it's just a matter of running the select statement and mapping it
against the correct fields in your index.

Adam

On Fri, Jan 7, 2011 at 2:40 PM, Shane Perry  wrote:

> Hi,
>
> I am in the process of migrating our system from Postgres 8.4 to Solr
> 1.4.1.  Our system is fairly complex and as a result, I have had to define
> 19 base entities in the data-config.xml definition file.  Each of these
> entities executes 5 queries.  When doing a full-import, as each entity
> completes, the server hosting Postgres shows 5 "idle in transaction" for
> the
> entity.
>
> In digging through the code, I found that the JdbcDataSource wraps the
> ResultSet object in a custom ResultSetIterator object, leaving the
> ResultSet
> open.  Walking through the code I can't find a close() call anywhere on the
> ResultSet.  I believe this results in the "idle in transaction" processes.
>
> Am I off base here?  I'm not sure what the overall implications are of the
> "idle in transaction" processes, but is there a way I can get around the
> issue without importing each entity manually?  Any feedback would be
> greatly
> appreciated.
>
> Thanks in advance,
>
> Shane
>


Indexing Issue between Mac OS X 10.5 and 10.6

2011-01-07 Thread Kevin Murdoff
Greetings Everyone -

I am hoping someone can help me with this unusual issue I have here.

Issue
Indexing information in a database (i.e.  /dataimport [full-import]) succeeds 
when I perform this function on a Mac OS X 10.6 with Java 1.6, but fails when I 
attempt the same indexing task on a 10.5 / Java 1.5 server.  When the indexing 
succeeds, I end up with 211,095 documents.  When the indexing fails (on the 
10.5 machine), I end up with 58,286 documents.  The error I receive in the 
Tomcat 'catalina.out' log file is:

SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.StackOverflowError
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.StackOverflowError
at com.frontbase.jdbc.FBJRowHandler.close(Unknown Source)
at com.frontbase.jdbc.FBJRowHandler.close(Unknown Source)
...

Background
I want to index the database information as a single document in Solr 1.4.1.  
The document, as defined in the 'data-config.xml' file, has 10 entities, each 
with 5 primitive fields and 2 entity fields.  Most of these 10 entities do not 
represent very large datasets except one, which could represent over 95% of the 
result set.

I have tried tweaking the configuration values in the  section of 
the 'solrconfig.xml' file.  I lowered the  from 10,000 to 100, 
and lowered the  from 10 to 5.  Making these changes, 
independently and together, did not exhibit any change in the indexing failures 
I have been experiencing.

I expanded the JVM min/max memory settings using -Xms and -Xmx set as high as 
1024/2048 respectively.

I also obtained the Solr-1.4.1 release source code, built it on the 10.5 /1.5 
server machine, and performed the same indexing task.  This resulted in the 
same stack overflow error.

Inquiry
Can someone tell me if they have experienced something similar?  If so, did you 
find a solution?  Or, does anyone know what may be causing these stack overflow 
errors?

Please let me know what other information I can provide that would be useful.

Thank you for your help!

- KFM



Solr indexing socket timeout errors

2011-01-07 Thread Burton-West, Tom
Hello all,

We are getting intermittent socket timeout errors (see below).  Out of about 
600,000 indexing requests, 30 returned these socket timeout errors.  We haven't 
been able to correlate these with large merges, which tend to slow down the 
indexing response rate.

Does anyone know where we might look to determine the cause?

Tom

Tom Burton-West

Jan 7, 2011 2:31:07 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: [was class java.net.SocketTimeoutException] 
Read timed out
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1354)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:777)
at 
org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:807)
at 
org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:116)
at 
org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:742)
at org.apache.coyote.Request.doRead(Request.java:419)
at 
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:270)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:403)
  at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:293)
at 
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at 
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at 
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
at 
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at 
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
... 24 more




Internal Server Error when indexing a pdf file

2011-01-07 Thread Alessandro Marino
Hi,
I was trying to use Solr Cell (through the Java API) to index a pdf file.
The class has been extracted from
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class Solr {
  public static void main(String[] args) {
    try {
      String solrId = "beautiful_stm.pdf";
      indexFilesSolrCell(solrId);

    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }

  public static void indexFilesSolrCell(String solrId)
    throws IOException, SolrServerException {

    String urlString = "http://localhost:8080/solr";
    SolrServer solr = new CommonsHttpSolrServer(urlString);

    // Send the file to the ExtractingRequestHandler (Solr Cell)
    ContentStreamUpdateRequest up = new
      ContentStreamUpdateRequest("/update/extract");

    up.addFile(new File("Documents/" + solrId));

    up.setParam("literal.id", solrId);
    up.setParam("uprefix", "attr_");
    up.setParam("fmap.content", "attr_content");

    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    solr.request(up);
  }
}

At runtime I get the exception below:

org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request:
http://localhost:8080/solr/update/extract?literal.id=beautiful_stm.pdf&uprefix=attr_&fmap.content=attr_content&commit=true&waitFlush=true&waitSearcher=true&wt=javabin&version=1


What could be the problem? I've tried with various PDF files of different
sizes, but I always get an internal server error.
I've installed Solr (version 1.4) on Tomcat (version 6.0.20) following the
directions at http://wiki.apache.org/solr/SolrTomcat.

Thanks and regards,
Alex


highlighting not working with Solr 3.0 trunk?

2011-01-07 Thread Teruhiko Kurosaka
I've downloaded 
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x
and ran ant there.  I've followed the tutorial but 
highlighting on the analyzer debug screen isn't working.

This link found in the tutorial doesn't show any highlight.
http://localhost:8983/solr/admin/analysis.jsp?name=name&highlight=on&val=Canon+Power-Shot+SD500&qval=Powershot%20sd-500

The same link works well with Solr 1.4.1. 

Teruhiko "Kuro" Kurosaka