Re: Testing Solr4 - first impressions and problems

2012-10-15 Thread Alan Woodward
Hi Shawn,

The transaction log is only being used to support near-real-time search at the 
moment, I think, so it sounds like it's surplus to requirements for your 
use-case.  I'd just turn it off.
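In the stock 4.0 solrconfig.xml that just means commenting out the updateLog element inside the update handler -- a minimal sketch:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- transaction log disabled by commenting it out -->
  <!--
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>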

Alan Woodward
www.romseysoftware.co.uk

On 15 Oct 2012, at 07:04, Shawn Heisey wrote:

> On 10/14/2012 5:45 PM, Erick Erickson wrote:
>> About your second point. Try committing more often with openSearcher
>> set to false.
>> There's a bit here:
>> http://wiki.apache.org/solr/SolrConfigXml
>> 
>> <autoCommit> 
>>   <maxDocs>1</maxDocs> 
>>   <maxTime>15000</maxTime> 
>>   <openSearcher>false</openSearcher> 
>> </autoCommit> 
>> 
>> That should keep the size of the transaction log down to reasonable levels...
> 
> I have autocommit turned completely off -- both values set to zero.  The DIH 
> import from MySQL, over 12 million rows per shard, is done in one go on all 
> my build cores at once, then I swap cores.  It takes a little over three 
> hours and produces a 22GB index.  I have batchSize set to -1 so that jdbc 
> streams the records.
> 
> When I first set this up back on 1.4.1, I had some kind of severe problem 
> when autocommit was turned on.  I can no longer remember what it caused, but 
> it was a huge showstopper of some kind.
> 
> Thanks,
> Shawn
> 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward
The extra codecs are supplied in a separate jar file now 
(lucene-codecs-4.0.0.jar) - I guess this isn't being packaged into solr.war by 
default?  You should be able to download it here:

http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-codecs/4.0.0/lucene-codecs-4.0.0-javadoc.jar

 and drop it into the lib/ directory.

On 15 Oct 2012, at 00:49, Shawn Heisey wrote:

> On 10/14/2012 3:21 PM, Rafał Kuć wrote:
>> Hello!
>> 
>> Try adding the following to solrconfig.xml:
>> 
>> <codecFactory class="solr.SchemaCodecFactory"/>
> 
> I did this and got a little further, but still no go.  From what it's saying 
> now, I don't think it will be possible in the current state of branch_4x to 
> use anything but the default.
> 
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.PostingsFormat with name 'Block' does not exist. You 
> need to add the corresponding JAR file supporting this SPI to your 
> classpath.The current classpath supports the following names: [Lucene40]
> 
> I saw that LUCENE-4446 was applied to branch_4x a few hours ago. I did 'svn 
> up' and rebuilt Solr.  Trying again, it appears to be using Lucene41, which I 
> believe is the Block format.  But when I tried to change the format for my 
> unique key fields to Bloom, that still didn't work.  Is this something I 
> should file an issue on?
> 
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not exist. You 
> need to add
> the corresponding JAR file supporting this SPI to your classpath.The current 
> classpath supports the following names: [Lucene40, Lucene41]
> 
> Thanks,
> Shawn
> 



Re: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7

2012-10-15 Thread Vadim Kisselmann
Hi Rogerio,
I can imagine what it is. Tomcat extracts the war files into
/var/lib/tomcatXX/webapps.
If you already ran an older Solr version on your server, the old
extracted Solr war could still be there (keyword: Tomcat cache).
Delete the /var/lib/tomcatXX/webapps/solr folder and restart Tomcat,
so that Tomcat picks up your new war file.
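
Roughly, for a Debian-style layout (a sketch -- adjust tomcatXX and the
service name to your install):

sudo service tomcatXX stop
sudo rm -rf /var/lib/tomcatXX/webapps/solr
sudo service tomcatXX start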
Best regards
Vadim



2012/10/14 Rogerio Pereira :
> I'll try to be more specific, Jack.
>
> I just downloaded apache-solr-4.0.0.zip. From this archive I took the
> core1 and core2 folders from the multicore example and renamed them to
> collection1 and collection2; I also made all the necessary changes to solr.xml,
> solrconfig.xml and schema.xml on these two cores to reflect the new
> names.
>
> After this step I just tried to deploy the war file on Tomcat, pointing to
> the directory (solr/home) where these two cores are located; solr.xml
> is there, with collection1 and collection2 properly configured.
>
> The question is, no matter what is contained in solr.xml, this file isn't
> read at Tomcat startup. I tried to cause a parser error in solr.xml by
> removing closing tags, but even with this change I can't get so much as a
> parser error.
>
> I hope to be clear now.
>
>
> 2012/10/14 Jack Krupansky 
>
>> I can't quite parse "the same multicore deployment as we have on apache
>> solr 4.0 distribution archive". Could you rephrase and be more specific?
>> What "archive"?
>>
>> Were you already using 4.0-ALPHA or BETA (or some snapshot of 4.0) or are
>> you moving from pre-4.0 to 4.0? The directory structure did change in 4.0.
>> Look at the example/solr directory.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Rogerio Pereira
>> Sent: Sunday, October 14, 2012 10:01 AM
>> To: solr-user@lucene.apache.org
>> Subject: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7
>>
>>
>> Hi,
>>
>> I tried to perform the same multicore deployment as we have in the apache solr
>> 4.0 distribution archive: I created a directory for solr/home with solr.xml
>> inside and two subdirectories, collection1 and collection2; these two cores
>> are properly configured with a conf folder and solrconfig.xml and schema.xml.
>> On Tomcat I set up the system property pointing to the solr/home path;
>> unfortunately, when I start Tomcat, solr.xml is ignored and only the
>> default collection1 is loaded.
>>
>> As a test, I made changes to solr.xml to cause parser errors, and guess
>> what? These errors aren't reported at Tomcat startup.
>>
>> The same thing doesn't happen with the multicore example that comes in the
>> distribution archive, so now I'm trying to figure out what black magic is
>> happening.
>>
>> Let me do the same kind of deployment on Windows and Mac OS X; if the problem
>> persists, I'll update this thread.
>>
>> Regards,
>>
>> Rogério
>>
>
>
>
> --
> Regards,
>
> Rogério Pereira Araújo
>
> Blogs: http://faces.eti.br, http://ararog.blogspot.com
> Twitter: http://twitter.com/ararog
> Skype: rogerio.araujo
> MSN: ara...@hotmail.com
> Gtalk/FaceTime: rogerio.ara...@gmail.com
>
> (0xx62) 8240 7212
> (0xx62) 3920 2666


Re: add shard to index

2012-10-15 Thread Radim Kolar



Can you share more please?

I do not know exactly what the formula for calculating the ratio is.

If you have something like: (term count in shard 1 + term count in shard 
2) / num documents in all shards

then just use the shard size as a weight while computing it:

(term count in shard 1 * shard1 keyspace size + term count in shard 2 * 
shard2 keyspace size) / (num documents in all shards * all shards 
keyspace size)
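
In other words, a sketch of the weighted form (with t_i the term count and
k_i the keyspace size of shard i, D the total document count, and K the
total keyspace size):

\[
\text{ratio} = \frac{\sum_i t_i \, k_i}{D \cdot K}
\]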


Solr reports: "Can not read response from server" when running import

2012-10-15 Thread Romita Saha
Hi,

I am trying to import a MySQL database: sampledatabase.

When I run the full import command 
http://localhost:8983/solr/db/dataimport?command=full-import in the 
browser, I get the following error in the terminal after about 1 minute. 

Oct 16, 2012 3:49:20 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={command=full-import} 
status=0 QTime=0
Oct 16, 2012 3:49:20 PM org.apache.solr.common.SolrException log
SEVERE: Exception while processing: customer document : 
SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException
: Unable to execute query: select contactLastName from customers 
Processing Document # 1
.
.
.
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: 
Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. 
The driver has not received any packets from the server.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
.
.
.
Caused by: java.io.EOFException: Can not read response from server. 
Expected 
to read 4 bytes, read 0 bytes before connection was unexpectedly lost. 
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3039) 
at com.mysql.jdbc.MysqlIO.readPacket(MysqlIO.java:592) 
... 31 more 
.
.
.

My dataconfig.xml file looks like the following :

<dataConfig> 
    <dataSource driver="com.mysql.jdbc.Driver" 
        url="jdbc:mysql://localhost:8983/home/demo/snp-comm/sampledatabase" /> 
    <document> 
        <entity name="customer" query="select contactLastName from customers"> 
            <field column="contactLastName" name="contactLastName" /> 
        </entity> 
    </document> 
</dataConfig> 

Relevant portion of schema.xml:

<field name="contactLastName" ... />

Relevant portion of solrconfig.xml:

<requestHandler name="/dataimport" 
    class="org.apache.solr.handler.dataimport.DataImportHandler"> 
  <lst name="defaults"> 
    <str name="config">data-config.xml</str> 
  </lst> 
</requestHandler> 


Kindly let me know what the issue could be.

Thanks and regards,
Romita Saha

Panasonic R&D Center Singapore
Blk 1022 Tai Seng Avenue #06-3530
Tai Seng Ind. Est. Singapore 534415
DID: (65) 6550 5383 FAX: (65) 6550 5459
email: romita.s...@sg.panasonic.com

Re: Solr reports: "Can not read response from server" when running import

2012-10-15 Thread Dave Meikle
Hi, 

On 15 Oct 2012, at 11:02, Romita Saha  wrote:

> My dataconfig.xml file looks like the following :
> 
> <dataConfig> 
>     <dataSource driver="com.mysql.jdbc.Driver" 
>         url="jdbc:mysql://localhost:8983/home/demo/snp-comm/sampledatabase" /> 
>     <document> 
>         <entity name="customer" query="select contactLastName from customers"> 
>             <field column="contactLastName" name="contactLastName" /> 
>         </entity> 
>     </document> 
> </dataConfig> 

The error information means that the connection wasn't accepted by the server.  
I would make sure that a) your connection URL is correct as it looks wrong to 
me - i.e. your database name in the URL looks like a path[1] - and b) your 
binding address is correct in your config file (my.cnf) and your associated 
host/DNS entries would let you resolve it. 
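
For comparison, a sketch of a well-formed MySQL JDBC URL -- note that 3306
is MySQL's default port, while 8983 is Solr's own port, so the port in your
URL also looks suspect:

<dataSource driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/sampledatabase"
    user="..." password="..." />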

Cheers,
Dave

[1] 
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html

Re: Solr reports: "Can not read response from server" when running import

2012-10-15 Thread Romita Saha
Hi  Dave,

Thank you for your prompt reply.  The name of the database I am using is 
sampledatabase.sql and it is located in the home/demo/snp-comm folder. Hence I 
have specified the url as 

url="jdbc:mysql://localhost:8983/home/demo/snp-comm/sampledatabase.sql" /> 



Could you please specify which conf file I need to look into? 

Thanks and regards,
Romita 



From:   Dave Meikle 
To: solr-user@lucene.apache.org, 
Date:   10/15/2012 06:24 PM
Subject:Re: Solr reports: "Can not read response from server" when 
running import



Hi, 

On 15 Oct 2012, at 11:02, Romita Saha  
wrote:

> My dataconfig.xml file looks like the following :
> 
> <dataConfig> 
>     <dataSource driver="com.mysql.jdbc.Driver" 
>         url="jdbc:mysql://localhost:8983/home/demo/snp-comm/sampledatabase" /> 
>     <document> 
>         <entity name="customer" query="select contactLastName from customers"> 
>             <field column="contactLastName" name="contactLastName" /> 
>         </entity> 
>     </document> 
> </dataConfig> 

The error information means that the connection wasn't accepted by the 
server.  I would make sure that a) your connection URL is correct as it 
looks wrong to me - i.e. your database name in the URL looks like a 
path[1] - and b) your binding address is correct in your config file 
(my.cnf) and your associated host/DNS entries would let you resolve it. 

Cheers,
Dave

[1] 
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html


Re: Solr reports: "Can not read response from server" when running import

2012-10-15 Thread Dave Meikle
Hi Romita,

On 15 Oct 2012, at 11:46, Romita Saha  wrote:

> Thank you for your prompt reply.  The name of the database I am using is 
> sampledatabase.sql and it is located in the home/demo/snp-comm folder. Hence I 
> have specified the url as  
> 
> url="jdbc:mysql://localhost:8983/home/demo/snp-comm/sampledatabase.sql" /> 

I suspect this is your problem in that the MySQL JDBC driver is expecting to 
connect to a server where this database is hosted as opposed to the file you 
have specified.

I assume from the name that sampledatabase.sql is just a SQL script, so I 
suggest you load it into a MySQL server and then connect to the database on that 
server.
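
Roughly (a sketch, assuming a local MySQL server and that the script creates
its own tables):

mysql -u root -p -e "CREATE DATABASE sampledatabase"
mysql -u root -p sampledatabase < /home/demo/snp-comm/sampledatabase.sql

# then point DIH at jdbc:mysql://localhost:3306/sampledatabase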

Cheers,
Dave




Re: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7

2012-10-15 Thread Rogério Pereira Araújo

Hi Vadim,

In fact Tomcat is running in another, non-standard path; there's no old 
version deployed on Tomcat, I double-checked it.


Let me try in another environment.

-Mensagem Original- 
From: Vadim Kisselmann

Sent: Monday, October 15, 2012 6:01 AM
To: solr-user@lucene.apache.org ; rogerio.ara...@gmail.com
Subject: Re: Multicore setup is ignored when deploying solr.war on Tomcat 
5/6/7


Hi Rogerio,
I can imagine what it is. Tomcat extracts the war files into
/var/lib/tomcatXX/webapps.
If you already ran an older Solr version on your server, the old
extracted Solr war could still be there (keyword: Tomcat cache).
Delete the /var/lib/tomcatXX/webapps/solr folder and restart Tomcat,
so that Tomcat picks up your new war file.
Best regards
Vadim



2012/10/14 Rogerio Pereira :

I'll try to be more specific, Jack.

I just downloaded apache-solr-4.0.0.zip. From this archive I took the
core1 and core2 folders from the multicore example and renamed them to
collection1 and collection2; I also made all the necessary changes to solr.xml,
solrconfig.xml and schema.xml on these two cores to reflect the new
names.

After this step I just tried to deploy the war file on Tomcat, pointing to
the directory (solr/home) where these two cores are located; solr.xml
is there, with collection1 and collection2 properly configured.

The question is, no matter what is contained in solr.xml, this file isn't
read at Tomcat startup. I tried to cause a parser error in solr.xml by
removing closing tags, but even with this change I can't get so much as a
parser error.

I hope to be clear now.


2012/10/14 Jack Krupansky 


I can't quite parse "the same multicore deployment as we have on apache
solr 4.0 distribution archive". Could you rephrase and be more specific?
What "archive"?

Were you already using 4.0-ALPHA or BETA (or some snapshot of 4.0) or are
you moving from pre-4.0 to 4.0? The directory structure did change in 
4.0.

Look at the example/solr directory.

-- Jack Krupansky

-Original Message- From: Rogerio Pereira
Sent: Sunday, October 14, 2012 10:01 AM
To: solr-user@lucene.apache.org
Subject: Multicore setup is ignored when deploying solr.war on Tomcat 
5/6/7



Hi,

I tried to perform the same multicore deployment as we have in the apache
solr 4.0 distribution archive: I created a directory for solr/home with
solr.xml inside and two subdirectories, collection1 and collection2; these
two cores are properly configured with a conf folder and solrconfig.xml and
schema.xml.

On Tomcat I set up the system property pointing to the solr/home path;
unfortunately, when I start Tomcat, solr.xml is ignored and only the
default collection1 is loaded.

As a test, I made changes to solr.xml to cause parser errors, and guess
what? These errors aren't reported at Tomcat startup.

The same thing doesn't happen with the multicore example that comes in the
distribution archive, so now I'm trying to figure out what black magic is
happening.

Let me do the same kind of deployment on Windows and Mac OS X; if the problem
persists, I'll update this thread.

Regards,

Rogério





--
Regards,

Rogério Pereira Araújo

Blogs: http://faces.eti.br, http://ararog.blogspot.com
Twitter: http://twitter.com/ararog
Skype: rogerio.araujo
MSN: ara...@hotmail.com
Gtalk/FaceTime: rogerio.ara...@gmail.com

(0xx62) 8240 7212
(0xx62) 3920 2666 




Selective Sorting in Solr

2012-10-15 Thread Sandip Agarwal
Hi,

I have many documents indexed into Solr. I am now facing a requirement
where the search results should be returned sorted based on their scores.
In the *case of non-exact matches*, if there is a tie, another level of
sorting is to be applied on a field called priority.

I am using solr with django-haystack in django 1.4.

What can/should I do to achieve my requirement?
What I tried:
I have ordered the SearchQuerySet method by('-score', 'priority'), but this
also applies to exact matches having the same score. What should I try to
achieve the above?
Is it even possible to achieve what I am trying?

I have even posted a question on StackOverflow:
http://stackoverflow.com/questions/12890165/selective-sorting-in-solr

Hoping for your guidance.

-- 
Regards,
Sandip Agarwal


Re: Selective Sorting in Solr

2012-10-15 Thread Upayavira
sort=score desc, priority desc

Won't that do it?

Upayavira

On Mon, Oct 15, 2012, at 09:14 AM, Sandip Agarwal wrote:
> Hi,
> 
> I have many documents indexed into Solr. I am now facing a requirement
> where the search results should be returned sorted based on their scores.
> In the *case of non-exact matches*, if there is a tie, another level of
> sorting is to be applied on a field called priority.
> 
> I am using solr with django-haystack in django 1.4.
> 
> What can/should I do to achieve my requirement?
> What I tried:
> I have ordered the SearchQuerySet method by('-score', 'priority'), but
> this
> also applies to exact matches having the same score. What should I try to
> achieve the above?
> Is it even possible to achieve what I am trying?
> 
> I have even posted a question on StackOverflow:
> http://stackoverflow.com/questions/12890165/selective-sorting-in-solr
> 
> Hoping for your guidance.
> 
> -- 
> Regards,
> Sandip Agarwal


Solr - nested entity need to return two fields

2012-10-15 Thread Marvin
Following entity definition:

<entity name="..." query="...">
    ...
    <entity name="activity" query="...">
        <field column="activityNotes" name="activityNotes" />
        <entity name="location" query="...">
            <field column="locationName" name="locationName" />
        </entity>
    </entity>
</entity>

You see, my problem is that the nested entity 'activity' needs to return 2
fields, namely 'activityNotes' and a field from a nested entity 'location',
namely 'locationName'. How can I realize that? Both fields should be
indexed. Is my only option to COALESCE both strings?





exception when starting single instance solr-4.0.0

2012-10-15 Thread Bernd Fehling
Hi,
while starting solr-4.0.0 I get the following exception:

SEVERE: null:java.lang.IllegalAccessError:
class org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat cannot access
its superclass org.apache.lucene.codecs.lucene3x.Lucene3xPostingsFormat


Very strange, because some lines earlier in the logs I have:

Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrConfig initLibs
INFO: Adding specified lib dirs to ClassLoader
Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 'file:/srv/www/solr/solr-4.0.0/lib/lucene-core-4.0-SNAPSHOT.jar' 
to classloader

Why is solr-4.0.0 thinking that the superclass is not there?

Any ideas?

Regards
Bernd


Re: core.SolrCore - java.io.FileNotFoundException

2012-10-15 Thread Erick Erickson
I have no idea how you managed to get so many files in
your index directory, but that's definitely weird. How it
relates to your "file not found", I'm not quite sure, but it
could be something as simple as you've run out of file
handles.

So you could try upping the number of
file handles as a _temporary_ fix just to see if that's
the problem. See your operating system's manual for
how.

If it does work, then I'd run an optimize
down to one segment and remove all the segment
files _other_ than that one segment. NOTE: this
means things like .fdt, .fdx, .tii files etc. NOT things
like segments.gen and segments_1. Make a
backup of course before you try this.

But I think that's secondary. To generate this many
files I suspect you've started a lot of indexing
jobs that you then abort (hard kill?). To get this
many files I'd guess it's something programmatic,
but that's a guess.

How are you committing? Autocommit? From a SolrJ
(or equivalent) program? Have you implemented any
custom merge policies?

But to your immediate problem. You can try running
CheckIndex (here's a tutorial from 2.9, but I think
it's still good):
http://java.dzone.com/news/lucene-and-solrs-checkindex
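
Roughly (a sketch -- adjust the jar name and index path to your install):

# diagnostic run, makes no changes
java -cp lucene-core-4.0.0.jar org.apache.lucene.index.CheckIndex \
    /path/to/solr/data/index

# repair run: drops unreadable segments, so back up first
java -cp lucene-core-4.0.0.jar org.apache.lucene.index.CheckIndex \
    /path/to/solr/data/index -fix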

If that doesn't help (and you can run it in diagnostic mode,
without the -fix flag to see what it _would_ do) then I'm
afraid you'll probably have to re-index.

And you've got to get to the root of why you have so
many segment files. That number is just crazy.

Best
Erick

On Sun, Oct 14, 2012 at 11:20 PM, Jun Wang  wrote:
> PS, I have found that there are lots of segments in the index directory, and most of
> them are empty, like the following. The total file number is 35314 in the index directory.
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3n.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdx
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdt
> -rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdx
>
>
>
>
> 2012/10/15 Jun Wang 
>
>> I have encountered a FileNotFoundException occasionally when
>> indexing; it does not occur every time. Does anyone have a clue? Here is
>> the traceback:
>>
>> 2012-10-14 11:37:28,105 ERROR core.SolrCore -
>> java.io.FileNotFoundException:
>> /home/admin/run/deploy/solr/core_p_shard2/data/index/_cwo.fnm (No such file
>> or directory)
>> at java.io.RandomAccessFile.open(Native Method)
>> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>> at
>> org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:218)
>> at
>> org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
>> at
>> org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
>> at
>> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
>> at
>> org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:55)
>> at
>> org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
>> at
>> org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:267)
>> at
>> org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
>> at
>> org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
>> at
>> org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
>> at
>> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)

Copy Field Question

2012-10-15 Thread Virendra Goswami
Can we put a condition on the copyField source?
For example, if we want to do a lookup with source="product_name" and
dest="some_dest", our syntax would become

<copyField source="product_name" dest="some_dest" maxchar=200>

How about copying only those product_names having status=0 AND attribute1=1
AND attribute2=0,
assuming status, attribute1, attribute2 and product_name are different
attributes of the same table?
Can we write something like

<copyField source="product_name" source="status:0" AND source="attribute:1" AND source="attribute2:0" dest="some_dest" maxchar=200>
Thanks in advance


Re: Copy Field Question

2012-10-15 Thread Tanguy Moal
Hello,

I think you don't have many tuning possibilities using only the
schema.xml file.

You will have to write some custom Java code (subclasses of
UpdateRequestProcessor and UpdateRequestProcessorFactory), build a Java jar
containing your custom code, put that jar in one of the paths declared in your
solrconfig.xml (<lib ... />) -- or add a new one -- and finally tune the
update processor chain configuration (still in solrconfig.xml) so your
custom update processor is used.

See http://wiki.apache.org/solr/UpdateRequestProcessor which uses exactly
your use case as an example.
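
A minimal sketch of such a processor (the class name, imports and the string
comparisons are illustrative assumptions, not code from the wiki page):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConditionalCopyProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // copy product_name only when the conditions hold
        // (assumes the values arrive as strings; adjust if they are numbers)
        if ("0".equals(String.valueOf(doc.getFieldValue("status")))
            && "1".equals(String.valueOf(doc.getFieldValue("attribute1")))
            && "0".equals(String.valueOf(doc.getFieldValue("attribute2")))) {
          doc.addField("some_dest", doc.getFieldValue("product_name"));
        }
        super.processAdd(cmd);
      }
    };
  }
}

The factory then gets registered in an updateRequestProcessorChain in
solrconfig.xml, as described on the wiki page.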

I hope this will help you :)

--
Tanguy

2012/10/15 Virendra Goswami 

> Can we put a condition on the copyField source?
> For example, if we want to do a lookup with source="product_name" and
> dest="some_dest", our syntax would become
>
> <copyField source="product_name" dest="some_dest" maxchar=200>
>
> How about copying only those product_names having status=0 AND attribute1=1
> AND attribute2=0,
> assuming status, attribute1, attribute2 and product_name are different
> attributes of the same table?
> Can we write something like
>
> <copyField source="product_name" source="status:0" AND source="attribute:1" AND source="attribute2:0" dest="some_dest"
> maxchar=200>
>
> Thanks in advance
>


Solr - Can not set java.sql.Timestamp field …created to java.util.Date

2012-10-15 Thread Marvin
Hi there!
I cannot read timestamp data from QueryResponse (want to cast result to a
POJO). If Im using SolrDocumentList there are no errors.

db-data-config.xml:

<dataConfig> 
    <dataSource ... /> 
    <document> 
        <entity name="..." query="..."> 
            <field column="created" name="created" /> 
            ... 
        </entity> 
    </document> 
</dataConfig> 

schema.xml:

<field name="created" type="date" ... />



the field 'created' is a timestamp in my database and after inserting index
data a result looks like (called with browser admin console):

<doc> 
  <str name="id">1</str> 
  <date name="created">2012-10-05T07:29:23.387Z</date> 
  <arr name="..."> 
    <str>message test</str> 
    <str>second</str> 
    <str>third</str> 
  </arr> 
  <str name="...">Ashley</str> 
  <str name="...">10</str> 
  <str name="...">Morgan</str> 
  <str name="...">DISCUSSION</str> 
  <str name="...">headline test</str> 
</doc> 
...

Now I tried to query all results:

public void search(String searchString) {
SolrQuery query = new SolrQuery();
QueryResponse rsp;
try {
query = new SolrQuery();
query.setQuery(DEFAULT_QUERY);
query.setRows(246);

rsp = getServer().query(query);

SolrDocumentList solrDocumentList = rsp.getResults(); //
IllegalArgumentException
List<SearchRequestResponseObject> beans =
rsp.getBeans(SearchRequestResponseObject.class); // works
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}

SearchRequestResponseObject.class:

public class SearchRequestResponseObject {
@Field
private String id;

@Field
private String title;

@Field
@Temporal(TemporalType.TIMESTAMP)
//@DateTimeFormat(style = "MMdd HH:mm:ss z")
//@DateTimeFormat(style = "MMdd")
private Timestamp created;
...
}

Exception:

Caused by: java.lang.IllegalArgumentException: Can not set
java.sql.Timestamp field
com.ebcont.redbull.wtb.solr.SearchRequestResponseObject.created to
java.util.Date
at
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146)
at
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:150)
at
sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:63)
at java.lang.reflect.Field.set(Field.java:657)
at
org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:374)
... 45 more


What am I doing wrong? :(
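
For what it's worth, SolrJ's DocumentObjectBinder injects a java.util.Date
for date fields, and a Date value cannot be assigned to a java.sql.Timestamp
field (Timestamp extends Date, not the other way around). A sketch of the
bean with the field typed as java.util.Date instead:

import java.util.Date;

import org.apache.solr.client.solrj.beans.Field;

public class SearchRequestResponseObject {
    @Field
    private String id;

    @Field
    private String title;

    @Field
    private Date created; // java.util.Date accepts the value Solr returns

    // convert at the edges if a Timestamp is really needed:
    // new java.sql.Timestamp(created.getTime())
}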





Re: Selective Sorting in Solr

2012-10-15 Thread Walter Underwood
Here is what I posted on StackOverflow:

The boost in edismax can be used for this. It is applied to all scores, but if 
it is a small value, it will only make a difference for ties or near-ties. 
Significant differences in the base score will not be reordered.

See: 
http://wiki.apache.org/solr/ExtendedDisMax#boost_.28Boost_Function.2C_multiplicative.29
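
For instance (a sketch -- the multiplier is illustrative and assumes
'priority' is a numeric field):

q=some+query&defType=edismax&qf=name&boost=sum(1,product(0.0001,priority))

The closer the multiplier is to zero, the less chance the boost has of
reordering results that aren't already ties or near-ties.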

wunder

On Oct 15, 2012, at 5:16 AM, Upayavira wrote:

> sort=score desc, priority desc
> 
> Won't that do it?
> 
> Upayavira
> 
> On Mon, Oct 15, 2012, at 09:14 AM, Sandip Agarwal wrote:
>> Hi,
>> 
>> I have many documents indexed into Solr. I am now facing a requirement
>> where the search results should be returned sorted based on their scores.
>> In the *case of non-exact matches*, if there is a tie, another level of
>> sorting is to be applied on a field called priority.
>> 
>> I am using solr with django-haystack in django 1.4.
>> 
>> What can/should I do to achieve my requirement?
>> What I tried:
>> I have ordered the SearchQuerySet method by('-score', 'priority'), but
>> this
>> also applies to exact matches having the same score. What should I try to
>> achieve the above?
>> Is it even possible to achieve what I am trying?
>> 
>> I have even posted a question on StackOverflow:
>> http://stackoverflow.com/questions/12890165/selective-sorting-in-solr
>> 
>> Hoping for your guidance.
>> 
>> -- 
>> Regards,
>> Sandip Agarwal

--
Walter Underwood
wun...@wunderwood.org





Re: core.SolrCore - java.io.FileNotFoundException

2012-10-15 Thread Jun Wang
Hi, Erick
Thanks for your advice. My mergeFactor is set to 10, so it should be
impossible to have so many segments, especially since some .fdx and .fdt
files are just empty. And sometimes indexing works fine, ending with 200+
files in the data dir. My deployment has two cores and two shards for every
core, using autocommit; DIH is used to pull data from the DB, and the merge
policy is TieredMergePolicy.
There is nothing customized.

I am wondering how empty .fdx files could be generated; maybe some config
in indexConfig is wrong. My final index is about 20G, with 40m+ docs.
Here is part of my solrconfig.xml:
-
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxBufferedDocs>100</maxBufferedDocs>

<mergeFactor>10</mergeFactor>


<autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
</autoCommit>

-

PS, I found another kind of log entry, but I am not sure whether it's the cause
or the consequence. I am planning to enable debug logging to gather more
information tomorrow.


2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit
error...:java.io.FileNotFoundException: _cwj.fdt
at
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
at
org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
at
org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
at
org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
at
org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
at
org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
at
org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
at
org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
at
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
at
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
at
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)








2012/10/15 Erick Erickson 

> I have no idea how you managed to get so many files in
> your index directory, but that's definitely weird. How it
> relates to your "file not found", I'm not quite sure, but it
> could be something as simple as you've run out of file
> handles.
>
> So you could try upping the number of
> file handles as a _temporary_ fix just to see if that's
> the problem. See your operating system's manual for
> how.
>
> If it does work, then I'd run an optimize
> down to one segment and remove all the segment
> files _other_ than that one segment. NOTE: this
> means things like .fdt, .fdx, .tii files etc. NOT things
> like segments.gen and segments_1. Make a
> backup of course before you try this.
>
> But I think that's secondary. To generate this many
> files I suspect you've started a lot of indexing
> jobs that you then abort (hard kill?). To get this
> many files I'd guess it's something programmatic,
> but that's a guess.
>
> How are you committing? Autocommit? From a SolrJ
> (or equivalent) program? Have you implemented any
> custom merge policies?
>
> But to your immediate problem. You can try running
> CheckIndex (here's a tutorial from 2.9, but I think
> it's still good):
> http://java.dzone.com/news/lucene-and-solrs-checkindex
>
> If that doesn't help (and you can run it in diagnostic mode,
> without the -fix flag to see what it _would_ do) then I'm
> afraid you'll probably have to re-index.
>
> And you've got to get to the root of why you have so
> many segment files. That number is just crazy.
>
> Best
> Erick
>
> On Sun, Oct 14, 2012 at 11:20 PM, Jun Wang  wrote:
> > PS, I have found that there are lots of segments in the index directory, and most
> > of them are empty, like the following. The total file number is 35314 in the
> > index directory.
> > -rw-rw-r-- 1 admin systems 0 Oct 14 11:37

Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Shawn Heisey

On 10/15/2012 2:47 AM, Alan Woodward wrote:

The extra codecs are supplied in a separate jar file now 
(lucene-codecs-4.0.0.jar) - I guess this isn't being packaged into solr.war by 
default?  You should be able to download it here:

http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-codecs/4.0.0/lucene-codecs-4.0.0-javadoc.jar

  and drop it into the lib/ directory.


This should not be required, because I am building from source.  I 
compiled Solr from lucene-solr source checked out from branch_4x.  I 
grepped the entire tree for lucene-codec and found nothing.


It turns out that running 'ant generate-maven-artifacts' created the jar 
file -- along with a huge number of other jars that I don't need.  It 
took an extremely long time to run, for a jar that's a little over 300KB.


I would argue that the codecs jar should be created by compiling a dist 
target for Solr.  Someone else should determine whether it's appropriate 
to put it in the .war file, but I think it's important enough to make 
available without compiling everything in the Lucene universe.


ncindex@bigindy5 /index/src/branch_4x $ find . | grep "\.jar$" | grep codec
./solr/core/lib/commons-codec-1.7.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1-javadoc.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1-sources.jar
./lucene/analysis/phonetic/lib/commons-codec-1.7.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT-javadoc.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT-src.jar

I put this jar in my lib, and now I get a new error when I try the 
BloomFilter postingsFormat:


SEVERE: null:java.lang.UnsupportedOperationException: Error - 
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
constructed without a choice of PostingsFormat
at 
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)

at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
at 
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)





Re: Using

2012-10-15 Thread P Williams
Hi,

Thanks for the suggestions.  Didn't work for me :(

I'm calling


which depends on org.eclipse.jetty:jetty-server
which depends on org.eclipse.jetty.orbit:jettty-servlet

I think I'm experiencing https://jira.codehaus.org/browse/JETTY-1493.

The pom file for
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.pom
 contains <packaging>orbit</packaging>, so ivy looks for
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
rather
than
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
hence
my troubles.

I'm an IVY newbie so maybe there is something I'm missing here?  Is there
another 'conf' value other than 'default' I can use?

Thanks,
Tricia



On Fri, Oct 12, 2012 at 4:32 PM, P Williams
wrote:

> Hi,
>
> Has anyone tried using <dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default"/> with Apache
> IVY in their project?
>
> rev 3.6.1 works but any of the 4.0.0 ALPHA, BETA and release result in:
> [ivy:resolve] :: problems summary ::
> [ivy:resolve]  WARNINGS
> [ivy:resolve]   [FAILED ]
> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit:
>  (0ms)
> [ivy:resolve]    shared: tried
> [ivy:resolve]
> C:\Users\pjenkins\.ant/shared/org.eclipse.jetty.orbit/javax.servlet/3.0.0.v201112011016/orbits/javax.servlet.orbit
> [ivy:resolve]    public: tried
> [ivy:resolve]
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
> [ivy:resolve]   ::
> [ivy:resolve]   ::  FAILED DOWNLOADS::
> [ivy:resolve]   :: ^ see resolution messages for details  ^ ::
> [ivy:resolve]   ::
> [ivy:resolve]   ::
> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
> [ivy:resolve]   ::
> [ivy:resolve]
> [ivy:resolve]
> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>
> Can anybody point me to the source of this error or a workaround?
>
> Thanks,
> Tricia
>


Re: Solr Cloud and Hadoop

2012-10-15 Thread Rui Vaz
Thank you very much Otis, regular old Solr
distributed search was the piece I was missing. Now it's hands-on time!
--
Rui


Re: Any filter to map mutiple tokens into one ?

2012-10-15 Thread T. Kuro Kurosaka

On 10/14/12 12:19 PM, Jack Krupansky wrote:
There's a miscommunication here somewhere. Is Solr 4.0 still passing 
"*:*" to the analyzer? Show us the parsed query for "*:*", as well as 
the debugQuery "explain" for the score.

I'm not quite sure what you mean by the parsed query for "*:*".
This fake analyzer using NGramTokenizer divides "*:*" into three tokens 
"*", ":", and "*", on purpose to simulate our Tokenizer's behavior.


An excerpt of the XML results from the query is pasted at the bottom of 
this message.


I mean, "*:*" (MatchAllDocsQuery) has a "constant score", so there 
isn't any way for it to be "suboptimal".

That's exactly the point I'd like to raise.
No matter what analyzers are assigned to fields, the hit score for "*:*" 
must remain 1.0, but it's not happening when an analyzer that divides 
"*:*" are in use.



Here's an excerpt of the query response. Notice this element, which 
should not be there, in my opinion:

DisjunctionMaxQuery((name:"* : *"^0.5))
There is a space between * and :, and another space between : and *.



<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">33</int>
  <lst name="params">
    <str name="debugQuery">on</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
    <str name="defType">edismax</str>
    <str name="qf">name^0.5</str>
    <str name="fl">*,score</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">*:*</str>
  </lst>
</lst>

<doc>
  <str name="id">GB18030TEST</str>
  <str name="name">Test with some GB18030 encoded characters</str>
  <arr name="features">
    <str>No accents here</str>
    <str>这是一个功能</str>
    <str>This is a feature (translated)</str>
    <str>这份文件是很有光泽</str>
    <str>This document is very shiny (translated)</str>
  </arr>
  <float name="price">0.0</float>
  <str name="price_c">0,USD</str>
  <bool name="inStock">true</bool>
  <long name="_version_">1415830106215022592</long>
  <float name="score">0.14764866</float>
</doc>
...

<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">
(+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5)))/no_coord
  </str>
  <str name="parsedquery_toString">+*:* (name:"* : *"^0.5)</str>
  <lst name="explain">
    <str>
0.14764866 = (MATCH) sum of: 0.14764866 = (MATCH) MatchAllDocsQuery, 
product of: 0.14764866 = queryNorm
    </str>
  </lst>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>
...








Re: Using

2012-10-15 Thread P Williams
Apologies, there was a typo in my last message.

org.eclipse.jetty.orbit:jettty-servlet  should have been
org.eclipse.jetty.orbit:javax.servlet


On Mon, Oct 15, 2012 at 11:19 AM, P Williams  wrote:

> Hi,
>
> Thanks for the suggestions.  Didn't work for me :(
>
> I'm calling
> <dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default"/>
>
> which depends on org.eclipse.jetty:jetty-server
> which depends on org.eclipse.jetty.orbit:jettty-servlet
>
> I think I'm experiencing https://jira.codehaus.org/browse/JETTY-1493.
>
> The pom file for
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.pom
>  contains <packaging>orbit</packaging>, so ivy looks for
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
>  rather
> than
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
>  hence
> my troubles.
>
> I'm an IVY newbie so maybe there is something I'm missing here?  Is there
> another 'conf' value other than 'default' I can use?
>
> Thanks,
> Tricia
>
>
>
> On Fri, Oct 12, 2012 at 4:32 PM, P Williams <
> williams.tricia.l...@gmail.com> wrote:
>
>> Hi,
>>
>> Has anyone tried using <dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default"/> with
>> Apache IVY in their project?
>>
>> rev 3.6.1 works but any of the 4.0.0 ALPHA, BETA and release result in:
>> [ivy:resolve] :: problems summary ::
>> [ivy:resolve]  WARNINGS
>> [ivy:resolve]   [FAILED ]
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit:
>>  (0ms)
>> [ivy:resolve]    shared: tried
>> [ivy:resolve]
>> C:\Users\pjenkins\.ant/shared/org.eclipse.jetty.orbit/javax.servlet/3.0.0.v201112011016/orbits/javax.servlet.orbit
>> [ivy:resolve]    public: tried
>> [ivy:resolve]
>> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::  FAILED DOWNLOADS::
>> [ivy:resolve]   :: ^ see resolution messages for details  ^ ::
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]
>> [ivy:resolve]
>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>
>> Can anybody point me to the source of this error or a workaround?
>>
>> Thanks,
>> Tricia
>>
>
>


Re: Any filter to map mutiple tokens into one ?

2012-10-15 Thread Jack Krupansky
And you're absolutely certain you see "*:*" being passed to your analyzer in 
the final release of Solr 4.0???


-- Jack Krupansky

-Original Message- 
From: T. Kuro Kurosaka

Sent: Monday, October 15, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Any filter to map mutiple tokens into one ?

On 10/14/12 12:19 PM, Jack Krupansky wrote:
There's a miscommunication here somewhere. Is Solr 4.0 still passing "*:*" 
to the analyzer? Show us the parsed query for "*:*", as well as the 
debugQuery "explain" for the score.

I'm not quite sure what you mean by the parsed query for "*:*".
This fake analyzer using NGramTokenizer divides "*:*" into three tokens
"*", ":", and "*", on purpose to simulate our Tokenizer's behavior.

An excerpt of the XML results from the query is pasted at the bottom of
this message.


I mean, "*:*" (MatchAllDocsQuery) has a "constant score", so there isn't 
any way for it to be "suboptimal".

That's exactly the point I'd like to raise.
No matter what analyzers are assigned to fields, the hit score for "*:*"
must remain 1.0, but it's not happening when an analyzer that divides
"*:*" are in use.


Here's an excerpt of the query response. Notice this element, which
should not be there, in my opinion:
DisjunctionMaxQuery((name:"* : *"^0.5))
There is a space between * and :, and another space between : and *.



<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">33</int>
  <lst name="params">
    <str name="debugQuery">on</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
    <str name="defType">edismax</str>
    <str name="qf">name^0.5</str>
    <str name="fl">*,score</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">*:*</str>
  </lst>
</lst>

<doc>
  <str name="id">GB18030TEST</str>
  <str name="name">Test with some GB18030 encoded characters</str>
  <arr name="features">
    <str>No accents here</str>
    <str>这是一个功能</str>
    <str>This is a feature (translated)</str>
    <str>这份文件是很有光泽</str>
    <str>This document is very shiny (translated)</str>
  </arr>
  <float name="price">0.0</float>
  <str name="price_c">0,USD</str>
  <bool name="inStock">true</bool>
  <long name="_version_">1415830106215022592</long>
  <float name="score">0.14764866</float>
</doc>
...

<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">
(+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5)))/no_coord
  </str>
  <str name="parsedquery_toString">+*:* (name:"* : *"^0.5)</str>
  <lst name="explain">
    <str>
0.14764866 = (MATCH) sum of: 0.14764866 = (MATCH) MatchAllDocsQuery,
product of: 0.14764866 = queryNorm
    </str>
  </lst>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>
...




 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward

> 
> This should not be required, because I am building from source.  I compiled 
> Solr from lucene-solr source checked out from branch_4x.  I grepped the 
> entire tree for lucene-codec and found nothing.
> 
> It turns out that running 'ant generate-maven-artifacts' created the jar file 
> -- along with a huge number of other jars that I don't need.  It took an 
> extremely long time to run, for a jar that's a little over 300KB.
> 
> I would argue that the codecs jar should be created by compiling a dist 
> target for Solr.  Someone else should determine whether it's appropriate to 
> put it in the .war file, but I think it's important enough to make available 
> without compiling everything in the Lucene universe.

I agree - it looks as though the codecs module wasn't added to the solr build 
when it was split off.  I've created a JIRA ticket 
(https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.

On the error below, I'll have to defer to someone who knows how this actually 
works...

> 
> I put this jar in my lib, and now I get a new error when I try the 
> BloomFilter postingsFormat:
> 
> SEVERE: null:java.lang.UnsupportedOperationException: Error - 
> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
> constructed without a choice of PostingsFormat
>at 
> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
>at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
>at 
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
>at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>at 
> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
>at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
>at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>at 
> org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
>at 
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
>at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
>at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
>at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
>at 
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> 
> 



Re: Spatial Search response time complexity

2012-10-15 Thread Smiley, David W.
Hi TJ.

If you use a circle query shape, it's O(N), plus it puts all the points in 
memory.  If you use a rectangle via bbox then I'm not sure, but it's fast enough 
that I wouldn't worry about it.  If my understanding of Lucene 
TrieRange fields is correct, it's O(log N).  If you want fast filtering no matter what 
the query shape is, then I suggest Solr 4.0 SpatialRecursivePrefixTreeFieldType 
  ("location_rpt" in the example schema)

~ David Smiley

On Oct 9, 2012, at 5:00 PM, TJ Tong wrote:

> Hi all,
> 
> Does anyone know the Solr (lucene)spatial search time complexity, such as
> geofilt on LatLonType fields? Is it logN? 
> 
> Thanks!
> TJ
> 
> 
> 



Re: Any filter to map mutiple tokens into one ?

2012-10-15 Thread T. Kuro Kurosaka

On 10/15/12 10:35 AM, Jack Krupansky wrote:
And you're absolutely certain you see "*:*" being passed to your 
analyzer in the final release of Solr 4.0???
I don't have direct evidence. This is the only theory I have that 
explains why changing FieldType causes the sub-optimal scores.

If you know of a way to tell if a tokenizer is really invoked, let me know.



-- Jack Krupansky

-Original Message- From: T. Kuro Kurosaka
Sent: Monday, October 15, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Any filter to map mutiple tokens into one ?

On 10/14/12 12:19 PM, Jack Krupansky wrote:
There's a miscommunication here somewhere. Is Solr 4.0 still passing 
"*:*" to the analyzer? Show us the parsed query for "*:*", as well as 
the debugQuery "explain" for the score.

I'm not quite sure what you mean by the parsed query for "*:*".
This fake analyzer using NGramTokenizer divides "*:*" into three tokens
"*", ":", and "*", on purpose to simulate our Tokenizer's behavior.

An excerpt of the XML results from the query is pasted at the bottom of
this message.


I mean, "*:*" (MatchAllDocsQuery) has a "constant score", so there 
isn't any way for it to be "suboptimal".

That's exactly the point I'd like to raise.
No matter what analyzers are assigned to fields, the hit score for "*:*"
must remain 1.0, but it's not happening when an analyzer that divides
"*:*" are in use.


Here's an excerpt of the query response. Notice this element, which
should not be there, in my opinion:
DisjunctionMaxQuery((name:"* : *"^0.5))
There is a space between * and :, and another space between : and *.



<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">33</int>
  <lst name="params">
    <str name="debugQuery">on</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
    <str name="defType">edismax</str>
    <str name="qf">name^0.5</str>
    <str name="fl">*,score</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">*:*</str>
  </lst>
</lst>

<doc>
  <str name="id">GB18030TEST</str>
  <str name="name">Test with some GB18030 encoded characters</str>
  <arr name="features">
    <str>No accents here</str>
    <str>这是一个功能</str>
    <str>This is a feature (translated)</str>
    <str>这份文件是很有光泽</str>
    <str>This document is very shiny (translated)</str>
  </arr>
  <float name="price">0.0</float>
  <str name="price_c">0,USD</str>
  <bool name="inStock">true</bool>
  <long name="_version_">1415830106215022592</long>
  <float name="score">0.14764866</float>
</doc>
...

<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">
(+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5)))/no_coord
  </str>
  <str name="parsedquery_toString">+*:* (name:"* : *"^0.5)</str>
  <lst name="explain">
    <str>
0.14764866 = (MATCH) sum of: 0.14764866 = (MATCH) MatchAllDocsQuery,
product of: 0.14764866 = queryNorm
    </str>
  </lst>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>
...









solrcloud: what if ZK instances are evanescent?

2012-10-15 Thread John Brinnand
Hi Folks,

I have been looking at solrcloud to solve some of our problems with solr in
a distributed environment. As you know, in such an environment, every
instance of solr or zookeeper can come into existence and go out of
existence - at any time. So what happens if instances of ZK disappear and
re-appear with different hostnames and DNS entries? How would solr know
about these instances and how would it re-sync with these instances?

In essence my question is: what if the hostname and port of the ZK instance
no longer exists - how will solrcloud discover the new instance(s)?

Thanks,

John





Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward
See discussion on https://issues.apache.org/jira/browse/SOLR-3843, this was 
apparently intentional.

That also links to the following: 
http://wiki.apache.org/solr/SolrConfigXml#codecFactory, which suggests you need 
to use solr.SchemaCodecFactory for per-field codecs - this might solve your 
postingsFormat exception.
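
Per that wiki page, the declaration is a one-liner in solrconfig.xml (sketch):

<codecFactory class="solr.SchemaCodecFactory"/>

With that in place, per-field postingsFormat attributes in schema.xml should
be honored.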

On 15 Oct 2012, at 18:41, Alan Woodward wrote:

> 
>> 
>> This should not be required, because I am building from source.  I compiled 
>> Solr from lucene-solr source checked out from branch_4x.  I grepped the 
>> entire tree for lucene-codec and found nothing.
>> 
>> It turns out that running 'ant generate-maven-artifacts' created the jar 
>> file -- along with a huge number of other jars that I don't need.  It took 
>> an extremely long time to run, for a jar that's a little over 300KB.
>> 
>> I would argue that the codecs jar should be created by compiling a dist 
>> target for Solr.  Someone else should determine whether it's appropriate to 
>> put it in the .war file, but I think it's important enough to make available 
>> without compiling everything in the Lucene universe.
> 
> I agree - it looks as though the codecs module wasn't added to the solr build 
> when it was split off.  I've created a JIRA ticket 
> (https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.
> 
> On the error below, I'll have to defer to someone who knows how this actually 
> works...
> 
>> 
>> I put this jar in my lib, and now I get a new error when I try the 
>> BloomFilter postingsFormat:
>> 
>> SEVERE: null:java.lang.UnsupportedOperationException: Error - 
>> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
>> constructed without a choice of PostingsFormat
>>   at 
>> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
>>   at 
>> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>>   at 
>> org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
>>   at 
>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>>   at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
>>   at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>>   at 
>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
>>   at 
>> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
>>   at 
>> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>>   at 
>> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>>   at 
>> org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
>>   at 
>> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
>>   at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
>>   at 
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
>>   at 
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
>>   at 
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>   at 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
>>   at 
>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>>   at 
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>>   at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>> 
>> 
> 



Re: Solr 4 spatial search - point intersects polygon

2012-10-15 Thread Smiley, David W.
Hi Jorge,

Please see the notes on Polygons:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4#JTS_.2BAC8_WKT_.2BAC8_Polygon_notes

This bullet in particular is relevant:
• The standard way to specify a rectangle in WKT is a Polygon -- WKT 
doesn't have a rectangle shape. If you want to specify a Rectangle via WKT 
(instead of the Spatial4j basic non-WKT syntax), you should take care to 
specify the coordinates in counter-clockwise order, the WKT standard. If this 
is done wrong then the rectangle will go the opposite direction longitudinally, 
even if it means one that spans nearly the entire globe (>180 degrees width). 
OpenLayers seems to not honor the WKT standard here, and depending on the 
corner you drag the rectangle from, might use a clockwise order. Some systems 
like PostGIS don't care what the ordering is, but the problem there is that 
there is then no way to specify a rectangle that has >= 180 width because there 
would be ambiguity. Spatial4j follows the WKT spec.

You aren't the first to have run into this problem.  Perhaps I should add a 
mode in which you cannot specify rectangles with a width >= 180 but in exchange 
your rectangle will always go the way you intended (assuming always < 180) 
without having to worry about coordinate order.
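
As a concrete illustration (my reading of the coordinates, so double-check): 
the square in the failing query quoted below lists its corners in clockwise 
order, so it is read as the complement going the long way around the globe. 
Rewinding the ring counter-clockwise should select the intended small box:

q=geohash:"Intersects(POLYGON((55.18 25.152220,55.38 25.152220,55.38 25.352220,55.18 25.352220,55.18 25.152220)))"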

~ David Smiley

On Oct 8, 2012, at 5:25 AM, Jorge Suja wrote:

> Hi everyone, 
> 
> I've been playing around with the new spatial search functionalities
> included in the newer versions of solr (solr 4.1 and solr trunk 5.0), and
> i've found something strange when I try to find a point inside a polygon
> (particularly inside a square).
> 
> You can reproduce this problem using the spatial-solr-sandbox project that
> has the following config for the fields:
> 
> [...]
> <fieldType ... units="degrees" />
> [...]
> <field ... multiValued="false" />
> [...]
> 
> I'm trying to find the following document:
> 
>   G292223
>   Dubai
>   55.28 25.252220
> I want to test if this point is located inside a polygon so i'm using the
> following query:
> 
> q=geohash:"Intersects(POLYGON((55.18 25.352220,55.38 25.352220,55.38 25.152220,55.18 25.152220,55.18 25.352220)))"
> 
> As you can see, it's a small square that contains the point described
> before. I get some results, but that document is not there, and the ones
> returned are wrong since they are not even inside the square.
> 
>   G1809498
>   Guilin
>   110.286390 25.281940
>   [...]
> 
> However, if i change a little bit the shape of the square (just changed a
> little bit one corner), it returns the result as expected
> 
> q=geohash:"Intersects(POLYGON((55.18 25.352220,*55.48* 25.352220,55.38 25.152220,55.18 25.152220,55.18 25.352220)))"
> 
> Now it returns a single result and it's OK
> 
>   G292223
>   Dubai
>   55.28 25.252220
> 
> 
> If i use a bbox with the same size and position than the first square, it
> returns correctly the document.
> 
> q=geohash:"Intersects(55.18 25.152220 55.38 25.352220)"
> 
>   G292223
>   Dubai
>   55.28 25.252220
> 
> If you draw another polygon such a triangle it works well too.
> 
> I've tested this against different points and it's always the same, it seems
> that if you draw a straight square (or rectangle),
> it can't find the point inside it, and it returns wrong results.
> 
> Am i doing anything wrong?
> 
> Thanks in advance
> 
> Jorge
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-spatial-search-point-intersects-polygon-tp4012402.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExternalFileField/FileFloatSource improvements

2012-10-15 Thread Otis Gospodnetic
Hi Alan,

I don't have any direct feedback... but I know there is an issue that
you may want to be aware of (and incorporate?) -
https://issues.apache.org/jira/browse/SOLR-3514

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Mon, Oct 15, 2012 at 9:37 AM, Alan Woodward
 wrote:
> Hi list,
>
> I'm having a go at improving the performance of ExternalFileField (relevant 
> to this thread: 
> http://lucene.472066.n3.nabble.com/Reloading-ExternalFileField-blocks-Solr-td4012399.html#a4013305),
>  and thought I'd get some feedback.  What do people think of the following?
>
> - FileFloatSource needs to be updated in three cases:
>   - when new segments are added
>   - when segments are merged
>   - when the external file source is updated
>
> In our use-case, new documents will not have values in the external file (it 
> contains things like click-data, which will only appear after the document 
> has been in the index for a while), so we don't need to reload when new 
> segments are added.
>
> My plan is to hook the cache refresh into either newSearcher or postCommit.  
> I'd change the FileFloatSource internals to be keyed on individual 
> SegmentReaders rather than top-level IndexReaders, so existing float caches 
> don't need to be reloaded for unchanged segments; I (somehow?) detect if 
> segments with empty caches contain new documents (in which case we can just 
> give them all default values) or are the result of merges (in which case we 
> need to reload the external file and repopulate).
>
> I also plan to modify the reloadCaches update handler so that instead of just 
> clearing the cache (and hence slowing down the next query to hit, as the new 
> caches are lazy-loaded), it reloads the file in the background and then cuts 
> over to the new caches.
>
> I'll open a JIRA and post patches once I've begun the actual implementation, 
> but if anybody notices something that would stop this working, it would be 
> nice to hear about it before I start…  :-)
>
> Thanks,
>
> Alan Woodward


Re: exception when starting single instance solr-4.0.0

2012-10-15 Thread Erick Erickson
My first guess would be a classpath error given this
references lucene3x.

Since all that's deprecated, is there any chance you're
somehow getting a current trunk (5x) jar in there?

Because I see no such error when I start 4.0...

Best
Erick

On Mon, Oct 15, 2012 at 8:42 AM, Bernd Fehling
 wrote:
> Hi,
> while starting solr-4.0.0 I get the following exception:
>
> SEVERE: null:java.lang.IllegalAccessError:
> class org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat cannot access
> its superclass org.apache.lucene.codecs.lucene3x.Lucene3xPostingsFormat
>
>
> Very strange, because some lines earlier in the logs I have:
>
> Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrConfig initLibs
> INFO: Adding specified lib dirs to ClassLoader
> Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> INFO: Adding 'file:/srv/www/solr/solr-4.0.0/lib/lucene-core-4.0-SNAPSHOT.jar' 
> to classloader
>
> Why is solr-4.0.0 thinking that the superclass is not there?
>
> Any ideas?
>
> Regards
> Bernd


Re: exception when starting single instance solr-4.0.0

2012-10-15 Thread Chris Hostetter

: SEVERE: null:java.lang.IllegalAccessError:
: class org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat cannot access
: its superclass org.apache.lucene.codecs.lucene3x.Lucene3xPostingsFormat

that sounds like a classpath error.

: Very strange, because some lines earlier in the logs I have:
: 
: Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrConfig initLibs
: INFO: Adding specified lib dirs to ClassLoader
: Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
: INFO: Adding 'file:/srv/www/solr/solr-4.0.0/lib/lucene-core-4.0-SNAPSHOT.jar' 
to classloader

...and that looks like a mistake.  based on that log line, you either have 
a copy of the lucene core jar in the implicit "lib" dir for your solr 
core, or you have an explicit <lib> directive pointed 
somewhere that contains a copy of the lucene-core jar -- either way 
telling solr to load the lucene-core jar as a plugin.

but lucene-core should not be loaded as a plugin.  lucene-core is already 
in the solr.war, and should have been loaded long before SolrConfig 
started looking for plugin libraries.

which means you probably have two copies of the lucene-core jar ... and 
if you have two copies of that jar, you probably have two copies of other 
lucene jars.

which begs the questions:

 * what is your solr home dir? (i'm guessing maybe it's 
"/srv/www/solr/solr-4.0.0/" ?)
 * why do you have a copy of lucene-core in /srv/www/solr/solr-4.0.0/lib ?
 * what <lib> directives do you have in your solrconfig.xml and why?


-Hoss


Re: How do I make Soft Commits thru' EmbeddedSolrServer visible to Searcher?

2012-10-15 Thread solr_user_999
After a bit of research, I realized that if I am using EmbeddedSolrServer
then I need to also do a hard commit in the Searcher (which runs in a
separate jvm). So I tried that, but am getting a LockException. It looks like
the EmbeddedSolrServer locks the Solr index for writing, and when I try to do a
commit in the searcher the LockException is thrown.

Is there any way around this?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-make-Soft-Commits-thru-EmbeddedSolrServer-visible-to-Searcher-tp4012776p4013769.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 4.0 spatial questions

2012-10-15 Thread Smiley, David W.
Hi Matt.

The documentation is here:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

The sort / relevancy section is a TODO; I've been improving this document 
a bit at a time lately.
 
My comments are within...

On Oct 5, 2012, at 10:10 AM, Matt Mitchell wrote:

> Hi,
> 
> Apologies if some of this has been asked before. I searched the list,
> found similar questions but the suggestions didn't solve my issues.
> 
> I've been playing with the new spatial features in Solr trunk, very
> cool. I successfully indexed a MULTIPOLYGON and could see my query
> working using the "Intersects" function <- that is very exciting! My
> question is, how can I find out more info on this stuff? Some of the
> things I'm looking for, specifically:
> 
> What functions are available? For example, is there a "contains"
> function? Is there java source-code I could look at to figure out
> what's available?

SpatialOperation.java.  For what's in Lucene / Solr 4.0, the only operation 
that is effectively implemented right now is INTERSECTS.  WITHIN is supported 
by PointVector field type but that is semantically equivalent to INTERSECTS 
when the indexed data is Points, and PointVector as its name suggests only 
supports points.  In the future, I figure my employer will have the need for a 
WITHIN and CONTAINS operation, and I know how to add that to the 
RecursivePrefixTree based field types.  It won't be easy.  I believe Chris Male 
has already done this on the ElasticSearch port of Lucene spatial, but I 
haven't looked at it.

> Is there a way to dynamically buffer a geometry, then query on that
> buffered geometry?

I have this at work but it's not yet in the open-source offering.  It's pretty 
easy thanks to JTS, which does the hard work (it's just a method call).  Once 
we get an open-source extensible WKT parser in Spatial4j (which Chris has 
already done for ElasticSearch, so it's going to happen in the very near 
future), we can then add a buffer operation.
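
In the meantime, if you control the client, you could buffer the shape 
yourself with JTS before building the query -- a minimal sketch (plain JTS, 
not a Solr API; note the buffer distance here is in degrees):

  import com.vividsolutions.jts.geom.Geometry;
  import com.vividsolutions.jts.io.WKTReader;

  public class BufferSketch {
      public static void main(String[] args) throws Exception {
          Geometry g = new WKTReader().read("POINT(55.28 25.25)");
          // buffer() returns a polygon approximating the expanded shape
          Geometry buffered = g.buffer(0.1);
          // the WKT output can be pasted into an Intersects(...) query
          System.out.println(buffered.toText());
      }
  }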

> Can I get the distance (as a pseudo field) to a stored/indexed
> MULTIPOLYGON from a given point?

If you are already sorting it, then see the example below (notice the "distdeg" 
pseudo-field alias).  The solution below will work even if you don't sort it 
but it will trigger RAM requirements that are a bit needless.  If you don't 
want the RAM requirements, then you should perform this calculation yourself at 
the client.

> What about sorting by distance to MULTIPOLYGON from point?

Yes... though I'm not happy with the implementation.  I recommend you index a 
field just for the center point.  If there is going to be only one per 
document, then use PointVector or LatLonType.  If there are multiple... then 
you're stuck with the existing implementation, which seems to work but definitely 
isn't scalable for real-time search nor for millions of documents or more.

Here's a comment on the JIRA issue where I left an example:
https://issues.apache.org/jira/browse/SOLR-3304?focusedCommentId=13456188&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13456188

That query is:
http://localhost:8983/solr/select?q=*%3A*&wt=xml&fq={!%20v=$sq}&sq=store:%22Intersects%28Circle%2854.729696,-98.525391%20d=10%29%29%22&debugQuery=on&sort=query%28$sortsq%29+asc&fl=id,store,score,distdeg:query%28$sortsq%29&sortsq={!%20score=distance%20v=$sq}
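
Decoded for readability (my own decoding of the above -- verify before 
copying):

  q=*:*
  &fq={! v=$sq}
  &sq=store:"Intersects(Circle(54.729696,-98.525391 d=10))"
  &sort=query($sortsq) asc
  &fl=id,store,score,distdeg:query($sortsq)
  &sortsq={! score=distance v=$sq}
  &wt=xml&debugQuery=on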

> Can or will it be possible to transform shapes, for example select the
> minimum-bounding-box of a complex shape? Another example would be
> extracting the center point of a polygon.

BBox of an indexed shape is not really supported so you'd have to index the 
bbox as a rectangle, probably via Lucene 5 spatial BBoxStrategy.

For a query shape... that is one of those operations, like a buffer, that I'd 
like to add.

> I've tried to sort and get the distance using some of the tips on the
> Wiki, but couldn't get any of it to work.
> 
> I'd be glad to get some of this into the Wiki too.

Just to repeat:

The documentation is here:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

~ David

Re: Solr4 without slf4j bindings -- apparent catch-22

2012-10-15 Thread Chris Hostetter

: I'm trying to get a Solr4 install going, building without slf4j bindings.  I
...
: If I use the standard .war file, sharedLib works as I would expect it to.  The
: solr.xml file is found and it finds the sharedLib directory just fine, as you
: can see from this log excerpt:
...
: INFO: Adding 'file:/index/solr4/lib/slf4j-api-1.7.2.jar' to classloader
...
: The problem that I am having is with the -excl-slf4j.war file, which I am
: trying to use in order to use log4j instead of jdk logging.  When I do that,
: it seems to be unable to find the sharedLib folder in solr.xml.  Because it
: can't find any slf4j bindings at all, I cannot see what's going on in the log.
: Entire log included:

I think one, or both, of us is confused about how the dist-war-excl-slf4j 
target is intended to be used.

I'm fairly certain you can't try to use slf4j/log4j from the sharedLib -- 
because at that point a lot of solr has already been initialized and 
already started doing logging so slf4j should have already tried to 
resolve the binding it should use, found nothing, and picked its default 
NOP implementation -- as you can see in your logs...

: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
: SLF4J: Defaulting to no-operation (NOP) logger implementation
: SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
: details.

I *believe* the intended way to use the -excl-slf4j.war is by having the 
servlet container load both the log4j jar and the slf4j binding for log4j 
-- either by putting them in jetty/lib, or by specifying them in the 
runtime classpath -- but I think you also need to configure log4j at the 
servlet container level so that it will be initialized.

: I also tried putting the slf4j jars in /opt/solr4/lib (jetty's lib directory).
: Unsurprisingly, there was no change. Where can I put the jars to make this

did you move them or copy them? it wouldn't surprise me if having 
duplicate copies of those slf4j jars in sharedLib broke logging in solr 
even if things were configured and working properly at the jetty level.


-Hoss


Re: Solr4 without slf4j bindings -- apparent catch-22

2012-10-15 Thread Chris Hostetter

: As an interim measure, I tried putting the jars in a separate directory and
: added a commandline option for the classpath.  I also downgraded to 1.6.4,
: because despite asking for a war without it, the war still contains slf4j-api
: version 1.6.4. The log still shows that it failed to find a logger binding -
: no difference from above.

As a followup to my other comments:

I think the reason the slf4j-api jar is left in the war is because the 
"exclusion" is only of the specifc binding used.  users can't arbitrarily 
drop in any version of slf4j that they want at runtime, the slf4j-api has 
to match what solr was compiled against so that the logging calls solr 
makes will still work.

: -Djava.util.logging.config.file=etc/logging.properties option. Trying to set
: that property in jetty.xml according to the wiki didn't work.  I notice that
: the example says 'mortbay' ... perhaps Jetty 8 does it differently?

Very probably


-Hoss


Re: Solr4 without slf4j bindings -- apparent catch-22

2012-10-15 Thread Michael Della Bitta
> slf4j-api has to match what solr was compiled against so that the logging 
> calls solr makes will still work.

To my knowledge, that's not strictly true:
http://www.slf4j.org/faq.html#compatibility


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Oct 15, 2012 at 3:35 PM, Chris Hostetter
 wrote:
> slf4j-api has
> to match what solr was compiled against so that the logging calls solr
> makes will still work.


Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Shawn Heisey

On 10/15/2012 12:38 PM, Alan Woodward wrote:

See discussion on https://issues.apache.org/jira/browse/SOLR-3843, this was 
apparently intentional.

That also links to the following: 
http://wiki.apache.org/solr/SolrConfigXml#codecFactory, which suggests you need 
to use solr.SchemaCodecFactory for per-field codecs - this might solve your 
postingsFormat exception.


I already added this to my solrconfig.xml as a top-level element:

<codecFactory class="solr.SchemaCodecFactory"/>

Once I added this, I tried Bloom, but I had an incorrect name.  That 
resulted in this error, showing that the codecFactory config element 
gave me more choices than Lucene40 and Lucene41:


SEVERE: java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not 
exist. You need to add the corresponding JAR file supporting this SPI to 
your classpath.The current classpath supports the following names: 
[Lucene40, Lucene41, Pulsing41, SimpleText, Memory, BloomFilter, Direct]


Once I got that, I knew I had made some progress, so I changed it to 
BloomFilter and got the error in the previous message.  Repasting here 
without the full stacktrace:


SEVERE: null:java.lang.UnsupportedOperationException: Error - 
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
constructed without a choice of PostingsFormat


Based on that error message, along with something I remember reading 
during my Google travels, I suspect that not all codecs (BloomFilter 
being a prime example) have whatever corresponding Solr bits are required.


Thanks,
Shawn



Re: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7

2012-10-15 Thread Chris Hostetter

: on Tomcat I setup the system property pointing to solr/home path,
: unfortunatelly when I start tomcat the solr.xml is ignored and only the

Please elaborate on how exactly you pointed tomcat at your solr/home.

you mentioned "system property", but when using a system property to set 
the Solr Home you want to set "solr.solr.home" ... "solr/home" is the JNDI 
variable name used as an alternative.
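
for reference, minimal sketches of both mechanisms (paths are placeholders):

  # system property, e.g. appended to JAVA_OPTS before starting tomcat:
  JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/solr/home"

  <!-- or JNDI, in a context fragment like conf/Catalina/localhost/solr.xml -->
  <Context docBase="/path/to/solr.war">
    <Environment name="solr/home" type="java.lang.String"
                 value="/path/to/solr/home" override="true"/>
  </Context>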

if you look at the logging when solr first starts up, you should see 
several messages about how/where it's trying to locate the Solr Home Dir 
... please double check that it's finding the one you intended.

Please give us more details about those log messages related to the solr 
home dir, as well as how you are trying to set it, and what your directory 
structure looks like in tomcat.

If you haven't seen it yet...

https://wiki.apache.org/solr/SolrTomcat



-Hoss


Re: Testing Solr4 - first impressions and problems

2012-10-15 Thread Chris Hostetter

: I have autocommit turned completely off -- both values set to zero.  The DIH
...
: When I first set this up back on 1.4.1, I had some kind of severe problem when
: autocommit was turned on.  I can no longer remember what it caused, but it was
: a huge showstopper of some kind.

the key question about using autocommit is whether or not you use 
"openSearcher" with it and whether you have the updateLog turned on.

as i understand it: if you don't care about real time get, or transaction 
recovery of "uncommitted documents" on hard crash, or any of the Solr Cloud 
features, then you don't need the updateLog -- and you shouldn't add it to 
your existing configs when upgrading to Solr4. any existing usage (or 
non-usage) you had of autocommit should continue to work fine.

If you *do* care about things that require the updateLog, then you want to 
ensure that you are doing "hard commits" (ie: persisting the index to 
disk) relatively frequently in order to keep the size of the updateLog 
from growing w/o bound -- but in Solr 4, doing a hard commit no longer 
requires that you open a new searcher.  opening a new searcher and 
dealing with the cache loading is one of the main reasons people typically 
avoided autoCommit in the past.

So if you look at the Solr 4 example: it uses the updateLog combined with 
a 15 second autoCommit that has openSearcher=false -- meaning that the 
autocommit logic is ensuring that anytime the index has modifications they 
are written to disk every 15 seconds, but the new documents aren't exposed 
to search clients as a result of those autocommits, and if a client uses 
real time get, or if there is a hard crash, the uncommitted docs are 
still available in the updateLog.
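
the relevant stanzas from the 4.0 example solrconfig.xml look roughly like 
this (paraphrasing from memory -- check your copy of the example):

  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>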

For your usecase and upgrade: don't add the updateLog to your configs, and 
don't add autocommit to your configs, and things should work fine.  if you 
decide you want to start using something that requires the updateLog, you 
should probably add a short autoCommit with openSearcher=false.


-Hoss


Re: Solr4 without slf4j bindings -- apparent catch-22

2012-10-15 Thread Shawn Heisey

On 10/15/2012 1:32 PM, Chris Hostetter wrote:

I think one, or both, of us is confused about how the dist-war-excl-slf4j
target is intended to be used.


You are very likely correct, and it's probably me that's confused.


I'm fairly certain you can't try to use slf4j/log4j from the sharedLib --
because at that point a lot of solr has already been initialized and
already started doing logging so slf4j should have already tried to
resolve the binding it should use, found nothing, and picked its default
NOP implementation -- as you can see in your logs...

: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
: SLF4J: Defaulting to no-operation (NOP) logger implementation
: SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
: details.

I *believe* the intended way to use the -excl-slf4j.war is by having the
servlet container load both the log4j jar and the slf4j binding for log4j
-- either by putting them in jetty/lib, or by specifying them in the
runtime classpath -- but I think you also need to configure log4j at the
servlet container level so that it will be initialized.


I tried both of these -- putting them in jetty's lib, as well as putting 
them in an arbitrary directory and putting the relative path 
(blah/filename.jar) on the command line with -cp (and -classpath).  I 
suspect what I will need to do is create the standard war, extract it, 
fiddle with the contents, and then make a new war.  Not terribly 
automated, but upgrading is not something I will be doing all that 
often.  In my test environment (where multiple back to back compiles may 
be commonplace) it will be a bit painful, but I suppose I can just build 
the standard war and use jdk logging there, until I'm ready to deploy to 
production.



: I also tried putting the slf4j jars in /opt/solr4/lib (jetty's lib directory).
: Unsurprisingly, there was no change. Where can I put the jars to make this

did you move them or copy them? it wouldn't surprise me if having
duplicate copies of those slf4j jars in sharedLib broke logging in solr
even if things were configured and working properly at the jetty level.


I thought of this, and was using 'mv' for each test iteration.

Thanks,
Shawn



Re: Testing Solr4 - first impressions and problems

2012-10-15 Thread Shawn Heisey

On 10/15/2012 2:51 PM, Chris Hostetter wrote:

For your usecase and upgrade: don't add the updateLog to your configs, and
don't add autocommit to your configs, and things should work fine.  if you
decide you want to start using something that requires the updateLog, you
should probably add a short autoCommit with openSearcher=false.


Thank you for your answer.  Using updateLog seems to have another 
downside -- a huge hit to performance.  It wouldn't be terrible on 
incremental updates.  These happen once a minute and normally complete 
extremely quickly - less than a second, followed by a commit that may 
take 2-3 seconds.  If it took 5-10 seconds instead of 3, that's not too 
bad.  But when you are expecting a process to take three hours and it 
actually takes 8-10 hours, it's another story.


Shawn



With Grouping enabled, 0 results yields maxScore of -Infinity

2012-10-15 Thread Amit Nithian
I see that when there are 0 results with the grouping enabled, the max
score is -Infinity which causes parsing problems on my client. Without
grouping enabled the max score is 0.0. Is there any particular reason
for this difference? If not, would there be any resistance to
submitting a patch that will set the score to 0 if the numFound is 0
in the grouping component? I see code that sets the max score to
-Infinity and then will set it to a different value when iterating
over some set of scores. With 0 scores, it stays at -Infinity and
serializes out as such.
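
The pattern boils down to something like this (my own minimal sketch, not 
the actual Solr source):

  public class MaxScoreDemo {
      public static void main(String[] args) {
          float[] scores = {};  // zero results in the group
          float maxScore = Float.NEGATIVE_INFINITY;
          for (float s : scores) {
              maxScore = Math.max(maxScore, s);
          }
          // with zero scores the loop never runs, so maxScore stays
          // -Infinity and serializes out as such; guarding on the
          // result count would restore the non-grouped behavior:
          float reported = (scores.length == 0) ? 0.0f : maxScore;
          System.out.println(maxScore + " -> " + reported);  // -Infinity -> 0.0
      }
  }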

I'll be more than happy to work on this patch but before I do, I
wanted to check that I am not missing something first.

Thanks
Amit


How many documents in each Lucene segment?

2012-10-15 Thread Shawn Heisey
Is there any way to easily determine how many documents exist in a 
Lucene index segment?  Ideally I want to check the document counts in 
segments on an index that is being built by a large MySQL dataimport, 
before the dataimport completes.  If that's not possible, I can take 
steps to do a smaller import and make sure the changes are committed.


Thanks,
Shawn



RE: How many documents in each Lucene segment?

2012-10-15 Thread Michael Ryan
Easiest way I know of without parsing any of the index files is to take the 
size of the fdx file in bytes and divide by 8. This will give you the exact 
number of documents before 4.0, and a close approximation in 4.0.

Though, the fdx file might not be on disk if you haven't committed.
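
A quick sketch of that calculation (hypothetical path handling; no Lucene 
APIs needed):

  import java.io.File;
  import java.io.FilenameFilter;

  public class FdxDocCounts {
      public static void main(String[] args) {
          // e.g. /path/to/core/data/index
          File[] fdxFiles = new File(args[0]).listFiles(new FilenameFilter() {
              public boolean accept(File dir, String name) {
                  return name.endsWith(".fdx");
              }
          });
          for (File f : fdxFiles) {
              // exact before 4.0, a close approximation in 4.0
              System.out.println(f.getName() + ": ~" + (f.length() / 8) + " docs");
          }
      }
  }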

-Michael

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Monday, October 15, 2012 9:21 PM
To: solr-user@lucene.apache.org
Subject: How many documents in each Lucene segment?

Is there any way to easily determine how many documents exist in a 
Lucene index segment?  Ideally I want to check the document counts in 
segments on an index that is being built by a large MySQL dataimport, 
before the dataimport completes.  If that's not possible, I can take 
steps to do a smaller import and make sure the changes are committed.

Thanks,
Shawn



Solr Autocomplete

2012-10-15 Thread Rahul Paul
Hi,
I am using MySQL as the data source for indexing in Solr. I have two fields:
"name" and "college". How can I add auto suggest based on these two fields?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autocomplete-tp4013859.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Autocomplete

2012-10-15 Thread Ahmet Arslan
> I am using MySQL as the data source for indexing in Solr. I have
> two fields: "name" and "college". How can I add auto suggest based
> on these two fields?

Here is a blog post and a code example:  
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
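
The approach there, roughly (from memory, so treat this as a sketch rather 
than the post's exact config), is to copy both source fields into one 
suggest field analyzed with edge n-grams:

  <!-- schema.xml -->
  <field name="suggest" type="text_auto" indexed="true" stored="true" multiValued="true"/>
  <copyField source="name" dest="suggest"/>
  <copyField source="college" dest="suggest"/>

  <fieldType name="text_auto" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

A prefix typed by the user then matches the indexed grams of both names 
and colleges.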


Re: Solr Autocomplete

2012-10-15 Thread Lance Norskog
http://find.searchhub.org/?q=autosuggest+OR+autocomplete

- Original Message -
| From: "Rahul Paul" 
| To: solr-user@lucene.apache.org
| Sent: Monday, October 15, 2012 9:01:14 PM
| Subject: Solr Autocomplete
| 
| Hi,
| I am using MySQL as the data source for indexing in Solr. I have two fields:
| "name" and "college". How can I add auto suggest based on these two fields?
| 
| 
| 
| --
| View this message in context:
| http://lucene.472066.n3.nabble.com/Solr-Autocomplete-tp4013859.html
| Sent from the Solr - User mailing list archive at Nabble.com.
| 


Re: How many documents in each Lucene segment?

2012-10-15 Thread Shawn Heisey

On 10/15/2012 8:06 PM, Michael Ryan wrote:

Easiest way I know of without parsing any of the index files is to take the 
size of the fdx file in bytes and divide by 8. This will give you the exact 
number of documents before 4.0, and a close approximation in 4.0.

Though, the fdx file might not be on disk if you haven't committed.


When you are importing 12 million documents from a database, you get LOTS 
of completed segments even if there is no commit until the end.  The 
ramBuffer fills up pretty quick.


I intend to figure out how many documents are in the segments 
(ramBufferSizeMB=256) and try out an autoCommit setting a little bit 
lower than that.  I had trouble with autoCommit on previous versions, 
but with 4.0 I can turn off openSearcher, which may allow it to work right.


Thanks,
Shawn



Re: exception when starting single instance solr-4.0.0

2012-10-15 Thread Bernd Fehling
The solr home dir is located below jetty, as suggested for solr 4.0.
So my directory structure is:
/srv/www/solr/solr-4.0.0/
-- dist   ** has all apache solr and lucene libs not in .war
-- lib** has all other libs not in .war and not in dist, but required
-- jetty  ** the jetty copied from solr/example with context, etc, webapps, ...
   jetty/solr  ** solr with its subdirectories
   jetty/solr/conf
   jetty/solr/data
   jetty/solr/solr.xml

Currently lucene-core is also in the lib directory because of the error message.
I thought this would fix my problem, but it made no change; if I remove it the error
remains.

In solrconfig.xml I have only two lib directives:

<lib dir="..." />
<lib dir="..." />

Strange thing is, solr/example starts without problems and I could also start
my solr-4.0.0 development installation from eclipse with runjettyrun.


Just tested: after removing lucene-core from the lib directory the error remains 
the same.


Seriously a stupid config error, but where?

Regards
Bernd


Am 15.10.2012 21:05, schrieb Chris Hostetter:
> 
> : SEVERE: null:java.lang.IllegalAccessError:
> : class org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat cannot 
> access
> : its superclass org.apache.lucene.codecs.lucene3x.Lucene3xPostingsFormat
> 
> that sounds like a classpath error.
> 
> : Very strange, because some lines earlier in the logs I have:
> : 
> : Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrConfig initLibs
> : INFO: Adding specified lib dirs to ClassLoader
> : Oct 15, 2012 2:30:24 PM org.apache.solr.core.SolrResourceLoader 
> replaceClassLoader
> : INFO: Adding 
> 'file:/srv/www/solr/solr-4.0.0/lib/lucene-core-4.0-SNAPSHOT.jar' to 
> classloader
> 
> ...and that looks like a mistake.  based on that log line, you either have 
> a copy of the lucene core jar in the implicit "lib" dir for your solr 
> core, or you have an explicit <lib> directive pointed 
> somewhere that contains a copy of the lucene-core jar -- either way 
> telling solr to load the lucene-core jar as a plugin.
> 
> but lucene-core should not be loaded as a plugin.  lucene-core is already 
> in the solr.war, and should have been loaded long before SolrConfig 
> started looking for plugin libraries.
> 
> which means you probably have two copies of the lucene-core jar ... and 
> if you have two copies of that jar, you probably have two copies of other 
> lucene jars.
> 
> which begs the questions:
> 
>  * what is your solr home dir? (i'm guessing maybe it's 
> "/srv/www/solr/solr-4.0.0/" ?)
>  * why do you have a copy of lucene-core in /srv/www/solr/solr-4.0.0/lib ?
>  * what <lib> directives do you have in your solrconfig.xml and why?
> 
> 
> -Hoss
>