Authentication Issue in Shards Query

2012-06-20 Thread tosenthu
Hi

I have a Solr server with 5 cores. I have modified the web.xml of solr.war
to enable basic authentication for all of the web resources, and I have
written my own login module to perform the login check. Now when I
query a single core, it asks for the username and password, and with proper
credentials the query works fine. But when I use a shards-type query, I get
a 401 error.

Basically, the credentials provided with the query are not passed on to the
shard sub-queries. Is there a way to overcome this issue via some
configuration?
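For what it's worth, the client side itself authenticates fine. A SolrJ
client, for example, can be given the credentials with something like this
(untested sketch against SolrJ 3.x / Commons HttpClient 3.1; the URL, user
and password are placeholders):

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.UsernamePasswordCredentials;
    import org.apache.commons.httpclient.auth.AuthScope;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
    HttpClient hc = server.getHttpClient();
    // attach Basic-Auth credentials to every request this client makes
    hc.getState().setCredentials(AuthScope.ANY,
        new UsernamePasswordCredentials("user", "secret"));
    hc.getParams().setAuthenticationPreemptive(true);

The part that fails is the server-to-server hop: the shard sub-requests
that the receiving core fires at the other cores carry no such credentials.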

Also, replication is blocked because of authentication.

Please provide me a workaround for this issue.

Regards
Senthil Kumar M R



Re: Indexation Speed?

2012-06-20 Thread Bruno Mannina

Ok thanks for this information,

On 20/06/2012 05:44, Lance Norskog wrote:

M. Della Bitta is right- we're not talking about post.jar, but starting Solr:

java -Xmx300m -jar start.jar

On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson wrote:

Well, it _used_ to be defaulted in the code, but on looking at 3.6 it seems
like it defaults to Integer.MAX_VALUE, so you're fine

And it's all deprecated in 4.x, will be gone

Best
Erick

On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina  wrote:

Actually -Xmx512m and no effect

Concerning maxFieldLength, no problem, it's commented out

On 19/06/2012 13:02, Erick Erickson wrote:


Then try -Xmx600M
next try -Xmx900M


etc. The idea is to bump things on separate runs.

But be a little cautious here. Look in your solrconfig.xml file, you'll
see
a commented-out line
<maxFieldLength>10000</maxFieldLength>

The default behavior for Solr/Lucene is to index the first 10,000 tokens
(not characters, think of tokens as words for now) in each
document and throw the rest on the floor. At the sizes you're talking
about,
that's probably not a problem, but do be aware of it.

Best
Erick

On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina wrote:

Like that?

java -Xmx300m -jar post.jar myfile.xml



On 19/06/2012 11:11, Lance Norskog wrote:


Ah! Java memory size is a java command line option:


http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html

You would try increasing the memory size in stages up to maybe 300m.

On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina wrote:


On 19/06/2012 10:51, Lance Norskog wrote:


675 doc/s is respectable for that server. You might move the memory
allocated to Java up and down- there is a balance between amount of
memory in Java v.s. the OS disk buffer.


How can I do that? Is there an option for my command line or in a config
file?
Sorry for this newbie question :(



And, of course, use the latest trunk.

Solr 3.6



On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina wrote:

Correction: file size is 40 MB !!!

On 19/06/2012 09:09, Bruno Mannina wrote:


Dear All,

I would like to know if the indexation speed is right.

I have a 40 GB file with around 27,000 docs inside.
I index around 20 fields.

My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1 GB
RAM

The file takes 40 seconds with the command line:
java -jar post.jar myfile.xml

Could I increase this speed or reduce this time?

Thanks a lot,
PS: Newbie user









Solr with Tomcat on VPS

2012-06-20 Thread Hill Michael (NHQ-AC)
I am running Solr in a shared Tomcat v5.5.28 (I have access to all
instances) on a Linux VPS server.
When I set it all up, Tomcat starts properly and I can see that it has
accessed my Solr config directory properly.

I can access the JSP pages if I reference them directly
(http://mysite.com/solr/admin/index.jsp for example) but access to URLs
like:
 1. http://mysite.com/solr/admin/
 2. http://mysite.com/solr/admin/dataimport.jsp?clean=false&commit=true&command=full-import
 3. http://mysite.com/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

all return 404 errors like "URL /solr/select/ was not found on this
server."
I have tried all I can think of and am wondering if anyone else has some
thoughts.

This all works great on my development PC where I run the same version
of Tomcat.

Thanks,
Mike


Solr Autosuggest

2012-06-20 Thread Shri Kanish
Hi,
I have a question regarding Solr autosuggest. (If this is not the correct
list to post to, please suggest one.)
 
I have implemented Solr autosuggest with the Suggester component. I have read in a
blog: "Currently implemented Lookups keep their data in memory, so
unlike spellchecker data, this data is discarded on core reload and not
available until you invoke the build command, either explicitly or implicitly
during a commit."
 
I have a Master-Slave setup. If I add new documents to the Master and commit,
then the suggester will be built (as I have set buildOnCommit=true). But when
replication is done, the Slave will reload the core. At that point, will it
affect autosuggestion for the newly added docs?
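If it does, I assume I could fire an explicit build on the slave after each
replication, along these lines (untested SolrJ sketch; the /suggest handler
name and the URL are just examples from my setup):

    SolrServer slave = new CommonsHttpSolrServer("http://slave:8983/solr");
    ModifiableSolrParams p = new ModifiableSolrParams();
    p.set("qt", "/suggest");            // the suggest request handler
    p.set("spellcheck", "true");
    p.set("spellcheck.build", "true");  // rebuild the in-memory lookup
    slave.query(p);

But I would prefer to avoid that extra step if the suggester survives the
reload.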
 
Thanks,
Shri

Re: parameters to decide solr memory consumption

2012-06-20 Thread Erick Erickson
This is really difficult to answer because there are so many variables;
the number of unique terms, whether you store fields or not (which is
really unrelated to memory consumption during searching), etc, etc,
etc. So even trying the index and just looking at the index directory
won't tell you much about memory consumption.

And memory use has been dramatically improved in the 4.x code line, so
anything we can say is actually wrong.

Not to mention that your particular use of caches (filterCache, queryResultCache
etc) will change during runtime.

I'm afraid you'll just have to try it and see.

Yes, LIA is accurate...

Best
Erick

On Tue, Jun 19, 2012 at 8:28 AM, Sachin Aggarwal
 wrote:
> hello,
>
> Need help regarding how Solr stores its indexes. I was reading an article
> that says Solr stores the indexes in the same format as explained in
> Appendix B of Lucene in Action. Is that true?
>
> And what parameters do I need to focus on while estimating the memory used
> by my use case?
>
> I have a table like (userid, username, usertime, userlocation, userphn,
> timestamp, address).
> What I believe is that in my case the cardinality of some fields, like
> gender, location and userphnmodel, will be very low. Will that influence
> memory use?
>
> Any links to read further will be appreciated.
>
> --
>
> Thanks & Regards
>
> Sachin Aggarwal
> 7760502772


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
Internal Lucene document IDs are signed 32 bit numbers, so having
2.5B docs seems to be just _asking_ for trouble. Which could
explain the fact that this just came out of thin air. If you kept adding
docs to the problem instance, you wouldn't have changed configs
etc, just added more docs

I really think it's time to shard.

Best
Erick

On Wed, Jun 20, 2012 at 2:15 AM, avenka  wrote:
> For the first install, I copied over all files in the directory "example"
> into, let's call it, "install1". I did the same for "install2". The two
> installs run on different ports, use different jar files, are not really
> related to each other in any way as far as I can see. In particular, they
> are not "multicore". They have the same access control setup via jetty. I
> did a diff on config files and confirmed that only port numbers are
> different.
>
> Both had been running fine in parallel importing from a common database for
> several weeks. The documents indexed by install1, the problematic one
> currently, are a vastly bigger (~2.5B) superset of those indexed by install2
> (~250M).
>
> At this point, select queries on install1 incurs the NullPointerException
> irrespective of whether install2 is running or not. The log file looks like
> it is indexing normally as always though. The index is also growing at the
> usual rate each day. Just select queries fail. :(
>


Re: 3 Way Solr Join . . ?

2012-06-20 Thread Sabeer Hussain
I have a similar situation in my application. I have five different entities.
The relationships among entities as follows

Protocol --> ( zero or more) Study --> (  zero or more) Patient
Protocol --> ( zero or more) Drug
Patient --> (zero or more) Study
Form --> (zero or many) Study

Moreover, all these entities can exist independently (as per the
requirements of my application), so I cannot create a document that includes
all these entities using denormalization. I need to find the Drug Name
(from the Drug entity), Protocol Name (from the Protocol entity), Study Name
(from the Study entity), Patient Name (from the Patient entity) and Form Name
(from the Form entity) based on the Drug Batch Number (from the Drug entity)
I pass in. Using Join in Solr, I can get either the child or the parent, not
both. What is the best way to index the data in Solr? Do I need to create
separate indices for each entity, or a single one for all?




Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Erick, thanks for pointing that out. I was going to say in my original post
that it is almost like some limit on max documents got violated all of a
sudden, but the rest of the symptoms didn't seem to quite match. But now
that I think about it, the problem probably happened at 2B (corresponding
exactly to the size of the signed int space) as my ID space in the database
has roughly 85% holes and the problem probably happened when the ID hit
around 2.4B. 

It is still odd that indexing appears to proceed normally and the select
queries "know" which IDs are used because the error happens only for queries
with non-empty results, e.g., searching for an ID that doesn't exist gives a
valid "0 numResponses" response. Is this because solr uses 'long' or more
for indexing (given that the schema supports long) but not in the querying
modules?

I hadn't used solr sharding because I really needed "rolling" partitions,
where I keep a small index of recent documents and throw the rest into a
slow "archive" index. So maintaining the smaller instance2 (usually < 50M)
and replicating it if needed was my homebrewed sharding approach. But I
guess it is time to shard the archive after all.

AV



Re: Schema / Config Error?

2012-06-20 Thread Jan Høydahl
As I understand it, James is not upgrading, but trying to start a freshly
downloaded 3.6.0.

James, can you provide some more details? Especially: which AppServer are you
using, and how did you start Solr? Can you copy/paste the error msg from your
log files?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 6 June 2012, at 13:33, Jack Krupansky wrote:

> Read CHANGES.txt carefully, especially the section entitled "Upgrading from 
> Solr 3.5". For example,
> 
> "* As of Solr 3.6, the  and  sections of 
> solrconfig.xml are deprecated
> and replaced with a new  section. Read more in SOLR-1052 below."
> 
> If you simply copied your schema/config directly, unchanged, then this could 
> be the problem.
> 
> You may need to compare your schema/config line-by-line to the new 3.6 
> schema/config for any differences.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Erick Erickson
> Sent: Wednesday, June 06, 2012 6:57 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Schema / Config Error?
> 
> That implies one of two things:
> 1> you changed solr.xml. I'd go back to the original and re-edit
> anything you've changed
> 2> you somehow got a corrupted download. Try blowing your installation
> away and getting a new copy
> 
> Because it works perfectly for me.
> 
> Best
> Erick
> 
> On Wed, Jun 6, 2012 at 4:14 AM, Spadez  wrote:
>> Hi,
>> 
>> I installed a fresh copy of Solr 3.6.0 on my server but I get the following
>> page when I try to access Solr:
>> 
>> http://176.58.103.78:8080/solr/
>> 
>> It says errors to do with my Solr.xml. This is my solr.xml:
>> 
>> 
>> 
>> I really can't figure out how I am meant to fix this, so if anyone is able to
>> give some input I would really appreciate it.
>> 
>> James
>> 
> 



Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
Let's make sure we're talking about the same thing. Solr happily
indexes and stores long (64-bit) values, no problem. What it doesn't
do is assign _internal_ document IDs as longs; those are ints.

on admin/statistics, look at maxDocs and numDocs. maxDocs +1 will be the
next _internal_ lucene doc id assigned, so if that's wonky or > 2B, this
is where the rub happens. BTW, the difference between numDocs and
maxDocs is the number of documents deleted from your index. If your number
of current documents is much smaller than 2B, you can get maxDocs
to equal numDocs if you optimize, and get yourself some more headroom.
Whether your index will be OK I'm not prepared to guarantee though...
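An explicit optimize is just an update command, so something like

    curl 'http://localhost:8983/solr/update?optimize=true'

(or solrServer.optimize() from SolrJ) should do it; adjust host/port/core
to your install.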

But if I'm reading your notes correctly, the "85% holes" applies to a value in
your document, and has nothing to do with the internal lucene ID issue.

But internally, the int limit isn't robustly enforced, so I'm not
surprised that it
pops out (if, indeed, this is your problem) in odd places.

Best
Erick

On Wed, Jun 20, 2012 at 10:02 AM, avenka  wrote:
> Erick, thanks for pointing that out. I was going to say in my original post
> that it is almost like some limit on max documents got violated all of a
> sudden, but the rest of the symptoms didn't seem to quite match. But now
> that I think about it, the problem probably happened at 2B (corresponding
> exactly to the size of the signed int space) as my ID space in the database
> has roughly 85% holes and the problem probably happened when the ID hit
> around 2.4B.
>
> It is still odd that indexing appears to proceed normally and the select
> queries "know" which IDs are used because the error happens only for queries
> with non-empty results, e.g., searching for an ID that doesn't exist gives a
> valid "0 numResponses" response. Is this because solr uses 'long' or more
> for indexing (given that the schema supports long) but not in the querying
> modules?
>
> I hadn't used solr sharding because I really needed "rolling" partitions,
> where I keep a small index of recent documents and throw the rest into a
> slow "archive" index. So maintaining the smaller instance2 (usually < 50M)
> and replicating it if needed was my homebrewed sharding approach. But I
> guess it is time to shard the archive after all.
>
> AV
>


Re: Indexation Speed?

2012-06-20 Thread Bruno Mannina

Little question please:

I have directories with around 30 files of 40 MB each, with around 17,000 docs
in each file.


Is it better to index:
- file by file, with java -jar post.jar 1.xml, java -jar post.jar 2.xml, etc.
or
- all at the same time, with java -jar post.jar *.xml

All files are verified, so my question only concerns speed.

Thx for your comments,
Bruno


On 20/06/2012 05:44, Lance Norskog wrote:

M. Della Bitta is right- we're not talking about post.jar, but starting Solr:

java -Xmx300m -jar start.jar

On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson wrote:

Well, it _used_ to be defaulted in the code, but on looking at 3.6 it seems
like it defaults to Integer.MAX_VALUE, so you're fine

And it's all deprecated in 4.x, will be gone

Best
Erick

On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina  wrote:

Actually -Xmx512m and no effect

Concerning maxFieldLength, no problem, it's commented out

On 19/06/2012 13:02, Erick Erickson wrote:


Then try -Xmx600M
next try -Xmx900M


etc. The idea is to bump things on separate runs.

But be a little cautious here. Look in your solrconfig.xml file, you'll
see
a commented-out line
<maxFieldLength>10000</maxFieldLength>

The default behavior for Solr/Lucene is to index the first 10,000 tokens
(not characters, think of tokens as words for now) in each
document and throw the rest on the floor. At the sizes you're talking
about,
that's probably not a problem, but do be aware of it.

Best
Erick

On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina wrote:

Like that?

java -Xmx300m -jar post.jar myfile.xml



On 19/06/2012 11:11, Lance Norskog wrote:


Ah! Java memory size is a java command line option:


http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html

You would try increasing the memory size in stages up to maybe 300m.

On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina wrote:


On 19/06/2012 10:51, Lance Norskog wrote:


675 doc/s is respectable for that server. You might move the memory
allocated to Java up and down- there is a balance between amount of
memory in Java v.s. the OS disk buffer.


How can I do that? Is there an option for my command line or in a config
file?
Sorry for this newbie question :(



And, of course, use the latest trunk.

Solr 3.6



On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina wrote:

Correction: file size is 40 MB !!!

On 19/06/2012 09:09, Bruno Mannina wrote:


Dear All,

I would like to know if the indexation speed is right.

I have a 40 GB file with around 27,000 docs inside.
I index around 20 fields.

My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1 GB
RAM

The file takes 40 seconds with the command line:
java -jar post.jar myfile.xml

Could I increase this speed or reduce this time?

Thanks a lot,
PS: Newbie user









Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Yes, wonky indeed. 
  numDocs : -2006905329
  maxDoc : -1993357870 

And yes, I meant that the holes are in the database auto-increment ID space,
nothing to do with lucene IDs.
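If I add 2^32 to undo the signed 32-bit wraparound, the numbers line up:

  numDocs: -2006905329 + 4294967296 = 2,288,061,967
  maxDoc : -1993357870 + 4294967296 = 2,301,609,426

Both are just past Integer.MAX_VALUE = 2,147,483,647, which fits the ~2.4B
estimate above.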

I will set up sharding. But is there any way to retrieve most of the current
index? Currently, all select queries even in ranges in the hundreds of
millions return the NullPointerException. It would suck to lose all of this.
:(



Malay Language Detection

2012-06-20 Thread Rohit
Hi,

 

We are using http://code.google.com/p/language-detection/ along with Solr
for language detection, but it seems that the jar doesn't have
support for Malay detection.

 

So, I created the profile for Malay which is used by the jar. This works in
my local test environment, but I don't know how to get it to work with Solr.
Has anyone else worked on this earlier?

 

 

Regards,

Rohit

 



How to import this Json-line by DIH?

2012-06-20 Thread jueljust


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-this-Json-line-by-DIH-tp3990544.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrj and replication

2012-06-20 Thread tom

hi,

I was just wondering if I need to do something special to get replication
working with an embedded slave?


My setup is like so:
- my clustered application uses embedded Solr(J) (for performance); the
cores are configured as slaves that should connect to a master which runs
in a Jetty.

- the embedded code doesn't expose any of the Solr servlets.

Note that the slave config, if started in Jetty, does proper
replication, while when embedded it doesn't.


using solr 3.5

thx

tom


Re: Indexation Speed?

2012-06-20 Thread Erick Erickson
I doubt you'll find any significant difference in indexing speed. But the
post.jar file is really intended as a demo program to quickly get the
examples working. It was never intended to be a production-ready
program. I'd think about using something like SolrJ etc. to index the docs.

And I'm assuming your documents are in the approved Solr format, something
like

<add>
  <doc>
    <field name="fieldname">value for field</field>
    .
    .
  </doc>
  <doc>
    .
    .
    .
  </doc>
</add>

Solr will not index arbitrary XML. If you're trying to do this, you'll
need to transform your arbitrary XML into the above format; consider
SolrJ or something like that in this case.
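A bare-bones SolrJ skeleton looks something like this (3.x; the URL and
field names are only examples, and you'd batch documents in real use):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SimpleIndexer {
      public static void main(String[] args) throws Exception {
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");              // your uniqueKey field
        doc.addField("title", "value for field"); // any field in your schema
        server.add(doc);   // add many docs per call in real use
        server.commit();   // commit once at the end, not per document
      }
    }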

Best
Erick

On Wed, Jun 20, 2012 at 10:40 AM, Bruno Mannina  wrote:
> Little question please:
>
> I have directories with around 30 files of 40 MB each, with around 17,000
> docs in each file.
>
> Is it better to index:
> - file by file, with java -jar post.jar 1.xml, java -jar post.jar 2.xml, etc.
> or
> - all at the same time, with java -jar post.jar *.xml
>
> All files are verified, so my question only concerns speed.
>
> Thx for your comments,
> Bruno
>
>
>
> On 20/06/2012 05:44, Lance Norskog wrote:
>>
>> M. Della Bitta is right- we're not talking about post.jar, but starting
>> Solr:
>>
>>
>> java -Xmx300m -jar start.jar
>>
>> On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson wrote:
>>>
>>> Well, it _used_ to be defaulted in the code, but on looking at 3.6 it seems
>>> like it defaults to Integer.MAX_VALUE, so you're fine
>>>
>>> And it's all deprecated in 4.x, will be gone
>>>
>>> Best
>>> Erick
>>>
>>> On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina  wrote:

 Actually -Xmx512m and no effect

 Concerning maxFieldLength, no problem, it's commented out

 On 19/06/2012 13:02, Erick Erickson wrote:

> Then try -Xmx600M
> next try -Xmx900M
>
>
> etc. The idea is to bump things on separate runs.
>
> But be a little cautious here. Look in your solrconfig.xml file, you'll
> see
> a commented-out line
> <maxFieldLength>10000</maxFieldLength>
>
> The default behavior for Solr/Lucene is to index the first 10,000
> tokens
> (not characters, think of tokens as words for now) in each
> document and throw the rest on the floor. At the sizes you're talking
> about,
> that's probably not a problem, but do be aware of it.
>
> Best
> Erick
>
> On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina wrote:
>>
>> Like that?
>>
>> java -Xmx300m -jar post.jar myfile.xml
>>
>>
>>
>> On 19/06/2012 11:11, Lance Norskog wrote:
>>
>>> Ah! Java memory size is a java command line option:
>>>
>>>
>>>
>>> http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html
>>>
>>> You would try increasing the memory size in stages up to maybe 300m.
>>>
>>> On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina wrote:


 On 19/06/2012 10:51, Lance Norskog wrote:

> 675 doc/s is respectable for that server. You might move the memory
> allocated to Java up and down- there is a balance between amount of
> memory in Java v.s. the OS disk buffer.


 How can I do that? Is there an option for my command line or in a
 config file?
 Sorry for this newbie question :(


> And, of course, use the latest trunk.

 Solr 3.6


> On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina wrote:
>>
>> Correction: file size is 40 MB !!!
>>
>> On 19/06/2012 09:09, Bruno Mannina wrote:
>>
>>> Dear All,
>>>
>>> I would like to know if the indexation speed is right.
>>>
>>> I have a 40 GB file with around 27,000 docs inside.
>>> I index around 20 fields.
>>>
>>> My (old) test server is a DualCore 3.06GHz Intel Xeon with only
>>> 1 GB
>>> RAM
>>>
>>> The file takes 40 seconds with the command line:
>>> java -jar post.jar myfile.xml
>>>
>>> Could I increase this speed or reduce this time?
>>>
>>> Thanks a lot,
>>> PS: Newbie user
>>>
>>>
>>
>>
>


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
That indeed sucks. But I don't personally know of a good way to
try to split apart an existing index into shards. I'm afraid you're
going to be stuck with re-indexing

Wish I had a better solution
Erick

On Wed, Jun 20, 2012 at 10:45 AM, avenka  wrote:
> Yes, wonky indeed.
>  numDocs : -2006905329
>  maxDoc : -1993357870
>
> And yes, I meant that the holes are in the database auto-increment ID space,
> nothing to do with lucene IDs.
>
> I will set up sharding. But is there any way to retrieve most of the current
> index? Currently, all select queries even in ranges in the hundreds of
> millions return the NullPointerException. It would suck to lose all of this.
> :(
>


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Thanks. Do you know if the tons of index files with names like '_zxt.tis' in
the index/data/ directory have the lucene IDs embedded in the binaries? The
files look good to me and are partly readable even if in binary. I am
wondering if I could just set up a new solr instance and move these index
files there and hope to use them (or most of them) as is without shards? If
so, I will just set up a separate sharded index for the documents indexed
henceforth, but won't bother splitting the huge existing index.




Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
Don't even try to do that. First of all, you have to have a reliable way to
index the same docs to the same shards. The docs are all mixed up
in the segment files, so splitting them would lead to chaos. Solr/Lucene report
the same doc multiple times if it's in different shards, so if you
ever updated a document, you wouldn't know what shard to
send it to.

Second, the segments are all parts of a single index, and Solr
(well, actually Lucene) expects them to be consistent. Putting some on
one shard and some on another would probably not allow Solr to start
(but I confess I've never tried that).

So I really wouldn't even try to go there.

Best
Erick

On Wed, Jun 20, 2012 at 12:35 PM, avenka  wrote:
> Thanks. Do you know if the tons of index files with names like '_zxt.tis' in
> the index/data/ directory have the lucene IDs embedded in the binaries? The
> files look good to me and are partly readable even if in binary. I am
> wondering if I could just set up a new solr instance and move these index
> files there and hope to use them (or most of them) as is without shards? If
> so, I will just set up a separate sharded index for the documents indexed
> henceforth, but won't bother splitting the huge existing index.
>
>


write.lock

2012-06-20 Thread Christopher Gross
I'm running Solr 3.4. For the past 2 months I've been getting a lot of
write.lock errors. I switched to the "simple" lockType (and configured it
to clear the lock on restart), but my index is still locking up a few
times a week.

I can't seem to determine what is causing the locks -- does anyone out
there have any ideas/experience as to what is causing them, and
what config changes I can make in order to prevent the locking?

Any help would be very appreciated!

-- Chris


Help with Solr File Based spell check

2012-06-20 Thread Sanjay Dua - Network
Hi,

We are trying to implement file-based spell checking in our application using
Solr 1.4. This is the configuration we have written:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">/usr/home/lilly/sixfeetup/projects/alm-buildout/etc/solr/spelling.txt</str>
    <str name="spellcheckIndexDir">./filespellchecker</str>
    <str name="accuracy">0.7</str>
  </lst>
  <str name="queryAnalyzerFieldType">text</str>
</searchComponent>


We are facing an issue and need your help on the same.

When the user searches for the word "medicine", which is correctly spelled and
is present in the dictionary, we still get the suggestion "medicines" from the
dictionary.

We only want a suggestion if the word is incorrectly spelled or is not included
in the dictionary.

Can you please provide some suggestions?
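In SolrJ terms, what we are after is roughly this (sketch only, assuming a
SolrJ client against our setup; "server" is an already-built SolrServer):

    SolrQuery query = new SolrQuery("medicine");
    query.set("spellcheck", "true");
    QueryResponse rsp = server.query(query);
    // only surface suggestions when the query itself found nothing
    if (rsp.getResults().getNumFound() == 0
        && rsp.getSpellCheckResponse() != null) {
      List<SpellCheckResponse.Suggestion> suggestions =
          rsp.getSpellCheckResponse().getSuggestions();
      // show these to the user
    }

That is, we could suppress the suggestions on the client when the word
already matches, but a server-side option would be cleaner.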

Regards,
Sanjay Dua


Re: LanguageDetection inside of ExtractingRequestHandler

2012-06-20 Thread Jan Høydahl
Hi,

In my opinion, instead of hardcoding such functionality into multiple request 
handlers, we should go the opposite direction -> modularization, factoring out 
Tika extraction into its own UpdateProcessor 
(https://issues.apache.org/jira/browse/SOLR-1763). Then the 
ExtractingRequestHandler would eventually go away, and you could use it and 
language detection with any Request Handler you choose, including XML and DIH...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 19 June 2012, at 17:10, Martin Ruckli wrote:

> Hi all,
> 
> I just wanted to check if there is a demand for this feature. I had to 
> implement this functionality for one of our customers and would like to 
> contribute it.
> 
> Here is the use case:
> We are using the ExtractingRequestHandler with the extractOnly=true flag set.
> With a request to this handler we get the content of a posted document like 
> we want to. We would also like to detect the language and return it as a 
> metadata field in the response from solr.
> As there is already support for LanguageDetection based on tika integrated 
> into solr, the only thing what I did was add a new param to enable or disable 
> this feature and then do the language detection nearly the same way as it is 
> done in the TikaLanguageIdentifierUpdateProcessor
> I think this would be a nice addition, especially in the extractOnly mode.
> 
> What are your thoughts on this?
> 
> Cheers
> Martin
> 



Exception using distributed field-collapsing

2012-06-20 Thread Bryan Loofbourrow
I am doing a search on three shards with identical schemas (I
double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is
giving me back the exception listed at the bottom of this email:



Other information:



My schema uses the following field types: StrField, DateField,
TrieDateField, TextField, SortableInt, SortableLong, BoolField



My query looks like this (I’ve messed with it to anonymize but, I hope,
kept the essentials):



http://[solr core2] /select/?&start=0&rows=25&q={!qsol}machines&sort=[sort
field] &fl=[list of fields] &shards=[solr core1]%2c[solr core2]%2c[solr
core3]&group=true&group.field=[group field]



java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.String

at 
org.apache.lucene.search.FieldComparator$StringOrdValComparator.compareValues(FieldComparator.java:844)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:180)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:154)

at java.util.TreeMap.put(TreeMap.java:547)

at java.util.TreeSet.add(TreeSet.java:255)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupMerger.updateNextGroup(SearchGroup.java:222)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupMerger.merge(SearchGroup.java:285)

at 
org.apache.lucene.search.grouping.SearchGroup.merge(SearchGroup.java:340)

at 
org.apache.solr.search.grouping.distributed.responseprocessor.SearchGroupShardResponseProcessor.process(SearchGroupShardResponseProcessor.java:77)

at 
org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:565)

at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:548)

at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)

at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)

at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)

at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)

at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)

at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)

at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)

at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)

at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)

at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:679)



Any thoughts or advice?



Thanks,



-- Bryan


Re: Exception using distributed field-collapsing

2012-06-20 Thread Martijn v Groningen
Hi Bryan,

What is the fieldtype of the groupField? You can only group by a field
that is of type string, as is described in the wiki:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

When you group by another field type, an HTTP 400 should be returned
instead of this error. At least that's what I'd expect.

Martijn

On 20 June 2012 20:37, Bryan Loofbourrow
 wrote:
> I am doing a search on three shards with identical schemas (I
> double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is
> giving me back the exception listed at the bottom of this email:
>
>
>
> Other information:
>
>
>
> My schema uses the following field types: StrField, DateField,
> TrieDateField, TextField, SortableInt, SortableLong, BoolField
>
>
>
> My query looks like this (I’ve messed with it to anonymize but, I hope,
> kept the essentials:
>
>
>
> http://[solr core2] /select/?&start=0&rows=25&q={!qsol}machines&sort=[sort
> field] &fl=[list of fields] &shards=[solr core1]%2c[solr core2]%2c[solr
> core3]&group=true&group.field=[group field]
>
>
>
> java.lang.ClassCastException: java.util.Date cannot be cast to 
> java.lang.String
>
>        at 
> org.apache.lucene.search.FieldComparator$StringOrdValComparator.compareValues(FieldComparator.java:844)
>
>        at 
> org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:180)
>
>        at 
> org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:154)
>
>        at java.util.TreeMap.put(TreeMap.java:547)
>
>        at java.util.TreeSet.add(TreeSet.java:255)
>
>        at 
> org.apache.lucene.search.grouping.SearchGroup$GroupMerger.updateNextGroup(SearchGroup.java:222)
>
>        at 
> org.apache.lucene.search.grouping.SearchGroup$GroupMerger.merge(SearchGroup.java:285)
>
>        at 
> org.apache.lucene.search.grouping.SearchGroup.merge(SearchGroup.java:340)
>
>        at 
> org.apache.solr.search.grouping.distributed.responseprocessor.SearchGroupShardResponseProcessor.process(SearchGroupShardResponseProcessor.java:77)
>
>        at 
> org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:565)
>
>        at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:548)
>
>        at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
>
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
>
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>
>        at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>
>        at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>
>        at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>
>        at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>
>        at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>
>        at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>
>        at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>
>        at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>
>        at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>
>        at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
>
>        at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>
>        at java.lang.Thread.run(Thread.java:679)
>
>
>
> Any thoughts or advice?
>
>
>
> Thanks,
>
>
>
> -- Bryan



-- 
Met vriendelijke groet,

Martijn van Groningen


RE: Exception using distributed field-collapsing

2012-06-20 Thread Bryan Loofbourrow
> Hi Bryan,
>
> What is the fieldtype of the groupField? You can only group by field
> that is of type string as is described in the wiki:
> http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
>
> When you group by another field type a http 400 should be returned
> instead if this error. At least that what I'd expect.
>
> Martijn

Martijn,

The group-by field is a string. I have been unable to figure out how a date
comes into the picture at all, and have basically been wondering if there
is some problem in the grouping code that misaligns the field values from
different results in the group, so that it is not comparing like with
like. Not a strong theory, just the only thing I can think of.

-- Bryan


Re: Indexation Speed?

2012-06-20 Thread Bruno Mannina

Hi Erick,


I doubt you'll find any significant difference in indexing speed. But the
post.jar file is really intended as a demo program to quickly get the
examples working. It was never intended to be a production-ready
program. I'd think about using something like SolrJ etc. to index the docs.


Ah?! I don't know SolrJ yet :(
Do I need to know how to program in Java?

I transformed all my XML source files to the XML structure below and I'm
using post.jar.

I thought post.jar was a standard tool to index docs.


And I'm assuming your documents are in the approved Solr format, something
like

<add>
  <doc>
    <field name="fieldname">value for field</field>
    .
    .
  </doc>
  <doc>
    .
    .
    .
  </doc>
</add>

Yes, all my XML docs have this format.


Solr will not index arbitrary XML. If you're trying to do this, you'll
need to transform your arbitrary XML into the above format; consider
SolrJ or something like that in this case.


If all my XML docs are in the XML structure above, is it necessary to
use SolrJ?





RE: How to import this Json-line by DIH?

2012-06-20 Thread Steven A Rowe
Hi jueljust,

Nabble removed the entire content of your email before sending it to the 
mailing list.

Maybe use a different service that doesn't throw away your message?

Steve


From: jueljust [juelj...@gmail.com]
Sent: Wednesday, June 20, 2012 10:56 AM
To: solr-user@lucene.apache.org
Subject: How to import this Json-line by DIH?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-this-Json-line-by-DIH-tp3990544.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexation Speed?

2012-06-20 Thread Erik Hatcher
I think it's a bit of an "it depends" on whether post.jar is the Right choice 
for production. 

It -is- SolrJ inside after all, Erick :) and it's pretty much the same as using 
curl. Just be sure you control commits as needed. 
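For example, something like this (check your post.jar's usage output; recent
versions accept a commit flag):

    java -Dcommit=no -jar post.jar *.xml
    java -Ddata=args -jar post.jar "<commit/>"

so you pay for one commit at the end rather than one per file.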

Erik

On Jun 20, 2012, at 15:18, Bruno Mannina  wrote:

> Hi Erick,
> 
>> I doubt you'll find any significant difference in indexing speed. But the
>> post.jar file is really intended as a demo program to quickly get the
>> examples working. It was never intended to be a production-ready
>> program. I'd think about using something like SolrJ etc. to index the docs.
> 
> Ah?! I don't know SolrJ yet :(
> Do I need to know how to program in Java?
> 
> I transformed all my XML source files to the XML structure below and I'm 
> using post.jar.
> I thought post.jar was a standard tool to index docs.
> 
>> And I'm assuming your documents are in the approved Solr format, something
>> like
>>
>> <add>
>>   <doc>
>>     <field name="fieldname">value for field</field>
>>     .
>>     .
>>   </doc>
>>   <doc>
>>     .
>>     .
>>     .
>>   </doc>
>> </add>
> Yes all my xml docs have this format.
> 
>> Solr will not index arbitrary XML. If you're trying to do this, you'll
>> need to transform your arbitrary XML into the above format; consider
>> SolrJ or something like that in this case.
> 
> If all my XML docs are in the XML structure above, is it necessary to use
> SolrJ?
> 
> 


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Erick, thanks for the advice, but let me make sure you haven't misunderstood
what I was asking.

I am not trying to split the huge existing index in install1 into shards. I
am also not trying to make the huge install1 index as one shard of a sharded
solr setup. I plan to use a sharded setup only for future docs.

I do want to avoid re-indexing the docs in install1, and instead to think of
install1 as a slow "tape archive" index server if I ever need to go and query
the past documents. So I was wondering if I could somehow use the existing
segment files to run an isolated (unsharded) solr server that lets me query
roughly the first 2B docs before the wraparound problem happened. If the
"negative" internal doc IDs have pervasively corrupted the segment files,
this would not be possible, but I am not able to imagine an underlying
lucene design that would cause such a problem. Is my only option to re-index
the past 2B docs if I want to be able to query them at this point or is
there any way to use the existing segment files?



Re: Apache Lucene Eurocon 2012

2012-06-20 Thread Lance Norskog
Hello Mikhail-

Your mail did not come through.

Hope things are well,

Lance Norskog
Lucid Imagination

On Wed, Jun 20, 2012 at 11:16 AM, Mikhail Khludnev wrote:
> up
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> 
>  



-- 
Lance Norskog
goks...@gmail.com


Re: Editing solr update handler sub class

2012-06-20 Thread Shameema Umer
Can anybody tell me where the Lucene jar files containing
org.apache.lucene.index and org.apache.lucene.search are located?

Thanks
Shameema

On Wed, Jun 20, 2012 at 4:44 PM, Shameema Umer  wrote:
> Hi,
>
> I decompiled DirectUpdateHandler2.class to a .java file and edited it to
> suit my requirement to stop overwriting duplicates (I needed the first
> fetched tstamp).
> But when I tried to compile it back to a .class file, it shows 91 errors. Am
> I going wrong anywhere?
>
> I am new to java application but fluent in web languages.
>
> Please help.
>
> Thanks
> Shameema


Re: Editing solr update handler sub class

2012-06-20 Thread irshad siddiqui
 Hi,

The jar files are located in the dist folder. Check your dist folder, or you
can check your solrconfig.xml file, where you will find the jar location paths.


On Thu, Jun 21, 2012 at 9:47 AM, Shameema Umer  wrote:

> Can anybody tell me where the Lucene jar files containing
> org.apache.lucene.index and org.apache.lucene.search are located?
>
> Thanks
> Shameema
>
> On Wed, Jun 20, 2012 at 4:44 PM, Shameema Umer  wrote:
> > Hi,
> >
> > I decompiled DirectUpdateHandler2.class to a .java file and edited it to
> > suit my requirement to stop overwriting duplicates (I needed the first
> > fetched tstamp).
> > But when I tried to compile it back to a .class file, it shows 91 errors. Am
> > I going wrong anywhere?
> >
> > I am new to java application but fluent in web languages.
> >
> > Please help.
> >
> > Thanks
> > Shameema
>


Re: parameters to decide solr memory consumption

2012-06-20 Thread Sachin Aggarwal
thanks for help


Hey,
I tried an exercise.
I am storing a schema (uuid, key, userlocation).
uuid and key are unique, and userlocation has a cardinality of 150.
uuid and key are stored and indexed, while userlocation is indexed but not
stored.
Still, the index directory size is 51 MB for just 200,000 records. Don't you
think that's not optimal?
What if I go for billions of records?
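By my math that is 51*1024*1024 / 200,000, i.e. roughly 267 bytes per record,
so a straight-line extrapolation to a billion records would be around 267 GB.
I understand growth should be somewhat sub-linear since the term dictionary
is shared, but it still worries me.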

-- 

Thanks & Regards

Sachin Aggarwal
7760502772


Re: solr limits

2012-06-20 Thread Sachin Aggarwal
Hello,

Please clarify: does "documents" mean unique IDs or something else?

Let's say I have files indexed and each file number is unique, so the file
count can be 2.14 billion.
Or assume I have content in a database as records and each record has a
unique ID, so the record count can be 2.14 billion.

Am I right?



-- 

Thanks & Regards

Sachin Aggarwal
7760502772


Re: Apache Lucene Eurocon 2012

2012-06-20 Thread Mikhail Khludnev
OK. Do you know when and where Lucene Eurocon 2012 is going to happen?

On Wed, Jun 20, 2012 at 10:16 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> up
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> 
>  
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: solr limits

2012-06-20 Thread irshad siddiqui
Hi,

One indexed record is one document, with one unique ID. Like in a
database: one row is one document in Solr.
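And the 2.14 billion ceiling comes from Integer.MAX_VALUE, i.e. 2^31 - 1 =
2,147,483,647: internal Lucene document IDs are signed 32-bit ints, so that
is the per-index document limit whether the documents come from files or
from database rows.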





On Thu, Jun 21, 2012 at 11:39 AM, Sachin Aggarwal <different.sac...@gmail.com> wrote:

> Hello,
>
> Please clarify: does "documents" mean unique IDs or something else?
>
> Let's say I have files indexed and each file number is unique, so the file
> count can be 2.14 billion.
> Or assume I have content in a database as records and each record has a
> unique ID, so the record count can be 2.14 billion.
>
> Am I right?
>
>
>
> --
>
> Thanks & Regards
>
> Sachin Aggarwal
> 7760502772
>


Re: solr limits

2012-06-20 Thread Sachin Aggarwal
thanks ..

On Thu, Jun 21, 2012 at 11:51 AM, irshad siddiqui wrote:

> Hi,
>
> One indexed record is one document, with one unique ID. Like in a
> database: one row is one document in Solr.
>
>
>
>
>
> On Thu, Jun 21, 2012 at 11:39 AM, Sachin Aggarwal <
> different.sac...@gmail.com> wrote:
>
> > Hello,
> >
> > Please clarify: does "documents" mean unique IDs or something else?
> >
> > Let's say I have files indexed and each file number is unique, so the file
> > count can be 2.14 billion.
> > Or assume I have content in a database as records and each record has a
> > unique ID, so the record count can be 2.14 billion.
> >
> > Am I right?
> >
> >
> >
> > --
> >
> > Thanks & Regards
> >
> > Sachin Aggarwal
> > 7760502772
> >
>



-- 

Thanks & Regards

Sachin Aggarwal
7760502772