Re: Viewing the Solr MoinMoin wiki offline

2013-01-08 Thread Alexandre Rafalovitch
Have we made any progress with that? I'd still love an offline package.

Regards,
   Alex.
P.s. I found the 'package' command in the menu and got all excited. It let me
execute it, but the resulting zip is on the server and I cannot download
it. I hope I did not offend the Infra gods so much as to get banned :-)
P.p.s. The dump command mentioned in the very first post on this looked
like a command-line tool. Maybe it is suitable for cron invocation if the
output is useful?


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jan 1, 2013 at 4:32 AM, Upayavira  wrote:

> I have permission to provide an export. Right now I'm thinking of it
> being a one off dump, without the user dir. If someone wants to research
> how to make moin automate it, I at least promise to listen.
>


DIH fails after processing roughly 10 million records

2013-01-08 Thread vijeshnair
Solr version : 4.0 (running with 9GB of RAM)
MySQL : 5.5
JDBC : mysql-connector-java-5.1.22-bin.jar

I am trying to run a full import of my catalog data, which is roughly
13 million products. The DIH ran smoothly for 18 hours and processed
roughly 10 million records, but then it broke due to a JDBC exception,
i.e. a communication failure with the server. I did some extensive
googling on this topic, and there are multiple recommendations to use
"readOnly=true", "autoCommit=true", etc. If I understand it correctly, the
likely cause is that DIH pauses indexing during segment merging and then
has to reconnect to the server: when the index is slightly large and
multiple merges are happening at the same time, DIH stops indexing for a
while, and by the time it restarts, MySQL has already dropped the
connection. So I am going to increase the wait timeout on the MySQL side
from the default 120 to something slightly larger, to see whether that
solves the issue. I will know the result of that approach only after
completing one full run, which I will report tomorrow. In the meantime I
thought I would validate my approach and check with you for any other
fixes that exist.
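
For reference, here is a rough sketch of the kind of JdbcDataSource entry
this involves in data-config.xml (host, database, credentials and the
timeout value are placeholders, not my exact settings):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/catalog?netTimeoutForStreamingResults=3600"
            batchSize="-1"
            readOnly="true"
            user="solr" password="***"/>

batchSize="-1" makes Connector/J stream rows instead of buffering the whole
result set, and netTimeoutForStreamingResults (seconds) tells the driver
what to set the server's net_write_timeout to while streaming.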

Here is the error stack

Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
closeConnection
SEVERE: Ignoring Error when closing connection
java.sql.SQLException: Streaming result set
com.mysql.jdbc.RowDataDynamic@32d051c1 is still active. No statements may be
issued when any streaming result sets are open and in use on a given
connection. Ensure that you have called .close() on any active streaming
result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:923)
at
com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3234)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2399)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at 
com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4908)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4794)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403)
at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594)
at
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
at
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
closeConnection
SEVERE: Ignoring Error when closing connection
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
Communications link failure during rollback(). Transaction resolution
unknown.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1014)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:988)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:974)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:919)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4808)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403)
at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594)
at
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
at
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.

Re: theory of sets

2013-01-08 Thread Uwe Reh

OK, OK,

I will try it again with dynamic fields. Maybe the problem was 
something else; all the statements sound reasonable.
Even Lisheng's thoughts about the impact of too many fields on memory 
consumption should not be the problem for a JVM with 32GB RAM and almost 
no GC.


Please give me some time.
Thanks
Uwe


Am 08.01.2013 00:27, schrieb Zhang, Lisheng:

Hi,

I just thought of this possibility: I think dynamic fields are a Solr concept; at
the Lucene level all fields are the same, but at initial startup Lucene should load
all field information into memory (not the field data, but the schema).

If we have too many fields (like *_my_fields, * => a1, a2, ...), does this take
too much memory and slow down performance (even if very few fields are really
used)?

Best regards, Lisheng

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Monday, January 07, 2013 2:57 PM
To: solr-user@lucene.apache.org
Subject: Re: theory of sets


Dynamic fields resulted in poor response times? How many fields did each
document have? I can't see how a dynamic field should have any
difference from any other field in terms of response time.

Or are you querying across a large number of dynamic fields
concurrently? I can imagine that slowing things down.

Upayavira



Am 07.01.2013 17:40, schrieb Petersen, Robert:

Hi Uwe,

We have hundreds of dynamic fields, but since most of our docs only use some of 
them, it doesn't seem to be a performance drag.  They can be viewed as a sparse 
matrix of fields in your indexed docs.  Then if you make 
sortinfo_for_groupx an int, it can be used in a function query to 
perform your sorting; see http://wiki.apache.org/solr/FunctionQuery and the 
sketch below.
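
A hedged sketch of what I mean (sortinfo_for_group1 here is a made-up
dynamic field name):

q=*:*&sort=field(sortinfo_for_group1) asc

Sorting by the function (or just sort=sortinfo_for_group1 asc) works fine on
a sparse field; docs that don't have the field simply sort with the default
value.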






Language schema Template which includes appropriate language analysis field type as a separate xml file

2013-01-08 Thread Sujatha Arun
Hi ,

Our requirement is to have a separate schema for every language, differing
in the field type definitions for language-based analysis. I have a
standard schema which differs only in the language analysis part, which can
be inserted by any of the 3 methods in schema.xml mentioned on this wiki
page:
http://wiki.apache.org/solr/SolrConfigXml

Which would be the best one to go with? I am using Solr version 3.6.1.

1) XInclude does not work with the 3.x schema file, and a patch exists only
for 4.x+ versions.

2) Includes via Document Entities (see the sketch below) - is any
performance degradation expected while indexing / searching due to the
additional parsing?

3) System property substitution - can this be used to substitute field
types in the schema?
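
A rough sketch of what I mean by option 2 (file names are made up):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE schema [
  <!ENTITY languagetypes SYSTEM "fieldtypes-en.xml">
]>
<schema name="mycore" version="1.4">
  <types>
    &languagetypes;
    <!-- common, language-independent field types go here -->
  </types>
  ...
</schema>

where fieldtypes-en.xml would contain only the <fieldType .../> definitions
for that language.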

What other methods, if any, exist for achieving the same?

Thanks
Sujatha


FW: How can I multiply documents after DIH?

2013-01-08 Thread Harshvardhan Ojha
All,

Looking into finding a solution for hotel searches based on the below criteria:

1. City/Hotel
2. Date Range
3. Persons

We have created documents which contain all the basic information needed, 
inclusive of per-day rates. A document looks like the below:

=


 SHL
 2013-01-06T18:30:00Z
 2013-01-06T18:30:00Z
 2008090516
2400.0
 600.0
 1423509483572690944



=

My search requirement is like

q=city AND startdate:[2013-01-06 TO 2013-01-08]

or

q=id: 2008090516 AND startdate:[2013-01-06 TO 2013-01-08]

and this combination for dates can be anything from daterange:[x TO y].

I have close to 100K combinations to start with, based on city, date ranges, and 
number of nights (days of stay). I am looking at options to precompute the search 
responses, or even to use this set of documents as an input source for them,

e.g. running some Map-Reduce jobs to get all the 100K search responses and 
putting them into a store or cache.

Looking for suggestions and options.

Regards
Harshvardhan Ojha




RE: How can I multiply documents after DIH?

2013-01-08 Thread Shubham Srivastava
Why the fck is the FW

-Original Message-
From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com] 
Sent: 08 January 2013 16:51
To: solr-user@lucene.apache.org
Subject: FW: How can I multiply documents after DIH?

All,

Looking into finding a solution for hotel searches based on the below criteria:

1. City/Hotel
2. Date Range
3. Persons

We have created documents which contain all the basic information needed, 
inclusive of per-day rates. A document looks like the below:

=


 SHL
 2013-01-06T18:30:00Z
 2013-01-06T18:30:00Z
 2008090516
2400.0
 600.0
 1423509483572690944



=

My search requirement is like

q=city AND startdate:[2013-01-06 TO 2013-01-08]

or

q=id: 2008090516 AND startdate:[2013-01-06 TO 2013-01-08]

and this combination for dates can be anything from daterange:[x TO y].

I have close to 100K combinations to start with, based on city, date ranges, and 
number of nights (days of stay). I am looking at options to precompute the search 
responses, or even to use this set of documents as an input source for them,

e.g. running some Map-Reduce jobs to get all the 100K search responses and 
putting them into a store or cache.

Looking for suggestions and options.

Regards
Harshvardhan Ojha




RE: How can I multiply documents after DIH?

2013-01-08 Thread Shubham Srivastava
Apologies folks, it was a mistake.

-Original Message-
From: Shubham Srivastava [mailto:shubham.srivast...@makemytrip.com] 
Sent: 08 January 2013 16:58
To: solr-user@lucene.apache.org
Subject: RE: How can I multiply documents after DIH?

Why the fck is the FW

-Original Message-
From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com] 
Sent: 08 January 2013 16:51
To: solr-user@lucene.apache.org
Subject: FW: How can I multiply documents after DIH?

All,

Looking into finding a solution for hotel searches based on the below criteria:

1. City/Hotel
2. Date Range
3. Persons

We have created documents which contain all the basic information needed, 
inclusive of per-day rates. A document looks like the below:

=


 SHL
 2013-01-06T18:30:00Z
 2013-01-06T18:30:00Z
 2008090516
2400.0
 600.0
 1423509483572690944



=

My search requirement is like

q=city AND startdate:[2013-01-06 TO 2013-01-08]

or

q=id: 2008090516 AND startdate:[2013-01-06 TO 2013-01-08]

and this combination for dates can be anything from daterange:[x TO y].

I have close to 100K combinations to start with, based on city, date ranges, and 
number of nights (days of stay). I am looking at options to precompute the search 
responses, or even to use this set of documents as an input source for them,

e.g. running some Map-Reduce jobs to get all the 100K search responses and 
putting them into a store or cache.

Looking for suggestions and options.

Regards
Harshvardhan Ojha




Hotel Searches

2013-01-08 Thread Harshvardhan Ojha
Hi All,

Looking into finding a solution for hotel searches based on the below criteria:

1. City/Hotel
2. Date Range
3. Persons

We have created documents which contain all the basic information needed, 
inclusive of per-day rates. A document looks like the below:

=


 SHL
 2013-01-06T18:30:00Z
 2013-01-06T18:30:00Z
 2008090516
2400.0
 600.0
 1423509483572690944



=

My search requirement is like

q=city AND startdate:[2013-01-06 TO 2013-01-08]

or

q=id: 2008090516 AND startdate:[2013-01-06 TO 2013-01-08]

and this combination for dates can be anything from daterange:[x TO y].

I have close to 100K combinations to start with, based on city, date ranges, and 
number of nights (days of stay). I am looking at options to precompute the search 
responses, or even to use this set of documents as an input source for them,

e.g. running some Map-Reduce jobs to get all the 100K search responses and 
putting them into a store or cache.

Looking for suggestions and options.

Regards
Harshvardhan Ojha


Error on using the projection parameter - fl - in Solr 4

2013-01-08 Thread samarth s
Hi all,

I am in the process of migrating my application from Solr 3.6 to Solr 4. A
query that used to work is giving an error with Solr 4.

The query looks like:
q=*:*&fl=E_abc@@xyz

The error displayed on the admin page is:
can not use FieldCache on multivalued field: E_abc

The field printed in the error has dropped the part after the character '@'.

I could not find any useful pointers on the forums, except one with a
similar issue, but while using the 'qt' parameter. The reference to that
thread is:
Subject: "multivalued filed question (FieldCache error)" on solr-user forums

Thanks for any pointers.

-- 
Regards,
Samarth


Re: Hotel Searches

2013-01-08 Thread Gora Mohanty
On 8 January 2013 17:10, Harshvardhan Ojha
 wrote:
> Hi All,
>
> Looking into finding a solution for hotel searches based on the below
> criteria:
[...]

Didn't you just post this on a separate thread,
complete with some nonsensical follow-up from
a colleague of yours? Please do not repost the
same message over and over again.

It is not clear what you are trying to achieve.
What is the difference between a city and a hotel
in your data? How is a person represented in
your documents? Is it by the ID field?

Are you looking to cache all possible combinations
of ID, city, and startdate? If so, to what end?  This
smells like a XY problem:
http://people.apache.org/~hossman/#xyproblem

Regards,
Gora


RE: Hotel Searches

2013-01-08 Thread Harshvardhan Ojha
Sorry for that, we just spoiled that thread, so I posted my question in a fresh 
thread.

The problem is indeed very simple.
I have Solr documents which have all the required fields (from the db).
Say DOC1, DOC2, DOC3 ... DOCn.

Every document has a single night's tariff, and I have tariffs for 180 nights.
So a person can search for any combination within these 180 nights. 

Say a request comes in for the total tariff from the 10th to the 15th of Jan 2013.
Now I need the sum of the tariff field across 6 docs.
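
To illustrate, the search-time calculation I want to avoid would look
roughly like this with the StatsComponent (field names as above, dates
illustrative):

q=id:2008090516&fq=startdate:[2013-01-09T18:30:00Z TO 2013-01-14T18:30:00Z]&stats=true&stats.field=tariff&rows=0

where the per-night sum comes back in the stats section of the response.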

So how can I keep this data indexed so as to avoid that search-time calculation? 
And there are other dimensions to this data besides tariff.
Hope this makes sense.

Regards
Harshvardhan Ojha

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Tuesday, January 08, 2013 5:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Hotel Searches

On 8 January 2013 17:10, Harshvardhan Ojha  
wrote:
> Hi All,
>
> Looking into finding a solution for hotel searches based on the below
> criteria:
[...]

Didn't you just post this on a separate thread, complete with some nonsensical 
follow-up from a colleague of yours? Please do not repost the same message over 
and over again.

It is not clear what you are trying to achieve.
What is the difference between a city and a hotel in your data? How is a person 
represented in your documents? Is it by the ID field?

Are you looking to cache all possible combinations of ID, city, and startdate? 
If so, to what end?  This smells like a XY problem:
http://people.apache.org/~hossman/#xyproblem

Regards,
Gora


Re: Solr cloud not starting properly. Only starts leaders.

2013-01-08 Thread Erick Erickson
Solr 4.0 or a nightly build? There's been a lot of work since 4.0, I'd be
curious if you see the same problem in a nightly build.


Erick


On Mon, Jan 7, 2013 at 7:29 PM, davers wrote:

> It is new to me... I am using the collections API to delete and recreate
> collections. Having a lot of trouble. I have 6 servers and I am trying to
> create numShards=2 with replicationFactor=1, but it doesn't work correctly
> the first time, I believe because it races between downloading the
> configuration directory from ZooKeeper and trying to create the core. Then the
> next time I run the command it picks another 4 servers randomly.
>
> I believe the bug here is the race between downloading the configuration
> directory from ZooKeeper and trying to create the SolrCore, but I could be
> wrong. Does this sound familiar?
>
> I ended up just using the core api instead although it has the same issue.
>
> I am uploading and linking my configuration to zookeeper using ZkCLI.
>
> Then when I issue the command
>
> solr/admin/cores?action=CREATE&name=productindex&collection=productindex&shard=shard1
> it fails the first time (because it is trying to initialize the core before
> the config directory is downloaded from zookeeper). Then when I run the
> same
> command again it succeeds.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-cloud-not-starting-properly-Only-starts-leaders-tp4031349p4031405.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: SolrCloud breaks distributed query strings

2013-01-08 Thread Markus Jelsma
This problem persists; I've filed an issue to track it:  
https://issues.apache.org/jira/browse/SOLR-4285
 
-Original message-
> From:Markus Jelsma 
> Sent: Mon 17-Dec-2012 10:49
> To: solr-user@lucene.apache.org
> Subject: RE: SolrCloud breaks distributed query strings
> 
> Anyone else noticed a similar issue where Solr mangles distributed query 
> parameters? Any hints on how to track this issue? Where to look?
> 
> Thanks 
>  
> -Original message-
> > From:Markus Jelsma 
> > Sent: Wed 12-Dec-2012 15:11
> > To: solr-user@lucene.apache.org
> > Subject: RE: SolrCloud breaks distributed query strings
> > 
> > Hi Per,
> > 
> > We're running Tomcat 6 with today's checkout from trunk. I cannot 
> > remember having seen it before, and I cannot reproduce it manually in my 
> > browser, only in concurrent stress tests firing queries.
> > 
> > Thanks
> > Markus 
> >  
> > -Original message-
> > > From:Per Steffensen 
> > > Sent: Wed 12-Dec-2012 15:04
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: SolrCloud breaks distributed query strings
> > > 
> > > It doesn't sound exactly like a problem we experienced some time ago, 
> > > where long requests were mixed up during transport - Jetty was to blame - 
> > > but it might be Jetty that messes up your requests too? See SOLR-4031. 
> > > Are you still running 8.1.2?
> > > 
> > > Regards, Per Steffensen
> > > 
> > > Markus Jelsma skrev:
> > > > Hi,
> > > >
> > > > We're starting to see issues on a test cluster where Solr breaks up 
> > > > query string parameters that are either defined in the request handler 
> > > > or are passed in the URL in the initial request.
> > > >
> > > > In our request handler we have an SF parameter for edismax (SOLR-3925):
> > > >
> > > >   
> > > > title_general~2^4
> > > > title_nl~2^4
> > > > title_en~2^4
> > > > title_de~2^4
> > > >  
> > > >
> > > > Almost all queries pass without issue but some fail because the 
> > > > parameter arrives in an incorrect format, i've logged several 
> > > > occurences:
> > > >
> > > > 2012-12-12 12:01:12,159 ERROR [solr.core.SolrCore] - 
> > > > [http-8080-exec-23] - : org
> > > > .apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
> > > > Invalid a
> > > > rguments for sf, must be sf=FIELD~DISTANCE^BOOST, got 
> > > > title_general~2^4
> > > > title_nl~2^4
> > > > title_en~2^4
> > > > title_de~2
> > > > 4
> > > >
> > > >   
> > > > at 
> > > > org.apache.solr.handler.component.QueryComponent.prepare(QueryCompone
> > > > nt.java:154)
> > > > 
> > > >
> > > > 2012-12-12 12:00:57,164 ERROR [solr.core.SolrCore] - [http-8080-exec-1] 
> > > > - : org.
> > > > apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
> > > > Invalid ar
> > > > guments for sf, must be sf=FIELD~DISTANCE^BOOST, got 
> > > > title_general~2^4
> > > > title_nl~2
> > > > 4
> > > > title_en~2^4
> > > > title_de~2^4
> > > >
> > > >   
> > > > at 
> > > > org.apache.solr.handler.component.QueryComponent.prepare(QueryCompone
> > > > nt.java:154)
> > > > 
> > > >
> > > > 2012-12-12 12:01:11,223 ERROR [solr.core.SolrCore] - [http-8080-exec-8] 
> > > > - : org.
> > > > apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
> > > > Invalid ar
> > > > guments for sf, must be sf=FIELD~DISTANCE^BOOST, got ^
> > > > title_general~2^4
> > > > title_nl~2^4
> > > > title_en~2^4
> > > > title_de~2^4
> > > >
> > > >   
> > > > at 
> > > > org.apache.solr.handler.component.QueryComponent.prepare(QueryCompone
> > > > nt.java:154)
> > > > 
> > > >
> > > > This seems crazy! For some reason, some times, the parameter get 
> > > > corrupted in some manner! We've also seen this with a function query in 
> > > > the edismax boost parameter where for some reasons a comma is replaced 
> > > > by a newline:
> > > >
> > > > 2012-12-12 11:11:45,527 ERROR [solr.core.SolrCore] - 
> > > > [http-8080-exec-16] - : org.apache.solr.common.SolrException: 
> > > > org.apache.solr.search.SyntaxError: Expected ',' at position 55 in 
> > > > 'if(exists(date),max(recip(ms(NOW/DAY,date),3.17e-8,143
> > > > .9),.8),.7)'
> > > > at 
> > > > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:154)
> > > > ...
> > > > at 
> > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > > at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: org.apache.solr.search.SyntaxError: Expected ',' at position 
> > > > 55 in 'if(exists(date),max(recip(ms(NOW/DAY,date),3.17e-8,143
> > > > .9),.8),.7)'
> > > >
> > > > Accompanying these errors are a number of AIOOBExceptions without stack 
> > > > traces and Spellchecker NPEs (SOLR-4049).  I'm completely puzzled here 
> > > > because the queries get randomly mangled in some manner. The SF 
> > > > parameter see

Re: Hotel Searches

2013-01-08 Thread Alexandre Rafalovitch
Did you look at a conversation thread from 12 Dec 2012 on this list? Just
go to the archives and search for 'hotel'. Hopefully that will give you
something to work with.

If you have any questions after that, come back with more specifics.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jan 8, 2013 at 7:18 AM, Harshvardhan Ojha <
harshvardhan.o...@makemytrip.com> wrote:

> Sorry for that, we just spoiled that thread, so I posted my question in a
> fresh thread.
>
> The problem is indeed very simple.
> I have Solr documents which have all the required fields (from the db).
> Say DOC1, DOC2, DOC3 ... DOCn.
>
> Every document has a single night's tariff, and I have tariffs for 180 nights.
> So a person can search for any combination within these 180 nights.
>
> Say a request comes in for the total tariff from the 10th to the 15th of Jan 2013.
> Now I need the sum of the tariff field across 6 docs.
>
> So how can I keep this data indexed so as to avoid that search-time calculation?
> And there are other dimensions to this data besides tariff.
> Hope this makes sense.
>
> Regards
> Harshvardhan Ojha
>
> -Original Message-
> From: Gora Mohanty [mailto:g...@mimirtech.com]
> Sent: Tuesday, January 08, 2013 5:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Hotel Searches
>
> On 8 January 2013 17:10, Harshvardhan Ojha <
> harshvardhan.o...@makemytrip.com> wrote:
> > Hi All,
> >
> > Looking into finding a solution for hotel searches based on the below
> > criteria:
> [...]
>
> Didn't you just post this on a separate thread, complete with some
> nonsensical follow-up from a colleague of yours? Please do not repost the
> same message over and over again.
>
> It is not clear what you are trying to achieve.
> What is the difference between a city and a hotel in your data? How is a
> person represented in your documents? Is it by the ID field?
>
> Are you looking to cache all possible combinations of ID, city, and
> startdate? If so, to what end?  This smells like a XY problem:
> http://people.apache.org/~hossman/#xyproblem
>
> Regards,
> Gora
>


Re: SOLR Cloud : what is the best backup/restore strategy ?

2013-01-08 Thread Mark Miller
If you're doing periodic backups, I'm just not getting why you would care. I'm 
still missing what stopping indexing would gain you.

- Mark

On Jan 8, 2013, at 1:36 AM, Otis Gospodnetic  wrote:

> Hi,
> 
> Right, you can continue indexing, but if you need to run
> http://master_host:port/solr/replication?command=backup  on each node and
> if you want a snapshot that represents a specific index state, then you
> need to stop indexing (and hard commit).  That's what I had in mind.  But
> if one just wants *some* snapshot and it doesn't matter that a snapshot on
> each node is from a slightly different time with a slightly different
> index make up, so to speak, then yes, just continue indexing.
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> 
> 
> 
> 
> On Mon, Jan 7, 2013 at 2:12 PM, Mark Miller  wrote:
> 
>> You should be able to continue indexing fine - it will just keep a point
>> in time snapshot around until the copy is done. So you can trigger a backup
>> at anytime to create a backup for that specific time, and keep indexing
>> away, and the next night do the same thing. You will always have backed up
>> to the point in time the backup command is received.
>> 
>> - Mark
>> 
>> On Jan 7, 2013, at 1:45 PM, Otis Gospodnetic 
>> wrote:
>> 
>>> Hi,
>>> 
>>> There may be a better way, but stopping indexing and then
>>> using http://master_host:port/solr/replication?command=backup on each
>> node
>>> may do the backup trick.  I'd love to see how/if others do it.
>>> 
>>> Otis
>>> --
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Jan 7, 2013 at 10:33 AM, LEFEBVRE Guillaume <
>>> guillaume.lefeb...@cegedim.fr> wrote:
>>> 
 Hello,
 
 Using a SOLR Cloud architecture, what is the best procedure to backup
>> and
 restore SOLR index and configuration ?
 
 Thanks,
 Guillaume
 
 
>> 
>> 



Re: Solr 4 exceptions on trying to create a collection

2013-01-08 Thread Mark Miller
If you are using 4.0 you can't use the CloudSolrServer with the collections API 
- you have to pick a server and use the HttpSolrServer impl. In 4.1 you can use 
the CloudSolrServer with the collections API.

- Mark

On Jan 6, 2013, at 8:42 PM, Jay Parashar  wrote:

> The exception "No live SolrServers" is being thrown when trying to create a 
> new Collection (code at the end of this mail). In the CloudSolrServer request 
> method, we have this line:
> "ClientUtils.appendMap(coll, slices, clusterState.getSlices(coll));" where 
> "coll" is the new collection I am trying to create, and hence 
> clusterState.getSlices(coll) is returning null.
> The loop over the slices which adds to the urlList then never happens, and 
> hence the LBHttpSolrServer created in the CloudSolrServer has a null url list 
> in the constructor.
> This is giving the "No live SolrServers" exception.
> 
> What am I missing?
> 
> Instead of passing the CloudSolrServer to the create.process, if I pass the 
> LBHttpSolrServer  (server.getLbServer()), the collection gets created but 
> only on one server.
> 
> My code to create a new Cloud Server and new Collection:-
> 
> String[] urls = 
> {"http://127.0.0.1:8983/solr/","http://127.0.0.1:8900/solr/","http://127.0.0.1:7500/solr/","http://127.0.0.1:7574/solr/"};
> CloudSolrServer server = new CloudSolrServer("127.0.0.1:2181", new 
> LBHttpSolrServer(urls));
> server.getLbServer().getHttpClient().getParams().setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT,
>  5000);
> server.getLbServer().getHttpClient().getParams().setParameter(CoreConnectionPNames.SO_TIMEOUT,
>  2);
> server.setDefaultCollection(collectionName);
> server.connect();
> CoreAdminRequest.Create create = new CoreAdminRequest.Create();
> create.setCoreName("myColl");
> create.setCollection("myColl");
> create.setInstanceDir("defaultDir");
> create.setDataDir("myCollData");
> create.setNumShards(2);
> create.process(server); //Exception No live SolrServers  is thrown here
> 
> 
> Thanks
> Jay
> 
> 
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
> Sent: Friday, January 04, 2013 6:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4 exceptions on trying to create a collection
> 
> Tried Wireshark yet to see what host/port it is trying to connect and why it 
> fails? It is a complex tool, but well worth learning.
> 
> Regards,
>  Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at once. 
> Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Fri, Jan 4, 2013 at 6:58 PM, Jay Parashar  wrote:
> 
>> Thanks! I had a different version of httpclient in the classpath. So 
>> the 2nd exception is gone but now I am  back to the first one " 
>> org.apache.solr.client.solrj.SolrServerException: No live SolrServers  
>> available to handle this request"
>> 
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Friday, January 04, 2013 4:21 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr 4 exceptions on trying to create a collection
>> 
>> For the second one:
>> 
>> Wrong version of library on a classpath or multiple versions of 
>> library on the classpath which causes wrong classes with missing 
>> fields/variables? Or library interface baked in and the implementation 
>> is newer. Some sort of mismatch basically. Most probably in Apache http 
>> library.
>> 
>> Regards,
>>   Alex.
>> 
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all 
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD 
>> book)
>> 
>> 
>> On Fri, Jan 4, 2013 at 4:34 PM, Jay Parashar 
>> wrote:
>> 
>>> 
>>> Hi All,
>>> 
>>> I am getting exceptions on trying to create a collection. Any help 
>>> is appreciated.
>>> 
>>> While trying to create a collection, I got this error Caused by:
>>> org.apache.solr.client.solrj.SolrServerException: No live 
>>> SolrServers available to handle this request
>>>at
>>> 
>>> 
>> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.
>>> java:322)
>>>at
>>> 
>>> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrS
>>> er
>>> ver.ja
>>> va:257)
>>>at
>>> 
>>> org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAd
>>> mi
>>> nReque
>>> st.java:423)
>>> 
>>> 
>>> On trying to increase the server timeout by
>>> 
>>> server.getLbServer().getHttpClient().getParams().setParameter(CoreCo
>>> nn
>>> ection
>>> PNames.CONNECTION_TIMEOUT, 5000);
>>> 
>>> server.getLbServer().getHttpClient().getParams().setParameter(CoreCo
>>> nn
>>> ection
>>> PNames.SO_TIMEOUT, 2);
>>> 
>>> I get this...
>>> 
>>> SEVERE: The exception contained within MappableContainerE

Re: DIH fails after processing roughly 10 million records

2013-01-08 Thread Travis Low
What you describe sounds right to me and seems consistent with the error
stack trace.  I would increase the MySQL wait_timeout to 3600 and,
depending on your server, you might want to also increase max_connections.

cheers,

Travis

On Tue, Jan 8, 2013 at 4:10 AM, vijeshnair  wrote:

> Solr version : 4.0 (running with 9GB of RAM)
> MySQL : 5.5
> JDBC : mysql-connector-java-5.1.22-bin.jar
>
> I am trying to run a full import of my catalog data, which is roughly
> 13 million products. The DIH ran smoothly for 18 hours and processed
> roughly 10 million records, but then it broke due to a JDBC exception,
> i.e. a communication failure with the server. I did some extensive
> googling on this topic, and there are multiple recommendations to use
> "readOnly=true", "autoCommit=true", etc. If I understand it correctly, the
> likely cause is that DIH pauses indexing during segment merging and then
> has to reconnect to the server: when the index is slightly large and
> multiple merges are happening at the same time, DIH stops indexing for a
> while, and by the time it restarts, MySQL has already dropped the
> connection. So I am going to increase the wait timeout on the MySQL side
> from the default 120 to something slightly larger, to see whether that
> solves the issue. I will know the result of that approach only after
> completing one full run, which I will report tomorrow. In the meantime I
> thought I would validate my approach and check with you for any other
> fixes that exist.
>
> Here is the error stack
>
> Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
> closeConnection
> SEVERE: Ignoring Error when closing connection
> java.sql.SQLException: Streaming result set
> com.mysql.jdbc.RowDataDynamic@32d051c1 is still active. No statements may
> be
> issued when any streaming result sets are open and in use on a given
> connection. Ensure that you have called .close() on any active streaming
> result sets before attempting more queries.
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:923)
> at
> com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3234)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2399)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
> at
> com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4908)
> at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4794)
> at
> com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403)
> at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
> at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
> Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
> closeConnection
> SEVERE: Ignoring Error when closing connection
> com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
> Communications link failure during rollback(). Transaction resolution
> unknown.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
>
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at
>
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
> at com.mysql.jdbc.Util.getInstance(Util.java:386)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1014)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:988)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:974)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:919)
> at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4808)
> at
> com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403)
> at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
> at
>
> org.apache.solr.han

fieldtype for name

2013-01-08 Thread Michael Jones
Hi,

What would be the best fieldtype for a person's name? At the moment I'm
using text_general, but if I search for bob smith, some results I get back
might be rob thomas, in that it's matched 'ob'.

But I only really want results that are either

'bob smith'
'bob, smith'
'smith, bob'
'smith bob'

Thanks


Re: DIH fails after processing roughly 10 million records

2013-01-08 Thread Shawn Heisey

On 1/8/2013 2:10 AM, vijeshnair wrote:

Solr version : 4.0 (running with 9GB of RAM)
MySQL : 5.5
JDBC : mysql-connector-java-5.1.22-bin.jar

I am trying to run a full import of my catalog data, which is roughly
13 million products. The DIH ran smoothly for 18 hours and processed
roughly 10 million records, but then it broke due to a JDBC exception,
i.e. a communication failure with the server. I did some extensive
googling on this topic, and there are multiple recommendations to use
"readOnly=true", "autoCommit=true", etc. If I understand it correctly, the
likely cause is that DIH pauses indexing during segment merging and then
has to reconnect to the server: when the index is slightly large and
multiple merges are happening at the same time, DIH stops indexing for a
while, and by the time it restarts, MySQL has already dropped the
connection. So I am going to increase the wait timeout on the MySQL side
from the default 120 to something slightly larger, to see whether that
solves the issue. I will know the result of that approach only after
completing one full run, which I will report tomorrow. In the meantime I
thought I would validate my approach and check with you for any other
fixes that exist.



This is how I fixed it.  On version 4, this goes in the indexConfig 
section.  On 3.x it goes into indexDefaults:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">4</int>
  <int name="maxMergeCount">4</int>
</mergeScheduler>

A recent jira issue (LUCENE-4661) changed the default maxThreadCount to 1 for 
better performance, so I'm not sure whether both of my changes above are 
actually required or whether just maxMergeCount will fix it. I commented on 
the issue to find out.


https://issues.apache.org/jira/browse/LUCENE-4661

If I don't get a definitive answer soon, I'll go ahead and test for myself.

Side question: you're already setting batchSize to a negative number, right?

Thanks,
Shawn



SolrJ and Solr 4.0 | doc.getFieldValue() returns String instead of Date

2013-01-08 Thread uwe72
A Solr 4.0 document now returns a String value for a Date field, instead of
a Date object.

Solr4.0 --> "2009-10-29T00:00:009Z"
Solr3.6 --> Date instance

Can this be set somewhere in the config?

I prefer to receive a Date instance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-and-Solr-4-0-doc-getFieldValue-returns-String-instead-of-Date-tp4031588.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fieldtype for name

2013-01-08 Thread Shawn Heisey

On 1/8/2013 7:30 AM, Michael Jones wrote:

Hi,

What would be the best fieldtype for a person's name? At the moment I'm
using text_general, but if I search for bob smith, some results I get back
might be rob thomas, in that it's matched 'ob'.

But I only really want results that are either

'bob smith'
'bob, smith'
'smith, bob'
'smith bob'


A search for bob smith could only match rob thomas if the fieldtype 
includes the edge ngram filter, or if you do a fuzzy term search, or if 
the fieldtype includes a stemming filter that turns bob, robert, and rob 
into the same root word.  Eliminate all those things and it should work 
like you expect.
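
For instance, a minimal sketch of such a fieldtype (names and details are
illustrative, not a drop-in config):

<fieldType name="text_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- plain tokenization, no ngrams, no stemming, no synonyms -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Combined with q.op=AND or a phrase query, a search for bob smith will then
only match documents containing both whole tokens.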


Thanks,
Shawn



RE: SolrJ and Solr 4.0 | doc.getFieldValue() returns String instead of Date

2013-01-08 Thread Darren Govoni

SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.S'Z'");
Date dateObj = df.parse("2009-10-29T00:00:009Z");

--- Original Message ---
On 1/8/2013  09:34 AM uwe72 wrote:A Lucene 4.0 document returns for a Date 
field now a string value, instead of
a Date object.



Solr4.0 --> "2009-10-29T00:00:009Z"
Solr3.6 --> Date instance

Can this be set somewhere in the config?

I prefer to receive a date instance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-and-Solr-4-0-doc-getFieldValue-returns-String-instead-of-Date-tp4031588.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr 4 exceptions on trying to create a collection

2013-01-08 Thread Jay Parashar
Thanks Mark... I will use it with 4.1. For now, I used httpclient to call the
Collections API directly (do a GET on
http://127.0.0.1:8983/solr/admin/collections?action=CREATE etc.). This is
working.
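
In SolrJ terms the workaround looks roughly like this (host, collection name
and counts are placeholders):

HttpSolrServer oneNode = new HttpSolrServer("http://127.0.0.1:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("action", "CREATE");
params.set("name", "myColl");
params.set("numShards", 2);
params.set("replicationFactor", 1);
QueryRequest req = new QueryRequest(params);
req.setPath("/admin/collections"); // send the request to one node's Collections API
oneNode.request(req);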


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, January 08, 2013 7:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4 exceptions on trying to create a collection

If you are using 4.0 you can't use the CloudSolrServer with the collections
API - you have to pick a server and use the HttpSolrServer impl. In 4.1 you
can use the CloudSolrServer with the collections API.

- Mark

On Jan 6, 2013, at 8:42 PM, Jay Parashar  wrote:

> The exception "No live SolrServers" is being thrown when trying to 
> create a new Collection ( code at end of this mail). On the
CloudSolrServer request method, we have this line
"ClientUtils.appendMap(coll, slices, clusterState.getSlices(coll));" where
"coll" is the new collection I am trying to create and hence
clusterState.getSlices(coll)); is returning null.
> And then the loop of the slices which adds to the urlList never happens
and hence the LBHttpSolrServer created in the CloudSolrServer has a null url
list in the constructor.
> This is giving the "No live SolrServers" exception.
> 
> What am I missing?
> 
> Instead of passing the CloudSolrServer to the create.process, if I pass
the LBHttpSolrServer  (server.getLbServer()), the collection gets created
but only on one server.
> 
> My code to create a new Cloud Server and new Collection:-
> 
> String[] urls = 
> {"http://127.0.0.1:8983/solr/","http://127.0.0.1:8900/solr/","http://1
> 27.0.0.1:7500/solr/","http://127.0.0.1:7574/solr/"};
> CloudSolrServer server = new CloudSolrServer("127.0.0.1:2181", new 
> LBHttpSolrServer(urls)); 
> server.getLbServer().getHttpClient().getParams().setParameter(CoreConn
> ectionPNames.CONNECTION_TIMEOUT, 5000); 
> server.getLbServer().getHttpClient().getParams().setParameter(CoreConn
> ectionPNames.SO_TIMEOUT, 2); 
> server.setDefaultCollection(collectionName);
> server.connect();
> CoreAdminRequest.Create create = new CoreAdminRequest.Create(); 
> create.setCoreName("myColl"); create.setCollection("myColl"); 
> create.setInstanceDir("defaultDir");
> create.setDataDir("myCollData");
> create.setNumShards(2);
> create.process(server); //Exception No live SolrServers  is thrown 
> here
> 
> 
> Thanks
> Jay
> 
> 
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, January 04, 2013 6:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4 exceptions on trying to create a collection
> 
> Tried Wireshark yet to see what host/port it is trying to connect and why
it fails? It is a complex tool, but well worth learning.
> 
> Regards,
>  Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all 
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD 
> book)
> 
> 
> On Fri, Jan 4, 2013 at 6:58 PM, Jay Parashar 
wrote:
> 
>> Thanks! I had a different version of httpclient in the classpath. So 
>> the 2nd exception is gone but now I am  back to the first one "
>> org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
>> available to handle this request"
>> 
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Friday, January 04, 2013 4:21 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr 4 exceptions on trying to create a collection
>> 
>> For the second one:
>> 
>> Wrong version of library on a classpath or multiple versions of 
>> library on the classpath which causes wrong classes with missing 
>> fields/variables? Or library interface baked in and the 
>> implementation is newer. Some sort of mismatch basically. Most probably
in Apache http library.
>> 
>> Regards,
>>   Alex.
>> 
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all 
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via 
>> GTD
>> book)
>> 
>> 
>> On Fri, Jan 4, 2013 at 4:34 PM, Jay Parashar 
>> wrote:
>> 
>>> 
>>> Hi All,
>>> 
>>> I am getting exceptions on trying to create a collection. Any help 
>>> is appreciated.
>>> 
>>> While trying to create a collection, I got this error Caused by:
>>> org.apache.solr.client.solrj.SolrServerException: No live 
>>> SolrServers available to handle this request
>>>at
>>> 
>>> 
>>
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.
>>> java:322)
>>>at
>>> 
>>> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrS
>>> er
>>> ver.ja
>>> va:257)
>>>at
>>> 
>>> org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAd
>>> mi
>>> nReque
>>> st.java:423)
>>> 
>>> 
>>> O

Solr Cloud Autosuggest not working

2013-01-08 Thread Jay Parashar
I recently migrated to Solr Cloud (4.0.0 from 3.6.0) and my auto suggest
feature does not seem to be working. It is a typical implementation with a
"/suggest" searchHandler defined in the config.
Are there any changes I need to incorporate?

Regards
Jay



Re: DIH fails after processing roughly 10 million records

2013-01-08 Thread Shawn Heisey
> A recent jira issue (LUCENE-4661) changed the maxThreadCount to 1 for
> better performance, so I'm not sure if both of my changes above are
> actually required or if just maxMergeCount will fix it.  I commented on
> the issue to find out.

Discussion on the issue has suggested that a maxThreadCount of 1 and a
maxMergeCount of 6 will probably make sure this issue never happens and
that I get the best possible performance for spinning-magnet disks.

I will be testing this theory when I make it into work today.

Thanks,
Shawn





wildcard faceting in solr cloud

2013-01-08 Thread jmozah
Hi

I am performing wildcard faceting using the patch in SOLR-247 on solr 4.0.

It works like a charm in a single instance...
But it does not work in a distributed mode...

Am I missing something?

./zahoor






Re: Solr 4 exceptions on trying to create a collection

2013-01-08 Thread Per Steffensen

JIRA about the fix for 4.1: https://issues.apache.org/jira/browse/SOLR-4140

On 1/8/13 4:01 PM, Jay Parashar wrote:

Thanks Mark...I will use it with 4.1. For now, I used httpclient to call the
Collections api directly (do a Get on
http://127.0.0.1:8983/solr/admin/collections?action=CREATE etc). This is
working.





RE: wildcard faceting in solr cloud

2013-01-08 Thread Michael Ryan
I'd guess that the patch simply doesn't implement it for distributed searches. 
The code for distributed facets is quite a bit more complicated, and I don't 
see it touched in this patch.

-Michael

-Original Message-
From: jmozah [mailto:jmo...@gmail.com] 
Sent: Tuesday, January 08, 2013 10:51 AM
To: solr-user@lucene.apache.org
Subject: wildcard faceting in solr cloud

Hi

I am performing wildcard faceting using the patch in SOLR-247 on solr 4.0.

It works like a charm in a single instance...
But it does not work in a distributed mode...

Am I missing something?

./zahoor






Re: wildcard faceting in solr cloud

2013-01-08 Thread jmozah

I can try to bump it for distributed search... 
Some pointers on where to start would be helpful...
Would SOLR-2894 be a good place to start looking?

./Zahoor

On 08-Jan-2013, at 9:27 PM, Michael Ryan  wrote:

> I'd guess that the patch simply doesn't implement it for distributed 
> searches. The code for distributed facets is quite a bit more complicated, 
> and I don't see it touched in this patch.
> 
> -Michael
> 
> -Original Message-
> From: jmozah [mailto:jmo...@gmail.com] 
> Sent: Tuesday, January 08, 2013 10:51 AM
> To: solr-user@lucene.apache.org
> Subject: wildcard faceting in solr cloud
> 
> Hi
> 
> I am performing wildcard faceting using the patch in SOLR-247 on solr 4.0.
> 
> It works like a charm in a single instance...
> But it does not work in a distributed mode...
> 
> Am I missing something?
> 
> ./zahoor
> 
> 
> 
> 



Re: Solr Cloud Autosuggest not working

2013-01-08 Thread Mark Miller
I think distrib with components has to be set up a little differently - you 
might need to use shards.qt to point back to the same request handler for the 
sub searches. Just a guess - been a while since I've looked at spellcheck 
distrib support, and I'm not 100% positive the suggest stuff is all distrib 
capable - though I think it should be.
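
Something like this in the handler's defaults (a sketch, not a config I've
verified), or just append &shards.qt=/suggest to the request:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- route the per-shard sub-requests back through this same handler -->
    <str name="shards.qt">/suggest</str>
    ...
  </lst>
  ...
</requestHandler>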

- Mark

On Jan 8, 2013, at 10:06 AM, Jay Parashar  wrote:

> I recently migrated to Solr Cloud (4.0.0 from 3.6.0) and my auto suggest
> feature does not seem to be working. It is a typical implementation with a
> "/suggest" searchHandler defined on the config.
> Are there any changes I need to incorporate?
> 
> Regards
> Jay
> 



Re: fieldtype for name

2013-01-08 Thread Otis Gospodnetic
Or if synonyms are involved, which they likely aren't in this case,
although for name matching I'd think one would want them, perhaps on
another copy of the name field to allow strict vs. "nickname" matching.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 8, 2013 9:35 AM, "Shawn Heisey"  wrote:

> On 1/8/2013 7:30 AM, Michael Jones wrote:
>
>> Hi,
>>
>> What would be the best fieldtype for a person's name? At the moment I'm
>> using text_general, but if I search for bob smith, some results I get back
>> might be rob thomas, in that it's matched 'ob'.
>>
>> But I only really want results that are either
>>
>> 'bob smith'
>> 'bob, smith'
>> 'smith, bob'
>> 'smith bob'
>>
>
> A search for bob smith could only match rob thomas if the fieldtype
> includes the edge ngram filter, or if you do a fuzzy term search, or if the
> fieldtype includes a stemming filter that turns bob, robert, and rob into
> the same root word.  Eliminate all those things and it should work like you
> expect.
>
> Thanks,
> Shawn
>
>


SolrJ DirectXmlRequest

2013-01-08 Thread Ryan Josal
I have encountered an issue where using DirectXmlRequest to index data on a 
remote host eventually results in running out of temp disk space in the 
java.io.tmpdir directory.  This occurs when I process a sufficiently large 
batch of files.  About 30% of the temporary files end up permanent.  The 
filenames look like: upload__2341cdae_13c02829b77__7ffd_00029003.tmp.  Has 
anyone else had this happen before?  The relevant code is:

DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
up.process(solr);

where `xml` is a String containing Solr formatted XML, and `solr` is the 
SolrServer.  When disk space is eventually exhausted, this is the error message 
that is repeatedly seen on the master host:

2013-01-07 19:22:16,911 [http-bio-8090-exec-2657] [] ERROR 
org.apache.solr.servlet.SolrDispatchFilter  [] - 
org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing 
of multipart/form-data request failed. No space left on device
at 
org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
at 
org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
at 
org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
at 
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
at 
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
... truncated stack trace

I am running Solr 3.6 on an Ubuntu 12.04 server.  I am considering working 
around this by pulling out as much as I can from XMLLoader into my client, and 
processing the XML myself into SolrInputDocuments for indexing, but this is 
certainly not ideal.
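
A rough sketch of that fallback (names are mine; it ignores boosts and
anything else XMLLoader handles beyond plain <field> values):

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

List<SolrInputDocument> parseAddXml(String xml) throws Exception {
    // parse the <add><doc><field name="...">...</field></doc></add> payload
    Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    NodeList docNodes = dom.getElementsByTagName("doc");
    for (int i = 0; i < docNodes.getLength(); i++) {
        SolrInputDocument sdoc = new SolrInputDocument();
        NodeList fields = ((Element) docNodes.item(i)).getElementsByTagName("field");
        for (int j = 0; j < fields.getLength(); j++) {
            Element f = (Element) fields.item(j);
            // repeated field names naturally become multiValued values
            sdoc.addField(f.getAttribute("name"), f.getTextContent());
        }
        docs.add(sdoc);
    }
    return docs;
}

// then: solr.add(parseAddXml(xml)); -- no multipart upload, no temp files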

Ryan


RE: Solr Cloud Autosuggest not working

2013-01-08 Thread Jay Parashar
Thanks Mark!

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, January 08, 2013 10:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud Autosuggest not working

I think distrib with components has to be set up a little differently - you
might need to use shards.qt to point back to the same request handler for
the sub searches. Just a guess - been a while since I've looked at
spellcheck distrib support and I'm not 100% positive the suggest stuff is
all distrib capable - though I think it should be.

- Mark

On Jan 8, 2013, at 10:06 AM, Jay Parashar  wrote:

> I recently migrated to Solr Cloud (4.0.0 from 3.6.0) and my auto 
> suggest feature does not seem to be working. It is a typical 
> implementation with a "/suggest" searchHandler defined on the config.
> Are there any changes I need to incorporate?
> 
> Regards
> Jay
> 



solr invalid date string

2013-01-08 Thread eShard
I'm currently running Solr 4.0 alpha with ManifoldCF v1.1 dev.
Manifold is sending Solr the datetime as milliseconds elapsed since
1-1-1970.
I've tried setting several date.formats in the extraction handler, but the
ManifoldCF crawl aborts and I always get this error:

SolrCore org.apache.solr.common.SolrException: Invalid Date
String:'134738361' at
org.apache.solr.schema.DateField.parseMath(DateField.java:174) at
org.apache.solr.schema.TrieField.createField(TrieField.java:540)

here's my extraction handler:
<requestHandler name="/update/extract"
class="org.apache.solr.handler.extraction.ExtractingRequestHandler">

  text
  solr.title
  solr.name
  link
  pubdate
  summary
  comments
  published
  
  last_modified
  attr_
  true
  ignored_

 
  yyyy-MM-dd
  yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

 
  


here's pubdate in the schema:

<field name="pubdate" type="tdate" indexed="true" stored="true"/>

the dates are already in UTC; they're just in milliseconds...

What am I doing wrong?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr invalid date string

2013-01-08 Thread Chris Hostetter
: Manifold is sending Solr the datetime as milliseconds elapsed since
: 1-1-1970.

Hmm... are you certain there is no way to change ManifoldCF to send the 
date in ISO-8601 canonical so Solr can handle it natively?

: I've tried setting several date.formats in the extraction handler but I

Are you sure ExtractingRequestHandler is what ManifoldCF expects to send 
the documents to? I thought ManifoldCF took care of all the parsing and 
send in structured documents?

If it is ExtractingRequestHandler you need to use, then i think you are 
misunderstanding the point of date.formats...

: <lst name="date.formats">
:   <str>yyyy-MM-dd</str>
:   <str>yyyy-MM-dd'T'HH:mm:ss.SSS'Z'</str>
: </lst>

...my understanding is that you should be configuring date.formats with 
the formats you want ExtractingRequestHandler to use to *parse* your raw 
input -- ie: yyyy-MM-dd is what you specify if the raw values in your 
document will look like 2001-12-30...

http://wiki.apache.org/solr/ExtractingRequestHandler

Unfortunately, I don't believe SimpleDateFormat has a pattern for 
specifying "millis since epoch" ... so you may need to use a custom 
UpdateRequestProcessor for this.
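Such a processor can be quite small.  A minimal, untested sketch ("pubdate"
is the field from your config; everything else here is assumption, and it
still needs a trivial UpdateRequestProcessorFactory plus an
updateRequestProcessorChain entry in solrconfig.xml to wire it in):

import java.io.IOException;
import java.util.Date;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class MillisToDateProcessor extends UpdateRequestProcessor {

  public MillisToDateProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object raw = doc.getFieldValue("pubdate");
    if (raw != null) {
      try {
        // Interpret the raw value as millis since the epoch and replace it
        // with a java.util.Date, which the date field types accept.
        doc.setField("pubdate", new Date(Long.parseLong(raw.toString())));
      } catch (NumberFormatException e) {
        // Not a plain number - leave the value for Solr to parse as usual.
      }
    }
    super.processAdd(cmd);
  }
}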

I would sanity check on the ManifoldCF user list that there isn't 
already an easier way to do this on the client side.


-Hoss


Re: wildcard faceting in solr cloud

2013-01-08 Thread jmozah
Hmm. Fixed it.

Did a similar thing as SOLR-247 for distributed search.
Basically modified the FacetInfo handling in FacetComponent.java to make it
work.. :-)

./zahoor


On 08-Jan-2013, at 9:35 PM, jmozah  wrote:

> 
> I can try to bump it for distributed search... 
> Some pointers on where to start would be helpful...
> Can SOLR-2894 be a good start to look at this?
> 
> ./Zahoor
> 
> On 08-Jan-2013, at 9:27 PM, Michael Ryan  wrote:
> 
>> I'd guess that the patch simply doesn't implement it for distributed 
>> searches. The code for distributed facets is quite a bit more complicated, 
>> and I don't see it touched in this patch.
>> 
>> -Michael
>> 
>> -Original Message-
>> From: jmozah [mailto:jmo...@gmail.com] 
>> Sent: Tuesday, January 08, 2013 10:51 AM
>> To: solr-user@lucene.apache.org
>> Subject: wildcard faceting in solr cloud
>> 
>> Hi
>> 
>> I am performing wildcard faceting using the patch in SOLR-247 on solr 4.0.
>> 
>> It works like a charm in a single instance...
>> But it does not work in a distributed mode...
>> 
>> Am i missing something?
>> 
>> ./zahoor
>> 
>> 
>> 
>> 
> 



Re: solr invalid date string

2013-01-08 Thread eShard
I'll certainly ask the ManifoldCF folks if they can send the date in the
correct format.
Meanwhile, how would I create an update processor to change the format of a
date? Are there any decent examples out there?

thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661p4031669.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr invalid date string

2013-01-08 Thread Erik Hatcher
One quick and not so dirty way to do this is to use the script update 
processor (StatelessScriptUpdateProcessorFactory).  Oops, sorry the wiki is a 
bit sparse currently, but the feature is "documented" in the Solr 4.0 release 
as collection1/conf/update-script.js in the Solr example.  It'll probably 
require a little trial and error to get the script to do what you want just 
right, but shouldn't be too bad.  Let us know if you try this and what effect 
it has on your indexing performance (that's my biggest concern with this new 
feature).

You can also write the same sort of thing as an UpdateProcessor in Java, 
plugged into Solr more natively and surely also more performantly (but 
ideally only marginally so, ultimately).

Erik



On Jan 8, 2013, at 15:21 , eShard wrote:

> I'll certainly ask the ManifoldCF folks if they can send the date in the
> correct format.
> Meanwhile, how would I create an update processor to change the format of
> a date? Are there any decent examples out there?
> 
> thanks,
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-invalid-date-string-tp4031661p4031669.html
> Sent from the Solr - User mailing list archive at Nabble.com.



DIH clean=true behavior in SolrCloud

2013-01-08 Thread jimtronic
I'm confused about the behavior of clean=true using the DataImportHandler.

When I use clean=true on just one instance, it doesn't blow all the data out
until the import succeeds. In a cluster, however, it appears to blow all the
data out of the other nodes first, then starts adding new docs.

Am I wrong about this?

Jim




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-clean-true-behavior-in-SolrCloud-tp4031680.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: wildcard faceting in solr cloud

2013-01-08 Thread Upayavira
Have you uploaded a patch to JIRA???

Upayavira

On Tue, Jan 8, 2013, at 07:57 PM, jmozah wrote:
> Hmm. Fixed it.
> 
> Did a similar thing as SOLR-247 for distributed search.
> Basically modified the FacetInfo handling in FacetComponent.java to
> make it work.. :-)
> 
> ./zahoor
> 
> 
> On 08-Jan-2013, at 9:35 PM, jmozah  wrote:
> 
> > 
> > I can try to bump it for distributed search... 
> > Some pointers on where to start would be helpful...
> > Can SOLR-2894 be a good start to look at this?
> > 
> > ./Zahoor
> > 
> > On 08-Jan-2013, at 9:27 PM, Michael Ryan  wrote:
> > 
> >> I'd guess that the patch simply doesn't implement it for distributed 
> >> searches. The code for distributed facets is quite a bit more complicated, 
> >> and I don't see it touched in this patch.
> >> 
> >> -Michael
> >> 
> >> -Original Message-
> >> From: jmozah [mailto:jmo...@gmail.com] 
> >> Sent: Tuesday, January 08, 2013 10:51 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: wildcard faceting in solr cloud
> >> 
> >> Hi
> >> 
> >> I am performing wildcard faceting using the patch in SOLR-247 on solr 4.0.
> >> 
> >> It works like a charm in a single instance...
> >> But it does not work in a distributed mode...
> >> 
> >> Am i missing something?
> >> 
> >> ./zahoor
> >> 
> >> 
> >> 
> >> 
> > 
> 


is there an easy way to upgrade from Solr 4 alpha to 4.0 final?

2013-01-08 Thread eShard
I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha)
I'm currently running Solr 4.0 alpha on Tomcat 7.
Is there an easy way to surgically replace files and upgrade? 
Or should I completely start over with a fresh install?
Ideally, I'm looking for a set of steps...
Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-an-easy-way-to-upgrade-from-Solr-4-alpha-to-4-0-final-tp4031682.html
Sent from the Solr - User mailing list archive at Nabble.com.


Maximum number of cores on a single Solr instance

2013-01-08 Thread Uomesh
Hi Solr Experts,

Could you please suggest how many cores we can have on a single Solr
instance? Suppose we have 8 slaves running in a load-balancing environment;
can we have around 800 cores on each slave instance?

I saw a pending request, SOLR-1293, which will support lots of cores, but
that is not yet fixed.

Thanks,
Umesh





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Maximum-number-of-cores-on-a-single-Solr-instance-tp4031683.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fieldtype for name

2013-01-08 Thread Uwe Reh

Hi Michael,

in our index of bibliographic metadata, we see the need for at least 
three fields:
- name_facet: String as type, because the facet should represent 
the original inverted format from our data.
- name: TextField for searching. This field is heavily analyzed to match 
different orders, to match synonyms, phonetic similarity, German umlauts 
and other European stuff.
- name_lc: TextField. This field is just mapped to lower case. It's used 
to boost docs with the same style of writing as the user's input.


Uwe

On 08.01.2013 15:30, Michael Jones wrote:

Hi,

What would be the best fieldtype for a person's name? At the moment I'm
using text_general, but if I search for bob smith, some results I get back
might be rob thomas, in that it's matched 'ob'.

But I only really want results that are either

'bob smith'
'bob, smith'
'smith, bob'
'smith bob'

Thanks





Re: is there an easy way to upgrade from Solr 4 alpha to 4.0 final?

2013-01-08 Thread Shawn Heisey

On 1/8/2013 2:27 PM, eShard wrote:

I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha)
I'm currently running Solr 4.0 alpha on Tomcat 7.
Is there an easy way to surgically replace files and upgrade?
Or should I completely start over with a fresh install?
Ideally, I'm looking for a set of steps...
Thanks,


For the most part, you should be able to just replace your .war file, 
erase the tomcat deployment directory (where it extracted any war 
files), and restart tomcat.  If you used any additional jar files from 
the Lucene/Solr distribution (dataimport handler, additional analyzers, 
etc., and any dependent jars) then you would have to also delete the old 
versions and copy the new versions.


If you have custom Lucene/Solr components, that is where you're most 
likely to run into trouble.  There were a number of internal Java API 
changes from alpha to beta to release that might affect those.


It's possible, but not super likely, that you might need to make config 
changes.  From what I've seen, the basics were mostly unchanged from 
ALPHA to release.  There should be a list of things that changed in 
CHANGES.txt that you can peruse for items that might affect your config.


It should go without saying, but I'll say it anyway: You should have 
enough redundancy so that you won't be down even if the upgrade goes 
badly on your secondary server(s), and you should also have good backups 
of *everything*, including your index files.


Thanks,
Shawn



Re: Maximum number of cores on a single Solr instance

2013-01-08 Thread Shawn Heisey

On 1/8/2013 2:33 PM, Uomesh wrote:

Hi Solr Experts,

Could you please suggest how many cores we can have on a single Solr
instance? Suppose we have 8 slaves running in a load-balancing environment;
can we have around 800 cores on each slave instance?

I saw a pending request, SOLR-1293, which will support lots of cores, but
that is not yet fixed.


There's really no way to answer your question.  It depends on resource 
availability (primarily I/O on disk and network, RAM, disk space) versus 
how much of those resources each core will consume.


Whether or not you can get 800 cores to perform well will depend on a 
lot of variables.  There are no inherent limitations in Solr that would 
prevent you from creating cores well beyond 800.


Thanks,
Shawn



coord missing from debugQuery explain?

2013-01-08 Thread Tom Burton-West
Hello,

I'm trying to understand some Solr relevance issues using debugQuery=on,
but I don't see the coord factor listed anywhere in the explain output.
My understanding is that the coord factor is not included in either the
querynorm or the fieldnorm.
What am I missing?

Tom


RE: coord missing from debugQuery explain?

2013-01-08 Thread Markus Jelsma
Hi Tom,

The coord is not written to the debug output if it's 1.0. Therefore single-term 
queries never show coord, and neither do documents that match all terms in a 
multi-term query.

You should see it for documents that do not match all of the terms in a 
multi-term query.

I do think it should be displayed regardless of its value, but you can safely 
assume it's 1.0 if you don't see it.
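
For reference, Lucene's DefaultSimilarity computes coord(q,d) = overlap /
maxOverlap, i.e. the number of query terms the document matches divided by
the total number of terms in the query. So a document matching 2 of 3 query
terms gets coord = 2/3, while a full match gets exactly 1.0.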

Cheers,
Markus

 
 
-Original message-
> From:Tom Burton-West 
> Sent: Tue 08-Jan-2013 23:03
> To: solr-user@lucene.apache.org
> Subject: coord missing from debugQuery explain?
> 
> Hello,
> 
> I'm trying to understand some Solr relevance issues using debugQuery=on,
> but I don't see the coord factor listed anywhere in the explain output.
> My understanding is that the coord factor is not included in either the
> querynorm or the fieldnorm.
> What am I missing?
> 
> Tom
> 


RE: is there an easy way to upgrade from Solr 4 alpha to 4.0 final?

2013-01-08 Thread Markus Jelsma
I am not sure this applies to alpha and final, but I do think upgrading from 4.0 
to 4.1 will give you trouble regarding data in Zookeeper. At least 
clusterstate.json has changed.

Check the appropriate Jira issues between alpha and final regarding Zookeeper 
or test to make sure it works.
 
-Original message-
> From:Shawn Heisey 
> Sent: Tue 08-Jan-2013 22:50
> To: solr-user@lucene.apache.org
> Subject: Re: is there an easy way to upgrade from Solr 4 alpha to 4.0 final?
> 
> On 1/8/2013 2:27 PM, eShard wrote:
> > I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha)
> > I'm currently running Solr 4.0 alpha on Tomcat 7.
> > Is there an easy way to surgically replace files and upgrade?
> > Or should I completely start over with a fresh install?
> > Ideally, I'm looking for a set of steps...
> > Thanks,
> 
> For the most part, you should be able to just replace your .war file, 
> erase the tomcat deployment directory (where it extracted any war 
> files), and restart tomcat.  If you used any additional jar files from 
> the Lucene/Solr distribution (dataimport handler, additional analyzers, 
> etc., and any dependent jars) then you would have to also delete the old 
> versions and copy the new versions.
> 
> If you have custom Lucene/Solr components, that is where you're most 
> likely to run into trouble.  There were a number of internal Java API 
> changes from alpha to beta to release that might affect those.
> 
> It's possible, but not super likely, that you might need to make config 
> changes.  From what I've seen, the basics were mostly unchanged from 
> ALPHA to release.  There should be a list of things that changed in 
> CHANGES.txt that you can peruse for items that might affect your config.
> 
> It should go without saying, but I'll say it anyway: You should have 
> enough redundancy so that you won't be down even if the upgrade goes 
> badly on your secondary server(s), and you should also have good backups 
> of *everything*, including your index files.
> 
> Thanks,
> Shawn
> 
> 


Re: is there an easy way to upgrade from Solr 4 alpha to 4.0 final?

2013-01-08 Thread Mark Miller
If it is a problem, you should be able to just stop your cluster and nuke that 
file in ZooKeeper, then start up with the new version.

- Mark

On Jan 8, 2013, at 5:09 PM, Markus Jelsma  wrote:

> I am not sure this applies to alpha and final, but I do think upgrading from 
> 4.0 to 4.1 will give you trouble regarding data in Zookeeper. At least 
> clusterstate.json has changed.
> 
> Check the appropriate Jira issues between alpha and final regarding Zookeeper 
> or test to make sure it works.
> 
> -Original message-
>> From:Shawn Heisey 
>> Sent: Tue 08-Jan-2013 22:50
>> To: solr-user@lucene.apache.org
>> Subject: Re: is there an easy way to upgrade from Solr 4 alpha to 4.0 final?
>> 
>> On 1/8/2013 2:27 PM, eShard wrote:
>>> I just found out I must upgrade to Solr 4.0 final (from 4.0 alpha)
>>> I'm currently running Solr 4.0 alpha on Tomcat 7.
>>> Is there an easy way to surgically replace files and upgrade?
>>> Or should I completely start over with a fresh install?
>>> Ideally, I'm looking for a set of steps...
>>> Thanks,
>> 
>> For the most part, you should be able to just replace your .war file, 
>> erase the tomcat deployment directory (where it extracted any war 
>> files), and restart tomcat.  If you used any additional jar files from 
>> the Lucene/Solr distribution (dataimport handler, additional analyzers, 
>> etc., and any dependent jars) then you would have to also delete the old 
>> versions and copy the new versions.
>> 
>> If you have custom Lucene/Solr components, that is where you're most 
>> likely to run into trouble.  There were a number of internal Java API 
>> changes from alpha to beta to release that might affect those.
>> 
>> It's possible, but not super likely, that you might need to make config 
>> changes.  From what I've seen, the basics were mostly unchanged from 
>> ALPHA to release.  There should be a list of things that changed in 
>> CHANGES.txt that you can peruse for items that might affect your config.
>> 
>> It should go without saying, but I'll say it anyway: You should have 
>> enough redundancy so that you won't be down even if the upgrade goes 
>> badly on your secondary server(s), and you should also have good backups 
>> of *everything*, including your index files.
>> 
>> Thanks,
>> Shawn
>> 
>> 



SOLR '0' Status: Communication Error

2013-01-08 Thread ddineshkumar

I am using Solr for indexing documents. I create the index from a MySQL
database, via PHP running on a WAMP server, using the Solr PHP client. When
I create the index from the server on which Solr is deployed, everything
works fine. But when I try to create the index from a different machine, I
get the following error:

'0' Status: Communication Error

I tried changing the PHP socket timeout, the Solr commitLockTimeout and the
Solr writeLockTimeout, but I still get the same error. When I create the
index from the Solr server itself, there is no error.

PHP version : 5.2.2 SOLR version : 1.4.1

Any idea on why this happens?

Thank you




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-0-Status-Communication-Error-tp4031698.html
Sent from the Solr - User mailing list archive at Nabble.com.


Clean Up Aged Index Using DeletionPolicy

2013-01-08 Thread hyrax
Hi folks, 
I'm using Solr 4.0.0 and trying to modify the example to build a search app.
So far it works fine.
However, I couldn't figure out how to clean up an old index, say an index
created 20 days ago.
I noticed the DeletionPolicy and I activated it by modifying solrconfig.xml,
adding:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
  <str name="maxCommitAge">3MINUTES</str>
</deletionPolicy>

I think the above should clean up index data created 3 minutes ago, but that
is definitely not the case...
What I want to test is that we create some indexes and 3 minutes later they
would be deleted by Solr.
Could someone tell me how to accomplish that?
Many many thanks

  -Hao



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clean-Up-Aged-Index-Using-DeletionPolicy-tp4031704.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: unittest fail (sometimes) for float field search

2013-01-08 Thread Roman Chyla
apparently, it fails also with @SuppressCodecs("Lucene3x")

roman


On Tue, Jan 8, 2013 at 6:15 PM, Roman Chyla  wrote:

> Hi,
>
> I have a float field 'read_count' - and unittest like:
>
> assertQ(req("q", "read_count:1.0"),
> "//doc/int[@name='recid'][.='9218920']",
> "//*[@numFound='1']");
>
> sometimes, the unittest will fail, sometimes it succeeds.
>
> @SuppressCodecs("Lucene3x")
>
> Seems to solve the issue, however I don't understand what's wrong. Is this
> behaviour expected?
>
> thanks,
>
>   roman
>
>
> INFO: Opening Searcher@752a2259 main
> 9.1.2013 06:51:32 org.apache.solr.search.SolrIndexSearcher getIndexDir
> WARNING: WARNING: Directory impl does not support setting indexDir:
> org.apache.lucene.store.MockDirectoryWrapper
> 9.1.2013 06:51:32 org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> 9.1.2013 06:51:32 org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener sending requests to 
> Searcher@752a2259main{StandardDirectoryReader(segments_2:3 _0(4.0.0.2):C30)}
> 9.1.2013 06:51:32 org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener done.
> 9.1.2013 06:51:32 org.apache.solr.core.SolrCore registerSearcher
> INFO: [collection1] Registered new searcher 
> Searcher@752a2259main{StandardDirectoryReader(segments_2:3 _0(4.0.0.2):C30)}
>


Re: SOLR '0' Status: Communication Error

2013-01-08 Thread Shawn Heisey

On 1/8/2013 3:11 PM, ddineshkumar wrote:


I am using Solr for indexing documents. I create the index from a MySQL
database, via PHP running on a WAMP server, using the Solr PHP client. When
I create the index from the server on which Solr is deployed, everything
works fine. But when I try to create the index from a different machine, I
get the following error:

'0' Status: Communication Error

I tried changing the PHP socket timeout, the Solr commitLockTimeout and the
Solr writeLockTimeout, but I still get the same error. When I create the
index from the Solr server itself, there is no error.


When this happens, it is usually firewall software blocking the 
connection.  If that doesn't seem to be the case, get a packet capture 
at the solr server and look to see what that indicates.


Thanks,
Shawn



Re: unittest fail (sometimes) for float field search

2013-01-08 Thread Chris Hostetter

: apparently, it fails also with @SuppressCodecs("Lucene3x")

what exactly is the test failure message?

When you run tests that use the lucene test framework, any failure should 
include information about the random seed used to run the test -- that 
random seed affects things like the codec used, the directoryfactory used, 
etc...

Can you confirm whether the test reliably passes/fails consistently when 
you reuse the same seed?
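
For reference, with the stock Lucene/Solr build you can pin the seed on a
re-run with something like the following (the class pattern and seed value
here are just placeholders):

ant test -Dtests.class="*.MyFailingTest" -Dtests.seed=DEADBEEF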

Can you elaborate more on what exactly your test does? ... we probably 
need to see the entire test to make sense of why you might get 
inconsistent failures.



-Hoss


Re: Clean Up Aged Index Using DeletionPolicy

2013-01-08 Thread Shawn Heisey

On 1/8/2013 3:38 PM, hyrax wrote:

Hi folks,
I'm using Solr 4.0.0 and trying to modify the example to build a search app.
So far it works fine.
However, I couldn't figure out how to clean up an old index, say an index
created 20 days ago.
I noticed the DeletionPolicy and I activated it by modifying solrconfig.xml,
adding:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
  <str name="maxCommitAge">3MINUTES</str>
</deletionPolicy>

I think the above should clean up index data created 3 minutes ago, but that
is definitely not the case...
What I want to test is that we create some indexes and 3 minutes later they
would be deleted by Solr.


Because you have maxCommitsToKeep set to 1 (the default), the deletion 
policy will never really do anything out of the ordinary, and there will 
only ever be one commit point.  What the deletionPolicy does is let 
Lucene keep track of multiple commits (previous versions of the index at 
each time an index commit is done), and decide how many of the old ones 
to keep around.  It is not keeping track of multiple indexes.


If you are saying that the index should delete any document older than 3 
minutes, I do not think you can get Solr to do this on the server side. 
 You'll have to set up a client process that connects and deletes 
whatever data you wish to delete - if you can construct a query that 
identifies documents older than 3 minutes, you can do a deleteByQuery 
using that query.
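
For example, a cron-driven client could be as small as this (a SolrJ 4.x
sketch - the "timestamp" field name is an assumption; you would need such an
indexed date field recording when each document was added):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PurgeOldDocs {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    // Delete everything whose timestamp is more than 3 minutes old,
    // then commit so the deletes become visible.
    solr.deleteByQuery("timestamp:[* TO NOW-3MINUTES]");
    solr.commit();
    solr.shutdown();
  }
}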


Thanks,
Shawn



Re: Do we have some sort of recomposing token filter?

2013-01-08 Thread Steve Rowe
Hi Alexandre,

CombiningFilter sounds close (no option to put spaces between original terms), 
but hasn't yet been committed.

Steve

On Jan 8, 2013, at 4:55 PM, Alexandre Rafalovitch  wrote:

> Hello,
> 
> I want to take a composite email address  such as "John Doe <
> john...@example.com>" and leave "John Doe" as a facet field.
> 
> So far, I got UAX29 Tokenizer combined with TypeTokenFilterFactory to
> filter out email type.
> 
> But that leaves with "John" and "Doe" as tokens which I cannot figure out
> how to combine back with extra space to make it back into John Doe.
> 
> I thought about using a regexp instead to just strip the <...> part, but
> that feels even less robust.
> 
> Do we have anything ready to use for that or do I need to custom code?
> 
> Regards,
>   Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: Is there faceting with Solr 4 spatial?

2013-01-08 Thread Erick Erickson
For facets, doesn't
http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&facet=on
&facet.query={!frange l=0 u=3}geodist(store,45.15,-93.85)
&facet.query={!frange l=3.001 u=4}geodist(store,45.15,-93.85)
&facet.query={!frange l=4.001 u=5}geodist(store,45.15,-93.85)

work? (from
http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance)

Although I also confess to being really unfamiliar with all things
geodist...

Best
Erick


On Tue, Jan 8, 2013 at 4:02 AM, Alexandre Rafalovitch wrote:

> Hello,
>
> I am trying to understand the new Solr 4 spatial type and what it can do. I
> sort of understand the old implementation, though also far from well.
>
> The use case is to have companies that have multiple offices, for which I
> indexed locations. I then want to do a 'radar' style ranges/facets, so I
> can say "show me everything in 100k, in 300k, etc". The wiki page for old
> implementation shows how to do it, but I am having troubles figuring this
> out for new implementation.
>
> Regards,
>Alex.
> P.s. "Not yet possible", "wait till 4.1/5", etc are perfectly valid
> shortest answers for me, at this stage.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>


Re: unittest fail (sometimes) for float field search

2013-01-08 Thread Roman Chyla
The test checks we are properly getting/indexing data  - we index database
and fetch parts of the documents separately from mongodb. You can look at
the file here:
https://github.com/romanchyla/montysolr/blob/3c18312b325874bdecefceb9df63096b2cf20ca2/contrib/adsabs/src/test/org/apache/solr/update/TestAdsDataImport.java

But your comment made me run the tests on the command line, and I see I
can't make them fail (they fail only inside Eclipse). Sorry, I should have
tried that myself, but I am so used to running unit tests inside Eclipse that
it didn't occur to me... I'll try to find out what is going on...

thanks,

  roman



On Tue, Jan 8, 2013 at 6:53 PM, Chris Hostetter wrote:

>
> : apparently, it fails also with @SuppressCodecs("Lucene3x")
>
> what exactly is the test failure message?
>
> When you run tests that use the lucene test framework, any failure should
> include information about the random seed used to run the test -- that
> random seed affects things like the codec used, the directoryfactory used,
> etc...
>
> Can you confirm whether the test reliably passes/fails consistently when
> you reuse the same seed?
>
> Can you elaborate more on what exactly your test does? ... we probably
> need to see the entire test to make sense of why you might get
> inconsistent failures.
>
>
>
> -Hoss
>


Re: Error on using the projection parameter - fl - in Solr 4

2013-01-08 Thread Erick Erickson
You really have a field name with '@' symbols in it? If it worked in 3.6,
it was probably not intentional, classic "undocumented behavior".

The first thing I'd try is replacing the @ with __ in my schema...

Best
Erick

On Tue, Jan 8, 2013 at 6:58 AM, samarth s wrote:

> q=*:*&fl=E_abc@@xyz


Convert Complex Lucene Query to SolrQuery

2013-01-08 Thread Jagdish Nomula
Hello Solr Users,

I am trying to convert a complex Lucene query to a SolrQuery to use it
in an EmbeddedSolrServer instance.

I have tried the regular toString() method without success. Is there any
suggested method to do this?

Greatly appreciate the response.


Thanks,




-- 
Jagadish Nomula - Senior Manager Search
Simply Hired, Inc.
370 San Aleso Ave, Ste. 200
Sunnyvale, CA 94085

simplyhired.com


Re: Convert Complex Lucene Query to SolrQuery

2013-01-08 Thread Otis Gospodnetic
Hi Jagdish,

So when you use the Lucene parser through Solr you get a different query
than if you use Lucene's QP directly?  Maybe you can share your raw/English
query?

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 8, 2013 9:14 PM, "Jagdish Nomula"  wrote:

> Hello Solr Users,
>
> I am trying to convert a complex Lucene query to a SolrQuery to use it
> in an EmbeddedSolrServer instance.
>
> I have tried the regular toString() method without success. Is there any
> suggested method to do this?
>
> Greatly appreciate the response.
>
>
> Thanks,
>
>
>
>
> --
> Jagadish Nomula - Senior Manager Search
> Simply Hired, Inc.
> 370 San Aleso Ave, Ste. 200
> Sunnyvale, CA 94085
>
> simplyhired.com
>


Analysing Solr Log Files

2013-01-08 Thread deniz
Hi All,

I want to analyze the Solr log file... What I want to do is put all the
queries coming to the server into a log file, on a daily or hourly basis,
and then run a tool to produce analyses like the most used fields or
queries, the queries which have hits, and so on... Are there any tools with
which I can do this without modifying the Solr source code? Or do I need to
find a third-party tool or write my own code to process the output in the logs? 




-
Smart, but he doesn't work... If he worked, he would do it...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Analysing-Solr-Log-Files-tp4031746.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Analysing Solr Log Files

2013-01-08 Thread Otis Gospodnetic
Deniz,

Look at Sematext Search Analytics service, it does that and a lot more.
It's free. URL below.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 8, 2013 9:23 PM, "deniz"  wrote:

> Hi All,
>
> I want to analyze the Solr log file... What I want to do is put all the
> queries coming to the server into a log file, on a daily or hourly basis,
> and then run a tool to produce analyses like the most used fields or
> queries, the queries which have hits, and so on... Are there any tools
> with which I can do this without modifying the Solr source code? Or do I
> need to find a third-party tool or write my own code to process the
> output in the logs?
>
>
>
>
> -
> Smart, but he doesn't work... If he worked, he would do it...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Analysing-Solr-Log-Files-tp4031746.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Analysing Solr Log Files

2013-01-08 Thread deniz
Thank you Otis,

I have used Sematext's trial version, but it requires sending log files to
another URL (correct me if I am wrong :) ). I need something which could run
locally - something that would be triggered by a cronjob, or that could
somehow be integrated with the admin interface.



-
Smart, but he doesn't work... If he worked, he would do it...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Analysing-Solr-Log-Files-tp4031746p4031748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Do we have some sort of recomposing token filter?

2013-01-08 Thread Otis Gospodnetic
Hi,

Are you just trying to extract the personal name? I think Java Mail has the
ability to do that.
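
Roughly like this (a minimal sketch - it assumes the JavaMail jar is on the
classpath, and the address string is a made-up example):

import javax.mail.internet.InternetAddress;

public class ExtractPersonalName {
  public static void main(String[] args) throws Exception {
    // Parse an RFC 822 address of the form "Personal Name <address>".
    InternetAddress addr = new InternetAddress("John Doe <jdoe@example.com>");
    System.out.println(addr.getPersonal()); // prints: John Doe
  }
}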

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 8, 2013 4:56 PM, "Alexandre Rafalovitch"  wrote:

> Hello,
>
> I want to take a composite email address  such as "John Doe <
> john...@example.com>" and leave "John Doe" as a facet field.
>
> So far, I got UAX29 Tokenizer combined with TypeTokenFilterFactory to
> filter out email type.
>
> But that leaves with "John" and "Doe" as tokens which I cannot figure out
> how to combine back with extra space to make it back into John Doe.
>
> I thought about using a regexp instead to just strip the <...> part, but
> that feels even less robust.
>
> Do we have anything ready to use for that or do I need to custom code?
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>


Re: SolrJ DirectXmlRequest

2013-01-08 Thread Otis Gospodnetic
Hi Ryan,

I'm not sure what is creating those upload files - something in Solr? Or
Tomcat?

Why not specify a different temp dir via a system property on the command
line (e.g. -Djava.io.tmpdir=/path/with/more/space)?

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 8, 2013 12:17 PM, "Ryan Josal"  wrote:

> I have encountered an issue where using DirectXmlRequest to index data on
> a remote host results in eventually running out have temp disk space in the
> java.io.tmpdir directory.  This occurs when I process a sufficiently large
> batch of files.  About 30% of the temporary files end up permanent.  The
> filenames look like: upload__2341cdae_13c02829b77__7ffd_00029003.tmp.  Has
> anyone else had this happen before?  The relevant code is:
>
> DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
> up.process(solr);
>
> where `xml` is a String containing Solr formatted XML, and `solr` is the
> SolrServer.  When disk space is eventually exhausted, this is the error
> message that is repeatedly seen on the master host:
>
> 2013-01-07 19:22:16,911 [http-bio-8090-exec-2657] [] ERROR
> org.apache.solr.servlet.SolrDispatchFilter  [] -
> org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
> Processing of multipart/form-data request failed. No space left on device
> at
> org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
> at
> org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
> at
> org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
> at
> org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
> at
> org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> ... truncated stack trace
>
> I am running Solr 3.6 on an Ubuntu 12.04 server.  I am considering working
> around this by pulling out as much as I can from XMLLoader into my client,
> and processing the XML myself into SolrInputDocuments for indexing, but
> this is certainly not ideal.
>
> Ryan
>


Re: Convert Complex Lucene Query to SolrQuery

2013-01-08 Thread Jack Krupansky
How complex? Does it use any of the more advanced Query Types or detailed 
options that are not supported in the Solr query syntax?


What specific problems did you have?

-- Jack Krupansky

-Original Message- 
From: Jagdish Nomula

Sent: Tuesday, January 08, 2013 9:13 PM
To: solr-user@lucene.apache.org
Subject: Convert Complex Lucene Query to SolrQuery

Hello Solr Users,

I am trying to convert a complex Lucene query to a SolrQuery to use it
in an EmbeddedSolrServer instance.

I have tried the regular toString() method without success. Is there any
suggested method to do this?

Greatly appreciate the response.


Thanks,




--
Jagadish Nomula - Senior Manager Search
Simply Hired, Inc.
370 San Aleso Ave, Ste. 200
Sunnyvale, CA 94085

simplyhired.com 



RE: Hotel Searches

2013-01-08 Thread Harshvardhan Ojha
Hi Alex,

Thanks for your reply.
I saw "prices based on daterange using multipoints ". But this is not my 
problem. Instead the problem statement for me is pretty simple.

Say I have 100 documents, each having tariff as a field.

Doc1:
  tariff = 2400.0

Doc2:
  tariff = 2500.0

Now a user's search should give me the total tariff.

Desired result:
  tariff = 4900.0

And this could be any contiguous combination: for 100 docs that is
(100*101)/2 = 5050 ranges, i.e. N(N+1)/2 in general.

How can I get these combinations of documents already indexed?
Or is there any way to do the calculations at runtime?

How can I enforce the constraint that if any one doc is missing in a range I
don't get any result? (If a user asked for the hotel tariff from the 11th to
the 13th, and I don't have a tariff for the 12th, I shouldn't just add the
11th and the 13th.)

Hope I made my problem very simple.

Regards
Harshvardhan Ojha

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Tuesday, January 08, 2013 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Hotel Searches

Did you look at a conversation thread from 12 Dec 2012 on this list? Just go to 
the archives and search for 'hotel'. Hopefully that will give you something to 
work with.

If you have any questions after that, come back with more specifics.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jan 8, 2013 at 7:18 AM, Harshvardhan Ojha < 
harshvardhan.o...@makemytrip.com> wrote:

> Sorry for that, we just spoiled that thread so posted my question in a 
> fresh thread.
>
> Problem is indeed very simple.
> I have Solr documents which have all the required fields (from the db).
> Say DOC1, DOC2, DOC3, ..., DOCn.
>
> Every document has 1 night's tariff, and I have 180 nights of tariffs.
> So a person can search for any combination in these 180 nights.
>
> Say a request came to me to give the total tariff for the 10th to the 15th
> of Jan 2013. Now I need to get the sum of the tariff field of 6 docs.
>
> So how can I keep this data indexed, to avoid search-time calculation,
> and there are other dimensions of this data also besides tariff.
> Hope this makes sense.
>
> Regards
> Harshvardhan Ojha
>
> -Original Message-
> From: Gora Mohanty [mailto:g...@mimirtech.com]
> Sent: Tuesday, January 08, 2013 5:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Hotel Searches
>
> On 8 January 2013 17:10, Harshvardhan Ojha < 
> harshvardhan.o...@makemytrip.com> wrote:
> > Hi All,
> >
> > Looking into a finding solution for Hotel searches based on the 
> > below criteria's
> [...]
>
> Didn't you just post this on a separate thread, complete with some 
> nonsensical follow-up from a colleague of yours? Please do not repost 
> the same message over and over again.
>
> It is not clear what you are trying to achieve.
> What is the difference between a city and a hotel in your data? How is 
> a person represented in your documents? Is it by the ID field?
>
> Are you looking to cache all possible combinations of ID, city, and 
> startdate? If so, to what end?  This smells like a XY problem:
> http://people.apache.org/~hossman/#xyproblem
>
> Regards,
> Gora
>


How to run many MoreLikeThis request efficiently?

2013-01-08 Thread Yandong Yao
Hi Solr Guru,

I have two sets of documents in one SolrCore; each set has about 1M
documents, with different document types, say 'type1' and 'type2'.

Many documents in the first set are very similar to 1 or 2 documents in the
second set. What I want to get is: for each document in set 2, return the
most similar document in set 1, using either 'MoreLikeThisHandler' or
'MoreLikeThisComponent'.

Currently I use the following code to get the result, but it sends far too
many requests to the Solr server serially.  Is there any way to enhance this
besides using multi-threading?  Thanks very much!

for each document in set 2 whose type is 'type2'
run MoreLikeThis request against Solr server and get the most similar
document
end.
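
In SolrJ that inner request might look roughly like this (a sketch - the
/mlt handler name and the id/type/text field names are all assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class MostSimilar {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("id:doc42"); // one set-2 document
    q.setRequestHandler("/mlt");             // MoreLikeThisHandler
    q.set("mlt.fl", "text");                 // field(s) to compute similarity on
    q.setFilterQueries("type:type1");        // restrict results to set 1
    q.setRows(1);                            // only the single most similar doc
    QueryResponse rsp = solr.query(q);
    for (SolrDocument d : rsp.getResults()) {
      System.out.println(d.getFieldValue("id"));
    }
  }
}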

Regards,
Yandong


Re: Hotel Searches

2013-01-08 Thread Gora Mohanty
On 8 January 2013 17:48, Harshvardhan Ojha
 wrote:
> Sorry for that, we just spoiled that thread so posted my question in a fresh 
> thread.
>
> Problem is indeed very simple.
> I have Solr documents which have all the required fields (from the db).
> Say DOC1, DOC2, DOC3, ..., DOCn.
>
> Every document has 1 night's tariff, and I have 180 nights of tariffs.
> So a person can search for any combination in these 180 nights.
>
> Say a request came to me to give the total tariff for the 10th to the 15th
> of Jan 2013. Now I need to get the sum of the tariff field of 6 docs.
>
> So how can I keep this data indexed, to avoid search-time calculation, and
> there are other dimensions of this data also besides tariff.
[...]

I think that you might be making this more complex
than it needs to be. To start with, have you tested
the response time for a search, plus adding the tariffs
for the returned results, to see if this meets your needs?
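
For a quick test, the StatsComponent can even do the summing on the
Solr side. A sketch - "hotel" and "date" are assumed field names,
"tariff" is the field from the earlier mails:

q=hotel:123 AND date:[2013-01-10T00:00:00Z TO 2013-01-15T00:00:00Z]
&stats=true&stats.field=tariff&rows=0

The stats section of the response includes sum and count for tariff; if
count is smaller than the number of nights requested, a night is missing
and the result can be discarded.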

It is not feasible to pre-calculate and cache the results
of all possible combinations of 1, 2, 3, ..., 180 nights from
a set of 180 nights. And that would be just one hotel.
What if the user wanted to search for a date range on
a chosen set of hotels?

I would suggest starting by defining the full extent of
what you need, and deciding on acceptable response
times, which would be driven by business imperatives.
You might need to compromise on some things to make
this feasible. E.g., if rather than having separate tariffs
for each day, you could do with daily, weekly, monthly,
etc., rates.

Regards,
Gora