Solr 3.6.2 or 4.0

2013-01-04 Thread vijeshnair
We are starting a new e-com application from this month onwards, for which I
am trying to identify the right SOLR release. We were using 3.4 in our
previous project, but I have read in multiple blogs and forums about the
improvements that SOLR 4 has in terms of efficient memory management, fewer
OOMs etc. So my question would be: can I start using SOLR 4 for my new
project? Why is Apache keeping both the 3.6.2 and 4.0 releases in the
downloads? Are there any major changes in 4.0 compared to 3.x that I
should study before getting into 4.0? Please help, so that
I can propose 4.0 to my team.

Thanks
Vijesh Nair



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-2-or-4-0-tp4030527.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.6.2 or 4.0

2013-01-05 Thread vijeshnair
Thanks guys, it really helps. I will now go ahead with my original plan of
using 4.0 for this project, and I should be able to give you an update soon on
this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-2-or-4-0-tp4030527p4030842.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH fails after processing roughly 10million records

2013-01-08 Thread vijeshnair
Solr version : 4.0 (running with 9GB of RAM)
MySQL : 5.5
JDBC : mysql-connector-java-5.1.22-bin.jar

I am trying to run the full import for my catalog data, which is roughly
13 million products. The DIH ran smoothly for 18 hours and processed
roughly 10 million records, but all of a sudden it broke due to a JDBC
exception, i.e. a communication failure with the server. I did some extensive
googling on this topic, and there are multiple recommendations to use
"readOnly=true", "autoCommit=true" etc. If I understand it correctly, the
likely cause is that DIH pauses indexing during segment merging and then
tries to reconnect with the server. When the index is slightly large and
multiple merges are happening at the same time, DIH stops indexing for some
time, and by the time it restarts MySQL has already dropped the
connection. So I am going to increase the wait timeout on the MySQL side from
the default 120 to something slightly larger, to see whether that solves the
issue or not. I will know the result of that approach only after completing one
full run, which I will update you on tomorrow. In the meantime I thought of
validating my approach, and checking with you for any other fixes that exist.
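For reference, here is roughly how I am configuring the DIH data source for this streaming import. This is a sketch under assumptions: the URL, database name, credentials, and timeout value are placeholders, not my real settings. batchSize="-1" makes the Connector/J driver stream rows instead of buffering the entire result set:

```xml
<!-- Sketch of a data-config.xml data source tuned for a long streaming
     import. batchSize="-1" maps to fetchSize=Integer.MIN_VALUE, which
     puts the MySQL driver in row-streaming mode. The URL, credentials,
     and net timeout (in seconds) below are placeholder assumptions. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/catalog?netTimeoutForStreamingResults=3600"
            user="solr"
            password="***"
            batchSize="-1"
            readOnly="true"
            autoCommit="true"/>
```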

Here is the error stack

Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
closeConnection
SEVERE: Ignoring Error when closing connection
java.sql.SQLException: Streaming result set
com.mysql.jdbc.RowDataDynamic@32d051c1 is still active. No statements may be
issued when any streaming result sets are open and in use on a given
connection. Ensure that you have called .close() on any active streaming
result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:923)
at
com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3234)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2399)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at 
com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4908)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4794)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403)
at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594)
at
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
at
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Jan 8, 2013 12:44:00 PM org.apache.solr.handler.dataimport.JdbcDataSource
closeConnection
SEVERE: Ignoring Error when closing connection
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
Communications link failure during rollback(). Transaction resolution
unknown.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1014)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:988)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:974)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:919)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4808)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4403)
at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1594)
at
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:400)
at
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:391)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:291)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:280)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.

SOLR 4 getting stuck during restart

2013-01-19 Thread vijeshnair
I have my index-based spellchecker configured, and the select request
handlers are configured with collation enabled, i.e. spellcheck.collate set to true.
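The request-handler configuration looks roughly like the snippet below. This is a sketch only: the dictionary name and the other default parameters here are assumptions, not copied from my actual config.

```xml
<!-- Sketch of a select handler with spellcheck collation enabled.
     Dictionary name and defaults are assumptions for illustration. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```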

For my testing I have indexed 2 million records and thereafter generated the
index-based dictionary (I am evaluating the DirectSpellChecker, and I am seeing
that my memory consumption is higher when I use DirectSpellChecker). Now I wanted
to modify some threshold parameters in the config file for the new spellchecker
component. But when I restarted my Tomcat, the restart got
stuck at the following line:

INFO: QuerySenderListener sending requests to Searcher@332b9f79
main{StandardDirectoryReader(segments_1f:281 _2x(4.0.0.2):C1653773)}

Any comments? Am I missing something, or is there some misconfiguration? Please
help.

My temporary workaround: I removed the index-based dictionary which was
created before, and restarted. I will regenerate the dictionary now.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-getting-stuck-during-restart-tp4034734.html
Sent from the Solr - User mailing list archive at Nabble.com.


Data import handler start bulging the memory after completing 1 million

2013-01-20 Thread vijeshnair

You may refer to the attached snapshot to get an understanding of the resource
consumption. I am trying to index a total of 13 million documents
from MySQL to SOLR. The first 1 million documents completed very smoothly
in the first 2 minutes; after that it started bulging the RAM, which never gets
released in between. I have tried all the known tricks and tactics, and am still
failing to rectify this issue. I am using SOLR 4.0, using DIH to import from a
MySQL 5.5 DB. Any help will be much appreciated, and I am trying to find any
loophole in my schema and config files.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-import-handler-start-bulging-the-memory-after-completing-1-million-tp4034949.html
Sent from the Solr - User mailing list archive at Nabble.com.


Full import through DIH leaving documents as uncommited

2013-01-21 Thread vijeshnair
I am using SOLR 4.0, and using DIH to import and index the catalog data from
my MySQL database. The DIH took around 55 minutes to complete the indexing,
but the maximum documents processed and the actual documents that got indexed
did not match. So when I checked the update handler statistics, it showed
me roughly six hundred thousand documents as pending. Here are the update
handler stats:

commits: 2130
autocommit maxTime: 15000ms
autocommits: 169
soft autocommit maxTime: 1000ms
soft autocommits: 1958
optimizes: 0
rollbacks: 0
expungeDeletes: 0
docsPending: 622009
adds: 0
deletesById: 0
deletesByQuery: 0
errors: 12
cumulative_adds: 12191849
cumulative_deletesById: 0
cumulative_deletesByQuery: 1
cumulative_errors: 0
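For reference, those hard/soft autocommit intervals come from an updateHandler section in solrconfig.xml roughly like the following sketch, matching the 15000 ms / 1000 ms values above (the openSearcher setting here is an assumption on my part, not taken from my actual config):

```xml
<!-- Sketch of the commit settings behind the stats above; the
     openSearcher value is an assumption for illustration. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>            <!-- hard commit every 15 s -->
    <openSearcher>false</openSearcher>  <!-- flush to disk without reopening a searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>             <!-- soft commit every 1 s for visibility -->
  </autoSoftCommit>
</updateHandler>
```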

Now when I tried to commit manually, i.e. /update?commit=true, it
was throwing an out-of-memory error. It says the writer hit an OutOfMemoryError,
so it cannot commit. Here is the stack:

java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot
commit
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2717)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2875)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2855)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)


It would be a really great help if you could tell me a way to commit those
pending documents. Should I restart Tomcat? Any help will be much
appreciated.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-import-through-DIH-leaving-documents-as-uncommited-tp4035084.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Full import through DIH leaving documents as uncommited

2013-01-21 Thread vijeshnair
I saw the following on the IndexWriter Javadoc page:

NOTE: if you hit an OutOfMemoryError then IndexWriter will quietly record
this fact and block all future segment commits. This is a defensive measure
in case any internal state (buffered documents and deletions) were
corrupted. Any subsequent calls to commit() will throw an
IllegalStateException. The only course of action is to call close(), which
internally will call rollback(), to undo any changes to the index since the
last commit. You can also just call rollback() directly.

So am I left with only the option of restarting the server?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-import-through-DIH-leaving-documents-as-uncommited-tp4035084p4035097.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR 4 getting stuck during restart

2013-01-24 Thread vijeshnair
Thanks James for the heads-up, and apologies for the delayed response. Here are
the full details about this issue. Mine is an e-com app, so the index contains
the product catalog comprising roughly 13 million products. At this point I
thought of using the index-based dictionary as the best option for the "Did
You Mean" functionality. I am not sure if everyone is facing this issue, but
here is what I am observing as far as the dictionary is concerned.

Index based dictionary

- I was building the dictionary using the following URL once I completed
the full indexing. For the time being I have intentionally kept the buildOnCommit and
buildOnOptimize options false, as I didn't want them to slow
down the full indexing.

http://localhost:8090/solr/select?rows=0&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=jarowinkler
 

- Once I created the dictionary and tried to restart my Tomcat, I faced
the issue which I have stated before (I waited for around 20 minutes;
the restart didn't happen).
- When I removed the dictionary from the "data" folder, the server restart
started working.
- I have tried the spellcheck.collation=false as you suggested, but it
didn't help.
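For context, my index-based dictionary is declared roughly like the sketch below. The field name and spellcheckIndexDir are placeholder assumptions; the JaroWinkler dictionary name and the buildOnCommit/buildOnOptimize=false settings match what I described above.

```xml
<!-- Sketch of the index-based spellchecker; field and index dir are
     placeholders, build flags kept false so the dictionary is only
     built explicitly via spellcheck.build=true. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">false</str>
  </lst>
</searchComponent>
```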

Direct Spell Checker

I have experimented with the new "DirectSolrSpellChecker", which does not
create a separate dictionary folder but rather builds the spellchecker from the
main index itself. The results were exactly the same as before: I was getting
stuck during the restarts. I think the traditional spellchecker would be
better in this case, as you can remove, restart, and move back the dictionary
as and when required. In the case of DirectSolrSpellChecker there is no
separate dictionary folder, so I am not sure what to remove from the
index so that the server can restart.

James, I will request you to validate this, and it would be a really great help
if you could point out any mistakes I am making here. If you think what I am
doing makes sense, I will go ahead and log this bug in JIRA.

Thanks
Vijesh K Nair



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-getting-stuck-during-restart-tp4034734p4036163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Question on Facet field constraints sort order

2013-01-31 Thread vijeshnair
It could be a foolish question or concern, but I have no option :-) . We
have an e-com site where we consume the feed from CSE partners and
index it into SOLR for our search. Instead of the traditional
auto-suggest, the predictive search in the header search box recommends the
categories (category facet) in which it found matches for the given
keyword. With this approach, a search like "apple iphone" will yield more
results for "cell phone accessories" than for "cell phones", hence in the drop-
down cell phone accessories will come first and then cell phones. This is
quite natural and works as expected, since we use the default "count" sorting
for facet constraints.

Today my boss (tech director) asked me to tweak this order: the business team
will prioritize the whole 1300 categories available today in my taxonomy in
some order, and my category facet constraints should then be ordered according
to the order they provide us. He told me this is possible in Oracle Endeca,
where he showed me how to change the order of categories and so on, meaning
any sort of customization to change the order, and asked me to check whether
SOLR supports it. Though my answer was no, he proposed to handle this in the
code otherwise, i.e. change the order on the client side. So the intention of
writing this is to check whether any such options are available in SOLR or
not. I understand the two types of sorting which are available, i.e. count and
index; is there something beyond that, where I can alter this order using an
external list or something like that? Any help will be appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-Facet-field-constraints-sort-order-tp4037647.html
Sent from the Solr - User mailing list archive at Nabble.com.