adding and updating a lot of documents to Solr, metadata extraction etc

2009-10-30 Thread Eugene Dzhurinsky
Hi there!

We are evaluating Apache Solr for our custom search implementation, which
includes the following requirements:

- the ability to add, update, and delete many documents at once

- the ability to iterate over all documents returned by a search, as Lucene
  provides through a HitCollector instance. We need to extract various fields
  stored in the index and aggregate them in order to group the results.
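Once the matching documents have been fetched with their stored fields, the grouping and aggregation described above can be done on the client. The following Java sketch is one possible shape for it, not a Solr API: it assumes each document has already been turned into a `Map<String, Object>` of field name to stored value, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: group already-retrieved search results by one stored field.
// Documents are modeled as Map<String, Object> (field name -> stored value);
// that representation is an assumption made for this example.
public class ResultAggregator {

    // Counts documents per distinct value of groupField; docs lacking the field go under "(none)".
    public static Map<String, Integer> countBy(List<Map<String, Object>> docs, String groupField) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(groupField);
            String key = (value == null) ? "(none)" : value.toString();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Sums a numeric stored field per group; docs missing either field are skipped.
    public static Map<String, Double> sumBy(List<Map<String, Object>> docs,
                                            String groupField, String numericField) {
        Map<String, Double> sums = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            Object group = doc.get(groupField);
            Object number = doc.get(numericField);
            if (group == null || !(number instanceof Number)) {
                continue;
            }
            sums.merge(group.toString(), ((Number) number).doubleValue(), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(Map.of("id", "1", "category", "news", "size", 120));
        docs.add(Map.of("id", "2", "category", "blog", "size", 80));
        docs.add(Map.of("id", "3", "category", "news", "size", 40));
        System.out.println(countBy(docs, "category"));        // {blog=1, news=2}
        System.out.println(sumBy(docs, "category", "size"));  // {blog=80.0, news=160.0}
    }
}
```

A `TreeMap` keeps the groups sorted by key, which makes the output stable; a `HashMap` would work equally well if order does not matter.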

After reading the tutorial, I realized that documents are added and removed
by POSTing an XML file to the update handler. However, our XML files may be
very large, so I hope there is another option that avoids going through HTTP.
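The update messages used in the tutorial have a simple structure: an `<add>` element containing `<doc>` elements whose `<field name="...">` children hold the values, and a `<delete>` element with `<id>` children. The Java sketch below (class and method names are hypothetical) writes such messages to a `PrintWriter`, consuming documents one at a time, so a large batch can be streamed to a file instead of being assembled in memory. It assumes one value per field; a multi-valued field would simply repeat its `<field>` element.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: write Solr-style XML update messages (<add>/<delete>) to a PrintWriter.
// Assumes one value per field; multi-valued fields would repeat <field>.
public class SolrXmlUpdateWriter {

    // Escapes XML special characters for element text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes one <add> message; docs are consumed one at a time, so an Iterable
    // backed by a file or database cursor never has to be held in memory at once.
    public static void writeAdd(PrintWriter out, Iterable<Map<String, String>> docs) {
        out.print("<add>\n");
        for (Map<String, String> doc : docs) {
            out.print("  <doc>\n");
            for (Map.Entry<String, String> f : doc.entrySet()) {
                out.print("    <field name=\"" + escape(f.getKey()) + "\">"
                        + escape(f.getValue()) + "</field>\n");
            }
            out.print("  </doc>\n");
        }
        out.print("</add>\n");
    }

    // Writes a <delete> message listing documents by unique key.
    public static void writeDeleteById(PrintWriter out, Collection<String> ids) {
        out.print("<delete>");
        for (String id : ids) {
            out.print("<id>" + escape(id) + "</id>");
        }
        out.print("</delete>\n");
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "page-1");
        doc.put("title", "Q&A <draft>");
        StringWriter buf = new StringWriter();
        PrintWriter out = new PrintWriter(buf);
        writeAdd(out, List.of(doc));
        writeDeleteById(out, List.of("page-0"));
        out.flush();
        System.out.print(buf);
    }
}
```

`PrintWriter` does not throw on I/O failures, so when writing to a real file it is worth calling `checkError()` after flushing.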

I also could not find any way in the tutorial to access the search results,
with all of their fields, so that our application can process them.
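Over plain HTTP, the standard `/select` parameters already cover this: `fl=*` asks for all stored fields, and `start`/`rows` page through the hits. The Java sketch below (hypothetical class name, placeholder host) only builds one URL per page; it assumes `numFound` has been read from a first response, where Solr reports the total hit count.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerate paged /select requests that together cover every hit.
// numFound is assumed to come from a first request (Solr reports it in each response).
public class ResultPager {

    // URL-encodes one query-string value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // One URL per page of `rows` documents; fl=* requests all stored fields.
    public static List<String> pageUrls(String solrBase, String query, long numFound, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        List<String> urls = new ArrayList<>();
        for (long start = 0; start < numFound; start += rows) {
            urls.add(solrBase + "/select?q=" + enc(query) + "&fl=" + enc("*")
                    + "&start=" + start + "&rows=" + rows);
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : pageUrls("http://localhost:8983/solr", "title:solr", 250, 100)) {
            System.out.println(url);
        }
    }
}
```

Large `start` offsets become progressively more expensive, because the server still has to collect and rank every hit before the requested page, so very deep paging is best avoided.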

I may simply have missed something in the documentation, so could somebody
please point me to articles that explain the basics of achieving these goals?

Thank you very much in advance!

-- 
Eugene N Dzhurinsky




Re: adding and updating a lot of documents to Solr, metadata extraction etc

2009-11-03 Thread Eugene Dzhurinsky
On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
> About large XML files and http overhead: you can tell solr to load the
> file directly from a file system. This will stream thousands of
> documents in one XML file without loading everything in memory at
> once.
> 
> This is a new book on Solr. It will help you through this early learning 
> phase.
> 
> http://www.packtpub.com/solr-1-4-enterprise-search-server
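Lance's suggestion corresponds to Solr's remote streaming support: with `enableRemoteStreaming="true"` set on `requestParsers` in solrconfig.xml, the `stream.file` parameter names a file on the server's own file system, which the update handler then reads directly. The Java sketch below only builds such a request URL; the host, port, path, and class name are placeholders. The path must be readable by the Solr process, not the client, and since remote streaming lets any caller make Solr read local files, it should be enabled with care.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build an /update URL that asks Solr to read an XML file from its own disk.
// Assumes enableRemoteStreaming="true" in solrconfig.xml; the path is server-side.
public class StreamFileUrl {

    // URL-encodes one query-string value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static String build(String solrBase, String serverSidePath, boolean commit) {
        return solrBase + "/update?stream.file=" + enc(serverSidePath)
                + "&stream.contentType=" + enc("text/xml;charset=utf-8")
                + (commit ? "&commit=true" : "");
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "/data/batch1.xml", true));
    }
}
```

The resulting URL can be requested with any HTTP client; only the short URL travels over HTTP, while the large XML file is read locally by the server.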

Thank you, but we have to build a proof of concept with the stable version,
and I don't see any 1.4.0 artifacts released to repo1.maven.org yet.

I've also learned about http://wiki.apache.org/solr/DataImportHandler, and it
looks like the preferred approach in my case.

I have a lot of HTML pages on disk and some metadata stored in SQL tables. It
seems I need to provide a custom EntityProcessor and DataSource to
DataImportHandler. I will also need to pass some properties that tell the data
source how to retrieve the data (table names, etc.).

Is there a tutorial or how-to that describes writing custom classes for
importing data into Solr 1.3.0?

Thank you in advance!

-- 
Eugene N Dzhurinsky




Re: adding and updating a lot of documents to Solr, metadata extraction etc

2009-11-10 Thread Eugene Dzhurinsky
On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote:
> The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
> much better off using the DIH from this.
> 
> This is the current Solr release candidate binary:
> http://people.apache.org/~gsingers/solr/1.4.0/

Unfortunately, we are not allowed to use release candidates or nightly builds;
we may only use official Solr releases :(

-- 
Eugene N Dzhurinsky




Obtaining the list of dynamic fields available in the index

2009-11-13 Thread Eugene Dzhurinsky
Hi there!

How can we retrieve the complete list of dynamic fields that are currently
present in the index?
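One possible approach, offered as an assumption rather than a confirmed answer: the LukeRequestHandler (commonly registered at `/admin/luke`) reports the fields that actually exist in the index, while the `<dynamicField name="...">` patterns are declared in schema.xml. The hypothetical Java helper below matches the indexed field names against those patterns, treating a leading `*` as a suffix match and a trailing `*` as a prefix match, which is how Solr's dynamic field globs work, and skips names declared explicitly with `<field>`.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch: given field names found in the index (e.g. from /admin/luke) and the
// schema's explicit and dynamic field declarations, keep only the dynamic ones.
public class DynamicFieldFilter {

    // "*_s" matches names ending in "_s"; "attr_*" matches names starting with "attr_".
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    public static SortedSet<String> dynamicFields(Collection<String> indexedNames,
                                                  Set<String> explicitFields,
                                                  Collection<String> dynamicPatterns) {
        SortedSet<String> result = new TreeSet<>();
        for (String name : indexedNames) {
            if (explicitFields.contains(name)) {
                continue; // declared with <field>, so not dynamic
            }
            for (String pattern : dynamicPatterns) {
                if (matches(pattern, name)) {
                    result.add(name);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dynamicFields(
                List.of("id", "title", "author_s", "price_f", "attr_color"),
                Set.of("id", "title"),
                List.of("*_s", "*_f", "attr_*"))); // [attr_color, author_s, price_f]
    }
}
```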

Thank you in advance!
-- 
Eugene N Dzhurinsky




java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)

2009-11-17 Thread Eugene Dzhurinsky
Hi there!

I am trying to test distributed search on 2 servers. I've created a simple
application that adds sample documents to 2 different Solr servers (version
1.3.0).

While I can search for a given key phrase on either of these servers, I get a
strange error when I try to search across both of them (as described at
http://wiki.apache.org/solr/DistributedSearch):

HTTP ERROR: 500

null

java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Both servers use the same configuration. What could cause this error?
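For reference, a distributed request of the kind the DistributedSearch wiki page describes is an ordinary `/select` query sent to one server, with every shard listed in the `shards` parameter as `host:port/path` (without `http://`). The receiving server queries each shard and merges the per-shard results, which is the step that appears as `QueryComponent.mergeIds` in the trace. The Java sketch below only assembles such a URL; the class name, hosts, and ports are placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

// Sketch: build a distributed /select request in the style of the
// DistributedSearch wiki page. Shard entries are host:port/path, without "http://".
public class ShardedQueryUrl {

    // URL-encodes one query-string value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static String build(String coordinatorBase, List<String> shards, String query) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        return coordinatorBase + "/select?shards=" + enc(String.join(",", shards))
                + "&q=" + enc(query);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr",
                List.of("localhost:8983/solr", "localhost:7574/solr"), "ipod solr"));
    }
}
```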

Thank you in advance.

-- 
Eugene N Dzhurinsky




Re: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)

2009-11-17 Thread Eugene Dzhurinsky
On Tue, Nov 17, 2009 at 06:09:56PM +0200, Eugene Dzhurinsky wrote:
> java.lang.NullPointerException
>     at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)

I compared the schema.xml from the Solr installation package with the one I
created and found that my unique key field was not marked as stored. After I
made it stored and re-indexed, everything started to work fine. I'm recording
this here for anyone who runs into the same problem.
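To catch this mistake before indexing, the hypothetical Java helper below reads a schema.xml with the JDK's DOM parser, finds the `<uniqueKey>` field, and reports whether it is stored, looking first at the `<field>` declaration and then at its field type. It returns `null` when neither sets `stored`, leaving Solr's own default in effect; the sample schema in `main` is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: report whether a schema.xml's <uniqueKey> field is stored.
// Returns TRUE/FALSE when the field or its field type sets stored="...",
// or null when neither does (Solr's own default then applies).
public class UniqueKeyCheck {

    public static Boolean uniqueKeyStored(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse schema", e);
        }
        NodeList keys = doc.getElementsByTagName("uniqueKey");
        if (keys.getLength() == 0) {
            throw new IllegalArgumentException("schema has no <uniqueKey>");
        }
        String keyName = keys.item(0).getTextContent().trim();
        Element field = byName(doc, "field", keyName);
        if (field == null) {
            throw new IllegalArgumentException("uniqueKey field not declared: " + keyName);
        }
        if (field.hasAttribute("stored")) {
            return Boolean.valueOf(field.getAttribute("stored"));
        }
        // Fall back to the field type; schema files use both spellings of the tag.
        String typeName = field.getAttribute("type");
        Element type = byName(doc, "fieldType", typeName);
        if (type == null) {
            type = byName(doc, "fieldtype", typeName);
        }
        if (type != null && type.hasAttribute("stored")) {
            return Boolean.valueOf(type.getAttribute("stored"));
        }
        return null;
    }

    // First element with the given tag whose name attribute equals name, or null.
    static Element byName(Document doc, String tag, String name) {
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (name.equals(e.getAttribute("name"))) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String schema = "<schema name=\"example\" version=\"1.1\">"
                + "<types><fieldType name=\"string\" class=\"solr.StrField\"/></types>"
                + "<fields><field name=\"id\" type=\"string\" indexed=\"true\" stored=\"false\"/></fields>"
                + "<uniqueKey>id</uniqueKey></schema>";
        System.out.println(uniqueKeyStored(schema)); // false
    }
}
```

If the check reports `false`, the fix is the one described above: declare the key field with `stored="true"` and re-index.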

-- 
Eugene N Dzhurinsky

