Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Roy Liu
Hi, I have copied \apache-solr-3.1.0\dist\apache-solr-dataimporthandler-extras-3.1.0.jar into \apache-tomcat-6.0.32\webapps\solr\WEB-INF\lib\. Other errors: Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Unclosed quotation mark after the character string 'B@3e574'.

Indexing Best Practice

2011-04-10 Thread Darx Oman
Hi guys, I'm wondering how best to configure Solr to fulfill my requirements. I'm indexing data from two data sources: 1. a database; 2. PDF files (password-encrypted). Every file has related information stored in the database. Both the file content and the related database fields must be indexed as

Clustering with grouping

2011-04-10 Thread ramires
Hi, we use a Solr 4.0 trunk nightly build. We grouped our results with no problem, but when we try to cluster them with clustering?q=rose&group=true&group.field=site we get a 500 error: Problem accessing /solr/clustering. Reason: null java.lang.NullPointerException at org.apache.solr.hand

Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Roy Liu
Hi all, thank you very much for your kind help. 1. I have upgraded from Solr 1.4 to Solr 3.1. 2. I changed data-config-sql.xml. 3. solrconfig.xml and schema.xml are NOT changed. However, when I

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-10 Thread Jayendra Patil
The migration of Tika to the latest 0.8 version seems to have reintroduced the issue. I was able to get this working again with the following patches (Solr Cell and DataImportHandler): https://issues.apache.org/jira/browse/SOLR-2416 and https://issues.apache.org/jira/browse/SOLR-2332. You can try t

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-10 Thread Joey Hanzel
Hi Gary, I have been experiencing the same problem: I am unable to extract content from archive file formats. I just tried again with a clean install of Solr 3.1.0 (using Tika 0.8) and still get the same results. Did you have any success solving this problem with Solr 1.4.1 or 3.1.0? I'

Re: DIH OutOfMemoryError?

2011-04-10 Thread Lance Norskog
Make sure streaming is on. Try using autoCommit in solrconfig.xml. This will push documents out of memory onto disk at a regular interval. On Thu, Mar 31, 2011 at 8:51 AM, Markus Jelsma wrote: > Try splitting the files into smaller chunks. It'll help. >> Hi, I'm trying to index a big XML
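Markus's suggestion of splitting the input into smaller chunks can be sketched with a streaming parser, so that each record's content is released as soon as it has been copied out. This is a minimal sketch, assuming the big file is a flat list of repeated record elements; the helper name `split_xml_records`, the `doc` tag and the `root` wrapper element are hypothetical and not part of Solr or the DataImportHandler.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def split_xml_records(source, record_tag, chunk_size):
    """Yield lists of serialized <record_tag> elements, at most chunk_size each.

    `source` is a file path or binary file object. Hypothetical helper for
    illustration; it is not part of Solr or the DataImportHandler.
    """
    chunk = []
    # iterparse reads the input incrementally and reports each element when it closes.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        elem.tail = None  # drop any text that follows the record before serializing
        chunk.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()  # release the record's children, text and attributes
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


if __name__ == "__main__":
    data = b"<root><doc><id>1</id></doc><doc><id>2</id></doc><doc><id>3</id></doc></root>"
    for i, records in enumerate(split_xml_records(BytesIO(data), "doc", 2)):
        # Wrap each chunk in its own root element so it can be written out as a separate file.
        print(i, "<root>" + "".join(records) + "</root>")
```

Each chunk can then be written to its own smaller file and indexed separately; the chunk size is a tuning knob, not a value taken from the thread.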

Re: Concatenate multivalued DIH fields

2011-04-10 Thread Lance Norskog
The XPathEntityProcessor allows you to use an external XSL transform file, in which you can do anything you want. Another option is to use the ScriptTransformer: http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer On Wed, Apr 6, 2011 at 12:16 PM, alexei wrote: > Hi Everyone, I am
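With the ScriptTransformer, the transformer function (typically JavaScript embedded in data-config.xml) receives each row and can rewrite it. The per-row logic of concatenating a multivalued column is small. The Python sketch below mirrors it on a plain dict standing in for a DIH row. The function `concat_multivalued`, the field names and the ", " separator are hypothetical, and this is not the DIH API.

```python
def concat_multivalued(row, source_field, target_field, separator=", "):
    """Join the values of a multivalued column into one single-valued field.

    `row` is a plain dict standing in for a DIH row (an assumption for this
    sketch); it is modified in place and also returned, mirroring the shape
    of a transformer function.
    """
    values = row.get(source_field)
    if values is None:
        return row  # nothing to concatenate
    if not isinstance(values, (list, tuple)):
        values = [values]  # a single value behaves like a one-element list
    row[target_field] = separator.join(str(v) for v in values)
    return row


if __name__ == "__main__":
    row = {"id": "42", "author": ["Smith", "Jones"]}
    print(concat_multivalued(row, "author", "all_authors"))
```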

Re: difference between geospatial search from database angle and from solr angle

2011-04-10 Thread Yonik Seeley
On Sun, Apr 10, 2011 at 5:24 PM, Lance Norskog wrote: > Wait! How can you do distance calculations across different shards efficiently? Basic spatial search (bounding box filter, radius filter, sort by distance) has no cross-document component, so "it just works" with distributed search. -Yon
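Yonik's point is that a radius filter and a sort by distance are computed from one document and the query point alone. Each shard can therefore filter and sort locally, and the coordinator only has to merge lists that are already sorted. A minimal Python sketch of that reasoning follows. The shard layout, the use of the haversine formula as the distance function, and all names are assumptions for illustration, not Solr's implementation.

```python
import heapq
import math
from itertools import islice

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_shard(docs, center, radius_km):
    """Radius filter plus sort by distance on one shard.

    Each document's distance depends only on that document and the query
    point, so no information from other shards is needed.
    """
    lat, lon = center
    hits = []
    for doc_id, (dlat, dlon) in docs.items():
        dist = haversine_km(lat, lon, dlat, dlon)
        if dist <= radius_km:
            hits.append((dist, doc_id))
    hits.sort()
    return hits


def distributed_search(shards, center, radius_km, rows):
    """Query every shard, then merge the per-shard sorted lists and keep the top rows."""
    per_shard = [search_shard(docs, center, radius_km) for docs in shards]
    merged = heapq.merge(*per_shard)  # inputs are already sorted by distance
    return [doc_id for _dist, doc_id in islice(merged, rows)]


if __name__ == "__main__":
    shard_a = {"a": (0.0, 1.0), "b": (10.0, 10.0)}
    shard_b = {"c": (0.0, 0.5), "d": (0.0, 2.0)}
    print(distributed_search([shard_a, shard_b], (0.0, 0.0), 500.0, rows=10))
```

Because each shard's list is sorted, the global top `rows` hits are always contained in the union of each shard's own top `rows`, so a shard never needs to send more than that.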

Re: Solrj performance bottleneck

2011-04-10 Thread Lance Norskog
There is a separate auto-suggest tool that creates a simple in-memory database outside of the Lucene index. This is called TST (ternary search tree). On Tue, Apr 5, 2011 at 3:36 AM, rahul wrote: > Thanks Stefan and Victor! We are using GWT for the front end. We stopped issuing multiple asynchronous queries and issue a
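The TST that Lance mentions is a ternary search tree held in memory. The Python sketch below shows the data structure itself: inserting words and completing a prefix. It illustrates the technique only and is not Solr's suggester code. The class and method names are hypothetical.

```python
class _Node:
    """One character of a ternary search tree."""

    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = None  # subtree for characters smaller than ch
        self.eq = None  # subtree for the next character of words sharing ch
        self.hi = None  # subtree for characters greater than ch
        self.is_word = False


class TernarySearchTree:
    """Minimal ternary search tree with prefix completion (illustrative only)."""

    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, prefix):
        """Node holding the last character of prefix, or None if absent."""
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 == len(prefix):
                return node
            else:
                node, i = node.eq, i + 1
        return None

    def complete(self, prefix):
        """All stored words starting with prefix, in sorted order."""
        node = self._find(prefix) if prefix else None
        if node is None:
            return []
        out = [prefix] if node.is_word else []
        self._collect(node.eq, prefix, out)
        return out

    def _collect(self, node, prefix, out):
        # In-order walk (lo, this node, eq, hi) yields words in sorted order.
        if node is None:
            return
        self._collect(node.lo, prefix, out)
        word = prefix + node.ch
        if node.is_word:
            out.append(word)
        self._collect(node.eq, word, out)
        self._collect(node.hi, prefix, out)


if __name__ == "__main__":
    tst = TernarySearchTree()
    for w in ["solr", "solrj", "sold", "soap", "lucene"]:
        tst.insert(w)
    print(tst.complete("sol"))
```

A TST stores each shared prefix once and walks one character at a time, which suits prefix lookups. A real suggester would also rank the completions, for example by weight, which this sketch does not do.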

Re: difference between geospatial search from database angle and from solr angle

2011-04-10 Thread Lance Norskog
Wait! How can you do distance calculations across different shards efficiently? On Thu, Apr 7, 2011 at 7:19 AM, Smiley, David W. wrote: > I haven't used PostGIS so I can't offer a real comparison. I think if you were to try out both, you'd be impressed with Solr's performance/scalability th

Re: Problems indexing very large set of documents

2011-04-10 Thread Lance Norskog
There is a library called iText. It parses and writes PDFs very, very well, and a simple program will let you do a batch conversion. PDFs are made by a wide range of programs, not just Adobe code. Many of these do weird things and make small mistakes that Tika does not know how to handle. In other word

Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Lance Norskog
You have to upgrade completely to the Apache Solr 3.1 release. It is worth the effort. You cannot copy any jars between Solr releases. Also, you cannot copy over jars from newer Tika releases. On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman wrote: > Hi again, what you are missing is field mapping

Re: Tips for getting unique results?

2011-04-10 Thread Shaun Campbell
Hi Pete, I still think facets are what you need. We use facets to identify the most common tags for documents in our library. I use them to print the 25 most common document tags. Sorting by count (the default) gives you the tag with the highest count first, then the next most common, and so
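Shaun's approach maps onto Solr's standard faceting parameters (facet, facet.field, facet.limit, facet.sort). The sketch below builds such a request string with Python's standard library and, separately, reproduces locally what a count-sorted facet computes over a few in-memory documents. The field name `tags`, the `/select` path in the demo and both helper functions are assumptions for illustration. No Solr server is contacted.

```python
from collections import Counter
from urllib.parse import urlencode


def facet_request_params(field, limit=25):
    """Query string asking for the top `limit` values of `field`, without documents."""
    return urlencode([
        ("q", "*:*"),
        ("rows", "0"),            # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.sort", "count"),  # highest count first
    ])


def top_values(docs, field, limit=25):
    """Local model of facet.sort=count: (value, doc count) pairs, most frequent first.

    Each document counts at most once per value; ties are broken alphabetically
    here only to make the output deterministic.
    """
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


if __name__ == "__main__":
    print("/select?" + facet_request_params("tags"))
    docs = [
        {"tags": ["java", "search"]},
        {"tags": ["search"]},
        {"tags": ["search", "lucene", "java"]},
    ]
    print(top_values(docs, "tags", limit=2))
```

Setting rows to 0 keeps the response small when only the tag list is wanted.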

Re: Lucid Works

2011-04-10 Thread Lance Norskog
Just to be clear, we are talking about two different Lucid Imagination products. The Certified Distribution is a repackaging of the public Solr releases with various add-on goodies that Lucid and others have written over the years. This is the "drop-in replacement" for the Apache release of Solr.

Re: Solr architecture diagram

2011-04-10 Thread Lance Norskog
Very cool! "The Life Cycle of the IndexSearcher" would also be a great diagram. The whole dance that happens during a commit is hard to explain. Also, it would help show why garbage collection can act up around commits. Lance On Sun, Apr 10, 2011 at 2:05 AM, Jan Høydahl wrote: >> Looks really go

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-10 Thread Lance Norskog
There is an option somewhere to use the full XML DOM implementation for evaluating XPaths. The purpose of the XPathEntityProcessor is to be as simple and dumb as possible and to handle most cases: RSS feeds and other open standards. Search for xsl (optional) at http://wiki.apache.org/solr/DataImportHandler#Configuration_

Re: Solr 3.1 performance compared to 1.4.1

2011-04-10 Thread Yonik Seeley
On Fri, Apr 8, 2011 at 9:53 AM, Marius van Zwijndregt wrote: > Hello! I'm new to the list, have been using Solr for roughly 6 months and love it. Currently I'm setting up a 3.1 installation next to a 1.4.1 installation (Ubuntu server, same JVM params). I have copied the configuration f

Re: dismax "boost query" not useful?

2011-04-10 Thread Chris Hostetter
: 1. Each piece is still subject to the IDF component of the score, requiring me to give each individual category a boost that factors that in. For example, if I want meta:promote to be boosted twice as much as category:featured, I can't simply boost the first to 2 and the second to 1 (t

Build from svn

2011-04-10 Thread Harvi
Hello. Could you please tell me how I can build and run a Solr instance from svn? After building with ant I can't find any .war files. Do I need to create one? When I ran start.jar from the example directory, I got only a 404 page. Thanks.

Solr 3.1 performance compared to 1.4.1

2011-04-10 Thread Marius van Zwijndregt
Hello! I'm new to the list, have been using Solr for roughly 6 months, and love it. Currently I'm setting up a 3.1 installation next to a 1.4.1 installation (Ubuntu server, same JVM params). I have copied the configuration from 1.4.1 to 3.1. Both versions are running fine, but one thing I've n

[Announce] Solr with Near Real Time (NRT) Functionality

2011-04-10 Thread Nagendra Nagarajayya
Hi! I would like to announce that Solr with RankingAlgorithm now has Near Real Time (NRT) functionality. The NRT functionality allows you to add documents without the IndexSearchers being closed or caches being cleared. A commit is not needed with the document update. Searches can run concurrently with

Re: Problems installing DIH in Solr 3.1

2011-04-10 Thread raimon.bosch
The problem was that I was putting the Solr libs in tomcat/lib instead of using the feature that loads libs from solrconfig.xml. I have restored my original Tomcat libs and added the following lines of XML code: My binary distribution of Solr is at /solr. Thanks.

Problems installing DIH in Solr 3.1

2011-04-10 Thread raimon.bosch
Hi all, I have a problem installing Solr 3.1 that I'm not able to resolve. After following the instructions at http://wiki.apache.org/solr/DIHQuickStart I get this error: 10-abr-2011 19:46:27 org.apache.solr.common.SolrException log GRAVE: org.apache.solr.common.SolrException: Error Instantiating Req

DIH null context

2011-04-10 Thread Robert Zotter
Has anyone else observed this behavior? https://issues.apache.org/jira/browse/SOLR-2463

Re: Performance with search terms starting and ending with wildcards

2011-04-10 Thread Ueland
> Which version of Solr are you using? Currently testing with 3.1. > NGrams could be an option, but could you give us the field definition in your schema? The word count in this field index? I won't share the complete schema, but I can summarize it: For testing, we have around 30 fields used t

Re: Performance with search terms starting and ending with wildcards

2011-04-10 Thread lboutros
Which version of Solr are you using? NGrams could be an option, but could you give us the field definition in your schema? And the word count of this field in the index? Ludovic. 2011/4/10 Ueland [via Lucene] > Hi! I have been doing some testing wi

Performance with search terms starting and ending with wildcards

2011-04-10 Thread Ueland
Hi! I have been doing some testing with Solr and wildcards. Queries like *foo and foo* complete quickly (1-2 s) on a test index of about 40-50 GB. But when I search for *foo*, the search time can easily climb to 30 seconds or more. Any ideas on how that issue c
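One option raised in this thread is NGrams. The idea is to index every fixed-length substring (n-gram) of each term, so that an infix query like *foo* is answered by looking up the query's own n-grams instead of testing every term. In Solr this would happen at index time in the field's analysis chain, for example with an n-gram token filter. The in-memory Python index below is only a model of the idea, assuming trigrams (n = 3); the class and function names are hypothetical.

```python
from collections import defaultdict


def ngrams(term, n=3):
    """Set of all length-n substrings of term (the whole term if it is shorter)."""
    if len(term) <= n:
        return {term}
    return {term[i:i + n] for i in range(len(term) - n + 1)}


class NGramIndex:
    """Toy infix index: maps each n-gram to the set of terms containing it."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)
        self.terms = set()

    def add(self, term):
        self.terms.add(term)
        for gram in ngrams(term, self.n):
            self.postings[gram].add(term)

    def contains(self, fragment):
        """Sorted terms containing fragment, i.e. the matches of *fragment*."""
        if len(fragment) < self.n:
            # Shorter than one gram: the index cannot help, so scan every term.
            return sorted(t for t in self.terms if fragment in t)
        grams = ngrams(fragment, self.n)
        # A matching term must contain every gram of the fragment ...
        candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # ... but sharing all grams does not guarantee they are contiguous, so verify.
        return sorted(t for t in candidates if fragment in t)


if __name__ == "__main__":
    idx = NGramIndex(3)
    for t in ["football", "seafood", "foobar", "bar", "food"]:
        idx.add(t)
    print(idx.contains("foo"))
```

The trade-off is a larger index, since every term is stored under many grams, in exchange for infix lookups that no longer have to test every term.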

Re: Build from svn

2011-04-10 Thread Harvi
I'm sorry, I solved the problem. The directory structure has changed, and I just ran the build from the inner 'solr' folder.

Re: Solr architecture diagram

2011-04-10 Thread Jan Høydahl
> Looks really good, but two bits that I think might confuse people are the implications that a "Query Parser" then invokes a series of search components, and that "analysis" (and the pieces of an analyzer chain) are what do lookups in the underlying Lucene index. > The first might just