date:20110410

Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Roy Liu

Hi, I have copied \apache-solr-3.1.0\dist\apache-solr-dataimporthandler-extras-3.1.0.jar into \apache-tomcat-6.0.32\webapps\solr\WEB-INF\lib\ Other Errors: Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Unclosed quotation mark after the character string 'B@3e574'. -- Best Regards,

Indexing Best Practice

2011-04-10 Thread Darx Oman

Hi guys, I'm wondering how best to configure Solr to fulfill my requirements. I'm indexing data from 2 data sources: 1- Database 2- PDF files (password encrypted) Every file has related information stored in the database. Both the file content and the related database fields must be indexed as

Clustering with grouping

2011-04-10 Thread ramires

Hi, we use the Solr trunk nightly (4.0). We grouped our results with no problem. When we try to cluster them with the request clustering?q=rose&group=true&group.field=site we get a 500 error. Problem accessing /solr/clustering. Reason: null java.lang.NullPointerException at org.apache.solr.hand

Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Roy Liu

Hi all, thank you very much for your kind help. 1. I have upgraded from Solr 1.4 to Solr 3.1. 2. I changed data-config-sql.xml. 3. solrconfig.xml and schema.xml are NOT changed. However, when I

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-10 Thread Jayendra Patil

The migration of Tika to the latest 0.8 version seems to have reintroduced the issue. I was able to get this working again with the following patches. (Solr Cell and Data Import handler) https://issues.apache.org/jira/browse/SOLR-2416 https://issues.apache.org/jira/browse/SOLR-2332 You can try t

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-10 Thread Joey Hanzel

Hi Gary, I have been experiencing the same problem... Unable to extract content from archive file formats. I just tried again with a clean install of Solr 3.1.0 (using Tika 0.8) and continue to experience the same results. Did you have any success with this problem with Solr 1.4.1 or 3.1.0 ? I'

Re: DIH OutOfMemoryError?

2011-04-10 Thread Lance Norskog

Make sure streaming is on. Try using autoCommit in solrconfig.xml. This will push documents out of memory onto disk at a regular interval. On Thu, Mar 31, 2011 at 8:51 AM, Markus Jelsma wrote: > Try splitting the files into smaller chunks. It'll help. > >> Hi, >> >> I'm trying to index a big XML

Re: Concatenate multivalued DIH fields

2011-04-10 Thread Lance Norskog

The XPathEntityProcessor allows you to use an external XSL transform file. In that you can do anything you want. Another option is to use the script transformer: http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer On Wed, Apr 6, 2011 at 12:16 PM, alexei wrote: > Hi Everyone, > > I am

Re: difference between geospatial search from database angle and from solr angle

2011-04-10 Thread Yonik Seeley

On Sun, Apr 10, 2011 at 5:24 PM, Lance Norskog wrote: > Wait! How can you do distance calculations across different shards > efficiently? Basic spatial search (bounding box filter, radius filter, sort by distance) has no cross-document component, so "it just works" with distributed search. -Yon
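A minimal sketch of why a radius filter plus sort-by-distance distributes cleanly: each document's distance depends only on that document and the query point, so every shard can filter and sort on its own and the coordinator only has to merge sorted lists. This is an illustration in plain Python, not Solr's implementation; the function names, the document layout (`id`, `lat`, `lon`) and the haversine formula are assumptions made for the example.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_shard(docs, lat, lon, radius_km):
    """Radius filter + sort by distance, using only this shard's documents."""
    hits = []
    for doc in docs:
        d = haversine_km(lat, lon, doc["lat"], doc["lon"])
        if d <= radius_km:
            hits.append((d, doc["id"]))
    return sorted(hits)


def distributed_search(shards, lat, lon, radius_km, rows=10):
    """Hypothetical coordinator: merge the already-sorted per-shard results."""
    per_shard = [search_shard(docs, lat, lon, radius_km) for docs in shards]
    return list(heapq.merge(*per_shard))[:rows]


if __name__ == "__main__":
    shard_a = [{"id": "oslo", "lat": 59.91, "lon": 10.75},
               {"id": "copenhagen", "lat": 55.68, "lon": 12.57}]
    shard_b = [{"id": "bergen", "lat": 60.39, "lon": 5.32},
               {"id": "stockholm", "lat": 59.33, "lon": 18.07}]
    for dist, doc_id in distributed_search([shard_a, shard_b], 59.91, 10.75, 1000):
        print(f"{doc_id}: {dist:.0f} km")
```

No shard needs to see another shard's documents to compute its part, which is the "no cross-document component" property described above.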

Re: Solrj performance bottleneck

2011-04-10 Thread Lance Norskog

There is a separate auto-suggest tool that creates a simple in-memory database outside of the Lucene index. This is called TST. On Tue, Apr 5, 2011 at 3:36 AM, rahul wrote: > Thanks Stefan and Victor ! we are using GWT for front end. We stopped issuing > multiple asynchronous queries and issue a

Re: difference between geospatial search from database angle and from solr angle

2011-04-10 Thread Lance Norskog

Wait! How can you do distance calculations across different shards efficiently? On Thu, Apr 7, 2011 at 7:19 AM, Smiley, David W. wrote: > I haven't used PostGIS so I can't offer a real comparison. I think if you > were to try out both, you'd be impressed with Solr's performance/scalability > th

Re: Problems indexing very large set of documents

2011-04-10 Thread Lance Norskog

There is a library called iText. It parses and writes PDFs very very well, and a simple program will let you do a batch conversion. PDFs are made by a wide range of programs, not just Adobe code. Many of these do weird things and make small mistakes that Tika does not know to handle. In other word

Re: How to index PDF file stored in SQL Server 2008

2011-04-10 Thread Lance Norskog

You have to upgrade completely to the Apache Solr 3.1 release. It is worth the effort. You cannot copy any jars between Solr releases. Also, you cannot copy over jars from newer Tika releases. On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman wrote: > Hi again > what you are missing is field mapping >

Re: Tips for getting unique results?

2011-04-10 Thread Shaun Campbell

Hi Pete Still think facets are what you need. We use facets to identify the most common tags for documents in our library. I use them to print the top 25 most common document tags. The sort by count (the default) gives you the one with the highest count first and then the next most common and so
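A minimal sketch of the "top 25 tags" facet request described above, in Python. The base URL and the field name `tags` are hypothetical; `facet`, `facet.field`, `facet.limit` and `facet.sort` are standard Solr faceting parameters. The helper that pairs up counts assumes the default flat JSON layout (`[term, count, term, count, ...]`) for facet fields.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, limit=25):
    """Build a select URL that returns only facet counts for one field."""
    params = [
        ("q", "*:*"),             # match all documents
        ("rows", 0),              # we only want the facet counts, not the docs
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),   # top N values
        ("facet.sort", "count"),  # highest count first (the default)
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def pair_facet_counts(flat):
    """Turn [term, count, term, count, ...] into [(term, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hypothetical host and field name, for illustration only.
    print(facet_query_url("http://localhost:8983/solr", "tags"))
    print(pair_facet_counts(["solr", 12, "lucene", 7]))
```

Because the values come back already sorted by count, printing the pairs in order gives the most common tag first.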

Re: Lucid Works

2011-04-10 Thread Lance Norskog

Just to be clear, we are talking about two different Lucid Imagination products. The Certified Distribution is a repackaging of the public Solr releases with various add-on goodies that Lucid and others have written over the years. This is the "drop-in replacement" for the Apache release of Solr.

Re: Solr architecture diagram

2011-04-10 Thread Lance Norskog

Very cool! "The Life Cycle of the IndexSearcher" would also be a great diagram. The whole dance that happens during a commit is hard to explain. Also, it would help show why garbage collection can act up around commits. Lance On Sun, Apr 10, 2011 at 2:05 AM, Jan Høydahl wrote: >> Looks really go

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-10 Thread Lance Norskog

There is an option somewhere to use the full XML DOM implementation for using xpaths. The purpose of the XPathEP is to be as simple and dumb as possible and handle most cases: RSS feeds and other open standards. Search for xsl(optional) http://wiki.apache.org/solr/DataImportHandler#Configuration_

Re: Solr 3.1 performance compared to 1.4.1

2011-04-10 Thread Yonik Seeley

On Fri, Apr 8, 2011 at 9:53 AM, Marius van Zwijndregt wrote: > Hello ! > > I'm new to the list, have been using SOLR for roughly 6 months and love it. > > Currently I'm setting up a 3.1 installation, next to a 1.4.1 installation > (Ubuntu server, same JVM params). I have copied the configuration f

Re: dismax "boost query" not useful?

2011-04-10 Thread Chris Hostetter

: 1. Each piece is still subject to the IDF component of the score, : requiring me to make each individual category have a boost factoring : that in. For example, if I want meta:promote to be twice as boosted as : category:featured, I can't simply boost the first to 2 and the second to : 1 (t

Build from svn

2011-04-10 Thread Harvi

Hello. Could you please tell me how I can build and run a Solr instance from svn? After building with ant I can't find any .war files. Do I need to create one? When I ran start.jar from the example directory, I got only a 404 page. Thanks.

Solr 3.1 performance compared to 1.4.1

2011-04-10 Thread Marius van Zwijndregt

Hello! I'm new to the list, have been using Solr for roughly 6 months and love it. Currently I'm setting up a 3.1 installation, next to a 1.4.1 installation (Ubuntu server, same JVM params). I have copied the configuration from 1.4.1 to the 3.1. Both versions are running fine, but one thing I've n

Clustering with grouping

2011-04-10 Thread ramires

Hi, we use the Solr trunk nightly (4.0). We grouped our results with no problem. When we try to cluster them with the request clustering?q=rose&group=true&group.field=site we get a 500 error. Problem accessing /solr/clustering. Reason: null java.lang.NullPointerException at org.apache.solr.han

[Announce] Solr with Near Real Time (NRT) Functionality

2011-04-10 Thread Nagendra Nagarajayya

Hi! I would like to announce Solr with RankingAlgorithm has Near Real Time functionality now. The NRT functionality allows you to add documents without the IndexSearchers being closed or caches being cleared. A commit is not needed with the document update. Searches can run concurrently with

Re: Problems installing DIH in Solr 3.1

2011-04-10 Thread raimon.bosch

The problem was that I was putting solr-libs in tomcat/lib instead of using the feature to load libs from solrconfig.xml. I have restored my original Tomcat libs and added the following lines of xml code: At /solr is where I have my binary distribution of Solr. Thanks. -- View thi

Problems installing DIH in Solr 3.1

2011-04-10 Thread raimon.bosch

Hi all, I get a problem installing solr 3.1 that I'm not able to resolve. After following instructions in http://wiki.apache.org/solr/DIHQuickStart I get this error: 10-abr-2011 19:46:27 org.apache.solr.common.SolrException log GRAVE: org.apache.solr.common.SolrException: Error Instantiating Req

DIH null context

2011-04-10 Thread Robert Zotter

Has anyone else observed this behavior? https://issues.apache.org/jira/browse/SOLR-2463

Re: Performance with search terms starting and ending with wildcards

2011-04-10 Thread Ueland

>Which version of solr are you using ? Currently testing with 3.1 > NGrams could be an option but could you give us the field definition in > your schema ? The words count in this field index ? I won't share the complete schema but I can summarize it: For testing, we have around 30 fields used t

Re: Performance with search terms starting and ending with wildcards

2011-04-10 Thread lboutros

Which version of solr are you using ? NGrams could be an option but could you give us the field definition in your schema ? The words count in this field index ? Ludovic. 2011/4/10 Ueland [via Lucene] < ml-node+2802561-121096623-383...@n3.nabble.com> > Hi! > > I have been doing some testing wi
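A minimal sketch of the n-gram idea suggested above, in plain Python rather than a Solr analyzer chain. Each term is split into fixed-size grams at index time, so a `*foo*` query becomes a lookup of the grams of `foo` instead of a scan over every term. The gram size of 3, the function names and the in-memory index are assumptions made for illustration; in Solr this would be done by an n-gram filter in the field type rather than by hand.

```python
from collections import defaultdict


def ngrams(term, n=3):
    """All contiguous substrings of length n, e.g. 'food' -> ['foo', 'ood']."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def build_index(docs, n=3):
    """Map each gram to the set of document ids whose term contains it."""
    index = defaultdict(set)
    for doc_id, term in docs.items():
        for gram in ngrams(term.lower(), n):
            index[gram].add(doc_id)
    return index


def contains_search(index, docs, needle, n=3):
    """Emulate *needle* using gram lookups instead of scanning all terms."""
    needle = needle.lower()
    if len(needle) < n:
        # Too short to form a gram: fall back to a full scan.
        return {d for d, term in docs.items() if needle in term.lower()}
    candidates = None
    for gram in ngrams(needle, n):
        postings = index.get(gram, set())
        candidates = postings if candidates is None else candidates & postings
    # Grams can match out of order, so verify the candidates.
    return {d for d in candidates if needle in docs[d].lower()}


if __name__ == "__main__":
    docs = {1: "seafood", 2: "football", 3: "bar"}
    index = build_index(docs)
    print(contains_search(index, docs, "foo"))
```

The trade-off is index size: every term contributes many grams, so lookups get cheaper while the index grows.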

Performance with search terms starting and ending with wildcards

2011-04-10 Thread Ueland

Hi! I have been doing some testing with Solr and wildcards. Queries like: - *foo - foo* complete quickly (1-2s) on a test index of about 40-50GB. But when I try to do a search for *foo*, the search time can easily climb to 30 seconds or more. Any ideas on how that issue c

Re: Build from svn

2011-04-10 Thread Harvi

I'm sorry. I solved the problem. There is a changed directory structure, and I just ran the build from inner 'solr' folder.

Re: Solr architecture diagram

2011-04-10 Thread Jan Høydahl

> Looks really good, but two bits that i think might confuse people are > the implications that a "Query Parser" then invokes a series of search > components; and that "analysis" (and the pieces of an analyzer chain) > are what to lookups in the underlying lucene index. > > the first might just

31 matches
