Re: clear index

2007-08-20 Thread Charles Hornberger
IIRC you can also simply stop the servlet container, delete the contents of the data directory by hand, then restart the container. -Charlie On 8/20/07, Pieter Berkel <[EMAIL PROTECTED]> wrote: > If you are using solr 1.2 the following command (followed by a commit / > optimize) should do th

Commit performance

2007-08-20 Thread Lance Norskog
How long should a commit take? I've got about 9.8G of data for 9M records. (Yes, I'm indexing too much data.) My commits are taking 20-30 seconds. Since other people set the autocommit to 1 second, I'm guessing we have a major mistake somewhere in our configurations. We have a lot of deletes/re-adds

Re: clear index

2007-08-20 Thread Pieter Berkel
If you are using Solr 1.2, the following command (followed by a commit / optimize) should do the trick: <delete><query>*:*</query></delete> cheers, Piete On 21/08/07, Sundling, Paul <[EMAIL PROTECTED]> wrote: > > what is the best approach to clearing an index? > > The use case is that I'm doing some performance testing with va
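A minimal sketch of how the delete-all query (`*:*`) followed by a commit could be sent to Solr's XML update handler. The endpoint URL and the helper names `build_update_request` and `clear_index_requests` are assumptions for illustration, not something given in the thread; the requests are only built here, not sent.

```python
from urllib.request import Request

# Assumed update endpoint; adjust host, port, and path for your deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

# XML update messages: delete every document, then make the change visible.
DELETE_ALL = "<delete><query>*:*</query></delete>"
COMMIT = "<commit/>"


def build_update_request(xml_body, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying one XML update message."""
    return Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def clear_index_requests(url=SOLR_UPDATE_URL):
    """Return the two requests in order: delete everything, then commit."""
    return [
        build_update_request(DELETE_ALL, url),
        build_update_request(COMMIT, url),
    ]
```

To actually clear an index, each request would be passed to `urllib.request.urlopen` in order. Keeping request construction separate from sending makes the sketch easy to inspect without a running server.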

clear index

2007-08-20 Thread Sundling, Paul
what is the best approach to clearing an index? The use case is that I'm doing some performance testing with various index sizes. In between indexing (embedded and soon HTTP/XML) I need to clear the index so I have a fresh start. What's the best approach, close the index and delete the files?

RE: solr + carrot2

2007-08-20 Thread Lance Norskog
No, this is about the Carrot2 clustering tool, specifically the Swing application. To make this app use a Solr service you have to code a custom searcher for your Solr. I'm requesting a generic UI for Carrot2 that works against any Solr. -----Original Message----- From: Mike Klaas [mailto:[EMAIL PRO

Re: index size

2007-08-20 Thread Mike Klaas
On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote: Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb and there are about 30 indexed fields. An "ls -sh" wil

Re: solr + carrot2

2007-08-20 Thread Mike Klaas
On 20-Aug-07, at 11:24 AM, Lance Norskog wrote: Exactly! The Lucene version requires direct access to the file. Our indexes are on servers which do not have graphics (VNC) configured. A generic Solr access UI would be great. A generic Solr access UI? Is this different from the existing adm

Re: Custom Sorting

2007-08-20 Thread Chris Hostetter
: Sort sort = new Sort(new SortField[] : { SortField.FIELD_SCORE, new SortField(customValue, SortField.FLOAT, : true) }); : indexSearcher.search(q, sort) that appears to just be a sort on score with a secondary reversed float sort on whatever field name is in the variable "customValue" .

RE: How to read values of a field efficiently

2007-08-20 Thread Chris Hostetter
: > TermEnum terms = searcher.getReader().terms(new Term(field, "")); : > while (terms.term() != null && terms.term().field() == field){ : > //do things : > terms.next(); : > } : while( te.next() ) { : final Term term = te.term(); you're missing the ke

Re: Enquiry on Search Results counting

2007-08-20 Thread Erik Hatcher
Have a look at using Solr's faceting to give you counts back on specific fields. Details here: On Aug 20, 2007, at 12:35 PM, Jeffrey Tiong wrote: Hi, I am trying to do some counting on certain fields of the search results, curr
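As a rough sketch of the faceting suggestion, the snippet below builds a query URL that asks Solr for per-value counts on one or more fields instead of counting results client-side. The `facet`, `facet.field`, and `rows` parameters are standard Solr request parameters; the base URL, the function name `facet_count_url`, and the field name in the usage note are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed select endpoint; adjust for your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def facet_count_url(query, facet_fields, base_url=SOLR_SELECT_URL):
    """Build a URL that returns facet counts only (rows=0 suppresses documents)."""
    params = [("q", query), ("rows", "0"), ("facet", "true")]
    # One facet.field parameter per field to be counted.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)
```

For example, `facet_count_url("ipod", ["category"])` yields a URL whose response lists how many matching documents carry each value of the hypothetical `category` field, so the counting happens inside Solr rather than in PHP.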

RE: Solr 1.1. vs. 1.2.

2007-08-20 Thread Lance Norskog
While we're on the topic, there appear to be a ton of new features in 1.3, and they are getting debugged. When do you plan to do an official 1.3 release? -----Original Message----- From: Yu-Hui Jin [mailto:[EMAIL PROTECTED] Sent: Friday, August 17, 2007 11:53 PM To: solr-user@lucene.apache.org S

RE: solr + carrot2

2007-08-20 Thread Lance Norskog
Exactly! The Lucene version requires direct access to the file. Our indexes are on servers which do not have graphics (VNC) configured. A generic Solr access UI would be great. Lance -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stanislaw Osinski Sent

RE: problem with quering solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Lance Norskog
If you are running on Windows, it does not default to UTF-8. It has a java property that changes it to UTF-8. Unfortunately, not all libraries get this information, and some of the String converters don't have a character-encoding argument. I learned this the hard way. _ From: Ben Shlo
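The effect described above can be illustrated with a small sketch: the same UTF-8 bytes decode correctly when the encoding is named explicitly, and turn into mojibake when a different default is applied. The choice of `cp1252` as the stand-in for a Windows default is an assumption (it depends on the locale), and `decode_with` is a hypothetical helper.

```python
def decode_with(raw, encoding):
    """Decode bytes with an explicitly named encoding instead of a platform default."""
    return raw.decode(encoding, errors="replace")


# Bytes as they would appear in a UTF-8 encoded file.
raw = "café".encode("utf-8")

# Naming the encoding explicitly recovers the original text.
as_utf8 = decode_with(raw, "utf-8")

# Decoding with a Windows-style code page (assumed cp1252) garbles it.
as_cp1252 = decode_with(raw, "cp1252")
```

Here `as_utf8` is `"café"` while `as_cp1252` is `"cafÃ©"`, which is the kind of mismatch that makes a UTF-8 query return zero matches against text that was indexed through a non-UTF-8 conversion.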

Enquiry on Search Results counting

2007-08-20 Thread Jeffrey Tiong
Hi, I am trying to do some counting on certain fields of the search results. Currently I am using PHP to do the counting, but it is impossible to do this when the result sets reach a few hundred thousand. Does anyone here have any idea how to do this? Example of scenario, 1. The solr sche

problem with quering solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Ben Shlomo, Yatir
Hi! I have utf-8 encoded data inside a csv file (actually it’s a tab separated file - attached) I can index it with no apparent errors I did not forget to set this in my tomcat configuration When I query a document using the UTF-8 text I get zero matches: -

RE: Structured Lucene documents

2007-08-20 Thread Pierre-Yves LANDRON
Hello! At last, I've had the opportunity to test your solution, Pieter, which was to use dynamic fields: > > > Store each page in a separate field (e.g. page1, page2, page3 .. pageN) then > at query time, use the highlighting parameters to highlight matches in the > page fields. You should

Re: Indexing large documents

2007-08-20 Thread Fouad Mardini
thanks, i reindexed the documents and now it works, there was an issue with text extraction it seems. I also changed the maxFieldLength and it must have helped thanks On 8/20/07, Pieter Berkel <[EMAIL PROTECTED]> wrote: > > You will probably need to increase the value of maxFieldLength in your >

Re: how to retrieve all the documents in an index?

2007-08-20 Thread Erik Hatcher
Yes - they come back in the order indexed. Erik On Aug 19, 2007, at 7:20 PM, Yu-Hui Jin wrote: BTW, Hoss, is there a default order for the documents returned by running this query? thanks, -Hui On 8/16/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Any of you know whether t

Re: Indexing large documents

2007-08-20 Thread Pieter Berkel
You will probably need to increase the value of maxFieldLength in your solrconfig.xml. The default value is 10000, which might explain why your documents are not being completely indexed. Piete On 20/08/07, Peter Manis <[EMAIL PROTECTED]> wrote: > > The that should show some errors if something

Re: Indexing large documents

2007-08-20 Thread Peter Manis
The that should show some errors if something goes wrong, if not the console usually will. The errors will look like a java stacktrace output. Did increasing the heap do anything for you? Changing mine to 256mb max worked fine for all of our files. On 8/20/07, Fouad Mardini <[EMAIL PROTECTED]>

Re: Indexing large documents

2007-08-20 Thread Fouad Mardini
Well, I am using the Java textmining library to extract text from documents, then I do a POST to Solr. I do not have an error log; I only have *.request.log files in the logs directory. Thanks On 8/20/07, Peter Manis <[EMAIL PROTECTED]> wrote: > > Fouad, > > I would check the error log or console f

Re: Indexing large documents

2007-08-20 Thread Peter Manis
Fouad, I would check the error log or console for any possible errors first. They may not show up, it really depends on how you are processing the word document (custom solr, feeding the text to it, etc). We are using a custom version of solr with PDF, DOC, XLS, etc text extraction and I have suc

RE: Indexing large documents

2007-08-20 Thread praveen jain
Hi, I want to know how to update my .xml file, which has fields other than the default ones. Which file do I have to modify, and how? pRAVEEN jAIN +919890599250 -----Original Message----- From: Fouad Mardini [mailto:[EMAIL PROTECTED] Sent: Monday, August 20, 2007 4:00 PM To: solr-user@lucene.ap

Indexing large documents

2007-08-20 Thread Fouad Mardini
Hello, I am using Solr to index text extracted from Word documents, and it is working really well. Recently I started noticing that some documents are not indexed; that is, I know that the word foobar is in a document, but when I search for foobar the id of that document is not returned. I suspect

RE: UTF-8 encoding problem on one of two Solr setups

2007-08-20 Thread Mario Knezovic
> You might want to check out this page > http://wiki.apache.org/solr/SolrTomcat > > Tomcat needs a small config change out > of the box to properly support UTF-8. This exactly solved the problem. Thanks a lot! Mario

RE: How to read values of a field efficiently

2007-08-20 Thread Martin Grotzke
On Sun, 2007-08-19 at 21:39 +0200, Ard Schrijvers wrote: > > On Mon, 2007-07-30 at 00:30 -0700, Chris Hostetter wrote: > > > : Is it possible to get the values from the ValueSource (or from > > > : getFieldCacheCounts) sorted by its natural order (from lowest to > > > : highest values)? > > > > >