Thanks Erick. Your solution does make sense. Actually, I wanted to know how to
use delete via query or unique id through DIH.
Is there any specific query to be mentioned in data-config.xml? Also, is
there any separate command, like "full-import" or "delta-import", for deleting
documents from the index?
On
We have about 500 million documents indexed. The index size is about 10G,
running on a 32-bit box. During the pressure testing, we monitored that JVM
GC is very frequent, about once every 5 minutes. Are there any tips for tuning this?
It seems like this is a way to accomplish what I was looking for:
import java.io.File;
import org.apache.solr.core.CoreContainer;

CoreContainer coreContainer = new CoreContainer();
File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
File f = new File(home, "solr.xml");
coreContainer.load("/home/max/packages/test/apache-solr-1.4.1/example/solr", f);
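A possible next step, as a sketch only (it assumes the goal is an in-process
EmbeddedSolrServer and that the core is named "core0", which is a hypothetical name):

    // requires org.apache.solr.client.solrj.embedded.EmbeddedSolrServer
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core0");
    server.ping();   // the container-backed server can now be queried and updated in-process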
Thanks Lance. I'll give that a try going forward.
On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog wrote:
> Here's the problem: the standard Solr parser is a little weird about
> negative queries. The way to make this work is to say
> *:* AND -field:[* TO *]
>
> This means "select everything AND only these documents without a value
> in the field".
Right now I am doing some processing on my Solr index using Lucene Java.
Basically, I loop through the index in Java and do some extra processing of
each document (processing that is too intensive to do during indexing).
However, when I try to update the document in Solr with new fields (using
Sol
Here's the problem: the standard Solr parser is a little weird about
negative queries. The way to make this work is to say
*:* AND -field:[* TO *]
This means "select everything AND only these documents without a value
in the field".
On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch wrote:
> I was t
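To tie this back to the delete problem: a minimal SolrJ sketch, assuming the
Solr 1.4-era CommonsHttpSolrServer and the default example URL (both are
assumptions on my part, not from the thread):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // the leading *:* is what makes the purely negative clause acceptable
    server.deleteByQuery("*:* AND -date_added_solr:[* TO *]");
    server.commit();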
I recommend JMeter. We use it to do load testing on a search server. Of
course you have to provide a reasonable set of queries as input... if you
don't have any, then a reasonable estimation based on your expected traffic
should suffice. JMeter can be used for other load testing too.
Be careful
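If you just want a quick smoke test before setting up JMeter properly, a crude
single-threaded sketch along these lines could replay a query file and time the
responses (the file name and URL are hypothetical):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class QueryReplay {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader("queries.txt"));
            String q;
            while ((q = in.readLine()) != null) {
                long start = System.currentTimeMillis();
                URL url = new URL("http://localhost:8983/solr/select?q="
                        + URLEncoder.encode(q, "UTF-8"));
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.getInputStream().close();   // fetch and discard the response
                System.out.println(q + ": " + (System.currentTimeMillis() - start) + " ms");
            }
            in.close();
        }
    }

This measures latency only, not concurrency; JMeter remains the better tool for real load.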
Cool! I did not know that Tika had a thorough and careful HTML parser.
On Wed, Aug 25, 2010 at 7:49 PM, Ken Krugler
wrote:
> Actually TagSoup's reason for existence is to clean up all of the messy HTML
> that's out in the wild.
>
> Tika's HTML parser wraps this, and uses it to generate the stream of
I was trying to filter out all documents that HAVE that field. I was trying
to delete any documents where that field had empty values.
I just found a way to do it: I ran a range query on a string date in the
Lucene DateTools format and it worked, so I'm satisfied. However, I believe
it worke
On Wed, Aug 25, 2010 at 2:34 PM, Peter Spam wrote:
> This is a very small number of documents (7000), so I am surprised Solr is
> having such a hard time with it!!
>
> I do facet on 3 terms.
>
> Subsequent "hello" searches are faster, but still well over a second. This
> is a very fast Mac Pro,
We're currently building a Solr index with over 1.2 million documents. I
want to do a good stress test of it. Does anyone know if there's an
appropriate stress-test tool for Solr? Or any good suggestion?
Best Regards,
Scott
There is a LogTransformer that logs data instead of adding to the document:
http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.7.3?q=logging+transformer
http://wiki.apache.org/solr/DataImportHandler#LogTransformer
On Wed, Aug 25, 2010 at 12:35 PM, Vladimir Sutskever
wrote:
> Hi All,
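For illustration, a data-config.xml entity using the LogTransformer might look
like this (the table, columns, and log template are hypothetical, patterned on
the wiki example):

    <entity name="item" query="select id, name from item"
            transformer="LogTransformer"
            logTemplate="picked up item ${item.id}" logLevel="info">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>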
Actually TagSoup's reason for existence is to clean up all of the
messy HTML that's out in the wild.
Tika's HTML parser wraps this, and uses it to generate the stream of
SAX events that it then consumes and turns into a normalized XHTML 1.0-
compliant data stream.
-- Ken
On Aug 25, 2010,
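As a concrete illustration of the pipeline Ken describes, a minimal Tika sketch
(the file name is hypothetical, and it assumes a Tika version whose Parser
interface takes a ParseContext):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.html.HtmlParser;
    import org.apache.tika.sax.BodyContentHandler;

    InputStream in = new FileInputStream("messy.html");
    BodyContentHandler handler = new BodyContentHandler();
    // HtmlParser runs TagSoup underneath and emits normalized XHTML SAX events
    new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
    System.out.println(handler.toString());   // the cleaned-up text content
    in.close();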
How much disk space is used by the index?
If you run the Lucene CheckIndex program, how many terms etc. does it report?
When you do the first facet query, how much does the memory in use grow?
Are you storing the text fields, or only indexing? Do you fetch the
facets only, or do you also fetch t
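For reference, CheckIndex can be run from the command line roughly like this
(the jar version and index path are hypothetical; Solr 1.4.1 ships Lucene 2.9.x):

    java -cp lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index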
Excuse me, what's the hyphen before the field name 'date_added_solr'? Is this
some kind of new query format that I don't know about?
-date_added_solr:[* TO *]
- Original Message -
From: "Max Lynch"
To:
Sent: Thursday, August 26, 2010 6:12 AM
Subject: Delete by query issue
> Hi,
> I am
I am using SolrSearchBean inside my custom parse filter in Nutch 1.1. My
Solr/Nutch setup is working. I have Nutch crawl and index into Solr, and I am
able to search the Solr index with my Solr admin page. My Solr schema is
completely different from the one in Nutch. When I tried to query my
What you want is something called 'field collapsing'. This is a Solr
implementation that (at a high level) gives you one of these documents
and a report of how many more match the query. Collapsing multiple
product styles/colors/sizes to one consumer-visible product is a
common use case for this. A
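For illustration only: at the time of this thread, field collapsing lived in the
SOLR-236 patch, so parameter names vary by patch version; with the grouping
syntax that later became standard, a request might look like this (the field
name is hypothetical):

    http://localhost:8983/solr/select?q=shoes&group=true&group.field=product_id&group.limit=1

Each group then comes back with one representative document plus a per-group match count.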
Does this happen when you are indexing with many threads at once?
There are reports of sockets blocking and timing out during
multi-threaded indexing.
On Wed, Aug 25, 2010 at 6:40 AM, Yonik Seeley
wrote:
> On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani
> wrote:
>> Hi,
>> Sometimes while inde
This assumes that the HTML is good quality. I don't know exactly what
your use case is. If you're crawling the web you will find some very
screwed-up HTML.
On Wed, Aug 25, 2010 at 6:45 AM, Ken Krugler
wrote:
>
> On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote:
>
>> Wouldn't the usage of the Nec
There are a couple of options here. Solr can fetch text from a file or
from HTTP given a URL. Look at the stream.file and stream.url
parameters. You can use these from EmbeddedSolr.
Also, there are 'ContentStream' objects in the SolrJ API which you can
also use. Look at
http://lucene.apache.org/s
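A SolrJ sketch of the ContentStream route (the handler path, file name, and
literal field are assumptions, not from the thread; 'server' is any SolrServer,
including an embedded one):

    import java.io.File;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.util.ContentStreamBase;

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addContentStream(new ContentStreamBase.FileStream(new File("/tmp/report.pdf")));
    req.setParam("literal.id", "report-1");   // supply the uniqueKey explicitly
    server.request(req);
    server.commit();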
Take a look at the Multicore feature, particularly the SWAP, CREATE & MERGE
actions.
Eric Pugh's "Solr 1.4 Enterprise Search Server" book has a good explanation.
Scott
- Original Message -
From: "mraible"
To:
Sent: Thursday, August 26, 2010 6:31 AM
Subject: Create a new index while Solr is
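For illustration, the CoreAdmin calls behind the rebuild-and-swap pattern look
roughly like this (host, core names, and paths are hypothetical):

    http://localhost:8983/solr/admin/cores?action=CREATE&name=rebuild&instanceDir=/path/to/core
    http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild

Build the new index in the "rebuild" core, then SWAP it with the live one, so
searchers never see an empty index.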
mraible wrote:
> We're starting to use Solr for our application. The data that we'll be
> indexing will change often and not accumulate over time. This means that we
> want to blow away our index and re-create it every hour or so. What's the
> easiest way to do this while Solr is running and not giv
We're starting to use Solr for our application. The data that we'll be
indexing will change often and not accumulate over time. This means that we
want to blow away our index and re-create it every hour or so. What's the
easiest way to do this while Solr is running and not give users a "no data
fou
Hi,
I am trying to delete all documents that have null values for a certain
field. To that effect I can see all of the documents I want to delete by
doing this query:
-date_added_solr:[* TO *]
This returns about 32,000 documents.
However, when I try to put that into a curl call, no documents get
> 1. Currently we use Verity and have more than 20 collections; each collection
> has an index for public items and an index for private items. So there are
> virtual collections which point to each collection and a virtual collection
> which points to all. For example, we have AA and BB collectio
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler wrote:
> Hi Solr experts,
>
> There is a huge difference doing facet sorting on lex vs count
> The strange thing is that count sorting is fast when setting a small limit.
> I realize I can do sorting in the client, but I am just curious why this is.
>
Thank you for letting me know. Does Autonomy still support the Verity search
engine?
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Wednesday, August 25, 2010 3:41 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?
On Aug 25, 2010, at 12:18 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> I just started to investigate Solr several weeks ago. Our current project
> uses the Verity search engine, which is a commercial product, and the company
> is out of business.
Verity is not out of business. They were acquired by Aut
Hi All,
Is there a way to increase the debugging level of Solr delta-query imports?
I would like to see the records that have been "picked up" by Solr spit out to
standard output or a log file.
Thank You!
Kind regards,
Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.
Th
On Wed, Aug 25, 2010 at 2:50 PM, Yonik Seeley
wrote:
> On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler
> wrote:
>> Thanks for the technical explanation.
>> I will in general try to use lex and sort by count in the client if there
>> are not too many rows.
>
> I just developed a patch that may help
Hello,
I just started to investigate Solr several weeks ago. Our current project uses
the Verity search engine, which is a commercial product, and the company is out
of business. I am trying to evaluate whether Solr can meet our requirements. I
have the following questions.
1. Currently we use Verity and have
On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler
wrote:
> Thanks for the technical explanation.
> I will in general try to use lex and sort by count in the client if there
> are not too many rows.
I just developed a patch that may help this scenario:
https://issues.apache.org/jira/browse/SOLR-2089
This is a very small number of documents (7000), so I am surprised Solr is
having such a hard time with it!!
I do facet on 3 terms.
Subsequent "hello" searches are faster, but still well over a second. This is
a very fast Mac Pro, with 6GB of RAM.
Thanks,
Peter
On Aug 25, 2010, at 9:52 AM,
I'm not sure what you mean here. You can delete via query or unique id. But
DIH really isn't relevant here.
If you've defined a unique key, simply re-adding any changed documents will
delete the old one and insert the new document.
If this makes no sense, could you explain what the underlying pro
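To make that concrete, a SolrJ sketch (the field names and values are
hypothetical; 'server' is any SolrServer):

    server.deleteById("12345");               // delete by unique id
    server.deleteByQuery("source:stale");     // or delete by query

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "12345");              // re-adding the same uniqueKey replaces the old doc
    doc.addField("title", "updated title");
    server.add(doc);
    server.commit();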
Hi,
I'm having a problem where a Solr query on all items in one category
is returning duplicated items when an item appears in more than one
subcategory. My schema involves a document for each item's subcategory
instance. I know this is not correct.
I'm not sure if I ever tried multiple values on
On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam wrote:
> So, I went through all the effort to break my documents into max 1 MB chunks,
> and searching for hello still takes over 40 seconds (searching across 7433
> documents):
>
> 8 results (41980 ms)
>
> What is going on??? (scroll down for
So, I went through all the effort to break my documents into max 1 MB chunks,
and searching for hello still takes over 40 seconds (searching across 7433
documents):
8 results (41980 ms)
What is going on??? (scroll down for my config).
-Peter
On Aug 16, 2010, at 3:59 PM, Markus Jels
Hi Yonik,
Thanks for the technical explanation.
I will in general try to use lex and sort by count in the client if there
are not too many rows.
Have a nice day.
Regards
ericz
On Wed, Aug 25, 2010 at 4:41 PM, Yonik Seeley wrote:
> On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler
> wrote:
> > I
On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler
wrote:
> I use Solr 1.41
> There are 14000 cities in the index.
> The type is just a simple string: <fieldType ... class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> The facet method is fc.
>
> You are right I do not need 5000 cities, I was just surp
Hi Yonik,
Thanks for your response.
I use Solr 1.41
There are 14000 cities in the index.
The type is just a simple string:
The facet method is fc.
You are right, I do not need 5000 cities; I was just surprised to see this
big difference. There are places where I do need to sort by count and return
Have a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters to
see how that works.
2010/8/25 Marco Martinez
> You should use the tokenizer solr.WhitespaceTokenizerFactory in your field
> type to get your terms indexed; once you have indexed the data, you don't
> need to use the * i
On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote:
Wouldn't the usage of NekoHTML (as an XML parser) and XPath be
safer?
I guess it all depends on the "quality" of the source document.
If you're processing HTML then you definitely want to use something
like NekoHTML or TagSoup.
Not
On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani wrote:
> Hi,
> Sometimes while indexing to Solr, I am getting the following exception:
> "com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag"
> I think it's some configuration issue. Kindly suggest.
>
> I have a solr working w
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler wrote:
> There is a huge difference doing facet sorting on lex vs count
> The strange thing is that count sorting is fast when setting a small limit.
> I realize I can do sorting in the client, but I am just curious why this is.
There are a lot of opt
Hi Solr experts,
There is a huge difference doing facet sorting on lex vs count
The strange thing is that count sorting is fast when setting a small limit.
I realize I can do sorting in the client, but I am just curious why this is.
FAST - 16ms
facet.field=city
f.city.facet.limit=5000
f.city.face
You should use the tokenizer solr.WhitespaceTokenizerFactory in your field
type to get your terms indexed; once you have indexed the data, you don't
need to use the * in your queries, since that is a heavy query for Solr.
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Át
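For illustration, a schema.xml field type along the lines Marco describes (the
type name is hypothetical):

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>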
Hi,
Sometimes while indexing to Solr, I am getting the following exception:
"com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag"
I think it's some configuration issue. Kindly suggest.
I have Solr working with Tomcat 6.
Thanks
Pooja
Dear ladies and gentlemen,
I'm a newbie with Solr; I didn't find an answer in the wiki, so I'm writing here.
I'm analysing Solr performance and have one problem. *Search time is about
7-10 seconds per query.*
I have a 5 GB CSV database with about 15 fields and one key field (record
number). I uploaded
Hi again Bastian,
2010/8/23 Bastian Spitzer
> I don't seem to find decent documentation on how those parameters
> actually work.
>
> This is the default, example block:
>
>   <deletionPolicy class="solr.SolrDeletionPolicy">
>     <str name="maxCommitsToKeep">1</str>
>     <str name="maxOptimizedCommitsToKeep">0</str>
>   </deletionPolicy>
>
> so do I have to increase the maxCommitsToKeep to a value of 2 wh
Hi, I am running a ZooKeeper ensemble of 3 ZooKeeper instances
and have established a SolrCloud to work with it (2 masters, 2 slaves).
On each master machine I have 2 shards (4 shards in total).
On one of the masters I keep noticing ZooKeeper-related exceptions which I
can't understand:
One appears to be
On Tue, Aug 24, 2010 at 10:37 AM, Bojan Vukojevic wrote:
> I am using SolrJ with an embedded Solr server, and some documents have a lot
> of text. Solr will be running on a small device with very limited memory. In
> my tests I cannot process more than 3MB of text (in a body) with a 64MB heap.
> Ac
On Wed, Aug 25, 2010 at 12:51 PM, satya swaroop wrote:
> Hi all,
> I indexed nearly 100 Java PDF files which are of large size (min 1MB).
> Solr is showing the results with the entire content that it indexed,
> which is taking time to show the results. Can't we reduce the content it
> show
Thanks for your help.
I bound de.lvm.services.logging.PerformanceLoggingFilter in web.xml
and mapped it to /admin/*.
It works fine with EmbeddedSolr. I get a NullPointerException in some links under
admin/index.jsp, but I will solve that problem.
Robert
2010/8/25 Chris Hostetter :
>
> : we use in our appli
Hi all,
I indexed nearly 100 Java PDF files which are of large size (min 1MB).
Solr is showing the results with the entire content that it indexed,
which is taking time to show the results. Can't we reduce the content it
shows, or can I just have the file names and ids instead of the entire