Thanks Erick. Your solution does make sense. Actually, I wanted to know how to
use delete via query or unique id through DIH.
Is there any specific query to be mentioned in data-config.xml? Also, is
there any separate command, like "full-import" or "delta-import", for deleting
documents from the index?
On
We have about 500 million documents indexed. The index size is about 10G,
running on a 32-bit box. During the pressure testing, we monitored that JVM
GC is very frequent, about once every 5 minutes. Are there any tips for tuning this?
It seems like this is a way to accomplish what I was looking for:
import java.io.File;
import org.apache.solr.core.CoreContainer;

CoreContainer coreContainer = new CoreContainer();
File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
File f = new File(home, "solr.xml");
coreContainer.load("/home/max/packages/test/apache-solr-1.4.1/example/solr", f);
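A possible next step, as a sketch only (it assumes the goal is an in-process
EmbeddedSolrServer and that the core is named "core0", which is a hypothetical name):

    // requires org.apache.solr.client.solrj.embedded.EmbeddedSolrServer
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core0");
    server.ping();   // the container-backed server can now be queried and updated in-process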
Thanks Lance. I'll give that a try going forward.
On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog wrote:
> Here's the problem: the standard Solr parser is a little weird about
> negative queries. The way to make this work is to say
> *:* AND -field:[* TO *]
>
> This means "select everything AND only these documents without a value
> in the field".
Right now I am doing some processing on my Solr index using Lucene Java.
Basically, I loop through the index in Java and do some extra processing of
each document (processing that is too intensive to do during indexing).
However, when I try to update the document in Solr with new fields (using
Sol
Here's the problem: the standard Solr parser is a little weird about
negative queries. The way to make this work is to say
*:* AND -field:[* TO *]
This means "select everything AND only these documents without a value
in the field".
On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch wrote:
> I was t
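To tie this back to the delete problem: a minimal SolrJ sketch, assuming the
Solr 1.4-era CommonsHttpSolrServer and the default example URL (both are
assumptions on my part, not from the thread):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // the leading *:* is what makes the purely negative clause acceptable
    server.deleteByQuery("*:* AND -date_added_solr:[* TO *]");
    server.commit();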
I recommend JMeter. We use it to do load testing on a search server. Of
course you have to provide a reasonable set of queries as input... if you
don't have any, then a reasonable estimation based on your expected traffic
should suffice. JMeter can be used for other load testing too.
Be careful
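If you just want a quick smoke test before setting up JMeter properly, a crude
single-threaded sketch along these lines could replay a query file and time the
responses (the file name and URL are hypothetical):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class QueryReplay {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader("queries.txt"));
            String q;
            while ((q = in.readLine()) != null) {
                long start = System.currentTimeMillis();
                URL url = new URL("http://localhost:8983/solr/select?q="
                        + URLEncoder.encode(q, "UTF-8"));
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.getInputStream().close();   // fetch and discard the response
                System.out.println(q + ": " + (System.currentTimeMillis() - start) + " ms");
            }
            in.close();
        }
    }

This measures latency only, not concurrency; JMeter remains the better tool for real load.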
Cool! I did not know that Tika had a thorough and careful HTML parser.
On Wed, Aug 25, 2010 at 7:49 PM, Ken Krugler
wrote:
> Actually TagSoup's reason for existence is to clean up all of the messy HTML
> that's out in the wild.
>
> Tika's HTML parser wraps this, and uses it to generate the stream of
I was trying to filter out all documents that HAVE that field. I was trying
to delete any documents where that field had empty values.
I just found a way to do it: I ran a range query on a string date in the
Lucene DateTools format and it worked, so I'm satisfied. However, I believe
it worke
On Wed, Aug 25, 2010 at 2:34 PM, Peter Spam wrote:
> This is a very small number of documents (7000), so I am surprised Solr is
> having such a hard time with it!!
>
> I do facet on 3 terms.
>
> Subsequent "hello" searches are faster, but still well over a second. This
> is a very fast Mac Pro,
We're currently building a Solr index with over 1.2 million documents. I
want to do a good stress test of it. Does anyone know if there's an
appropriate stress-test tool for Solr? Or any good suggestion?
Best Regards,
Scott
There is a LogTransformer that logs data instead of adding to the document:
http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.7.3?q=logging+transformer
http://wiki.apache.org/solr/DataImportHandler#LogTransformer
On Wed, Aug 25, 2010 at 12:35 PM, Vladimir Sutskever
wrote:
> Hi All,
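For illustration, a data-config.xml entity using the LogTransformer might look
like this (the table, columns, and log template are hypothetical, patterned on
the wiki example):

    <entity name="item" query="select id, name from item"
            transformer="LogTransformer"
            logTemplate="picked up item ${item.id}" logLevel="info">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>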
Actually TagSoup's reason for existence is to clean up all of the
messy HTML that's out in the wild.
Tika's HTML parser wraps this, and uses it to generate the stream of
SAX events that it then consumes and turns into a normalized XHTML 1.0-
compliant data stream.
-- Ken
On Aug 25, 2010,
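As a concrete illustration of the pipeline Ken describes, a minimal Tika sketch
(the file name is hypothetical, and it assumes a Tika version whose Parser
interface takes a ParseContext):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.html.HtmlParser;
    import org.apache.tika.sax.BodyContentHandler;

    InputStream in = new FileInputStream("messy.html");
    BodyContentHandler handler = new BodyContentHandler();
    // HtmlParser runs TagSoup underneath and emits normalized XHTML SAX events
    new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
    System.out.println(handler.toString());   // the cleaned-up text content
    in.close();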
How much disk space is used by the index?
If you run the Lucene CheckIndex program, how many terms etc. does it report?
When you do the first facet query, how much does the memory in use grow?
Are you storing the text fields, or only indexing? Do you fetch the
facets only, or do you also fetch t
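For reference, CheckIndex can be run from the command line roughly like this
(the jar version and index path are hypothetical; Solr 1.4.1 ships Lucene 2.9.x):

    java -cp lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index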
Excuse me, what's the hyphen before the field name 'date_added_solr'? Is this
some kind of new query format that I don't know about?
-date_added_solr:[* TO *]
- Original Message -
From: "Max Lynch"
To:
Sent: Thursday, August 26, 2010 6:12 AM
Subject: Delete by query issue
> Hi,
> I am
I am using SolrSearchBean inside my custom parse filter in Nutch 1.1. My
Solr/Nutch setup is working. I have Nutch crawl and index into Solr, and I am
able to search the Solr index with my Solr admin page. My Solr schema is
completely different from the one in Nutch. When I tried to query my
What you want is something called 'field collapsing'. This is a Solr
implementation that (at a high level) gives you one of these documents
and a report of how many more match the query. Collapsing multiple
product styles/colors/sizes to one consumer-visible product is a
common use case for this. A
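For illustration only: at the time of this thread, field collapsing lived in the
SOLR-236 patch, so parameter names vary by patch version; with the grouping
syntax that later became standard, a request might look like this (the field
name is hypothetical):

    http://localhost:8983/solr/select?q=shoes&group=true&group.field=product_id&group.limit=1

Each group then comes back with one representative document plus a per-group match count.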
Does this happen when you are indexing with many threads at once?
There are reports of sockets blocking and timing out during
multi-threaded indexing.
On Wed, Aug 25, 2010 at 6:40 AM, Yonik Seeley
wrote:
> On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani
> wrote:
>> Hi,
>> Sometimes while inde
This assumes that the HTML is good quality. I don't know exactly what
your use case is. If you're crawling the web you will find some very
screwed-up HTML.
On Wed, Aug 25, 2010 at 6:45 AM, Ken Krugler
wrote:
>
> On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote:
>
>> Wouldn't the usage of the Nec
There are a couple of options here. Solr can fetch text from a file or
from HTTP given a URL. Look at the stream.file and stream.url
parameters. You can use these from EmbeddedSolr.
Also, there are 'ContentStream' objects in the SolrJ API which you can
also use. Look at
http://lucene.apache.org/s
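A SolrJ sketch of the ContentStream route (the handler path, file name, and
literal field are assumptions, not from the thread; 'server' is any SolrServer,
including an embedded one):

    import java.io.File;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.util.ContentStreamBase;

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addContentStream(new ContentStreamBase.FileStream(new File("/tmp/report.pdf")));
    req.setParam("literal.id", "report-1");   // supply the uniqueKey explicitly
    server.request(req);
    server.commit();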
Take a look at the Multicore feature, particularly the SWAP, CREATE & MERGE
actions.
Eric Pugh's "Solr 1.4 Enterprise Search Server" book has a good explanation.
Scott
- Original Message -
From: "mraible"
To:
Sent: Thursday, August 26, 2010 6:31 AM
Subject: Create a new index while Solr is
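For illustration, the CoreAdmin calls behind the rebuild-and-swap pattern look
roughly like this (host, core names, and paths are hypothetical):

    http://localhost:8983/solr/admin/cores?action=CREATE&name=rebuild&instanceDir=/path/to/core
    http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild

Build the new index in the "rebuild" core, then SWAP it with the live one, so
searchers never see an empty index.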
mraible wrote:
> We're starting to use Solr for our application. The data that we'll be
> indexing will change often and not accumulate over time. This means that we
> want to blow away our index and re-create it every hour or so. What's the
> easiest way to do this while Solr is running and not giv
We're starting to use Solr for our application. The data that we'll be
indexing will change often and not accumulate over time. This means that we
want to blow away our index and re-create it every hour or so. What's the
easiest way to do this while Solr is running and not give users a "no data
fou
Hi,
I am trying to delete all documents that have null values for a certain
field. To that effect I can see all of the documents I want to delete by
doing this query:
-date_added_solr:[* TO *]
This returns about 32,000 documents.
However, when I try to put that into a curl call, no documents get
> 1. Currently we use Verity and have more than 20 collections; each collection
> has an index for public items and an index for private items. So there are
> virtual collections which point to each collection and a virtual collection
> which points to all. For example, we have AA and BB collectio
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler wrote:
> Hi Solr experts,
>
> There is a huge difference doing facet sorting on lex vs count
> The strange thing is that count sorting is fast when setting a small limit.
> I realize I can do sorting in the client, but I am just curious why this is.
>
Thank you for letting me know. Does Autonomy still support the Verity search
engine?
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Wednesday, August 25, 2010 3:41 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?
On Aug 25, 2010, at 12:18 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> I just started to investigate Solr several weeks ago. Our current project
> uses the Verity search engine, which is a commercial product, and the company
> is out of business.
Verity is not out of business. They were acquired by Aut
Hi All,
Is there a way to increase the debugging level of Solr delta-query imports?
I would like to see the records that have been "picked up" by Solr spit out to
standard output or a log file.
Thank You!
Kind regards,
Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.
Th
On Wed, Aug 25, 2010 at 2:50 PM, Yonik Seeley
wrote:
> On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler
> wrote:
>> Thanks for the technical explanation.
>> I will in general try to use lex and sort by count in the client if there
>> are not too many rows.
>
> I just developed a patch that may help
Hello,
I just started to investigate Solr several weeks ago. Our current project uses
the Verity search engine, which is a commercial product, and the company is out
of business. I am trying to evaluate whether Solr can meet our requirements. I
have the following questions.
1. Currently we use Verity and have
On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler
wrote:
> Thanks for the technical explanation.
> I will in general try to use lex and sort by count in the client if there
> are not too many rows.
I just developed a patch that may help this scenario:
https://issues.apache.org/jira/browse/SOLR-2089
This is a very small number of documents (7000), so I am surprised Solr is
having such a hard time with it!!
I do facet on 3 terms.
Subsequent "hello" searches are faster, but still well over a second. This is
a very fast Mac Pro, with 6GB of RAM.
Thanks,
Peter
On Aug 25, 2010, at 9:52 AM,
I'm not sure what you mean here. You can delete via query or unique id. But
DIH really isn't relevant here.
If you've defined a unique key, simply re-adding any changed documents will
delete the old one and insert the new document.
If this makes no sense, could you explain what the underlying pro
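To make that concrete, a SolrJ sketch (the field names and values are
hypothetical; 'server' is any SolrServer):

    server.deleteById("12345");               // delete by unique id
    server.deleteByQuery("source:stale");     // or delete by query

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "12345");              // re-adding the same uniqueKey replaces the old doc
    doc.addField("title", "updated title");
    server.add(doc);
    server.commit();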
Hi,
I'm having a problem where a Solr query on all items in one category
is returning duplicated items when an item appears in more than one
subcategory. My schema involves a document for each item's subcategory
instance. I know this is not correct.
I'm not sure if I ever tried multiple values on
On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam wrote:
> So, I went through all the effort to break my documents into max 1 MB chunks,
> and searching for hello still takes over 40 seconds (searching across 7433
> documents):
>
> 8 results (41980 ms)
>
> What is going on??? (scroll down for
So, I went through all the effort to break my documents into max 1 MB chunks,
and searching for hello still takes over 40 seconds (searching across 7433
documents):
8 results (41980 ms)
What is going on??? (scroll down for my config).
-Peter
On Aug 16, 2010, at 3:59 PM, Markus Jels
Hi Yonik,
Thanks for the technical explanation.
I will in general try to use lex and sort by count in the client if there
are not too many rows.
Have a nice day.
Regards
ericz
On Wed, Aug 25, 2010 at 4:41 PM, Yonik Seeley wrote:
> On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler
> wrote:
> > I
On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler
wrote:
> I use Solr 1.41
> There are 14000 cities in the index.
> The type is just a simple string: <fieldType ... class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> The facet method is fc.
>
> You are right I do not need 5000 cities, I was just surp
Hi Yonik,
Thanks for your response.
I use Solr 1.41
There are 14000 cities in the index.
The type is just a simple string:
The facet method is fc.
You are right, I do not need 5000 cities; I was just surprised to see this
big difference. There are places where I do need to sort by count and return
Have a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters to
see how that works.
2010/8/25 Marco Martinez
> You should use the tokenizer solr.WhitespaceTokenizerFactory in your field
> type to get your terms indexed; once you have indexed the data, you don't
> need to use the * i
On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote:
Wouldn't the usage of NekoHTML (as an XML parser) and XPath be
safer?
I guess it all depends on the "quality" of the source document.
If you're processing HTML then you definitely want to use something
like NekoHTML or TagSoup.
Not
On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani wrote:
> Hi,
> Sometimes while indexing to Solr, I am getting the following exception:
> "com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag"
> I think it's some configuration issue. Kindly suggest.
>
> I have a solr working w
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler wrote:
> There is a huge difference doing facet sorting on lex vs count
> The strange thing is that count sorting is fast when setting a small limit.
> I realize I can do sorting in the client, but I am just curious why this is.
There are a lot of opt
Hi Solr experts,
There is a huge difference doing facet sorting on lex vs count
The strange thing is that count sorting is fast when setting a small limit.
I realize I can do sorting in the client, but I am just curious why this is.
FAST - 16ms
facet.field=city
f.city.facet.limit=5000
f.city.face
You should use the tokenizer solr.WhitespaceTokenizerFactory in your field
type to get your terms indexed; once you have indexed the data, you don't
need to use the * in your queries, since that is a heavy query for Solr.
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Át
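For illustration, a schema.xml field type along the lines Marco describes (the
type name is hypothetical):

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>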
Hi,
Sometimes while indexing to Solr, I am getting the following exception:
"com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag"
I think it's some configuration issue. Kindly suggest.
I have Solr working with Tomcat 6.
Thanks
Pooja
Dear ladies and gentlemen,
I'm a newbie with Solr; I didn't find an answer in the wiki, so I'm writing here.
I'm analysing Solr performance and have one problem. *Search time is about
7-10 seconds per query.*
I have a 5 GB CSV database with about 15 fields and one key field (record
number). I uploaded
Hi again Bastian,
2010/8/23 Bastian Spitzer
> I don't seem to find decent documentation on how those parameters
> actually work.
>
> This is the default, example block:
>
>   <deletionPolicy class="solr.SolrDeletionPolicy">
>     <str name="maxCommitsToKeep">1</str>
>     <str name="maxOptimizedCommitsToKeep">0</str>
>   </deletionPolicy>
>
> so do I have to increase the maxCommitsToKeep to a value of 2 wh
Hi, I am running a ZooKeeper ensemble of 3 ZooKeeper instances
and have established a SolrCloud to work with it (2 masters, 2 slaves).
On each master machine I have 2 shards (4 shards in total).
On one of the masters I keep noticing ZooKeeper-related exceptions which I
can't understand:
One appears to be
On Tue, Aug 24, 2010 at 10:37 AM, Bojan Vukojevic wrote:
> I am using SolrJ with an embedded Solr server, and some documents have a lot
> of text. Solr will be running on a small device with very limited memory. In
> my tests I cannot process more than 3MB of text (in a body) with a 64MB heap.
> Ac
On Wed, Aug 25, 2010 at 12:51 PM, satya swaroop wrote:
> Hi all,
> I indexed nearly 100 Java PDF files which are of large size (min 1MB).
> Solr is showing the results with the entire content that it indexed,
> which is taking time to show the results. Can't we reduce the content it
> show
Thanks for your help.
I bound de.lvm.services.logging.PerformanceLoggingFilter in web.xml
and mapped it to /admin/*.
It works fine with EmbeddedSolr. I get a NullPointerException in some links under
admin/index.jsp, but I will solve that problem.
Robert
2010/8/25 Chris Hostetter :
>
> : we use in our appli
Hi all,
I indexed nearly 100 Java PDF files which are of large size (min 1MB).
Solr is showing the results with the entire content that it indexed,
which is taking time to show the results. Can't we reduce the content it
shows, or can I just have the file names and ids instead of the entire