Wouldn't using NekoHTML (as an XML parser) and XPath be safer?
I guess it all depends on the "quality" of the source document.
paul
On 25 Aug 2010, at 02:09, Lance Norskog wrote:
I would do this with regular expressions. There is a Pattern Analyzer
and a Tokenizer which do regular expression-based text chopping.
Hi,
I am using the DataImportHandler to build my index. How can I delete
documents from my index using DIH?
--
Thanks,
Pawan Darira
Has anyone changed your Tomcat settings? You're logging
information at the INFO level; I wonder if you used to be
logging at WARN.
I'd also take a look at your log directory to see if you've got
a bazillion log files, and/or how big your log file is: has it
been accumulating log messages forever?
If you attach &debugQuery=on to your query, you'll often
get pointers as to what's actually happening under the covers...
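For example, something like this (same query as in your message):
.../select?q=*&sort=evalDate+desc,score+desc&start=0&rows=10&debugQuery=on
The "timing" and "explain" sections of the debug output usually show
where the time is being spent.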
Best
Erick
On Tue, Aug 24, 2010 at 2:26 AM, C0re wrote:
>
> We have a query which takes the form of
>
> ".../select?q=*&sort=evalDate+desc,score+desc&start=0&rows=10"
>
> Th
Thanks! That makes sense :)
- Original Message -
From: "Ahmet Arslan"
To:
Sent: Tuesday, August 24, 2010 4:30 PM
Subject: Re: Why it's boosted up?
Then why are short fields boosted up?
In other words, longer documents are punished, because they possibly
contain many terms/words. If
In addition to deleteByQuery, you might want to optimize
your index periodically to reclaim some space.
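For example, a rough sketch using the expiration_datetime field from the
original question, with each command POSTed to the update handler:
<delete><query>expiration_datetime:[* TO NOW]</query></delete>
<commit/>
and, every so often:
<optimize/>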
Best
Erick
On Tue, Aug 24, 2010 at 2:53 AM, Andreas Jung wrote:
>
> Andy wrote:
> > My documents have an "expiration_datetime" field that
Thanks for your clear explanation! I got it :)
- Original Message -
From: "MitchK"
To:
Sent: Tuesday, August 24, 2010 3:37 PM
Subject: Re: Why it's boosted up?
Hi Scott,
(so shorter fields are automatically boosted up). "
The theory behind that is the following (in easy words):
Another thing you might try is setting preserveOriginal=1
(just saw this in another thread). Which one is "better"
usually depends on your problem space...
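For illustration, in your fieldType's analyzer the filter would look
something like this (the other attribute values here are only an example,
keep whatever you already have):
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" preserveOriginal="1"/>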
Best
Erick
On Mon, Aug 23, 2010 at 9:16 AM, Scottie wrote:
>
> Nikolas, thanks a lot for that, I've just given it a quick test and it
> definit
It uses the XmlUpdateRequestHandler internally; it does not really
send XML. This is understandably confusing. Embedded Solr calls all of
the Solr classes directly; it does not use HTTP or serialized data.
On Tue, Aug 24, 2010 at 6:28 AM, Constantijn Visinescu
wrote:
> If my requests aren't seria
We have found that 200-250 MB per Lucene index is where efficiency
drops off and Lucene gets slow. You will have to use a sharding
approach: many small indexes, each with a different set of
documents. Solr has a tool for doing queries across many shards,
called Distributed Search.
http://wiki.apa
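A distributed query looks roughly like this (host names are placeholders):
http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=your+query
Each shard holds a different subset of the documents and Solr merges the
results for you.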
I would do this with regular expressions. There is a Pattern Analyzer
and a Tokenizer which do regular expression-based text chopping. (I'm
not sure how to make them do what you want). A more precise tool is
the RegexTransformer in the DataImportHandler.
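A rough sketch of what that looks like in a DIH data-config.xml (the entity,
column names and regex are just placeholders):
<entity name="doc" transformer="RegexTransformer" ...>
  <field column="firstName" sourceColName="fullName" regex="(\w+)\s.*"/>
</entity>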
Lance
On Tue, Aug 24, 2010 at 7:08 AM, And
: we use the EmbeddedSolrServer in our JEE application. It works very
: well. Now I wanted to set up the admin JSPs. For that I copied
: the JSPs from the Solr example webroot. When I try to access
: ...admin/index.jsp, I get 'Error 404: missing core name in path'
just copying the JSPs isn't enough
Johann,
try removing the WordDelimiterFilter from the query analyzer of your
fieldType.
If the WordDelimiterFilter in your index analyzer is well configured, it will
find everything you want.
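As a rough sketch of what I mean (tokenizer and filter settings here are
only placeholders, keep your own):
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>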
Does this solve the problem?
Kind regards,
- Mitch
I have a fieldtype with the following definition:
I have a value "blume2000.de" in a field with the fieldtype above. If I issue a
query with select?q
We will be ingesting gigabytes of new data per day, but have a lot of legacy
data (petabytes) that will also need to be indexed. We will probably index
many fields per record (ave. 50/record) and hope to add facets in the near
future.
If this solution gives us the speed and facet capabilities we
Liz,
I've built terabyte-scale (1-2 TB) test Lucene indexes, but have not
reached the petabyte level, so I am not sure. Certainly there is
overhead in the HTTP and XML marshaling/unmarshaling, which may
or may not be a critical factor for you.
Could you give more information with respect to
Oh my God! That's awesome!
Thank you guys
2010/8/24 Ahmet Arslan :
>
>> I need to get the first 100 chars of a string-type field,
>> but I am not
>> able to find something like a SubstringTransformer,
>> therefore I am
>> using the RegexTransformer, but I suspect that it eats a
>> lot of time
>>
We do have synonyms.txt in our config directory. The config directory is a
copy of the example directory. We will probably also run into this problem
with stopwords.xml.
We don't understand how to make it look in the correct directory. We
thought it got the correct directory out of the solrconf
I was worried that it wouldn't scale. We are going to be indexing petabytes
of data. Does the HTTP server solution scale?
Thanks
Liz Sommers
lizswo...@gmail.com
On Tue, Aug 24, 2010 at 12:23 PM, Thomas Joiner
wrote:
> Is there any reason you aren't using http://wiki.apache.org/solr/Solrj to
>
Hello!
The exception thrown by Solr says that you do not have a synonyms.txt
file, either on the classpath or in the Solr core's config directory. Check your
schema.xml file for the SynonymFilterFactory filter. That filter uses the
synonyms.txt file to read synonym definitions. If you don't need the
synonyms filter, you can simply remove it from your schema.
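If you do keep it, the declaration in schema.xml usually looks something
like this (attribute values may differ in your setup):
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
Solr then looks for synonyms.txt in the core's conf directory (or on the
classpath).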
Is there any reason you aren't using http://wiki.apache.org/solr/Solrj to
interact with Solr?
On Tue, Aug 24, 2010 at 11:12 AM, Liz Sommers wrote:
> I am very new to the solr/lucene world. I am using solr 1.4.0 and cannot
> move to 1.4.1.
>
> I have to index about 50 fields for each document, t
I am very new to the solr/lucene world. I am using solr 1.4.0 and cannot
move to 1.4.1.
I have to index about 50 fields for each document, these fields are already
in key/value pairs by the time I get to my index methods. I was able to
index them with lucene without any problem, but found that I
Hello,
I just started to investigate Solr several weeks ago. Our current project uses
the Verity search engine, which is a commercial product, and the company is
out of business. I am trying to evaluate whether Solr can meet our
requirements. I have the following questions.
1. Currently we use Verity and have
I'm quite new to Solr and wondering if the following is possible: in
addition to normal full-text search, my users want to have the option to
search only HTML heading inner text, i.e. content inside of <h1>, <h2>, or
<h3> tags.
Thank you,
Andy Cogan
Oops, forgot to include solr-user@ in the original email. FYI below...
-- Forwarded Message
From: "Mattmann, Chris A (388J)"
Reply-To:
Date: Tue, 24 Aug 2010 07:02:58 -0700
To:
Subject: [Spatial] Geonames and extension to Spatial Solution for Solr
Hi Folks,
You may have noticed over the p
Thread dump:
Got about 240 threads like this:
"http-8080-Processor222" daemon prio=10 tid=0x7fe36c010c00 nid=0x1e94
waiting for monitor entry [0x4caa6000..0x4caa6d20]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.logging.StreamHandler.publish(Str
What happens to your performance if you query for *:* instead of * ?
(probably have to url encode the colon)
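For example, with the colon encoded as %3A:
.../select?q=*%3A*&sort=evalDate+desc,score+desc&start=0&rows=10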
On Tue, Aug 24, 2010 at 11:26 AM, C0re wrote:
>
> We have a query which takes the form of
>
> ".../select?q=*&sort=evalDate+desc,score+desc&start=0&rows=10"
>
> This query takes around 5 s
If my requests aren't serialized via a request writer, then why does my
embedded Solr crash when I comment out the following line in my
solrconfig?
It crashes with an exception saying it can't find the /update URL. (I
left in the javabin request handler.)
On Mon, Aug 23, 2010 at 10:40 PM, Rya
Unfortunately, when I use /admin/ I get the error too.
My context root is not 'solr'. I use the EmbeddedSolrServer in another JEE app.
Any other ideas?
Robert
2010/8/24 Lucas F. A. Teixeira :
> I hate it when this happens.
>
> Look, if you enter the URL:
>
> http://server:port/solr/core/admin
>
> yo
I hate it when this happens.
Look, if you enter the URL:
http://server:port/solr/core/admin
you'll get the error you mentioned... try adding a trailing slash:
http://server:port/solr/core/admin/
and it will work.
Lucas Frare Teixeira .·.
- lucas...@gmail.com
- lucastex.com.br
- blog.lucastex.com
- twitter
Hello,
we use the EmbeddedSolrServer in our JEE application. It works very
well. Now I wanted to set up the admin JSPs. For that I copied
the JSPs from the Solr example webroot. When I try to access
...admin/index.jsp, I get 'Error 404: missing core name in path'
We run the application on We
Andy wrote:
> My documents have an "expiration_datetime" field that holds the expiration
> datetime of the document.
>
> I use a filter query to exclude expired documents from my query results.
>
> Is it a good idea to periodically go through the in
My documents have an "expiration_datetime" field that holds the expiration
datetime of the document.
I use a filter query to exclude expired documents from my query results.
Is it a good idea to periodically go through the index and remove expired
documents from it? If so what is the best way t
Hey, I guess this option has been removed in Lucene 2.0. You could
look at maxBufferedDocs and ramBufferSizeMB to control how many
documents / how much heap space is used to buffer documents before they are
flushed and merged into a new segment. Don't know what you are trying
to do, but those are the factor
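In Solr these are set in the <indexDefaults> section of solrconfig.xml; for
illustration (the numbers are only examples):
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexDefaults>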
In Lucene this option is available in the index configuration. Is it available in Solr too?
We have a query which takes the form of
".../select?q=*&sort=evalDate+desc,score+desc&start=0&rows=10"
This query takes around 5 seconds to complete.
I changed the query to the following:
".../select?q=[* TO NOW]&sort=evalDate+desc,score+desc&start=0&rows=10"
The query now returns in around 6
> Then why are short fields boosted up?
In other words, longer documents are punished, because they possibly contain
many terms/words. If this mechanism did not exist, longer documents would take
over and usually pop up on the first page.
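If you don't want this behaviour for a particular field, you can switch off
length normalization for that field in schema.xml and reindex, e.g. (the field
name is just an example):
<field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>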
>
> Hi all,
> I found the solution to my problem. I had changed my port number and
> kept the old one in the stream.url... so that was the problem...
> Thanks, all.
>
> Now I have another problem: when I send requests to the remote
> system for files that have names with escape
> The request is from our business
> team; they wish users of our product could
> type in a partial string of a word that exists in the title or
> body field. But now
> I also doubt whether this request is really necessary.
"partial string of a word"? I think there is a misunderstanding here.
SingleFilter oper
Hi Chris,
On 23.08.2010 21:37, Chris Hostetter wrote:
> : The document is indexed correctly, a search for "at s" found it and all
> : fields looked great ("at&s" and not, for example, "at&amp;s").
> :
> : As my stopword list does not contain "at" or "&" or "&amp;", I don't
> : quite understand why my result
> I need to get the first 100 chars of a string-type field,
> but I am not
> able to find something like a SubstringTransformer,
> therefore I am
> using the RegexTransformer, but I suspect that it eats a
> lot of time
> on indexation time.
>
> So, in short, I need something like a SubstringTrans
Hi Scott,
> (so shorter fields are automatically boosted up). "
>
The theory behind that is the following (in easy words):
Let's say you've got two documents, and each doc contains only 1 field (like
in my example).
Additionally, we've got a query that contains two words.
Let's say doc1 contains o
On Tue, 24 Aug 2010 08:46:52 +0200
Gonzalo Payo Navarro wrote:
> Hi everyone!
>
> I need to get the first 100 chars of a string-type field, but I
> am not able to find something like a SubstringTransformer,
> therefore I am using the RegexTransformer, but I suspect that it
> eats a lot of time o