In my DIH tests I ran a nested loop where the outer RSS feed gave a list of
feeds, and the inner loop walked each feed. Some of the feeds were bogus, and
the DIH loop immediately failed.
It would be good to have at least an "ignoreerrors=true" option, the way 'ant' does. This
would be set inside each loop.
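For what it's worth, later DataImportHandler releases grew an onError attribute on <entity> (values abort/skip/continue) that covers exactly this case. A minimal sketch of a feed-of-feeds config; entity names, URLs, and XPath expressions are all illustrative:

```xml
<!-- Sketch only: entity names, URLs and XPath expressions are illustrative. -->
<document>
  <!-- Outer entity: the RSS feed that lists the other feeds. -->
  <entity name="feedList" processor="XPathEntityProcessor"
          url="http://example.com/feeds.rss" forEach="/rss/channel/item">
    <field column="link" xpath="/rss/channel/item/link"/>
    <!-- Inner entity: walks each feed. onError="continue" makes DIH log the
         failure on a bogus feed and move on instead of aborting the import. -->
    <entity name="feed" processor="XPathEntityProcessor"
            url="${feedList.link}" forEach="/rss/channel/item"
            onError="continue">
      <field column="title" xpath="/rss/channel/item/title"/>
    </entity>
  </entity>
</document>
```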
Hi,
For message parsing you'll either have to write a custom parser or see if you
can use JavaMail for that (or some other library if you are not working with
Java).
As for the second part, that's not directly related to Solr. Extracting
meaning out of text would be something that your applic
Hi Tom, if you're on a non-Windows box, could you perhaps try your
test on the latest Solr nightly build? We've recently improved this
through the use of NIO.
-Yonik
On Fri, Nov 7, 2008 at 4:23 PM, Burton-West, Tom <[EMAIL PROTECTED]> wrote:
> Hello,
>
> We are testing Solr with a simulation of
I've been asked to look at the Enron e-mail corpus
(http://www.cs.cmu.edu/~enron/) and I've decided to use Solr as a means to
analyse it.
So I have a few questions...
First off, how can I convert the flat file text below:
Message-ID: <[EMAIL PROTECTED]>
Date: Mon, 14 May 2001 16:39:00 -0700 (
Use synonyms.
Add these lines to your ../conf/synonym.txt:
Stephen,Steven,Steve
Bobby,Bob,Robert
...
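For completeness, the synonyms file only takes effect once a SynonymFilterFactory referencing it is wired into a field type's analyzer in schema.xml. A minimal sketch; the field type name and tokenizer choice are illustrative:

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true" maps each name to all of its listed variants -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```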
-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf Of Jon Drukman
Sent: Friday, November 07, 2008 3:19 PM
To: solr-user@lucene.apache.org
Subject: Handling proper names
Is there any way to tell Solr that Stephen is the same as Steven and
Steve? Carl and Karl? Bobby/Bob/Robert, and so on...
-jsd-
Hello,
We are testing Solr with a simulation of 30 concurrent users. We are
getting socket timeouts, and the thread dump from the admin tool shows
100+ threads with a similar message about a lock. (Message
appended below.)
We suspect this may have something to do with one or more phrase que
Hi all,
I would like to see the output from snapshooter as it executes after it
has been called via the postCommit event of the
solr.RunExecutableListener class.
In my solrconfig.xml, the listener is described by:
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
Is there a way
Hi!
Sorry that I was unclear. When I wrote that it works in the web interface, I
also meant that it is set in the schema.xml file and is therefore working
there.
Sorry about that.
Regards Erik
On Fri, Nov 7, 2008 at 11:33 AM, Jorge Solari <[EMAIL PROTECTED]> wrote:
> setting <solrQueryParser defaultOperator="OR"/> in schema.xml
setting <solrQueryParser defaultOperator="OR"/> in schema.xml
On Fri, Nov 7, 2008 at 5:21 PM, Erik Holstad <[EMAIL PROTECTED]> wrote:
> Hi!
> When making a query using the web interface we get the expected
> OR behavior. But when using the Java client it looks like it is treating the
> query as an AND query.
>
> Is there way to s
Hi!
When making a query using the web interface we get the expected
OR behavior. But when using the Java client it looks like it is treating the
query as an AND query.
Is there a way to see what operator is used for the query using Solrj?
Regards Erik
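One way to inspect this (a sketch; host, port, handler path, and query terms are illustrative) is to add debugQuery=true to the request. The parsedquery entry in the debug output shows whether the clauses were joined as OR or AND, and q.op overrides the schema default per request:

```
http://localhost:8983/solr/select?q=foo+bar&q.op=OR&debugQuery=true
```

From SolrJ, the same raw parameter can be set on the SolrQuery (which is a ModifiableSolrParams) with set("q.op", "OR").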
On Fri, Nov 7, 2008 at 12:58 PM, Cedric Houis <[EMAIL PROTECTED]> wrote:
> Another remark, why do we have better performance when we use parallel
> instances of SOLR that use the same index on the same machine?
Internal locking.
SOLR-465 was committed yesterday... it may improve some things slight
That should be easy enough to test with a trivial bit of logging.
Worst case, make a *new* analyzer for each of your fields.
PerFieldAnalyzerWrapper is your friend here.
Best
Erick
On Fri, Nov 7, 2008 at 11:23 AM, Yuri Jan <[EMAIL PROTECTED]> wrote:
> I'm subclassing my own tokenizer.
> I'm not
Hello Yonik.
I've made a few more tests today. Here are the results:
Start thread every 0.1->0.5 sec.
Each thread waits 2->10 sec before starting a new query
Each thread runs 5 min.
With FastSolrLRU
FullText :
- 10 users : 998 queries / Average time 0.037 sec
- 50 users : 4819 qu
If you need anything close to realtime (~ a few seconds), Hadoop and its
ilk are not a choice. Solr is fine. But be prepared to dedicate a lot
of hardware to that.
On Fri, Nov 7, 2008 at 10:53 PM, souravm <[EMAIL PROTECTED]> wrote:
> Hi Shalin,
>
> Thanks for your input.
>
> Yes I agree that my applic
Hi Shalin,
Thanks for your input.
Yes, I agree that my application is not much about full-text search.
Hive/Chukwa/Pig (a combination) running on Hadoop can be a good bet. But where
they fall short is online querying of the huge data.
I am specifically talking about Pig in this case which ha
On Fri, Nov 7, 2008 at 5:48 PM, Vaijanath N. Rao <[EMAIL PROTECTED]> wrote:
> Hi Solr-Users,
>
> I am not sure, but does there exist any mechanism wherein we can configure
> Solr for batch and incremental indexing.
> What I mean by batch indexing is solr would delete all the records which
> existed in
OK. You can raise an issue anyway.
On Fri, Nov 7, 2008 at 7:03 PM, Steven Anderson <[EMAIL PROTECTED]> wrote:
> Ideally, it would be a configuration option.
>
> Also, it would be great to have a hook to log or process an exception.
>
> Steve
>
>
> -Original Message-
> From: Noble Paul ?
I'm subclassing my own tokenizer.
I'm not sure, though, if I can rely on the fact that this tokenizer will be used
for this field sequentially.
I'm going to use it with different fields and don't want the member
variable to be used when tokenizing different fields, or even the same field
on different doc
Hello Ryantxu.
I've checked the cache state; it seems that the cache is well used:
cumulative_evictions : 0
Thanks for your help,
Regards,
Cédric
ryantxu wrote:
>
>>
>> Data :
>>
>> 367380 documents
>>
>> nGeographicLocations : 39298 distinct values
>> nPersonNames : 325142 distinct valu
Hi,
you have to keep track of the character position yourself in your
custom Tokenizer.
See org.apache.lucene.analysis.CharTokenizer for a starting example.
Cheers,
J.
On Fri, Nov 7, 2008 at 3:33 PM, Yoav Caspi <[EMAIL PROTECTED]> wrote:
> Thanks, Jerome.
>
> My problem is that in Tok
From what I can understand, you have little full-text search involved here.
You should probably look at Hadoop and its contrib and sub-projects such as
Pig, Hive and Chukwa.
http://wiki.apache.org/hadoop/
http://wiki.apache.org/hadoop/Hive
http://wiki.apache.org/hadoop/Chukwa
http://incubator.apa
Why not just subclass your own tokenizer and use
that one? Each call to next could increment a member
variable in your new class and you could make your
decisions based upon that...
Best
Erick
On Fri, Nov 7, 2008 at 10:33 AM, Yoav Caspi <[EMAIL PROTECTED]> wrote:
> Thanks, Jerome.
>
> My problem
Dan A. Dickey wrote:
I just came across the maxFieldLength setting for the mainIndex
in solrconfig.xml and have a question or two about it.
The default value is 10,000.
I'm extracting text from pdf documents and
storing them into a text field. Is the length of this text field limited
to 10,000 ch
Thanks, Jerome.
My problem is that in Token next(Token result) there is no information about
the location inside the stream.
I can read characters from the input Reader, but couldn't find a way to know
if it's the beginning of the input or not.
-J
On Fri, Nov 7, 2008 at 6:13 AM, Jérôme Etévé <[E
Hi Guys,
Here I'm struggling to decide whether Solr would be a fitting solution for
me. Highly appreciate you
The key requirements can be summarized as below -
1. Need to process a very high volume of data online from log files of various
applications - around 100s of millions of total size
I believe it's 10,000 tokens, not characters, but that's a quibble.
Yes, you need to change maxFieldLength to be greater than
any doc you expect to index. It can be made huge, I don't
think there's a penalty for making this number, say, 100,000,000
and indexing documents with only 10 tokens.
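In solrconfig.xml that looks something like the following (the value is illustrative; it only needs to exceed the largest token count you expect to index):

```xml
<mainIndex>
  <!-- maximum number of tokens indexed per field; tokens beyond this are dropped -->
  <maxFieldLength>100000000</maxFieldLength>
</mainIndex>
```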
Thanks Noble for your answer.
Regards,
Sourav
-Original Message-
From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 06, 2008 7:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Multicore ...
On Fri, Nov 7, 2008 at 3:28 AM, souravm <[EMAIL PROTECTED]>
Thanks Otis for clarification.
Sourav
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 06, 2008 8:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed Search ...
Sourav,
Whichever Solr instance you send the request to will dispatch r
I just came across the maxFieldLength setting for the mainIndex
in solrconfig.xml and have a question or two about it.
The default value is 10,000.
I'm extracting text from pdf documents and
storing them into a text field. Is the length of this text field limited
to 10,000 characters? Many pdf doc
If I have a field with value "foo blah blah blah bar" and "foo blah blah
blah", I want to be able to find documents with "foo" NOT "bar" within 5
token positions. Is that possible?
FYI, SOLR-465 has been committed. Let us know if it improves your scenario.
-Yonik
On Wed, Nov 5, 2008 at 5:39 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Wed, Nov 5, 2008 at 5:18 PM, wojtekpia <[EMAIL PROTECTED]> wrote:
>> I'd like to integrate this improvement into my deployment. Is it ju
Ideally, it would be a configuration option.
Also, it would be great to have a hook to log or process an exception.
Steve
-Original Message-
From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]
Sent: Thu 11/6/2008 11:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Large Data Set
On Nov 7, 2008, at 7:23 AM, [EMAIL PROTECTED] wrote:
Sorry, but I have one more question. Does the java client solrj
support facet.date?
Yeah, but it doesn't have explicit setters for it. A SolrQuery is
also a ModifiableSolrParams - so you can call the add/set methods on
it using the sam
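Concretely, the raw parameters to set (a sketch; the field name and date ranges are illustrative) would be:

```
facet=true
facet.date=pubdate
facet.date.start=NOW/YEAR-1YEAR
facet.date.end=NOW
facet.date.gap=+1MONTH
```

Note that when the request travels in a URL, the gap's leading + must be escaped as %2B; from SolrJ's set/add methods you pass the literal +1MONTH.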
Sorry, but I have one more question. Does the java client solrj support
facet.date?
QueryResponse knows the getFacetDates() method but I don't understand how to
set facet.date, facet.date.start, facet.date.end, and facet.date.gap for the
query. It seems that SolrQuery doesn't provide functions
Hi,
For batch indexing, what you could do is use two cores: one in
production and one used for your update.
Once your update core is built (delete *:* plus batch insert), you
can swap the cores to put it in production:
http://wiki.apache.org/solr/CoreAdmin#head-928b872300f1b66748c85cebb12a59bb
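The swap itself is a single CoreAdmin request (a sketch; host, port, and core names are illustrative):

```
http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=build
```

After the SWAP, requests that previously hit the "live" core are served by the freshly built index.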
Hi Solr-Users,
I am not sure, but does there exist any mechanism wherein we can configure
Solr for batch and incremental indexing.
What I mean by batch indexing is that Solr would delete all the records which
existed in the index and create a new index from the given data.
For incremental I want
On Fri, Nov 7, 2008 at 12:49 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Your problem is most likely the time it takes to facet on those
> multi-valued fields.
> Help is coming within the month I'd estimate, in the form of faster
> faceting for multivalued fields where the number of values per
>
Hi there,
I developed a personalized SearchComponent in which I'm building a
docset from a personalized Query, and a personalized Priority Queue.
In short, I'm doing this (in the process method):
HitCollector hitCol = new HitCollector() {
@Override
public void colle
Hi all,
I just want to use Solr in a certain Knowledge Management System that I am
going to develop. I basically have pdfs and docs, and these can be converted
into suitable forms via the PDFBox and POI frameworks. In the search function I
need to obtain data in such a way that I can give the sourc
Hi,
I think you could implement your personalized tokenizer in a way that it
changes its behaviour after it has delivered X tokens.
This implies a new tokenizer instance is built from the factory for
every string analyzed, which I believe is true.
Can this be confirmed?
Cheers!
Jerome.
On Thu
Team Lead
EC Software
Does anyone have a solution for my problem?
I am indexing Products. Each product can have multiple prices (like
Government, Club users, Public, etc.). The number of prices is not limited;
one product can have n prices, depending on the client's needs. Now I need to
index all the p