Hi,
When I index Chinese content using the Chinese tokenizer and analyzer in Solr
1.3, some of the Chinese text files are getting indexed but others are not.
Since Chinese has several different variants, such as standard Chinese,
simplified Chinese, etc., which of these does the Chinese tokenizer support?
Hello.
I have been beating my head around the data-config.xml listed
at the end of this message. It breaks in a few different ways.
1) I have bodged TemplateTransformer to allow it to return
when one of the variables is undefined. This ensures my
uniqueKey is always defined. But thin
Shalin Shekhar Mangar wrote:
The implementation is a bit more complicated.
1. Read all tokens from the specified field in the solr index.
2. Create n-grams of the terms read in #1 and index them into a separate
Lucene index (spellcheck index).
3. When asked for suggestions, create n-grams of the
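Just to illustrate step 2, a rough sketch of character n-gram generation in
Java (not the actual spellchecker code, only the idea):

    import java.util.ArrayList;
    import java.util.List;

    public class NGrams {
        // All character n-grams of length n from a term,
        // e.g. ngrams("solr", 2) -> [so, ol, lr].
        static List<String> ngrams(String term, int n) {
            List<String> grams = new ArrayList<String>();
            for (int i = 0; i + n <= term.length(); i++) {
                grams.add(term.substring(i, i + n));
            }
            return grams;
        }
    }

Each gram then goes into the spellcheck index, so a misspelled word can still
be matched via the n-grams it shares with the correct term.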
Hi Hoss,
Is it a problem if the snappuller misses one snapshot before the last one?
Cheers,
Have a nice day,
hossman wrote:
>
> :
> : There are a couple queries that we would like to run almost realtime so
> : I would like to have it so our client sends an update on every new
> : document and
Hi
I would like to know if a snapshot is automatically created even if no
document has been updated or added?
Thanks a lot,
I guess it should not be a problem.
--Noble
On Mon, Feb 16, 2009 at 3:28 PM, sunnyfr wrote:
>
> Hi Hoss,
>
> Is it a problem if the snappuller misses one snapshot before the last one?
>
> Cheers,
> Have a nice day,
>
>
> hossman wrote:
>>
>> :
>> : There are a couple queries that we would like to
On 15 Feb 2009, at 20:15, Yonik Seeley wrote:
On Sat, Feb 14, 2009 at 6:45 AM, karl wettin
wrote:
Also, as my threshold is based on the distance in score from the
first result, it sounds like using a result start position greater than
0 is something I have to look out for. Or?
Hmmm - th
On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie wrote:
> Hello.
>
> I have been beating my head around the data-config.xml listed
> at the end of this message. It breaks in a few different ways.
>
> 1) I have bodged TemplateTransformer to allow it to return
> when one of the variables is un
Hi,
Can we use multicore to have several indexes per webapp and use distributed
search to merge the indexes?
For example, if we have 3 cores (core0, core1 and core2) for 3 different
languages and want to search across all 3 indexes,
can we use the shards parameter as
shards=localhost:8080/solr/core0,localhost:
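For what it's worth, a complete request of that shape would presumably look
like this (assuming all three cores live in the same webapp on port 8080 and
the query goes through core0):

    http://localhost:8080/solr/core0/select?q=foo&shards=localhost:8080/solr/core0,localhost:8080/solr/core1,localhost:8080/solr/core2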
On Feb 16, 2009, at 12:05 AM, Pooja Verlani wrote:
Hi All,
I am interested in the TermsComponent addition in Solr 1.4
(http://wiki.apache.org/solr/TermsComponent). When
should we expect Solr 1.4 to be available for use?
Also, can this TermsComponent be made available as a plugin for Solr
1.3?
I'
On Feb 15, 2009, at 10:33 PM, Johnny X wrote:
Hi there,
I was told before that I'd need to create a custom search component
to do
what I want to do, but I'm thinking it might actually be a custom
analyzer.
Basically, I'm indexing e-mail in XML in Solr and searching the
'content'
fie
I recommend that you search both this and the
Lucene list. You'll find that this topic has been
discussed many times, and several approaches
have been outlined.
The searchable archives are linked to from here:
http://lucene.apache.org/java/docs/mailinglists.html.
Best
Erick
On Mon, Feb 16, 2009
Hi Noble,
So OK, I don't really mind if it misses one; if it gets the last one it's good.
I was also wondering whether a snapshot is created even if no document has
been updated?
Thanks a lot Noble,
Wish you a very nice day,
Noble Paul നോബിള് नोब्ळ् wrote:
>
> I guess it should not be a probl
Hi,
Is it normal, or did I miss something??
5.8G  book/data/snapshot.20090216153346
12K   book/data/spellchecker2
4.0K  book/data/index
12K   book/data/spellcheckerFile
12K   book/data/spellchecker1
5.8G  book/data/
Last update?
(status output; markup stripped in the archive, only the values survive:
92562, 45492, 0, 2009-02-16 15:20:01, 2009-02-16 15:20:0)
It changes a lot in a few minutes?? Is that normal? Thanks.
5.8G  book/data/snapshot.20090216153346
4.0K  book/data/index
5.8G  book/data/
r...@search-07:/data/solr# du -h book/data/
5.8G  book/data/snapshot.20090216153346
3.7G  book/data/index
4.0K  book/data/snapshot.20090216153759
9.4G
I would go for a business logic solution and not a Solr customization in
this case, as you need to filter information that you actually would like to
see in different fields in your index.
Have you already tried splitting the email into several fields like subject,
from, to, content, signature, etc.?
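For example, the update XML could look something like this (field names made
up, just a sketch):

    <add>
      <doc>
        <field name="subject">Re: snapshot question</field>
        <field name="from">someone@example.com</field>
        <field name="to">solr-user@lucene.apache.org</field>
        <field name="content">...the body text...</field>
        <field name="signature">...the signature block...</field>
      </doc>
    </add>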
Basically I'm working on the Enron dataset, and I've already de-duplicated
the collection and applied a spam filter. All the e-mails after this have
been parsed to XML and each field (so To, From, Date etc) has been
separated, along with one large field for the remaining e-mail content
(called Con
I think you essentially have to do much of the same work either
way, so take whatever comes easiest. Personally, I think
that pre-processing the data (and using two fields) would be
easiest, but it's up to you.
Using a custom analyzer would involve collecting all the contents,
deciding what is "re
>> On Sat, Feb 14, 2009 at 6:45 AM, karl wettin
>> wrote:
>>> Also, as my threshold is based on the distance in score from the
>>> first result, it sounds like using a result start position greater than
>>> 0 is something I have to look out for. Or?
>>
>> Hmmm - this isn't that easy in general
The logging used has changed from j.u.l to slf4j. That is the only problem
I can see. If you drop in that jar as well, it should just work.
On Mon, Feb 16, 2009 at 6:49 PM, Grant Ingersoll wrote:
>
> On Feb 16, 2009, at 12:05 AM, Pooja Verlani wrote:
>
>> Hi All,
>> I am interested in the TermsComponent add
Yes, it does. It just blindly creates hard links irrespective of whether a
document was added or not. But no snappull will happen, because there is
no new file to be downloaded.
On Mon, Feb 16, 2009 at 7:40 PM, sunnyfr wrote:
>
> Hi Noble,
>
> So OK, I don't really mind if it misses one; if it gets the last o
Hi,
OK, but can I use it more often than every day, like every three hours?
Snapshots are quite big.
Thanks a lot,
Bill Au wrote:
>
> The --delete option of the rsync command deletes extraneous files from the
> destination directory. It does not delete Solr snapshots. To do that you
>
They are just hard links; they do not consume extra space on disk.
On Mon, Feb 16, 2009 at 10:34 PM, sunnyfr wrote:
>
> Hi,
>
> OK, but can I use it more often than every day, like every three hours?
> Snapshots are quite big.
>
> Thanks a lot,
>
>
> Bill Au wrote:
>>
>> The --delete option of the r
Hi,
I have an input XML which, for Solr indexing, I converted to a flat document.
(The two XML samples were stripped of their markup in the archive; only the
element values survive: 1, 12-Feb-2009, 1, NJ, safsafsd#sf08, Dev, 1, NJ, CP,
2, KL, 080jnkdfhjwf, Int, 0, 080jnkdfhjwf, 08dedf.)
I was able to index it. Just put this single x
Hi Noble,
But how come I've got a space error?? :(
Thanks a lot,
Feb 16 18:28:34 search-07 jsvc.exec[8872]: ataImporter.java:361) Caused by:
java.io.IOException: No space left on device
    at java.io.RandomAccessFile.writeBytes(Native Method)
    at java.io.RandomAccessFile.write(RandomAccessFile.java:
On Mon, Feb 16, 2009 at 11:47 PM, Adi_Jinx wrote:
>
> I was able to index it. Just put this single xml and searched based on
> rec.id, and the response xml returned; however, the input xml tag order was not
> maintained, so I was unable to identify which attributes of account belong
> to which account. Is
We have been trying to figure out how to construct, for example, a
directory page with an overview of available facets for several
fields.
Looking at the issue and wiki
http://wiki.apache.org/solr/TermsComponent
https://issues.apache.org/jira/browse/SOLR-877
It would seem like this component wou
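If it pans out, usage would presumably be something along these lines
(assuming a request handler wired up with the TermsComponent at /terms; field
name made up):

    http://localhost:8983/solr/terms?terms.fl=category&terms.limit=10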
Your request seems to be fine. Have you reindexed after adding the
termOffsets definition to the document field?
Koji
Jeffrey Baker wrote:
I'm trying to exercise the termOffset functions in the nightly build
(2009-02-11) but it doesn't seem to do anything. I have an item in my
schema like so:
An
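For reference, enabling offsets on a field in schema.xml usually looks like
this (field name and type are just examples); a full reindex is required
after changing it:

    <field name="content" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>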
Sorry for butting in on this thread, but what value is added by TermsComponent
when you can use faceting for auto-suggest? And with faceting, you can
limit the suggestions by the existing words before the word the user is typing,
by using them for "q".
~ David Smiley
Pooja Verlani wrote:
>
> Hi All,
>
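For instance, if the user has typed "apache sol" so far, something like this
ought to work (field name made up):

    http://localhost:8983/solr/select?q=apache&rows=0&facet=true&facet.field=word&facet.prefix=sol&facet.limit=10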
Hi Noble,
Maybe I don't get something.
OK, if they're hard links, how come I've got a "no space left on device" error and
30G shown in the data folder??
Sorry, I'm quite new to this.
6.0G  /data/solr/book/data/snapshot.20090216214502
35M   /data/solr/book/data/snapshot.20090216195003
12M   /data/solr/book
On Feb 16, 2009, at 6:13 PM, David Smiley @MITRE.org wrote:
Sorry for butting in on this thread, but what value is added by
TermsComponent
when you can use faceting for auto-suggest?
Yeah, you can do auto-suggest w/ faceting, no doubt. In fact the
TermsComponent could just as well be cal
Shalin Shekhar Mangar wrote:
>
> On Mon, Feb 16, 2009 at 11:47 PM, Adi_Jinx wrote:
>
> How about creating a Solr document for each account and adding the recid
> and
> updt attributes from the record tag?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>
However then I do need to allow dupl
Hi,
That should work, yes, though it may not be a wise thing to do
performance-wise if the number of CPU cores the Solr server has is lower than
the number of Solr cores.
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: revathy arun
Hi,
While some of the characters in simplified and traditional Chinese do differ,
the Chinese tokenizer doesn't care - it simply creates ngram tokens.
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: revathy arun
To: solr-user@lucene.
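(For illustration: a bigram tokenizer simply turns each pair of adjacent Han
characters into a token, so 中华人民 is indexed as 中华, 华人, 人民, whichever
Chinese variant the text is in.)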
Hi,
The best option would be to identify the language after parsing the PDF and
then index it using an appropriate analyzer defined in schema.xml.
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: revathy arun
To: solr-user@lucene.apac
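For example, schema.xml could define per-language field types along these
lines (type names made up; CJKAnalyzer comes from Lucene's analyzers jar):

    <fieldType name="text_cjk" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
    </fieldType>
    <fieldType name="text_en" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    </fieldType>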
Siddharth,
At the end of your email you said:
"One option I see is to break the file in chunks, but with this, I won't be
able to search with multiple words if they are distributed in different
documents."
Unless I'm missing something unusual about your application, I don't think the
above is
Hi,
Wouldn't this be as easy as:
- split email into "paragraphs"
- for each paragraph compute signature (MD5 or something fuzzier, like in
SOLR-799)
- for each signature look for other emails with this signature
- when you find an email with an identical signature, you know you've found the
"ban
Otis,
I haven't tried it yet, but what I meant is:
if we divide the content into multiple parts, words may be split across two
different Solr documents. If the main document contains 'Hello World', then
these two words might get indexed in two different documents. Searching for
'Hello world
Siddharth,
But does your 150MB file represent a single Document? That doesn't sound right.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: "Gargate, Siddharth"
To: solr-user@lucene.apache.org
Sent: Tuesday, February 17, 2009 12:39:53
Hello,
I am using the getResults method of the QueryResponse class on a keyword that has
more than a hundred matching records. But this method returns only 10
results, and then throws an array-index-out-of-bounds exception.
How can I fetch all the results?
It's really important and urgent for me,
Increment the start value by 10 and make another request.
wunder
On 2/16/09 9:13 PM, "Neha Bhardwaj" wrote:
> Hello,
>
> I am using the getResults method of the QueryResponse class on a keyword that has
> more than a hundred matching records. But this method returns only 10
> results. And then
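In SolrJ that amounts to a loop like this (untested sketch; the URL and field
name are examples):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class FetchAll {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("keyword");
            query.setRows(10);
            long start = 0;
            while (true) {
                query.setStart((int) start);
                SolrDocumentList page = server.query(query).getResults();
                for (SolrDocument doc : page) {
                    System.out.println(doc.getFieldValue("id")); // process each hit
                }
                start += page.size();
                if (page.isEmpty() || start >= page.getNumFound()) {
                    break;
                }
            }
        }
    }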
The hard links will prevent the unused files from getting cleaned up,
so disk space is consumed by unused index files as well. You may need
to delete unused snapshots from time to time.
--Noble
On Tue, Feb 17, 2009 at 5:24 AM, sunnyfr wrote:
>
> Hi Noble,
>
> Maybe I don't get something.
> Ok if it
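A cron job along these lines would presumably do it (paths are examples; the
snapcleaner script that ships with Solr's collection distribution scripts
serves the same purpose):

    # remove snapshot directories older than one day
    find /data/solr/book/data -maxdepth 1 -type d -name 'snapshot.*' \
        -mtime +1 -exec rm -rf {} \;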
On Tue, Feb 17, 2009 at 10:26 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:
> Siddharth,
>
> But does your 150MB file represent a single Document? That doesn't sound
> right.
>
Otis, SolrJ writes the whole XML in memory before writing it to the server. That
may be one reason behind Sidhh
Right. But I was trying to point out that a single 150MB Document is not in
fact what the o.p. wants to do. For example, if your 150MB represents, say, a
whole book, should that really be a single document? Or should individual
chapters be separate documents, for example?
Otis
--
Sematext --
Ralf,
Not sure if you got this working or not, but perhaps a simple solution is
changing the default boolean operator from OR to AND.
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: "Kraus, Ralf | pixelhouse GmbH"
To: solr-user@lucen
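That should be a one-line change in schema.xml:

    <solrQueryParser defaultOperator="AND"/>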