Hi Ahmet,
Ok. Thanks for your advice.
Regards,
Edwin
On 25 November 2017 at 10:23, Ahmet Arslan wrote:
>
>
> Hi Zheng,
>
> UAX29URLEmailTokenizer recognizes URLs and e-mails. It does not tokenize them; it keeps
> them as a single token.
>
> StandardTokenizer produces two or more tokens for an entity.
>
> Please try them using the analysis page, and use whichever suits your requirements.
Hi Rick,
Neither of the tokenizers splits on the hyphen for an email address like
this:
solr-user@lucene.apache.org
The entire email address remains intact for both of the tokenizers.
Regards,
Edwin
On 24 November 2017 at 20:19, Rick Leir wrote:
> Edwin
> There is a spec for which characte
Hi Zheng,
UAX29URLEmailTokenizer recognizes URLs and e-mails. It does not tokenize them; it keeps them
as a single token.
StandardTokenizer produces two or more tokens for an entity.
Please try them using the analysis page, and use whichever suits your requirements.
Ahmet
On Friday, November 24, 2017, 11:46:57 A
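As a minimal sketch for trying this on the Analysis screen (the field type names below are made up; only the tokenizer and filter classes are the ones mentioned in the thread):

  <!-- keeps e-mail addresses and URLs as single tokens -->
  <fieldType name="text_email" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- for comparison: splits an e-mail address into several tokens -->
  <fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Selecting each field type on the Analysis page and pasting an address such as solr-user@lucene.apache.org should show the difference in the emitted tokens.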
Erick,
thanks for explaining the memory aspects.
Regarding the end-user perspective, our intention is to provide a first
layer of filtering, where data will be rolled up into buckets and
displayed in charts and tables.
When I talked about providing access to "full" documents, it was not to displ
You need to play with the (many) parameters for WordDelimiterFilterFactory.
For instance, you have preserveOriginal set to 1. That's what's
generating the token with the dot.
You have catenateAll and catenateNumbers set to zero. That means that
someone searching for 61149008 won't get a hit.
The
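A hedged sketch of the kind of change being described; the field type name and the WhitespaceTokenizer are assumptions made for illustration, and the parameter values shown are only one possible combination of the WordDelimiterFilterFactory options discussed:

  <fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- catenateNumbers="1" also emits 61149008 for the input 61149-008,
           so a search on the joined number can match;
           preserveOriginal="1" keeps the raw token, trailing dot included -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="0" catenateNumbers="1" catenateAll="0"
              preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>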
Kojo:
bq: My question is, isn't it too
expensive in terms of memory consumption to enable docValues on fields that
I don't need to facet, search, etc.?
Well, yes and no. The memory consumed is your OS memory space plus a
small set of control structures on your Java heap. It's a bit scary
that your _in
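For reference, docValues are turned on per field in the schema; the field name and type below are placeholders, the point being that the column-oriented data lives in memory-mapped files (OS cache) rather than on the Java heap:

  <!-- docValues data is memory-mapped, so it sits mostly in OS cache,
       not on the JVM heap -->
  <field name="category" type="string" indexed="true" stored="true" docValues="true"/>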
I think that I found the solution: after the analysis, change from the /export
request handler to the /select request handler in order to obtain the other fields.
I will try that.
2017-11-24 15:15 GMT-02:00 Kojo :
> Thank you very much for your answer, Shawn.
>
> That is it, I was looking for another way to in
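A rough sketch of what that handler switch might look like inside the expression itself; the collection and field names are placeholders. /export streams the whole result set but only returns docValues fields, while /select honours rows and can return stored (non-docValues) fields:

  search(myCollection,
         q="category:books",
         fl="id,title,summary",
         sort="id asc",
         qt="/select",
         rows="1000")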
Thank you very much for your answer, Shawn.
That is it, I was looking for another way to include non-docValues fields
in the filtered result documents.
I can enable docValues on other fields and reindex everything if necessary. I will
tell you about the use case, because I am not sure that I am on the r
Yes. You are right. I understand now.
Let me explain my issue a bit better with the exact problem I have.
I have this text "Information number 61149-008."
Using the tokenizers and filters described previously, I get this list of
tokens.
information
number
61149-008.
61149
008
Basically, the last token
On 11/23/2017 1:51 PM, Kojo wrote:
I am working on Solr to develop a tool to perform analysis. I am using the search
function of Streaming Expressions, which requires a field to be indexed
with docValues enabled, so I can get it.
Suppose that after someone finishes the analysis, they would like to get
o
On 11/24/2017 2:32 AM, marotosg wrote:
Hi Shawn.
Thanks for your reply. Actually my issue is with the last token: it looks
like, for the last token of a string, it keeps the dot.
In your case, "Testing. This is a test. Test."
keeps the "Test."
Is there any reason I can't see for that behaviour?
On 11/23/2017 11:31 PM, Leo Prince wrote:
We were using a somewhat older version, Solr 4.10.2, and are upgrading to Solr 7.
We have around 4 million records in one of the cores, which is of course pretty
large, hence re-sourcing the index is nearly impossible, and re-querying from the
source Solr to Solr 7 is also going to b
Hi,
yesterday I sent the message below to this list, but just after I sent the
message I received an e-mail from the mail server saying that my e-mail
had bounced. I don't know what that means, and since I received no answer to
the question, I don't know whether the message has arrived at the lis
Edwin
There is a spec for which characters are acceptable in an email name, and
another spec for chars in a domain name. I suspect you will have more success
with a tokenizer which is specialized for email, but I have not looked at
UAX29URLEmailTokenizerFactory. Does ClassicTokenizerFactory spli
Hi Shawn.
Thanks for your reply. Actually my issue is with the last token: it looks
like, for the last token of a string, it keeps the dot.
In your case, "Testing. This is a test. Test."
keeps the "Test."
Is there any reason I can't see for that behaviour?
Thanks,
Sergio
Testing. This is a test.
Hi,
I am indexing email addresses into Solr via EML files. Currently, I am
using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
found that we can also use UAX29URLEmailTokenizerFactory with
LowerCaseFilterFactory.
Does anyone have any recommendation on which Tokenizer is bet