I haven’t removed stopwords since 1996, when I joined Infoseek. What is your
special case where you must remove them?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Jun 22, 2019, at 9:51 PM, akash jayaweera
> wrote:
>
> Hello Walter,
>
> Thank y
Hello Walter,
Thank you for the reply.
But for some of my use cases I need to identify stopwords, so I need a better
way to identify domain-specific stopwords. I used TF-IDF to identify
them, but it has the issue I mentioned above.
Regards,
*Akash Jayaweera.*
E akash.jayawe...@gmail.com
M +
Don’t remove stopwords. That was a useful hack when we were running search
engines on 16-bit machines. These days, it causes more problems than it solves.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Jun 22, 2019, at 8:14 PM, akash jayaweera
> w
Hello All,
I'm trying to identify stopwords for a non-English corpus using TF-IDF
scores. I calculated the score for each unique term in the corpus, but my
question is how to select stopwords using the score.
For example, if we have a corpus about football, the term "football" gets the
lowest TF-IDF score
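
A minimal sketch of one way to turn such scores into a candidate list, assuming documents are already tokenized and that a term is a candidate when it appears in more than a chosen fraction of documents (so its IDF is low). The function name `stopword_candidates`, the `max_df_ratio` threshold, and the toy corpus are all hypothetical and not from the thread. It also reproduces the issue described above: the domain term "football" is flagged together with "the", so the output is only a list to review by hand.

```python
import math
from collections import Counter


def stopword_candidates(docs, max_df_ratio=0.8):
    """Return {term: idf} for terms found in more than max_df_ratio of docs.

    docs is a list of token lists. The threshold is an assumption to tune
    per corpus; it is not a recommended value.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document

    candidates = {}
    for term, count in df.items():
        if count / n_docs > max_df_ratio:
            # Plain IDF = log(N / df); low values mean "appears almost everywhere".
            candidates[term] = math.log(n_docs / count)
    return candidates


# Toy corpus (hypothetical) in which a domain term is as frequent as a function word.
docs = [
    "the football match was long".split(),
    "the football club won".split(),
    "the coach left the football club".split(),
    "a striker scored".split(),
]

print(sorted(stopword_candidates(docs, max_df_ratio=0.7)))
```

With this toy data both "the" and "football" pass the 0.7 threshold, which is why a purely score-based cutoff cannot separate function words from domain vocabulary on its own.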
Hello,
Do you think backing up and restoring separate shards of collections with
implicit routing might be useful?
I suppose it might work for certain multitenancy scenarios, where many small
indices are created once but might then go unused for a long time.
--
Sincerely yours
Mikhail Khludnev
FWIW, fixed in 8.2.
Thanks, Colvin!
On Wed, Jun 12, 2019 at 5:30 PM Colvin Cowie
wrote:
> I realize that attachments might not work on the mailing list, so here is
> the test case on Drive
>
> https://drive.google.com/file/d/0B7mypFpwbHptTE5nZE0weURFOExFSHphRFlUV0EyTElaOC0w/view?usp=sharing
>
>
Matheo Software Info wrote:
> My question is very simple ☺ I would like to know if Solr can process
> around 30To of data (Pdf, Text, Word, etc…) ?
Simple answer: yes, assuming 30To means 30 terabytes.
> What is the best way to index this huge data ? several servers ?
> several shards ? other ?