Thanks Doug. It's funny that you should mention that. It is very hard to convince people that just because words are somehow related, we really don't know *how* they are related. This is especially true when they are handed the results of a shallow neural net that took a research team a few weeks to put together.
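For example, a quick sanity check along these lines usually makes the point. This is just a sketch, assuming a gensim word2vec model trained on our corpus (the model path is hypothetical and the scores vary by corpus): antonyms routinely land in the top neighbours because they share contexts.

from gensim.models import Word2Vec

# Hypothetical path to a model trained on our corpus.
model = Word2Vec.load("our_corpus.w2v")

# "Similar" here means "appears in similar contexts", so antonyms and
# functionally related words often rank near the top of this list.
print(model.wv.most_similar("good", topn=5))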
I am always happy to have the reminder about common and rare words. Honestly, I am not that happy with the size of our corpus, but it might be just enough. Alternatively, we could weight the embedding results really low when the search engine ranks documents from most to least relevant (something like the re-rank sketch at the bottom of this message). Oh, given that a lack of text is a problem, is there an issue with doing this on Twitter data? I assume that running vector relationships over Twitter data is probably not going to do much.

Thank you so much for the feedback.

~Ben

On Tue, Oct 30, 2018 at 5:59 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> You may already know this, but just be very careful. Embeddings are
> useful, but people often think of them as detecting synonyms when really
> they just encode contexts. For example, antonyms and words with similar
> functions often come out as similar.
>
> There are also issues with terms that occur sparsely (you don't get
> enough contexts to build a good embedding) and with terms that occur very
> commonly (they tend to clump together despite different meanings).
>
> This covers an older form of embedding, but the lessons still apply:
>
> https://opensourceconnections.com/blog/2016/03/29/semantic-search-with-latent-semantic-analysis/
>
> I'd also recommend my talk at Activate, which spends a ton of time on
> building/customizing embeddings for your use case:
>
> https://docs.google.com/presentation/d/1-nPKX5VYUR7uue5IL0tm7M2YH0agb0aRO1y9sMKl1Hs/edit#slide=id.g3abdd68a3e_0_192
>
> -Doug
>
> On Tue, Oct 30, 2018 at 5:37 PM Benedict Holland <
> benedict.m.holl...@gmail.com> wrote:
>
> > Oh very cool. I will have to look into this more. This is something up
> > and coming, I take it?
> >
> > Thanks,
> > ~Ben
> >
> > On Tue, Oct 30, 2018 at 4:36 PM Alexandre Rafalovitch <
> > arafa...@gmail.com> wrote:
> >
> > > Simon Hughes's presentation at the just-finished Activate may be
> > > relevant:
> > >
> > > https://www.slideshare.net/SimonHughes13/vectors-in-search-towards-more-semantic-matching
> > >
> > > The video will be available in a couple of weeks, I am guessing on
> > > the LucidWorks channel.
> > >
> > > Related repos:
> > > *) https://github.com/DiceTechJobs/VectorsInSearch
> > > *) https://github.com/DiceTechJobs/ConceptualSearch (older)
> > > *) https://github.com/kojisekig/word2vec-lucene (something else quite old)
> > >
> > > These are just keyword matches on your question. I am sure others may
> > > have some more real details.
> > >
> > > Regards,
> > >    Alex.
> > >
> > > On Tue, 30 Oct 2018 at 16:09, Benedict Holland
> > > <benedict.m.holl...@gmail.com> wrote:
> > > >
> > > > Hello all,
> > > >
> > > > We came up with a fascinating question. We actually have word2vec,
> > > > doc2vec, and GloVe results for our corpora. Is it possible to use
> > > > these datasets within the search engine? If so, could you please
> > > > point me to documentation on how to get Solr to use them?
> > > >
> > > > Thank you so much,
> > > > ~Ben
>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
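(The sketch referenced above.) A minimal example of mixing embedding evidence in at a low weight, assuming Solr's stock ReRankQParserPlugin and our gensim model; the core name, field name, query term, and expand_terms helper are all hypothetical:

import requests
from gensim.models import Word2Vec

model = Word2Vec.load("our_corpus.w2v")  # hypothetical model path

def expand_terms(term, topn=3):
    # Nearest embedding neighbours, used only as expansion terms.
    return [w for w, _ in model.wv.most_similar(term, topn=topn)]

user_term = "budget"  # hypothetical query term
params = {
    "q": "text:" + user_term,
    # Re-score only the top 100 lexical hits, mixing the embedding-derived
    # query in at a deliberately low weight so lexical relevance dominates.
    "rq": "{!rerank reRankQuery=$rrq reRankDocs=100 reRankWeight=0.2}",
    "rrq": "text:(" + " OR ".join(expand_terms(user_term)) + ")",
}
print(requests.get("http://localhost:8983/solr/corpus/select", params=params).json())

The reRankWeight knob is the "weight it really low" idea: the embedding evidence can nudge the lexical ordering but not overturn it.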