That's interesting. I've been really reluctant to "hard code" n-grams, but 
it's probably the best way to go.
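
For what it's worth, here is a rough sketch of the kind of index John describes below. It is in Python purely for illustration, a plain dict stands in for the key-value store (LevelDB or similar), and the function names, the sample lyrics, and the choice of max_n=3 are all made up for the example. Tokenizing is just lowercasing and splitting on whitespace, so punctuation is not handled.

```python
from collections import defaultdict


def word_ngrams(text, max_n):
    """Yield every word n-gram of length 1..max_n as a space-joined string."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_index(songs, max_n=3):
    """Map each n-gram to the sorted list of song ids that contain it.

    The returned dict is only a stand-in for a real key-value store.
    """
    index = defaultdict(set)
    for song_id, lyrics in songs.items():
        for gram in word_ngrams(lyrics, max_n):
            index[gram].add(song_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


# Hypothetical sample data, invented for this example.
songs = {
    "song-9": "go tell aunt rhodie",
    "song-44": "aunt rhodie the old gray goose is dead",
    "song-12": "pretty woman walking down the street",
}
index = build_index(songs)
print(index.get("aunt rhodie", []))
```

A phrase lookup is then a single key fetch. The cost is that the number of keys grows with max_n, which is the "hard coding" trade-off I was worried about.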

On Monday, March 9, 2015 at 6:12:43 PM UTC-4, John Wiseman wrote:
>
> One thing you can do is index 1-, 2-, 3-, ... n-grams and use a simple, fast 
> key-value store (like LevelDB). For example, you could have entries like
>
> "aunt rhodie" -> song-9, song-44
> "woman" -> song-12, song-65, song-96
>
>
> That's basically how I made the Metafilter N-gram Viewer 
> <http://mefingram.appspot.com/>, a clone of Google Books Ngram Viewer 
> <https://books.google.com/ngrams>.
>
> Another possibility is using Lucene.  Just be aware that Lucene calls 
> n-grams of characters ("au", "un", "nt") n-grams but it calls n-grams of 
> words ("that the", "the old", "old gray") shingles.  So you would end up 
> using (I think, I haven't done this) the ShingleFilter 
> <https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html>
> .
>
> You might also find this article by Russ Cox interesting, where he 
> describes building and using an inverted trigram index: 
> http://swtch.com/~rsc/regexp/regexp4.html
>
>
> John
>
>
> On Sat, Mar 7, 2015 at 3:22 AM, Matching Socks <[email protected]> wrote:
>
>> A lot of guys would use Lucene.  Lucene calls n-grams of words 
>> "shingles". [1]
>>
>> As for "architecture", here is a suggestion to use Lucene to find keys to 
>> records in your "real" database. [2]
>>
>> [1] https://lucidworks.com/blog/whats-a-shingle-in-lucene-parlance/
>>
>> [2] https://groups.google.com/d/msg/datomic/8yrCYxcQq34/GIomGaarX5QJ
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to [email protected]
>> Note that posts from new members are moderated - please be patient with 
>> your first post.
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
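
To keep the terminology from the quoted message straight, here is a small sketch of the two things being distinguished: n-grams of characters versus n-grams of words ("shingles"). This is plain Python with hypothetical helper names. It is not Lucene's API and makes no attempt to reproduce ShingleFilter's options.

```python
def char_ngrams(word, n):
    """Return the character n-grams of a single word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def shingles(text, n):
    """Return the word n-grams ("shingles") of a whitespace-split text."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(char_ngrams("aunt", 2))
print(shingles("that the old gray", 2))
```

The first call gives the "au", "un", "nt" pieces from John's example, and the second gives "that the", "the old", "old gray".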

