[Tutor] regarding minhash and lsh

2019-02-11 Thread lokesh kumar
Hi There,
I want to write code that compares a few DNA sequences so that I can find
similarities between them. The files number in the millions and the
sequences are large, so I tried developing a program, but it fails. I think
minhash and lsh may be able to solve my problem.
I need a program that will be easy to handle.

from scipy.spatial.distance import cosine
from random import randint
import numpy as np
N = 128
max_val = (2**32)-1

perms = [ (randint(0,max_val), randint(0,max_val)) for i in range(N)]
vec = [float('inf') for i in range(N)]

def minhash(s, prime=4294967311):
  '''
  Given a set `s`, pass each member of the set through all permutation
  functions, and set the `ith` position of `vec` to the `ith` permutation
  function's output if that output is smaller than `vec[i]`.
  '''

  vec = [float('inf') for i in range(N)]

  for val in s:
    if not isinstance(val, int): val = hash(val)

    for perm_idx, perm_vals in enumerate(perms):
      a, b = perm_vals

      output = (a * val + b) % prime
      if vec[perm_idx] > output:
        vec[perm_idx] = output

  return vec
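
The code above covers only the minhash half of the idea. For the lsh half,
one common approach is banding: split each signature into bands and treat
two items as candidates for comparison if any band matches exactly. What
follows is a minimal sketch under assumptions, not a tested pipeline for
DNA data. The function name `lsh_candidate_pairs`, the `bands` parameter
and the toy signatures are made up for illustration and are not from the
post.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands):
    """Return pairs of names whose signatures agree on at least one band.

    `signatures` maps a name to an equal-length list of integers
    (for example, the output of a minhash function).
    """
    sig_len = len(next(iter(signatures.values())))
    if sig_len % bands != 0:
        raise ValueError("signature length must be divisible by bands")
    rows = sig_len // bands

    candidates = set()
    for band in range(bands):
        start = band * rows
        buckets = defaultdict(list)
        # Items whose slice for this band is identical share a bucket.
        for name, sig in signatures.items():
            buckets[tuple(sig[start:start + rows])].append(name)
        for names in buckets.values():
            for pair in combinations(sorted(names), 2):
                candidates.add(pair)
    return candidates


# Toy signatures (hypothetical values, far shorter than a real signature).
toy = {
    "seq1": [1, 2, 3, 4],
    "seq2": [1, 2, 9, 9],
    "seq3": [7, 7, 7, 7],
}
print(lsh_candidate_pairs(toy, bands=2))
```

Only the candidate pairs then need a full comparison, which is the point
when the number of files runs into the millions.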
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] regarding minhash and lsh

2019-02-11 Thread Alan Gauld via Tutor
On 11/02/2019 09:13, lokesh kumar wrote:

> I want to write code that compares a few DNA sequences so that I can find
> similarities between them. The files number in the millions and the
> sequences are large, so I tried developing a program, but it fails. I think
> minhash and lsh may be able to solve my problem.

Bear in mind that this is a general programming forum and
relatively few of us are from a scientific background and
even fewer work with DNA sequences. So I'm glad you have
an idea about your solution, but I have no idea what minhash
or lsh are, let alone whether they will help you.

Also you may find more people who understand your
work area on the scipy forums.

> I need a program that will be easy to handle.

What does that mean? Easy to operate? Easy to maintain?
Easy to distribute to others? All of the above?
Or something else?

As for your code, I'm not sure what that is for?
Do you want us to critique it?
Or is there a problem?
If so you will need to describe the issue and include
any error messages.

At the moment it just defines a function which is
never called...
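
For what it's worth, here is one way the function could be exercised,
offered as a rough sketch only. It assumes the sequences are plain strings
and are broken into overlapping k-mers first. The helper names `kmers` and
`estimated_jaccard`, the choice of k=5, the fixed seed and the two example
sequences are all assumptions, not part of the posted code. It also swaps
`zlib.crc32` in for `hash()`, because Python 3 randomizes string hashes per
process, so signatures built with `hash()` would not be comparable between
runs.

```python
import random
import zlib

N = 128
MAX_VAL = 2**32 - 1
MODULUS = 4294967311  # same value as the `prime` default in the posted code

# Fixed seed so the same (a, b) pairs are used for every sequence and run.
rng = random.Random(42)
PERMS = [(rng.randint(1, MAX_VAL), rng.randint(0, MAX_VAL)) for _ in range(N)]


def kmers(seq, k=5):
    """Return the set of overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash(items):
    """Return a list of N minimum values, one per (a, b) pair."""
    sig = [float("inf")] * N
    for item in items:
        val = zlib.crc32(item.encode("ascii"))
        for idx, (a, b) in enumerate(PERMS):
            output = (a * val + b) % MODULUS
            if output < sig[idx]:
                sig[idx] = output
    return sig


def estimated_jaccard(sig1, sig2):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for x, y in zip(sig1, sig2) if x == y)
    return matches / float(len(sig1))


# Two made-up sequences that differ only in the last base.
seq_a = "ACGTACGTTGCAACGTTAGC"
seq_b = "ACGTACGTTGCAACGTTAGG"
sig_a = minhash(kmers(seq_a))
sig_b = minhash(kmers(seq_b))
print("estimated Jaccard similarity:", estimated_jaccard(sig_a, sig_b))
```

The estimate approximates the Jaccard similarity of the two k-mer sets, and
a larger N should give a steadier estimate at the cost of more work per
sequence.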

> from scipy.spatial.distance import cosine
> from random import randint
> import numpy as np
> N = 128
> max_val = (2**32)-1
> 
> perms = [ (randint(0,max_val), randint(0,max_val)) for i in range(N)]
> vec = [float('inf') for i in range(N)]
> 
> def minhash(s, prime=4294967311):
>   '''
>   Given a set `s`, pass each member of the set through all permutation
>   functions, and set the `ith` position of `vec` to the `ith` permutation
>   function's output if that output is smaller than `vec[i]`.
>   '''
>   vec = [float('inf') for i in range(N)]
> 
>   for val in s:
> if not isinstance(val, int): val = hash(val)
> 
> for perm_idx, perm_vals in enumerate(perms):
>   a, b = perm_vals
>   output = (a * val + b) % prime
>if vec[perm_idx] > output:
> vec[perm_idx] = output
> 
> return vec

Notice that the last if statement appears to
be incorrectly indented. But that may just be an
email glitch...

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] regarding minhash and lsh

2019-02-11 Thread Alan Gauld via Tutor
On 11/02/2019 10:10, Alan Gauld via Tutor wrote:

>> def minhash(s, prime=4294967311):
>>   vec = [float('inf') for i in range(N)]
>>
>>   for val in s:
>> if not isinstance(val, int): val = hash(val)
>>
>> for perm_idx, perm_vals in enumerate(perms):
>>   a, b = perm_vals
>>   output = (a * val + b) % prime
>>if vec[perm_idx] > output:
>> vec[perm_idx] = output
>>
>> return vec
> 
> Notice that the last if statement appears to
> be incorrectly indented. But that may just be an
> email glitch...

I just noticed that the return statement appears
to be inside the outer for loop so that loop
will only run once.
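
The effect of that indentation is easier to see in a smaller example. This
is only an illustration with made-up function names, not the posted code:
the two functions differ solely in how far `return` is indented.

```python
def smallest_return_in_loop(values):
    best = float("inf")
    for v in values:
        if v < best:
            best = v
        return best  # part of the loop body: leaves after the first item


def smallest_return_after_loop(values):
    best = float("inf")
    for v in values:
        if v < best:
            best = v
    return best  # runs once, after every item has been seen


data = [5, 3, 1]
print(smallest_return_in_loop(data), smallest_return_after_loop(data))
```

The first version never looks past the first element, which is what would
happen to every set passed to minhash if the return really is inside the
outer loop.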



Re: [Tutor] Putting a Bow on It

2019-02-11 Thread Chip Wachob
Thanks.  These are both great helps to get me started.

The little bit of searching does leave me a little bit confused, but the
reference to the book is somewhat helpful / encouraging.

I see a lot of people saying that certain approaches have been depreciated,
then re-appreciated (?) then depreciated once more and so on..  that sure
makes it confusing to me.  Unfortunately since I'm using someone's pre-made
libraries, and that requires 2.7, I'm sort of locked at that version, but
it seems like most, if not all, of these options will work for any version
of Python.

These posts give me some keywords that should help me narrow the field a
bit.

I realize that choosing a tool is always a case of personal preference.  I
don't want to start a 'this is better than that' debate.

If the 'pros' out there have more input, I'm all ears.

Best,

On Sun, Feb 10, 2019 at 7:11 AM Albert-Jan Roskam wrote:

>
>
> On 8 Feb 2019 19:18, Chip Wachob  wrote:
>
> Hello,
>
> I've been off working on other projects, but I'm finally back to the
> project that so many here have helped me work through.  Thank you to the
> group at large.
>
> So, this leads me to my question for today.
>
> I'm not sure what the "correct" term is for this, but I want to create
> what I'll call a Package.
>
> I want to bundle all my scripts, extra libraries, etc into one file.  If I
> can include a copy of Python with it that would be even better.
>
>
>
> ==》 Hi, also check out pyscaffold or the similar cookiecutter to generate
> "package skeletons". Directory structure, default setup.py, Readme.md file
> template, etc etc. I've never used them, but py2exe or cx_freeze might also
> interest you.
>
>


Re: [Tutor] Putting a Bow on It

2019-02-11 Thread Mats Wichmann
On 2/11/19 6:48 AM, Chip Wachob wrote:
> Thanks.  These are both great helps to get me started.
> 
> The little bit of searching does leave me a little bit confused, but the
> reference to the book is somewhat helpful / encouraging.
> 
> I see a lot of people saying that certain approaches have been depreciated,
> then re-appreciated (?) then depreciated once more and so on..  that sure
> makes it confusing to me.  Unfortunately since I'm using someone's pre-made
> libraries, and that requires 2.7, I'm sort of locked at that version, but
> it seems like most, if not all, of these options will work for any version
> of Python.
> 
> These posts give me some keywords that should help me narrow the field a
> bit.
> 
> I realize that choosing a tool is always a case of personal preference.  I
> don't want to start a 'this is better than that' debate.
> 
> If the 'pros' out there have more input, I'm all ears.

I'm having the same problems: everybody seems to have an idea of what is
state of the art, and they don't often agree. And sadly, people do not
always date their blog entries, which would let you eliminate what is too
old to be useful to a "newbie" (a category I fall into with packaging).

There are really two classes of solution:

base tools for manual packaging.  The Python Packaging Authority is
supposed to be definitive for what the state of these is.

smart systems which automate some or all of the steps.  These are often
labeled with some sort of hype label: "Python packaging finally done
right" or some such.  (I've tried a couple and they have failed utterly
for the project I want to redo the packaging on. My impression is that
these will usually fail if your project is not meant to be imported.)




Re: [Tutor] Putting a Bow on It

2019-02-11 Thread Alan Gauld via Tutor
On 11/02/2019 13:48, Chip Wachob wrote:

> I realize that choosing a tool is always a case of personal preference.  I
> don't want to start a 'this is better than that' debate.
> 
> If the 'pros' out there have more input, I'm all ears.

To be fair this is not just a Python problem but applies
to almost all languages. Java and Smalltalk probably come
closest to having a solution that genuinely works
across multiple platforms.

There is a lot of work going on in Python land but no universal
solution. Some things work better on particular platforms.
And building library packages is easier than complete applications.

But for now, if you have more than a pure Python application,
you are pretty much stuck with building multiple solutions with
multiple tools.

I think ;-)



Re: [Tutor] Putting a Bow on It

2019-02-11 Thread Chip Wachob
Mats,

You put just the right words to my difficulties.  Thank you.

Since I last posted, I attempted to use Setuptools, and got a handful of
files that were less than 1kB.  I also attempted to use py2exe (I know this
is only for Windoze, but I wanted to find some sliver of success) and
py2exe does not like the fact that I have Python 2.7.15 installed (which I
am locked to).  I tried using pip to install py2exe==0.6.9 (a version that
says it supports Python 2.7) but pip is telling me that it can't find any
version of Python 2.7.

I'm trying to make the installation of the script / executable as simple as
possible because I know those who will be using it will NOT be Python savvy
in the remotest way.

Thanks for confirming that I'm not simply going mad...

Best,


On Mon, Feb 11, 2019 at 11:30 AM Mats Wichmann  wrote:

> On 2/11/19 6:48 AM, Chip Wachob wrote:
> > Thanks.  These are both great helps to get me started.
> >
> > The little bit of searching does leave me a little bit confused, but the
> > reference to the book is somewhat helpful / encouraging.
> >
> > I see a lot of people saying that certain approaches have been depreciated,
> > then re-appreciated (?) then depreciated once more and so on..  that sure
> > makes it confusing to me.  Unfortunately since I'm using someone's pre-made
> > libraries, and that requires 2.7, I'm sort of locked at that version, but
> > it seems like most, if not all, of these options will work for any version
> > of Python.
> >
> > These posts give me some keywords that should help me narrow the field a
> > bit.
> >
> > I realize that choosing a tool is always a case of personal preference.  I
> > don't want to start a 'this is better than that' debate.
> >
> > If the 'pros' out there have more input, I'm all ears.
>
> I'm having the same problems, everybody seems to have an idea of what is
> state of the art, and they don't often agree. And sadly, people do not
> always date their blog entries so you can eliminate what is too old to
> be useful to a "newbie" (a category I fall into with packaging)
>
> There are really two classes of solution:
>
> base tools for manually packaging.  The Python Packaging Authority is
> supposed to be definitive for what the state of these is.
>
> smart systems which automate some or all of the steps.  These are often
> labeled with some sort of hypelabel - Python packaging finally done
> right or some such.  (I've tried a couple and they have failed utterly
> for the project I want to redo the packaging on. My impression is these
> will usually fail if your project is not meant to be imported)