This sounds interesting.

I would question this step though, as it seems intrinically a bottlneck:
> Removes duplicate lines:
> $ sort filename.txt | uniq

First off - do you need a sorted list?  If not do not perform the sort...

Secondly - how efficient is the Unix 'uniq' utility?
How about, every time you generate a string, check if it has already been 
generated.  I guess this then starts to depend on the efficiency of searching 
through a file.

Thinking about this problem, would it be better to allocate each MPI rank a 
unique set of string to generate?
Say there are N characters in the set A-Z a-z 0-9 plus symbols . This is likely 
to be 255 as I'm assuming you are talking the ASCII character set.
So then if you have 256  MPI ranks then allocate rank 1 to generate all strings 
commencing with 'A'  then rank 2 'B' etc.
If you are lucky enough to have 256x256 cores to allocate then it goes 'AA' 
'AB' 'AC'  etc. etc.

Which further makes me think - why generate random strings in the first place? 
If you want a sorted list, then set the MPI ranks off to generate sequential 
strings,
and as I say make sure there is no duplicated effort somehow by allocating 
unique subsections to each rank.
All you have to do is gather in the already sorted lists at the end, starting 
at AA , AB etc etc
This is quite thought provoking - maybe the random approach IS more efficient 
???

Thinking further though, if you have processors with AVX externsions then is 
there an efficient way to bring those to bear on the problem - 256 and 512 
vector lengths mapping rather well to ASCII 255 characters.

I know this is an exercise in learning MPI, but I also idly winder if the 
Micron Automata could do this rather well
http://www.micronautomata.com/




________________________________________
From: Beowulf [beowulf-boun...@beowulf.org] on behalf of Skylar Thompson 
[skylar.thomp...@gmail.com]
Sent: 08 October 2016 15:20
To: dar...@wisecorp.co.uk
Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] Generation of strings MPI fashion..

If you haven't used MPI before, I would break this up into chunks:

1. Write a MPI program that gathers a fixed amount of /dev/urandom
(Brian's suggestion is wise) data and sends it back to the master. This
will get you used to the basic MPI_Send/MPI_Recv commands.

2. Use the same program, but use MPI_Gather rather than MPI_Recv to
assemble the final array. That will get you used to collective
communication.

3. Find a parallel sorting algorithm and implement it. You'll still need
to have rank 0 do some work, but the point of the algorithm would be to
reduce the amount of work it has to do. I found a good description of
parallel merge sort here:

http://penguin.ewu.edu/~trolfe/ParallelMerge/ParallelMerge.html

Good luck!

Skylar

On 10/07/2016 10:19 AM, Darren Wise wrote:
> Heya folks,
>
> This may seem really simple to some and it is fairly simple from the
> terminal using bash of generation and sorting (examples below)
>
> What I would like to do, to get started with making a program to run in
> MPI on my cluster.. I thought of a fairly simple bash script and this
> works fine for a single machine but what is the best way to go around it
> or converting this simple notion in to an MPI runable command.
>
> Generates random strings of chars:
> $ tr -cd '[upper:]' < /dev/random | fold -w9 | head -c${1:-1000} | tee
> somefile.txt
>
> Removes duplicate lines:
> $ sort filename.txt | uniq
>
> This is fine for generation as I mention for a single machine, but
> what's the best way to turn this around and use MPI with a shared NFS
> mounted filesystem..
>
> While this is an example im going to generate a bash script to allow me
> to generate every possible permutation of a desired string length along
> with types of chars a-z, A-Z, 0-9 along with symbols..
>
> So any advice is welcome really and it's just an educational way for me
> to transpire into generating code that can be used within my small beowulf.
>
>> Kind regards,
>> Darren Wise Esq.
>>
>> www.wisecorp.co.uk <http://www.wisecorp.co.uk>
>> www.wisecorp.co.uk/babywise <http://www.wisecorp.co.uk/babywise>
>> www.darrenwise.co.uk <http://www.darrenwise.co.uk>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
Any views or opinions presented in this email are solely those of the author 
and do not necessarily represent those of the company. Employees of XMA Ltd are 
expressly required not to make defamatory statements and not to infringe or 
authorise any infringement of copyright or any other legal right by email 
communications. Any such communication is contrary to company policy and 
outside the scope of the employment of the individual concerned. The company 
will not accept any liability in respect of such communication, and the 
employee responsible will be personally liable for any damages or other 
liability arising. XMA Limited is registered in England and Wales (registered 
no. 2051703). Registered Office: Wilford Industrial Estate, Ruddington Lane, 
Wilford, Nottingham, NG11 7EP
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

Reply via email to