Sent from my iPhone
> On 9 Oct 2016, at 11:05 pm, John Hearns <john.hea...@xma.co.uk> wrote:
>
> This sounds interesting.
>
> I would question this step though, as it seems intrinsically a bottleneck:
>
>> Removes duplicate lines:
>> $ sort filename.txt | uniq
>
> First off - do you need a sorted list? If not, do not perform the sort...
>
> Secondly - how efficient is the Unix 'uniq' utility?

The command could be collapsed to 'sort -u'. GNU sort is quite efficient, I think.

> How about, every time you generate a string, check if it has already been
> generated. I guess this then starts to depend on the efficiency of searching
> through a file.

Assuming you're using a binary tree to store those, you're effectively sorting the list anyway, probably with O(N log N) execution time.

> Thinking about this problem, would it be better to allocate each MPI rank a
> unique set of strings to generate?
> Say there are N characters in the set A-Z, a-z, 0-9 plus symbols. This is
> likely to be 255, as I'm assuming you are talking about the ASCII character set.
> So then, if you have 256 MPI ranks, allocate rank 1 to generate all strings
> commencing with 'A', then rank 2 'B', etc.
> If you are lucky enough to have 256x256 cores to allocate, then it goes 'AA',
> 'AB', 'AC', etc.

That's no longer random, though, is it? You're fixing the frequency of the first letter absolutely. The more you do that, the less random your strings will become.

Tim

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
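[Editor's note: the dedup trade-off discussed in the thread - 'sort -u' sorts first at O(N log N), while a hash set deduplicates without sorting in average O(N) - can be sketched as below. This is an illustrative sketch, not code from the thread; the function names and sample data are invented.]

```python
def dedup_sorted(lines):
    """Equivalent of `sort filename.txt | uniq` (or `sort -u`):
    sort first, then drop adjacent duplicates. O(N log N)."""
    result = []
    for line in sorted(lines):
        if not result or result[-1] != line:
            result.append(line)
    return result


def dedup_hash(lines):
    """Deduplicate without sorting, using a hash set.
    Average O(N); output keeps first-seen order."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


lines = ["foo", "bar", "foo", "baz", "bar"]
print(dedup_sorted(lines))  # ['bar', 'baz', 'foo']
print(dedup_hash(lines))    # ['foo', 'bar', 'baz']
```

Note that the hash-set version answers John's first question directly: if a sorted result is not needed, the sort can be skipped entirely.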
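[Editor's note: the rank-partitioning scheme proposed above - and Tim's objection to it - can be made concrete with a small sketch. This is hypothetical illustration code, not from the thread; the alphabet, string length, and seed are made up, and a single-process loop stands in for the MPI ranks.]

```python
import random

# Invented alphabet for illustration; the thread assumes the full
# printable ASCII set (~255 characters).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def generate_for_rank(rank, count, length, rng):
    """Rank `rank` emits only strings beginning with ALPHABET[rank];
    the remaining characters are drawn at random."""
    first = ALPHABET[rank]
    return [first + "".join(rng.choice(ALPHABET) for _ in range(length - 1))
            for _ in range(count)]


rng = random.Random(42)
strings = generate_for_rank(1, 5, 4, rng)

# Every string from rank 1 starts with 'B': the first character is
# fully determined by the rank, which is exactly the loss of
# randomness Tim points out.
assert all(s.startswith("B") for s in strings)
```

The partitioning does guarantee ranks never generate duplicates of each other's strings, but only by making the first character deterministic per rank.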