Maybe in the 32-bit compile, a value is stored in a 64-bit register, and
when it gets "robbed" (to populate the missing value for an adjacent
variable) only 32 bits of backfill are taken, so the remaining value is
still good; but in a 64-bit compile, all 64 bits are taken, so the
remainder is rubbish. It would depend on both the compiler and the
hardware, and the takeaway is to not do that :-)

Peter
On 8/6/08, Gus Correa <[EMAIL PROTECTED]> wrote:
>
> Hi Ricardo, David, Mark, and list
>
> If, as Ricardo says, he suppressed the 5th parameter ("use_work") on the
> call to rfftwnd_f77_mpi, which has 6 parameters, wouldn't it start
> mismatching pointers on the 5th parameter, instead of on the 2nd
> parameter ("n_fields")?
> I.e. "use_work" would take the value of "FFTW_NORMAL_ORDER",
> and "FFTW_NORMAL_ORDER" would get a random value (OS permitting),
> but the initial 4 parameters would be correct, right?
> In any case, there is little difference between this and what David said:
> the point of failure is different, the nature is the same.
>
> However, it is interesting that somehow at runtime the program segfaults
> in 64 bits, but doesn't fail in 32 bits, although it most likely computes
> wrong stuff.
> Ricardo, have you ever QC'd the 32-bit output before you fixed/inserted
> "use_work"?
> If you were on a big lucky streak, the random value left at the
> FFTW_NORMAL_ORDER address matched your needs, and the result may be
> correct! :)
>
> Anyway, somehow the program seems to behave differently,
> with the OS superego being more compliant (in a nasty sense) in 32 bits
> than it is in 64 bits.
> Does the OS paradoxically give less memory room for the stack in 64 bits,
> leading to the segfault?
> Or does it give the same room, but because the pointers are bigger the
> segfault is more likely?
> Or does the segfault happen somewhere else, not on the stack?
> Where?
> Why in 64 bits?
> Why not in 32 bits?
>
> Yes, as David noted about programming, here I also got and continue to get
> these bugs, particularly in Fortran programs where no parameter checking
> is enforced.
> And the nastier ones are those that don't segfault,
> then come back to haunt you when somebody looks at the output,
> if you are not careful enough to look at it before anybody else does.
>
> Cheers,
> Gus Correa
>
> Compiling is precise,
> running is imprecise!
>
> ... one more from your alter-ego P'ssoa ... :)
>
> --
> ---------------------------------------------------------------------
> Gustavo J. Ponce Correa, PhD - Email: [EMAIL PROTECTED]
> Lamont-Doherty Earth Observatory - Columbia University
> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> Lombard, David N wrote:
>
>> On Tue, Aug 05, 2008 at 02:57:42AM -0700, Ricardo Reis wrote:
>>
>>> On Mon, 4 Aug 2008, Mark Kosmowski wrote:
>>>
>>>> So, why did the 32-bit test case work? Shouldn't the same problem
>>>> crash both systems if it is a code issue?
>>
>> Not necessarily given the error described below.
>>
>>> I asked the same question myself... The function interface is:
>>>
>>>   call rfftwnd_f77_mpi(plan_c2r, &
>>>        1, local_data, work, use_work, FFTW_NORMAL_ORDER)
>>>
>>> where use_work is an integer, value 1 if you use the work temporary
>>> array, 0 otherwise. This was the variable I wasn't passing.
>>
>> ...
>>
>>> The wrapper function for this is (from rfftw_f77_mpi.c):
>>>
>>>   void F77_FUNC_(rfftwnd_f77_mpi,RFFTWND_F77_MPI)
>>>   (rfftwnd_mpi_plan *p, int *n_fields, fftw_real *local_data,
>>>    fftw_real *work, int *use_work, int *ioutput_order)
>>>
>>> .... So it must be a pointer issue revealed by the 64 bit, no? When I
>>> wasn't doing it "properly" the value of *ioutput_order wasn't set.
>>
>> The value of the first element of local_data was used for the n_fields
>> scalar.
>>
>> The work array was being laid down starting at the location of the
>> use_work scalar.
>>
>> The FFTW_NORMAL_ORDER value was being interpreted as use_work scalar.
>>
>> Finally, ioutput_order scalar was some random value.
>>
>> So, a lot was going wrong there. It's just one of life's little, um,
>> pleasures that it looked like it was working for your 32-bit test case.
>> Don't worry, you'll likely do this again, as likely *every* one of us
>> on this list has, too.
>>
>> BTW, Fortran passes by reference; that's why all args are pointers.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf