Even if the histogram is not really a signal, it seems that spec.pgram can
give an interesting evaluation of how the genes are spaced along the
chromosome, as in the article.

So now, when I study a chromosome with 200 interesting genes, I would like
to compare the amplitude of the periodogram given by spec.pgram (applied to
the histogram of the distances between genes) with the periodogram for
another chromosome with 350 interesting genes. The spectrum seems to be
calculated approximately as |fft(x)|^2/N (according to the computation of
pgram[] in spec.pgram, but perhaps I am wrong), where x is the signal and N
is the number of observations in the signal. I therefore suppose that I can
compare two periodograms (and the values of their peaks) by dividing each
spectrum by its N.

I tried this technique with a cosine signal like this:

> x = 1:1000
> cox = cos(x)
> spec.pgram(cox, log="no", taper=0.5)
> x = 1:100
> cox = cos(x)
> spec.pgram(cox, log="no", taper=0.5)

and the peaks of the two periodograms have the same amplitude once each is
divided by the corresponding N. Am I wrong to use this technique to compare
my periodograms?
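A minimal sketch of the check described above, under stated assumptions: the test frequency of 0.1 cycles per sample is chosen (it is not in the original message) so that the peak falls exactly on a Fourier frequency for both lengths, and tapering and detrending are switched off so that spec.pgram reduces to the raw |fft(x)|^2/N. The helper name peak_per_n is hypothetical.

```r
# Compare periodogram peaks of the same cosine sampled at two lengths.
# Assumption: f = 0.1 cycles per sample lies on the frequency grid of both
# N = 100 and N = 1000, so there is no leakage.
peak_per_n <- function(n, f = 0.1) {
  x <- cos(2 * pi * f * seq_len(n))
  # taper = 0 and detrend = FALSE keep the estimate equal to |fft(x)|^2 / n
  sp <- spec.pgram(x, taper = 0, detrend = FALSE, fast = FALSE, plot = FALSE)
  manual <- Mod(fft(x))^2 / n              # raw periodogram, frequency 0 first
  c(peak        = max(sp$spec),
    peak_over_n = max(sp$spec) / n,
    manual_peak = max(manual[2:(n %/% 2 + 1)]))
}

res <- sapply(c(n100 = 100, n1000 = 1000), peak_per_n)
print(res)
```

In this idealised setting the peak itself grows in proportion to N, while peak/N is the same for both lengths (one quarter of the squared amplitude). With a taper, or with a frequency that falls between grid points such as cos(x), leakage means the agreement should only be expected to be approximate.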
Thanks in advance.
Best regards,

Anthony

On Mon, Jun 16, 2008 at 8:37 PM, stephen sefick <[EMAIL PROTECTED]> wrote:

> OK, so this is what I think. The Gaussian smoothing window is just like the
> smooth curve that I suggested you draw on top of the histogram (try looking
> at ?density and at the R site search page). They take this curve and then
> normalize it so that the total density equals 1. I am not sure how this is
> done, but I am sure that you could figure it out. I believe that what they
> are doing is still taking the area under the curve at discrete
> "periodicities" (I would not call these periodicities, because there is not
> really a periodic part in this signal; there is recurrence, but not really
> periodicity), and that is what the "power spectrum" is revealing. This is
> not a typical use of the Fourier transform, but it may be valid. This signal
> is not stationary, so I would suggest using wavelet analysis, but it still
> does not seem to be a classical signal analysis problem. I would look at
> Price et al. in the reference section to see whether there is precedent for
> this type of analysis. From my experience with time-domain to
> frequency-domain problems, this does not fit the model of data that I have
> worked with, so it may be a bias on my part, but I would use the histogram
> as my justification for the distances being significant.
> Good luck
>
> Stephen
>
>
> On Mon, Jun 16, 2008 at 2:04 PM, stephen sefick <[EMAIL PROTECTED]> wrote:
>
>> I am reading the paper and trying to figure out what they are doing. At
>> this point it looks like they are using the value at the top of each
>> histogram bar as the value at that distance on the x-axis. They then use
>> the equivalent of spec.pgram. My nearest approximation of what this does
>> is that the analysis integrates the area under the curve at a particular
>> time. It doesn't seem that there is any periodicity in this data, because
>> there isn't a real signal here: it is binned by distance between the
>> genes. That is not to say that the spectral density is not valid, but it
>> is not a periodicity that this analysis is looking at, rather an amount
>> of power (area under the curve). I am not entirely sure that this is any
>> more information than what is contained in the histograms. Take a pen and
>> draw from the top of each box, starting on the left: this is the "signal"
>> that is being analyzed. I need to read the rest of the paper and think
>> about it a little bit more. If you have any ideas, pass them along.
>>
>> Stephen
>>
>>
>> On Mon, Jun 16, 2008 at 12:14 PM, Anthony Mathelier
>> <[EMAIL PROTECTED]> wrote:
>>
>>> OK, it seems that I have not succeeded in expressing what I do, or want
>>> to do. So here is the example that brought me to this kind of analysis.
>>> I read the paper "Chromosomal periodicity of evolutionary conserved gene
>>> pairs" (which you can download at
>>> http://www.pnas.org/cgi/reprint/104/25/10559). In figure 2, they have a
>>> histogram of distances between genes on a chromosome, and they perform a
>>> discrete Fourier transform analysis to exhibit a period of 117 kb. They
>>> explain how they did it in the first paragraph of "Distributions of
>>> distances and positions and fourier transform" (last page). I thought
>>> that this kind of analysis was done by applying spec.pgram to a
>>> histogram. But perhaps I am wrong, because I really do not understand
>>> what they mean by "the histogram was transformed into a continuous
>>> probability density by using a Gaussian smoothing window and normalizing
>>> the total density over the entire genome to 1. A discrete Fourier
>>> transform of the data were computed from 0 to 1,000kb by using a Tukey
>>> window to taper the end (ratio of 0.5 for tapered to untapered length)."
>>> I hope this explains better what I want to obtain from my distances.
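One possible reading of the quoted procedure can be sketched in R as follows. This is not the paper's code: the toy gene positions, the 1 kb grid, the bandwidth of 10 kb and the choice taper = 0.25 (25% tapered at each end, one interpretation of the "ratio of 0.5") are all assumptions made only for illustration.

```r
# Toy positions (assumption): one gene at every multiple of 117 kb
period <- 117
pos <- period * (0:80)

d <- as.vector(dist(pos))        # all pairwise distances, in kb
d <- d[d <= 1000]                # keep the 0 to 1,000 kb range from the quote

# Gaussian smoothing: kernel density evaluated on a regular 1 kb grid
dens <- density(d, kernel = "gaussian", bw = 10, from = 0, to = 1000, n = 1001)
y <- dens$y / sum(dens$y)        # normalise so that the total density is 1

# Periodogram of the smoothed density with a split cosine bell taper
sp <- spec.pgram(y, taper = 0.25, plot = FALSE)

# Frequencies are in cycles per kb because the grid step is 1 kb
peak_period <- 1 / sp$freq[which.max(sp$spec)]
peak_period
```

Because the curve covers only about eight periods, the frequency grid is coarse, so peak_period should land near 117 kb rather than exactly on it. Real data would replace pos, and the bandwidth would need to be chosen with the expected period in mind.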
>>> Best regards,
>>>
>>> Anthony
>>>
>>>
>>> On Mon, Jun 16, 2008 at 5:25 PM, stephen sefick <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Getting some sort of frequency, which in your case seems to be cycles
>>>> per distance, is a valid use of a Fourier transform as long as the
>>>> distance is measured in a way that is analogous to a time series, in
>>>> other words if the distance proceeds from an origin in one direction.
>>>> Geophysicists do this often with the realization of an earthquake
>>>> picked up by sensors that are some distance away from the epicenter,
>>>> but they are looking for coherencies in the signal from one place to
>>>> the next in the frequency domain, separated by distance; this is called
>>>> beam forming. They use the raw signal. By binning the data (making a
>>>> histogram) you are losing the signal: you are looking at the frequency
>>>> of occurrence of certain values, not at the underlying periodicities of
>>>> the data (in time or space). You are fitting cos and i*sin functions to
>>>> your data to see if there is periodicity; the power is the integration
>>>> of the convolution of these sine and cosine functions with your data.
>>>> It seems meaningless to me to perform this convolution against
>>>> something that is not a signal (the histogram). If you want to use a
>>>> frequency domain technique you have to have a frequency to investigate,
>>>> and a histogram does not have this. It is a frequency of occurrence by
>>>> bin size, which is NOT what you want to do this analysis on (you would
>>>> have cycles/binlength, which doesn't make any sense to me). You want a
>>>> signal (dissolved oxygen curve, sunspot record, etc.) through time, or
>>>> distance as stated above; you are looking for the frequency of a
>>>> waveform. Anyway, I may be misunderstanding. Supply some code and
>>>> explain the data; otherwise this line of thought, in my limited
>>>> expertise, is a dead end. But again, I still don't know exactly what it
>>>> is that you are trying to do, or what your dataset consists of. I hope
>>>> these ruminations help.
>>>>
>>>> I recommend doing this analysis on the raw data. It doesn't matter that
>>>> you don't have the same number of data points, as long as both sets of
>>>> data are circa ten times as long as the period (distance per cycle)
>>>> that you want to detect. If things in your case are sampled every one
>>>> meter, then the highest frequency that you can reliably detect is 0.5
>>>> cycles per meter. This is all speculation, because you have not posted
>>>> a problem with reproducible code, and we have no idea what you are
>>>> measuring or what your data look like. Without this information there
>>>> is no way that I can say one way or the other whether your approach
>>>> (the suggested non-histogram one) would be right or wrong.
>>>>
>>>> Stephen
>>>>
>>>>
>>>> On Mon, Jun 16, 2008 at 9:33 AM, Anthony Mathelier
>>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Perhaps I'm applying spec.pgram wrong, as you said. I will explain
>>>>> what I want, so you can tell me why I'm wrong and perhaps what I have
>>>>> to do to do it well.
>>>>> I have some points in a 1-D space and I want to know if they are
>>>>> spaced at a certain periodic distance. So I computed all the distances
>>>>> between points in my space. Then I would like to know whether a
>>>>> certain distance (period), or multiples of a certain distance, is
>>>>> preferred in the spacing of my data.
>>>>> I made a histogram of the distances and applied the spec.pgram
>>>>> function to find the frequency (and so the period) that is the most
>>>>> important in the spacing of the original data.
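The steps just described (points in 1-D, all pairwise distances, a histogram, then spec.pgram on the counts) can be sketched as below. The simulated points, the period of 50, the jitter and the bin width of 5 are hypothetical values chosen only so that the example is self-contained.

```r
# Toy data (assumption): points scattered around multiples of a period of 50
set.seed(42)
true_period <- 50
pts <- true_period * sample(0:200, 120) + rnorm(120, sd = 4)

d <- as.vector(dist(pts))                  # all pairwise distances

bin <- 5                                   # hypothetical bin width
h <- hist(d, breaks = seq(0, max(d) + bin, by = bin), plot = FALSE)

# Periodogram of the bin counts, with the same options as in the thread
sp <- spec.pgram(h$counts, log = "no", taper = 0.5, plot = FALSE)

# sp$freq is in cycles per bin, so multiply the period in bins by the bin width
period_hat <- bin / sp$freq[which.max(sp$spec)]
period_hat
```

Note that the frequency axis returned by spec.pgram is in cycles per bin, so the bin width has to be carried along to express the dominant period in the original distance units.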
>>>>> But when I have two sets of data (not necessarily with the same number
>>>>> of observations in each set), I want to compare the importance of the
>>>>> period given by spec.pgram between the sets. Could I normalize the
>>>>> amplitude of the peaks given by spec.pgram?
>>>>> So, am I wrong to apply this methodology to exhibit a periodic distance
>>>>> in my data? If so, what would you recommend I do instead?
>>>>> Thanks in advance for your answers.
>>>>> Best regards,
>>>>>
>>>>> Anthony
>>>>>
>>>>> On Tue, Jun 10, 2008 at 6:13 PM, stephen sefick <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> At first thought, I would say that you are applying this wrong! The
>>>>>> Fourier transform convolves a function (cos(x) + i*sin(x); this may
>>>>>> not be the exact formula, but I don't have my books near) with the
>>>>>> data, then integrates over -1/2 to 1/2, takes the modulus and plots
>>>>>> this: the periodogram. The reason you perform a Fourier transform is
>>>>>> to look at recurring frequencies in data that are in the time
>>>>>> domain. The Fourier transform converts the time series into the
>>>>>> frequency domain and voilà, you have a peek into the
>>>>>> hidden/recurring parts of your signal. From your explanation, you
>>>>>> are applying this technique wrong. Look at Shumway, MASS4, and
>>>>>> similar books to get a handle on how this technique is used. If you
>>>>>> are going to apply a time series analysis, please use it on a time
>>>>>> series. Maybe your logic is not flawed, but I don't see how a
>>>>>> histogram with its associated binning is a better candidate for time
>>>>>> series analysis than the original time series, if at all.
>>>>>> good luck
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>> On Tue, Jun 10, 2008 at 8:49 AM, Matthieu Stigler
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I don't know exactly what you want to do, but:
>>>>>>>
>>>>>>> - Why do you use h$counts in your example and not h? Furthermore,
>>>>>>> the help file says it should be a time series, so why not use your
>>>>>>> time series instead?
>>>>>>>
>>>>>>> - Usually na.action will apply the "default" action, which you can
>>>>>>> see with getOption("na.action").
>>>>>>>
>>>>>>> - Here it is set in the function arguments as na.action = na.fail,
>>>>>>> so the function will stop with an error if the time series contains
>>>>>>> NA.
>>>>>>>
>>>>>>> - If you want to study a function, I advise you to copy it entirely,
>>>>>>> rename it and then just insert print(curiousobject...) in the
>>>>>>> function. This lets the function run while you inspect the
>>>>>>> interesting objects, like:
>>>>>>>
>>>>>>> study <- function (x, spans = NULL, kernel = NULL, taper = 0.1,
>>>>>>>     pad = 0, fast = TRUE, demean = FALSE, detrend = TRUE,
>>>>>>>     plot = TRUE, na.action = na.fail, ...)
>>>>>>> {
>>>>>>>     series <- deparse(substitute(x))
>>>>>>>     x <- na.action(as.ts(x))
>>>>>>>     print(x)   # inspect the series after NA handling
>>>>>>>     xfreq <- frequency(x)
>>>>>>>     ...}
>>>>>>> study(sunspots)
>>>>>>>
>>>>>>> - When you provide an example, instead of giving an external
>>>>>>> reference for the data, try to find a convenient built-in data set
>>>>>>> (accessed by data()), so that others will be able to reproduce your
>>>>>>> problem. Here you could use sunspots.
>>>>>>>
>>>>>>> - To obtain the commented code... I don't know.
>>>>>>>
>>>>>>> - Good luck
>>>>>>>
>>>>>>> Matthieu
>>>>>>>
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> First of all, I would like to say that I am a newbie in R, so I
>>>>>>>> apologize in advance if my questions seem too easy for you.
>>>>>>>>
>>>>>>>> I'm looking for periodicity in histograms. I have histograms of
>>>>>>>> certain phenomena and I'm asking whether a periodicity exists in
>>>>>>>> these data. So I make a periodogram with the function spec.pgram.
>>>>>>>> For instance, if I have a histogram h, I call spec.pgram with
>>>>>>>> spec.pgram(h, log="no", taper=0.5). Some peaks appear and I would
>>>>>>>> like to interpret them, but I do not know how they are computed,
>>>>>>>> and so I do not know what a peak with a value of 10000 represents
>>>>>>>> in comparison with a peak of value 600 from another histogram.
>>>>>>>> I looked at the source code of the function spec.pgram to better
>>>>>>>> understand what is behind it. But when I apply the source code
>>>>>>>> line by line, I run into a problem. For instance, I do:
>>>>>>>>
>>>>>>>> > data = scan("file.txt")
>>>>>>>> > h = hist(data, breaks=max(data)/5000)
>>>>>>>> # then I apply the first two lines of the spec.pgram function
>>>>>>>> > series <- deparse(substitute(h$counts))
>>>>>>>> > x <- na.action(as.ts(h$counts))
>>>>>>>> > x
>>>>>>>> NULL
>>>>>>>>
>>>>>>>> I do not understand why, when I apply the first two lines of the
>>>>>>>> function myself, x is equal to NULL (which causes an error in the
>>>>>>>> following lines of the code), whereas if I call the function
>>>>>>>> directly with h$counts it gives me a result.
>>>>>>>> So, if someone can explain to me what the problem is and/or how
>>>>>>>> exactly spec.pgram computes the periodogram and how to interpret
>>>>>>>> it with my data, I would be very grateful.
>>>>>>>> And some subsidiary questions:
>>>>>>>> - Is it possible to have the commented source code of the function?
>>>>>>>> - I do not understand what the function na.action does in the
>>>>>>>> second line of spec.pgram, so I would appreciate an explanation.
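A short sketch of why the line-by-line run gives NULL, using a made-up vector in place of h$counts (the values are hypothetical). At top level the name na.action resolves to the stats function of that name, which reports how missing values were handled in an object; inside spec.pgram the same name is an argument whose default is na.fail.

```r
counts <- c(3, 7, 2, 9, 4, 8, 1, 6)      # hypothetical stand-in for h$counts

# Top level: stats::na.action() looks up the "na.action" attribute of the
# object. A plain series carries no such attribute, so the result is NULL.
x_top <- na.action(as.ts(counts))

# Inside spec.pgram: na.action is the argument, na.fail by default, which
# returns the series unchanged when it contains no NA.
x_inside <- na.fail(as.ts(counts))

is.null(x_top)
identical(x_inside, as.ts(counts))

# na.fail stops with an error as soon as a missing value is present
failed <- inherits(try(na.fail(c(1, NA, 3)), silent = TRUE), "try-error")
failed
```

To reproduce the function body by hand, replacing na.action(...) with na.fail(...) on that line mirrors what spec.pgram does with its default arguments.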
>>>>>>>>
>>>>>>>> Thanks in advance for your answers.
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Anthony Mathelier
>>>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Let's not spend our time and resources thinking about things that are
>>>>>> so little or so large that all they really do for us is puff us up and
>>>>>> make us feel like gods. We are mammals, and have not exhausted the
>>>>>> annoying little problems of being mammals.
>>>>>>
>>>>>> -K. Mullis